Operations Sentry — Specification Post
This document amends the original specification with what we now know is correct. Cross-reference with the decision log and the Design Topic decision files under workbooks/notebooks/operations-sentry/decisions/.
Material amendments
DT-004: per-request session emission is application-level work
The original specification, citing the Sentry JVM SDK Release Health documentation, asserted that enableAutoSessionTracking=true would emit one session per request when the SDK detects a Ktor server runtime. This is false at SDK 8.41.0. The SDK emits one session per JVM lifecycle and provides no built-in per-request session emission.
Correction: the operations component installs a SentryRequestSession Ktor application plugin in Main.kt that calls Sentry.startSession() on onCall and Sentry.endSession() on ResponseSent. The plugin is idempotent (pluginOrNull guard) so common-module’s auto-install does not collide. Sessions then flow correctly to the Sentry Release Health tab.
This correction is reflected in dt-004-session-tracking.md (supersession banner) and in the new current-system/oam/sentry-observability.md reference page.
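The plugin described above could be sketched roughly as follows. This is a minimal illustration assuming Ktor 2.x or later and the Sentry Java SDK on the classpath; the hook bodies and the helper name installSentryRequestSessionOnce are assumptions drawn from the description, not the shipped code in Main.kt.

```kotlin
import io.ktor.server.application.*
import io.ktor.server.application.hooks.ResponseSent
import io.sentry.Sentry

// Sketch of a per-request session plugin; hook bodies are assumed from the
// description above and may differ from the real implementation.
val SentryRequestSession = createApplicationPlugin(name = "SentryRequestSession") {
    // Start a Sentry session when a call enters the pipeline.
    onCall { _ -> Sentry.startSession() }
    // End the session once the response has been sent.
    on(ResponseSent) { _ -> Sentry.endSession() }
}

// Hypothetical helper: install only if the plugin is not already present,
// so a second install attempt (for example from a shared module) is a no-op.
fun Application.installSentryRequestSessionOnce() {
    if (pluginOrNull(SentryRequestSession) == null) {
        install(SentryRequestSession)
    }
}
```

The pluginOrNull check is what makes repeated installation safe: Ktor's install would otherwise reject a duplicate plugin.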
DT-005: PII scrubbing applies to spans too
The original spec scoped scrubbing to beforeSend (event level) only. Investigation during implementation showed that Sentry.captureMessage/captureException events are scrubbed by beforeSend, but performance/APM spans carry their own attributes (request URL, headers if any, span data) that flow through beforeSendTransaction. Without symmetric scrubbing, span data would leak request paths containing identifiers and any header the agent passes through.
Correction: PiiScrubber is installed at both beforeSend and beforeSendTransaction. The allow-list is the same; the scrubbing is symmetric across events and spans.
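To make the symmetry concrete, here is a minimal stand-alone sketch of one scrubber applied to both event-shaped and span-shaped attribute maps. The class body, the salted SHA-256 scheme, the 16-character truncation, and the attribute keys are all assumptions for illustration; the real PiiScrubber is wired through the SDK's beforeSend and beforeSendTransaction callbacks and may work differently.

```kotlin
import java.security.MessageDigest

// Hypothetical stand-in for PiiScrubber: one allow-list, one salt, one code path.
class PiiScrubber(private val allowList: Set<String>, private val salt: String) {
    // Allow-listed keys pass through; every other value becomes a salted digest.
    fun scrub(attributes: Map<String, String>): Map<String, String> =
        attributes.mapValues { (key, value) -> if (key in allowList) value else hash(value) }

    // Assumed scheme: SHA-256 over salt + value, hex-encoded and truncated.
    private fun hash(value: String): String =
        MessageDigest.getInstance("SHA-256")
            .digest((salt + value).toByteArray(Charsets.UTF_8))
            .joinToString("") { "%02x".format(it) }
            .take(16)
}

fun main() {
    val scrubber = PiiScrubber(allowList = setOf("http.method"), salt = "example-salt")
    val eventTags = mapOf("http.method" to "GET", "user.email" to "a@example.com")
    val spanData = mapOf("http.method" to "GET", "user.email" to "a@example.com")
    // The same instance handles both shapes, so the results match.
    println(scrubber.scrub(eventTags) == scrubber.scrub(spanData))
}
```

Sharing a single instance between the two callbacks means the allow-list cannot drift between events and spans.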
Salt provisioning is per-partition, not per-environment
The original spec spoke of “the salt” as a single global value. The decision to make it partition-scoped (dev, stage, demo, and prod each have their own salt) materialised during DT-005 and is now part of the shipped design.
Correction: four Alpha00X-SentryScrubSalt Secrets Manager entries, projected via ESO with the JSON-shaped property: salt indirection. See the infrastructure section of changelog.md for the CFN naming convention.
CFN export naming requires both -API- and -I- prefixes
The original spec used the (then-newer) -API- cross-repo prefix uniformly. Implementation discovered that the CDK app needs to consume the export internally at compose time too, which requires the marker-prefixed -I- form.
Correction: every cross-repo partition secret gets two exports: <partition>-API-<name>Arn (for cross-repo Helm chart consumption) and <partition>-I-<name>Arn (for CDK compose-time wiring). Documented in the rewritten how-to.
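The naming rule can be expressed as a small helper. This is only an illustration of the two patterns quoted above; the function name, the data class, and the example partition and secret values are hypothetical, and the real exports are produced by the CDK app.

```kotlin
// Both export names for one partition secret, following the patterns
// <partition>-API-<name>Arn and <partition>-I-<name>Arn.
data class SecretExports(val crossRepo: String, val internal: String)

// Hypothetical helper; not part of the actual CDK code.
fun exportNamesFor(partition: String, name: String): SecretExports =
    SecretExports(
        crossRepo = "$partition-API-${name}Arn", // consumed by cross-repo Helm charts
        internal = "$partition-I-${name}Arn",    // consumed at CDK compose time
    )

fun main() {
    // "dev" and "SentryScrubSalt" are illustrative inputs only.
    println(exportNamesFor("dev", "SentryScrubSalt"))
}
```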
OTEL_*_EXPORTER=none is required when running the bundled agent without a collector
The original spec referenced the bundled sentry-otel-agent.jar as a drop-in but did not call out the OTLP exporter default. In a no-collector environment (i.e. every Arda partition today) the agent fills the pod logs with retry errors targeting http://localhost:4318.
Correction: OTEL_TRACES_EXPORTER, OTEL_METRICS_EXPORTER, OTEL_LOGS_EXPORTER are all explicitly set to none in the operations chart. The Sentry-native exporter inside the agent continues to work.
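One way to catch a missing override early is a startup check over the environment. The sketch below is an assumption, not something the operations chart or component is known to include; it only verifies that the three variables named above are set to none.

```kotlin
// Exporter variables named in the correction above.
val otelExporterVars = listOf(
    "OTEL_TRACES_EXPORTER",
    "OTEL_METRICS_EXPORTER",
    "OTEL_LOGS_EXPORTER",
)

// Hypothetical preflight helper: returns the variables not explicitly set to "none".
fun exportersNotDisabled(env: Map<String, String>): List<String> =
    otelExporterVars.filter { env[it] != "none" }

fun main() {
    val offenders = exportersNotDisabled(System.getenv())
    if (offenders.isNotEmpty()) {
        println("OTLP exporters not disabled for: $offenders")
    }
}
```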
Frontend env-var values use abbreviated forms
The original specification assumed the NEXT_PUBLIC_DEPLOY_ENV values were the long forms (development, staging, production), which is what the Sentry SDK’s own examples use. Amplify deploys actually set the abbreviated forms (DEV, STAGING, PROD).
Correction: the BACKEND_HOSTS map in trace-propagation-targets.ts includes both forms. Alias-parity unit tests guard against regression. See PR #845 commits 03a19aba and 62c350ba.
Minor amendments
AppError.reportable() ergonomics
The spec sketched AppError.shouldReport(): Boolean. Implementation found that a boolean does not let the error type also attach extra context (e.g. an Internal carrying a wrapped cause might want to report both itself and the cause).
Correction: reportable(): List<Throwable> — returns the list of throwables to capture. Internal.* and Generic return listOf(this); Invocation.* returns emptyList(). Capture sites iterate.
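A minimal sketch of the list-returning shape follows. The Internal.* and Invocation.* families are collapsed into single classes here for brevity, and the constructor parameters and the captureAll helper are assumptions; only the return values of reportable() follow the correction above.

```kotlin
// Simplified hierarchy: the real AppError has Internal.* and Invocation.* families.
sealed class AppError(message: String, cause: Throwable? = null) : Exception(message, cause) {
    // Throwables a capture site should send; an empty list means "do not report".
    abstract fun reportable(): List<Throwable>

    class Internal(message: String, cause: Throwable? = null) : AppError(message, cause) {
        override fun reportable(): List<Throwable> = listOf(this)
    }

    class Generic(message: String) : AppError(message) {
        override fun reportable(): List<Throwable> = listOf(this)
    }

    class Invocation(message: String) : AppError(message) {
        override fun reportable(): List<Throwable> = emptyList()
    }
}

// Hypothetical capture site: iterates whatever the error asks to report.
// In production, capture would be a Sentry capture call.
fun captureAll(error: AppError, capture: (Throwable) -> Unit) {
    error.reportable().forEach(capture)
}

fun main() {
    captureAll(AppError.Internal("storage failed")) { println("captured: ${it.message}") }
    captureAll(AppError.Invocation("bad input")) { println("captured: ${it.message}") }
}
```

Because capture sites only iterate, an error type that later wants to report its wrapped cause as well can return a longer list without any caller changing.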
Fire-and-forget coroutine scope lifecycle
The spec did not enumerate the lifecycle constraint on fire-and-forget paths. Implementation surfaced that the request’s coroutine scope cancels on response, killing the in-flight batch.
Correction: any work that must outlive the request uses CoroutineScope(SupervisorJob() + dispatchers.io) (or equivalent) so cancellation of the request handler does not propagate. The runSuspendingBoundary lives inside a runCatching wrapper so Sentry sees the throwable before the tracker’s terminal-state recording.
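The lifecycle point can be demonstrated with kotlinx.coroutines alone. In this sketch the scope name, launchDetached, and the onFailure callback (a stand-in for a Sentry capture call) are hypothetical. It uses an explicit try/catch rather than the runCatching wrapper mentioned above, purely so that cancellation is rethrown visibly; the shipped code may differ.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import kotlinx.coroutines.*

// Hypothetical application-lifetime scope, independent of any request scope.
val detachedScope = CoroutineScope(SupervisorJob() + Dispatchers.IO)

// Launches work that must outlive the caller. Failures go to onFailure;
// cancellation is rethrown untouched.
fun launchDetached(onFailure: (Throwable) -> Unit, work: suspend () -> Unit): Job =
    detachedScope.launch {
        try {
            work()
        } catch (e: CancellationException) {
            throw e
        } catch (e: Throwable) {
            onFailure(e)
        }
    }

fun main() = runBlocking {
    val finished = AtomicBoolean(false)
    var detached: Job? = null
    // Stand-in for a request handler whose scope is cancelled on response.
    val request = launch {
        detached = launchDetached(onFailure = { println("captured: $it") }) {
            delay(100)
            finished.set(true)
        }
        delay(1_000)
    }
    delay(20)
    request.cancelAndJoin() // cancelling the "request" does not reach detachedScope
    detached?.join()
    println("detached work finished: ${finished.get()}")
}
```

Had the work been launched as a child of the request coroutine instead, cancelAndJoin would have cancelled it before the flag was set.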
Logback appender level
The spec did not pin a level. ERROR was chosen during implementation for signal-to-noise; suggestions.md proposes future WARN-level adoption via a tagged logger pattern.
Section-by-section status
| Spec section | Status |
|---|---|
| §1 Goal | Unchanged — delivered. |
| §2 Error/exception tracking | Delivered. reportable() ergonomics amended (boolean → list). |
| §3 Performance/tracing | Delivered. OTel-agent exporter caveat added. |
| §4 Release health | Delivered. Major amendment: requires manual SentryRequestSession Ktor plugin (DT-004 supersession). |
| §5 PII scrubbing | Delivered. Amendment: extended to spans via beforeSendTransaction. |
| §6 Component integration | Delivered. Component.build() wires SDK init + StatusPages capture. |
| §7 Logback appender | Delivered. Level pinned to ERROR. |
| §8 Frontend trace propagation | Delivered. Amendment: includes abbreviated env-name aliases. |
| §9 Infrastructure | Delivered. CFN export naming clarified (-API- + -I-); stack id renamed PartitionSecrets to avoid legacy collision. |
| §10 Documentation | Delivered. New architectural reference + rewritten how-to. |
What the spec got right that’s worth highlighting
- The decision to put reportability inside AppError rather than in the HTTP layer. The implementation reinforced this: it cleanly served the boundary, the Logback appender, and the last-resort handler with one policy.
- Scoping accounts-component adoption (PDEV-533) out of the project from the start. This kept the project shippable in a finite time.
- Insisting on per-partition salts rather than a single global salt. The complexity cost was negligible; the rotation/isolation benefit is real.
- Decoupling FE trace propagation as an independent PR. PR #845 shipped without dependency on the BE waves and unblocked end-to-end verification earlier.
Copyright: © Arda Systems 2025-2026, All rights reserved