Completion Report: Operations Sentry

Completed: 2026-05-19 (all four partitions rolled to operations 2.25.1). Linear umbrella: PDEV-537. Parallel adoption (out of scope): PDEV-533 (accounts-component).

The project delivered end-to-end Sentry observability for the operations Kotlin/Ktor backend, wired so that every consumer of arda-common 8.3.0 inherits the behaviour:

  • Error and exception tracking governed by the AppError.reportable() policy (Internal.* and Generic are reported; Invocation.* is dropped). Errors are captured at request boundaries via runSuspendingBoundary, on background paths via a Logback appender at ERROR level, and by a JVM-level last-resort handler before thread death.
  • Performance and tracing (APM) via the bundled Sentry OTel Java Agent for routes, DB queries (Exposed), and outbound HTTP (http.client spans on Documint), combined with Sentry-native sampling. End-to-end trace propagation from the browser through the BFF into the Ktor route, joining a single Sentry trace.
  • Release health via a manual SentryRequestSession Ktor application plugin (the JVM SDK does not emit per-request sessions automatically — see DT-004 supersession). Sessions flow on the Release Health tab tagged by SENTRY_RELEASE=operations@<chart-version>.
  • PII scrubbing at both beforeSend and beforeSendTransaction via the partition-scoped SENTRY_SCRUB_SALT, hashing user identifiers deterministically and applying allow-list redaction to headers, request bodies, and span data.
  • Frontend trace propagation with explicit env-aware tracePropagationTargets covering Amplify’s abbreviated env names (DEV/STAGING/PROD).
  • Infrastructure — per-partition PartitionSecrets CFN stack provisioning SentryScrubSalt in Secrets Manager, with the two CFN export prefixes (-API- for the operations Helm chart and -I- for CDK compose-time wiring).
  • Documentation — new architectural reference current-system/oam/sentry-observability.md, fully rewritten how-to process/craft/operations-and-monitoring/sentry-integration.md, and project roadmap promotion to completed/.
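The keep/drop rule behind AppError.reportable() can be sketched as follows. The sealed hierarchy is a hypothetical stand-in for the arda-common type: only the rule itself (Internal.* and Generic reported, Invocation.* dropped) and the name Invocation.NotFound come from this report, while the Unexpected subtype and the captureIfReportable helper are illustrative assumptions.

```kotlin
// Hypothetical stand-in for arda-common's AppError. Only the keep/drop rule
// (Internal.* and Generic reported, Invocation.* dropped) is taken from the report.
sealed class AppError(message: String) : Exception(message) {
    sealed class Internal(message: String) : AppError(message) {
        class Unexpected(message: String) : Internal(message) // illustrative subtype
    }
    sealed class Invocation(message: String) : AppError(message) {
        class NotFound(message: String) : Invocation(message)
    }
    class Generic(message: String) : AppError(message)

    // Server-side faults are reportable; caller mistakes are expected traffic.
    fun reportable(): Boolean = when (this) {
        is Internal, is Generic -> true
        is Invocation -> false
    }
}

// Hypothetical boundary helper: forwards only reportable errors to the sink.
// A real boundary would hand them to the Sentry SDK instead of a lambda.
fun captureIfReportable(error: AppError, sink: (AppError) -> Unit): Boolean {
    val report = error.reportable()
    if (report) sink(error)
    return report
}

fun main() {
    val captured = mutableListOf<AppError>()
    val sink: (AppError) -> Unit = { captured += it }
    captureIfReportable(AppError.Internal.Unexpected("db unavailable"), sink)
    captureIfReportable(AppError.Invocation.NotFound("no such item"), sink)
    println(captured.map { it.message }) // prints [db unavailable]
}
```

Keeping the decision on the error type, rather than at each call site, is what would let every capture path apply the same rule.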

| Stream | Repository | PR | Status |
|---|---|---|---|
| Common library | common-module | #171 | Merged (arda-common 8.3.0) |
| Component adoption | operations | #172 | Merged (chart 2.25.1) |
| Infrastructure | infrastructure | #459 | Merged |
| Frontend trace propagation | arda-frontend-app | #845 | Merged |
| Documentation | documentation | #94 | Open (wraps this project) |
| Infra naming follow-up | infrastructure | #460 | Open at project close (independent cleanup) |

Deployed in order Alpha002-dev → Alpha002-stage → Alpha001-demo → Alpha001-prod on 2026-05-19. Each partition followed the two-phase recipe: (1) amm.sh provisions the PartitionSecrets CFN stack and SentryScrubSalt Secrets Manager entry; (2) the GitHub Actions matrix step rolls the operations Helm release to 2.25.1.

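| Partition | amm.sh | operations 2.25.1 | Sentry env vars | ESO scrub-salt |
|---|---|---|---|---|
| Alpha002-dev | ✅ | ✅ | ✅ | ✅ |
| Alpha002-stage | ✅ | ✅ | ✅ | ✅ |
| Alpha001-demo | ✅ | ✅ | ✅ | ✅ |
| Alpha001-prod | ✅ | ✅ | ✅ | ✅ |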

Pod-level smoke verification on every partition confirmed SENTRY_DSN, SENTRY_ENVIRONMENT, SENTRY_RELEASE, SENTRY_SCRUB_SALT (64-char value) populated, both be-sentry-dsn and be-sentry-scrub-salt ExternalSecrets at SecretSynced=True/Ready=True, and both replicas 1/1 Running on the new chart.
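The environment part of that smoke check could be encoded roughly as below. The variable names, the 64-character salt length, and the operations@<chart-version> release format come from this report; the smokeCheck helper and the sample values are hypothetical, and the sketch does not cover the ExternalSecret or replica checks.

```kotlin
// Variable names are from the report; the checker itself is a hypothetical helper.
val requiredVars = listOf("SENTRY_DSN", "SENTRY_ENVIRONMENT", "SENTRY_RELEASE", "SENTRY_SCRUB_SALT")

// Returns a list of human-readable problems; an empty list means the pod passes.
fun smokeCheck(env: Map<String, String>, chartVersion: String): List<String> {
    val problems = mutableListOf<String>()
    for (name in requiredVars) {
        if (env[name].isNullOrBlank()) problems += "$name is not populated"
    }
    val salt = env["SENTRY_SCRUB_SALT"].orEmpty()
    if (salt.isNotBlank() && salt.length != 64) {
        problems += "SENTRY_SCRUB_SALT has ${salt.length} chars, expected 64"
    }
    val release = env["SENTRY_RELEASE"].orEmpty()
    if (release.isNotBlank() && release != "operations@$chartVersion") {
        problems += "SENTRY_RELEASE is $release, expected operations@$chartVersion"
    }
    return problems
}

fun main() {
    // Placeholder values only; none of these are real settings.
    val env = mapOf(
        "SENTRY_DSN" to "https://key@example.invalid/1",
        "SENTRY_ENVIRONMENT" to "dev",
        "SENTRY_RELEASE" to "operations@2.25.1",
        "SENTRY_SCRUB_SALT" to "a".repeat(64),
    )
    println(smokeCheck(env, "2.25.1")) // prints []
}
```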

  • AC-1 — Boundary error capture: an AppError.Internal.* thrown in a Ktor route appears in Sentry under platform-be with the configured fingerprint, the joined frontend trace ID, and tenant_hash tag. ✅
  • AC-2 — Invocation.* drop: AppError.Invocation.NotFound thrown in a Ktor route produces no Sentry issue. ✅
  • AC-3 — Background capture: a log.error(...) in a scheduled job produces an issue without a request scope. ✅
  • AC-4 — Last-resort capture: an uncaught throwable in a fire-and-forget coroutine reaches the JVM-level handler and produces an issue. ✅ (validated by the CsvUploadService fix and supplementary unit tests).
  • AC-5 — End-to-end trace: a browser-initiated request flows as a single Sentry trace through the BFF into the Ktor route and includes DB spans and outbound http.client spans. ✅ — verified empirically with trace 776c560abd12401a9c4bc2dc869581e9 from a Documint-printing preview deploy of PR #845.
  • AC-6 — PII scrubbed: no plaintext user identifier or restricted header appears in any captured event or span. ✅ — verified with the spans-dataset query in the rewritten how-to.
  • AC-7 — Release health: sessions flow on the Release Health tab tagged operations@2.25.1. ✅ — verified after the manual SentryRequestSession plugin landed.
  • AC-8 — Partition isolation of salts: each partition uses a distinct SENTRY_SCRUB_SALT. ✅ — verified via Secrets Manager and ESO output across all four partitions.
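The properties checked by AC-6 and AC-8 can be illustrated with a small sketch. It assumes HMAC-SHA256 keyed by the partition salt, which this report does not confirm (it says only that identifiers are hashed deterministically using SENTRY_SCRUB_SALT). The function names, the header allow-list, and the salt values are likewise hypothetical.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Assumption: HMAC-SHA256 keyed by the partition salt. The report states only
// that identifiers are hashed deterministically with SENTRY_SCRUB_SALT.
fun hashIdentifier(identifier: String, salt: String): String {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(salt.toByteArray(Charsets.UTF_8), "HmacSHA256"))
    return mac.doFinal(identifier.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }
}

// Hypothetical allow-list; the real one is defined by the scrubbing code.
val allowedHeaders = setOf("accept", "content-type", "user-agent")

// Allow-list redaction: any header not explicitly permitted is masked.
fun redactHeaders(headers: Map<String, String>): Map<String, String> =
    headers.mapValues { (name, value) ->
        if (name.lowercase() in allowedHeaders) value else "[redacted]"
    }

fun main() {
    val devSalt = "dev-salt-example"   // placeholder, not a real salt
    val prodSalt = "prod-salt-example" // placeholder, not a real salt
    // Same salt: stable hash, so events for one user can still be grouped.
    println(hashIdentifier("user-123", devSalt) == hashIdentifier("user-123", devSalt))  // prints true
    // Different salt: hashes cannot be joined across partitions.
    println(hashIdentifier("user-123", devSalt) == hashIdentifier("user-123", prodSalt)) // prints false
    println(redactHeaders(mapOf("Authorization" to "Bearer abc", "Accept" to "*/*")))
    // prints {Authorization=[redacted], Accept=*/*}
}
```

An allow-list fails closed: a header nobody anticipated is masked by default, whereas a deny-list would let it through.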

Under byproducts/:

  • changelog.md — what changed by repository.
  • learnings.md — non-obvious lessons (SDK 8 session-emission quirk, OTel agent OTLP suppression, ESO templating, JSDoc trap, fire-and-forget scope).
  • suggestions.md — improvements worth doing but out of scope.
  • alternatives.md — paths evaluated and not taken.
  • skipped.md — scope deferred, with tracking tickets.
  • specification-post.md — what the spec would say in hindsight.

The design artefacts under specification/ and plan/, together with decision-log.md, remain authoritative for the design record. The full exploration record is in the workbook at workbooks/notebooks/operations-sentry/.

| Ticket | Scope | Status |
|---|---|---|
| PDEV-491 | Documint client-side Sentry HTTP plugin (semantic context) | Open |
| PDEV-533 | Adopt Sentry instrumentation in accounts-component | Open (out of scope here from the start) |
| PDEV-538 | Remove deprecated SENTRY_ENABLE_AUTO_SESSION_TRACKING once safe | Open |
| PDEV-541 | Unify legacy partition-secrets stack with new CDK PartitionSecrets | Open |
| PDEV-543 | Review-thread follow-up from operations#172 | Open |
| PDEV-544 | Review-thread follow-up from documentation#94 (blocked-by PDEV-543) | Open |

  • The Documint client-side Sentry plugin — outbound visibility is already covered by the OTel agent’s generic http.client spans; the plugin would add semantic enrichment. See PDEV-491.
  • A broader audit of fire-and-forget launch { ... } invocations inside Ktor routes. The CSV path was the known leak and is fixed; a wider sweep is a candidate for a future sprint.
  • An E2E Playwright smoke test that asserts sentry-trace / baggage header propagation. Empirically verified on PR #845 preview; not encoded as a test yet.
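As a starting point for the deferred propagation check, a validator for the sentry-trace header might look like the sketch below. It relies on the header layout documented by Sentry (32-hex trace ID, 16-hex span ID, optional sampled flag). The type and function names are hypothetical, and the span ID in the usage example is a made-up placeholder appended to the trace ID quoted under AC-5.

```kotlin
// sentry-trace layout: <32 hex trace id>-<16 hex span id>[-<0|1 sampled flag>]
private val sentryTracePattern = Regex("^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$")

// Hypothetical value type for the parsed header.
data class SentryTrace(val traceId: String, val spanId: String, val sampled: Boolean?)

// Returns null when the header does not follow the documented layout.
fun parseSentryTrace(header: String): SentryTrace? {
    val match = sentryTracePattern.matchEntire(header.trim().lowercase()) ?: return null
    val (traceId, spanId, flag) = match.destructured
    val sampled = when (flag) {
        "1" -> true
        "0" -> false
        else -> null // flag omitted: the sampling decision is deferred
    }
    return SentryTrace(traceId, spanId, sampled)
}

fun main() {
    // Trace ID from AC-5; the span ID is a made-up placeholder.
    val header = "776c560abd12401a9c4bc2dc869581e9-0123456789abcdef-1"
    println(parseSentryTrace(header)?.traceId) // prints 776c560abd12401a9c4bc2dc869581e9
    println(parseSentryTrace("not-a-trace"))   // prints null
}
```

A propagation test would then assert that the trace ID parsed from the outgoing browser request equals the one recorded on the Ktor route.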

The project closed on schedule across all four partitions. Two corrections to the original specification deserve carrying forward:

  1. Sentry JVM SDK 8.x emits no per-request sessions on Ktor. Anyone adopting the SDK on Ktor needs the SentryRequestSession plugin until Sentry publishes an official server-side Ktor plugin.
  2. The bundled OTel Java Agent enables the OTLP HTTP exporter by default. Set OTEL_*_EXPORTER=none whenever there is no collector running, or the pod logs fill with localhost:4318 retry errors.
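The second correction lends itself to a startup guard. The sketch below assumes the three standard OpenTelemetry autoconfiguration variables are the ones behind the OTEL_*_EXPORTER shorthand. The exportersNotDisabled helper is hypothetical and is not part of arda-common or the operations chart.

```kotlin
// Standard OpenTelemetry autoconfigure variables that select an exporter per signal.
val exporterVars = listOf("OTEL_TRACES_EXPORTER", "OTEL_METRICS_EXPORTER", "OTEL_LOGS_EXPORTER")

// Returns the variables that are unset or set to anything other than "none",
// i.e. the signals for which the agent may still try to reach a collector.
fun exportersNotDisabled(env: Map<String, String>): List<String> =
    exporterVars.filter { env[it]?.trim()?.lowercase() != "none" }

fun main() {
    val pending = exportersNotDisabled(System.getenv())
    if (pending.isEmpty()) {
        println("OTel exporters disabled")
    } else {
        println("Set to 'none' when no collector is running: $pending")
    }
}
```

Logging this once at startup surfaces the misconfiguration before the retry errors start to accumulate in the pod logs.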

Both are now baked into the operations chart, into arda-common 8.3.0’s documented usage, and into the rewritten sentry-integration.md how-to.