Operations Sentry — Requirements

Decompose the project goal (see goal.md) into testable requirements. Each requirement has a stable R-NNN identifier, traces to one or more acceptance criteria (AC-N from goal.md) and to one or more Design Topics (DT-NNN in the project workbook), and is written so that the verification artefact (verification.md) can decompose it into mechanical checks.

Requirements are grouped by capability area, mirroring the Desired outcomes section of the goal.

  • R-NNN identifiers are stable. Renumbering breaks traceability across requirements, specification, and verification.
  • MUST statements are hard requirements. Failure to meet them blocks project closure.
  • SHOULD statements are strong defaults. Deviating from them needs an explicit decision recorded in the project’s notes.
  • MAY statements are optional refinements; non-adoption is fine.
  • Each requirement carries a Traces to field listing the AC and DT references.

The Sentry Java SDK initialisation (Sentry.init { … }) MUST live in a single module in common-module. Every Kotlin/Ktor component that consumes common-module MUST inherit the initialisation through the standard component bootstrap (Component.build(...)), with no per-component code required.

  • Traces to: AC-1; DT-002.

The bundled Sentry OpenTelemetry Java agent (io.sentry:sentry-opentelemetry-agent:8.41.0) MUST continue to attach via -javaagent: at JVM startup. The in-process SDK init MUST be idempotent and safe when the agent has already loaded the bundled SDK — Sentry.init { … } re-applies options on the existing Hub rather than triggering a re-initialisation.

  • Traces to: AC-1; DT-001.

The SDK initialisation MUST succeed when the SENTRY_DSN environment variable is empty or unset. No events are emitted in that state. Pod startup MUST NOT depend on Sentry-side reachability.

  • Traces to: AC-10; DT-001, DT-002.
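
One way to make the empty and unset cases behave identically is to normalise the environment value before it reaches the SDK. The sketch below is illustrative only: `resolveDsn` is a hypothetical helper name, and it assumes (worth confirming against the pinned 8.41.0) that the Java SDK treats an empty-string DSN as "disabled" while rejecting a null DSN.

```kotlin
// Hypothetical helper, not an existing common-module API.
// Unset, empty, and whitespace-only values all collapse to "" so the
// init path never hands a null DSN to the SDK.
fun resolveDsn(env: Map<String, String> = System.getenv()): String =
    env["SENTRY_DSN"]?.trim().orEmpty()
```

Inside the init block the result would be assigned to the DSN option, so a missing variable and an empty one take the same disabled path.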

R-104 — Sentry project remains platform-be

The backend’s events MUST flow into the existing platform-be Sentry project for the duration of this project. No new Sentry project is provisioned. (Per-component project separation is a PDEV-533 decision when accounts-component adopts.)

  • Traces to: AC-1.

R-201 — AppError.reportable() is the classification primitive

AppError MUST expose a reportable(): List<Throwable> method whose return determines which throwables are sent to Sentry for a given error. The default implementation on the sealed base returns listOf(this). The Invocation branch MUST override to return emptyList(). The Composite data class MUST override to return causes.flatMap { it.reportable() }. Internal and Generic MUST inherit the default.

A bridging extension on Throwable MUST handle the non-AppError case by returning listOf(this).

  • Traces to: AC-2, AC-3; DT-003.
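
The classification rules above can be sketched with a reduced stand-in for the hierarchy. This is a minimal illustration, not the real `AppError`: the constructors are simplified, `Invocation` is shown as a single class rather than a branch of subtypes, and the `label` field on `Composite` is a hypothetical name.

```kotlin
// Reduced stand-in for the real AppError hierarchy (assumed shape).
sealed class AppError(message: String) : Exception(message) {
    // Default on the sealed base: the error itself is the only reportable throwable.
    open fun reportable(): List<Throwable> = listOf(this)

    // Caller-side faults are never reported.
    class Invocation(message: String) : AppError(message) {
        override fun reportable(): List<Throwable> = emptyList()
    }

    // Internal and Generic inherit the default.
    class Internal(message: String) : AppError(message)
    class Generic(message: String) : AppError(message)

    // Composite flattens its causes, so dropped causes vanish and the rest
    // surface as independent throwables.
    data class Composite(val label: String, val causes: List<AppError>) : AppError(label) {
        override fun reportable(): List<Throwable> = causes.flatMap { it.reportable() }
    }
}

// Bridging extension: AppError instances use their member; everything else reports itself.
fun Throwable.reportable(): List<Throwable> {
    val appError = this as? AppError
    return appError?.reportable() ?: listOf(this)
}
```

Because members take precedence over extensions in Kotlin, calling `reportable()` on a value statically typed as `AppError` goes straight to the member, and the extension only matters at boundaries that hold a plain `Throwable`.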

The Ktor StatusPages exception<Throwable> handler installed by common-module/.../component/Component.kt MUST invoke throwable.reportable().forEach { Sentry.captureException(it) } before returning the existing HTTP response. The app.log.warn("Error: …", path, toProcess) call MUST remain in place (it is consumed by the Logback appender per R-801).

  • Traces to: AC-2, AC-3; DT-003, DT-006.

When the capture site iterates a Composite’s reportable() result, each emitted event MUST carry a Sentry tag wrapped_in_composite: <composite.message>. The tag value is subject to beforeSend scrubbing (R-501).

  • Traces to: AC-3; DT-003.

R-204 — Generic and unknown throwables capture

AppError.Generic instances MUST be captured (default inheritance). Non-AppError Throwable instances reaching a boundary MUST be captured as-is. Capture of these classes MAY carry a Sentry tag (e.g., unclassified: true) so they can be filtered in the Sentry UI for code-review follow-up.

  • Traces to: AC-2; DT-003.

The SDK init MUST enable Sentry’s UncaughtExceptionHandlerIntegration (which is on by default in 8.x; the project sets options.isEnableUncaughtExceptionHandler = true explicitly for clarity). The last-resort path MUST NOT apply the reportable() filter: every uncaught throwable that reaches it is captured. Events from this path MUST carry a tag via: uncaught-handler so they are distinguishable in Sentry from boundary captures.

  • Traces to: AC-7; DT-006.

R-206 — Coroutine-side last-resort capture

The SDK init MUST install a global CoroutineExceptionHandler (exposed for consumers to install on their root scopes) that performs throwable.reportable().forEach { Sentry.captureException(it) } with a boundary: coroutine tag. The global handler MUST be available on the same common-module package as the SDK init.

  • Traces to: AC-7; DT-006.

common-module MUST expose a runBoundary(label: String, block: () -> T): T helper that wraps block in a try/catch and, on throw, invokes throwable.reportable().forEach { Sentry.captureException(it) } tagged with boundary: batch and job: <label> before re-throwing. Whether the signature is suspending or non-suspending is settled in specification.md based on the operations-side audit of system/batch/.

  • Traces to: AC-2; DT-006.
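
A non-suspending shape of the helper might look like the sketch below. To keep it runnable without the Sentry dependency, the capture call is injected as a `capture` function parameter, and `reportableOf` stands in for the `reportable()` bridge; both are assumptions made for illustration and are not part of the signature stated above.

```kotlin
// Stand-in for the reportable() bridge; the real one filters by AppError subtype.
fun reportableOf(t: Throwable): List<Throwable> = listOf(t)

// Sketch of runBoundary with an injected capture sink (hypothetical parameter).
// In production the sink would wrap Sentry.captureException and apply the tags.
fun <T> runBoundary(
    label: String,
    capture: (Throwable, Map<String, String>) -> Unit,
    block: () -> T,
): T =
    try {
        block()
    } catch (t: Throwable) {
        val tags = mapOf("boundary" to "batch", "job" to label)
        reportableOf(t).forEach { capture(it, tags) }
        throw t // the caller still sees the original failure
    }
```

Re-throwing the same instance keeps stack traces and caller-side error handling unchanged, so capture is purely a side effect. A suspending variant would also need a decision on how to treat cancellation, which this sketch does not address.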

The backend’s tracesSampleRate MUST be configurable per environment via the existing oam.performance.sentry.tracesSampleRate Helm value. The values delivered by this project: dev = 1.0, stage = 1.0, demo = 0.2, prod = 0.2, local = off. The demo and prod values represent a bump from today’s 0.1.

  • Traces to: AC-5; DT-004.

R-302 — Frontend transaction sampling unchanged

The frontend’s tracesSampleRate in arda-frontend-app MUST remain at its current per-environment values (1.0 for dev/stage, 0.2 for prod). This project does not change frontend sampling.

  • Traces to: AC-5.

A request initiated by arda-frontend that is sampled into Sentry on the frontend side MUST appear in the same Sentry trace on platform-be, joining the FE and BE spans under one trace ID. The mechanism (W3C traceparent / Sentry sentry-trace header propagation through the BFF) already works today; this project preserves it.

  • Traces to: AC-5; DT-004 assessment.

R-304 — Frontend tracePropagationTargets is explicit and env-aware

The three frontend Sentry init paths (src/instrumentation-client.ts, sentry.server.config.ts, sentry.edge.config.ts) MUST set tracePropagationTargets explicitly with an allow-list that includes:

  • The same-origin path patterns the SDK previously inferred (localhost, /^\/(?!\/)/).

  • The /monitoring tunnel route used by withSentryConfig’s tunnelRoute.

  • The environment-specific backend host derived from NEXT_PUBLIC_DEPLOY_ENV and the existing API-client host-discovery mechanism. The project MUST reuse that mechanism rather than introduce a parallel one.

  • Traces to: AC-9; DT-007.

R-401 — Sessions emitted per request via a manual Ktor interceptor

The Sentry JVM SDK 8.41.0 does not expose a SessionMode enum or a sessionMode property on SentryOptions. options.isEnableAutoSessionTracking = true alone emits at most one session per JVM lifecycle, not per request — and the bundled Sentry OTel Java agent does not contribute a session emitter (Sentry publishes no sentry-ktor server plugin; only sentry-ktor-client for HTTP client instrumentation). The documented Java SDK mechanism on a non-Spring server is manual Sentry.startSession() / Sentry.endSession() calls at each request boundary.

Each consuming component MUST install a Ktor application plugin (SentryRequestSession) that wraps every request with startSession/endSession, guarded by Sentry.isEnabled() so it no-ops when SENTRY_DSN is absent. The plugin install MUST use pluginOrNull so the eventual lift into common-module (tracked under PDEV-490) does not require coordinated per-app removal. The SDK init MUST NOT attempt to set sessionMode.

  • Traces to: AC-4; DT-004.

R-402 — Sessions enabled by default with off-switch

The SDK init MUST set options.isEnableAutoSessionTracking = true as the default, with a SENTRY_AUTO_SESSION_TRACKING env-var override (true | false). Setting the env var to false MUST disable session emission while leaving exception capture intact.

  • Traces to: AC-4, AC-10; DT-004.
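
The default-on, explicit-off behaviour could be resolved as in the following sketch. The function name is hypothetical, and treating unrecognised values as "keep the default" is an assumption; the requirement only defines `true` and `false`.

```kotlin
// Hypothetical resolver for the session off-switch.
// Only an explicit "false" (case-insensitive, surrounding spaces ignored) disables sessions.
fun autoSessionTrackingEnabled(env: Map<String, String> = System.getenv()): Boolean =
    when (env["SENTRY_AUTO_SESSION_TRACKING"]?.trim()?.lowercase()) {
        "false" -> false
        else -> true // unset, "true", or unrecognised: fall back to the default
    }
```

The result would feed the auto-session-tracking option only, so exception capture is unaffected by the switch.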

R-403 — Session sampling inherits from trace sampling

The Sentry JVM SDK 8.41.0 does not expose a sessionSampleRate property for release-health sessions (only profileSessionSampleRate exists, and that is for continuous profiling — a different feature). Backend sessions are sampled in tandem with traces: a session is emitted iff its enclosing trace is sampled. Therefore the effective session sampling rate per environment equals tracesSampleRate (dev = 1.0, stage = 1.0, demo = 0.2, prod = 0.2, local = off).

The Helm chart MUST NOT add a separate sessions.sampleRate field; the deployment template MUST NOT emit SENTRY_SESSION_SAMPLE_RATE.

  • Traces to: AC-4, AC-5; DT-004.

The Helm chart MUST extend oam.performance.sentry with a sessions.{enabled} sub-object — a single boolean field, no mode or sampleRate. The deployment template MUST emit only the SENTRY_AUTO_SESSION_TRACKING env var when both sentry.enabled and sessions.enabled are true. The SENTRY_SESSION_MODE and SENTRY_SESSION_SAMPLE_RATE env vars are NOT emitted (the JVM SDK does not consume them).

  • Traces to: AC-4, AC-10; DT-004.

R-405 — Release tag preservation and divergence acknowledgement

The backend’s release tag MUST continue to be {appName}@{Chart.AppVersion} as set today by the deployment template. The frontend’s release tag scheme (Next.js SDK default) MUST remain unchanged. The two schemes are intentionally independent; this project does not unify them.

  • Traces to: AC-4; DT-004.

R-500 — PII handling and payload scrubbing

R-501 — beforeSend and beforeSendTransaction registration

The SDK init MUST register a beforeSend callback on event capture and a beforeSendTransaction callback on transaction capture. Both callbacks MUST implement the scrubbing requirements R-502 through R-507.

  • Traces to: AC-6; DT-005.

When a JWT subject claim is available on the active request, the beforeSend callback MUST set event.user.id to HMAC-SHA-256(salt, JWT-subject) truncated to 16 hex chars. event.user.email, event.user.username, and event.user.ipAddress MUST be unset. The SDK MUST set options.isSendDefaultPii = false.

The salt MUST be read from SENTRY_SCRUB_SALT (delivered per-purpose via ESO from AWS Secrets Manager {Infrastructure}-{purpose}-SentryScrubSalt, e.g. Alpha001-prod-SentryScrubSalt). When the salt is empty (local, tests), the callback MUST fall back to a deterministic placeholder (e.g. "no-salt:<no-id>") rather than throw.

  • Traces to: AC-6; DT-005.
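
The hashing step can be sketched with the JDK's `javax.crypto` classes. `hashedUserId` is a hypothetical name, and using the salt bytes directly as the HMAC key (UTF-8 encoded) is an assumption about how "HMAC-SHA-256(salt, JWT-subject)" is meant to be keyed.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical helper: HMAC-SHA-256 keyed by the salt, truncated to 16 hex chars.
fun hashedUserId(salt: String, subject: String): String {
    // SecretKeySpec rejects an empty key, so the fail-soft placeholder must
    // be returned before any crypto object is constructed.
    if (salt.isEmpty()) return "no-salt:<no-id>"
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(salt.toByteArray(Charsets.UTF_8), "HmacSHA256"))
    val digest = mac.doFinal(subject.toByteArray(Charsets.UTF_8))
    // Lower-case hex, first 16 characters (64 bits of the digest).
    return digest.joinToString("") { "%02x".format(it) }.take(16)
}
```

The same function shape would serve the tenant hash, since that requirement uses the same construction over a different input.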

R-503 — HTTP body capture with redaction

Sentry’s default HTTP request body capture remains enabled. The beforeSend callback MUST apply a regex pass to event.request.data that masks the following patterns with ***:

  • JWT-like substrings (three base64url segments separated by .).

  • AWS access-key ID patterns (AKIA[0-9A-Z]{16}).

  • AWS secret access-key patterns (40-char base64 strings adjacent to access-key IDs in the same string).

  • Sentry DSN-like URLs.

  • Traces to: AC-6; DT-005.
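
A possible shape for the regex pass is sketched below. The exact patterns are assumptions chosen for illustration: the JWT pattern anchors on the `eyJ` prefix to limit false positives on dotted names, the DSN pattern follows the usual `scheme://key@host/project-id` layout, and "adjacent to access-key IDs in the same string" is approximated as "anywhere in a string that also contains an access-key ID".

```kotlin
val jwtLike = Regex("""eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+""")
val awsAccessKeyId = Regex("""AKIA[0-9A-Z]{16}""")
// Exactly 40 base64 characters, not embedded in a longer base64 run.
val awsSecretCandidate = Regex("""(?<![A-Za-z0-9/+])[A-Za-z0-9/+]{40}(?![A-Za-z0-9/+])""")
val sentryDsnLike = Regex("""https?://[0-9A-Za-z]+(?::[0-9A-Za-z]+)?@[^\s/]+/\d+""")

// Hypothetical helper: masks each sensitive pattern with ***.
fun redactBody(text: String): String {
    var out = sentryDsnLike.replace(text, "***")
    out = jwtLike.replace(out, "***")
    // Secrets are only masked when an access-key ID is present, and this
    // check must run before the access-key IDs themselves are masked.
    if (awsAccessKeyId.containsMatchIn(out)) {
        out = awsSecretCandidate.replace(out, "***")
    }
    return awsAccessKeyId.replace(out, "***")
}
```

Gating the 40-character pattern on the presence of an access-key ID leaves unrelated 40-character tokens, such as hex commit hashes, untouched.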

The beforeSend callback MUST replace event.request.headers with the result of an explicit allow-list filter:

  • X-Request-Id — pass through unchanged.

  • X-Forwarded-For — pass through unchanged.

  • X-Tenant-Id — MUST be removed from the header map and replaced by an event.tags["tenant_hash"] entry whose value is HMAC-SHA-256(salt, tenant-id) truncated to 16 hex chars.

  • Every other header (including Authorization, Cookie, Set-Cookie, any custom or auto-emitted header) MUST be removed.

  • Traces to: AC-6; DT-005.
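
The allow-list can be expressed as a pure function over the header map, as in this sketch. `ScrubbedHeaders` and `scrubHeaders` are hypothetical names, the tenant hash is injected as a function so the example stays self-contained, and matching header names case-insensitively is an assumption based on HTTP header semantics.

```kotlin
// Hypothetical result type: the surviving headers plus any tags derived from removed ones.
data class ScrubbedHeaders(val headers: Map<String, String>, val tags: Map<String, String>)

fun scrubHeaders(
    headers: Map<String, String>,
    hashTenant: (String) -> String, // stands in for the truncated HMAC-SHA-256
): ScrubbedHeaders {
    val kept = mutableMapOf<String, String>()
    val tags = mutableMapOf<String, String>()
    for ((name, value) in headers) {
        when (name.lowercase()) {
            "x-request-id", "x-forwarded-for" -> kept[name] = value
            "x-tenant-id" -> tags["tenant_hash"] = hashTenant(value)
            else -> Unit // everything else is dropped, including Authorization and Cookie
        }
    }
    return ScrubbedHeaders(kept, tags)
}
```

Building a new map, instead of deleting known-bad headers from the original, means a header nobody anticipated is removed by default.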

When an AppError’s context: LazyMessage lambda is invoked at capture time, the resulting string MUST pass through the same body-redaction regex set (R-503) before it lands as event extras / context. Implementation MAY apply this at the capture site (before constructing the Sentry event) or in beforeSend (over the constructed event); either is acceptable as long as the redaction is applied.

  • Traces to: AC-6; DT-005.

The beforeSendTransaction callback MUST apply a regex pass over every span’s db.statement attribute (where present) that replaces single-quoted string literals with '?' and numeric literals adjacent to comparison operators (=, <, >, <=, >=, !=) or IN (...) lists with ?. Exposed-generated parameterised statements MUST NOT be modified (they already contain ? for parameters); the redaction activates only on raw-SQL escape hatches.

  • Traces to: AC-6; DT-005.
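
The literal-masking pass might be implemented along the lines below. The patterns are illustrative assumptions, not a SQL parser: they handle `''` as an escaped quote and flat `IN (...)` lists, and they do not attempt nested sub-queries or dialect-specific literals.

```kotlin
// Single-quoted literal, allowing '' as an escaped quote inside it.
val stringLiteral = Regex("""'(?:[^']|'')*'""")
// A number directly to the right of a comparison operator.
val comparedNumber = Regex("""(<=|>=|!=|=|<|>)(\s*)-?\d+(?:\.\d+)?\b""")
// An IN (...) list without nested parentheses.
val inList = Regex("""(?i)(\bIN\s*\()([^()]*)(\))""")
// A standalone number, not the tail of an identifier such as col1.
val bareNumber = Regex("""(?<![\w.])-?\d+(?:\.\d+)?\b""")

// Hypothetical helper applied to each span's db.statement attribute.
fun redactSql(statement: String): String {
    val noStrings = stringLiteral.replace(statement, "'?'")
    val noCompared = comparedNumber.replace(noStrings) { m ->
        m.groupValues[1] + m.groupValues[2] + "?"
    }
    return inList.replace(noCompared) { m ->
        m.groupValues[1] + bareNumber.replace(m.groupValues[2], "?") + m.groupValues[3]
    }
}
```

A statement that already uses `?` placeholders contains no quoted or numeric literals for these patterns to match, so it passes through unchanged.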

The beforeSend callback MUST apply a final regex pass over the serialised event JSON looking for the same patterns as R-503 (JWT, AWS access-key, DSN). This is defence-in-depth against an unknown leak path.

  • Traces to: AC-6; DT-005.

Sentry-side project-level data scrubbers MUST be left enabled as defence-in-depth. The code-side policy in beforeSend is authoritative; the Sentry-side configuration MUST NOT be relied upon as the only safety net.

  • Traces to: AC-6; DT-005.

Each component’s Helm chart MUST emit a SENTRY_SCRUB_SALT env var sourced from a K8s Secret be-sentry-scrub-salt in the component’s namespace, with optional: true for fail-soft startup. An ExternalSecret MUST be declared in templates/secrets.yaml mirroring the existing be-sentry-dsn pattern, targeting AWS Secrets Manager item {Infrastructure}-{purpose}-SentryScrubSalt (e.g. Alpha001-prod-SentryScrubSalt). The remote-ref property MUST be salt (the secret stores a JSON object {"salt": "..."}).

The salt secret MUST be created per-partition (purpose) by a new CDK stack PartitionSecrets in infrastructure/src/main/cdk/stacks/purpose/partition-secrets.ts, instantiated inside buildPartition() in apps/Al1x/partition.ts (see goal.md Deliverable 7a–7d). The stack uses Secret.generateSecretString by default (random 64-character string generated by AWS Secrets Manager on first create, preserved thereafter). An optional sentryScrubSaltOverride?: string field on PartitionInfo in platforms.ts lets a known value be supplied from source for any partition that needs one; when no override is supplied, CDK generates a random value.

The salt is per-purpose (prod, demo, dev, stage), not per-Infrastructure. Privacy boundaries enforced by design:

  • Same purpose, multiple components (operations + accounts-component + future services) → same salt → cross-component correlation works.
  • Different purpose within the same Infrastructure (e.g. prod vs demo both in Alpha001) → different salts → cross-purpose correlation deliberately broken.
  • Different Infrastructure → different salts → cross-Infrastructure correlation deliberately broken.

Each component declares its own ExternalSecret pointing at the same AWS-side key for its purpose. Cross-namespace replication is not used (the EKS cluster has no reflector provisioned); ESO independently fetches the same upstream value into each component’s namespace.

The salt is not credential-grade; 1Password mirroring is not required. AWS Secrets Manager is used here for ESO compatibility, not for formal credential discipline.

  • Traces to: AC-6, AC-10; DT-005.

common-module MUST expose a single shared helper that takes a Throwable and a boundary label, invokes reportable(), and captures each result with boundary: <label> and any caller-supplied additional tags. The helper is the implementation primitive that every per-transport wrapper delegates to.

  • Traces to: AC-2; DT-006.

common-module MUST provide per-transport wrappers that each call the shared helper:

  • A Ktor StatusPages wrapper, installed by Component.kt automatically.
  • A runBoundary(label) { … } function for batch / synchronous boundaries.

Wrappers for future transports (gRPC server interceptors, async-worker decorators) are NOT delivered by this project, but the helper’s signature MUST NOT preclude them.

  • Traces to: AC-2; DT-006.

R-603 — Operations-side runBoundary adoption

operations MUST audit src/main/kotlin/cards/arda/operations/system/batch/ for entry points where work is launched outside the request coroutine scope. Each such entry point MUST be wrapped in runBoundary("<label>") { … }. If the audit finds no out-of-request entry points, the helper ships from common-module and operations adopts no calls in this project. The audit outcome MUST be recorded in specification.md.

  • Traces to: AC-2; DT-006.

R-604 — Last-resort path remains unfiltered

The JVM uncaught handler (R-205) MUST NOT consult reportable() for filtering. An Invocation.* that escapes uncaught is bug-worthy by virtue of having escaped; capture is the right behaviour.

Note: the global CoroutineExceptionHandler in R-206 is the “boundary” path for coroutines and does apply reportable(), as R-206 specifies. The only unfiltered path is the JVM thread-level Thread.UncaughtExceptionHandler, which fires when a thread is dying. Implementers should preserve this distinction.

  • Traces to: AC-7; DT-006.

Sentry event fingerprinting MUST be set at the capture site rather than in beforeSend. The fingerprint MUST be derived from the AppError subtype (where applicable) plus discriminators read from the subtype’s own fields.

  • Traces to: AC-2; DT-003.

The default fingerprint MUST be:

  • {AppError subtype FQCN} as the first segment.
  • serviceName as a second segment for AppError.ExternalService, AppError.InternalService, AppError.InternalTimeout.
  • operationName as a second segment for AppError.NotImplemented.
  • For non-AppError throwables: the concrete class name + the first non-framework stack frame (caller’s package).

This default is overridable by adding subtype-aware methods on AppError if a future need arises; not delivered by this project.

  • Traces to: AC-2; DT-003.
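
The default derivation can be sketched as a single function. The `AppError` stand-in below models only two of the subtypes with discriminators, the binary class name from `javaClass.name` is used as the FQCN, and the list of framework package prefixes is an assumption that would need tuning to the real dependency set.

```kotlin
// Reduced stand-in for the real hierarchy (assumed shape, simplified constructors).
sealed class AppError(message: String) : Exception(message) {
    class ExternalService(val serviceName: String, message: String) : AppError(message)
    class NotImplemented(val operationName: String, message: String) : AppError(message)
    class Internal(message: String) : AppError(message)
}

// Hypothetical helper: returns the fingerprint segments for a throwable.
fun fingerprintOf(
    t: Throwable,
    frameworkPrefixes: List<String> = listOf("java.", "kotlin.", "kotlinx.", "io.ktor."),
): List<String> = when (t) {
    is AppError.ExternalService -> listOf(t.javaClass.name, t.serviceName)
    is AppError.NotImplemented -> listOf(t.javaClass.name, t.operationName)
    is AppError -> listOf(t.javaClass.name)
    else -> {
        // First frame whose class is outside the framework packages; its package is the discriminator.
        val frame = t.stackTrace.firstOrNull { f ->
            frameworkPrefixes.none { f.className.startsWith(it) }
        }
        listOfNotNull(t.javaClass.name, frame?.className?.substringBeforeLast('.', ""))
    }
}
```

Computing the segments at the capture site keeps the subtype's own fields in scope, which is harder to arrange once the throwable has been flattened into a Sentry event.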

The SDK init MUST set a per-fingerprint sampleRate such that a hot-spotting error class (e.g. an Internal.Infrastructure during a DB outage producing thousands of events per minute) does not exhaust the Sentry quota. The chosen rate MAY be a constant (e.g. 1.0 with reliance on Sentry’s server-side de-dup) for the initial implementation; per-fingerprint dynamic rates are a possible refinement, not in this project.

  • Traces to: AC-2; DT-003.

R-801 — Sentry Logback appender installed per component

common-module MUST add a runtime dependency on io.sentry:sentry-logback (version aligned with the OTel agent version pin). The dependency is the entirety of common-module’s Logback contribution; no programmatic appender attach happens at SDK init.

Each consuming component MUST add the Sentry appender to its own logback.xml:

<appender name="SENTRY" class="io.sentry.logback.SentryAppender">
  <minimumEventLevel>ERROR</minimumEventLevel>
  <minimumBreadcrumbLevel>INFO</minimumBreadcrumbLevel>
</appender>
<!-- ... -->
<appender-ref ref="SENTRY"/>

For this project’s scope, operations/src/main/resources/logback.xml MUST contain this wiring. Future components (e.g. accounts-component under PDEV-533) MUST do the same to participate in Logback forwarding.

DSN-empty fail-soft is handled by SentryAppender itself: when Sentry.init runs with no DSN, the appender becomes a no-op (no Hub, events silently dropped). The XML wiring is therefore safe to leave in place in all environments.

  • Traces to: AC-8; DT-008.

The Sentry Logback appender MUST forward log events at level ERROR and above as Sentry events. The appender MUST also forward log events at any level (WARN, INFO, …) that carry an attached Throwable, since those are explicit signals from the developer.

  • Traces to: AC-8; DT-008.

R-803 — No reportable() filter on log-side

The Sentry Logback appender MUST NOT consult reportable() for filtering. If a developer chooses to log.error(...) an Invocation.NotAuthorized, the log-side path captures it; the boundary side drops it. The duplication is accepted.

  • Traces to: AC-8; DT-008.

The Sentry Logback appender MUST forward log events at level INFO and above as Sentry breadcrumbs (attached to the active Hub, included in any event captured later in the same request). This gives reasonable diagnostic context without adding event volume.

  • Traces to: AC-8; DT-008.

When SENTRY_DSN is empty (typical for local and CI test runs), every code path in this project MUST behave as a no-op for Sentry purposes:

  • Sentry.init succeeds without throwing.

  • The Logback appender stays wired in logback.xml but silently drops events (see R-801).

  • The Ktor StatusPages capture wrapper invokes reportable() but Sentry.captureException is a no-op.

  • The JVM uncaught handler still chains to the previously-installed handler (no behaviour change for thread termination).

  • runBoundary re-throws normally; only the Sentry.captureException call is no-op.

  • Traces to: AC-10; DT-002.

Unit and integration tests for the work in common-module and operations MUST run successfully with no SENTRY_DSN configured. Tests SHOULD use the fail-soft DSN path rather than mock the Sentry SDK; this exercises the same code path as production with a missing DSN.

  • Traces to: AC-10; DT-005.

The new dependencies MUST pass the existing CI gates: org.owasp.dependencycheck in operations, license scanning, Kover coverage thresholds, SonarQube quality gates. If any gate fails on a Sentry-specific finding, the resolution is recorded in the project notes; no gate is silently bypassed.

  • Traces to: AC-1.

A new page MUST be created at documentation/src/content/docs/current-system/oam/sentry-observability.md describing: what Sentry is in the Arda platform, agent + SDK coexistence, capture-path topology (HTTP boundary, Logback appender, JVM uncaught handler, coroutine handler), session/release-health mechanics, and the FE/BE release-tag divergence. Audience: anyone reading the platform’s runtime documentation.

  • Traces to: AC-11; DT-001..DT-008.

documentation/src/content/docs/process/craft/operations-and-monitoring/sentry-integration.md MUST be rewritten as the step-by-step implementer how-to covering: dependencies to add, SDK init wiring, Helm values, runBoundary adoption, Logback appender, PII-scrubbing recipes, and post-deploy verification. The previous stale content is superseded by this rewrite.

  • Traces to: AC-11; DT-001..DT-008.

The architectural page and the how-to MUST cross-reference each other. The how-to MAY also link to the workbook’s decision files for design rationale, with a note that those are internal-process artefacts not published to the documentation site.

  • Traces to: AC-11.

R-1101 — accounts-component compatibility

Every signature, default value, and public option in common-module’s SDK init MUST be safe for accounts-component to consume unmodified. The common-module PR MUST NOT introduce breaking changes to existing public APIs in the cards.arda.common.lib.* packages.

  • Traces to: AC-1, AC-10; DT-002.

operations MUST continue to build under --include-build ../common-module. The composite build is the local development workflow; CI uses the published arda-common version.

  • Traces to: AC-1.

All existing tests in common-module and operations MUST pass after the changes. New tests for the work delivered by this project are additive.

  • Traces to: AC-1, AC-2, AC-10.

A compact view of which requirements map to which acceptance criteria. Each AC is satisfied by the requirements listed.

| AC | Satisfying requirements |
| --- | --- |
| AC-1 — SDK init transparently consumed | R-101, R-102, R-103, R-104, R-903, R-1101, R-1102, R-1103 |
| AC-2 — Internal.* captured; Invocation.* dropped | R-201, R-202, R-204, R-602, R-603, R-701, R-702, R-703 |
| AC-3 — Composite recursion to independent events | R-201, R-202, R-203 |
| AC-4 — Sessions emitted at the configured rate | R-401, R-402, R-403, R-404, R-405 |
| AC-5 — Trace continuity FE↔BE | R-301, R-302, R-303 |
| AC-6 — PII scrubbing | R-501–R-509 |
| AC-7 — Last-resort capture works | R-205, R-206, R-604 |
| AC-8 — Logback forwarding | R-801, R-802, R-803, R-804 |
| AC-9 — FE tracePropagationTargets explicit | R-304 |
| AC-10 — Operational off-switches | R-103, R-402, R-404, R-509, R-901, R-902, R-1101 |
| AC-11 — Documentation current | R-1001, R-1002, R-1003 |