
Operations Sentry — Verification

This page translates the project’s acceptance criteria (goal.md) and requirements (requirements.md) into a concrete verification plan. Each AC is verified by some combination of unit tests, an integration / smoke test, and post-deploy Sentry MCP queries. The plan is reproducible: any operator with access to the dev environment and the Sentry MCP can re-run it.

Before any verification fires, these conditions must hold. Failure of any pre-flight item blocks verification of all downstream ACs.

| # | Prerequisite | How to confirm |
|---|---|---|
| P-1 | Four AWS Secrets Manager items exist (one per partition), created by the PartitionSecrets CDK stack: Alpha001-prod-SentryScrubSalt, Alpha001-demo-SentryScrubSalt, Alpha002-dev-SentryScrubSalt, Alpha002-stage-SentryScrubSalt | aws secretsmanager describe-secret --secret-id Alpha001-prod-SentryScrubSalt --profile Admin-Alpha1 (and equivalents per partition; use --profile Alpha002-Admin for Alpha002). |
| P-2 | CloudFormation stacks Alpha001-prod-PartitionSecrets, Alpha001-demo-PartitionSecrets, Alpha002-dev-PartitionSecrets, Alpha002-stage-PartitionSecrets are deployed in their respective accounts | aws cloudformation describe-stacks --stack-name Alpha001-prod-PartitionSecrets --profile Admin-Alpha1 (and equivalents). Each stack should emit a *-API-SentryScrubSaltArn CFN export. |
| P-3 | common-module PR merged; arda-common published | git -C common-module log -1 --oneline origin/main; new version visible in gradle/libs.versions.toml after git pull in operations. |
| P-4 | operations PR merged | git -C operations log -1 --oneline origin/main shows the merge commit. |
| P-5 | operations deployed to dev from origin/main | kubectl --context arn:aws:eks:...:Alpha002Dev -n operations get pods shows the new image running. |
| P-6 | Sentry DSN reachable from the pod | Pod logs show no DSN errors at startup; SENTRY_DSN env var is populated. |
| P-7 | Salt is materialised into the pod (per-purpose; matches the partition the pod runs in) | kubectl exec <pod> -- printenv SENTRY_SCRUB_SALT returns a non-empty value; cross-check that it matches the AWS-side value of the partition’s {Infrastructure}-{purpose}-SentryScrubSalt secret. |
| P-8 | arda-frontend-app PR with explicit tracePropagationTargets is merged and deployed | Amplify preview / prod deploy of the merged commit; spot-check by inspecting the deployed bundle’s Sentry config. |

Note: the previous P-2 (1Password mirrors) was removed once the salt’s threat model was clarified: the salt is not credential-grade and does not require mirroring into a 1Password vault. The CFN-stack existence check (now P-2) was added when the salt moved into a per-partition CDK stack instead of being an external prerequisite.

Unit tests live in common-module/lib/src/test/kotlin/cards/arda/common/lib/runtime/observability/ (or a sibling package). All tests run on Kotest + MockK + JUnit, per the repository’s testing conventions.

AppError.reportable()

| Test | Verifies |
|---|---|
| AppError.Internal.Implementation.reportable() returns listOf(this) | R-201 |
| AppError.Internal.Infrastructure.reportable() returns listOf(this) | R-201 |
| Every other concrete Internal.* subtype returns listOf(this) | R-201 |
| AppError.Invocation.NotFound.reportable() returns emptyList() | R-201 |
| Every concrete Invocation.* subtype returns emptyList() | R-201 |
| AppError.Generic.reportable() returns listOf(this) | R-201 |
| AppError.Composite(causes = [Internal.X, Invocation.Y]).reportable() returns [Internal.X] | R-201, R-202 |
| AppError.Composite carrying a nested Composite flat-maps recursively | R-201 |
| RuntimeException("…").reportable() (bridging extension) returns listOf(this) | R-201, R-204 |
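The reportable() contract in the table can be modelled in a few lines. The following is a Python sketch, not the Kotlin implementation: the class names are hypothetical stand-ins for the AppError hierarchy, and the flattening rule (Internal.* and Generic kept, Invocation.* dropped, Composite flat-mapped recursively, non-AppError throwables kept) is taken directly from the rows above.

```python
# Hypothetical Python stand-ins for the Kotlin AppError hierarchy,
# modelling only what reportable() needs.
class AppError(Exception):
    pass


class Internal(AppError):
    """Server-side faults: always reportable."""


class Implementation(Internal):
    pass


class Infrastructure(Internal):
    pass


class Invocation(AppError):
    """Caller-side faults (bad input, missing auth): never reportable."""


class NotFound(Invocation):
    pass


class NotAuthorized(Invocation):
    pass


class Generic(AppError):
    pass


class Composite(AppError):
    def __init__(self, message: str, causes: list):
        super().__init__(message)
        self.causes = causes


def reportable(err: BaseException) -> list:
    """Return the errors that should reach Sentry, flattening composites."""
    if isinstance(err, Composite):
        # Recurse so that composites nested inside composites are flattened too.
        return [r for cause in err.causes for r in reportable(cause)]
    if isinstance(err, Invocation):
        return []
    # Internal.*, Generic and non-AppError throwables are reported as-is.
    return [err]
```

Each boundary then captures every element of the returned list as its own event, which is what AC-3 checks end to end.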
PiiScrubber

| Test | Verifies |
|---|---|
| scrubEvent replaces event.user.id with OpaqueId.opaqueId(originalId) | R-502 |
| scrubEvent unsets event.user.email, username and ipAddress | R-502 |
| scrubEvent: a body containing an eyJ… JWT-like substring is redacted to *** | R-503 |
| scrubEvent: a body containing an AKIAEXAMPLE… access-key pattern is redacted to *** | R-503 |
| scrubEvent: in the headers map, X-Request-Id passes through and Authorization is removed | R-504 |
| scrubEvent: X-Tenant-Id: T-12345 is removed from the headers and event.tags["tenant_hash"] is set to opaqueId("T-12345") | R-504 |
| scrubEvent: AppError.context lambda output containing a fake JWT is redacted | R-505 |
| scrubTransaction: a span with db.statement = "SELECT * FROM x WHERE flag = 'on'" becomes flag = '?' | R-506 |
| scrubTransaction: an already-parameterised statement is unchanged | R-506 |
| scrubEvent: the defensive backstop catches a hand-crafted event extra containing a JWT | R-507 |
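The scrubbing rows imply two pattern-based transforms: secret redaction in free text and literal parameterisation in db.statement. A minimal Python sketch follows; the regexes are assumptions chosen to satisfy the fixture values used in this plan (the eyJ… JWT and the AKIA… key), not PiiScrubber's actual pattern list, and the function names redact and parameterise are hypothetical.

```python
import re

# Assumed patterns: a three-segment base64url token starting with "eyJ"
# (a JSON header), and the 20-character AWS access-key-id shape.
JWT_LIKE = re.compile(r"eyJ[A-Za-z0-9_-]*\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")
AWS_ACCESS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
REDACTED = "***"

# SQL literals: single-quoted strings (with '' escapes) and bare numbers.
# The lookbehind skips digits belonging to identifiers or $1-style placeholders.
SQL_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
SQL_NUMBER_LITERAL = re.compile(r"(?<![\w?$])\d+(?:\.\d+)?\b")


def redact(text: str) -> str:
    """Replace JWT-like and access-key-like substrings with ***."""
    text = JWT_LIKE.sub(REDACTED, text)
    return AWS_ACCESS_KEY.sub(REDACTED, text)


def parameterise(statement: str) -> str:
    """Replace SQL literals with placeholders: 'on' -> '?', 42 -> ?."""
    statement = SQL_STRING_LITERAL.sub("'?'", statement)
    return SQL_NUMBER_LITERAL.sub("?", statement)
```

Note that email addresses pass through redact unchanged, which matches the AC-6 expectation that the body may still contain alice@example.com.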
OpaqueId

| Test | Verifies |
|---|---|
| opaqueId("user-1") is deterministic across calls when the salt is set | R-502 |
| opaqueId("user-1") differs when the salt differs | R-502 |
| opaqueId("user-1") returns the placeholder when the salt is empty | R-502 |
| opaqueUser(user) zeroes email / username / ipAddress | R-502 |
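One way to satisfy these rows is a keyed hash truncated to the 16 hex characters that AC-6 expects. This Python sketch assumes HMAC-SHA256 keyed by the salt and uses a hypothetical placeholder string for the empty-salt case; the real OpaqueId may use a different construction. The salt is passed explicitly here (rather than read from SENTRY_SCRUB_SALT) so the behaviour is easy to test.

```python
import hashlib
import hmac

# Hypothetical marker returned when no salt is configured.
UNSALTED_PLACEHOLDER = "unsalted"


def opaque_id(original_id: str, salt: str) -> str:
    """Deterministic, salt-dependent pseudonym: HMAC-SHA256 truncated to 16 hex chars."""
    if not salt:
        # Empty salt: return a fixed placeholder rather than an unkeyed hash.
        return UNSALTED_PLACEHOLDER
    digest = hmac.new(salt.encode("utf-8"), original_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def opaque_user(user: dict, salt: str) -> dict:
    """Keep only the pseudonymous id; email, username and ip_address are dropped."""
    return {"id": opaque_id(user["id"], salt)}
```

Because the same salt yields the same pseudonym, events from one user still group together within a partition, while a different partition salt yields unrelated values.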
HeadersAllowList

| Test | Verifies |
|---|---|
| filter with X-Request-Id, X-Forwarded-For passes them through | R-504 |
| filter with X-Tenant-Id invokes putTenantHashAsTag and excludes the header | R-504 |
| filter removes Authorization, Cookie, Set-Cookie and every other header | R-504 |
| filter is case-insensitive on header names | R-504 |
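These rows reduce to a case-insensitive allow-list with one special case for the tenant header. A Python sketch, assuming the allow-list contains exactly the two headers named above; filter_headers and the put_tenant_hash_as_tag callback are hypothetical stand-ins for the filter and putTenantHashAsTag.

```python
from typing import Callable, Dict, Mapping

# Assumed allow-list (lower-case); every other header is dropped.
ALLOWED_HEADERS = frozenset({"x-request-id", "x-forwarded-for"})
TENANT_HEADER = "x-tenant-id"


def filter_headers(
    headers: Mapping[str, str],
    put_tenant_hash_as_tag: Callable[[str], None],
) -> Dict[str, str]:
    """Return only allow-listed headers; route the tenant id to a tag instead."""
    kept: Dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if lowered == TENANT_HEADER:
            # The raw tenant id never reaches the event; the callback is
            # expected to hash it and set tags["tenant_hash"].
            put_tenant_hash_as_tag(value)
        elif lowered in ALLOWED_HEADERS:
            kept[name] = value
    return kept
```

An allow-list (rather than a deny-list of Authorization, Cookie and so on) means a newly introduced sensitive header is dropped by default.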
BoundaryCapture

| Test | Verifies |
|---|---|
| Throwable.captureViaReportable("http", "route" to "/x") with an Internal.* calls Sentry.captureException once with the boundary and route tags | R-202, R-602 |
| The same call with an Invocation.* does NOT call Sentry.captureException | R-202 |
| The same call with a Composite carrying two Internal.* causes calls Sentry.captureException twice, each with the wrapped_in_composite tag | R-203 |
| runBoundary("nightly-rollup") { throw … } calls captureException with boundary=batch, job=nightly-rollup and re-throws | R-207, R-602 |
| runBoundary("…") { 42 } returns 42 without calling Sentry | R-207 |
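The runBoundary rows describe a wrap, capture and re-throw pattern. A Python sketch with a hypothetical capture callback standing in for Sentry.captureException plus scope tags, so both rows (capture and re-throw on failure; pass-through on success) can be checked without the SDK:

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")
# Hypothetical stand-in for Sentry.captureException(throwable) plus scope tags.
Capture = Callable[[BaseException, Dict[str, str]], None]


def run_boundary(job: str, block: Callable[[], T], capture: Capture) -> T:
    """Run a batch job body; capture any failure with batch tags, then re-raise."""
    try:
        return block()
    except Exception as exc:
        capture(exc, {"boundary": "batch", "job": job})
        raise  # the scheduler still sees the failure
```

Re-raising keeps the job runner's own retry or alerting behaviour intact; the boundary only adds visibility.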
Fingerprinting

| Test | Verifies |
|---|---|
| applyFingerprint(scope, Internal.Implementation(...)) sets the fingerprint to [FQCN] | R-702 |
| applyFingerprint(scope, Internal.InternalService("svcA", …)) sets the fingerprint to [FQCN, "svcA"] | R-702 |
| applyFingerprint(scope, RuntimeException("…")) includes the class name and first frame | R-702 |
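The fingerprint rules can be sketched as follows. This Python version assumes that the fully qualified class name corresponds to module plus qualified name, that InternalService contributes its service name as the second element, and that "first frame" means the frame where the exception was raised (the last entry from traceback.extract_tb). The classes are hypothetical stand-ins, and the real applyFingerprint writes into a Sentry scope rather than returning a list.

```python
import traceback
from typing import List, Optional


class AppError(Exception):
    """Hypothetical stand-in for the Kotlin AppError root."""
    service: Optional[str] = None


class Implementation(AppError):
    pass


class InternalService(AppError):
    def __init__(self, service: str, message: str):
        super().__init__(message)
        self.service = service


def fqcn(exc: BaseException) -> str:
    cls = type(exc)
    return f"{cls.__module__}.{cls.__qualname__}"


def fingerprint(exc: BaseException) -> List[str]:
    if isinstance(exc, AppError):
        # AppErrors group by type, plus the failing service when there is one.
        return [fqcn(exc)] + ([exc.service] if exc.service else [])
    # Anything else: class name plus the function in which it was raised.
    frames = traceback.extract_tb(exc.__traceback__)
    top = frames[-1].name if frames else "<no-frame>"
    return [fqcn(exc), top]
```

Using the function name rather than a line number keeps the fingerprint stable across unrelated edits to the same file.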
SentryInit

| Test | Verifies |
|---|---|
| SentryInit.init() with SENTRY_DSN="" succeeds without throwing (no programmatic appender attach; the XML appender remains wired but no-ops via the “no Hub” fallback) | R-103, R-901 |
| SentryInit.init() with SENTRY_AUTO_SESSION_TRACKING=false sets options.isEnableAutoSessionTracking = false | R-402 |
| SentryInit.init() does NOT attempt to set options.sessionMode (the property does not exist on SentryOptions in 8.41.0); the test confirms that init compiles and runs without referring to a SessionMode symbol | R-401 |
| SentryInit.init() called twice in the same JVM is idempotent (no second listener registered) | R-102 |

Smoke tests fire against a deployed operations pod in the dev environment. The chosen trigger route is POST /v1/kanban/(authenticate auth)/kanban-card/details because:

  • It already exhibits observable behaviour in Sentry (it is the route currently producing the auto-detected N+1 Query Issue on platform-be).
  • It is exercised by the existing api-test Bruno suite.
  • It has a well-understood failure mode that can be deliberately triggered.

The smoke test runs the api-test Bruno suite (per the run-operations-api-test skill) against the deployed pod, then queries Sentry via the MCP for the expected event shapes.

  1. Pre-flight. Confirm prerequisite items P-1 through P-7. Note the pod’s release tag from kubectl exec <pod> -- printenv SENTRY_RELEASE (e.g. operations@1.5.3).
  2. Baseline. Query the Sentry MCP for platform-be Issues in the last 5 minutes to establish a baseline of background noise.
  3. Fire the test traffic. Run the api-test suite. For each AC below, run the targeted fixture (described per-AC).
  4. Wait 60 seconds for Sentry’s ingest delay.
  5. Verify. Run the Sentry MCP queries listed per-AC. Each query has an expected result shape; deviations indicate a failed verification.
  6. Tear down. No cleanup needed (Sentry events stay; no other state mutated).

Re-used queries (replace placeholders as needed):

# Issues in platform-be in the last N minutes
mcp__claude_ai_Sentry__search_issues(
organizationSlug='arda-systems',
projectSlugOrId='platform-be',
query='firstSeen:-Nmin',
sort='date'
)
# Spans in platform-be matching a trace
mcp__claude_ai_Sentry__search_events(
organizationSlug='arda-systems',
projectSlug='platform-be',
dataset='spans',
query='trace:<TRACE_ID>',
fields=['span.op','transaction','status'],
statsPeriod='1h'
)
# Sessions for a release
mcp__claude_ai_Sentry__search_events(
organizationSlug='arda-systems',
projectSlug='platform-be',
dataset='replays', # placeholder: 'replays' is the Session Replay dataset, not release-health sessions; replace with the sessions dataset name once confirmed at impl time
query='release:operations@<VERSION> environment:alpha002-dev',
statsPeriod='1h'
)

Goal (AC-1): A new Kotlin component built on common-module with oam.performance.sentry.enabled: true and a valid DSN emits Sentry events at the configured sample rate with no per-component code.

Procedure:

  1. Confirm via kubectl exec <operations-pod> -- printenv | grep SENTRY that SENTRY_DSN, SENTRY_ENVIRONMENT, SENTRY_RELEASE, SENTRY_TRACES_SAMPLE_RATE, SENTRY_AUTO_SESSION_TRACKING, SENTRY_SCRUB_SALT are all set. (SENTRY_SESSION_MODE and SENTRY_SESSION_SAMPLE_RATE are intentionally absent — see DT-004.)
  2. Inspect pod start-up logs: should show no Sentry init errors; should NOT show “DSN not set” warnings (which would indicate the env var didn’t reach the JVM).
  3. Confirm that no operations-side code under src/main/kotlin/cards/arda/operations/runtime/ calls Sentry.init or imports io.sentry.* directly. The init must come solely through common-module’s Component.build(...).

Pass condition: All env vars present, no init errors, no Sentry.init calls in operations.
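Step 1 can be scripted so that every operator gets the same verdict. The sketch below parses the output of kubectl exec <operations-pod> -- printenv (optionally filtered through grep SENTRY) and reports missing, empty or unexpectedly present variables; the function names are hypothetical and the variable lists are copied from step 1.

```python
from typing import Dict, List

# Variable lists copied from AC-1 step 1.
REQUIRED = (
    "SENTRY_DSN",
    "SENTRY_ENVIRONMENT",
    "SENTRY_RELEASE",
    "SENTRY_TRACES_SAMPLE_RATE",
    "SENTRY_AUTO_SESSION_TRACKING",
    "SENTRY_SCRUB_SALT",
)
MUST_BE_ABSENT = ("SENTRY_SESSION_MODE", "SENTRY_SESSION_SAMPLE_RATE")


def parse_printenv(output: str) -> Dict[str, str]:
    """Parse NAME=value lines; values may themselves contain '='."""
    env: Dict[str, str] = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def sentry_env_problems(output: str) -> List[str]:
    """Return human-readable problems; an empty list means step 1 passes."""
    env = parse_printenv(output)
    problems = [f"{name} missing or empty" for name in REQUIRED if not env.get(name)]
    problems += [f"{name} unexpectedly set" for name in MUST_BE_ABSENT if name in env]
    return problems
```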

AC-2 — Internal.* captured; Invocation.* dropped


Goal: A deliberate AppError.Internal.Implementation raised in any Ktor route produces a Sentry Issue tagged boundary: http with the request route. An AppError.Invocation.NotAuthorized raised on the same route produces no Sentry event.

Procedure:

  1. Internal capture. Trigger the kanban-card detail route with a request payload known to produce an AppError.Internal.Implementation (specific fixture is documented in the api-test suite as kanban-internal-trigger). Note the response’s X-Request-Id header for correlation.
  2. Wait 60 s. Query Sentry MCP:
    mcp__claude_ai_Sentry__search_issues(
    organizationSlug='arda-systems',
    projectSlugOrId='platform-be',
    query=f'tag:call_id:{X_REQUEST_ID}',
    sort='date'
    )
  3. Invocation drop. Trigger the same route with a payload that produces an AppError.Invocation.NotAuthorized (Bruno fixture kanban-invocation-trigger). Note the new X-Request-Id.
  4. Wait 60 s. Re-query Sentry MCP with the new X-Request-Id.

Pass condition: Step 2 returns exactly one Issue with boundary: http tag, route tag matching the called path, stack trace ending at the application code. Step 4 returns zero results.

AC-3 — Composite recursion to independent events


Goal: A Composite carrying one Internal.Infrastructure and one Invocation.GeneralValidation produces exactly one Sentry event (the Infrastructure cause), tagged wrapped_in_composite: <composite.message>.

Procedure:

  1. Trigger a Bruno fixture (kanban-composite-trigger) that the implementer arranges to produce the right composite shape.
  2. Wait 60 s. Query Sentry MCP for events with the wrapped_in_composite tag set, scoped to the request’s call_id (the X-Request-Id from the fixture’s response).
  3. Inspect the matching event: its underlying throwable class must be AppError.Internal.Infrastructure, not AppError.Composite. No second event for the validation cause should exist.

Pass condition: Exactly one event matching the call-id and tag; underlying type is Internal.Infrastructure; no companion event for the validation cause.

AC-4 — Sessions emitted at the configured rate


Goal: The Release Health tab for the deployed release shows a non-zero session count proportional to traffic in non-local environments. Sessions inherit the trace sample rate (one session emitted per sampled trace) — there is no separate session-sample-rate knob on the JVM SDK.

Procedure:

  1. Run a known traffic batch (e.g. the full api-test Bruno suite) against the dev pod for at least 2 minutes.
  2. Compute the expected session count: requests sent × tracesSampleRate (1.0 in dev) ≈ session count.
  3. Wait 60 s, then visit arda-systems.sentry.io → platform-be → Releases → operations@<VERSION> → Health tab in the browser. Record the displayed session count.
  4. Alternative MCP-based check (preferred for reproducibility):
    mcp__claude_ai_Sentry__search_events(
    organizationSlug='arda-systems',
    projectSlug='platform-be',
    dataset='spans', # adjust to sessions dataset at impl time
    query=f'release:operations@{VERSION} environment:alpha002-dev',
    fields=['count()'],
    statsPeriod='1h'
    )
  5. In demo / prod after the deploy lands there, repeat with the prod rate of 0.2 and an expected proportional count.

Pass condition: Session count is within ±20% of expected.
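The arithmetic in steps 2 and 5 and the ±20% pass condition fit in a small helper. A Python sketch; the function names are hypothetical and the tolerance is the one stated in the pass condition.

```python
def expected_sessions(requests_sent: int, traces_sample_rate: float) -> float:
    """Sessions inherit the trace sample rate: one session per sampled trace."""
    return requests_sent * traces_sample_rate


def session_count_passes(observed: int, expected: float, tolerance: float = 0.20) -> bool:
    """AC-4 pass condition: observed lies within +/- tolerance of expected."""
    return abs(observed - expected) <= tolerance * expected
```

For example, 500 requests at the dev rate of 1.0 give an expected count of 500, so an observed count from 400 to 600 passes; 1,000 requests at the prod rate of 0.2 give about 200, a band of roughly 160 to 240.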

Goal (AC-5): A request initiated by arda-frontend that is sampled into Sentry on the frontend side appears in the same Sentry trace on platform-be, joining the FE and BE spans under one trace ID.

Procedure:

  1. From a browser on dev.arda.cards (or the appropriate dev URL), perform an action that triggers a known backend call (e.g. open a kanban view). Capture the trace ID from the FE’s Sentry-side instrumentation (or via the browser network tab: the request to the BFF carries a sentry-trace header).
  2. Query Sentry MCP for spans matching that trace ID across both projects:
    mcp__claude_ai_Sentry__search_events(
    organizationSlug='arda-systems',
    dataset='spans',
    query=f'trace:{TRACE_ID}',
    fields=['project','transaction','span.op'],
    statsPeriod='1h'
    )

Pass condition: The result contains spans from BOTH arda-frontend and platform-be. The FE spans include http.client ops calling the BFF; the BE spans include http.server ops on /v1/... routes.
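Two parts of this procedure can be made mechanical: extracting the trace ID from the sentry-trace header, whose value has the form <trace_id>-<span_id> with an optional -<sampled> flag (32 and 16 lowercase hex characters respectively), and evaluating the pass condition over the span rows. The sketch assumes each row returned by the MCP query is a dict keyed by the requested field names (project, transaction, span.op); confirm the actual response shape at run time. The function names are hypothetical.

```python
import re
from typing import Dict, List

# sentry-trace header: <32-hex trace id>-<16-hex span id>[-<sampled 0|1>]
SENTRY_TRACE = re.compile(r"^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$")


def trace_id_from_header(value: str) -> str:
    match = SENTRY_TRACE.match(value.strip())
    if match is None:
        raise ValueError(f"not a sentry-trace header value: {value!r}")
    return match.group(1)


def trace_is_joined(spans: List[Dict[str, str]]) -> bool:
    """AC-5 pass condition: FE http.client spans and BE http.server spans on /v1/ routes."""
    fe_client = any(
        s.get("project") == "arda-frontend" and s.get("span.op") == "http.client"
        for s in spans
    )
    be_server = any(
        s.get("project") == "platform-be"
        and s.get("span.op") == "http.server"
        and "/v1/" in s.get("transaction", "")
        for s in spans
    )
    return fe_client and be_server
```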

Goal (AC-6): A deliberate error triggered with a request body containing a fake JWT, a fake email, and an X-Tenant-Id header produces a Sentry event where: user.id is the opaque hash, no plaintext email or JWT remains in any field, the X-Tenant-Id header is absent from request.headers, a tenant_hash tag carries the hashed value, and db.statement shows the parameterised form.

Test fixture (Bruno): sentry-pii-smoke — a request to the kanban-card detail route with:

  • Header Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJ0ZXN0LXVzZXItMTIzIn0.AAAA (fake JWT, subject test-user-123).
  • Header X-Tenant-Id: T-99999.
  • Body: {"email": "alice@example.com", "diagnostic_token": "AKIAIOSFODNN7EXAMPLE"} and an item-id known to trigger an Internal.Implementation.

Procedure:

  1. Fire the fixture. Capture the response’s X-Request-Id.

  2. Wait 60 s. Open the resulting Issue in Sentry (or query MCP for events with this call_id).

  3. Inspect the event payload:

    | Field | Expected |
    |---|---|
    | user.id | 16-hex-char value; not equal to test-user-123 |
    | user.email | absent |
    | user.username | absent |
    | user.ip_address | absent |
    | request.headers["Authorization"] | absent |
    | request.headers["X-Tenant-Id"] | absent |
    | request.headers["X-Request-Id"] | present, unchanged |
    | tags["tenant_hash"] | 16-hex-char value; not equal to T-99999 |
    | request.data (body) | contains *** in place of eyJhbGciOiJIUzI1NiJ9.… and AKIAIOSFODNN7EXAMPLE |
    | request.data (body) | may contain alice@example.com (email is not a default-scrubbed pattern; revisit in a future iteration if it becomes a concern) |
    | Any span’s db.statement | contains ? placeholders only; no literal values |

Pass condition: All assertions in the table above hold.
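The table can be turned into an automated check against the event JSON downloaded from Sentry. This sketch assumes the usual event layout (user, request.headers, request.data, tags) and accepts headers and tags either as a dict or as the list of [name, value] pairs Sentry often stores. db.statement lives on spans of transaction events, so it is left out here. pii_violations and its parameters are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

HEX16 = re.compile(r"^[0-9a-fA-F]{16}$")


def _as_dict(value: Any) -> Dict[str, Any]:
    """Headers and tags may arrive as a dict or as [[key, value], ...]."""
    return value if isinstance(value, dict) else dict(value or [])


def pii_violations(event: Dict[str, Any], user_id: str, tenant_id: str, request_id: str) -> List[str]:
    """Return one message per failed row of the AC-6 table (db.statement excluded)."""
    problems: List[str] = []
    user = event.get("user") or {}
    uid = str(user.get("id", ""))
    if not HEX16.match(uid) or uid == user_id:
        problems.append("user.id is not an opaque 16-hex value")
    problems += [f"user.{k} present" for k in ("email", "username", "ip_address") if k in user]

    request = event.get("request") or {}
    headers = {k.lower(): v for k, v in _as_dict(request.get("headers")).items()}
    problems += [f"header {h} present" for h in ("authorization", "x-tenant-id") if h in headers]
    if headers.get("x-request-id") != request_id:
        problems.append("header x-request-id missing or changed")

    tenant_hash = str(_as_dict(event.get("tags")).get("tenant_hash", ""))
    if not HEX16.match(tenant_hash) or tenant_hash == tenant_id:
        problems.append("tags.tenant_hash is not an opaque 16-hex value")

    body = request.get("data", "")
    body = body if isinstance(body, str) else json.dumps(body)
    for secret in ("eyJhbGciOiJIUzI1NiJ9", "AKIAIOSFODNN7EXAMPLE"):
        if secret in body:
            problems.append(f"body still contains {secret}")
    return problems
```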

Goal (AC-7): A deliberate Thread.start { throw RuntimeException(...) } outside any boundary handler produces a Sentry event tagged via: uncaught-handler before the thread dies.

Procedure:

This AC requires an ad-hoc test hook because production code doesn’t normally start arbitrary threads. Two options:

  • A: Add a temporary kotlin-test integration test under operations that spins up an EmbeddedServer with the same Component.build(...) wiring and then Thread { throw RuntimeException("uncaught-test") }.start(). Assert that a Sentry event is captured via the test-mode hub. (Requires injecting a test SDK transport; can be deferred to a later refinement.)
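  • B: A one-off manual smoke: open a shell in the dev pod (kubectl exec -it), launch jshell, and run new Thread(() -> { throw new RuntimeException("uncaught-test"); }).start(). Wait, then query Sentry MCP for events with via: uncaught-handler. Note that jshell starts its own JVM rather than attaching to the running operations process, so the thread reaches the Sentry uncaught-exception handler only if that JVM has the operations classpath loaded and has run the same Component.build(...) wiring first.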

Recommended: B for project closure; A may be added later as ongoing regression coverage.

Pass condition: A Sentry event appears within 60 s of the Thread.start call, tagged via: uncaught-handler, with the RuntimeException("uncaught-test") message.

Goal (AC-8): A log.error(...) call from a code path that catches and continues (does not throw to a boundary) produces a Sentry event. The same exception caught at a request boundary and reaching the StatusPages handler produces an event that Sentry groups into the same Issue.

Pre-flight (wiring check):

The Sentry Logback appender is wired per component (per DT-008) via XML, not programmatically. Confirm the wiring is in place in the deployed pod:

kubectl exec <operations-pod> -- cat /app/resources/logback.xml | grep -E 'SentryAppender|appender-ref ref="SENTRY"'

Both an <appender name="SENTRY" class="io.sentry.logback.SentryAppender"> definition and an <appender-ref ref="SENTRY"/> on the root logger must appear. Missing wiring is a deploy-time gap (not a Sentry-side issue) and must be fixed before the rest of the procedure can pass.
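The grep only shows that both strings appear somewhere in the file. A stricter check, sketched here in Python with the standard-library XML parser, confirms that the SENTRY appender is defined with the SentryAppender class and is referenced from the root logger specifically. It assumes the wiring lives in a single logback.xml without <include> files; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SENTRY_APPENDER_CLASS = "io.sentry.logback.SentryAppender"


def sentry_appender_wired(logback_xml: str) -> bool:
    """True if a SENTRY appender is defined and referenced from <root>."""
    config = ET.fromstring(logback_xml)
    defined = any(
        a.get("name") == "SENTRY" and a.get("class") == SENTRY_APPENDER_CLASS
        for a in config.iter("appender")
    )
    # Only the root logger counts: a ref on a named logger covers just that logger.
    root_logger = config.find("root")
    referenced = root_logger is not None and any(
        ref.get("ref") == "SENTRY" for ref in root_logger.iter("appender-ref")
    )
    return defined and referenced
```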

Procedure:

  1. Log-side capture. Use a deliberate api-test fixture (sentry-logback-smoke) that exercises a code path the implementer arranged to log-and-continue an AppError. Wait 60 s.
  2. Query Sentry MCP for events with the request’s call-id; expect at least one event.
  3. Duplication grouping. Use a fixture (sentry-duplication-smoke) that triggers a code path captured by BOTH the StatusPages boundary AND the Logback appender (e.g. an Internal.Implementation reaches StatusPages, and app.log.error(...) runs with the same throwable; a warn-level call only produces an event if the appender’s minimumEventLevel is lowered from its default of ERROR). Wait 60 s.
  4. Query Sentry MCP for the resulting Issue(s). Confirm that the two underlying events end up in the same Issue (i.e. Sentry’s fingerprinting groups them).

Pass condition: Pre-flight wiring check passes. Step 2 returns at least one event from the log path. Step 4 returns one Issue containing two events (not two Issues each with one event). If Sentry groups them into two parallel Issues, that’s a known limitation acceptable for project closure but flagged as a refinement candidate.

AC-9 — FE tracePropagationTargets are explicit


Goal: All three Sentry init paths in arda-frontend-app set an explicit, environment-aware tracePropagationTargets, and each environment’s list includes that environment’s backend host.

Procedure:

  1. In each of the three init files (src/instrumentation-client.ts, sentry.server.config.ts, sentry.edge.config.ts), inspect the Sentry.init({...}) call. Each must include a tracePropagationTargets: key with a non-default array.
  2. In the browser console of the deployed dev frontend, evaluate Sentry.getClient().getOptions().tracePropagationTargets. Confirm the array includes "localhost", the same-origin regex, "/monitoring", and the dev backend host.
  3. Repeat for the prod deployment with the prod backend host.

Pass condition: All three files set the option explicitly. Browser-side inspection shows the expected array per environment.

Goal (AC-10): Setting oam.performance.sentry.enabled: false in Helm disables all backend Sentry behaviour; setting SENTRY_AUTO_SESSION_TRACKING=false disables sessions while leaving exception capture intact; an empty SENTRY_DSN results in fail-soft no-op behaviour.

Procedure (run on a throwaway sandbox; do NOT change dev/stage/demo/prod):

  1. Global off. Deploy a sandbox operations pod with oam.performance.sentry.enabled: false. Trigger the sentry-pii-smoke fixture. Confirm: no env vars (SENTRY_*) on the pod; no events appear in Sentry for the request.
  2. Sessions off, exceptions on. Deploy with sentry.enabled: true AND SENTRY_AUTO_SESSION_TRACKING=false (override). Trigger kanban-internal-trigger. Confirm: Issue appears in Sentry; session count does not increment for the new release.
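  3. Empty DSN. Deploy with sentry.enabled: true but the be-sentry-dsn secret pointing at a non-existent AWS Secrets Manager item (ESO will fail to materialise it, leaving the pod’s SENTRY_DSN env var empty). This only works if the env var’s secret reference is marked optional; otherwise Kubernetes refuses to start the container while the referenced Secret is missing, and SENTRY_DSN should instead be overridden to an empty string. Confirm: pod starts; no Sentry events captured; logs show no Sentry init errors.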

Pass condition: Each of the three knobs behaves as documented.

Goal (AC-11): Two pages are in place: an architectural reference at current-system/oam/sentry-observability.md and an implementer how-to at process/craft/operations-and-monitoring/sentry-integration.md (rewritten from the previous stale content).

Procedure:

  1. Inspect the merged documentation repository on main. Confirm both files exist with the expected paths.
  2. The architectural page must cover: agent + SDK coexistence, capture-path topology (with PlantUML diagram), session/release-health mechanics, FE/BE release-tag divergence, PII scrubbing posture.
  3. The how-to must cover: dependencies to add, SDK init wiring, Helm values, runBoundary adoption, Logback appender notes, PII-scrubbing test recipes, post-deploy verification using the Sentry MCP.
  4. Run make pr-checks (or the equivalent CI gate) on the documentation site; confirm there are no broken links.
  5. Smoke-render the documentation site (make dev then visit the pages) to confirm content renders correctly.

Pass condition: Both files exist, contain the required sections, render without errors, pass pr-checks.

A compact view of which test artefact covers each requirement.

| R-ID | Unit tests | Smoke / integration | Post-deploy MCP queries |
|---|---|---|---|
| R-101 | SentryInit idempotency test | AC-1 | |
| R-102 | SentryInit idempotency test | AC-1 | |
| R-103 | SentryInit DSN-empty test | AC-10 | |
| R-104 | | AC-1 (env vars present) | |
| R-201 | AppError.reportable() tests | AC-2, AC-3 | AC-2 Issue count query |
| R-202 | BoundaryCapture tests | AC-2 | AC-2 Issue inspection |
| R-203 | BoundaryCapture composite test | AC-3 | AC-3 tag inspection |
| R-204 | AppError.reportable() non-AppError test | AC-7 | AC-7 event verification |
| R-205 | | AC-7 (manual jshell smoke) | AC-7 event with via:uncaught-handler |
| R-206 | (manual test of CoroutineExceptionHandlerFactory.global) | AC-7 (coroutine variant if exercised) | |
| R-207 | BoundaryCapture.runBoundary tests | (only if batch audit finds work) | |
| R-301..R-303 | | AC-4, AC-5 | AC-4 session count; AC-5 trace span query |
| R-304 | | AC-9 | AC-9 deployed-bundle inspection |
| R-401..R-405 | SentryInit session option tests | AC-4 | AC-4 session count query |
| R-501..R-509 | PiiScrubber, OpaqueId, HeadersAllowList tests | AC-6 | AC-6 event inspection |
| R-601, R-602 | BoundaryCapture tests | AC-2 | |
| R-603 | (impl-time audit; report findings) | | |
| R-604 | (manual test of unfiltered path) | AC-7 | AC-7 event tag check |
| R-701, R-702 | Fingerprinting tests | AC-2 (Issue grouping inspection) | AC-2 Issue count vs unique event class |
| R-703 | | AC-2 (under sustained load) | AC-2 Issue event count |
| R-801..R-804 | (no unit test; wiring is XML) | AC-8 pre-flight + AC-8 procedure | AC-8 Issue / event inspection |
| R-901, R-902 | All unit tests run with SENTRY_DSN="" | AC-10 (empty-DSN deploy) | |
| R-903 | | CI pipeline | |
| R-1001..R-1003 | | AC-11 | |
| R-1101..R-1103 | All common-module and operations tests pass | CI pipeline | |

The project closes when:

  1. All ACs verified. Each AC’s “Pass condition” above holds in the dev environment. Demo and prod verification is repeated when those deploys land.
  2. PRs merged in sequence per specification.md: infrastructure → common-module → operations. The arda-frontend-app PR and the documentation PR can land in any order relative to the others, except that the documentation PR should not land before the infrastructure PR (the how-to references the salt-secret name).
  3. CHANGELOG entries present in each repo (direct-edit for common-module, operations, arda-frontend-app; PR-body for documentation).
  4. Workbook moved to roadmap/completed/operations-sentry/. The promote-to-roadmap skill handles the curation; the workbook source under workbooks/notebooks/operations-sentry/ stays as the source-of-process record.
  5. PDEV-533 remains open as a separate, parallel track for accounts-component adoption. The closure of this project does NOT close PDEV-533.
  6. After the PRs merge, all worktrees are removed and local branches deleted.

Verification artefacts to produce during implementation


For traceability and re-runnability, the implementer produces:

  • A short report in documentation/src/content/docs/roadmap/in-progress/operations-sentry/verification-report.md (or appended here) capturing the actual pre-flight values, smoke-test outcomes, and any deviations from the expected results. Promoted to roadmap/completed/ alongside the goal at closure.
  • A set of Bruno fixtures (kanban-internal-trigger, kanban-invocation-trigger, kanban-composite-trigger, sentry-pii-smoke, sentry-logback-smoke, sentry-duplication-smoke) in the api-test repository under a new sub-collection. Reusable as ongoing regression coverage.
  • A canonical Sentry-side test release tag (e.g. operations@<version>-smoke-N) for each verification run, so historical runs are queryable in Sentry.