
Operations Sentry — Project Plan

Execution plan for the implementation phase of the operations-sentry project. Translates the requirements (requirements.md), specification (specification.md), and verification (verification.md) into a sequenced task list with per-phase verification gates and PR-landing choreography.

The plan assumes:

  • Solo single-session execution — one implementer (the author / the agent operating under the author’s direction), proceeding through the phases linearly. No agent-team coordination.
  • Composite build for local development. The operations repo builds against common-module via --include-build ../common-module during implementation; arda-common is published to GitHub Packages only between the common-module PR merge and the operations PR opening.
  • All worktrees on jmpicnic/operations-sentry under projects/operations-sentry-worktrees/. Five worktrees in play: workbooks, common-module, infrastructure, operations, documentation. (arda-frontend-app is an independent track without a project-local worktree; the DT-007 change is done in the main arda-frontend-app/ clone.)
  • Phase 1 (common-module) commits one-per-module. Each new file in runtime/observability/ and each modified existing file gets its own commit so the branch reads as a sequence of focused changes. The final PR can squash if desired, but the branch itself is incrementally reviewable.
  • CHANGELOG entries at PR-open time, not per-phase. Each repo’s CHANGELOG (or PR-body ## CHANGELOG for documentation) is written when the PR is opened, drawing from the consolidated branch history. Follow workspace/instructions/claude/rules/changelog.md: one bullet per coherent change, intent + outcome only, no file lists, no commit recitations. Section order within a release: Changed/Removed, then Added/Deprecated, then Fixed/Security.
  • Bruno fixtures intentionally skipped. AC verification relies on unit tests with mocks for wiring correctness, plus manual smoke (curl + Sentry MCP) for the deploy-time ACs. The api-test repository is not modified by this project.
| # | Item | Status |
| --- | --- | --- |
| 0.1 | workbooks worktree on jmpicnic/operations-sentry (exploration commit 6b5da7b) | |
| 0.2 | common-module worktree on jmpicnic/operations-sentry, at origin/main | |
| 0.3 | operations worktree on jmpicnic/operations-sentry, at origin/main | |
| 0.4 | documentation worktree on jmpicnic/operations-sentry, at origin/main | |
| 0.5 | infrastructure worktree on jmpicnic/operations-sentry, at origin/main | |
| 0.6 | Project-level CLAUDE.md at projects/operations-sentry-worktrees/CLAUDE.md | |
| 0.7 | Spec set complete: goal.md, decision-log.md, specification/{analysis,requirements,specification,verification}.md | |
| 0.8 | Linear tickets filed: PDEV-533 (accounts-component adoption, parallel track), PDEV-535 (Sentry-side org configuration verification) | |

The diagram below shows the dependency graph between phases, that is, what blocks what. Solo execution may proceed in any topologically valid order.

PlantUML diagram

Phase 1 — common-module: SDK + observability primitives


Goal. Ship the runtime/observability package, the AppError.reportable() method, and the Ktor-side capture wiring. After this phase, common-module builds and tests pass; operations (in composite build) sees the new symbols.

| Path (under projects/operations-sentry-worktrees/common-module/) | Change |
| --- | --- |
| gradle/libs.versions.toml | Add sentry-version = "8.41.0", libs.sentry, libs.sentry.logback. |
| lib/build.gradle.kts | implementation(libs.sentry), implementation(libs.sentry.logback). |
| lib/src/main/kotlin/cards/arda/common/lib/runtime/observability/SentryInit.kt | NEW: Sentry.init { … } with options per specification.md. |
| …/observability/BoundaryCapture.kt | NEW: Throwable.captureViaReportable, runBoundary, runSuspendingBoundary, captureFromKtorBoundary. |
| …/observability/PiiScrubber.kt | NEW: scrubEvent (beforeSend), scrubTransaction (beforeSendTransaction). |
| …/observability/OpaqueId.kt | NEW: HMAC helper + opaqueUser. |
| …/observability/HeadersAllowList.kt | NEW: deny-by-default header filter with X-Tenant-Id → tenant_hash rewrite. |
| …/observability/Redactor.kt, …/observability/DbStatementRedactor.kt | NEW: regex sets for body / DB statement redaction. |
| …/observability/Fingerprinting.kt | NEW: capture-site fingerprint formula. |
| …/observability/CoroutineExceptionHandlerFactory.kt | NEW: global handler exposed for consumers. |
| lib/src/main/kotlin/cards/arda/common/lib/lang/errors/AppError.kt | Add open fun reportable(): List<Throwable> = listOf(this) on the sealed base, override on Invocation (emptyList()) and Composite (causes.flatMap { it.reportable() }). Add bridging extension fun Throwable.reportable(): List<Throwable>. |
| lib/src/main/kotlin/cards/arda/common/lib/component/Component.kt | At build(...) start: call SentryInit.init(). Replace the StatusPages install body’s app.log.warn(...) with the equivalent that also invokes captureFromKtorBoundary(toProcess, call.request.path(), callId) before the existing log call. Log line preserved. |
| lib/src/test/kotlin/cards/arda/common/lib/runtime/observability/ | NEW: unit tests per verification.md unit-test plan. |
| lib/src/test/kotlin/cards/arda/common/lib/lang/errors/AppErrorReportableTest.kt | NEW: covers every AppError subtype’s reportable(). |
| CHANGELOG.md | Deferred to PR-open time (Phase 6). Direct-edit per workspace/instructions/claude/rules/changelog.md. |

make -C /Users/jmp/code/arda/projects/operations-sentry-worktrees/common-module build
make -C /Users/jmp/code/arda/projects/operations-sentry-worktrees/common-module clqLint

Both must pass. Unit tests run as part of build. Kover threshold (whatever the repo enforces today) must hold.
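As a rough illustration of the reportable() contract planned for AppError.kt, the sketch below transliterates it into TypeScript (the real code is Kotlin). The type and function names are illustrative stand-ins, not the module's API; only the three behaviours come from the plan: the default returns the error itself, Invocation returns nothing, and Composite flattens whatever its causes report.

```typescript
// Illustrative model of the AppError hierarchy; the real type is a Kotlin sealed class.
type AppError =
  | { kind: "internal"; message: string }
  | { kind: "invocation"; message: string }
  | { kind: "composite"; message: string; causes: AppError[] };

// Returns the errors that should be captured for a given error.
function reportable(error: AppError): AppError[] {
  switch (error.kind) {
    case "invocation":
      // Caller-caused errors report nothing.
      return [];
    case "composite":
      // A composite reports what its causes report, flattened.
      return error.causes.flatMap((cause) => reportable(cause));
    default:
      // Everything else reports itself.
      return [error];
  }
}

const sample: AppError = {
  kind: "composite",
  message: "batch failed",
  causes: [
    { kind: "internal", message: "database unavailable" },
    { kind: "invocation", message: "not authorized" },
  ],
};

// Only the internal cause survives; the composite wrapper and the
// invocation error are not reported on their own.
const toCapture = reportable(sample);
```

Under this reading, a Composite whose causes are all Invocation errors reports nothing at all, which is worth covering explicitly in AppErrorReportableTest.kt.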

One commit per module file inside runtime/observability/ (so SentryInit.kt, BoundaryCapture.kt, PiiScrubber.kt, OpaqueId.kt, HeadersAllowList.kt, Redactor.kt, DbStatementRedactor.kt, Fingerprinting.kt, CoroutineExceptionHandlerFactory.kt each get their own commit). Modifications to AppError.kt and Component.kt each get their own commit. The build.gradle.kts / libs.versions.toml change is one commit. Tests can be batched per module-under-test, in commits that immediately follow the production code they exercise. The result: a jmpicnic/operations-sentry branch in common-module/ that reads as a sequence of focused, individually-reviewable changes.

  • All files above committed on jmpicnic/operations-sentry in common-module/, one commit per module per the cadence above.
  • make build passes locally.
  • CHANGELOG entry deferred to PR-open time (Phase 6).
  • Branch ready to push.
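
The deny-by-default behaviour planned for HeadersAllowList.kt can be sketched as follows, in TypeScript rather than the module's Kotlin. The allow-list contents, the function name, and the injected hashTenant callback are assumptions for illustration; the real allow-list and the HMAC helper are defined by specification.md and OpaqueId.kt. Recording tenant_hash as a tag is also an assumption, based on the tag named in AC-6.

```typescript
// Hypothetical allow-list; the real set is defined in specification.md.
const ALLOWED_HEADERS = new Set(["content-type", "user-agent", "x-request-id"]);

interface FilteredHeaders {
  headers: Record<string, string>;
  tags: Record<string, string>;
}

// Keeps only allow-listed headers and rewrites X-Tenant-Id into a hashed tag.
// hashTenant stands in for the HMAC helper and is injected by the caller.
function filterHeaders(
  incoming: Record<string, string>,
  hashTenant: (tenantId: string) => string,
): FilteredHeaders {
  const headers: Record<string, string> = {};
  const tags: Record<string, string> = {};
  for (const [name, value] of Object.entries(incoming)) {
    const key = name.toLowerCase();
    if (key === "x-tenant-id") {
      // The raw tenant id never leaves the process; only its hash does.
      tags["tenant_hash"] = hashTenant(value);
    } else if (ALLOWED_HEADERS.has(key)) {
      headers[key] = value;
    }
    // Anything not explicitly allowed is dropped.
  }
  return { headers, tags };
}
```

Denying by default means a header added by a future client or proxy is dropped until someone allows it on purpose.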

Phase 2 — infrastructure: PartitionSecrets stack


Goal. Add the PartitionSecrets stack and wire it into the per-partition app composer. After this phase, cdk synth for each partition produces a ${partitionPrefix}-PartitionSecrets stack with a SentryScrubSalt resource.

| Path (under projects/operations-sentry-worktrees/infrastructure/) | Change |
| --- | --- |
| src/main/cdk/stacks/purpose/partition-secrets.ts | NEW: PartitionSecrets extends cdk.Stack per specification.md. |
| src/main/cdk/platforms.ts | Add sentryScrubSaltOverride?: string to PartitionInfo and Partition. Plumb through the constructor. ENVIRONMENTS entries default to omitting the field. |
| src/main/cdk/apps/Al1x/partition.ts | Import partitionSecrets; inside buildPartition() instantiate new partitionSecrets.PartitionSecrets(app, ${partitionPrefix}-PartitionSecrets, { locator: partition.locator, sentryScrubSaltOverride: partition.sentryScrubSaltOverride, ...partialStackProps }) and call .publish(). No addDependency. Place the instantiation early (it depends on no other stack). |
| CHANGELOG.md | Deferred to PR-open time (Phase 6). |

make -C /Users/jmp/code/arda/projects/operations-sentry-worktrees/infrastructure build
npm run --prefix /Users/jmp/code/arda/projects/operations-sentry-worktrees/infrastructure synth:Al1x-Alpha001-prod 2>&1 | tail -20

build runs the TypeScript compile + lint + tests. The synth confirms CFN template generation for Alpha001-prod-PartitionSecrets works; spot-check the generated template under cdk.out/ for the AWS Secrets Manager resource. Repeat the synth for Alpha001-demo, Alpha002-dev, Alpha002-stage to confirm all four partition templates render.
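
When spot-checking the four synth outputs, it can help to write the expected names down first. The sketch below derives them from the naming pattern used in this plan (for example Alpha002-dev-PartitionSecrets and Alpha002-dev-SentryScrubSalt); the helper and type names are hypothetical and not part of the infrastructure repo.

```typescript
interface PartitionId {
  infrastructure: string;
  purpose: string;
}

// The four partitions named in this plan.
const PARTITIONS: PartitionId[] = [
  { infrastructure: "Alpha001", purpose: "prod" },
  { infrastructure: "Alpha001", purpose: "demo" },
  { infrastructure: "Alpha002", purpose: "dev" },
  { infrastructure: "Alpha002", purpose: "stage" },
];

// Derives the stack and secret names expected for one partition.
function expectedResources(partition: PartitionId): { stackName: string; secretName: string } {
  const prefix = `${partition.infrastructure}-${partition.purpose}`;
  return {
    stackName: `${prefix}-PartitionSecrets`,
    secretName: `${prefix}-SentryScrubSalt`,
  };
}

// One checklist line per partition to compare against cdk.out/.
const checklist = PARTITIONS.map((partition) => {
  const { stackName, secretName } = expectedResources(partition);
  return `${stackName} -> ${secretName}`;
});
```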

  • All files above committed.
  • make build passes locally; CDK synth for all four partitions produces a *-PartitionSecrets stack with a SentryScrubSalt resource.
  • CHANGELOG entry deferred to PR-open time (Phase 6).
  • Branch ready to push.

Phase 3 — arda-frontend-app: tracePropagationTargets


Goal. Explicit env-aware tracePropagationTargets configuration in the three Sentry init paths. Independent of all backend work.

This is the only piece of work that lives outside the projects/operations-sentry-worktrees/ tree — it is done in the main arda-frontend-app/ clone or in a separately-created worktree (caller’s choice).

| Path | Change |
| --- | --- |
| arda-frontend-app/src/lib/sentry/trace-propagation-targets.ts (or similar) | NEW: helper exporting tracePropagationTargets(env: string): Array<string \| RegExp>. Reads the existing API-client host-discovery mechanism to fill in the per-env backend host. |
| arda-frontend-app/src/instrumentation-client.ts | Add tracePropagationTargets: tracePropagationTargets(env) to Sentry.init. |
| arda-frontend-app/sentry.server.config.ts | Same. |
| arda-frontend-app/sentry.edge.config.ts | Same. |
| arda-frontend-app/CHANGELOG.md | Deferred to PR-open time (Phase 6). |

make -C /Users/jmp/code/arda/arda-frontend-app ci

If the repo’s pre-push gate is a different target, run that instead. The ci target covers lint, typecheck, tests, and any FE-side VRT.

  • All files above committed on jmpicnic/operations-sentry in the FE clone.
  • make ci passes.
  • CHANGELOG entry deferred to PR-open time (Phase 6).
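
A minimal sketch of the env-aware helper, assuming a static host table in place of the existing API-client host-discovery mechanism. The host names and the HYPOTHETICAL_BACKEND_HOSTS constant are invented for illustration; only the function name and signature come from the plan.

```typescript
// Hypothetical per-env backend hosts. The real helper would read these
// from the existing API-client host-discovery mechanism instead.
const HYPOTHETICAL_BACKEND_HOSTS: Record<string, string> = {
  dev: "api.dev.example.com",
  stage: "api.stage.example.com",
  prod: "api.example.com",
};

// Escapes regex metacharacters so a host name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the targets that should receive trace headers for one env.
function tracePropagationTargets(env: string): Array<string | RegExp> {
  const host = HYPOTHETICAL_BACKEND_HOSTS[env];
  if (host === undefined) {
    // Unknown env: return no targets rather than guessing a host.
    return [];
  }
  // Anchor the pattern so look-alike hosts such as
  // api.dev.example.com.evil.test do not match.
  return [new RegExp(`^https://${escapeRegExp(host)}(/|$)`)];
}
```

An anchored RegExp is used here instead of a bare host string so that a host appearing in the middle of another URL does not count as a match.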

Phase 4 — operations: Helm + logback + arda-common bump + runBoundary audit


Goal. Adopt the new common-module SDK init via composite build, wire the Helm changes, update logback.xml, and audit system/batch/ for any out-of-request work that should use runBoundary. After this phase, operations builds against the local common-module via composite build and runs locally with helmInstallToLocal.

| Path (under projects/operations-sentry-worktrees/operations/) | Change |
| --- | --- |
| gradle/libs.versions.toml | Update arda-common-version to the version that the common-module PR will publish. (During local dev, the composite build picks up the local common-module automatically regardless of this version; the line is updated for the merge state.) |
| src/main/helm/values.yaml | Add oam.performance.sentry.sessions: { enabled } sub-object: a single key, with no mode or sampleRate (the JVM SDK does not expose those). Default enabled: false; per-env files turn it on. |
| src/main/helm/values-dev.yaml, values-stage.yaml | sessions.enabled: true. No tracesSampleRate change. |
| src/main/helm/values-demo.yaml, values-prod.yaml | sessions.enabled: true. Bump tracesSampleRate from "0.1" to "0.2" (sessions automatically inherit this rate). |
| src/main/helm/values-local.yaml, values-kyle.yaml | No change. |
| src/main/helm/templates/deployment.yaml | When sentry.enabled && sessions.enabled, emit SENTRY_AUTO_SESSION_TRACKING. When sentry.enabled, also emit SENTRY_SCRUB_SALT via secretKeyRef to be-sentry-scrub-salt, key salt, optional: true. |
| src/main/helm/templates/secrets.yaml | Add the ExternalSecret for be-sentry-scrub-salt next to the existing be-sentry-dsn declaration; remoteRef.key is printf "%s-%s-SentryScrubSalt" .Values.global.infrastructure .Values.global.purpose, property: salt. The upstream secret is partition-scoped (Alpha002-dev-SentryScrubSalt), not namespace-scoped; using .Release.Namespace here would resolve to dev-operations and miss the AWS secret entirely. |
| src/main/resources/logback.xml | Add <appender name="SENTRY" class="io.sentry.logback.SentryAppender"> with minimumEventLevel=ERROR, minimumBreadcrumbLevel=INFO. Add <appender-ref ref="SENTRY"/> on the root logger. |
| src/main/kotlin/cards/arda/operations/system/batch/** | Audit. Look for any code path that launches work outside the Ktor request coroutine scope (GlobalScope.launch, Thread().start, scheduler callbacks). If the audit finds nothing, no code change in system/batch/; the helper ships from common-module for future use. If the audit finds out-of-request entry points, STOP and prompt the user before mass-wrapping; the scope of the change may warrant a separate PR or a design call. Record audit outcome in the PR description either way. |
| CHANGELOG.md | Deferred to PR-open time (Phase 6). |

# Composite build against the local common-module
make -C /Users/jmp/code/arda/projects/operations-sentry-worktrees/operations build
# Helm lint
make -C /Users/jmp/code/arda/projects/operations-sentry-worktrees/operations lint

Then a local install + smoke test:

op run --env-file /Users/jmp/code/arda/projects/operations-sentry-worktrees/operations/1Password.env \
-- ./gradlew -p /Users/jmp/code/arda/projects/operations-sentry-worktrees/operations helmInstallToLocal

Confirm the pod starts and reports no Sentry-init errors in logs. With SENTRY_DSN empty (local), expect the SDK init to log a single “no DSN” warning then succeed; no Sentry events emitted.
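
The conditional env-var emission planned for deployment.yaml can be modelled outside Helm to make the three switch states explicit. This is a sketch in TypeScript, not the chart template: the function and type names are hypothetical, the value "true" for SENTRY_AUTO_SESSION_TRACKING is an assumption (the plan only says the variable is emitted), and the variables the chart already emits, such as the DSN, are left out.

```typescript
interface SentryValues {
  enabled: boolean;
  sessions: { enabled: boolean };
}

type EnvVar =
  | { name: string; value: string }
  | { name: string; valueFrom: { secretKeyRef: { name: string; key: string; optional: boolean } } };

// Returns the env vars this plan adds, given the sentry values block.
function addedSentryEnv(sentry: SentryValues): EnvVar[] {
  if (!sentry.enabled) {
    // Global off-switch: nothing new is emitted.
    return [];
  }
  const env: EnvVar[] = [];
  if (sentry.sessions.enabled) {
    // Assumed value; the plan does not state it.
    env.push({ name: "SENTRY_AUTO_SESSION_TRACKING", value: "true" });
  }
  // optional: true lets the pod start even if the secret is absent.
  env.push({
    name: "SENTRY_SCRUB_SALT",
    valueFrom: { secretKeyRef: { name: "be-sentry-scrub-salt", key: "salt", optional: true } },
  });
  return env;
}
```

The three states (global off, sessions off, both on) line up with the off-switch checks listed under AC-10.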

  • All files above committed.
  • make build, make lint pass.
  • Local install confirms pod starts cleanly.
  • Audit outcome for system/batch/ recorded in the PR description.
  • gradle/libs.versions.toml arda-common-version matches the version common-module will publish.
  • CHANGELOG entry deferred to PR-open time (Phase 6).

Phase 5 — documentation: architectural page + rewritten how-to


Goal. Land Deliverables #6 (architectural reference) and #6b (implementer how-to). The architectural page consolidates the design intent from the spec set into a single published page; the how-to is the implementer-facing recipe.

| Path (under projects/operations-sentry-worktrees/documentation/) | Change |
| --- | --- |
| src/content/docs/current-system/oam/sentry-observability.md | NEW: architectural reference. Sections: Sentry’s role in the platform, agent + SDK coexistence, capture-path topology (reuse the PlantUML diagram from specification.md), session and release-health mechanics, FE/BE release-tag divergence, PII scrubbing posture, salt scoping (per-partition). |
| src/content/docs/process/craft/operations-and-monitoring/sentry-integration.md | REWRITE: implementer how-to. Sections: dependencies to add, SDK init wiring, Helm values to set, runBoundary adoption recipe, Logback appender XML wiring (the snippet from DT-008), PII-scrubbing test recipes, post-deploy verification using the Sentry MCP, Sentry org-side configuration section (toggles referenced in PDEV-535). |
| PR body | ## CHANGELOG section per documentation/CLAUDE.md PR-body model. |

make -C /Users/jmp/code/arda/projects/operations-sentry-worktrees/documentation pr-checks

The pr-checks target runs link checking, preview build, smoke tests. All must pass before the PR is opened.

  • Both files committed.
  • make pr-checks passes.
  • PR body has a ## CHANGELOG section.

PR landing order matters for the deploy phase, but most PRs can be opened in any order. The minimum landing order is:

PlantUML diagram

Concrete sequence (solo single-session friendly)

  1. Write CHANGELOG entries per repo just before the push. Re-read workspace/instructions/claude/rules/changelog.md. For each repo, draft one consolidated entry that captures the intent and outcome of all commits on the branch, not a per-commit recitation. Section ordering within a release: Changed/Removed first, then Added/Deprecated, then Fixed/Security. For the repos that keep a CHANGELOG.md (common-module, infrastructure, operations, arda-frontend-app), this means editing CHANGELOG.md directly. For documentation, the entry goes in the PR body’s ## CHANGELOG block.
  2. Push all five branches (common-module, infrastructure, operations, documentation, arda-frontend-app) to their respective remotes.
  3. Open the common-module PR. Title and description per the repo’s PR template. Wait for CI green.
  4. Open the infrastructure PR in parallel with step 3. Wait for CI green.
  5. Open the documentation PR in parallel with step 3. Wait for pr-checks CI green. PR body carries ## CHANGELOG.
  6. Open the arda-frontend-app PR in parallel. Wait for CI green.
  7. Merge common-module PR first. The post-merge changelog-assembly workflow publishes a new arda-common version. Take note of the version string from the GitHub Release.
  8. Update operations’s gradle/libs.versions.toml to the new published arda-common-version. Push the update to the operations branch.
  9. Open the operations PR. Wait for CI green.
  10. Merge infrastructure PR. This does NOT automatically deploy; that’s Phase 7.
  11. Merge documentation PR.
  12. Merge arda-frontend-app PR. Amplify preview / deploy on merge.
  13. Hold the operations PR merge until Phase 7 has run for dev. This is the safest order — the operations deploy needs the salt secrets to exist, which Phase 7 ensures.

Use the release-lifecycle skill if any sequencing constraint becomes non-obvious during the operation (especially the version bump in steps 7 and 8).
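
The ordering constraints in the sequence above can be checked mechanically. The sketch below encodes one reading of them as a blocked-by map and derives a valid order with a simple topological pass. The step labels and the map itself are an interpretation of steps 7 to 13, not a transcription of the PlantUML diagram.

```typescript
// Each step lists the steps that must complete before it (an
// interpretation of the numbered sequence, not the original diagram).
const blockedBy: Record<string, string[]> = {
  "merge common-module": [],
  "merge infrastructure": [],
  "merge documentation": [],
  "merge arda-frontend-app": [],
  "open operations PR": ["merge common-module"],
  "deploy PartitionSecrets to dev": ["merge infrastructure"],
  "merge operations": ["open operations PR", "deploy PartitionSecrets to dev"],
};

// Repeatedly picks the first pending step whose blockers are all done.
function landingOrder(graph: Record<string, string[]>): string[] {
  const done: string[] = [];
  const pending = Object.keys(graph);
  while (pending.length > 0) {
    const index = pending.findIndex((step) =>
      graph[step].every((blocker) => done.includes(blocker)),
    );
    if (index === -1) {
      throw new Error(`cycle or unknown blocker among: ${pending.join(", ")}`);
    }
    done.push(pending.splice(index, 1)[0]);
  }
  return done;
}

const order = landingOrder(blockedBy);
```

Any order this function accepts is valid; the numbered sequence is just one such order, arranged for a single session.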

Phase 7 starts once the PRs other than operations are merged; the operations PR is held until 7.2. Each partition deploy is independent; verification happens against the dev partition first, then promotes through stage / demo / prod.

7.1 — Per-partition infrastructure deploys


Apply the PartitionSecrets stack via amm.sh for each of the four partitions. The order does not matter; doing dev first is conventional for the verification step that follows.

# In each partition's AWS profile (Alpha002-Admin for dev/stage, Admin-Alpha1 for demo/prod)
./amm.sh Al1x Alpha002 dev deploy # → creates Alpha002-dev-PartitionSecrets stack + Alpha002-dev-SentryScrubSalt
./amm.sh Al1x Alpha002 stage deploy
./amm.sh Al1x Alpha001 demo deploy
./amm.sh Al1x Alpha001 prod deploy

Per-partition completion gate: aws secretsmanager describe-secret --secret-id Alpha002-dev-SentryScrubSalt --profile Alpha002-Admin returns a populated entry, and aws cloudformation describe-stacks --stack-name Alpha002-dev-PartitionSecrets --profile Alpha002-Admin shows the stack as CREATE_COMPLETE / UPDATE_COMPLETE with the Alpha002-dev-API-SentryScrubSaltArn CFN export populated.
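
The stack half of this gate can be evaluated from the JSON that aws cloudformation describe-stacks returns. The sketch below assumes the caller has already parsed that output and passes in the first entry of its Stacks array; the function name and the trimmed-down interface are illustrative, while StackStatus, Outputs, ExportName, and OutputValue are fields of the CloudFormation response.

```typescript
// The subset of a describe-stacks entry that the gate needs.
interface StackDescription {
  StackStatus: string;
  Outputs?: Array<{ ExportName?: string; OutputValue?: string }>;
}

// True when the stack is in a completed state and the named export has a value.
function stackGatePassed(stack: StackDescription, exportName: string): boolean {
  const completed =
    stack.StackStatus === "CREATE_COMPLETE" || stack.StackStatus === "UPDATE_COMPLETE";
  const exported = (stack.Outputs ?? []).some(
    (output) => output.ExportName === exportName && Boolean(output.OutputValue),
  );
  return completed && exported;
}
```

Rollback states such as UPDATE_ROLLBACK_COMPLETE fail this check on purpose, since they leave the previous template in place.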

7.2 — Operations deploy to dev + AC verification


Merge the held operations PR. CI runs and the deploy pipeline applies the operations Helm chart to dev. Once the pod is running, walk the AC verification procedures from verification.md, in this order:

| AC | Test |
| --- | --- |
| AC-1 | Pod started; env vars present (SENTRY_*); no init errors in logs; no Sentry.init calls in operations source. |
| AC-2 | Trigger kanban-internal-trigger. Confirm Sentry Issue with boundary: http, route tag, expected stack. Trigger kanban-invocation-trigger. Confirm no Sentry event. |
| AC-3 | Trigger kanban-composite-trigger. Confirm one event with wrapped_in_composite tag; underlying type is Internal.Infrastructure; no companion event. |
| AC-4 | Run traffic; observe Release Health tab; session count proportional to traffic at the configured rate. |
| AC-5 | Browse the deployed FE, trigger a known backend call, capture trace ID, query Sentry for spans in both projects. |
| AC-6 | Run sentry-pii-smoke fixture. Inspect event: opaque user.id, no PII, headers filtered, tenant_hash tag present, db.statement parameterised. |
| AC-7 | kubectl exec -it + jshell, raise an uncaught throwable on a fresh thread; confirm Sentry event tagged via: uncaught-handler. |
| AC-8 | Pre-flight: kubectl exec + grep logback.xml for SENTRY appender. Then run sentry-logback-smoke and sentry-duplication-smoke fixtures. |
| AC-9 | Browser-side: read Sentry.getClient().getOptions().tracePropagationTargets; confirm expected array. |
| AC-10 | Throwaway sandbox deploys to verify each off-switch (global off, sessions off, empty DSN). |
| AC-11 | Browse the deployed documentation site; confirm both pages exist and render. |

Bruno fixtures to create as part of 7.2 if they don’t exist yet (in api-test/ repo):

  • kanban-internal-trigger — payload that induces an AppError.Internal.Implementation.
  • kanban-invocation-trigger — payload that induces an AppError.Invocation.NotAuthorized.
  • kanban-composite-trigger — payload that induces a Composite with one Internal.Infrastructure cause.
  • sentry-pii-smoke — payload with fake JWT, fake email, X-Tenant-Id header.
  • sentry-logback-smoke — exercises a log-and-continue code path.
  • sentry-duplication-smoke — exercises StatusPages + log path for the same throwable.

These are reusable as ongoing regression coverage; check them in under a new sub-collection on the operations-sentry api-test branch (separate Bruno PR if api-test is its own repo).
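
To make the expected outcome of sentry-pii-smoke concrete, here is a minimal regex-redaction sketch in TypeScript. The two patterns and the placeholder strings are assumptions chosen for illustration; the regex sets that ship in Redactor.kt are defined by specification.md and may differ.

```typescript
// Illustrative patterns only; not the regex set from Redactor.kt.
// Three base64url segments separated by dots, starting with the usual "eyJ" prefix.
const JWT_PATTERN = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;
// A deliberately simple email shape: local part, @, domain, dot, TLD.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replaces anything that looks like a JWT or an email with a placeholder.
function redact(text: string): string {
  return text.replace(JWT_PATTERN, "[jwt]").replace(EMAIL_PATTERN, "[email]");
}

const scrubbed = redact("Bearer eyJhbGciOi.eyJzdWIi.c2ln sent by jane.doe@example.com");
```

The same function could be applied to tag values such as wrapped_in_composite, which is the mitigation the risk list proposes for PII in composite messages.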

Record the outcome of each AC in a small verification-report.md alongside verification.md for traceability.

Repeat the operations deploy through the deploy pipeline for Alpha002-stage, Alpha001-demo, Alpha001-prod. Re-run a subset of the ACs (1, 4, 6) per environment. AC-5 (FE↔BE trace continuity) is most worth re-running in prod once real traffic is flowing.

After all ACs pass in prod:

  1. Promote roadmap content from roadmap/in-progress/operations-sentry/ to roadmap/completed/operations-sentry/ via the promote-to-roadmap skill. The workbook source stays under workbooks/notebooks/operations-sentry/ (source-of-process, not deleted).
  2. PDEV-535 (Sentry org-side configuration verification) is handled by the user during the infrastructure PR review window, not gated by project closure. The user walks through PDEV-535’s three scope items (Data Scrubbers / Default Scrubbers / Scrub IP Addresses on arda-systems.sentry.io), documents the toggle states on the how-to page (Deliverable #6b), and marks the ticket Done. This work is parallel to project closure; the project does not block on it.
  3. PDEV-533 remains open — accounts-component adoption is on its own timeline. Do not close the parent project waiting for PDEV-533.
  4. Remove worktrees under projects/operations-sentry-worktrees/:
    for repo in workbooks common-module operations documentation infrastructure; do
      git -C /Users/jmp/code/arda/$repo worktree remove /Users/jmp/code/arda/projects/operations-sentry-worktrees/$repo
    done
  5. Delete local branches after PR merges:
    for repo in workbooks common-module operations documentation infrastructure; do
      git -C /Users/jmp/code/arda/$repo branch -D jmpicnic/operations-sentry
    done
  6. Confirm the arda-frontend-app branch and worktree are also cleaned up (separate from the workspace tree).

Surfaced risks that the implementer should watch for during execution.

| Risk | Probability | Impact | Mitigation |
| --- | --- | --- | --- |
| Sentry JVM SDK 8.41.0 property names differ from sketches (e.g. enableAutoSessionTracking vs isEnableAutoSessionTracking) | Medium | Low | Verify against the SDK at code-write time; per-property fix. |
| Composite outer-context tag value (wrapped_in_composite: <message>) carries PII | Low | Low | Route the tag through PiiScrubber’s redaction in beforeSend (one extra map.forEach). |
| Programmatic appender attach unintentionally regressed (some past commit attached one) | Low | Low | grep for SentryAppender references in code before write; per-component XML is the documented path. |
| runBoundary audit reveals a long list of out-of-request work in system/batch/ | Low | Medium | If found, this is a scope-creep risk: surface it to the user before mass-wrapping; it might warrant a separate PR. |
| Per-partition infrastructure deploy fails for one partition (e.g. CFN stack-name collision) | Low | Medium | Stop the rollout for that partition; investigate; do not skip a partition (would leave a silent gap for that purpose). |
| Sentry-side org configuration not in place at AC-6 time (PDEV-535 incomplete) | Medium | Low | AC-6 is code-side scrubbing verification; org-side scrubbers are defence in depth. If PDEV-535 lags, AC-6 still passes; PDEV-535 is tracked separately and does not gate project closure. |
| api-test Bruno fixtures take longer than expected | Low | Low | The verification ACs can be exercised manually via curl + Sentry MCP if Bruno fixtures slip. Bruno is the preferred form for ongoing regression but not strictly required for first-time verification. |

  • PDEV-533 (accounts-component adoption): separate track, separate plan owned by whoever picks up that ticket.
  • Future alerting project — Sentry alert routes, SLO-based notifications, slow-route paging. Deferred per the project workbook’s “What we deliberately left open” section (in the workbooks repository, not published to this site).
  • EU-region Sentry instance, encrypted-at-rest event content, data-subject-deletion runbook — all deferred per the project’s out-of-scope list.
  • Sentry Logs ingest on the backend — possible follow-up, not in scope.
  • SandboxKyle002 — deprecated; not configured.
  • goal.md — the project’s framing and acceptance criteria.
  • decision-log.md — the eight Design Topics plus the PartitionSecrets cross-cutting design topic.
  • analysis.md — current-state survey and constraints.
  • requirements.md — testable requirements (R-NNN).
  • specification.md — code-level shape and PR sequencing diagram.
  • verification.md — per-AC procedures and the prerequisite checklist.
  • Linear: PDEV-533, PDEV-535.
  • Workbook (internal-process record, not published): workbooks/notebooks/operations-sentry/.