Implementation Plan: PDEV-490 Operations Performance Improvements

Author: Claude Opus for jmpicnic | Date: 2026-05-18 | Status: Draft

Single-agent execution roadmap for PDEV-490. This document does not redefine the work — it sequences and gates the tasks already specified in the specification and the verification matrix. Read the spec first; this plan tells you when to do each task, what must be true before you start it, and what must be true after you finish it.

  1. Read goal, analysis, requirements, specification, and verification in that order. They are the source-of-truth for what changes. This plan is the source-of-truth for when and how to sequence.
  2. Work the plan top-to-bottom. Each step has a precondition gate (must be true before starting) and a postcondition gate (must be true to advance). Do not advance past an unmet gate.
  3. Cross-reference each task to its spec section by anchor link. When in doubt about what changes in a task, follow the link.
  4. Verification acceptance criteria are tracked in verification.md. Update the Status column in that file as each AC moves from Pending to Verified.

  • Single-agent. One operator (human + AI pair) drives all waves in sequence. No multi-agent parallelism.
  • Order: Wave 2 → (Wave 1+3 ‖ Wave 4) → Wave 5. Wave 2 (common-module release) ships first. Waves 1 and 3 collapse into a single combined operations PR; this PR and Wave 4 (documentation) can proceed in parallel — both depend on Wave 2’s spec stability, neither depends on the other’s merge. Wave 5 (dev failover test) runs after Wave 4 merges.
  • Why Waves 1 and 3 collapsed. Wave 1’s Flyway migration (V007, CREATE INDEX CONCURRENTLY) is the codebase’s first non-transactional migration. The common-module Wave 2 release ships .mixed(true) on DbMigration so the test container can apply the heterogeneous migration tree. Wave 1 therefore needs the Wave 2 commonModule pin bump (to get .mixed(true) for CI green), and Wave 3 brings the application.conf jdbcUrl flip that opts into wrapper behavior. Coupling them into one operations PR keeps the operations deploy atomic.
  • Merge gate on the operations PR: it must not merge until common-module Wave 2’s release artifact has been published to the Gradle package registry that operations/gradle/libs.versions.toml resolves against. Work in progress is unblocked by adding Gradle includeBuild("../common-module") to the local settings.gradle.kts (an uncommitted, local-only change), which lets the operations PR be tested against the in-progress Wave 2 worktree.
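
The merge gate above implies a small pre-merge check: the local-only includeBuild override must be gone from settings.gradle.kts before CI runs against the published artifact. The following is a minimal sketch, assuming the override sits on a line of its own. The function names are hypothetical and not part of the operations repository.

```bash
# has_local_include_build FILE
# Succeeds (exit 0) when FILE still contains an uncommented includeBuild(...)
# line that points at the common-module worktree.
# Hypothetical helper for illustration only.
has_local_include_build() {
  grep -Eq '^[[:space:]]*includeBuild\(.*common-module' "$1"
}

# merge_gate_settings_ok FILE
# Succeeds only when the local override has been reverted.
merge_gate_settings_ok() {
  if has_local_include_build "$1"; then
    echo "local includeBuild override still present in $1" >&2
    return 1
  fi
  return 0
}
```

A possible use is to run merge_gate_settings_ok operations/settings.gradle.kts as part of W1.11, before re-running CI. A commented-out override (a line starting with //) is treated as reverted.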

The five waves and their merge dependencies. The diagram captures the merge ordering — work-in-progress on Wave 3 and Wave 4 can overlap with their precondition wave’s review cycle as discussed in the goal Constraint #1.

[PlantUML diagram: merge ordering of the five waves, Wave 2 → (Wave 1+3 ‖ Wave 4) → Wave 5]

Verify these conditions are true before starting any wave. Each is a hard precondition for the plan; failure means stop and resolve.

| # | Check | Command / source | Expected |
|---|---|---|---|
| P1 | Documentation worktree contains the locally-committed consistency pass | git -C documentation log --oneline -1 | Commit 479c60c or its successor on this branch |
| P2 | Operations / common-module / documentation worktrees exist | ls /Users/jmp/code/arda/projects/product-slow-responses-worktrees/ | operations/, common-module/, documentation/ directories present |
| P3 | All three worktrees up-to-date with origin/main | git -C <worktree> fetch origin && git -C <worktree> rev-list --left-right --count HEAD...origin/main | LHS arbitrary (local commits OK); RHS = 0 (no upstream-only commits we don’t have) |
| P4 | Sentry baseline captured for Alpha001-prod | analysis.md § Measured baseline | Table populated with 2026-05-19 figures |
| P5 | Linear PDEV-490 set to “In Progress” | Linear UI / MCP mcp__claude_ai_Linear__save_issue with state: "In Progress" | Manual step at the start of Phase 0 |
| P6 | No active dependencies blocking the plan (PDEV-479, PDEV-488, PDEV-498, PDEV-500 all shipped) | Goal § Context | All four prerequisites shipped |
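
Check P3 can be scripted. The sketch below parses the two counts that git rev-list --left-right --count prints (left-only commits, then right-only commits, separated by whitespace) and fails when the right-hand count is non-zero. The helper names are hypothetical and only illustrate the check.

```bash
# behind_count COUNTS
# Prints the right-hand (upstream-only) count from the "<left> <right>"
# output of: git rev-list --left-right --count HEAD...origin/main
behind_count() {
  printf '%s\n' "$1" | awk '{ print $2 }'
}

# p3_ok COUNTS
# Succeeds only when there are no upstream-only commits (RHS = 0).
# Empty or malformed input fails the check.
p3_ok() {
  [ "$(behind_count "$1")" -eq 0 ] 2>/dev/null
}

# check_worktree WORKTREE_PATH
# Hypothetical wrapper: fetch, count, and report for one worktree.
check_worktree() {
  cw_path="$1"
  git -C "$cw_path" fetch origin || return 1
  cw_counts="$(git -C "$cw_path" rev-list --left-right --count HEAD...origin/main)" || return 1
  if p3_ok "$cw_counts"; then
    echo "OK   $cw_path ($cw_counts)"
  else
    echo "FAIL $cw_path has upstream-only commits ($cw_counts)" >&2
    return 1
  fi
}
```

Calling check_worktree once per worktree (operations, common-module, documentation) covers the whole row. Local-only commits on the left-hand side do not fail the check, matching the Expected column.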

Reference: specification.md § Phase 0.

Step 0.1 — Branch creation across worktrees

The three code-repo worktrees are currently on the legacy jmpicnic/product-slow-responses branch. Create the PDEV-490 branches off fresh origin/main. Do this once, at the start of the project; subsequent waves reuse these branches.

For each of operations, common-module, documentation:

```bash
# Replace <repo> with operations, common-module, or documentation before running.
WT=/Users/jmp/code/arda/projects/product-slow-responses-worktrees/<repo>
git -C "$WT" fetch origin main
git -C "$WT" status --short                   # must be clean
git -C "$WT" checkout -b product-slow-pdev-490 origin/main
git -C "$WT" rev-parse --abbrev-ref HEAD      # confirms branch
```

Precondition gate: P3 above (worktrees up-to-date) AND each worktree is clean (git status --short empty).

Postcondition gate: all three worktrees on branch product-slow-pdev-490 with git rev-parse HEAD matching origin/main of each repo.
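
The postcondition can be checked mechanically. This is a minimal sketch that compares the current branch name and the HEAD commit against origin/main for one worktree. The function names are hypothetical; the git commands are the ones already used in this step.

```bash
# step_0_1_gate_ok BRANCH HEAD_SHA UPSTREAM_SHA
# Succeeds when the branch is product-slow-pdev-490 and HEAD equals origin/main.
step_0_1_gate_ok() {
  [ "$1" = "product-slow-pdev-490" ] && [ "$2" = "$3" ]
}

# check_step_0_1 WORKTREE_PATH
# Hypothetical wrapper: gathers the three values with git and applies the gate.
check_step_0_1() {
  cs_path="$1"
  cs_branch="$(git -C "$cs_path" rev-parse --abbrev-ref HEAD)" || return 1
  cs_head="$(git -C "$cs_path" rev-parse HEAD)" || return 1
  cs_upstream="$(git -C "$cs_path" rev-parse origin/main)" || return 1
  step_0_1_gate_ok "$cs_branch" "$cs_head" "$cs_upstream"
}
```

Running check_step_0_1 against each of the three worktree paths right after branch creation gives a pass/fail exit status per repository.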

Note on the documentation worktree. It already carries the locally-committed PDEV-490 spec artefacts on the legacy branch. After creating the new branch from origin/main, cherry-pick or merge those documentation commits onto product-slow-pdev-490. Concretely: git -C documentation log --oneline jmpicnic/product-slow-responses ^origin/main lists the commits to move. The intent is that the new branch carries all of: the existing goal/analysis/requirements/spec/verification, today’s consistency-pass commit, and the upcoming plan + Task 4.1 / 4.2 / 4.3 changes. For this worktree the postcondition above therefore holds only immediately after branch creation; once the commits are moved, HEAD is ahead of origin/main.

Reference: specification.md § Task 0.2.

The baseline was captured on 2026-05-19 and is recorded in analysis.md § Measured baseline. No re-capture needed. Confirm the table is present and intact.

Postcondition gate: baseline table visible in analysis.md, dated 2026-05-19, covering both routes across Alpha001-prod / Alpha002-stage / Alpha002-dev.

Note. Wave 1 (DB-side: migrations + K12 cleanup) and Wave 3 (consumer wiring: common-module pin bump + application.conf) collapse into a single operations PR. Tasks are listed under their original wave headings (Wave 1 below, Wave 3 further down) for review clarity and traceability against the spec phases, but they all live in the same branch with one combined CHANGELOG. The PR is gated at merge time on Wave 2’s release artifact being published.

References: specification.md § Phase 1a, § Phase 1b, § Phase 3.

Branch: product-slow-pdev-490 in the operations worktree.

Precondition gate:

  • Pre-flight P1–P6 are true.
  • Step 0.1 done: operations worktree on product-slow-pdev-490 at origin/main.
  • make -C /Users/jmp/code/arda/projects/product-slow-responses-worktrees/operations clean build passes on the bare branch (sanity check that origin/main is healthy).
  • For local development: Gradle includeBuild("../common-module") is added to operations/settings.gradle.kts (uncommitted, local-only) so the operations build resolves against the in-progress common-module worktree. This lets the operations PR be authored before Wave 2 is published.
  • For merge: Wave 2’s release artifact has been published to the Gradle package registry that operations/gradle/libs.versions.toml resolves against.

| # | Action | Spec reference | Verification |
|---|---|---|---|
| W1.1 | Resolve next-available V* numbers in kanban-module and item-module migration trees | Task 1b.1 note | ls operations/src/main/resources/{resources/kanban,reference/item}/database/migrations/ |
| W1.2 | Author V007__kanban_card_bitemporal_indexes.sql and its sidecar .conf | Task 1a.1 | AC-IDX-K1 |
| W1.3 | First-of-kind Flyway sidecar validation | Task 1a.1 validation step | make -C operations build picks up sidecar without error (intermediate gate; no AC) |
| W1.4 | Author V015__item_bitemporal_indexes.sql and its sidecar .conf | Task 1b.1 | AC-IDX-I1 |
| W1.5 | Apply the coupled K12 cleanup to ServiceImpl.kt (cardsForItem): flag flip AND flatMap/when removal in one commit | Task 1a.2 | AC-K12-1 |
| W1.6 | Tests: zero-row and multi-row cardsForItem integration tests | Task 1a.3 | AC-K12-2, AC-K12-3 |
| W1.6b | Wave 3 tasks (W3.1, W3.2, W3.3, W3.4; see Wave 3 section below) execute here as part of the same PR. | (covered by Wave 3 task rows) | |
| W1.7 | Author the combined operations CHANGELOG.md entry covering both Wave 1 and Wave 3 scope | Task 1a.4, Task 3.5 | AC-CHG-1 (operations row) |
| W1.8 | Local pre-push gate: make -C operations clean build + lint + test + coverage (via includeBuild until Wave 2 published) | dev-workflows | AC-BLD-2, AC-BLD-3 |
| W1.9 | Push branch, open PR | gh / GitHub MCP | PR opened against origin/main |
| W1.10 | Run pr-steward until checks pass and reviewer comments resolve | pr-steward workspace skill | All CI green, all threads resolved |
| W1.11 | Wait until Wave 2 release artifact is published; revert local includeBuild change in settings.gradle.kts; re-run CI against the published artifact | | Gradle resolution succeeds against the published artifact |
| W1.12 | Merge | (operator action) | Merged to origin/main |
| W1.13 | Post-merge on dev: EXPLAIN ANALYZE on the bitemporal SELECTs; pg_stat_statements snapshot to confirm COUNT is gone; wrapper-initialisation pod log check | verification.md | AC-IDX-K2, AC-IDX-K3, AC-IDX-I2, AC-SQL-1, AC-SQL-2, AC-K12-4 (= AC-SQL-3 cross-reference), AC-RW-1, AC-RW-2 |

Postcondition gate:

  • Combined operations PR merged to origin/main.
  • operations deployed to dev with the new migrations applied (Flyway migration_info shows V007 and V015, or their re-numbered equivalents, as applied) and pods running on jdbc:aws-wrapper: URLs.
  • AC-IDX-K1, AC-IDX-K2, AC-IDX-K3, AC-IDX-I1, AC-IDX-I2, AC-K12-1, AC-K12-2, AC-K12-3, AC-K12-4, AC-SQL-1, AC-SQL-2, AC-SQL-3, AC-JDBC-6, AC-JDBC-7, AC-RT-4, AC-RW-1, AC-RW-2, AC-RW-3, AC-FO-4, AC-AUD-1, AC-AUD-2, AC-BLD-2, AC-CHG-1 (operations row) marked Verified in verification.md.
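
For the post-merge EXPLAIN ANALYZE check in W1.13, the plan text can be inspected mechanically. The sketch below assumes PostgreSQL's default text plan format, where index scans appear as "Index Scan using <index> on <table>" or "Bitmap Index Scan on <index>". The function names are hypothetical, and the index and table names passed in are placeholders because this plan does not name the indexes.

```bash
# plan_uses_index INDEX_NAME
# Reads EXPLAIN text output on stdin; succeeds when some plan node scans
# the named index (plain, index-only, backward, or bitmap index scan).
plan_uses_index() {
  grep -Eq "(Index (Only )?Scan( Backward)? using|Bitmap Index Scan on) $1([[:space:]]|\$)"
}

# plan_has_seq_scan TABLE_NAME
# Reads EXPLAIN text output on stdin; succeeds when the plan contains a
# sequential scan (including a parallel one) on the named table.
plan_has_seq_scan() {
  grep -Eq "Seq Scan on $1([[:space:]]|\$)"
}

# Possible usage (connection string, query, and names are placeholders):
#   psql "$DATABASE_URL" -c 'EXPLAIN ANALYZE <bitemporal SELECT>' > plan.txt
#   plan_uses_index <index_name> < plan.txt
```

A plan that passes plan_uses_index and fails plan_has_seq_scan for the same table is the shape the index ACs look for; verification.md remains the authority on what counts as a pass.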

Wave 2 — common-module release (ships first)

Reference: specification.md § Phase 2.

Branch: product-slow-pdev-490 in the common-module worktree.

Note. Wave 2 is the project’s first merge. The operations PR (Waves 1+3 combined) gates on Wave 2’s release artifact being published.

Precondition gate:

  • Pre-flight P1–P6 are true.
  • common-module worktree on product-slow-pdev-490 at origin/main (Step 0.1).
  • make -C common-module clean build passes on the bare branch.

| # | Action | Spec reference | Verification |
|---|---|---|---|
| W2.1 | Add aws-jdbc-wrapper dependency in libs.versions.toml + build.gradle.kts | Task 2.1 | AC-JDBC-1 |
| W2.2 | DataSource wiring (jdbcUrl scheme, driver class, plugin pipeline, Aurora tuning, exceptionOverrideClassName) | Task 2.2 | AC-JDBC-2, AC-JDBC-3, AC-JDBC-4, AC-JDBC-5 |
| W2.3 | AppError.Transient sealed branch + three subtypes | Task 2.3 | AC-TX-1 |
| W2.4 | Extend normalizeToAppError for the three wrapper exception classes (including cause-chain unwrapping for ExposedSQLException) | Task 2.4 | AC-TX-2 |
| W2.5 | StatusPages renderer: HTTP 503 + Retry-After: 2 for AppError.Transient | Task 2.5 | AC-TX-3, AC-TX-4 |
| W2.6 | PoolConfig.maxAttempts / backoffMs + retry loop at inTransactionAsync / inTransactionSync | Task 2.6 | AC-RT-1, AC-RT-2, AC-RT-3 |
| W2.7 | Remove decorative tenantId.index("TENANT_ID_INDEX") from AbstractScopedUniverse | Task 2.7 | AC-DEC-1, AC-DEC-2 |
| W2.7b | Add .mixed(true) to DbMigration.kt’s FluentConfiguration chain | Task 2.7b | AC-DEC-3 |
| W2.7c | Add DataSource.close() pool-tracking + invoke from ContainerizedPostgres.stop() (test-infra pool leak fix) | Task 2.7c | AC-DEC-4 |
| W2.8 | Integration tests: forced transient → HTTP 503 + Retry-After; one-shot transient absorbed by retry; DataSource additivity test (both scheme branches) | Task 2.8 | AC-TX-3 (forced), AC-RT-2 (absorbed), AC-JDBC-3, AC-JDBC-3b, AC-JDBC-4, AC-JDBC-5 |
| W2.9 | Author common-module CHANGELOG entry; call out jdbcUrl scheme break in Changed | Task 2.9 | AC-CHG-1 (common-module row) |
| W2.10 | Local pre-push gate: make -C common-module clean build + lint + test + coverage | dev-workflows | AC-BLD-1, AC-BLD-3 |
| W2.11 | Push branch, open PR, run pr-steward | gh / GitHub MCP | PR opened, all CI green, threads resolved |
| W2.12 | Merge | (operator action) | Merged to origin/main |
| W2.13 | Tag release at the merge commit per Task 2.10 | (operator action) | Release tag present; common-module release artifact published to the Gradle package registry |

Postcondition gate:

  • Wave 2 PR merged to origin/main in common-module.
  • Release artifact published: operations/gradle/libs.versions.toml can resolve the new common-module version.
  • AC-JDBC-1, AC-JDBC-2, AC-JDBC-3, AC-JDBC-3b (additivity test), AC-JDBC-4, AC-JDBC-5, AC-TX-1, AC-TX-2, AC-TX-3, AC-TX-4, AC-RT-1, AC-RT-2, AC-RT-3, AC-DEC-1, AC-DEC-2, AC-DEC-3 (mixed=true), AC-DEC-4 (pool teardown), AC-BLD-1, AC-CHG-1 (common-module row) marked Verified in verification.md.
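
The bounded-retry behaviour behind PoolConfig.maxAttempts and backoffMs (W2.6) can be illustrated with a shell analogue. This is not the Kotlin implementation in inTransactionAsync / inTransactionSync, which lives in common-module and is defined by the spec. It is only a sketch of the control flow, under the assumption that maxAttempts counts total attempts including the first. The function name is hypothetical.

```bash
# retry_with_backoff MAX_ATTEMPTS BACKOFF_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made,
# sleeping BACKOFF_SECONDS between attempts (fixed backoff, no jitter).
# Assumption: MAX_ATTEMPTS includes the first attempt, so 2 means one retry.
retry_with_backoff() {
  rwb_max="$1"
  rwb_backoff="$2"
  shift 2
  rwb_attempt=1
  while :; do
    "$@" && return 0
    if [ "$rwb_attempt" -ge "$rwb_max" ]; then
      return 1
    fi
    rwb_attempt=$((rwb_attempt + 1))
    # Fractional seconds (e.g. 0.3) work with GNU and BSD sleep,
    # but are not guaranteed by POSIX.
    sleep "$rwb_backoff"
  done
}
```

With the values this plan later sets in application.conf (maxAttempts = 2, backoffMs = 300), the analogue would be retry_with_backoff 2 0.3 some_command: one failure is absorbed by a single retry, and a second failure is surfaced to the caller.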

Wave 3 — consumer wiring (ships in the combined operations PR)

Reference: specification.md § Phase 3.

Note. Wave 3 tasks ship in the same operations PR as Wave 1 (branch product-slow-pdev-490, not a separate stacked branch). Listed separately for review clarity. The combined PR’s merge gate is described in the Wave 1+3 § Precondition gate above.

| # | Action | Spec reference | Verification |
|---|---|---|---|
| W3.1 | Bump commonModule version pin in operations/gradle/libs.versions.toml to the Wave 2 release version | Task 3.1 | AC-JDBC-6 |
| W3.2 | Update operations/src/main/resources/application.conf: switch dataSource.jdbcUrl to jdbc:aws-wrapper:postgresql://… scheme; add dataSource.pool.maxAttempts = 2 and dataSource.pool.backoffMs = 300 | Task 3.2 | AC-JDBC-7, AC-RT-4 |
| W3.3 | make -C operations clean build; run all tests | Task 3.3 | AC-BLD-2 (Wave 3 row), AC-RW-3, AC-FO-4 |
| W3.4 | Re-confirm the operations-side SQLException-handler audit (zero hits); reference tenant_id audit table in PR description | Task 3.4 | AC-AUD-1, AC-AUD-2 |
| W3.5 | Author Wave 3 CHANGELOG entry; call out HTTP 500 → HTTP 503 status-code remap for transient SQL failures | Task 3.5 | AC-CHG-1 (operations Wave 3 row) |
| W3.6 | Combined PR steps (push, pr-steward, merge gate, merge) are covered by the Wave 1+3 combined PR sequence: see W1.9 – W1.12 above. | | |
| W3.10 | Post-merge: monitor pod startup logs for wrapper initialisation; confirm read-only transactions land on reader endpoint via wrapper logging | verification.md § Read/write splitting | AC-RW-1, AC-RW-2 |

Closure of Wave 3 work is folded into the Wave 1+3 combined PR’s postcondition gate above. Wave-3-specific ACs (AC-JDBC-6, AC-JDBC-7, AC-RT-4, AC-RW-1, AC-RW-2, AC-RW-3, AC-FO-4, AC-AUD-1, AC-AUD-2) are listed there.

Reference: specification.md § Phase 4.

Branch: product-slow-pdev-490 in the documentation worktree.

Precondition gate:

  • Specification stable. May proceed in parallel with the Wave 1+3 combined operations PR; it does not need to wait for that PR’s merge.
  • Wave 4 must merge before Wave 5 (dev failover test) executes, because the runbook is the test procedure.
  • Wave 4 does not need to wait for Wave 3 merge. However, if Wave 3 ships behavior that diverges from the spec, Wave 4 may need a follow-up commit to reconcile the runbook with reality (acceptable per goal Constraint #3).

| # | Action | Spec reference | Verification |
|---|---|---|---|
| W4.1 | Site pages: AWS JDBC Wrapper architecture; bitemporal composite indexing pattern; Flyway-authoritative-for-indexes convention; HTTP 503 + Retry-After error contract | Task 4.1 | AC-DOC-1, AC-DOC-2, AC-DOC-3, AC-DOC-4 |
| W4.2 | Runbooks: Aurora synthetic-failover test (dev) procedure; AWS JDBC Wrapper deploy notes; Aurora wrapper troubleshooting | Task 4.2 | AC-DOC-5 (covers all three runbooks), AC-DOC-6 |
| W4.3 | Update the goal / spec / plan to move PDEV-490 from in-progress/ to completed/: deferred to project closure, NOT done in Wave 4. Wave 4 keeps the docs under in-progress/ so the PR is reviewable as a single self-contained change. | (project closure step) | |
| W4.4 | PR-body CHANGELOG section (per documentation repo convention) | Task 4.3 | AC-CHG-1 (documentation row) |
| W4.5 | Local pre-push gate: make -C documentation pr-checks | dev-workflows | AC-BLD-4 |
| W4.6 | Push branch, open PR | gh / GitHub MCP | PR opened |
| W4.7 | Run pr-steward | | All CI green, threads resolved |
| W4.8 | Merge | (operator action) | Merged to origin/main |

Postcondition gate:

  • Wave 4 PR merged to origin/main in documentation.
  • AC-DOC-1, AC-DOC-2, AC-DOC-3, AC-DOC-4, AC-DOC-5, AC-DOC-6, AC-BLD-4, AC-CHG-1 (documentation row) marked Verified in verification.md.
  • The synthetic-failover runbook is reachable from the documentation site at its published URL.

Reference: specification.md § Phase 5.

Operational gate; no PR.

Precondition gate:

  • Wave 3 merged AND deployed to dev (operations component is running on the new wrapper).
  • Wave 4 merged (runbook is the test procedure).

| # | Action | Spec reference | Verification |
|---|---|---|---|
| W5.1 | Pre-test setup: confirm dev cluster topology; start steady-load probe against operations-dev; start Sentry timer | Task 5.1 | Probe running, baseline 5xx rate ≈ 0 |
| W5.2 | Trigger Aurora failover via the procedure documented in the runbook | Task 5.2 | Failover initiated |
| W5.3 | Observe: HTTP 5xx window length, HTTP 503 vs HTTP 500 distribution, wrapper failover-detection latency | Task 5.3 | AC-FO-1, AC-FO-2, AC-FO-3 |
| W5.4 | Record outcome in verification.md and post a closure comment on PDEV-509 | Task 5.4 | PDEV-509 closed |

Postcondition gate:

  • 5xx window on dev ≤ 5 seconds (NFR-010).
  • HTTP 503 dominates the 5xx window; HTTP 500 ≈ 0 (NFR-011).
  • Failover detection latency ≤ ~5 s (NFR-012).
  • AC-FO-1, AC-FO-2, AC-FO-3 marked Verified in verification.md.
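
The first two criteria can be computed from the steady-load probe's output. The sketch below assumes a hypothetical probe log with one "<epoch_seconds> <http_status>" pair per line; the runbook defines the real probe and its format. It measures the 5xx window as the elapsed seconds between the first and last 5xx response, and it reads "503 dominates" as "no 500s, or more 503s than 500s". Both readings are assumptions, not the spec's definitions.

```bash
# summarise_5xx
# Reads "<epoch_seconds> <http_status>" lines on stdin and prints
# "<window_seconds> <count_503> <count_500> <count_other_5xx>".
# window_seconds is the time between the first and last 5xx response.
summarise_5xx() {
  awk '
    $2 >= 500 && $2 <= 599 {
      if (first == "") first = $1
      last = $1
      if ($2 == 503) n503++
      else if ($2 == 500) n500++
      else other++
    }
    END {
      window = (first == "") ? 0 : last - first
      printf "%d %d %d %d\n", window, n503 + 0, n500 + 0, other + 0
    }'
}

# failover_gate_ok WINDOW_SECONDS COUNT_503 COUNT_500
# Assumed reading of NFR-010 / NFR-011: window of at most 5 s, and either
# no HTTP 500 responses or more 503s than 500s.
failover_gate_ok() {
  [ "$1" -le 5 ] && { [ "$3" -eq 0 ] || [ "$2" -gt "$3" ]; }
}
```

Failover-detection latency (NFR-012) comes from wrapper logging rather than from the probe, so it is not covered here.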

After Wave 5 passes on dev, promote operations and the new common-module release through demo → stage → prod using the standard per-environment soak windows. Each environment carries a 24–48 hour soak per Arda release lifecycle convention.

| Environment | Soak window | Verification |
|---|---|---|
| Alpha002-dev | covered by Wave 5 + standard dev burn-in | NFR-010, NFR-011, NFR-012 |
| Alpha001-demo | 24 h soak | No new 5xx spike attributable to the deploy |
| Alpha002-stage | 24 h soak | No new 5xx spike; Sentry transaction-duration trend on cardsForItem matches dev’s improvement shape |
| Alpha001-prod | 48 h soak + 7-day rolling-window check | NFR-001 (cardsForItem p50 ≤ 250 ms / p95 ≤ 1,000 ms over rolling 7-day window), AC-PERF-1 |

AC-PERF-2 (listWithDetails p95, ties to NFR-002) is explicitly conditional on PDEV-489 also shipping the front-end consolidation; PDEV-490 alone is not expected to satisfy NFR-002’s p95. AC-PERF-2 stays at Pending (conditional on PDEV-489) in verification.md until both projects have shipped.

When all Waves 1–5 have closed and promotion through prod has completed:

| # | Action | Owner | Verification |
|---|---|---|---|
| C1 | Confirm all entries in verification.md are Verified (or Verified (pending …) with an explicit out-of-scope dependency) | Operator | Matrix scan |
| C2 | Move the project from documentation/src/content/docs/roadmap/in-progress/product-slow-responses/pdev-490/ to …/roadmap/completed/product-slow-responses/pdev-490/; update frontmatter maturity from draft to published and the project’s roadmap status | Operator | Files moved, links unbroken (re-run make -C documentation pr-checks after the move) |
| C3 | Close Linear PDEV-490 with a summary comment linking to the four PRs and the verification matrix | Operator | PDEV-490 status Done |
| C4 | Close Linear PDEV-509 (absorbed by PDEV-490) | Operator | PDEV-509 status Done with cross-reference to PDEV-490 |
| C5 | Confirm Linear PDEV-534 (transactionIsolation evaluation) and PDEV-536 (redundant item indexes) remain open as the agreed follow-ups | Operator | Both tickets in Triage or Backlog, not closed |
| C6 | Remove the PDEV-490 worktrees and delete the local branches | Operator | git -C <main-clone> worktree list no longer shows the PDEV-490 worktrees |
| C7 | Notice on workbook: append a short close-out note to workbooks/notebooks/product-slow-responses/pdev-490/ recording that the project shipped, with pointers to the four merged PRs | Operator | Note present |

Per-wave rollback is the standard “revert merge” path:

| Wave | Rollback action | Risk |
|---|---|---|
| W1 | Revert Wave 1 PR; Flyway is forward-only, so re-issue DROP INDEX CONCURRENTLY IF EXISTS … as a new migration. Do not retroactively delete migration files from the tree. | Medium: composite index drops are non-blocking but reintroduce the wasted-COUNT regression on cardsForItem if the K12 cleanup is also reverted; spec couples them. |
| W2 | Revert Wave 2 PR; consumers (operations Wave 3 not yet merged) are unaffected. Republish the previous common-module release if Wave 3 has already pinned forward. | High once Wave 3 has merged: Wave 3 depends on Wave 2’s API shape; rolling back Wave 2 requires also reverting Wave 3. |
| W3 | Revert Wave 3 PR; pods redeploy on the legacy jdbc:postgresql://… scheme using the pre-bump common-module pin. | Low: fully reversible without DB-side rollback. |
| W4 | Revert Wave 4 PR; doc pages disappear from the site. No functional impact. | Low. |
| W5 | The dev failover test is informational. Failure means stop promotion, file a bug ticket against the wrapper integration, and triage. No rollback action needed against Waves 1–4. | N/A. |

If any wave’s postcondition gate fails, stop, capture the diagnostic state (logs, Sentry, pg_stat_statements), and resolve before advancing. Do not advance past a failed gate by skipping verification.

None remaining at plan time — the four pre-implementation questions identified during specification authoring (AppError wrapping pattern, StatusPages Retry-After injection, item composite index choice, Flyway sidecar syntax) were resolved on 2026-05-19. See specification.md § Open implementation questions for the resolved set.

If new questions arise during execution, record them in specification.md § Open implementation questions as new entries, not in this plan.


Copyright: (c) Arda Systems 2025-2026, All rights reserved