Phase 4 — Run Choreography
1. Overview
Phase 4 is decomposed into five runs, sequenced as a 5-PR rollout: run-1, run-2, run-3, run-6, run-7. Numbers 4 and 5 are intentionally vacant: the original run-3-stage-rollout, run-4-demo-rollout, and run-5-prod-rollout plans were consolidated into a single run-3-operator-cascade per DQ-R1-026, and the remaining numbers are kept to avoid churning existing references in PR #462’s CHANGELOG entry, specification.md, and verification.md. Each run is launched independently via the launch-team skill; the user is the choreographer between runs (operator gates, Postmark Compliance reply, per-partition cdk diff review). The decomposition rationale is in evaluation.md; the per-run plans are in runs/.
| Run | Branch / PR | Scope | Working dir | AWS impact |
|---|---|---|---|---|
| `run-1-workspace-refactors` | jmpicnic/email-integration-phase-4 (PR #455) | G-A: construct generalisation, byte-identity guard, accessor, reserved-words extension, helper extraction | phase-4/infrastructure (+ docs verification entry) | Synth-only + Root read-only diff |
| `run-2-dev-rollout` | jmpicnic/email-integration-phase-4-run-2 (PR #462; based on Run-1’s branch, rebases onto main when Run-1 merges) | G-B+C+D code for all four active partitions (dev, stage, demo, prod): platforms.ts carries all four mail blocks; PartitionEmailStack + Pre-Deploy CLI + amm.sh step land on main. No partition deploys happen in this run; operator deploys move to Run-3 per DQ-R1-026. | phase-4/infrastructure-run-2 | Synth-only at this run’s boundary; resource-touching happens in Run-3 |
| `run-3-operator-cascade` | jmpicnic/email-integration-phase-4-run-3 (base main, auto-retargets when Run-2 merges) | Operator-driven cascade across dev → T-O4 Postmark Compliance reply → {stage \|\| demo} → prod. One PR captures CHANGELOG + accumulated cdk.context.json updates from all four partition deploys + any code fixes that emerge during execution. Execution log in runs/run-3-operator-cascade/execution-log.md. | phase-4/infrastructure-run-3 | Resource-touching in Alpha002 (dev, stage sub-zones, secrets, roles) and Alpha001 (demo, prod sub-zones, secrets, roles). Production deploy included. |
| `run-6-drift-workflow` | jmpicnic/email-integration-phase-4-run-6 (base main; soft dependency, needs ≥ 1 partition live to exercise) | G-E: runtime-platform-drift.yml + driver + shared utility extraction | phase-4/infrastructure-run-6 | None |
| `run-7-documentation` | jmpicnic/email-integration-phase-4 (documentation worktree; consolidates docs commits from Runs 1-3, 6) | G-F: current-system retrofit, rotation runbook, secret-delivery-pattern.md content fill, docs CHANGELOG | phase-4/documentation | None |
The infra runs form a mostly-linear PR series: Run-1 → Run-2 → Run-3, plus Run-6 stacked off main once Run-3’s dev cascade entry is verified. Run-3 itself is sequential at the cascade level (dev first, then stage and demo may interleave, then prod last) but produces a single PR rather than three.
Per the project-decomposition skill, this document captures the cross-run choreography: the sequencing graph, the operator-gate handoffs, and the artifact dependencies that the per-run project-plan.md files alone cannot express.
2. Setup phase — harness prompt minimisation
Before launching run-1, perform the following one-time setup to minimise harness permission prompts triggered by bash command-shape variants during validation. Each item is a one-time cost; the savings compound across the five runs and the per-partition validate-exit.sh invocations.
2.1 .claude/settings.local.json allowlist patches
Add the following patterns to the project-level settings (preferred) or to the user’s ~/.claude/settings.json. Group them under a clear comment so future maintainers see why they were added.
```json
{
  "permissions": {
    "allow": [
      "Bash(bash */validate-exit.sh*)",
      "Bash(dig *)",
      "Bash(aws cloudformation describe-stacks*)",
      "Bash(aws cloudformation get-template*)",
      "Bash(aws cloudformation list-exports*)",
      "Bash(aws secretsmanager describe-secret*)",
      "Bash(aws iam get-role*)",
      "Bash(aws sts get-caller-identity*)",
      "Bash(gh pr view *)",
      "Bash(gh pr checks *)",
      "Bash(gh run view *)",
      "Bash(gh workflow run *)",
      "Bash(git -C * *)",
      "Bash(npm --prefix * *)",
      "Bash(make -C * *)"
    ]
  }
}
```

Verify before launching run-1 by running `bash plan/runs/run-1-workspace-refactors/validate-exit.sh` and confirming zero ad-hoc permission prompts during execution.
2.2 Wrapper script conventions
Every validate-exit.sh follows the same shape to keep harness-visible bash invocations uniform:
```bash
#!/usr/bin/env bash
set -euo pipefail

PASS=0
FAIL=0
TOTAL=<n>   # placeholder: number of checks in this script

# Run a command and pass if it succeeds and its output contains the expected text.
check() {
  local desc="$1" cmd="$2" expected="$3"
  if result=$(eval "$cmd" 2>&1); then
    if [[ "$result" == *"$expected"* ]]; then
      # Plain assignment: ((PASS++)) returns non-zero when PASS is 0,
      # which would abort the script under `set -e`.
      echo "PASS: $desc"; PASS=$((PASS + 1))
    else
      echo "FAIL: $desc (expected '$expected', got '$result')"; FAIL=$((FAIL + 1))
    fi
  else
    echo "FAIL: $desc (command failed: $result)"; FAIL=$((FAIL + 1))
  fi
}

# Entry / Exit checks below ...

[[ $FAIL -eq 0 ]] && echo "ALL CHECKS PASSED" || { echo "SOME CHECKS FAILED"; exit 1; }
```

The agent invokes the script with one Bash tool call; the script internally runs all dig / aws / gh checks without each one being a separately permission-gated harness call.
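To make the wrapper shape concrete, here is a minimal sketch of how individual checks might be registered. The three checks use `echo` and `false` as stand-ins; a real script would call dig / aws / gh instead, and the descriptions and expected strings here are illustrative assumptions rather than checks from any actual run plan.

```bash
#!/usr/bin/env bash
set -uo pipefail   # -e is left out of this demo so the summary line is always reached

PASS=0
FAIL=0

# Same contract as the wrapper: succeed only if the command exits 0
# and its output contains the expected substring.
check() {
  local desc="$1" cmd="$2" expected="$3" result
  if result=$(eval "$cmd" 2>&1); then
    if [[ "$result" == *"$expected"* ]]; then
      echo "PASS: $desc"; PASS=$((PASS + 1))
    else
      echo "FAIL: $desc (expected '$expected', got '$result')"; FAIL=$((FAIL + 1))
    fi
  else
    echo "FAIL: $desc (command failed: $result)"; FAIL=$((FAIL + 1))
  fi
}

# Stand-in checks (hypothetical): one match, one mismatch, one failing command.
check "output contains expected text" "echo 'stack status: CREATE_COMPLETE'" "CREATE_COMPLETE"
check "output lacks expected text" "echo 'stack status: ROLLBACK_COMPLETE'" "CREATE_COMPLETE"
check "command exits non-zero" "false" ""

echo "passed=$PASS failed=$FAIL"
```

The counters use plain arithmetic assignment rather than `((PASS++))`, because the post-increment form returns a non-zero status when the counter is 0 and would terminate a script running under `set -e`.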
2.3 Command-shape standards
The following standards apply across all run plans, validate-exit scripts, and operator runbooks. They are enforced both by the new ESLint rules landed in PR #454 (no-cd-in-shell, no-aws-profile-prefix) and by reviewer convention.
- `git -C <absolute-path> <subcommand>`, never `cd <path> && git ...`.
- `npm --prefix <absolute-path> run <script>`, never `cd <path> && npm ...`.
- `make -C <absolute-path> <target>`, never `cd <path> && make ...`.
- `aws --profile <name> <command>`, never `AWS_PROFILE=<name> aws ...`.
- Absolute worktree paths inside script bodies; positional args for partition / infrastructure / profile.
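
As a complement to reviewer convention, the forbidden shapes listed above can be spotted with a plain grep. The sketch below is not the ESLint rules from PR #454; the function name `find_forbidden_shapes` and the two regular expressions are illustrative assumptions, and it relies on the `--include` option available in GNU and BSD grep.

```bash
#!/usr/bin/env bash
# Hypothetical lint: print file:line:text for every line in *.sh files under
# the given directory that uses `cd <path> && git|npm|make` or `AWS_PROFILE=<name> aws`.
# Like grep itself, it returns 1 when nothing matches (i.e. the tree is clean).
find_forbidden_shapes() {
  local root="$1"
  grep -rnE --include='*.sh' \
    -e '(^|[;&|[:space:]])cd[[:space:]]+[^;&|]+&&[[:space:]]*(git|npm|make)([[:space:]]|$)' \
    -e '(^|[[:space:]])AWS_PROFILE=[^[:space:]]+[[:space:]]+aws([[:space:]]|$)' \
    "$root"
}

# Example usage (directory taken from the exit criteria in this section):
# find_forbidden_shapes plan/runs/ || echo "no forbidden command shapes found"
```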
2.4 Setup exit criteria
- `.claude/settings.local.json` (or equivalent) contains the patterns above.
- `bash plan/runs/run-1-workspace-refactors/validate-exit.sh --dry-run` (if supported) completes without prompts.
- No `cd <path>` form appears in any `validate-exit.sh` (verifiable by `grep -rn '^cd ' plan/runs/`).

Once these criteria hold, run-1 may launch.
3. Run-sequence DAG
The 5-run dependency graph mirrors analysis.md § 13.1 lifted from group level to run level. Hard edges block the downstream run until the upstream run is merged and exit criteria pass; soft edges allow parallel authoring but verification still serialises.
run-1 is a hard prerequisite for run-2 (the construct generalisation + tools/lib/ helpers are imported by run-2’s code) and a soft prerequisite for run-6 (the drift driver also imports from tools/lib/). run-2 (code for all four partitions) is a hard prerequisite for run-3 (operator cascade), which then deploys the four partitions in the partial order dev → {stage || demo} → prod per DQ-R1-021; that ordering now lives inside Run-3 (see § 4.1 below) rather than between three separate runs. run-6 is a soft dependency from run-2 onwards (drift probes need at least one partition live, which Run-3’s first cascade entry provides). run-7 documentation lands last so the docs reflect what was built.
The diagram below shows the five Phase-4 runs and their dependency edges. Solid arrows are hard dependencies that block until the upstream run merges and verifies; dashed arrows are soft dependencies that allow parallel authoring with serialised post-merge verification.
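
The dependency edges described above can also be written down as data and ordered with the standard `tsort` utility. This is a minimal sketch under two simplifying assumptions: hard and soft edges are treated alike, and the run-6 dependency on Run-3’s dev cascade entry is modelled as a dependency on run-3 as a whole. The function name `run_order` is illustrative.

```bash
#!/usr/bin/env bash
# Print the Phase-4 runs in an order that respects every dependency edge.
# Each input line to tsort reads "upstream downstream".
run_order() {
  tsort <<'EOF'
run-1 run-2
run-1 run-6
run-2 run-3
run-3 run-6
run-3 run-7
run-6 run-7
EOF
}

run_order
```

Because the edges contain the chain run-1 → run-2 → run-3 → run-6 → run-7, only one ordering is valid, and it matches the 5-PR rollout sequence given in the overview.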
4. Operator gates between runs
These are the human-in-the-loop steps the user performs to advance from one run to the next. Each gate is the natural pause point that drove the decomposition recommendation in evaluation.md.
| Between | Operator action | Required artefact |
|---|---|---|
| run-1 → run-2 | Run T-O2 (Root no-drift verification) against deployed RootConfiguration. Confirm empty cdk diff. Record in verification sign-off table. | Empty cdk diff output captured |
| run-2 → run-3 | Review cdk diff for each of the four {infra}-{partition}-Email stacks in PR #462’s description; approve before merge. Confirm the four mail blocks in platforms.ts cover dev / stage / demo / prod and the kyle partition is excluded. | PR #462 merged to main; reviewer approval recorded |
| run-3 → run-6 | At least the dev partition entry of Run-3’s cascade verified end-to-end (CFN exports populated; Postmark Console verified). Drift workflow probes need at least one partition live to exercise — opening Run-6 before that returns no useful data. | Run-3 execution-log § dev verification cleared |
| run-3 → (end of partition rollout) | All four partition cascade entries verified end-to-end (Postmark Console verified for all four; dig checks green; CFN exports populated). T-O4 outcome recorded (arda-nonprod approval received OR more-evidence-needed path documented). Run-3 infra PR merged with the accumulated CHANGELOG, cdk.context.json, and any code fixes. | Run-3 cascade-summary table all green; PR merged |
| run-6 → run-7 | Manually trigger runtime-platform-drift.yml via workflow_dispatch. Confirm no spurious issue opened. Sign off T-O8. | First-run workflow log captured |
| run-7 → completion | make pr-checks green on the documentation PR; technical-writer review findings addressed. | Docs PR merged |
4.1 Within-run cascade (Run-3 only)
Run-3’s cascade is sequenced inside the single PR, not between runs. The operator advances through the four partitions in the partial-order dev → {stage || demo} → prod per DQ-R1-021. Each cascade entry is a self-contained block in the execution log:
| Cascade entry | Operator action | Required artefact |
|---|---|---|
| (cascade entry) dev | Pre-flight (T-O1-dev) → ./amm.sh --profile Alpha002-Admin Alpha002 dev (T-O5) → verify dig + Postmark Console + CFN exports. Record cdk.context.json block for partitionMail:Alpha002:dev. | Execution-log § dev verification cleared; V-OPS-005-dev populated |
| (between entries) T-O4 | Reply to Postmark Compliance ticket #11236089 with dev.ardamails.com verified-domain evidence. Wait for Postmark response (approved OR more-evidence-needed). | Email artefact captured; arda-nonprod approval status recorded |
| (cascade entry) stage | Pre-flight (T-O1-stage) → ./amm.sh --profile Alpha002-Admin Alpha002 stage (T-O5) → verify. Independent of demo; either order valid. | Execution-log § stage verification cleared; V-OPS-005-stage populated |
| (cascade entry) demo | Pre-flight (T-O1-demo) → switch AWS profile to Admin-Alpha1 → ./amm.sh --profile Admin-Alpha1 Alpha001 demo (T-O6) → verify. Independent of stage. | Execution-log § demo verification cleared; V-OPS-005-demo populated |
| (between entries) Production deploy confirmation | Both stage and demo verified. Operator records explicit production-deploy confirmation in the execution log before opening the prod entry. | Execution-log § production-deploy-confirmation signed off |
| (cascade entry) prod | Pre-flight (T-O1-prod) → ./amm.sh --profile Admin-Alpha1 Alpha001 prod (T-O7) → verify with extra care (production deploy). The --profile flag is required for every partition: amm.sh’s default profile derivation (Admin-${infrastructure}) does not match either Phase-4 profile name. | Execution-log § prod verification cleared; V-OPS-005-prod populated |
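
The profile mismatch noted in the prod entry is easy to see side by side. The sketch below models the default derivation as described (Admin-${infrastructure}) and compares it with the profile names used in the cascade table; it is not amm.sh’s actual code, and the function names `default_profile` and `phase4_profile` are hypothetical.

```bash
#!/usr/bin/env bash
# Default derivation as described in the text: Admin-${infrastructure}.
default_profile() {
  local infrastructure="$1"
  printf 'Admin-%s\n' "$infrastructure"
}

# Profile names that the cascade table passes via --profile.
phase4_profile() {
  case "$1" in
    Alpha002) echo "Alpha002-Admin" ;;
    Alpha001) echo "Admin-Alpha1" ;;
    *) echo "unknown infrastructure: $1" >&2; return 1 ;;
  esac
}

for infra in Alpha002 Alpha001; do
  printf '%s: default=%s required=%s\n' \
    "$infra" "$(default_profile "$infra")" "$(phase4_profile "$infra")"
done
```

Neither derived name (Admin-Alpha002, Admin-Alpha001) equals the profile the table uses, which is why omitting `--profile` would select a profile that does not match.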
A partition cascade entry that fails mid-execution is captured in the execution log and the operator addresses the cause (code fix lands in the same PR; environmental issue addressed and partition re-attempted). The cascade resumes from the failed partition; successfully-deployed prior partitions are not rolled back (per-partition isolation per DQ-R1-021 still holds at the resource level). See project-plan.md § Retreat path for the full procedure.
run-6 (drift workflow) can be authored in parallel with the Run-3 cascade once the dev cascade entry is verified — drift probes need at least one partition’s state, which Run-3’s first entry provides. Run-6 does not wait for Run-3’s cascade to complete; its PR can land any time after Run-3’s dev entry is verified.
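The partial order dev → {stage || demo} → prod can be expressed as a small gate function. This is a sketch of the partition ordering only: it does not model the T-O4 Postmark Compliance wait or the production-deploy confirmation, and the function name `can_open_entry` is an illustrative assumption.

```bash
#!/usr/bin/env bash
# Return 0 when the named cascade entry may open, given the partitions that are
# already verified (passed as the remaining arguments). Returns 2 for unknown entries.
can_open_entry() {
  local entry="$1"; shift
  local verified=" $* "
  case "$entry" in
    dev)        return 0 ;;
    stage|demo) [[ "$verified" == *" dev "* ]] ;;
    prod)       [[ "$verified" == *" stage "* && "$verified" == *" demo "* ]] ;;
    *)          return 2 ;;
  esac
}

# Example usage: prod stays closed until both stage and demo are verified.
can_open_entry prod dev stage      || echo "prod blocked: demo not yet verified"
can_open_entry prod dev stage demo && echo "prod may open"
```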
5. Artifact dependencies
| Producer | Artefact | Consumer(s) | Form |
|---|---|---|---|
| run-1 | Generalised AllowCreatingNSRecordsRole construct (renamed) + postmarkCredentialOpReference accessor + tools/lib/* helpers + reserved-words list entries | run-2 (CDK code imports + tools script) | TypeScript imports + module exports |
| run-2 | PartitionEmailStack class file + apps/Al1x/partition.ts instantiation + register-partition-mail-signature.ts entry script + amm.sh partition-mail step + four mail blocks in platforms.ts covering dev / stage / demo / prod | run-3 (operator deploys consume this code; no further code-only PRs are needed) | Source files on main |
| run-3 | Per-partition live infrastructure for all four active partitions: {partition}.ardamails.com zones + NS-delegations + SPF + DMARC + DKIM + Return-Path records + two SM secrets per partition + two IAM roles per partition | Drift workflow (run-6) probes; Phase 5b consumes via CFN exports | Live AWS resources |
| run-3 | Postmark Sender Signatures: dev.ardamails.com and stage.ardamails.com on PostmarkNonProd; demo.ardamails.com and prod.ardamails.com on PostmarkProd | Phase 5b email module sends through these Signatures | Postmark API state |
| run-3 | cdk.context.json populated with one partitionMail:<infra>:<partition> block per partition (DKIM selector, DKIM public key, Return-Path target) | Future cdk synth runs (CI + local); subsequent re-runs of amm.sh are idempotent against this state | Committed file on main |
| run-3 | Postmark Compliance ticket #11236089 outcome captured (T-O4) | REQ-OPS-004 satisfied — arda-nonprod approval status known | Email-thread artefact in execution log |
| run-6 | runtime-platform-drift.yml workflow + driver + extracted tools/lib/drift/ helpers | Scheduled drift checks; future runtime-platform drift checks beyond email; corporate-drift regression-tested with extracted helpers | Workflow + module exports |
| run-7 | Filled-in secret-delivery-pattern.md + per-partition mail pages in current-system/runtime/ + Postmark-service multi-Signature updates + encryption-key rotation runbook + docs CHANGELOG entry | Future maintainers; Phase 5b authors; operators | Markdown pages |
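
One way to confirm that Run-3 left a complete cdk.context.json is to look for the four expected partitionMail keys. The sketch below assumes each key appears verbatim as a quoted JSON key in the file; the infrastructure-to-partition pairing comes from the run table (Alpha002 for dev and stage, Alpha001 for demo and prod), and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# The four context keys implied by the partition-to-infrastructure mapping.
expected_context_keys() {
  printf 'partitionMail:%s\n' \
    Alpha002:dev Alpha002:stage Alpha001:demo Alpha001:prod
}

# Print each expected key that does not appear in the given context file.
# Prints nothing when all four keys are present.
missing_context_keys() {
  local context_file="$1" key
  expected_context_keys | while IFS= read -r key; do
    # -F: fixed-string match, so the colons and quotes are taken literally.
    grep -qF -- "\"$key\"" "$context_file" || printf '%s\n' "$key"
  done
}

# Example usage (file location is an assumption):
# missing_context_keys cdk.context.json
```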
6. Rollback semantics
Each run’s failure mode and recovery path:
| Run | Failure mode | Recovery |
|---|---|---|
| run-1 | T-I2 byte-identity test fails on PR → cannot merge | Diagnose construct change → re-author T-I1 → re-run test. No deployed AWS state to roll back. |
| run-1 | T-O2 post-merge Root drift detected | Stop. Investigate Root drift cause (Phase 4 construct change OR external drift unrelated). Resolve before any partition deploy. |
| run-2 | Per-partition synth test fails | Diagnose the failing partition; fix PartitionEmailStack / platforms.ts / apps/Al1x/partition.ts; re-run tests. No deployed AWS state. |
| run-2 | Reviewer flags an issue in cdk diff for one of the four partitions | Fix in the same PR; re-trigger synth + tests. Code is not merged until all four partitions’ synthesised templates are reviewed. |
| run-3 | Pre-Deploy CLI step (register-partition-mail-signature.ts) fails on a partition | Idempotent re-run after fixing root cause; no partial AWS state at that partition. Capture in execution-log § <partition> / Notes. |
| run-3 | cdk deploy fails on a partition | CFN rolls back the stack. Investigate; re-run amm.sh for that partition once cause resolved. Prior partitions are unaffected (per-partition isolation per DQ-R1-021). Capture in execution-log § <partition> / Notes. |
| run-3 | T-O4 Postmark Compliance reply doesn’t unlock arda-nonprod after dev | Operator follows Postmark Support’s direction; may need to provision additional Signatures or wait for review. Update REQ-OPS-004 documented assumption. The stage cascade entry waits until the outcome is clear; demo and prod cascade entries are not blocked (different Postmark account). |
| run-3 | A partition deploy surfaces a code-level issue | Fix lands in the same Run-3 infra PR (no separate PR cycle). Capture in execution-log § Code-fixes-that-surfaced. Re-run that partition’s amm.sh once the fix is on the branch. |
| run-3 | Non-recoverable partition failure within cycle (e.g., extended Postmark back-and-forth blocks stage) | Operator may close Run-3 with the partition unverified; PR captures only the verified partitions. Open a follow-up run for the unverified ones, documented explicitly in the execution log. |
| run-6 | Drift workflow fails on first scheduled run | Inspect logs; tune probe thresholds; re-run via workflow_dispatch. No AWS state change. |
| run-7 | make pr-checks fails | Fix the offending page; re-run locally. No AWS state change. |
7. Phase 4 completion criteria
Phase 4 is complete when all of the following hold:

- Runs 1, 2, 3, 6, 7 each have their `validate-exit.sh` exit 0.
- All four active partition mail sub-zones live and delegated (REQ-PART-001..006 satisfied per `../design/verification.md` V-PART-001..005).
- Each partition’s Postmark Sender Signature registered and verified (REQ-PART-007..010, V-PART-007..010).
- Per-partition encryption-key SM secret exists with `RemovalPolicy.RETAIN` (REQ-PART-014, V-PART-014).
- Per-partition Postmark account-token SM secret exists, populated via δ.1 (REQ-PART-011, V-PART-011).
- All six `-API-` CFN exports exist per partition (REQ-PART-002, 012, 015, 018, 020 + zone-name).
- Both per-partition IAM roles exist (REQ-PART-017, 019; V-PART-017, 019).
- `arda-nonprod` Postmark account approval received OR Postmark reply requesting more evidence captured (REQ-OPS-004).
- `runtime-platform-drift.yml` has completed at least one successful scheduled run (REQ-CI-001, V-CI-001).
- Root account’s `RootDnsStack` produces byte-identical CFN post-Phase-4 (REQ-IAC-002, V-IAC-002).
- Documentation deliverables landed (REQ-DOC-001..004, V-DOC-001..004).
- Operator sign-off table in `../design/verification.md` fully populated.
- All five PRs merged to `main` on their respective repositories (Run-3’s single cascade PR closes what was originally three).
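
The first criterion can be checked in one pass by running each run’s validate-exit.sh in turn. This is a minimal sketch that assumes the scripts live at `<plan-dir>/runs/<run>/validate-exit.sh`, as the run-1 path in the setup section suggests; the function name `validate_all_runs` is illustrative.

```bash
#!/usr/bin/env bash
# Run every run's validate-exit.sh under the given plan directory.
# Prints one status line per run; returns non-zero if any script is
# missing or exits non-zero.
validate_all_runs() {
  local plan_dir="$1" run script failed=0
  for run in run-1-workspace-refactors run-2-dev-rollout run-3-operator-cascade \
             run-6-drift-workflow run-7-documentation; do
    script="$plan_dir/runs/$run/validate-exit.sh"
    if [[ -f "$script" ]] && bash "$script" >/dev/null 2>&1; then
      echo "ok:   $run"
    else
      echo "FAIL: $run"
      failed=1
    fi
  done
  return "$failed"
}

# Example usage:
# validate_all_runs plan && echo "all five runs pass their exit checks"
```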
8. Agent skills consulted
Orchestration metadata for the team-lead spawning agents per run. Skills below are consulted on demand, not bulk-loaded.
Skill names below correspond to workspace skill directories under workspace/instructions/claude/skills/<name>/. Some have public documentation pages on the Arda docs site; most are agent-only and resolved via the resolve-doc-page.sh helper at skill-load time.
| Skill | Loaded by | Used in |
|---|---|---|
| `cdk-infrastructure` | devops-engineer | All CDK construct / stack / app work (runs 1–6) |
| `typescript-coding` | devops-engineer | tools/ scripts and tools/lib/ helpers (runs 1, 2, 6) |
| `unit-tests-infra` | devops-engineer | CDK Template-matcher test surface (runs 1, 2, 6) |
| `path-conventions` | All personas | Cross-system doc links and relative paths |
| `document-writing` | technical-writer | New current-system/ pages and runbook (run 7) |
| `pr-steward` | All personas | Landing each PR (runs 1–7) |
| `project-decomposition` | Team-lead / user | Authoring this plan/ tree (already applied) |
9. Continuous-improvement observer
The Team Lead spawns the CI Observer at project start. The observer collects observations through runs 1–7 (repeated errors, slow iterations, friction patterns, deviations from the plan) and produces continuous-improvement-proposal.md at the project root at close. The proposal feeds the improvement-analyzer skill (run by the user, post-Phase-4), which decides which proposals become skill updates, agent updates, or template changes.
The CI Observer runs in the background for the full Phase 4 lifecycle. Each run’s validate-exit.sh is a natural data point: when a check fails, the observer notes the cause (script bug? convention violation? agent confusion?) and tags it.
10. Project closure (post-run-7)
Once Phase 4 completion criteria (§ 7) hold, perform the lifecycle wrap-up:
10.1 Implementation byproducts
Author the following files under 4-runtime-platform-updates/implementation/ before retiring the project directory:
| File | Description |
|---|---|
| `learnings.md` | Durable insights from Phase 4 implementation: patterns, surprises, and codebase lessons that should outlive the project. |
| `suggestions.md` | Forward-looking improvements for Phase 5a / 5b and beyond: work that surfaced as worth doing but is out of Phase 4 scope. |
| `phase-a-deploy.md` | Run-1 outcomes; Root no-drift verification result. |
| `phase-b-deploy.md` | Run-2 code-rollout outcomes: cdk diff summary across the four {infra}-{partition}-Email stacks; reviewer approval; merge record. |
| `phase-b-cascade-outcomes.md` | Run-3 operator cascade outcomes; consolidated record of dev / stage / demo / prod deploys. Distilled from runs/run-3-operator-cascade/execution-log.md into a durable summary. |
| `phase-c-deploy.md` | Run-6 outcomes; first scheduled drift run results. |
| `phase-d-deploy.md` | Run-7 outcomes; documentation review findings. |
| `continuous-improvement-proposal.md` (project root) | Output of the CI Observer + improvement-analyzer consolidating structural improvements identified throughout Phase 4. |
10.2 Move project to roadmap/completed/
Per the project lifecycle convention, move the 4-runtime-platform-updates/ directory:
```bash
# <pre-move-location> stands for the directory's roadmap location before the move;
# the source and destination must differ for git mv to succeed.
git -C /Users/jmp/code/arda/documentation mv \
  <pre-move-location>/email-integration/4-runtime-platform-updates \
  src/content/docs/roadmap/completed/email-integration/4-runtime-platform-updates
```

Update any inbound references in roadmap/completed/email-integration/ and verify make pr-checks still passes.
10.3 Worktree cleanup
Remove the Phase 4 worktrees and local branches once all PRs are merged:
```bash
# Remove all Phase 4 infrastructure run worktrees + the documentation worktree.
# Per DQ-R1-026, Run-3 is the only operator-deploy run; numbers 4 and 5 are vacant.
for n in 2 3 6; do
  git -C /Users/jmp/code/arda/infrastructure worktree remove \
    "/Users/jmp/code/arda/projects/email-integration-worktrees/phase-4/infrastructure-run-$n"
  git -C /Users/jmp/code/arda/infrastructure branch -d "jmpicnic/email-integration-phase-4-run-$n"
done

# Run-1 worktree
git -C /Users/jmp/code/arda/infrastructure worktree remove \
  /Users/jmp/code/arda/projects/email-integration-worktrees/phase-4/infrastructure
git -C /Users/jmp/code/arda/infrastructure branch -d jmpicnic/email-integration-phase-4

# Documentation worktree
git -C /Users/jmp/code/arda/documentation worktree remove \
  /Users/jmp/code/arda/projects/email-integration-worktrees/phase-4/documentation
git -C /Users/jmp/code/arda/documentation branch -d jmpicnic/email-integration-phase-4

rmdir /Users/jmp/code/arda/projects/email-integration-worktrees/phase-4
```

The phase-5a/* and phase-5b/* worktrees stay in place; they continue with their own phase work.
10.4 Cross-link from Phase 5b
Verify that ../../5b-email-module/pre-existing-decisions.md’s references to Phase 4 resolve correctly post-move. The cross-links to DQ-R1-019, DQ-R1-020, DQ-R1-023, and the per-partition -API- exports must point at the new roadmap/completed/ location.
11. References
- `evaluation.md`: decomposition assessment + recommendation.
- Per-run plans: `runs/run-1-workspace-refactors/project-plan.md`, `runs/run-2-dev-rollout/project-plan.md`, `runs/run-3-operator-cascade/project-plan.md`, `runs/run-6-drift-workflow/project-plan.md`, `runs/run-7-documentation/project-plan.md`.
- `../design/specification.md`: Phase 4 task contract.
- `../design/analysis.md`: capability decomposition + group-level DAG.
- `../design/verification.md`: operator sign-off table.
- `../../decision-log.md`: DQ-R1-021 partition order, DQ-R1-022 operator surface, DQ-R1-026 (consolidation rationale for collapsing original Runs 3/4/5 into Run-3 operator cascade).
- `process/craft/analysis-and-design/project-decomposition.md`: canonical decomposition skill.

Copyright: (c) Arda Systems 2025-2026, All rights reserved