
Phase 4 — Run Choreography

Phase 4 is decomposed into five runs, sequenced as a 5-PR rollout: run-1, run-2, run-3, run-6, run-7. Numbers 4 and 5 are intentionally vacant. The original run-3-stage-rollout, run-4-demo-rollout, and run-5-prod-rollout plans were consolidated into a single run-3-operator-cascade per DQ-R1-026, and the remaining run numbers were kept (rather than renumbering run-6 and run-7) to avoid churning existing references in PR #462's CHANGELOG entry, specification.md, and verification.md. Each run is launched independently via the launch-team skill; the user acts as choreographer between runs (operator gates, the Postmark Compliance reply, per-partition cdk diff review). The decomposition rationale is in evaluation.md; the per-run plans are in runs/.

| Run | Branch / PR | Scope | Working dir | AWS impact |
|---|---|---|---|---|
| run-1-workspace-refactors | jmpicnic/email-integration-phase-4 (PR #455) | G-A: construct generalisation, byte-identity guard, accessor, reserved-words extension, helper extraction | phase-4/infrastructure (+ docs verification entry) | Synth-only + Root read-only diff |
| run-2-dev-rollout | jmpicnic/email-integration-phase-4-run-2 (PR #462; based on Run-1's branch, rebases onto main when Run-1 merges) | G-B+C+D code for all four active partitions (dev, stage, demo, prod): platforms.ts carries all four mail blocks; PartitionEmailStack + Pre-Deploy CLI + amm.sh step land on main. No partition deploys happen in this run; operator deploys move to Run-3 per DQ-R1-026. | phase-4/infrastructure-run-2 | Synth-only at this run's boundary; resource-touching happens in Run-3 |
| run-3-operator-cascade | jmpicnic/email-integration-phase-4-run-3 (base main, auto-retargets when Run-2 merges) | Operator-driven cascade across dev → T-O4 Postmark Compliance reply → {stage \|\| demo} → prod. One PR captures the CHANGELOG, the accumulated cdk.context.json updates from all four partition deploys, and any code fixes that emerge during execution. Execution log in runs/run-3-operator-cascade/execution-log.md. | phase-4/infrastructure-run-3 | Resource-touching in Alpha002 (dev and stage sub-zones, secrets, roles) and Alpha001 (demo and prod sub-zones, secrets, roles). Production deploy included. |
| run-6-drift-workflow | jmpicnic/email-integration-phase-4-run-6 (base main; soft dependency: needs ≥ 1 partition live to exercise) | G-E: runtime-platform-drift.yml + driver + shared utility extraction | phase-4/infrastructure-run-6 | None |
| run-7-documentation | jmpicnic/email-integration-phase-4 (documentation worktree; consolidates docs commits from Runs 1–3 and 6) | G-F: current-system retrofit, rotation runbook, secret-delivery-pattern.md content fill, docs CHANGELOG | phase-4/documentation | None |

The infra runs form a mostly linear PR series: Run-1 → Run-2 → Run-3, plus Run-6 stacked off main once Run-3's dev cascade entry is verified (Run-2 itself deploys nothing). Run-3 is sequential at the cascade level (dev first, then stage and demo, which may interleave, then prod last) but produces a single PR rather than three.

Per the project-decomposition skill, this document captures the cross-run choreography: the sequencing graph, the operator-gate handoffs, and the artifact dependencies that the per-run project-plan.md files alone cannot express.

2. Setup phase — harness prompt minimisation


Before launching run-1, perform the following one-time setup to minimise harness permission prompts triggered by bash command-shape variants during validation. Each step is a sunk cost paid once; the savings compound across the five runs and the per-partition validate-exit.sh invocations.

2.1 .claude/settings.local.json allowlist patches


Add the following patterns to the project-level .claude/settings.local.json (preferred) or to the user's ~/.claude/settings.json. Standard JSON has no comment syntax, so record why the patterns were added (for example in the commit message) so future maintainers can trace their purpose.

```json
{
  "permissions": {
    "allow": [
      "Bash(bash */validate-exit.sh*)",
      "Bash(dig *)",
      "Bash(aws cloudformation describe-stacks*)",
      "Bash(aws cloudformation get-template*)",
      "Bash(aws cloudformation list-exports*)",
      "Bash(aws secretsmanager describe-secret*)",
      "Bash(aws iam get-role*)",
      "Bash(aws sts get-caller-identity*)",
      "Bash(gh pr view *)",
      "Bash(gh pr checks *)",
      "Bash(gh run view *)",
      "Bash(gh workflow run *)",
      "Bash(git -C * *)",
      "Bash(npm --prefix * *)",
      "Bash(make -C * *)"
    ]
  }
}
```

Verify before launching run-1 by running bash plan/runs/run-1-workspace-refactors/validate-exit.sh and confirming zero ad-hoc permission prompts during execution.
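Before running validate-exit.sh, a quick pre-check can confirm that every pattern actually landed in the settings file. The sketch below is a hypothetical helper (`missing_allow_patterns` is not part of the repo) that uses only `grep -F`, so it needs no JSON tooling. It matches each quoted pattern as a fixed string rather than parsing the JSON, which is enough for a pre-launch sanity check but would not notice a pattern placed outside `permissions.allow`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-launch check: report allowlist patterns missing from a settings file.
# Each pattern is matched as a quoted fixed string; the JSON structure is not parsed.
set -euo pipefail

REQUIRED_PATTERNS=(
  'Bash(bash */validate-exit.sh*)'
  'Bash(dig *)'
  'Bash(aws cloudformation describe-stacks*)'
  'Bash(aws cloudformation get-template*)'
  'Bash(aws cloudformation list-exports*)'
  'Bash(aws secretsmanager describe-secret*)'
  'Bash(aws iam get-role*)'
  'Bash(aws sts get-caller-identity*)'
  'Bash(gh pr view *)'
  'Bash(gh pr checks *)'
  'Bash(gh run view *)'
  'Bash(gh workflow run *)'
  'Bash(git -C * *)'
  'Bash(npm --prefix * *)'
  'Bash(make -C * *)'
)

# missing_allow_patterns <settings-file>
# Prints every required pattern not found in the file; returns 1 if any are missing.
missing_allow_patterns() {
  local file="$1" pattern missing=0
  for pattern in "${REQUIRED_PATTERNS[@]}"; do
    if ! grep -qF -- "\"$pattern\"" "$file"; then
      echo "$pattern"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: missing_allow_patterns .claude/settings.local.json || echo "allowlist incomplete"
```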

Every validate-exit.sh follows the same shape to keep harness-visible bash invocations uniform:

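```bash
#!/usr/bin/env bash
set -euo pipefail

PASS=0
FAIL=0
TOTAL=0   # incremented by check() for every check that runs

# check <description> <command> <expected-substring>
# Runs <command> (stdout and stderr captured) and passes when it exits 0
# and its output contains <expected-substring>.
check() {
  local desc="$1" cmd="$2" expected="$3" result
  TOTAL=$((TOTAL + 1))
  if result=$(eval "$cmd" 2>&1); then
    if [[ "$result" == *"$expected"* ]]; then
      # Use VAR=$((VAR + 1)), not ((VAR++)): the latter returns status 1
      # when VAR is 0, which aborts the script under set -e.
      echo "PASS: $desc"; PASS=$((PASS + 1))
    else
      echo "FAIL: $desc (expected '$expected', got '$result')"; FAIL=$((FAIL + 1))
    fi
  else
    echo "FAIL: $desc (command failed: $result)"; FAIL=$((FAIL + 1))
  fi
}

# Entry / Exit checks below ...

echo "$PASS of $TOTAL checks passed"
[[ $FAIL -eq 0 ]] && echo "ALL CHECKS PASSED" || { echo "SOME CHECKS FAILED"; exit 1; }
```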

The agent invokes the script with one Bash tool call; the script internally runs all dig / aws / gh checks without each one being a separately permission-gated harness call.
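The launch criteria mention a `--dry-run` mode "if supported". One way a validate-exit.sh could support it is sketched below, assuming the flag is literally `--dry-run` and that a dry run should only list the checks without executing them, so no dig / aws / gh call is made. The `DRY_RUN` variable and this variant of `check()` are illustrative, not taken from the existing scripts.

```bash
#!/usr/bin/env bash
# Hypothetical --dry-run support for a validate-exit.sh: list each check without running it.
set -euo pipefail

DRY_RUN=0
if [[ "${1:-}" == "--dry-run" ]]; then
  DRY_RUN=1
fi

PASS=0
FAIL=0

# check <description> <command> <expected-substring>
check() {
  local desc="$1" cmd="$2" expected="$3" result
  if [[ $DRY_RUN -eq 1 ]]; then
    # Dry run: show what would execute; the command itself is never evaluated.
    echo "DRY-RUN: $desc -> $cmd (expects '$expected')"
    return 0
  fi
  if result=$(eval "$cmd" 2>&1) && [[ "$result" == *"$expected"* ]]; then
    echo "PASS: $desc"; PASS=$((PASS + 1))
  else
    echo "FAIL: $desc (expected '$expected', got '$result')"; FAIL=$((FAIL + 1))
  fi
}

# Example check; <profile> and <stack> are placeholders, not Phase-4 values.
# check "stack deployed" \
#   "aws --profile <profile> cloudformation describe-stacks --stack-name <stack> --query 'Stacks[0].StackStatus' --output text" \
#   "COMPLETE"
```

Because the dry-run branch returns before `eval`, the script can be exercised under the harness without triggering any network or AWS call.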

The following standards apply across all run plans, validate-exit scripts, and operator runbooks. They are enforced both by the new ESLint rules landed in PR #454 (no-cd-in-shell, no-aws-profile-prefix) and by reviewer convention.

  • git -C <absolute-path> <subcommand> — never cd <path> && git ....
  • npm --prefix <absolute-path> run <script> — never cd <path> && npm ....
  • make -C <absolute-path> <target> — never cd <path> && make ....
  • aws --profile <name> <command> — never AWS_PROFILE=<name> aws ....
  • Absolute worktree paths inside script bodies; positional args for partition / infrastructure / profile.
  • .claude/settings.local.json (or equivalent) contains the patterns above.
  • bash plan/runs/run-1-workspace-refactors/validate-exit.sh --dry-run (if supported) completes without prompts.
  • No cd <path> form appears in any validate-exit.sh (verifiable by grep -rn '^cd ' plan/runs/).
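The two shape rules most likely to regress (bare `cd` and the `AWS_PROFILE=` prefix) can be spot-checked locally before a PR. The sketch below is a grep-based approximation under that assumption; `convention_violations` is a hypothetical helper, not the ESLint implementation of no-cd-in-shell / no-aws-profile-prefix from PR #454.

```bash
#!/usr/bin/env bash
# Hypothetical local pre-check approximating the no-cd-in-shell and
# no-aws-profile-prefix conventions with grep (not the ESLint rules themselves).
set -euo pipefail

# convention_violations <dir>
# Prints file:line:text for each violation in *.sh files under <dir>.
convention_violations() {
  local dir="$1"
  # `cd <path>` at the start of a line, leading whitespace allowed.
  grep -rnE --include='*.sh' '^[[:space:]]*cd[[:space:]]' "$dir" || true
  # Environment-prefix form: AWS_PROFILE=<name> aws ...
  grep -rnE --include='*.sh' 'AWS_PROFILE=[^[:space:]]+[[:space:]]+aws([[:space:]]|$)' "$dir" || true
}

# Usage: convention_violations plan/runs/
```

Unlike the `grep -rn '^cd '` one-liner, this variant also catches indented `cd` inside functions and loops.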

Once these criteria hold, run-1 may launch.

The 5-run dependency graph mirrors analysis.md § 13.1 lifted from group level to run level. Hard edges block the downstream run until the upstream run is merged and exit criteria pass; soft edges allow parallel authoring but verification still serialises.

run-1 is a hard prerequisite for run-2 (the construct generalisation and tools/lib/ helpers are imported by run-2's code) and a soft prerequisite for run-6 (the drift driver also imports from tools/lib/). run-2 (code for all four partitions) is a hard prerequisite for run-3 (operator cascade), which deploys the four partitions in the partial order dev → {stage || demo} → prod per DQ-R1-021; that ordering now lives inside Run-3 (see § 4.1 below) rather than between three separate runs. run-6 has a soft dependency on Run-3's dev cascade entry (drift probes need at least one live partition, which that entry provides). run-7 documentation lands last so the docs reflect what was built.

The diagram below shows the five Phase-4 runs and their dependency edges. Solid arrows are hard dependencies that block until the upstream run merges and verifies; dashed arrows are soft dependencies that allow parallel authoring with serialised post-merge verification.

[PlantUML diagram: Phase-4 run dependency graph]
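The dependency edges described above can also be kept as a plain edge list. The sketch below feeds them to the standard `tsort` utility to derive a launch order. It treats soft edges as ordering constraints too (they still serialise verification), and the edge list is a reading of the prose above, not an export of the PlantUML source.

```bash
#!/usr/bin/env bash
# Sketch: derive a launch order for the Phase-4 runs from their dependency edges.
# Each line is "upstream downstream".
#   run-1 -> run-2  hard (construct + tools/lib/ imports)
#   run-2 -> run-3  hard (code for all four partitions)
#   run-1 -> run-6  soft (drift driver imports tools/lib/)
#   run-3 -> run-6  soft (needs Run-3's dev cascade entry live)
#   run-6 -> run-7  gate (docs land last)
set -euo pipefail

PHASE4_EDGES='run-1 run-2
run-2 run-3
run-1 run-6
run-3 run-6
run-6 run-7'

launch_order() {
  printf '%s\n' "$PHASE4_EDGES" | tsort
}

launch_order
```

With these five edges the order is fully determined: run-1, run-2, run-3, run-6, run-7. Adding an edge that creates a cycle makes `tsort` report the loop, which is a cheap way to catch a mis-specified dependency.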

These are the human-in-the-loop steps the user performs to advance from one run to the next. Each gate is the natural pause point that drove the decomposition recommendation in evaluation.md.

| Between | Operator action | Required artefact |
|---|---|---|
| run-1 → run-2 | Run T-O2 (Root no-drift verification) against the deployed RootConfiguration. Confirm an empty cdk diff. Record the result in the verification sign-off table. | Empty cdk diff output captured |
| run-2 → run-3 | Review the cdk diff for each of the four {infra}-{partition}-Email stacks in PR #462's description; approve before merge. Confirm that the four mail blocks in platforms.ts cover dev / stage / demo / prod and that the kyle partition is excluded. | PR #462 merged to main; reviewer approval recorded |
| run-3 → run-6 | At least the dev entry of Run-3's cascade verified end-to-end (CFN exports populated; Postmark Console verified). Drift workflow probes need at least one live partition; opening Run-6 before that returns no useful data. | Run-3 execution-log § dev verification cleared |
| run-3 → (end of partition rollout) | All four partition cascade entries verified end-to-end (Postmark Console verified for all four; dig checks green; CFN exports populated). T-O4 outcome recorded (arda-nonprod approval received OR more-evidence-needed path documented). Run-3 infra PR merged with the accumulated CHANGELOG, cdk.context.json, and any code fixes. | Run-3 cascade-summary table all green; PR merged |
| run-6 → run-7 | Manually trigger runtime-platform-drift.yml via workflow_dispatch. Confirm that no spurious issue is opened. Sign off T-O8. | First-run workflow log captured |
| run-7 → completion | make pr-checks green on the documentation PR; technical-writer review findings addressed. | Docs PR merged |
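Several gates reduce to "PR X is merged". A minimal sketch of that check for the run-2 → run-3 gate, using `gh pr view` with `--json state --jq` (which fits the `Bash(gh pr view *)` allowlist entry), is shown below. `require_pr_merged` is a hypothetical helper, and the repository slug in the usage line is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical gate helper: refuse to advance until a PR is merged.
set -euo pipefail

# require_pr_merged <pr-number> <owner/repo>
require_pr_merged() {
  local pr="$1" repo="$2" state
  state=$(gh pr view "$pr" --repo "$repo" --json state --jq '.state')
  if [[ "$state" != "MERGED" ]]; then
    echo "GATE BLOCKED: PR #$pr in $repo is $state, not MERGED" >&2
    return 1
  fi
  echo "GATE OK: PR #$pr merged"
}

# Usage (repository slug is a placeholder): require_pr_merged 462 <owner>/infrastructure
```

The reviewer-approval half of the gate stays a human judgement; the helper only removes the "was it actually merged?" lookup.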

Run-3’s cascade is sequenced inside the single PR, not between runs. The operator advances through the four partitions in the partial-order dev → {stage || demo} → prod per DQ-R1-021. Each cascade entry is a self-contained block in the execution log:

| Cascade entry | Operator action | Required artefact |
|---|---|---|
| (cascade entry) dev | Pre-flight (T-O1-dev) → ./amm.sh --profile Alpha002-Admin Alpha002 dev (T-O5) → verify dig + Postmark Console + CFN exports. Record the cdk.context.json block for partitionMail:Alpha002:dev. | Execution-log § dev verification cleared; V-OPS-005-dev populated |
| (between entries) T-O4 | Reply to Postmark Compliance ticket #11236089 with dev.ardamails.com verified-domain evidence. Wait for Postmark's response (approved OR more-evidence-needed). | Email artefact captured; arda-nonprod approval status recorded |
| (cascade entry) stage | Pre-flight (T-O1-stage) → ./amm.sh --profile Alpha002-Admin Alpha002 stage (T-O5) → verify. Independent of demo; either order is valid. | Execution-log § stage verification cleared; V-OPS-005-stage populated |
| (cascade entry) demo | Pre-flight (T-O1-demo) → switch AWS profile to Admin-Alpha1 → ./amm.sh --profile Admin-Alpha1 Alpha001 demo (T-O6) → verify. Independent of stage. | Execution-log § demo verification cleared; V-OPS-005-demo populated |
| (between entries) Production deploy confirmation | Both stage and demo verified. The operator records an explicit production-deploy confirmation in the execution log before opening the prod entry. | Execution-log § production-deploy-confirmation signed off |
| (cascade entry) prod | Pre-flight (T-O1-prod) → ./amm.sh --profile Admin-Alpha1 Alpha001 prod (T-O7) → verify with extra care (production deploy). The --profile flag is required for every partition: amm.sh's default profile derivation (Admin-${infrastructure}) matches neither Phase-4 profile name. | Execution-log § prod verification cleared; V-OPS-005-prod populated |
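The partition-to-invocation mapping in the table is easy to get wrong by hand, especially the two different profile spellings. The sketch below prints (rather than runs) the amm.sh command for each cascade entry, taken verbatim from the table; `amm_command_for` is a hypothetical helper, and any partition outside the four, including kyle, is rejected.

```bash
#!/usr/bin/env bash
# Sketch: print the amm.sh invocation for each Phase-4 cascade entry.
# --profile is always explicit because amm.sh's default derivation
# (Admin-${infrastructure}) matches neither Phase-4 profile name.
set -euo pipefail

amm_command_for() {
  local partition="$1"
  case "$partition" in
    dev|stage) echo "./amm.sh --profile Alpha002-Admin Alpha002 $partition" ;;
    demo|prod) echo "./amm.sh --profile Admin-Alpha1 Alpha001 $partition" ;;
    *)         echo "not a Phase-4 partition: $partition" >&2; return 1 ;;
  esac
}

# Usage: amm_command_for demo   # copy the printed command into the execution log, then run it
```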

A partition cascade entry that fails mid-execution is captured in the execution log and the operator addresses the cause (code fix lands in the same PR; environmental issue addressed and partition re-attempted). The cascade resumes from the failed partition; successfully-deployed prior partitions are not rolled back (per-partition isolation per DQ-R1-021 still holds at the resource level). See project-plan.md § Retreat path for the full procedure.
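Resuming "from the failed partition" means re-deriving which entries are open given what the execution log already records as verified. The sketch below encodes the partial order dev → {stage || demo} → prod: a failed partition is simply absent from the verified set, so it is offered again, and nothing already verified is revisited. `next_cascade_entries` is hypothetical, and it deliberately ignores the T-O4 and production-deploy confirmation gates, which the operator records by hand.

```bash
#!/usr/bin/env bash
# Sketch: list the cascade entries that may be opened next, given the partitions
# already verified in the execution log. Prints nothing once all four are verified.
set -euo pipefail

# next_cascade_entries [verified-partition ...]
next_cascade_entries() {
  local verified=" $* " next=""
  if [[ "$verified" != *" dev "* ]]; then
    echo "dev"
    return 0
  fi
  [[ "$verified" == *" stage "* ]] || next+="stage "
  [[ "$verified" == *" demo "* ]] || next+="demo "
  # prod opens only after both stage and demo are verified.
  if [[ -z "$next" && "$verified" != *" prod "* ]]; then
    next="prod "
  fi
  echo "${next% }"
}

# Usage: next_cascade_entries dev demo   # prints "stage"
```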

run-6 (drift workflow) can be authored in parallel with the Run-3 cascade once the dev cascade entry is verified — drift probes need at least one partition’s state, which Run-3’s first entry provides. Run-6 does not wait for Run-3’s cascade to complete; its PR can land any time after Run-3’s dev entry is verified.

| Producer | Artefact | Consumer(s) | Form |
|---|---|---|---|
| run-1 | Generalised AllowCreatingNSRecordsRole construct (renamed) + postmarkCredentialOpReference accessor + tools/lib/* helpers + reserved-words list entries | run-2 (CDK code imports + tools script) | TypeScript imports + module exports |
| run-2 | PartitionEmailStack class file + apps/Al1x/partition.ts instantiation + register-partition-mail-signature.ts entry script + amm.sh partition-mail step + four mail blocks in platforms.ts covering dev / stage / demo / prod | run-3 (operator deploys consume this code; no further code-only PRs are needed) | Source files on main |
| run-3 | Per-partition live infrastructure for all four active partitions: {partition}.ardamails.com zones + NS delegations + SPF + DMARC + DKIM + Return-Path records + two SM secrets per partition + two IAM roles per partition | Drift workflow (run-6) probes; Phase 5b consumes via CFN exports | Live AWS resources |
| run-3 | Postmark Sender Signatures: dev.ardamails.com and stage.ardamails.com on PostmarkNonProd; demo.ardamails.com and prod.ardamails.com on PostmarkProd | Phase 5b email module sends through these Signatures | Postmark API state |
| run-3 | cdk.context.json populated with one partitionMail:<infra>:<partition> block per partition (DKIM selector, DKIM public key, Return-Path target) | Future cdk synth runs (CI + local); subsequent re-runs of amm.sh are idempotent against this state | Committed file on main |
| run-3 | Postmark Compliance ticket #11236089 outcome captured (T-O4) | REQ-OPS-004 satisfied: arda-nonprod approval status known | Email-thread artefact in execution log |
| run-6 | runtime-platform-drift.yml workflow + driver + extracted tools/lib/drift/ helpers | Scheduled drift checks; future runtime-platform drift checks beyond email; corporate-drift regression-tested with extracted helpers | Workflow + module exports |
| run-7 | Filled-in secret-delivery-pattern.md + per-partition mail pages in current-system/runtime/ + Postmark-service multi-Signature updates + encryption-key rotation runbook + docs CHANGELOG entry | Future maintainers; Phase 5b authors; operators | Markdown pages |
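Because later cdk synth runs depend on the committed cdk.context.json, a missing partitionMail block is worth catching before Run-3's PR merges. The sketch below checks only that each expected key is present. The dev key is given in the cascade table; the other three follow the infra-to-partition mapping in the run table (stage on Alpha002, demo and prod on Alpha001). `missing_context_keys` is hypothetical and does not validate the DKIM or Return-Path values inside each block.

```bash
#!/usr/bin/env bash
# Sketch: list partitionMail:<infra>:<partition> keys missing from cdk.context.json.
# Key presence only; block contents are not validated.
set -euo pipefail

EXPECTED_CONTEXT_KEYS=(
  'partitionMail:Alpha002:dev'
  'partitionMail:Alpha002:stage'
  'partitionMail:Alpha001:demo'
  'partitionMail:Alpha001:prod'
)

# missing_context_keys <cdk.context.json>
# Prints each missing key; returns 1 if any are missing.
missing_context_keys() {
  local file="$1" key rc=0
  for key in "${EXPECTED_CONTEXT_KEYS[@]}"; do
    if ! grep -qF -- "\"$key\"" "$file"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}
```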

Each run’s failure mode and recovery path:

| Run | Failure mode | Recovery |
|---|---|---|
| run-1 | T-I2 byte-identity test fails on the PR, so it cannot merge | Diagnose the construct change → re-author T-I1 → re-run the test. No deployed AWS state to roll back. |
| run-1 | T-O2 post-merge Root drift detected | Stop. Investigate the cause of the Root drift (Phase 4 construct change OR unrelated external drift). Resolve before any partition deploy. |
| run-2 | Per-partition synth test fails | Diagnose the failing partition; fix PartitionEmailStack / platforms.ts / apps/Al1x/partition.ts; re-run tests. No deployed AWS state. |
| run-2 | Reviewer flags an issue in the cdk diff for one of the four partitions | Fix in the same PR; re-trigger synth + tests. Code is not merged until all four partitions' synthesised templates are reviewed. |
| run-3 | Pre-Deploy CLI step (register-partition-mail-signature.ts) fails on a partition | Re-run idempotently after fixing the root cause; no partial AWS state is left at that partition. Capture in execution-log § <partition> / Notes. |
| run-3 | cdk deploy fails on a partition | CFN rolls back the stack. Investigate; re-run amm.sh for that partition once the cause is resolved. Prior partitions are unaffected (per-partition isolation per DQ-R1-021). Capture in execution-log § <partition> / Notes. |
| run-3 | T-O4 Postmark Compliance reply doesn't unlock arda-nonprod after dev | Operator follows Postmark Support's direction; this may mean provisioning additional Signatures or waiting for review. Update the REQ-OPS-004 documented assumption. The stage cascade entry waits until the outcome is clear. The demo entry is not blocked (demo and prod use a different Postmark account), but the prod entry still waits for stage through the production-deploy confirmation. |
| run-3 | A partition deploy surfaces a code-level issue | The fix lands in the same Run-3 infra PR (no separate PR cycle). Capture in execution-log § Code-fixes-that-surfaced. Re-run that partition's amm.sh once the fix is on the branch. |
| run-3 | Non-recoverable partition failure within the cycle (e.g., extended Postmark back-and-forth blocks stage) | The operator may close Run-3 with the partition unverified; the PR captures only the verified partitions. Open a follow-up run for the unverified ones, documented explicitly in the execution log. |
| run-6 | Drift workflow fails on its first scheduled run | Inspect logs; tune probe thresholds; re-run via workflow_dispatch. No AWS state change. |
| run-7 | make pr-checks fails | Fix the offending page; re-run locally. No AWS state change. |

Phase 4 is complete when all of the following hold:

  • Runs 1, 2, 3, 6, and 7 each have a validate-exit.sh that exits 0.
  • All four active partition mail sub-zones live and delegated (REQ-PART-001..006 satisfied per ../design/verification.md V-PART-001..005).
  • Each partition’s Postmark Sender Signature registered and verified (REQ-PART-007..010, V-PART-007..010).
  • Per-partition encryption-key SM secret exists with RemovalPolicy.RETAIN (REQ-PART-014, V-PART-014).
  • Per-partition Postmark account-token SM secret exists, populated via δ.1 (REQ-PART-011, V-PART-011).
  • All six -API- CFN exports exist for each partition (REQ-PART-002, 012, 015, 018, 020 + zone-name).
  • Both per-partition IAM roles exist (REQ-PART-017, 019; V-PART-017, 019).
  • arda-nonprod Postmark account approval received OR Postmark reply requesting more evidence captured (REQ-OPS-004).
  • runtime-platform-drift.yml has completed at least one successful scheduled run (REQ-CI-001, V-CI-001).
  • Root account’s RootDnsStack produces byte-identical CFN post-Phase-4 (REQ-IAC-002, V-IAC-002).
  • Documentation deliverables landed (REQ-DOC-001..004, V-DOC-001..004).
  • Operator sign-off table in ../design/verification.md fully populated.
  • All five PRs merged to main on their respective repositories (Run-3’s single cascade PR closes what was originally three).
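The first criterion can be checked in one pass. The sketch below runs each run's validate-exit.sh and reports which ones do not exit 0. It assumes every run directory is named after the run identifiers in the run table (confirmed in this page only for run-1-workspace-refactors and run-3-operator-cascade), and `phase4_exit_status` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: run every Phase-4 validate-exit.sh and report the ones that do not exit 0.
# Assumes runs/<run-name>/validate-exit.sh exists for each run below.
set -euo pipefail

PHASE4_RUNS=(
  run-1-workspace-refactors
  run-2-dev-rollout
  run-3-operator-cascade
  run-6-drift-workflow
  run-7-documentation
)

# phase4_exit_status <plan-dir>
# Returns 1 if any run's exit checks fail.
phase4_exit_status() {
  local plan_dir="$1" run failed=0
  for run in "${PHASE4_RUNS[@]}"; do
    if bash "$plan_dir/runs/$run/validate-exit.sh" > /dev/null 2>&1; then
      echo "OK      $run"
    else
      echo "NOT OK  $run"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: phase4_exit_status /absolute/path/to/plan
```

The remaining criteria (Postmark approval, sign-off table, merged PRs) still need the operator's review; this only closes the mechanical part.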

Orchestration metadata for the team lead, who spawns the agents for each run. The skills below are consulted on demand, not bulk-loaded.

Skill names below correspond to workspace skill directories under workspace/instructions/claude/skills/<name>/. Some have public documentation pages on the Arda docs site; most are agent-only and resolved via the resolve-doc-page.sh helper at skill-load time.

| Skill | Loaded by | Used in |
|---|---|---|
| cdk-infrastructure | devops-engineer | All CDK construct / stack / app work (runs 1–6) |
| typescript-coding | devops-engineer | tools/ scripts and tools/lib/ helpers (runs 1, 2, 6) |
| unit-tests-infra | devops-engineer | CDK Template-matcher test surface (runs 1, 2, 6) |
| path-conventions | All personas | Cross-system doc links and relative paths |
| document-writing | technical-writer | New current-system/ pages and runbook (run 7) |
| pr-steward | All personas | Landing each PR (runs 1–7) |
| project-decomposition | Team-lead / user | Authoring this plan/ tree (already applied) |

The Team Lead spawns the CI Observer at project start. The observer collects observations through runs 1–7 — repeated errors, slow iterations, friction patterns, deviations from the plan — and produces continuous-improvement-proposal.md at the project root at close. The proposal feeds the improvement-analyzer skill (run by the user, post-Phase-4) which decides which proposals become skill updates, agent updates, or template changes.

The CI Observer runs in the background for the full Phase 4 lifecycle. Each run’s validate-exit.sh is a natural data point: when a check fails, the observer notes the cause (script bug? convention violation? agent confusion?) and tags it.

Once Phase 4 completion criteria (§ 7) hold, perform the lifecycle wrap-up:

Author the following files under 4-runtime-platform-updates/implementation/ before retiring the project directory:

| File | Description |
|---|---|
| learnings.md | Durable insights from Phase 4 implementation: patterns, surprises, and codebase lessons that should outlive the project. |
| suggestions.md | Forward-looking improvements for Phase 5a / 5b and beyond: work that surfaced as worth doing but is out of Phase 4 scope. |
| phase-a-deploy.md | Run-1 outcomes; Root no-drift verification result. |
| phase-b-deploy.md | Run-2 code-rollout outcomes: cdk diff summary across the four {infra}-{partition}-Email stacks; reviewer approval; merge record. |
| phase-b-cascade-outcomes.md | Run-3 operator cascade outcomes; consolidated record of dev / stage / demo / prod deploys, distilled from runs/run-3-operator-cascade/execution-log.md into a durable summary. |
| phase-c-deploy.md | Run-6 outcomes; first scheduled drift run results. |
| phase-d-deploy.md | Run-7 outcomes; documentation review findings. |
| continuous-improvement-proposal.md (project root) | Output of the CI Observer + improvement-analyzer, consolidating structural improvements identified throughout Phase 4. |

Per the project lifecycle convention, move the 4-runtime-platform-updates/ directory:

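```bash
# Source: the directory's current (pre-completion) location under roadmap/;
# destination: roadmap/completed/. The two paths must differ for git mv to succeed.
git -C /Users/jmp/code/arda/documentation mv \
  src/content/docs/roadmap/<current-location>/email-integration/4-runtime-platform-updates \
  src/content/docs/roadmap/completed/email-integration/4-runtime-platform-updates
```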

Update any inbound references in roadmap/completed/email-integration/ and verify make pr-checks still passes.
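Finding the inbound references is a search for the directory name outside the moved directory itself. The sketch below lists the Markdown / MDX files that still mention it so each link can be reviewed by hand; `phase4_inbound_refs` is hypothetical, and it reports candidate files rather than deciding which links are stale.

```bash
#!/usr/bin/env bash
# Sketch: list Markdown/MDX files that mention the Phase 4 directory, excluding
# the directory itself, for manual link review after the move.
set -euo pipefail

# phase4_inbound_refs <docs-root>
phase4_inbound_refs() {
  local root="$1"
  grep -rlF --include='*.md' --include='*.mdx' \
    --exclude-dir=4-runtime-platform-updates \
    '4-runtime-platform-updates' "$root" || true
}

# Usage:
# phase4_inbound_refs /Users/jmp/code/arda/documentation/src/content/docs/roadmap/completed/email-integration
```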

Remove the Phase 4 worktrees and local branches once all PRs are merged:

```bash
# Remove all Phase 4 infrastructure run worktrees + the documentation worktree.
# Per DQ-R1-026, Run-3 is the only operator-deploy run; numbers 4 and 5 are vacant.
for n in 2 3 6; do
  git -C /Users/jmp/code/arda/infrastructure worktree remove \
    /Users/jmp/code/arda/projects/email-integration-worktrees/phase-4/infrastructure-run-$n
  git -C /Users/jmp/code/arda/infrastructure branch -d jmpicnic/email-integration-phase-4-run-$n
done

# Run-1 worktree
git -C /Users/jmp/code/arda/infrastructure worktree remove \
  /Users/jmp/code/arda/projects/email-integration-worktrees/phase-4/infrastructure
git -C /Users/jmp/code/arda/infrastructure branch -d jmpicnic/email-integration-phase-4

# Documentation worktree (Run-7)
git -C /Users/jmp/code/arda/documentation worktree remove \
  /Users/jmp/code/arda/projects/email-integration-worktrees/phase-4/documentation
git -C /Users/jmp/code/arda/documentation branch -d jmpicnic/email-integration-phase-4

rmdir /Users/jmp/code/arda/projects/email-integration-worktrees/phase-4
```

The phase-5a/* and phase-5b/* worktrees stay in place — they continue with their own phase work.
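To confirm the cleanup removed every Phase 4 worktree while leaving the phase-5a / phase-5b ones registered, each repository's worktree list can be filtered on the phase-4 path segment. The sketch below parses `git worktree list --porcelain`, whose records start with a `worktree <path>` line; `remaining_phase4_worktrees` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print any Phase 4 worktrees still registered in a repository.
# Empty output means the cleanup is complete; phase-5a/phase-5b paths are ignored.
set -euo pipefail

# remaining_phase4_worktrees <repo-path>
remaining_phase4_worktrees() {
  local listing
  listing=$(git -C "$1" worktree list --porcelain)
  printf '%s\n' "$listing" \
    | sed -n 's/^worktree //p' \
    | grep -F '/email-integration-worktrees/phase-4/' || true
}

# Usage:
# remaining_phase4_worktrees /Users/jmp/code/arda/infrastructure
# remaining_phase4_worktrees /Users/jmp/code/arda/documentation
```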

Verify that ../../5b-email-module/pre-existing-decisions.md’s references to Phase 4 resolve correctly post-move. The cross-links to DQ-R1-019, DQ-R1-020, DQ-R1-023, and the per-partition -API- exports must point at the new roadmap/completed/ location.


Copyright: (c) Arda Systems 2025-2026, All rights reserved