Skip to content

Run-2 Plan Review 1

This review audits:

  • project-plan.md (Run-2: dev partition rollout)
  • validate-exit.sh (Run-2 exit-criteria validation script)

against the surrounding Phase 4 documents — the choreography, the design specification, the decision log entries (especially DQ-R1-020, DQ-R1-021, DQ-R1-022), the requirements and verification matrices, and the workspace conventions captured in the workspace CLAUDE.md files. The goal is to surface inconsistencies, ambiguities, completeness gaps, and script-quality bugs that would force an implementer to invent answers during Run-2 authoring.

Findings are grouped by category and prioritised:

  • Blocking — must resolve before Run-2 authoring can proceed unambiguously
  • Should resolve — clarifies intent; not strictly blocking
  • Polish — script-quality / minor cleanup

The right-most column is intentionally blank for operator guidance, to be filled in as decisions are made.

#SeverityFindingRecommended actionGuidance
A1Should resolveRun-2 plan line 12 reads “Branch / PR: jmpicnic/email-integration-phase-4 (infra) → PR phase-4-G-B-C-D-dev”. jmpicnic/email-integration-phase-4 is already in use by Run-1 / PR #455. The actual Run-2 worktree was set up on jmpicnic/email-integration-phase-4-run-2 (per operator decision).Update the Branch / PR line in the Run-2 project-plan to reference the actual jmpicnic/email-integration-phase-4-run-2 branch. Note the stacked-PR base (Run-1’s branch → main, retargeting after Run-1 merges).Agreed, update other run plans to follow the new convention too.
A2Should resolveT-D1 row claims it covers “V-PART-001..020 for dev, V-IAC-003..008, V-CLI-001..005”. After Run-1 landed (and the local c02db23 docs commit flipped statuses to “complete (Run-1)”), the following are already done: V-PART-007, V-IAC-001, V-IAC-002, V-IAC-007, V-CLI-003. V-IAC-003 was pre-existing from Phase 2 (CFN-name preservation). Without filtering these out, the implementer will re-author rows that already exist or appear to skip work that is actually already done.Restate the T-D1 scope as the explicit V-check set Run-2 adds (V-PART-001..006, V-PART-008..020 dev-specific, V-IAC-004..006, V-IAC-008, V-CLI-001, V-CLI-002, V-CLI-004, V-CLI-005). Cross-reference the verification.md table directly to avoid drift.Ageed
A3BlockingPre-merge vs post-merge deploy ambiguity. § “Worktree strategy” says: “Pre-merge operator verification of amm.sh Alpha002 dev happens against the PR branch locally before merge (recommended).” § “Exit criteria” says: “Post-merge operator: ./amm.sh Alpha002 dev runs end-to-end.” Spec T-O3 walks the deploy as a post-merge operator step. The contradiction matters because pre-merge amm.sh creates real AWS resources in Alpha002 before the source of truth lands on main.Resolve to post-merge only. Reframe any pre-merge step to read-only checks (cdk diff against the PR branch’s synth) — never amm.sh, which writes. Update both the Worktree-strategy paragraph and the Exit-criteria block to match.Agreed with the option for operator to override and execute deployments pre-merge.
A4Should resolveStack-id casing not pinned. Plan exit criterion: “CFN stack name is Alpha002-dev-Email (lowercase partition). validate-exit.sh hardcodes Alpha002-dev-Email. Spec T-I4 references the image-storage.ts precedent; the canonical form is ${infrastructure}-${purpose}-Email. Should be explicit that the partition is lowercased and applies to all four (prod, demo, stage, dev).Pin the canonical form in T-I4 (“CFN stack id: ${infrastructure}-${purpose.toLowerCase()}-Email”) and add a test in partition-email.test.ts that asserts the stackName property for at least two partition fixtures.Agreed
#SeverityFindingRecommended actionGuidance
B1BlockingEntry criterion dmarc-reports@arda.cards mailbox is healthy (T-O1 one-time-per-rollout check)” — “healthy” is not operationally defined. Send a test message? Confirm the mailbox exists in Workspace admin? Verify someone monitors it? Without a defined verification step the operator either invents one or skips.Define the check explicitly. Suggested: send a synthetic DMARC report (via swaks or equivalent) to dmarc-reports@arda.cards and confirm receipt within 60 seconds. Or: a one-time confirmation via Workspace admin console that the mailbox exists and has a designated owner. Pin the exact verification in T-O1 in specification.md.Agreed, send email and confirm by opertor. Time window is 10’ not 60’
B2Should resolveSpec T-I10 step 1 says: “The partition is parameterized: amm.sh follows its existing convention for partition selection (the implementer should match the existing partitionSecrets step’s interface — likely a --partition <name> flag or positional arg).” validate-exit.sh hardcodes ./amm.sh Alpha002 dev (positional). Workspace memory feedback_amm_sh_for_deploys says ./amm.sh <infra> with no partitions deploys infra-only (skips partition loop)” — implying the canonical shape is positional with partition optional.Pin the interface in T-I10 explicitly: ./amm.sh <infrastructure> [<partition>] (two positionals, partition optional). Remove the “likely a flag” hedge.Agreed
B3Should resolveSpec T-O3 step 1: “Commit the updated cdk.context.json (the file is committed per DQ-R1-014; this commit lands in the PR or as a follow-up).” “In the PR or as a follow-up” is two materially different choices. If in the PR, reviewers see the post-prepare-step Postmark public values (DKIM selector, key, return-path) in the diff. If a follow-up, the PR can’t be fully verified by cdk synth because the context values aren’t present.Pin: the pre-deploy step is run before the PR is opened (or as part of the PR’s first commit), and cdk.context.json updates are committed as part of the per-partition PR. This makes the PR diff complete and self-verifying.Agrred
B4Should resolveT-O3 step 1: “Surface the cdk diff summary in the PR description; user confirmation before proceeding.” The exact text of the operator confirmation prompt is not defined; nor is what counts as “confirmation” (typed Yes/y? acknowledgement in PR review?).Pin the operator confirmation as a prompt in amm.sh that requires typing yes (literal) before invoking cdk deploy. Match the existing convention in corporate-cli deploy’s Deploy these changes to the Corporate stacks? [y/N] prompt or supersede it explicitly.Agreed
#SeverityFindingRecommended actionGuidance
C1Should resolveCross-account NS-delegation flow not in the plan. T-I4 row reads “Author PartitionEmailStack (three-interface pattern, validateProps, six -API- exports)” — it does not mention the cross-account NS-delegation back to PlatformRoot’s ardamails.com zone via WriteNSRecordsToUpstreamDns. This is the critical seam between Run-1’s generalised construct and Run-2’s deploy. Spec T-I4 implies it via the imports list; the plan should call it out as a first-class deliverable.Add a bullet under T-I4 in the plan calling out the cross-account NS-delegation: the partition-email stack writes its zone’s NS records into PlatformRoot’s ardamails.com via WriteNSRecordsToUpstreamDns, assuming the generalised AllowCreatingDnsRecordsRole from Run-1 in stsAssumeRole trust mode.Agreed
C2BlockingThe six -API- exports are described with two different naming conventions across artefacts. validate-exit.sh grep checks TS-variable identifiers like partitionMailZoneIdAPI, partitionMailZoneNameAPI, emailPostmarkAccountTokenArnAPI, emailEncryptionKeyArnAPI, emailDnsProvisioningRoleArnAPI, emailEncryptionKeyFallbackRoleArnAPI (suffix-API camelCase). The phase-4 design lists CFN-export-name strings like ${publishingPrefix}-API-PartitionMailZoneId, ${publishingPrefix}-API-PartitionMailZoneName, etc. (infix -API- PascalCase). Both can coexist — TS variable name vs CFN export name — but the consumer (operations Helm chart, Phase 5b) references the CFN names, and an implementer can synthesise the wrong export-name string without realising.Pin both in specification.md § T-I4 in one authoritative table: TS variable identifier ↔ CFN export-name string ↔ what consumes it. Cross-reference from exports.md. validate-exit.sh should check the CFN-export string (not just the TS variable name) — e.g., by running cdk synth and asserting findOutputs("*", {Export: {Name: ...}}) for each of the six.Agreed, first verify by code inspection or aws cli the exact conventions followed for other resources
C3Should resolveThe PartitionId type added in Run-1 (in platform/postmark-service.ts) is not mentioned as a dependency in the Run-2 plan. Multiple Run-2 files will need to import it (the partition-email stack, the instance configs, the entry script).Add a “Run-1 outputs consumed” sub-section under T-I4 / T-I5 / T-I8 enumerating: PartitionId (type), postmarkCredentialOpReference (function), AllowCreatingDnsRecordsRole + TRUST_PRINCIPAL_KIND (construct + discriminator), tools/lib/postmark-client.ts, tools/lib/op-resolver.ts, tools/lib/logger.ts.Agreed
C4Should resolveNo explicit “Sender Signature only — not Server” assertion. Phase 4 deliverable row 9 and the design make clear: Phase 4 creates per-partition Sender Signatures, not Postmark Servers (the PostmarkServer thin-wrapper from Phase 3 is not used in Phase 4; Phase 5b will introduce per-tenant Servers). An implementer following Phase 3’s corporate-cli prepare pattern might add a Server.Add an explicit “Out of scope” bullet to T-I8: “Do NOT create a Postmark Server — Phase 4 only creates a Sender Signature.” Cross-reference exports.md § 10 (which already says the PostmarkServer thin-wrapper is not used by Phase 4).Agreed, even stronger wording than “Out of Scope”, like “The design specifies that servers are created by Tenant at system Runtime, not at Provisioning time”
C5Should resolveRun-2’s downstream artefacts not enumerated. Plan says “Run-2 lands once for the whole phase” but does not list what concretely Runs 3 / 4 / 5 inherit (the stack class, the entry script, the amm.sh step) vs what they add (only the per-partition instance config).Add a “Downstream consumers” sub-section to the Run-2 plan listing: (a) artefacts Run-3 / Run-4 / Run-5 reuse as-is; (b) artefacts those runs only parameterise; (c) the operator gates that separate them. Cross-reference choreography § “Artifact dependencies”.Agreed.

D. Script-quality bugs in validate-exit.sh

Section titled “D. Script-quality bugs in validate-exit.sh”
#SeverityFindingRecommended actionGuidance
D1Polishset -euo pipefail combined with ((PASS++)) (which returns the prior value of PASS as the arithmetic result, evaluating to 0 on the first PASS and triggering set -e early-exit). Same bug Run-1’s script had before our fix.Apply the same fix Run-1 received: drop set -e (keep -uo pipefail) and switch counter increments to PASS=$((PASS+1)) / FAIL=$((FAIL+1)).Agreed., propagate to later runs if applicable too.
D2PolishFive expected="" “always-pass” checks at lines 55, 59, 67, 82, 83, 95. Each uses grep -c … against an empty expected substring; the empty substring matches anything including "0", so the check passes whether or not the artefact exists. Affects: amm.sh references register-partition-mail-signature.ts; stack defines six -API- export keys; T-D1: V-PART-001 entry present; Hosted zone $ZONE NS-delegated via root; Postmark Sender Signature verified for $ZONE.Tighten each check to use a && echo <sentinel> pattern with a non-empty expected="<sentinel>", same as Run-1’s hardened script.Agreed
D3PolishTOTAL is not defined; the summary line shows Passed: N without a denominator. Run-1’s script defines TOTAL=15.Set TOTAL to the count of check calls and emit Passed: $PASS / $TOTAL and Failed: $FAIL / $TOTAL.Agreed
D4PolishPost-merge SM-secret check uses a fragile fallback: aws secretsmanager describe-secret … --query DeletionPolicy --output text 2>/dev/null || aws secretsmanager describe-secret … --query ARN --output text. DeletionPolicy is not a field on describe-secret (it is a CloudFormation attribute on the parent stack template, not on the live SM resource). The first branch always errors silently and the fallback to ARN is what actually runs.Either query the parent CFN stack via aws cloudformation describe-stack-resource … --logical-resource-id EmailEncryptionKey and assert DeletionPolicy: Retain, or assert only that the SM secret exists (and defer the RETAIN policy check to the unit test on the construct).Asset that the secret exists
D5PolishNo cdk synth stack-name assertion. Plan exit criterion line 56 says cdk synth … produces a valid template; CFN stack name is Alpha002-dev-Email — but validate-exit.sh does not run cdk synth or check the stack name.Add a check that runs cdk synth --context partition=dev and asserts the stack is emitted with the expected name. Or document that this gate is operator-driven and remove the wording from the exit criteria.Add cdk synth ... check.
D6PolishThe SandboxKyle002 rejection check appears in the description of the register-partition-mail-signature.ts test (line 58) but is not asserted. Line 50: “register-partition-mail-signature.test.ts present” only checks file presence. Spec T-I8 says the script must reject SandboxKyle002 as an <infrastructure> argument with a PDEV-438 reference.Add a grep-based check that the test file contains a SandboxKyle002 rejection case, or invoke the script with SandboxKyle002 <whatever> and assert non-zero exit + the expected usage-block output.Eliminate any reference to SandboxKyle002. and its partitions. They are retired.

E. Choreography / cross-document reconciliation

Section titled “E. Choreography / cross-document reconciliation”
#SeverityFindingRecommended actionGuidance
E1Should resolveWorktree mapping in choreography.md § 4 says PRs #1, #2, #5, #6, #7 share phase-4/infrastructure/ and only Runs 3 + 4 need task-specific worktrees. The operator decision (separate Run-2 worktree to keep Run-1 / PR #455 undisturbed) means Run-2 also has a task-specific worktree at phase-4/infrastructure-run-2.Update choreography.md § 4 “Worktree mapping” to reflect the actual layout. Either add Run-2 as a third sequential worktree, or document the deviation as a one-off because Run-1 was still in review when Run-2 started.Agreed
E2Should resolveT-O4 timing ambiguity. Choreography places T-O4 as the operator gate between Run-2 and Run-3 (it unlocks arda-nonprod for Run-3’s stage Sender Signature). Run-2 plan lists it as one of Run-2’s own tasks (table row 9). Both are right — it happens at the boundary — but an implementer might think the email-thread artefact lands inside the Run-2 PR diff.Reword Run-2 plan T-O4 row to: “Operator-driven; sent after dev deploy is signed off. Closes Run-2 from a sign-off standpoint AND unblocks Run-3 (stage). Email-thread artefact, not a code change.”Agreed
E3Polishamm.sh argument convention. Spec T-I10 says “likely a --partition <name> flag or positional arg” (punted). Run-2’s validate-exit.sh uses ./amm.sh Alpha002 dev (positional). Workspace memory feedback_amm_sh_for_deploys says ./amm.sh <infra> with no partitions deploys infra-only (skips partition loop)” — implying canonical shape is positional, with partition optional.Pin in T-I10 (already covered by B2). This row is redundant if B2 is resolved; included here for completeness against the choreography.Agreed, make sure all documents are consistent (same wording if that is the case)

Summary of top items to resolve before authoring

Section titled “Summary of top items to resolve before authoring”
  1. A3 — Deploy timing (pre-merge vs post-merge for amm.sh Alpha002 dev).
  2. B1 — Operational definition for the dmarc-reports@arda.cards mailbox health check.
  3. C2 — Pin both the TS variable identifiers and the CFN export-name strings for the six -API- exports in one authoritative table.

Everything else is resolvable incrementally during Run-2 authoring, but the three above are decision points that benefit from operator input before an implementer commits to interpretation.


Copyright: (c) Arda Systems 2025-2026, All rights reserved