Run-2 Plan Review 1
Scope of review
Section titled “Scope of review”This review audits:
project-plan.md(Run-2: dev partition rollout)validate-exit.sh(Run-2 exit-criteria validation script)
against the surrounding Phase 4 documents — the choreography, the design specification, the decision log entries (especially DQ-R1-020, DQ-R1-021, DQ-R1-022), the requirements and verification matrices, and the workspace conventions captured in the workspace CLAUDE.md files. The goal is to surface inconsistencies, ambiguities, completeness gaps, and script-quality bugs that would force an implementer to invent answers during Run-2 authoring.
Findings are grouped by category and prioritised:
- Blocking — must resolve before Run-2 authoring can proceed unambiguously
- Should resolve — clarifies intent; not strictly blocking
- Polish — script-quality / minor cleanup
The right-most column is intentionally blank for operator guidance, to be filled in as decisions are made.
A. Inconsistencies between documents
Section titled “A. Inconsistencies between documents”| # | Severity | Finding | Recommended action | Guidance |
|---|---|---|---|---|
| A1 | Should resolve | Run-2 plan line 12 reads “Branch / PR: jmpicnic/email-integration-phase-4 (infra) → PR phase-4-G-B-C-D-dev”. jmpicnic/email-integration-phase-4 is already in use by Run-1 / PR #455. The actual Run-2 worktree was set up on jmpicnic/email-integration-phase-4-run-2 (per operator decision). | Update the Branch / PR line in the Run-2 project-plan to reference the actual jmpicnic/email-integration-phase-4-run-2 branch. Note the stacked-PR base (Run-1’s branch → main, retargeting after Run-1 merges). | Agreed, update other run plans to follow the new convention too. |
| A2 | Should resolve | T-D1 row claims it covers “V-PART-001..020 for dev, V-IAC-003..008, V-CLI-001..005”. After Run-1 landed (and the local c02db23 docs commit flipped statuses to “complete (Run-1)”), the following are already done: V-PART-007, V-IAC-001, V-IAC-002, V-IAC-007, V-CLI-003. V-IAC-003 was pre-existing from Phase 2 (CFN-name preservation). Without filtering these out, the implementer will re-author rows that already exist or appear to skip work that is actually already done. | Restate the T-D1 scope as the explicit V-check set Run-2 adds (V-PART-001..006, V-PART-008..020 dev-specific, V-IAC-004..006, V-IAC-008, V-CLI-001, V-CLI-002, V-CLI-004, V-CLI-005). Cross-reference the verification.md table directly to avoid drift. | Ageed |
| A3 | Blocking | Pre-merge vs post-merge deploy ambiguity. § “Worktree strategy” says: “Pre-merge operator verification of amm.sh Alpha002 dev happens against the PR branch locally before merge (recommended).” § “Exit criteria” says: “Post-merge operator: ./amm.sh Alpha002 dev runs end-to-end.” Spec T-O3 walks the deploy as a post-merge operator step. The contradiction matters because pre-merge amm.sh creates real AWS resources in Alpha002 before the source of truth lands on main. | Resolve to post-merge only. Reframe any pre-merge step to read-only checks (cdk diff against the PR branch’s synth) — never amm.sh, which writes. Update both the Worktree-strategy paragraph and the Exit-criteria block to match. | Agreed with the option for operator to override and execute deployments pre-merge. |
| A4 | Should resolve | Stack-id casing not pinned. Plan exit criterion: “CFN stack name is Alpha002-dev-Email” (lowercase partition). validate-exit.sh hardcodes Alpha002-dev-Email. Spec T-I4 references the image-storage.ts precedent; the canonical form is ${infrastructure}-${purpose}-Email. Should be explicit that the partition is lowercased and applies to all four (prod, demo, stage, dev). | Pin the canonical form in T-I4 (“CFN stack id: ${infrastructure}-${purpose.toLowerCase()}-Email”) and add a test in partition-email.test.ts that asserts the stackName property for at least two partition fixtures. | Agreed |
B. Ambiguity / missing operational detail
Section titled “B. Ambiguity / missing operational detail”| # | Severity | Finding | Recommended action | Guidance |
|---|---|---|---|---|
| B1 | Blocking | Entry criterion “dmarc-reports@arda.cards mailbox is healthy (T-O1 one-time-per-rollout check)” — “healthy” is not operationally defined. Send a test message? Confirm the mailbox exists in Workspace admin? Verify someone monitors it? Without a defined verification step the operator either invents one or skips. | Define the check explicitly. Suggested: send a synthetic DMARC report (via swaks or equivalent) to dmarc-reports@arda.cards and confirm receipt within 60 seconds. Or: a one-time confirmation via Workspace admin console that the mailbox exists and has a designated owner. Pin the exact verification in T-O1 in specification.md. | Agreed, send email and confirm by opertor. Time window is 10’ not 60’ |
| B2 | Should resolve | Spec T-I10 step 1 says: “The partition is parameterized: amm.sh follows its existing convention for partition selection (the implementer should match the existing partitionSecrets step’s interface — likely a --partition <name> flag or positional arg).” validate-exit.sh hardcodes ./amm.sh Alpha002 dev (positional). Workspace memory feedback_amm_sh_for_deploys says “./amm.sh <infra> with no partitions deploys infra-only (skips partition loop)” — implying the canonical shape is positional with partition optional. | Pin the interface in T-I10 explicitly: ./amm.sh <infrastructure> [<partition>] (two positionals, partition optional). Remove the “likely a flag” hedge. | Agreed |
| B3 | Should resolve | Spec T-O3 step 1: “Commit the updated cdk.context.json (the file is committed per DQ-R1-014; this commit lands in the PR or as a follow-up).” “In the PR or as a follow-up” is two materially different choices. If in the PR, reviewers see the post-prepare-step Postmark public values (DKIM selector, key, return-path) in the diff. If a follow-up, the PR can’t be fully verified by cdk synth because the context values aren’t present. | Pin: the pre-deploy step is run before the PR is opened (or as part of the PR’s first commit), and cdk.context.json updates are committed as part of the per-partition PR. This makes the PR diff complete and self-verifying. | Agrred |
| B4 | Should resolve | T-O3 step 1: “Surface the cdk diff summary in the PR description; user confirmation before proceeding.” The exact text of the operator confirmation prompt is not defined; nor is what counts as “confirmation” (typed Yes/y? acknowledgement in PR review?). | Pin the operator confirmation as a prompt in amm.sh that requires typing yes (literal) before invoking cdk deploy. Match the existing convention in corporate-cli deploy’s Deploy these changes to the Corporate stacks? [y/N] prompt or supersede it explicitly. | Agreed |
C. Completeness gaps
Section titled “C. Completeness gaps”| # | Severity | Finding | Recommended action | Guidance |
|---|---|---|---|---|
| C1 | Should resolve | Cross-account NS-delegation flow not in the plan. T-I4 row reads “Author PartitionEmailStack (three-interface pattern, validateProps, six -API- exports)” — it does not mention the cross-account NS-delegation back to PlatformRoot’s ardamails.com zone via WriteNSRecordsToUpstreamDns. This is the critical seam between Run-1’s generalised construct and Run-2’s deploy. Spec T-I4 implies it via the imports list; the plan should call it out as a first-class deliverable. | Add a bullet under T-I4 in the plan calling out the cross-account NS-delegation: the partition-email stack writes its zone’s NS records into PlatformRoot’s ardamails.com via WriteNSRecordsToUpstreamDns, assuming the generalised AllowCreatingDnsRecordsRole from Run-1 in stsAssumeRole trust mode. | Agreed |
| C2 | Blocking | The six -API- exports are described with two different naming conventions across artefacts. validate-exit.sh grep checks TS-variable identifiers like partitionMailZoneIdAPI, partitionMailZoneNameAPI, emailPostmarkAccountTokenArnAPI, emailEncryptionKeyArnAPI, emailDnsProvisioningRoleArnAPI, emailEncryptionKeyFallbackRoleArnAPI (suffix-API camelCase). The phase-4 design lists CFN-export-name strings like ${publishingPrefix}-API-PartitionMailZoneId, ${publishingPrefix}-API-PartitionMailZoneName, etc. (infix -API- PascalCase). Both can coexist — TS variable name vs CFN export name — but the consumer (operations Helm chart, Phase 5b) references the CFN names, and an implementer can synthesise the wrong export-name string without realising. | Pin both in specification.md § T-I4 in one authoritative table: TS variable identifier ↔ CFN export-name string ↔ what consumes it. Cross-reference from exports.md. validate-exit.sh should check the CFN-export string (not just the TS variable name) — e.g., by running cdk synth and asserting findOutputs("*", {Export: {Name: ...}}) for each of the six. | Agreed, first verify by code inspection or aws cli the exact conventions followed for other resources |
| C3 | Should resolve | The PartitionId type added in Run-1 (in platform/postmark-service.ts) is not mentioned as a dependency in the Run-2 plan. Multiple Run-2 files will need to import it (the partition-email stack, the instance configs, the entry script). | Add a “Run-1 outputs consumed” sub-section under T-I4 / T-I5 / T-I8 enumerating: PartitionId (type), postmarkCredentialOpReference (function), AllowCreatingDnsRecordsRole + TRUST_PRINCIPAL_KIND (construct + discriminator), tools/lib/postmark-client.ts, tools/lib/op-resolver.ts, tools/lib/logger.ts. | Agreed |
| C4 | Should resolve | No explicit “Sender Signature only — not Server” assertion. Phase 4 deliverable row 9 and the design make clear: Phase 4 creates per-partition Sender Signatures, not Postmark Servers (the PostmarkServer thin-wrapper from Phase 3 is not used in Phase 4; Phase 5b will introduce per-tenant Servers). An implementer following Phase 3’s corporate-cli prepare pattern might add a Server. | Add an explicit “Out of scope” bullet to T-I8: “Do NOT create a Postmark Server — Phase 4 only creates a Sender Signature.” Cross-reference exports.md § 10 (which already says the PostmarkServer thin-wrapper is not used by Phase 4). | Agreed, even stronger wording than “Out of Scope”, like “The design specifies that servers are created by Tenant at system Runtime, not at Provisioning time” |
| C5 | Should resolve | Run-2’s downstream artefacts not enumerated. Plan says “Run-2 lands once for the whole phase” but does not list what concretely Runs 3 / 4 / 5 inherit (the stack class, the entry script, the amm.sh step) vs what they add (only the per-partition instance config). | Add a “Downstream consumers” sub-section to the Run-2 plan listing: (a) artefacts Run-3 / Run-4 / Run-5 reuse as-is; (b) artefacts those runs only parameterise; (c) the operator gates that separate them. Cross-reference choreography § “Artifact dependencies”. | Agreed. |
D. Script-quality bugs in validate-exit.sh
Section titled “D. Script-quality bugs in validate-exit.sh”| # | Severity | Finding | Recommended action | Guidance |
|---|---|---|---|---|
| D1 | Polish | set -euo pipefail combined with ((PASS++)) (which returns the prior value of PASS as the arithmetic result, evaluating to 0 on the first PASS and triggering set -e early-exit). Same bug Run-1’s script had before our fix. | Apply the same fix Run-1 received: drop set -e (keep -uo pipefail) and switch counter increments to PASS=$((PASS+1)) / FAIL=$((FAIL+1)). | Agreed., propagate to later runs if applicable too. |
| D2 | Polish | Five expected="" “always-pass” checks at lines 55, 59, 67, 82, 83, 95. Each uses grep -c … against an empty expected substring; the empty substring matches anything including "0", so the check passes whether or not the artefact exists. Affects: amm.sh references register-partition-mail-signature.ts; stack defines six -API- export keys; T-D1: V-PART-001 entry present; Hosted zone $ZONE NS-delegated via root; Postmark Sender Signature verified for $ZONE. | Tighten each check to use a && echo <sentinel> pattern with a non-empty expected="<sentinel>", same as Run-1’s hardened script. | Agreed |
| D3 | Polish | TOTAL is not defined; the summary line shows Passed: N without a denominator. Run-1’s script defines TOTAL=15. | Set TOTAL to the count of check calls and emit Passed: $PASS / $TOTAL and Failed: $FAIL / $TOTAL. | Agreed |
| D4 | Polish | Post-merge SM-secret check uses a fragile fallback: aws secretsmanager describe-secret … --query DeletionPolicy --output text 2>/dev/null || aws secretsmanager describe-secret … --query ARN --output text. DeletionPolicy is not a field on describe-secret (it is a CloudFormation attribute on the parent stack template, not on the live SM resource). The first branch always errors silently and the fallback to ARN is what actually runs. | Either query the parent CFN stack via aws cloudformation describe-stack-resource … --logical-resource-id EmailEncryptionKey and assert DeletionPolicy: Retain, or assert only that the SM secret exists (and defer the RETAIN policy check to the unit test on the construct). | Asset that the secret exists |
| D5 | Polish | No cdk synth stack-name assertion. Plan exit criterion line 56 says “cdk synth … produces a valid template; CFN stack name is Alpha002-dev-Email” — but validate-exit.sh does not run cdk synth or check the stack name. | Add a check that runs cdk synth --context partition=dev and asserts the stack is emitted with the expected name. Or document that this gate is operator-driven and remove the wording from the exit criteria. | Add cdk synth ... check. |
| D6 | Polish | The SandboxKyle002 rejection check appears in the description of the register-partition-mail-signature.ts test (line 58) but is not asserted. Line 50: “register-partition-mail-signature.test.ts present” only checks file presence. Spec T-I8 says the script must reject SandboxKyle002 as an <infrastructure> argument with a PDEV-438 reference. | Add a grep-based check that the test file contains a SandboxKyle002 rejection case, or invoke the script with SandboxKyle002 <whatever> and assert non-zero exit + the expected usage-block output. | Eliminate any reference to SandboxKyle002. and its partitions. They are retired. |
E. Choreography / cross-document reconciliation
Section titled “E. Choreography / cross-document reconciliation”| # | Severity | Finding | Recommended action | Guidance |
|---|---|---|---|---|
| E1 | Should resolve | Worktree mapping in choreography.md § 4 says PRs #1, #2, #5, #6, #7 share phase-4/infrastructure/ and only Runs 3 + 4 need task-specific worktrees. The operator decision (separate Run-2 worktree to keep Run-1 / PR #455 undisturbed) means Run-2 also has a task-specific worktree at phase-4/infrastructure-run-2. | Update choreography.md § 4 “Worktree mapping” to reflect the actual layout. Either add Run-2 as a third sequential worktree, or document the deviation as a one-off because Run-1 was still in review when Run-2 started. | Agreed |
| E2 | Should resolve | T-O4 timing ambiguity. Choreography places T-O4 as the operator gate between Run-2 and Run-3 (it unlocks arda-nonprod for Run-3’s stage Sender Signature). Run-2 plan lists it as one of Run-2’s own tasks (table row 9). Both are right — it happens at the boundary — but an implementer might think the email-thread artefact lands inside the Run-2 PR diff. | Reword Run-2 plan T-O4 row to: “Operator-driven; sent after dev deploy is signed off. Closes Run-2 from a sign-off standpoint AND unblocks Run-3 (stage). Email-thread artefact, not a code change.” | Agreed |
| E3 | Polish | amm.sh argument convention. Spec T-I10 says “likely a --partition <name> flag or positional arg” (punted). Run-2’s validate-exit.sh uses ./amm.sh Alpha002 dev (positional). Workspace memory feedback_amm_sh_for_deploys says “./amm.sh <infra> with no partitions deploys infra-only (skips partition loop)” — implying canonical shape is positional, with partition optional. | Pin in T-I10 (already covered by B2). This row is redundant if B2 is resolved; included here for completeness against the choreography. | Agreed, make sure all documents are consistent (same wording if that is the case) |
Summary of top items to resolve before authoring
Section titled “Summary of top items to resolve before authoring”- A3 — Deploy timing (pre-merge vs post-merge for
amm.sh Alpha002 dev). - B1 — Operational definition for the
dmarc-reports@arda.cardsmailbox health check. - C2 — Pin both the TS variable identifiers and the CFN export-name strings for the six
-API-exports in one authoritative table.
Everything else is resolvable incrementally during Run-2 authoring, but the three above are decision points that benefit from operator input before an implementer commits to interpretation.
Copyright: (c) Arda Systems 2025-2026, All rights reserved
Copyright: © Arda Systems 2025-2026, All rights reserved