Phase 2 -- Implementation Suggestions
Forward-looking improvements identified during Phase 2 implementation. Each suggestion lists what it improves, why it’s not in Phase 2’s scope, and what triggers acting on it.
S-1: Add RemovalPolicy.RETAIN to the arda.cards-family hosted zones
Today: RootDnsStack’s four arda.cards family zones (app, io, auth, assets) inherit CFN’s default removal policy. A cdk destroy RootConfiguration (operator error, or an automated CI cleanup against the wrong account) would delete all four zones and orphan every partition’s NS-delegation chain.
Suggestion: extend the RemovalPolicy.RETAIN discipline introduced for ardamailsZone (per DQ-R1-008) to the four arda.cards family zones. Two-line change in root-dns-stack.ts; the only CFN-template diff is the added DeletionPolicy and UpdateReplacePolicy attributes on each zone, because retention policies don’t affect the resource’s Properties block. Add a parallel V-ROOT-00X strict-match assertion for each zone.
Trigger: low-friction, high-value. Recommend bundling into Phase 3’s documentation PR or a short follow-up infrastructure PR after Phase 2 merges. The risk that motivated RemovalPolicy.RETAIN for ardamailsZone (production zone, manual intervention to recover) is identical for the family zones.
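A minimal sketch of what the per-zone retention assertion could check. In CDK, RemovalPolicy.RETAIN surfaces in the synthesized template as DeletionPolicy and UpdateReplacePolicy attributes set to Retain, so the check can run against plain template JSON. The function name and the template shape are assumptions for illustration; the real V-ROOT-00X assertions may look different.

```typescript
// Only the parts of a synthesized CloudFormation template that we inspect.
interface CfnResource {
  Type: string;
  Properties?: { [key: string]: unknown };
  DeletionPolicy?: string;
  UpdateReplacePolicy?: string;
}

interface CfnTemplate {
  Resources: { [logicalId: string]: CfnResource };
}

// Hypothetical helper: returns the logical IDs of hosted zones that are
// missing either retention attribute, sorted for stable test output.
function zonesMissingRetain(template: CfnTemplate): string[] {
  return Object.keys(template.Resources)
    .filter((id) => template.Resources[id].Type === "AWS::Route53::HostedZone")
    .filter((id) => {
      const resource = template.Resources[id];
      return (
        resource.DeletionPolicy !== "Retain" ||
        resource.UpdateReplacePolicy !== "Retain"
      );
    })
    .sort();
}
```

A test would synthesize RootDnsStack, pass the template to this helper, and expect an empty list.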
S-2: Drift detection for hosted zones and IAM roles outside CFN
Today: Phase 1’s tools/drift-check.ts covers external op:// references against the Postmark API. Phase 2 ships a hosted zone that was originally created outside CFN; future operator action against any AWS console (clicking “delete” in Route53, deleting an IAM role) would drift the live state from the CFN view, and CFN would not detect this until a subsequent deploy.
Suggestion: extend tools/drift-check.ts (or a sibling tools/drift-check-aws.ts) to assert key invariants of the live AWS account against the CFN-stack export contract: arda-ardamails-zone resolves to a hosted zone whose name is ardamails.com., the four arda.cards family exports resolve to existing zones with matching names, the AllowCreatingNSRecordsRole role exists with the canonical name and trust policy. One AWS API call per assertion, runs on the same monthly schedule as the existing drift workflow.
Trigger: when Phase 4 is being planned. Per-partition deploys consume the role and the family zone exports; the cost of an undetected drift grows with the consumer count. A modest investment (~150 LOC) in Phase 4-or-earlier.
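The comparison step of such a check could be sketched as below, assuming the AWS lookups (stack exports and hosted zone names) have already been fetched into a snapshot. The type names, the function name, and the snapshot shape are hypothetical; only the arda-ardamails-zone export and the ardamails.com. zone name come from the text above.

```typescript
interface ZoneExpectation {
  exportName: string;       // CFN export expected to hold a hosted zone ID
  expectedZoneName: string; // fully qualified zone name, with trailing dot
}

// Hypothetical snapshot of live state, filled in by separate AWS API calls.
interface LiveSnapshot {
  exports: { [exportName: string]: string | undefined }; // export -> zone ID
  zones: { [zoneId: string]: string | undefined };       // zone ID -> zone name
}

// Returns one human-readable failure per violated invariant.
function findZoneDrift(expectations: ZoneExpectation[], live: LiveSnapshot): string[] {
  const failures: string[] = [];
  for (const e of expectations) {
    const zoneId = live.exports[e.exportName];
    if (zoneId === undefined) {
      failures.push(`${e.exportName}: export not found`);
      continue;
    }
    const zoneName = live.zones[zoneId];
    if (zoneName === undefined) {
      failures.push(`${e.exportName}: hosted zone ${zoneId} does not exist`);
      continue;
    }
    if (zoneName !== e.expectedZoneName) {
      failures.push(`${e.exportName}: expected ${e.expectedZoneName}, found ${zoneName}`);
    }
  }
  return failures;
}
```

Keeping the comparison pure means the monthly workflow only has to build the snapshot and fail when the returned list is non-empty. The role's name and trust-policy check would follow the same shape.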
S-3: A reusable “adopt-existing-resource” pattern doc
Today: the IMPORT detour in Phase 2 was worked out as a one-off (see learnings.md L-2, L-3, L-5). The discoveries (the IMPORT change-set’s modification ban, comment-must-match-live-zone, mangled-logical-ID lookup, two-phase deploy) are already captured in the project’s decision log (DQ-R1-008) and learnings. They are not yet in a generalised current-system/architecture/patterns/ page.
Suggestion: write an “Adopting an Existing AWS Resource into CDK/CFN” pattern page under current-system/architecture/patterns/iac/. Covers: pre-deploy AWS-CLI inventory check (L-1, L-4), strict-property-mirror discipline (L-3), two-phase IMPORT deploy choreography (L-2, L-5), strict-equality test pattern (L-3). Cite Phase 2’s DQ-R1-008 as the worked example.
Trigger: when the second adoption case appears — e.g., adopting an existing IAM role, KMS key, or a manually-provisioned Lambda. The pattern doc pays back when there are at least two consumers; one (Phase 2) is fine to leave as a learnings entry.
S-4: A cdk-import-helper script in tools/
Today: the IMPORT change-set staging in Phase 2 was a sequence of aws cloudformation get-template, jq template-merge, aws cloudformation create-change-set --change-set-type IMPORT, describe-change-set, execute-change-set, with manual error-recovery between attempts. The pattern is reproducible but requires staring at five tabs.
Suggestion: a small tools/cdk-import.ts (or .sh) that takes --stack-name, --logical-id, --resource-type, --resource-id and wraps the get-template / merge / create-change-set / describe / wait-for-clean / execute steps into a single command. Behaves like a CDK-aware cdk import that defends against the modification-ban gotcha (compares synth template’s resource block to the deployed-state template before merging; aborts if any other resource has drifted).
Trigger: when the third IMPORT case lands (low confidence — IMPORTs in this codebase are rare). The cost (~300 LOC) is moderate; the benefit (no IMPORT detour ever needs the failed-attempt rehearsal Phase 2 went through) is high. Skip until the third case proves the pattern.
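The modification-ban guard described in the suggestion could look roughly like this: compare every resource except the one being imported between the deployed template and the synthesized one, and abort if any differ. This is a sketch under assumptions; the type and function names are hypothetical, and the real helper would obtain both templates from get-template and cdk synth.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface StackTemplate {
  Resources: { [logicalId: string]: Json };
}

// Serialise with sorted object keys so key order alone never counts as drift.
function canonical(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => canonical(v)).join(",") + "]";
  const keys = Object.keys(value).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonical(value[k])).join(",") + "}";
}

// Returns logical IDs (other than importId) that were added, removed, or
// changed between the deployed and synthesized templates.
function driftedResources(
  deployed: StackTemplate,
  synthesized: StackTemplate,
  importId: string
): string[] {
  const ids = Object.keys(deployed.Resources).concat(Object.keys(synthesized.Resources));
  const drifted: string[] = [];
  for (const id of ids) {
    if (id === importId || drifted.indexOf(id) !== -1) continue;
    const before = deployed.Resources[id];
    const after = synthesized.Resources[id];
    if (before === undefined || after === undefined || canonical(before) !== canonical(after)) {
      drifted.push(id);
    }
  }
  return drifted.sort();
}
```

The wrapper would refuse to create the IMPORT change set unless this list is empty, which is the point at which Phase 2's manual attempts had to be retried.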
S-5: Tighten AllowCreatingNSRecordsRole.allowedParentHostedZoneIds
Today: RootDnsStack exposes the AllowCreatingNSRecordsRole IAM role with allowedParentHostedZoneIds: ["*"] (or the equivalent broad scope). Phase 3 / Phase 4 child stacks assume the role to write NS records into Root’s zones (arda.cards family + ardamails.com).
Suggestion: tighten allowedParentHostedZoneIds to the explicit list of Root-managed zone IDs (the four arda.cards family + ardamails.com). Token-resolved at synth time from the same instances/Root/dns.ts source of truth that Phase 2 introduced. Reduces the role’s blast radius from “any zone in the Root account” to “zones the architecture intends”.
Trigger: a separate security-hardening project, or post-Phase-4 once the tightened scope is verifiable end-to-end. Recorded in skipped.md SK-1 of this phase as out-of-scope.
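To illustrate what the tightened scope amounts to, here is a sketch that turns an explicit zone-ID list into an IAM statement scoped to those zones. How AllowCreatingNSRecordsRole actually consumes allowedParentHostedZoneIds is not shown in this document, so the statement shape, the function name, and the use of route53:ChangeResourceRecordSets in the standard aws partition are assumptions.

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

// Hypothetical helper: builds a statement limited to the listed hosted zones
// and rejects empty or wildcard input so the broad scope cannot creep back in.
function nsDelegationStatement(zoneIds: string[]): PolicyStatement {
  if (zoneIds.length === 0) {
    throw new Error("refusing to build a statement with no hosted zones");
  }
  for (const id of zoneIds) {
    if (id.length === 0 || /[*?]/.test(id)) {
      throw new Error(`wildcard or empty hosted zone ID: "${id}"`);
    }
  }
  return {
    Effect: "Allow",
    Action: ["route53:ChangeResourceRecordSets"],
    // Route 53 hosted zone ARNs carry no region or account component.
    Resource: zoneIds.map((id) => `arn:aws:route53:::hostedzone/${id}`),
  };
}
```

With synth-time tokens the IDs would not be literal strings, so the wildcard check would need to live in a test against the resolved template instead.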
S-6: A canonical CFN-stack-name preservation lint
Today: V-IAC-003 (the inline comment above the stack constructor call) is asserted by a single grep test against apps/Root/r53-zones.ts. The pattern is now adopted in two places (RootConfiguration is the second, after the existing partition IngressStack lineage). Future stacks that need name preservation will repeat the comment and the test.
Suggestion: a lightweight ESLint or tsc-pass custom rule that flags any new XStack(...) call whose third argument is a literal string and verifies the line immediately above contains the magic comment marker. Centralises the convention; surfaces forgotten comments at lint time, not test time.
Trigger: when the third stack joins the “name-preservation matters” club. Two is fine to handle by hand; three is the cue for codification.
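Before committing to a full ESLint rule, the check could be prototyped as a line-based scan. The sketch below is deliberately naive: it only handles constructor calls that fit on one line, and the marker text is passed in because the actual comment wording is not given here. The function name and the regular expression are assumptions.

```typescript
interface LintFinding {
  line: number; // 1-based line number of the offending constructor call
  text: string; // trimmed source line
}

// Matches `new SomethingStack(a, b, "literal"` on a single line: two
// comma-separated arguments, then a third argument that starts with a quote.
const STACK_CALL = /new\s+\w*Stack\(\s*[^,()]+,\s*[^,()]+,\s*["'`]/;

// Hypothetical helper: flags matching calls whose preceding line lacks the marker.
function findUnmarkedStackCalls(source: string, marker: string): LintFinding[] {
  const lines = source.split(/\r?\n/);
  const findings: LintFinding[] = [];
  for (let i = 0; i < lines.length; i++) {
    if (!STACK_CALL.test(lines[i])) continue;
    const previous = i > 0 ? lines[i - 1] : "";
    if (previous.indexOf(marker) === -1) {
      findings.push({ line: i + 1, text: lines[i].trim() });
    }
  }
  return findings;
}
```

A real rule would walk the AST instead, which removes the single-line limitation and avoids false matches inside strings or comments.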
S-7: Parametric instances/<InstanceGroup>/dns.ts factory
Today: instances/Root/dns.ts (Phase 2) exports zone-name constants and expectedExports literals. Phase 3 will introduce instances/Corporate/dns.ts with the same shape; Phase 4 will introduce instances/<Partition>/dns.ts per partition. Each is a hand-typed file.
Suggestion: when Phase 4’s per-partition variant lands, extract a small instances/lib/dns-config.ts factory that takes { instanceGroupName, parentMailRootZone, subdomain } and produces the typed export. Reduces the per-partition file to a one-line factory call. Don’t pre-extract during Phase 2 / Phase 3 — premature abstraction with one consumer.
Trigger: Phase 4 Tasks 1-3. Three consumers (Root, Corporate, partition) is the natural cue.
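A possible shape for the factory, using the three inputs named in the suggestion. The output fields and the export-naming convention are purely illustrative assumptions; the real instances/Root/dns.ts shape should drive the actual types.

```typescript
interface DnsConfigInput {
  instanceGroupName: string;
  parentMailRootZone: string; // e.g. "ardamails.com", with or without trailing dot
  subdomain: string;
}

// Hypothetical output shape; field names are not taken from the codebase.
interface DnsConfig {
  instanceGroupName: string;
  mailZoneName: string;
  expectedExports: string[];
}

function makeDnsConfig(input: DnsConfigInput): DnsConfig {
  // Normalise the parent zone so callers may pass it with or without the dot.
  const parent = input.parentMailRootZone.replace(/\.$/, "");
  const mailZoneName = `${input.subdomain}.${parent}`.toLowerCase();
  return {
    instanceGroupName: input.instanceGroupName,
    mailZoneName: mailZoneName,
    // Assumed naming convention, for illustration only.
    expectedExports: [`arda-${input.instanceGroupName.toLowerCase()}-mail-zone`],
  };
}
```

Each per-partition file would then reduce to a single call such as makeDnsConfig({ instanceGroupName: "Corporate", parentMailRootZone: "ardamails.com", subdomain: "corp" }), where the subdomain value is a made-up example.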
S-8: Live registrar-NS-chain assertion in the drift workflow
Today: Phase 2’s verification regime confirms the ardamails.com zone has the four AWS-default nameservers and the registrar’s NS chain matches. The assertion is a one-time walkthrough check; nothing in CI revisits it.
Suggestion: a small monthly check (likely as part of the external-resources-drift.yml workflow once it generalises beyond Postmark) that calls the AWS Route53 Domains GetDomainDetail API for ardamails.com and confirms the registrar’s NS list matches the hosted zone’s NS list. Catches the “someone changed the registrar NS chain” failure mode — low-probability, high-impact.
Trigger: Phase 4 is the natural place to absorb this (it adds per-partition mail sub-zones and is the second consumer of the registrar NS chain). Add as a single assertion to the existing drift workflow rather than spinning up a parallel one.
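The comparison itself is small once both nameserver lists are in hand. A minimal sketch, assuming the registrar list and the hosted zone list have already been fetched as plain strings; the function names are hypothetical. Normalising case and trailing dots matters because the two sources may not format names identically.

```typescript
// Lower-case, strip one trailing dot, drop blanks, and sort so that
// ordering and formatting differences do not register as drift.
function normalizeNs(names: string[]): string[] {
  return names
    .map((n) => n.trim().toLowerCase().replace(/\.$/, ""))
    .filter((n) => n.length > 0)
    .sort();
}

// True when both lists contain exactly the same nameservers.
function nsChainsMatch(registrarNs: string[], zoneNs: string[]): boolean {
  const a = normalizeNs(registrarNs);
  const b = normalizeNs(zoneNs);
  return a.length === b.length && a.every((name, i) => name === b[i]);
}
```

The drift workflow would fail the run when nsChainsMatch returns false and print both normalised lists for the operator.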
S-9: Promote scratch-template-merge into a documented operator pattern
Today: the scratch/importv2.json build (deployed-template + new-resource-only) was a small piece of jq plumbing. It’s not documented as a pattern beyond the learnings.md L-2 entry.
Suggestion: when S-3 (the pattern doc) is written, include the exact jq invocation as a code block; cite it as a reusable shape. Two follow-up consumers (Phase 4 partition adoption, if any) would benefit; otherwise the learnings.md entry is sufficient.
Trigger: same as S-3 — pattern doc creation is the natural carrier.
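The exact jq invocation is not reproduced in this document. As a stand-in, the merge shape it implements (deployed template plus only the new resource) can be sketched in TypeScript; the type and function names are hypothetical.

```typescript
interface ImportableTemplate {
  Resources: { [logicalId: string]: unknown };
  [section: string]: unknown; // Outputs, Parameters, etc. are carried through
}

// Builds the template for an IMPORT change set: everything exactly as
// deployed, plus the single resource being adopted from the synth output.
function buildImportTemplate(
  deployed: ImportableTemplate,
  synthesized: ImportableTemplate,
  importId: string
): ImportableTemplate {
  const resource = synthesized.Resources[importId];
  if (resource === undefined) {
    throw new Error(`${importId} is not in the synthesized template`);
  }
  if (deployed.Resources[importId] !== undefined) {
    throw new Error(`${importId} is already part of the deployed stack`);
  }
  // Deep-copy via JSON so neither input template is mutated.
  const merged = JSON.parse(JSON.stringify(deployed)) as ImportableTemplate;
  merged.Resources[importId] = JSON.parse(JSON.stringify(resource));
  return merged;
}
```

Taking every other resource from the deployed template rather than from synth is what keeps the change set clear of the modification ban.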
Copyright: © Arda Systems 2025-2026, All rights reserved