Phase 3 -- DQ-R1-009 Placement Divergence and Resolution
A defect found by the Phase B post-deploy verification, what caused it, why no test caught it, and how the implementation now prevents the same shape of defect recurring. The fix is in Arda-cards/infrastructure PR #450 (commit cd85527).
DQ-R1-009 chose Option B for the Corporate instance group: register the Postmark Sender Signature at the Corporate zone parent (arda.ardamails.com); leaves inherit DKIM via the parent’s signing key. The decision was recorded as prose only — in the decision log, in a docstring on instances/Corporate/corporate.ts, and in the operator runbook — but not as a value or function any code consumed.
As a result, the two consumers of the decision composed the FQDNs inline and disagreed:
- Phase A (tools/corporate-cli.ts) registered the parent on Postmark, correctly honoring DQ-R1-009.
- Phase B (stacks/corporate/free-kanban-tool-mail-dns.ts via constructs/xgress/dns-email-records.ts) published the DKIM TXT record at the leaf (freekanban.arda.ardamails.com), silently contradicting DQ-R1-009.
The byte-correctness of the DKIM key value (the same key Postmark issued, written into cdk.context.json, then served from DNS) made every existing single-side test pass. The Postmark API reported DKIMVerified: false / DKIMUpdateStatus: Pending because the host where it was looking and the host where the record sat were not the same.
Timeline
| When | Event |
|---|---|
| Phase A executed | corporate-cli prepare free-kanban ran; Postmark Sender Signature registered for arda.ardamails.com; cdk.context.json populated with DKIM selector and key (under key prefix postmark.ardaArdamailsCom.*). |
| Phase B (initial) executed | CorporateMailDns and FreeKanbanToolMailDns deployed. DKIM TXT placed at <selector>._domainkey.freekanban.arda.ardamails.com (leaf — wrong per DQ-R1-009). Return-Path CNAME at pm-bounces.freekanban.arda.ardamails.com (leaf — correct per DQ-R1-009). |
| Phase B verification | dig at the leaf returns the DKIM key (byte-matches cdk.context.json). Postmark API: DKIMVerified: false / DKIMUpdateStatus: Pending; DKIMPendingHost reads <selector>._domainkey.arda.ardamails.com (the parent — where Postmark is looking). |
| Diagnosis | The diff between the two sides revealed Phase B’s DnsEmailRecords was placing DKIM at the leaf because the construct API exposed only a subdomain parameter and composed both record names as <...>.<subdomain>. Tracing the design intent back to DQ-R1-009 confirmed the leaf placement was the defect (not the parent registration). |
| Fix landed | PR #450 commit cd85527: typed source-of-truth function sendingDomainPlacement() and helpers, three consumers refactored to read it, cross-seam drift assertion added. |
| Re-deploy | FreeKanbanToolMailDns update-in-place: DKIM TXT moved from leaf to parent (CFN replace because Route53 RecordSet keys on Name+Type). Return-Path unchanged. ~47s deploy time. |
| Postmark verification | verifyDkim + verifyReturnPath Account API calls flipped both to Verified on first poll. |
Root cause
The decision constrained two record placements (DKIM at parent, Return-Path at leaf) that look superficially similar but are different by design. The construct that emitted those records, DnsEmailRecords, had a single subdomain: string parameter that controlled both record paths, conflating the two concerns into one input.

```ts
// Before: one input, both records share its placement.
recordName: `${dkimSelector}._domainkey.${subdomain}`, // DKIM at <subdomain>
recordName: `pm-bounces.${subdomain}`,                 // Return-Path at <subdomain>
```

The construct could not distinguish “DKIM at parent” from “Return-Path at leaf”. It defaulted to the simpler convention (both at the same level), and the caller obeyed.
The root cause is therefore one level deeper than the construct’s API surface: the design decision was prose only. Nowhere in the codebase was DQ-R1-009 expressed as data or a function. The construct cannot honor a decision it has no way to read; it follows its own implicit convention, and the decision is silently violated. With this representation gap, the same shape of defect would recur anywhere else the decision had to be applied (a second Corporate consumer adding its own stack, or a future partition adopting the same pattern).
Why the tests did not detect it
Every existing test verified one side in isolation:
| Test surface | What it checks | Why it missed the seam |
|---|---|---|
| corporate-cli.test.ts (Phase A) | The CLI calls POST /domains and writes context | The body’s Name field was not compared against the design-intent FQDN; only that the call happened. |
| corporate.test.ts | ${sendingSubdomain}.${corporateZoneName} composes to "freekanban.arda.ardamails.com" | Asserts what a From: domain would look like, never what the construct actually publishes in DNS. |
| free-kanban-tool-mail-dns.test.ts (Phase B) | CDK template’s DKIM recordName matches <selector>._domainkey.<subdomain> | The convention is what the construct emits; the assertion mirrors the construct rather than challenging it. Effectively pinned the wrong placement. |
| dns-email-records.test.ts | Construct behavior given its inputs | Same: internal consistency, not a contract against an external truth. |
| ci-corporate-check.js (CI synth gate) | Construct graph synthesises with placeholder context | No Postmark API interaction; cannot see the seam. |
| corporate-drift.ts (pre-fix) | DNS records exist at the FQDN the construct would publish, and match cdk.context.json | Probed the deployed host, compared to local context. Never compared against Postmark’s DKIMPendingHost / DKIMHost. |
| Pre-deploy cdk diff | Template-level differences | The construct’s template was internally correct (matched its own convention). |
The gap was structural: no artifact crossed the seam. No code or test ever asserted “the FQDN at which Postmark expects DKIM (domain.DKIMPendingHost / domain.DKIMHost) must equal the FQDN of the DKIM record CDK publishes”.
Compounding this, two indicators that did succeed (DKIM key byte-match, SPFVerified: true) made the failing indicator (DKIMVerified: false) easy to overlook unless explicitly read.
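
The difference between an assertion that mirrors the construct and one that crosses the seam can be sketched roughly as follows. This is illustrative only: `publishedDkimName`, the selector value, and the shape of the two checks are hypothetical stand-ins rather than the repository's code. Only the host names and the `DKIMPendingHost` field are taken from the incident described above.

```typescript
// Hypothetical stand-in for the pre-fix construct: one input drives the DKIM name.
function publishedDkimName(selector: string, subdomain: string): string {
  return `${selector}._domainkey.${subdomain}`;
}

const selector = "examplepm"; // placeholder selector, not the real one
const leaf = "freekanban.arda.ardamails.com";

// Mirror assertion: restates the construct's own convention, so it cannot fail
// no matter where Postmark is actually looking.
const mirrorPasses =
  publishedDkimName(selector, leaf) === `${selector}._domainkey.${leaf}`;

// Contract assertion: compares the published name against the host Postmark
// reports (the parent, per the verification step in the timeline).
const postmarkDomain = {
  DKIMPendingHost: `${selector}._domainkey.arda.ardamails.com`,
};
const contractPasses =
  publishedDkimName(selector, leaf) === postmarkDomain.DKIMPendingHost;

console.log({ mirrorPasses, contractPasses });
// → { mirrorPasses: true, contractPasses: false }
```

Only the second comparison has an external party on one side of the equality, which is why it is the only one that can surface this defect.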
The fix (PR #450, commit cd85527)
Three structural changes that together close the representation gap and the cross-seam test gap.
(1) DQ-R1-009 expressed as a typed source-of-truth
New code in src/main/cdk/platform/constructs/postmark/sending-domain.ts:

```ts
export interface SendingDomainPlacement {
  readonly fromDomain: string;          // leaf (recipient's From: header)
  readonly postmarkDomainName: string;  // parent (Postmark Sender Signature Name, DQ-R1-009)
  readonly dkimHostName: string;        // parent (host where DKIM TXT lives)
  readonly returnPathHostName: string;  // leaf (host the Return-Path anchors at)
}

export function sendingDomainPlacement(args: {
  sendingSubdomain: string;
  corporateZoneName: string;
}): SendingDomainPlacement { /* … encodes DQ-R1-009 Option B … */ }

export function dkimRecordFqdn(p: SendingDomainPlacement, selector: string): string {
  return `${selector}._domainkey.${p.dkimHostName}`;
}

export function returnPathRecordFqdn(p: SendingDomainPlacement): string {
  return `pm-bounces.${p.returnPathHostName}`;
}
```

The decision is now a value, not prose. Adding a future Corporate consumer requires no per-stack re-derivation; everyone reads the same function.
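
The body of sendingDomainPlacement() is elided in the excerpt above. One way it could encode Option B is sketched below, assuming (from the rest of this page) that the leaf is `<sendingSubdomain>.<corporateZoneName>` and the parent is the Corporate zone name itself. `placementSketch` and the sibling subdomain `other` are hypothetical names used for illustration; this is not the code in PR #450.

```typescript
interface SendingDomainPlacement {
  readonly fromDomain: string;
  readonly postmarkDomainName: string;
  readonly dkimHostName: string;
  readonly returnPathHostName: string;
}

// Hypothetical reconstruction: DKIM and the Sender Signature sit at the parent,
// the From: domain and the Return-Path sit at the leaf.
function placementSketch(args: {
  sendingSubdomain: string;
  corporateZoneName: string;
}): SendingDomainPlacement {
  const parent = args.corporateZoneName;
  const leaf = `${args.sendingSubdomain}.${parent}`;
  return {
    fromDomain: leaf,
    postmarkDomainName: parent,
    dkimHostName: parent,
    returnPathHostName: leaf,
  };
}

const freeKanban = placementSketch({
  sendingSubdomain: "freekanban",
  corporateZoneName: "arda.ardamails.com",
});
const sibling = placementSketch({
  sendingSubdomain: "other", // hypothetical second Corporate consumer
  corporateZoneName: "arda.ardamails.com",
});

console.log(freeKanban.dkimHostName);       // arda.ardamails.com
console.log(freeKanban.returnPathHostName); // freekanban.arda.ardamails.com
// Siblings under the same parent share the DKIM host (inheritance).
console.log(freeKanban.dkimHostName === sibling.dkimHostName); // true
```

Because the function is pure and takes only the two names, a second consumer gets the same parent DKIM host without restating the rule.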
(2) Three consumers refactored
| Consumer | Before | After |
|---|---|---|
| corporate-cli.ts (Phase A) | senderDomainName = corporateZoneName (inline, accidentally correct) | senderDomainName = sendingDomainPlacement(...).postmarkDomainName |
| free-kanban-tool-mail-dns.ts + dns-email-records.ts (Phase B) | subdomain param drove both record paths | DKIM and Return-Path are independent absolute FQDNs supplied by the caller, composed from the placement function |
| corporate-drift.ts (drift check) | Probed FQDNs the construct would have published | Probes FQDNs from the placement function; also asserts Postmark’s reported state matches it |
The DnsEmailRecords construct now takes two absolute FQDNs (dkimRecordFqdn, returnPathRecordFqdn) instead of one shared subdomain. It strips the zone-name suffix internally to produce zone-relative record names, and it throws at synth time if either FQDN is not a sub-domain of the hosted zone, so an upstream placement bug fails loudly rather than producing a malformed record.
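
The suffix-stripping and fail-loudly behavior might look something like the sketch below. `toZoneRelativeName` is a hypothetical helper, and the choice of arda.ardamails.com as the hosted zone in the usage lines is an assumption; the construct's actual implementation may differ.

```typescript
// Hypothetical helper: turn an absolute FQDN into a zone-relative record name,
// throwing if the FQDN does not sit strictly below the hosted zone.
function toZoneRelativeName(fqdn: string, zoneName: string): string {
  // Compare case-insensitively and ignore a trailing root dot.
  const normalise = (s: string): string => s.toLowerCase().replace(/\.$/, "");
  const name = normalise(fqdn);
  const suffix = `.${normalise(zoneName)}`;

  // The leading dot in the suffix stops "evilarda.ardamails.com" from matching.
  const underZone =
    name.length > suffix.length &&
    name.slice(name.length - suffix.length) === suffix;
  if (!underZone) {
    throw new Error(`"${fqdn}" is not a sub-domain of hosted zone "${zoneName}"`);
  }
  return name.slice(0, name.length - suffix.length);
}

const hostedZone = "arda.ardamails.com"; // assumed hosted zone for illustration
console.log(toZoneRelativeName("pm-bounces.freekanban.arda.ardamails.com", hostedZone));
// → pm-bounces.freekanban
console.log(toZoneRelativeName("sel._domainkey.arda.ardamails.com", hostedZone));
// → sel._domainkey
```

Throwing here, rather than passing the name through unchanged, means a placement that points outside the zone stops the synth instead of producing a record with a doubled or malformed name.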
(3) Cross-seam drift assertion
corporate-drift.ts gained three new checks against the Postmark Account API in addition to its existing DNS-vs-context comparison:
- postmark:sender-signature-name:<expected> checks that Postmark’s domain.Name equals placement.postmarkDomainName.
- postmark-dns-agreement:dkim-host checks that Postmark’s DKIMHost / DKIMPendingHost ends with ._domainkey.<placement.dkimHostName>.
- postmark-dns-agreement:return-path-domain checks that Postmark’s ReturnPathDomain equals pm-bounces.<placement.returnPathHostName>.
A future drift between Phase A, Phase B and DNS now surfaces in the monthly run.
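
As a rough sketch, the three checks can be written as a pure function over the placement and the domain object returned by Postmark. The field names (Name, DKIMHost, DKIMPendingHost, ReturnPathDomain) are the ones referenced on this page. `crossSeamChecks`, the result shape, the selector `sel`, and the rule of preferring DKIMHost when it is non-empty are assumptions made for illustration, not the code in corporate-drift.ts.

```typescript
interface Placement {
  postmarkDomainName: string;
  dkimHostName: string;
  returnPathHostName: string;
}

// Subset of the domain fields this page refers to.
interface PostmarkDomain {
  Name: string;
  DKIMHost: string;
  DKIMPendingHost: string;
  ReturnPathDomain: string;
}

interface CheckResult { id: string; ok: boolean; }

function hasSuffix(value: string, suffix: string): boolean {
  return value.length >= suffix.length &&
    value.slice(value.length - suffix.length) === suffix;
}

function crossSeamChecks(p: Placement, d: PostmarkDomain): CheckResult[] {
  // Simplification (assumed): use DKIMHost when set, else the pending host.
  const reportedDkimHost = d.DKIMHost || d.DKIMPendingHost;
  return [
    { id: `postmark:sender-signature-name:${p.postmarkDomainName}`,
      ok: d.Name === p.postmarkDomainName },
    { id: "postmark-dns-agreement:dkim-host",
      ok: hasSuffix(reportedDkimHost, `._domainkey.${p.dkimHostName}`) },
    { id: "postmark-dns-agreement:return-path-domain",
      ok: d.ReturnPathDomain === `pm-bounces.${p.returnPathHostName}` },
  ];
}

const placement: Placement = {
  postmarkDomainName: "arda.ardamails.com",
  dkimHostName: "arda.ardamails.com",
  returnPathHostName: "freekanban.arda.ardamails.com",
};
const healthy: PostmarkDomain = {
  Name: "arda.ardamails.com",
  DKIMHost: "",
  DKIMPendingHost: "sel._domainkey.arda.ardamails.com",
  ReturnPathDomain: "pm-bounces.freekanban.arda.ardamails.com",
};
// Failure case: the Signature was registered at the leaf instead of the parent.
const drifted: PostmarkDomain = { ...healthy, Name: "freekanban.arda.ardamails.com" };

const failing = (rs: CheckResult[]): string[] =>
  rs.filter((r) => !r.ok).map((r) => r.id);

console.log(failing(crossSeamChecks(placement, healthy))); // []
console.log(failing(crossSeamChecks(placement, drifted)));
// → [ 'postmark:sender-signature-name:arda.ardamails.com' ]
```

Keeping the comparison pure, with the API call made elsewhere, lets each failure case be exercised in a unit test without network access.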
What is locked by tests
| Test | Asserts |
|---|---|
| sending-domain.test.ts § “sendingDomainPlacement (DQ-R1-009)” | The four FQDNs against canonical inputs; siblings under arda.ardamails.com share dkimHostName (inheritance); function is pure. |
| corporate-cli.test.ts | POST /domains body Name === sendingDomainPlacement(...).postmarkDomainName. The CLI registers what the function says. |
| free-kanban-tool-mail-dns.test.ts | DKIM record’s Name equals dkimRecordFqdn(placement, selector) (mechanical link from decision to template). Belt-and-braces: DKIM Name must not contain the sending sub-domain. |
| dns-email-records.test.ts | Construct is shape-neutral (caller-supplied FQDN); throws on FQDN-not-under-zone. |
| corporate-drift.test.ts § “cross-seam” | Positive case + three failure cases (wrong Name, wrong DKIM host suffix, wrong Return-Path domain). |
If any consumer drifts away from sendingDomainPlacement() in future, at least one of these tests fails at the construct or at synth time.
Implications and follow-ups
- DQ-R1-009 scope clarified. The parent-verification rule applies within an instance group, not across instance groups. Corporate’s parent is arda.ardamails.com. Phase 4’s per-partition sub-zones (prod.ardamails.com, dev.ardamails.com, etc.) are siblings under the ardamails.com root and will each have their own Sender Signature anchored at their respective sub-zone, with their own DKIM keys, so deliverability reputation is independent per environment. The ardamails.com apex is not a verification target.
  - A new decision will pin Phase 4 Signature granularity (per-partition Signature is the working assumption; per-tenant Signature is a Phase 5b decision). Not blocking Phase 3 completion; flagged for Phase 4 planning.
- Pre-existing principle reinforced. Design decisions that constrain code should be encoded as code (typed values, functions, or assertions) — not only as prose. Prose alone leaves room for consumers to re-derive the same value and disagree. This applies workspace-wide; the corollary is to add cross-seam assertions when two systems must agree on a derived value.
Copyright: © Arda Systems 2025-2026, All rights reserved