Decision Log: Email Integration

Tracks design decisions for the Arda Email Integration project, covering domain structure, sending model, tenant isolation, address handling, and subsystem responsibilities.

| # | Question | Status | Decision | Round |
| --- | --- | --- | --- | --- |
| DQ-001 | Tenant sending domain structure | Decided | `<tenant>.<partition>.{mail-root-domain}` uniformly (see DQ-010) | R1 |
| DQ-002 | Multi-config domain strategy | Decided | Sub-subdomain, deferred to v2+ | R1 |
| DQ-003 | Tenant slug source | Decided | From provisioning request (tenantEId, tenantName, tenantSlug); algorithm deferred | R1 |
| DQ-004 | Reply-To editability | Decided | Not user-editable | R1 |
| DQ-005 | Email order send paths | Decided | Copy-paste (existing) + system send (new) | R1 |
| DQ-006 | CS alerting scope in v1 | Decided | ESP OOTB only; Arda-built is v2+ | R1 |
| DQ-007 | Document generation responsibility | Decided | Calling feature, not email capability | R1 |
| DQ-008 | Send dialog interaction model | Decided | Single-step (no separate confirm) | R1 |
| DQ-009 | Mail root domain choice | Decided | ardamails.com (implementation parametric) | R1 |
| DQ-010 | Prod tenant zone placement | Decided | Own partition zone (prod.{mail-root-domain}), not root zone | R1 |
| DQ-011 | Webhook authentication mechanism | Decided | Bearer token via Postmark modern Webhooks API | R1 |
| DQ-012 | Per-tenant server token storage | Decided | Encrypted in DB (application-level), not Secrets Manager | R1 |
| DQ-013 | IAM role extraction from root stack | Decided | Do not extract; role stays in RootDnsStack | R2 |
| DQ-R1-001 | Drift workflow filename | Decided | external-resources-drift.yml — describes the asserted invariant | R1-Phase1 |
| DQ-R1-002 | Drift-check TypeScript location | Decided | tools/drift-check.ts — operator- and CI-runnable | R1-Phase1 |
| DQ-R1-003 | Operator runbook sign-off mechanism | Decided | Markdown “Operator Sign-off” section with name/date/deviations table | R1-Phase1 |
| DQ-R1-004 | Disposition of legacy parser-gated runbook | Decided | Delete in Phase 1 — no parser gate remains | R1-Phase1 |
| DQ-R1-005 | API-surface freshness cadence | Decided | At first drift-test failure attributable to surface drift, augmented by an annual review | R1-Phase1 |
| DQ-R1-006 | Locus of cross-zone NS-delegation writes | Decided | Child zone owner writes upstream via WriteNSRecordsToUpstreamDns; Root only owns the assume-role target | R1-Phase2 |
| DQ-R1-007 | Vault separation for Free Kanban Tool server token | Decided | Lives in Arda-CorporateOAM (separate vault), not Arda-SystemsOAM | R1-Phase1 |
| DQ-R1-008 | Adopt-vs-create the existing ardamails.com zone | Decided | Adopt via cdk import against Z0721066239FWCD47EJDX; CDK code mirrors the live zone’s AWS-default comment to keep the import read-only | R1-Phase2 |
| DQ-R1-009 | Postmark domain-verification target (parent vs leaf) | Decided | Verify at the Corporate-zone parent (arda.ardamails.com); leaf sub-domains inherit DKIM | R1-Phase3 |
| DQ-R1-010 | Locus of Corporate’s NS-delegation write (same-account) | Decided | Always go through WriteNSRecordsToUpstreamDns and assume the Root role even when same-account; preserves the pattern under future Corporate-account migration | R1-Phase3 |
| DQ-R1-011 | route-53-hosted-zone.ts → dns-zone.ts migration shape | Decided | Rename in place; existing callers updated in the same PR | R1-Phase3 |
| DQ-R1-012 | Corporate drift-workflow filename and scope | Decided | corporate-drift.yml — one workflow per instance group, exercising every asset listed in instances/Corporate/ | R1-Phase3 |
| DQ-R1-013 | Phase A failure ordering for the Postmark server token | Decided | In-memory buffer + retries on the 1Password write; fail loud with redacted summary on permanent failure; manual operator action to recover | R1-Phase3 |
| DQ-R1-014 | cdk.context.json commit policy for Phase A’s outputs | Decided | Commit cdk.context.json — public values only, standard CDK convention, deterministic re-synth on a fresh checkout | R1-Phase3 |
| DQ-R1-015 | DMARC reporting mailbox (rua / ruf) for _dmarc.arda.ardamails.com | Decided | dmarc-reports@arda.cards; operator action to create the mailbox in Arda’s Google Workspace before Phase B deploy | R1-Phase3 |
| DQ-R1-016 | Reserved-name registry scope at arda.ardamails.com | Decided | Documentation-only; corporate-cli.ts enforces locally via a conflict-check at Phase A entry against pre-existing Postmark Sender Signatures, servers, and 1Password items | R1-Phase3 |
| DQ-R1-017 | Postmark Sender Signature granularity per partition | Decided | One Signature per partition sub-zone; leaves inherit DKIM; per-tenant Signatures deferred to Phase 5b | R1-Phase4 |
| DQ-R1-018 | corporate-drift rename and scope | Decided | Keep corporate-drift; add a parallel runtime-platform-drift workflow with shared reusable scripts | R1-Phase4 |
| DQ-R1-019 | Per-partition email server-token encryption key | Decided | Single SM secret per partition with native versioning; two-axis envelope `a{N}.k{SM-VERSION-ID}`; hot-swap via AWSCURRENT+AWSPREVIOUS mounts; lazy + coroutine migration; SDK fallback | R1-Phase4 |
| DQ-R1-020 | DNS-provisioning + SM-fallback IAM roles | Decided | Fresh per-purpose roles assumed via STS from the operations pod role (mirroring the image-asset-bucket preSigningRole pattern); trust policy = account principal + ArnLike on the partition role-name prefix | R1-Phase4 |
| DQ-R1-021 | Order of partition rollout | Decided | dev → stage → demo → prod; kyle excluded (partition suspended) | R1-Phase4 |
| DQ-R1-022 | Operator CLI shape for Phase 4 | Decided | Integrate into amm.sh; extract reusable utilities shared with corporate-cli (no standalone partition-mail-cli) | R1-Phase4 |
| DQ-R1-023 | Per-tenant Postmark Sender Signature introduction (Phase 5b) | Open — TBC at Phase 5b planning | Four options (α status quo / β per-tenant v1 / γ hybrid opt-in / δ remediation-only). No Phase 4 dependency. | R1-Phase5b |

DQ-001: Tenant Sending Domain Structure

Context: Each tenant needs an isolated sending domain for DKIM, SPF, and DMARC. The domain shape affects FQDN length, DNS zone management, and future extensibility. The choice of mail root domain itself is a separate decision (see DQ-009); this decision addresses the structure beneath whatever root is chosen.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | `<tenant>.{mail-root-domain}` (prod), `<tenant>.<partition>.{mail-root-domain}` (non-prod) | Short prod FQDNs but requires prod tenant records in the root zone (cross-account writes, mixed static/dynamic records). |
| B | `<tenant>.<partition>.{mail-root-domain}` uniformly for all partitions | One extra label in prod FQDNs. Consistent structure, clean IAM scoping, root zone stays static. |
| C | Full canonical: `<partition>.<infra>.{mail-root-domain}` per tenant | Consistent with existing Arda pattern but longest FQDNs; tenant identity buried in subdomain hierarchy. |

Recommendation: Option A initially; revised to Option B after DQ-010.

Decision: Option B. Uniform <tenant>.<partition>.{mail-root-domain} across all partitions. The one-label cost in prod is outweighed by consistent zone structure, clean IAM, and a static root zone. See DQ-010 for the detailed rationale.
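As a concrete illustration of the decided shape, a minimal sketch follows; the helper and the partition list are hypothetical, not actual provisioning-service code, and `ardamails.com` is the root chosen in DQ-009.

```ts
// Hypothetical helper illustrating the decided FQDN shape (DQ-001 / DQ-010).
// Names and types are illustrative only.
const MAIL_ROOT_DOMAIN = "ardamails.com"; // {mail-root-domain}, parametric per DQ-009

type Partition = "dev" | "stage" | "demo" | "prod";

function buildSendingDomain(tenantSlug: string, partition: Partition): string {
  // Uniform shape across all partitions: <tenant>.<partition>.{mail-root-domain}
  return `${tenantSlug}.${partition}.${MAIL_ROOT_DOMAIN}`;
}

// buildSendingDomain("acme", "prod") === "acme.prod.ardamails.com"
```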

Applied to:


DQ-002: Multi-Configuration Domain Strategy


Context: A tenant may eventually need multiple email configurations (e.g., separate sending domains for procurement vs. shipping). The v1 domain structure must not block this. Builds on DQ-001 and DQ-010, which fix the canonical Application-Runtime-tenant shape as <tenant-slug>.<partition>.{mail-root-domain}; this decision adds the <conf-slug> label.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Sub-subdomain: `<conf-slug>.<tenant-slug>.<partition>.{mail-root-domain}` | Each config gets an independent DKIM key and reputation. DMARC can apply at tenant level with subdomain policy. DNS hierarchy is explicit and parseable. Adds a label. |
| B | Composite slug: `<conf-slug>-<tenant-slug>.<partition>.{mail-root-domain}` | Flat structure at the conf-tenant boundary, shorter. But: hyphen boundary is ambiguous (parsing fragility), all configs share one DKIM key (defeats the isolation purpose), no per-config DMARC override. |

Recommendation: Option A — sub-subdomain preserves DKIM isolation and DNS hierarchy.

Decision: Option A. v1 provisions at <tenant-slug>.<partition>.{mail-root-domain} (single config, no <conf-slug> label; partition included per DQ-001 / DQ-010). Schema includes nullable config_slug field for v2+. Adding <conf-slug>.<tenant-slug>.<partition>.{mail-root-domain} later is additive — no migration of existing domains. Trade-off noted: if v2+ wants the default config to also live at a sub-subdomain, existing supplier address books would need updating, but this is opt-in, not forced.

Applied to:

  • (No surviving design artefact references this decision; recorded here for traceability.)

DQ-003: Tenant Slug Source

Context: The sending domain uses a tenant slug (`<slug>.{mail-root-domain}`). The slug must be DNS-safe (lowercase alphanumeric + hyphens), validated against reserved words, and permanent (changing it requires DKIM reputation re-warming and supplier address book updates).

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | New field on Tenant entity | Explicit, decoupled from display name. Requires a schema change and UI for CS to set it. |
| B | Derived from tenant name automatically | No new field. But: tenant names may contain spaces/special chars, derivation rules need defining, name changes would create inconsistency. |
| C | Provided by CS at provisioning time as separate input | No schema change on Tenant. Slug stored only in tenant_email_config. But: not visible in tenant management UI, potential for typos. |

Recommendation: Option A — a permanent identifier deserves an explicit field with validation.

Decision: The tenant slug is provided as part of the provisioning request alongside tenantEId and tenantName. The slug and name may be null; the emailConfiguration service determines the final slug using a combination of the three inputs. The specific derivation algorithm is deferred to implementation. The slug is stored on the EmailConfiguration entity, not on the Tenant entity.
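The DNS-safety constraints described in the context can be expressed as a small validation step. A minimal sketch, assuming a placeholder reserved-word list; the derivation algorithm itself remains deferred to implementation:

```ts
// Illustrative only: checks that a candidate slug is usable as a single DNS label.
// The actual reserved-word list and the slug-derivation algorithm are
// implementation-time decisions.
const RESERVED_SLUGS = new Set(["arda", "www", "mail", "postmaster"]); // placeholder values

function isValidTenantSlug(slug: string): boolean {
  // Lowercase alphanumeric + hyphens, no leading/trailing hyphen, max 63 chars.
  const dnsLabel = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/;
  return dnsLabel.test(slug) && !RESERVED_SLUGS.has(slug);
}
```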

Applied to:


DQ-004: Reply-To Editability

Context: When sending an order by email, should the user be able to edit the Reply-To address in the send dialog?

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Editable (To, Cc, Reply-To all editable) | Maximum flexibility. Risk: user sets Reply-To to an address they don’t control; replies go to the wrong person. |
| B | Read-only (To and Cc editable, Reply-To resolved by system) | Controlled. Reply-To is always the procurement contact or the user’s own email. v2+: tenant-configured functional address. |

Recommendation: Option B — Reply-To should be system-controlled to prevent misdirected replies.

Decision: Option B. Reply-To resolved in order: (1) procurement.email from order header, (2) user email from JWT/ApplicationContext. Displayed as read-only in send dialog. v2+: tenant may configure a functional Reply-To (e.g., “procurement inbox”).
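A sketch of that resolution order, using assumed types for the order header and caller context rather than the actual domain model:

```ts
// Illustrative resolution of the read-only Reply-To (DQ-004).
// Types are assumptions for this sketch, not the real entities.
interface OrderHeader {
  procurement?: { email?: string };
}

interface CallerContext {
  userEmail: string; // e.g. resolved from the JWT / ApplicationContext
}

function resolveReplyTo(order: OrderHeader, caller: CallerContext): string {
  // (1) procurement contact on the order header, (2) fall back to the sending user.
  return order.procurement?.email ?? caller.userEmail;
}
```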

Applied to:

  • product/features/general-behaviors/email-communications.md (feature; not yet authored) § Sending Model
  • product/features/procurement/email-orders.md (feature; not yet authored) § Recipient Resolution, Requirements FR-0004
  • product/use-cases/general-behaviors/email-communications.md (use cases; not yet authored) § GEN::EML::0001::0003
  • product/use-cases/procurement/email-orders.md (use cases; not yet authored) § PRO::EML::0001::0004

DQ-005: Email Order Send Paths

Context: Email orders currently use a copy-paste workflow (side panel renders text, user copies to their own client). The new email capability adds system-send. Should copy-paste be removed?

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Replace copy-paste with system send | Simpler UX, one path. But: breaks the existing workflow; users who prefer their own client lose that option. |
| B | Both paths coexist | Backward compatible. Copy-paste preserved for email orders; system send added as a new option. PO orders are system-send only (no existing copy-paste path for PO). |

Recommendation: Option B — backward compatibility with no user disruption.

Decision: Option B. Copy-paste is the existing path that stays as-is. System send is a new parallel path. For orderMethod=PURCHASE_ORDER, only system send is available (PDF attachment requires system involvement).

Applied to:

  • product/features/procurement/email-orders.md (feature; not yet authored) § Overview, Requirements FR-0011, FR-0012
  • product/use-cases/procurement/email-orders.md (use cases; not yet authored) § PRO::EML::0002::0002

DQ-006: CS Alerting Scope in v1

Context: The feature specifies bounce rate > 5% and complaint rate > 0.1% thresholds triggering CS alerts. Should Arda build this alerting in v1?

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Arda-built alerting from day one | Full control, custom thresholds. Engineering cost in v1. |
| B | Rely on ESP’s built-in alerting in v1, Arda-built in v2+ | Postmark provides bounce/complaint alerting OOTB via its console. No engineering cost. Less customizable. |

Recommendation: Option B — Postmark’s console alerting is sufficient for v1 at 100-150 tenants.

Decision: Option B. v1 relies on Postmark’s built-in alerting. Arda-built alerting with configurable thresholds is v2+.

Applied to:

  • product/features/general-behaviors/email-communications.md (feature; not yet authored) § Administration
  • product/use-cases/general-behaviors/email-communications.md (use cases; not yet authored) § GEN::EML::0004::0003

DQ-007: Document Generation Responsibility


Context: For PO-by-email, a PDF must be generated and attached. Should the general email capability generate documents, or receive them pre-generated?

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Email capability generates documents | Centralized, but couples email to the PDF pipeline. Email capability needs to know about order rendering. |
| B | Calling feature generates the document, passes a Blob/URL to the email capability | Clean separation. Email capability is document-agnostic. Calling feature handles generation errors before invoking email. |

Recommendation: Option B — email capability should not know about document types.

Decision: Option B. The calling feature generates the PDF and passes it as a Blob or URL. If generation fails, the calling feature handles the error; email capability is never invoked.

Applied to:

  • product/use-cases/general-behaviors/email-communications.md (use cases; not yet authored) § GEN::EML::0002::0002
  • product/use-cases/procurement/email-orders.md (use cases; not yet authored) § PRO::EML::0003::0002

DQ-008: Send Dialog Interaction Model

Context: Should the send flow have separate “edit addresses” and “confirm send” steps, or a single combined dialog?

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Two steps: address resolution → confirmation dialog | Explicit separation. But: unnecessary friction if defaults are correct — the user clicks through two dialogs to send. |
| B | Single-step dialog with editable fields + preview | One interaction: if defaults are correct, the user just hits “Send.” Cancel with edits prompts for confirmation. |

Recommendation: Option B — minimize friction for the happy path.

Decision: Option B. Single-step send dialog with To/Cc editable, Reply-To read-only, content preview. Cancel prompts if edits were made.

Applied to:

  • product/use-cases/general-behaviors/email-communications.md (use cases; not yet authored) § GEN::EML::0001::0003 (merged from former 0003+0004)
  • product/use-cases/procurement/email-orders.md (use cases; not yet authored) § PRO::EML::0001::0001

DQ-009: Mail Root Domain Choice

Context: All tenant sending domains are subdomains of a root mail domain. The choice of root domain affects reputation separability from the app domain (arda.cards), DNS delegation mechanics, and FQDN length.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | mail.arda.cards (subdomain of app domain) | No new domain registration. Shorter FQDNs if tenants are already familiar with arda.cards. But: shares reputation baseline with arda.cards — a deliverability incident on the app domain could affect mail, and vice versa. NS delegation from GoDaddy apex. |
| B | Standalone domain (e.g., arda-mail.com or similar) | Fully independent reputation from the app domain. Clean separation for compliance or brand reasons. But: requires new domain registration and management. Tenants see an unfamiliar domain. |
| C | Other subdomain of arda.cards (e.g., email.arda.cards, send.arda.cards) | Same trade-offs as Option A with a different label. |

Recommendation: Option B — standalone domain for full reputation separation.

Decision: Option B. ardamails.com (already owned, registered with Route53 in platformRoot account). Implementation must be parametric on the root domain value so it can be changed later if needed. The {mail-root-domain} parameter in infrastructure.md resolves to ardamails.com.

Applied to:

  • infrastructure.md § Parameters (entire document parametrized)
  • All documents using {mail-root-domain} notation

DQ-010: Prod Tenant Zone Placement

Context: The original design (exploration doc, Working Assumption C) placed prod tenant records directly in the root zone ({mail-root-domain}) to achieve shorter prod FQDNs (4 labels: acme.ardamails.com). Non-prod partitions each had their own delegated zone. This creates an asymmetry where the root zone contains both static infrastructure records (SPF, DMARC, NS delegations) and runtime-provisioned tenant records, and the operations service in Alpha001 needs write access to a zone in platformRoot.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Prod tenants in root zone (original) | Shorter prod FQDNs (4 labels). But: root zone mixes static and dynamic records, prod provisioning needs cross-account write access to platformRoot, IAM scoping is more complex, root zone is not CDK-only. |
| B | Prod gets its own partition zone (prod.{mail-root-domain}) | One extra label in prod FQDNs (5 labels: acme.prod.ardamails.com). Uniform structure across all partitions, clean IAM (Alpha001 writes to its own zones), root zone stays static/CDK-only, no cross-account writes for tenant records. |

Recommendation: Option B — consistency, clean IAM boundaries, and a static root zone outweigh one label of FQDN length.

Decision: Option B. All partitions (dev, stage, demo, prod) get their own delegated zone under {mail-root-domain}. The root zone contains only NS delegations and parent SPF/DMARC records — no runtime-provisioned records. This supersedes the “Working Assumption C” FQDN shape from the exploration doc for prod.

Applied to:

  • DQ-001 — revised from Option A to Option B
  • infrastructure.md § DNS (Tenant Domain Shape table, zone tables, IAM scoping)
  • DNS-structure diagram: see mail-dns-structure.drawio.svg in public/assets/diagrams/ (rendered inline in exploration/infrastructure.md § DNS).

DQ-011: Webhook Authentication Mechanism

Context: Postmark sends delivery status events (Delivery, Bounce, SpamComplaint) to a webhook URL on the Arda backend. The endpoint must verify that incoming requests are genuinely from Postmark. Postmark does not sign webhook payloads (no HMAC/signature). Two authentication approaches are available.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | HTTP Basic Auth credentials embedded in the webhook URL | Supported via legacy server-level fields (DeliveryWebhook etc.) and the modern API (HttpAuth field). Credentials appear in URL strings, which may be logged by proxies and access logs. Requires a new credential type separate from existing API auth. |
| B | Bearer token via HttpHeaders on the modern Webhooks API | Configured via POST /webhooks with `HttpHeaders: [{"Name": "Authorization", "Value": "Bearer <token>"}]`. Reuses the existing ARDA_API_KEY validation already implemented in the backend. No credentials in URL strings. Requires the modern Webhooks API (not the legacy server-level fields). |

Recommendation: Option B — reuses existing auth infrastructure, cleaner security posture.

Decision: Option B. Use Bearer token authentication via the modern Postmark Webhooks API. The token can be the same ARDA_API_KEY already used for API authentication, validated by the same backend mechanism. Webhooks are configured per server during provisioning via POST /webhooks (Server Token), not via the legacy server-level URL fields.
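A hedged sketch of the provisioning-time call, built from the POST /webhooks payload shape quoted above. The endpoint URL is illustrative, and field names beyond the quoted HttpHeaders (Url, Triggers, the X-Postmark-Server-Token header) follow Postmark’s documented Webhooks API and should be checked against the current API reference before use.

```ts
// Sketch of configuring a Postmark webhook with Bearer-token auth (DQ-011).
// Endpoint URL and trigger selection are illustrative, not the shipped values.
async function configureDeliveryWebhook(serverToken: string, ardaApiKey: string): Promise<void> {
  const response = await fetch("https://api.postmarkapp.com/webhooks", {
    method: "POST",
    headers: {
      Accept: "application/json",
      "Content-Type": "application/json",
      "X-Postmark-Server-Token": serverToken, // per-server token, not the account token
    },
    body: JSON.stringify({
      Url: "https://api.example.arda.cards/postmark-events", // illustrative endpoint
      HttpHeaders: [{ Name: "Authorization", Value: `Bearer ${ardaApiKey}` }],
      Triggers: {
        Delivery: { Enabled: true },
        Bounce: { Enabled: true, IncludeContent: false },
        SpamComplaint: { Enabled: true, IncludeContent: false },
      },
    }),
  });
  if (!response.ok) {
    throw new Error(`Webhook configuration failed: ${response.status}`);
  }
}
```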

Applied to:

  • postmark-service.md § Webhook Authentication, § Step 5: Configure Webhooks, § Provisioning Sequence, § Legacy vs Modern Webhook API
  • functional.md § postmark-events endpoint, § Tenant Provisioning

DQ-012: Per-Tenant Server Token Storage

Context: Each Postmark server has an API token used at runtime to send email. This token must be stored securely. The operations service follows the ESO pattern where secrets are delivered to the pod at startup via External Secrets Operator, not fetched from Secrets Manager at runtime.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Per-tenant Secrets Manager secrets | Each provisioning creates a new SM secret. ESO would need to sync all per-tenant secrets, or the service would need runtime SM read access (breaking the ESO pattern). IAM write access needed during provisioning. Scales poorly (N secrets per N tenants). |
| B | Encrypted in database (Aurora volume encryption only) | Tokens stored as plaintext columns, encrypted at the storage layer by Aurora’s KMS-backed volume encryption. Sufficient against disk theft, but plaintext to any DB user with SELECT access. No additional key management. |
| C | Encrypted in database (application-level encryption) | Service encrypts tokens with a partition-wide symmetric key before INSERT, decrypts after SELECT. The encryption key is a single static secret delivered via ESO at startup. DB dumps and SQL injection do not expose raw tokens. Key rotation is one key, not N. |

Recommendation: Option C — maintains the ESO pattern (one static secret), eliminates per-tenant SM writes, and adds defense-in-depth beyond Aurora volume encryption.

Decision: Option C. Per-tenant server tokens are encrypted with a partition-wide encryption key and stored in the serverTokenEncrypted column of tenant_email_config. The encryption key is created by CDK in Secrets Manager and delivered to the pod via ESO as extras.email.encryptionKey in HOCON config. Only the emailConfiguration service handles encryption/decryption; the emailJob service calls emailConfiguration.getActiveConfiguration() to receive the decrypted token as an in-memory value.
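A minimal sketch of the application-level envelope described above, using AES-256-GCM from Node’s crypto module as a stand-in. The cipher choice and envelope layout here are assumptions, not the shipped format, and Phase 4’s DQ-R1-019 later layers key versioning on top.

```ts
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Illustrative only: encrypt/decrypt a Postmark server token with the
// partition-wide key delivered via ESO (extras.email.encryptionKey).
// Assumes a 32-byte key; cipher and envelope layout are sketch-level choices.
export function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // iv + auth tag + ciphertext, stored together in the serverTokenEncrypted column.
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

export function decryptToken(envelope: string, key: Buffer): string {
  const raw = Buffer.from(envelope, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```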

Applied to:

  • infrastructure.md § AWS Secrets Manager (SM-3, SM-4 added; per-tenant SM writes removed; IAM-3 removed)
  • functional.md § Email Configuration (Secret Storage section, internal service method)
  • postmark-service.md § Authentication, § Step 6, § Provisioning Sequence
  • architectural-scenarios.md § Scenario 1 (encrypt + persist), § Scenario 2 (getActiveConfiguration with decrypted token)

Round 2: Infrastructure Implementation Decisions


DQ-013: IAM Role Extraction from Root Stack


Context: The root CDK stack (RootConfiguration) contains both DNS hosted zones and the AllowCreatingNSRecordsRole IAM role used for cross-account NS delegation. As this project renames the stack class to RootDnsStack and adds new stacks to the root application, the question is whether to extract the IAM role to a dedicated RootSecurityStack for cleaner separation of concerns.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Extract role to RootSecurityStack via two-step deploy | Cleaner separation. But: requires a two-step deploy due to IAM physical-name collision. Creates a 2-5 minute window where the role doesn’t exist. If step 2 fails, the role is gone until manually recreated. |
| B | Extract role to RootSecurityStack via CloudFormation stack refactoring | Cleaner separation. No danger window — the role transfers ownership atomically without delete/recreate. But: requires a manual CloudFormation operation outside the CDK workflow, followed by CDK code realignment. Relatively new AWS feature; should be tested in non-production first. |
| C | Keep role in RootDnsStack | No migration work. Role is conceptually tied to DNS delegation (it enables writing NS records). A RootSecurityStack with a single resource doesn’t justify the extra work in this project’s scope. |

Recommendation: Option C for this project. Option B is the viable future path when extraction is justified.

Decision: Option C. The AllowCreatingNSRecordsRole stays in RootDnsStack (CloudFormation name: RootConfiguration). The role is functionally tied to DNS delegation and is acceptable in the DNS stack. Extraction is known to be operationally safe via CloudFormation stack refactoring (Option B), but adds complexity for no immediate functional benefit. When a RootSecurityStack is needed for additional security resources, use stack refactoring to move the role atomically. See root-refactor-analysis.md for the full analysis.

Applied to:

  • infrastructure/root-refactor-analysis.md (full analysis)
  • infrastructure/specification.md § Task 3 (root stack rename only, no role extraction)
  • infrastructure/analysis.md § Root Configuration

Round R1-Phase1: External Resources Provisioning Decisions


DQ-R1-001 through DQ-R1-005 resolve the Open Questions in 1-external-resources/specification.md § 5. DQ-R1-007 is an additional Phase 1 decision captured in the same round (vault separation for the Free Kanban Tool server token). All entries follow the DQ-R1-NNN convention introduced in architecture-overview.md § 10.

DQ-R1-001: Drift Workflow Filename

Context: The CI workflow that asserts the live external-resource invariants needs a stable filename. Phase 1 originally raised three candidates (external-resources-drift.yml, phase-1-drift.yml, op-drift.yml).

Decision: external-resources-drift.yml. The filename describes the invariant asserted (drift of the external resources Arda consumes), not the phase that introduced the workflow. This keeps the filename stable across phases as the workflow evolves.

Applied to:


DQ-R1-002: Drift-Check TypeScript Module Location


Context: The drift-check module is dual-purpose: an operator runs it locally with 1Password DesktopAuth; CI runs it with OP_SERVICE_ACCOUNT_TOKEN. Two candidate locations existed in the infrastructure repo: scripts/drift-check.ts (alongside legacy script utilities) or tools/drift-check.ts (a fresh top-level convention).

Decision: tools/drift-check.ts. The module is operator-runnable in addition to CI-runnable, and the tools/ convention better matches the dual-purpose nature than scripts/ (which the prior implementation largely used for one-shot orchestrators). The tools/ convention is forward-compatible with the eventual move of scripts/gha-secrets/ to tools/gha-secret.ts (out of scope of this project but on the trajectory).

Applied to:


DQ-R1-003: Operator Runbook Sign-Off Mechanism


Context: REQ-OPS-003 requires the runbook to capture sign-off (operator name, date, deviations) so the document is itself the audit record. Three encoding options were considered: a code block, a YAML frontmatter field, or a designated Markdown section with a small table.

Decision: A designated ## Operator Sign-Off section containing a Markdown table with columns Step / Operator / Date / Deviations / Notes, with one pre-populated empty row per REQ-EXT-NNN. The table is human-readable, diff-friendly under git, and does not require new tooling. YAML frontmatter would conflict with Starlight’s required frontmatter schema and would not naturally express per-step rows.

Applied to:


DQ-R1-004: Disposition of Legacy Parser-Gated Runbook


Context: The prior Phase-0 implementation maintained a parser-gated operator runbook (HUMAN-STEPS.md) under infrastructure/scripts/postmark-foundations/, whose state was enforced by a TypeScript parser as a CI gate. REQ-OPS-004 retires the parser gate entirely; the runbook in the documentation repo becomes the canonical operator artefact.

Decision: Delete the parser-gated runbook and its parser code in Phase 1, gated on the canonical runbook (current-system/oam/postmark-service/operator-runbook.md) being merged. Two-step ordering preserves operator availability during the cut-over: docs land first, then the legacy artefact is removed in the infrastructure PR (T-C6 in the task plan).

Applied to:


DQ-R1-005: API-Surface Freshness Cadence

Context: The API observations note (postmark-api-observations.md) records observed Postmark API behaviour. Surface drift (Postmark adding/changing endpoints) would invalidate parts of the note. The question is when to refresh: annually, on every Postmark major-update post, or on first drift-test failure attributable to surface drift.

Decision: Refresh on first drift-test failure attributable to surface drift, augmented by an annual review. A scheduled-only cadence (annual) without the failure trigger would let regressions sit unnoticed for up to a year; a per-update cadence would create unnecessary documentation churn since most Postmark updates do not affect the small surface Arda uses. The combination keeps the note current where it matters and bounds staleness.

Applied to:

  • current-system/oam/postmark-service/postmark-api-observations.md (Phase 1 deliverable; freshness cadence noted in version-pin section)

DQ-R1-007: Vault Separation for Free Kanban Tool Server Token


Context: The Free Kanban Tool sends transactional email from freekanban.arda.ardamails.com. Its Postmark server token is the runtime sending credential — a leak yields the ability to send arbitrary email under that domain. The original cross-cutting design placed this item in Arda-SystemsOAM alongside the OAM-tier credentials (Postmark account tokens, IAC service-account tokens). OP_SERVICE_ACCOUNT_TOKEN — the GitHub Actions secret that authenticates CI to 1Password — is scoped read-only to Arda-SystemsOAM. So the Free Kanban server token sat in the same blast radius as every other OAM credential, contradicting the bounded-blast-radius framing in cross-cutting-design.md § 2.5.

Discovered: 2026-05-05, during the Phase 1 operator-walkthrough preparation. Re-running tools/drift-check.ts locally surfaced the placement and prompted a re-evaluation of vault scoping for runtime credentials.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Keep the item in Arda-SystemsOAM. | One vault to manage. But: Free Kanban Tool’s runtime credential is reachable by OP_SERVICE_ACCOUNT_TOKEN, which expands the blast radius of any CI compromise to include the live sending key. |
| B | Move the item to a dedicated Arda-CorporateOAM vault. The Free Kanban Tool’s runtime resolves the credential via its own SDK auth path; OP_SERVICE_ACCOUNT_TOKEN does not have read access to this vault. | One additional vault to provision. The Free Kanban server token is now isolated from the OAM-tier credentials; a CI / OP_SERVICE_ACCOUNT_TOKEN compromise does not yield it. Matches the rev1 design intent: deploy-time / OAM credentials in Arda-SystemsOAM; runtime sending credentials in instance-group-scoped vaults. |

Recommendation: Option B — bounded blast radius outweighs the single-vault simplicity.

Decision: Option B. The Free Kanban Tool’s Postmark server token lives at:

| Field | Value |
| --- | --- |
| Vault | Arda-CorporateOAM |
| Item title | Free-Kanban-Generator-Postmark-Server |
| Field | credential |
| Canonical reference | op://Arda-CorporateOAM/Free-Kanban-Generator-Postmark-Server/credential |

The vault was provisioned 2026-05-05 (operator action by Miguel). The 1Password item itself is created by Phase 3 (Corporate CLI Phase A writes the Postmark server token into the item the first time it runs). Phase 1 does not create or assert the existence of this item.

This decision establishes a vault-naming convention that future instance groups follow: Arda-<InstanceGroup>OAM for runtime sending credentials owned by that instance group. The existing partition-scoped vaults (Arda-DevOAM, Arda-StageOAM, Arda-DemoOAM, Arda-ProdOAM, Arda-SandboxKyle) already follow this pattern; Arda-CorporateOAM extends it to the new Corporate Resource Group.

Clarification on item naming within partition vaults. The Arda-SystemsOAM vault holds both Postmark accounts (Postmark-Prod and Postmark-NonProd) and therefore uses qualified item names (the account suffix disambiguates the two within the single vault). In contrast, each per-partition vault (Arda-DevOAM, Arda-ProdOAM, etc.) holds only one Postmark account reference — the one relevant to that partition — so the service-name-only item title Postmark is used (the vault name itself carries the environment). This follows the workspace CLAUDE.md 1Password vault convention: vaults are scoped by usage; store independently even when the value is currently shared.

Consequences:

  • Phase 1: the typed reference FREE_KANBAN_POSTMARK_ITEM is removed from infrastructure/src/main/cdk/platform/one-password.ts. Phase 1 declares only the three items it creates (Postmark-Prod, Postmark-NonProd, IAC-SCRIPTS Service Account Token). tools/drift-check.ts and the Phase 1 V-PLAT-002 test surface shrink correspondingly.
  • Phase 3: Corporate Updates (re)introduces the typed reference with the new vault, item title, and field. Phase 3’s spec explicitly enumerates the SDK auth path the Free Kanban Tool’s runtime uses to read the credential (out of scope of this project’s IaC, but documented for the Free Kanban Tool team).
  • Threat model: cross-cutting-design.md § 2.1 line 39 (“attacker holding OP_SERVICE_ACCOUNT_TOKEN reads every credential reachable from Arda-SystemsOAM”) remains true; the Free Kanban server token is no longer in that set. § 2.5 is updated to explicitly call out the vault-separation guarantee.

Applied to:


Round R1-Phase2: Root Updates Decisions

This round captures decisions made while planning Phase 2 — Root Updates.

DQ-R1-006: Locus of Cross-Zone NS-Delegation Writes


Context: The Root account owns the ardamails.com mail-root zone (Phase 2 introduces it) and the four arda.cards family zones. Child zones (arda.ardamails.com for Corporate in Phase 3; {partition}.ardamails.com per partition in Phase 4) need NS-delegation records in the parent zone. The question is which stack writes those NS records:

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Root stack writes the per-child NS record set. The parent stack reads each child zone’s hostedZoneNameServers via cross-stack import or live API lookup and writes the NS record into the parent zone. | Centralises NS records in one stack. But: creates a Phase-2-on-Phase-3 (and Phase-2-on-Phase-4) deploy-order dependency; Root cannot complete its NS-delegation writes until every child zone has been provisioned. Inverts the natural “owner of a zone owns its delegation” intuition. |
| B | Child stack writes its own NS record into the parent zone using a cross-account assume-role pattern. Root owns only the assume-role IAM target (AllowCreatingNSRecordsRole); each child stack instantiates a WriteNSRecordsToUpstreamDns construct that runs a Lambda + Custom Resource in the child account, assumes the Root role, and writes the parent NS record. | Matches the existing arda.cards family pattern (every partition’s IngressStack already writes its own NS records into Root’s arda.cards family zones). Phase 2 is fully self-contained; Phase 3 / Phase 4 depend on Phase 2 only for the role and the parent-zone existence. Slightly more constructs per child stack, but the constructs already exist. |

Recommendation: Option B — consistency with the existing pattern, clean dependency direction, no joint-deploy requirements between phases.

Decision: Option B. The WriteNSRecordsToUpstreamDns construct (at src/main/cdk/constructs/xgress/write-ns-records-to-upstream-dns.ts) is owned and instantiated by the child zone stack. It internally creates a Lambda execution role in the child account, a NodejsFunction from constructs/inline-lambdas/write-platform-root-ns-record.ts, and a cdk.CustomResource that on stack lifecycle events assumes the Root role (AllowCreatingNSRecordsRole, deterministic name from aws-configuration.ALLOW_WRITE_NS_RECORDS_ROLE.name) and writes / updates / deletes the NS record set in the parent zone. The child zone’s own hostedZoneNameServers token is passed in as the nameServers property — no live cross-zone lookup is required.
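A sketch of how a child zone stack would instantiate the construct, using the property names mentioned in this entry and in DQ-R1-010; the import path, account-ID constant, and exact props interface are assumptions, and the construct’s own source is authoritative.

```ts
import * as cdk from "aws-cdk-lib";
import * as r53 from "aws-cdk-lib/aws-route53";
import { Construct } from "constructs";
// Path per this entry; the relative form here is illustrative.
import { WriteNSRecordsToUpstreamDns } from "../constructs/xgress/write-ns-records-to-upstream-dns";

const PLATFORM_ROOT_ACCOUNT_ID = "111111111111"; // placeholder for the Root account ID

// Sketch of a child zone stack (e.g. Phase 3's Corporate mail DNS stack).
export class CorporateMailDnsSketch extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const ardaZone = new r53.PublicHostedZone(this, "CorporateMailZone", {
      zoneName: "arda.ardamails.com",
    });

    // Child zone owner writes its own NS delegation upstream (DQ-R1-006);
    // the assume-role hop is taken even in the same-account case (DQ-R1-010).
    new WriteNSRecordsToUpstreamDns(this, "DelegateToArdamails", {
      subdomain: "arda",
      nameServers: ardaZone.hostedZoneNameServers!, // child zone's own NS set, no live lookup
      targetAccountId: PLATFORM_ROOT_ACCOUNT_ID,
    });
  }
}
```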

Consequences:

  • Phase 2 does not write NS records for any child zone. Its scope is limited to: renaming the existing app/stack, declaring the ardamails.com zone, exporting the zone ID and the IAM role ARN, and adding the instances/Root/dns.ts declarative configuration.
  • Phase 3 (Corporate) instantiates WriteNSRecordsToUpstreamDns against the ardamails.com zone with subdomain: "arda" and nameServers: arda.hostedZoneNameServers.
  • Phase 4 (per-partition) does the same, once per partition, with subdomain: "<partition>" and nameServers: <partition>Zone.hostedZoneNameServers.
  • Phase 2 → Phase 3 / Phase 4 dependency reduces to deploy order (Root must deploy first because the child stacks’ lambdas assume the Root role at deploy time).

Applied to:

  • 2-root-updates/specification.md — Phase 2 scope explicitly excludes NS-delegation writes.
  • phases.md § Phase 2 — deliverables list updated; the “NS-delegation for arda.ardamails.com” row replaced with the ardamails.com zone declaration.
  • phases.md § Phase 3 — Corporate Email stack deliverable extended to mention the WriteNSRecordsToUpstreamDns instantiation.

DQ-R1-008: Adopt vs. Create the existing ardamails.com Hosted Zone


Context: When cdk diff was run against the deployed RootConfiguration stack to validate the Phase 2 implementation (Gate 3), it surfaced an additive-only result — as expected by design. But a separate AWS investigation (motivated by an offhand challenge from the operator: “is the zone already there?”) revealed that the ardamails.com hosted zone already existed in the Root account as Z0721066239FWCD47EJDX, with two records (apex NS and SOA) and the four AWS-assigned nameservers (ns-2046.awsdns-63.co.uk, ns-944.awsdns-54.net, ns-158.awsdns-19.com, ns-1497.awsdns-59.org). The zone was auto-created by AWS Route53 Domains when the ardamails.com domain was originally registered through the registrar service.

The original Phase 2 implementation declared a brand-new r53.PublicHostedZone(this, "ArdamailsZone", {...}). Deploying as written would have created a second hosted zone for ardamails.com with a different NS set; the registrar would still have pointed at the original four nameservers, so the new zone would have been orphaned at the DNS level. The deploy-as-coded path was unsafe.

Discovered: 2026-05-05, after Gate 3 cleared. The cdk diff against the deployed stack also reported additive-only, because no hosted zone was present in the deployed CloudFormation template and the synthesized template simply added one — the diff could not see the duplication risk, since the duplicate resource exists in Route53 but not in CloudFormation.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | cdk import the existing zone into RootConfiguration (logical ID ArdamailsZone1DCDDC15). Zone becomes CDK/CFN-managed; no duplicate created; registrar’s NS chain preserved. | One-time operator action; zone properties must match the import target exactly; CFN’s IMPORT change-set type doesn’t allow Output additions or other resource modifications, so the deploy is two-phase (import-only template, then full deploy). |
| B | Reference the existing zone via r53.HostedZone.fromHostedZoneAttributes() and export its ID via CfnOutput without trying to manage it. | Zone stays outside CDK control; future record additions (root-level SPF, DMARC) require ad-hoc tooling. Doesn’t match the “Phase 2 declares the ardamails.com zone” intent in phases.md. |

Recommendation: Option A.

Decision: Option A. The CDK code at src/main/cdk/stacks/root/root-dns-stack.ts was extended in two ways:

  1. The ArdamailsZone declaration now sets comment: "HostedZone created by Route53 Registrar" — the AWS-default comment string on the live zone — so the IMPORT change-set is read-only (CFN reports Scope: [], no property writes).
  2. applyRemovalPolicy(cdk.RemovalPolicy.RETAIN) defends the imported zone against accidental cdk destroy of the production root stack.

The root-dns-stack.test.ts file’s V-ROOT-001 was extended with a strict-equality assertion that locks the synthesized resource block to the live zone’s properties (Name, HostedZoneConfig.Comment) plus the RETAIN retention policies. Future CDK code changes that drift from the import target fail at test time.

The deployment proceeded in two CFN operations:

  1. IMPORT change-set with a stripped template (deployed-state + just the ArdamailsZone resource added; no Outputs added, no other resource modifications). Executed cleanly: Action: Import, Replacement: null, Scope: []. Stack transitioned to IMPORT_COMPLETE.
  2. Normal cdk deploy with the full synthesized template, adding the ardamailsZone Output (publishing the arda-ardamails-zone CFN export) and reconciling CDKMetadata. Stack transitioned to UPDATE_COMPLETE. Final cdk diff reported zero differences.

Forward implications:

  • Phase 3’s arda.ardamails.com zone is created fresh by the Corporate Email stack (no pre-existing zone in Route53); no IMPORT detour needed.
  • Phase 4’s per-partition {partition}.ardamails.com zones are created fresh in each partition’s AWS account (no pre-existing zone); no IMPORT detour needed.
  • Future zone-creation work in this project follows the standard cdk deploy flow.

Applied to:

  • infrastructure/src/main/cdk/stacks/root/root-dns-stack.ts — comment + retention policy on ArdamailsZone.
  • infrastructure/src/main/cdk/stacks/root/root-dns-stack.test.ts — V-ROOT-001 strict-match.
  • infrastructure/CHANGELOG.md [2.29.0] — Added entry refined to mention the import.
  • 2-root-updates/implementation/learnings.md, alternatives.md, skipped.md — project-completion byproducts.

Round R1-Phase3: Corporate Updates Decisions


This round captures decisions made while planning Phase 3 — Corporate Updates. All decisions resolved during the Pass-1 analysis (3-corporate-updates/analysis.md) on 2026-05-06.

DQ-R1-009: Postmark Domain-Verification Target (Parent vs Leaf)


Context: The Free Kanban Tool sends from freekanban.arda.ardamails.com. Postmark verifies sending domains via DKIM + Return-Path records published in DNS. The verification can target either the leaf sub-domain (freekanban.arda.ardamails.com) or the Corporate-zone parent (arda.ardamails.com). Verifying the parent makes leaves inherit DKIM through the parent’s signing key, removing the need for a per-leaf verification click as future Corporate consumers (HubSpot, marketing) are added.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Verify each leaf sub-domain individually as it is created. | Simple per-leaf isolation; failure of one leaf’s DKIM doesn’t affect siblings. But: each new Corporate consumer requires its own verification click (or API call) and its own DKIM rotation runbook. |
| B | Verify once at the Corporate-zone parent (arda.ardamails.com); leaves inherit DKIM via the parent’s signing key. | One verification step covers every current and future leaf under arda.ardamails.com. Single DKIM-key rotation runbook. Aligns with Postmark’s parent-domain verification semantics. |

Recommendation: Option B — parent verification. Pre-decided 2026-05-05 during the Phase 1 operator-walkthrough preparation.

Decision: Option B. Phase 3’s PostmarkSendingDomain thin-wrapper registers arda.ardamails.com as the Sender Signature in PostmarkProd. The Corporate CLI invokes verifyDkim and verifyReturnPath against this parent. Leaf sub-domains (freekanban.arda.ardamails.com, future siblings) do not receive their own Sender Signature.

Applied to:

  • 3-corporate-updates/analysis.md § “Note on what becomes ‘known to Postmark’” and gaps G-1, G-7, G-8.
  • operator-domain-verification-checklist.md — the stub already pointed at this decision; the just-in-time expansion at implementation time formalizes the verification target.
  • Phase 3 specification (Pass 2) — the PostmarkSendingDomain configuration is arda.ardamails.com, not the leaf.

Implementation note (added post-Phase-3): The first implementation pass diverged from this decision — Phase A’s CLI honored it by accident while the CDK construct silently placed the DKIM TXT under the leaf sub-domain. Surfaced by Phase B post-deploy verification when Postmark’s DKIMPendingHost did not match the deployed FQDN. The root cause was that the decision was prose-only (this entry, a docstring, a runbook) with no value or function any code consumed. Resolved by Arda-cards/infrastructure PR #450 commit cd85527: a typed source-of-truth sendingDomainPlacement() function in platform/constructs/postmark/sending-domain.ts is now consumed identically by the CLI, the CDK construct, and the drift check; cross-seam assertions in tools/corporate-drift.ts verify Postmark’s reported state agrees with the placement function. Full narrative at 3-corporate-updates/implementation/dqr1009-divergence.md; the structural lesson is captured in 3-corporate-updates/implementation/learnings.md L-1.
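The shape of that shared source of truth is roughly as follows; the return type and field names here are assumptions for illustration, and the real signature lives in platform/constructs/postmark/sending-domain.ts.

```ts
// Illustrative shape only — the actual sendingDomainPlacement() in
// platform/constructs/postmark/sending-domain.ts is the source of truth.
interface SendingDomainPlacement {
  /** Domain registered as the Postmark Sender Signature (verification target). */
  senderSignatureDomain: string;
  /** Zone in which the DKIM / Return-Path DNS records are written. */
  dnsRecordZone: string;
}

export function sendingDomainPlacement(): SendingDomainPlacement {
  // DQ-R1-009: verify at the Corporate-zone parent; leaves inherit DKIM.
  return {
    senderSignatureDomain: "arda.ardamails.com",
    dnsRecordZone: "arda.ardamails.com",
  };
}
```

The structural point is that the CLI, the CDK construct, and the drift check all consume the same typed value instead of each re-deriving the placement from prose.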

The scope of this decision is the Corporate instance group: verification at the Corporate-zone parent (arda.ardamails.com); leaves under it inherit. Phase 4’s per-partition Sender Signatures apply the same “verify at the instance-group parent” pattern at their own level ({partition}.ardamails.com), with each partition having its own DKIM key for receiver-side reputation isolation. The ardamails.com apex is not a verification target. The Phase 4 granularity decision is pinned in DQ-R1-017 (Round R1-Phase4).


DQ-R1-010: Locus of Corporate’s NS-Delegation Write (Same-Account Case)


Context: DQ-R1-006 settled that the child zone owner writes the NS-delegation record upstream into the parent zone. The construct (WriteNSRecordsToUpstreamDns) was designed for the cross-account case where Application-Runtime partitions (in Alpha001 / Alpha002) write into Root’s ardamails.com zone (in platformRoot). For Phase 3, Corporate currently lives in platformRoot — the same account as Root. The question is whether Corporate’s stack still uses the assume-role construct or writes Route53 directly.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Always go through WriteNSRecordsToUpstreamDns and assume the role even when same-account; preserves the pattern uniformly across instance groups. | One extra STS AssumeRole call per deploy (~tens of milliseconds, negligible). Construct behavior is invariant under the future Corporate-account migration (architecture-overview § 6.4). DQ-R1-006’s “child writes upstream” intent is preserved. |
| B | Branch the construct so same-account writes skip the assume-role hop (direct Route53 write). | Slightly faster deploy; no STS call. But: introduces a same-account vs cross-account branch in the construct, expanding the test surface and creating a behavior change at the future Corporate-account migration moment. |
| C | Write the NS record from Root’s stack instead (revisits DQ-R1-006 for this case). | Simpler in the same-account case. But: re-opens DQ-R1-006 and breaks the “child owns its delegation” invariant. |

Recommendation: Option A — uniform pattern.

Decision: Option A. Phase 3’s CorporateMailDns stack instantiates WriteNSRecordsToUpstreamDns exactly as a partition would, with targetAccountId set to platformRoot’s account ID. The assume-role hop fires; the role grants ChangeResourceRecordSets on ardamails.com (the only zone the role’s allowedParentHostedZoneIds whitelists). The construct’s behavior is identical between the same-account (Phase 3 today) and cross-account (future Corporate-account migration) cases.

Applied to:


DQ-R1-011: route-53-hosted-zone.ts → dns-zone.ts Migration Shape


Context: The existing constructs/xgress/route-53-hosted-zone.ts is the arda.cards-shaped hosted-zone construct (its overrideDomainName defaults to arda.cards). Phase 3 needs a generalized DnsZone construct that supports any registrable domain (ardamails.com, arda.ardamails.com, future). Two construct names cannot survive long-term; the question is the migration shape.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Rename in place: dns-zone.ts replaces route-53-hosted-zone.ts; existing callers updated in the same PR. | One PR, contained blast radius. The repo’s validateProps discipline catches missed callers at synth time. |
| B | Coexist for a transition window: dns-zone.ts is added; route-53-hosted-zone.ts becomes a thin re-export with a deprecation notice; a follow-up PR removes the old name. | Smaller per-PR diff, easier review. But: two PRs land in sequence; the deprecation alias outlives any actual deprecation period. |
| C | Leave the old construct, add the new one; the old continues to serve arda.cards-family callers. | No caller migration. But: construct sprawl — two near-identical constructs co-exist indefinitely. |

Recommendation: Option A — rename in place.

Decision: Option A. The construct is renamed in the same Phase 3 PR; validateProps catches missed callers at synth, which is exercised by the repo’s CDK matrix in CI.

Applied to:


DQ-R1-012: Corporate Drift-Workflow Filename and Scope


Context: Phase 1 added external-resources-drift.yml (one workflow that exercises every external resource the platform consumes). Phase 3 introduces the first Corporate asset (Free Kanban Tool); future Corporate assets (HubSpot, marketing-site) follow. The question is whether to scope the drift workflow per asset or per instance group.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | corporate-free-kanban-tool.yml (asset-specific, one workflow per asset). | One failure isolates to one workflow run. But: workflow file count grows linearly with Corporate assets; each new asset requires a new workflow file. |
| B | corporate-drift.yml (instance-group-scoped, one workflow that exercises every Corporate asset). | Workflow count proportional to instance groups, not assets. The driver script enumerates instances/Corporate/ and exercises each. New Corporate assets are picked up automatically. |
| C | `<asset>-drift.yml` per asset with a shared tools/corporate-drift.ts driver. | Combines the worst of A and B. |

Recommendation: Option B — instance-group-scoped.

Decision: Option B. The workflow file is corporate-drift.yml. The driver enumerates instances/Corporate/ and exercises each asset’s Postmark server, DNS records, and 1Password item. Failures open one issue per failed asset (label includes the asset name).
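A sketch of the enumeration step the driver performs; the directory layout, the per-asset check, and the failure handling shown here are assumptions, and tools/corporate-drift.ts defines the real contract.

```ts
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Illustrative enumeration only — the real driver in tools/corporate-drift.ts
// owns the asset contract and the actual checks.
const CORPORATE_DIR = join("instances", "Corporate");

async function checkAsset(assetDir: string): Promise<void> {
  // Placeholder: the real driver asserts the asset's Postmark server,
  // DNS records, and 1Password item against the declared configuration.
}

export async function runCorporateDriftChecks(): Promise<void> {
  const failures: string[] = [];
  for (const asset of readdirSync(CORPORATE_DIR)) {
    try {
      await checkAsset(join(CORPORATE_DIR, asset));
    } catch (err) {
      failures.push(`${asset}: ${(err as Error).message}`);
    }
  }
  if (failures.length > 0) {
    // The workflow opens one issue per failed asset; here the run just fails.
    throw new Error(`Drift detected:\n${failures.join("\n")}`);
  }
}
```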

Applied to:


DQ-R1-013: Phase A Failure Ordering for the Postmark Server Token


Context: Phase A of the Corporate CLI creates a Postmark server (which yields the Server API token), writes the token to 1Password, and writes public values to cdk.context.json. Postmark’s API surfaces the token once at server creation; it cannot be re-retrieved. If the 1Password write fails after the server is created, the token is unrecoverable from Postmark’s side.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Write to 1P first, then cdk.context.json; on 1P-write failure, roll back by calling Postmark’s delete-server API. | Atomic-looking. But: delete-server is a destructive operation that runs against the live Postmark account; a botched rollback (e.g., after partial state was already created) destroys observable history. The rollback path is harder to test than the forward path. |
| B | Persist the token to a process-local secret-handling buffer immediately on receipt; write to 1P with retries (exponential backoff, finite). Fail loud on permanent 1P-write failure with the buffer’s redacted summary; manual operator action to recover. | Token is never persisted outside 1P. The 1P-write failure surfaces clearly with a redacted alert; the operator pastes the buffer summary into 1P via DesktopAuth or chooses to call delete-server deliberately as recovery. The forward path is the only tested path. |

Recommendation: Option B — buffer + retries.

Decision: Option B. The Corporate CLI implements a process-local secret buffer for the freshly issued server token. The 1P write retries up to N times with exponential backoff (defaults TBD by implementer; the spec lists the parameter). On exhaustion, the CLI exits with a clearly redacted summary that allows the operator to either manually paste the token into 1P (DesktopAuth) or invoke delete-server to reset. cdk.context.json is written after the 1P write succeeds; a 1P-write failure leaves cdk.context.json untouched.
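A sketch of that forward path with bounded retries. The retry defaults, the 1Password write call, and the redaction wording are placeholders; the spec leaves the actual parameters to the implementer.

```ts
// Illustrative forward path for DQ-R1-013. writeTo1Password and the retry
// defaults are placeholders; the real CLI defines both.
async function persistServerToken(
  token: string,
  writeTo1Password: (token: string) => Promise<void>,
  maxAttempts = 5,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await writeTo1Password(token); // 1Password is the only durable destination
      return;
    } catch (err) {
      if (attempt === maxAttempts) {
        // Fail loud with a redacted summary; the operator recovers manually
        // (DesktopAuth paste into 1P, or a deliberate delete-server call).
        throw new Error(
          `1Password write failed after ${maxAttempts} attempts; ` +
            `token withheld from logs (held only in the process-local buffer). ` +
            `Last error: ${(err as Error).message}`,
        );
      }
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 1000)); // exponential backoff
    }
  }
}
```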

Applied to:


DQ-R1-014: cdk.context.json Commit Policy for Phase A’s Outputs


Context: Phase A writes postmark.free-kanban.serverId, .dkimSelector, .dkimKey, .returnPathTarget into cdk.context.json. These are public values (DKIM selector and key are published in DNS; serverId is non-sensitive). Standard CDK practice is to commit cdk.context.json so synth is deterministic on a fresh checkout.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Commit cdk.context.json — standard CDK practice; deterministic re-synth on a fresh checkout. | New developers / CI checkouts can cdk synth without re-running Phase A. The values are public and DNS-published; no leak surface. |
| B | Local-only with .gitignore; CI re-runs Phase A to repopulate. | Eliminates the commit-of-generated-values pattern. But: re-running Phase A in CI requires Postmark Account API credentials in CI’s environment, which is the opposite of the design intent (only OP_SERVICE_ACCOUNT_TOKEN should be in CI; everything else is resolved at runtime via the SDK). |
| C | Commit, but exclude the postmark.* keys via a custom serializer. | Adds tooling complexity for no benefit; the public values are not sensitive. |

Recommendation: Option A — commit.

Decision: Option A. cdk.context.json is committed to the repo with the postmark.free-kanban.* keys populated by Phase A. The keys are DNS-public; commit is safe. Re-running Phase A is idempotent and updates the file when Postmark issues a new value (e.g., a DKIM-key rotation).

Applied to:


DQ-R1-015: DMARC Reporting Mailbox (rua / ruf) for _dmarc.arda.ardamails.com


Context: The DMARC record at _dmarc.arda.ardamails.com (per architecture-overview § 5.2) has an initial monitoring policy of p=quarantine; sp=quarantine. The aggregate-report destination (rua=mailto:...) and forensic-report destination (ruf=mailto:..., optional) need a reachable mailbox to be meaningful.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | dmarc-reports@arda.cards (existing arda.cards-family Google Workspace inbox). | Least operational cost; mailbox provisioning is one Google Workspace step. Reports aggregate over time and are reviewed periodically, not in real time. |
| B | A new dmarc-reports@ardamails.com mailbox, hosted independently. | Cleaner naming alignment with the mail-root domain. But: requires standing up MX records on ardamails.com, which is currently a sending-only domain; introduces inbound-mail handling that this project deliberately avoids. |
| C | No rua / ruf in v1; revisit when DMARC reporting becomes a routine input. | No mailbox to provision. But: DMARC monitoring (p=quarantine) is meaningless without a reporting destination; the policy effectively reduces to “do whatever your local rules say.” |

Recommendation: Option A — dmarc-reports@arda.cards.

Decision: Option A. The DMARC record carries rua=mailto:dmarc-reports@arda.cards. The mailbox is provisioned by the operator in Arda’s Google Workspace before Phase B deploy; the operator companion (G-18 in the analysis) captures the step at implementation time. ruf is omitted in v1 (forensic reports are noisier and not actioned today).
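A minimal CDK sketch of the record content this decision fixes, assuming the Corporate zone construct is passed in; the function name and construct ID are illustrative, and the policy string follows this decision and architecture-overview § 5.2.

```ts
import * as r53 from "aws-cdk-lib/aws-route53";
import { Construct } from "constructs";

// Illustrative only — record content per DQ-R1-015 / architecture-overview § 5.2.
// `zone` is the arda.ardamails.com hosted zone created by the Phase 3 stack.
export function addCorporateDmarcRecord(scope: Construct, zone: r53.IHostedZone): r53.TxtRecord {
  return new r53.TxtRecord(scope, "CorporateDmarc", {
    zone,
    recordName: "_dmarc", // resolves to _dmarc.arda.ardamails.com
    values: ["v=DMARC1; p=quarantine; sp=quarantine; rua=mailto:dmarc-reports@arda.cards"],
    // ruf omitted in v1: forensic reports are noisier and not actioned today.
  });
}
```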

Applied to:

  • 3-corporate-updates/analysis.md gaps G-6 and G-20.
  • Phase 3 specification (Pass 2) — the DMARC TXT record content; the operator companion captures the prerequisite mailbox step.
  • Operator companion at implementation time — explicit pre-deploy step.

DQ-R1-016: Reserved-Name Registry Scope at arda.ardamails.com


Context: Architecture-overview § 6.5 reserves arda at the ardamails.com level so future tenant slugs (in any partition) cannot collide. The question is whether to also reserve sub-domain slugs at the arda.ardamails.com level (freekanban, future hubspot, …): import them into a constants list and have partition validators reject them, or leave the arda.ardamails.com-level registry as documentation only.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Register freekanban (and future Corporate slugs) in platform/ari-configuration.ts; partition validators import the constant. | Cross-instance-group collision detection is mechanical. But: Application-Runtime partitions and the Corporate instance group become coupled through a shared constants list; any change to the Corporate registry forces a re-deploy of every partition (or at least invalidates their lint). |
| B | Documentation-only registry at arda.ardamails.com; partition validators do not import; corporate-cli.ts enforces the registry locally on Phase A entry by listing pre-existing Postmark Sender Signatures, servers, and 1P items. | No cross-instance-group import coupling. The CLI’s Phase A is the only writer; it can enforce uniqueness against live Postmark + 1P state. Adds a conflict-check requirement to the CLI. |

Recommendation: Option B — documentation-only with CLI-enforcement.

Decision: Option B. Partition validators do not import a Corporate slug list. corporate-cli.ts Phase A entry includes a conflict-check: it lists existing Postmark Sender Signatures (in the configured account), existing Postmark servers (in the configured account), and existing 1Password items (in Arda-CorporateOAM); if a name collision exists for the asset being created, the CLI exits before any state-mutating call. This catches both intra-Corporate collisions (two assets with overlapping names) and cross-instance-group collisions (a partition somehow registered an arda.ardamails.com slug).
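
A rough sketch of what that Phase A conflict-check could look like; the three list* helpers are hypothetical stand-ins for the CLI’s existing Postmark Account API and 1Password SDK wrappers, and the exact matching rules are an implementation detail of corporate-cli.ts:

```ts
// Placeholders: wire these to the CLI's existing Postmark Account API and 1Password SDK wrappers.
declare const listSenderSignatures: () => Promise<string[]>; // names in the configured account
declare const listServers: () => Promise<string[]>;          // names in the configured account
declare const listVaultItems: (vault: string) => Promise<string[]>;

// Phase A entry: refuse to proceed if the asset name already exists anywhere it could collide.
async function assertNoNameCollision(assetName: string): Promise<void> {
  const [signatures, servers, vaultItems] = await Promise.all([
    listSenderSignatures(),
    listServers(),
    listVaultItems("Arda-CorporateOAM"),
  ]);
  const existing = new Set(
    [...signatures, ...servers, ...vaultItems].map((n) => n.toLowerCase()),
  );
  if (existing.has(assetName.toLowerCase())) {
    // Exit before any state-mutating call, per DQ-R1-016.
    console.error(`Conflict: an asset named "${assetName}" already exists; aborting Phase A.`);
    process.exit(1);
  }
}
```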

Applied to:

  • 3-corporate-updates/analysis.md gaps G-15 and G-17.
  • Phase 3 specification (Pass 2) — the conflict-check is a corporate-cli.ts Phase A acceptance criterion.

Round R1-Phase4: Runtime Platform Updates Decisions

This round captures decisions made during Phase 4 — per-partition mail capability for the Application Runtime instance group. Decision IDs DQ-R1-017 through DQ-R1-022 are reserved for this round; all entries are resolved.

DQ-R1-017: Postmark Sender Signature Granularity per Partition

Context: Phase 4 brings per-partition mail capability online for the Application Runtime instance group across four active partitions (prod, demo, dev, stage; kyle excluded per DQ-R1-021). Each partition has its own mail sub-zone {partition}.ardamails.com. The question is whether each partition gets its own Postmark Sender Signature (with its own DKIM key, independent reputation), whether multiple partitions share a parent Signature in the spirit of DQ-R1-009 (which used parent verification for the Corporate instance group), and how per-tenant isolation fits in.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | One Signature at ardamails.com (root); all partitions and Corporate inherit via parent verification. | One Signature covers the entire tree. But: reputation pools across every environment and the Corporate consumer; abuse on dev taints prod. Defeats the per-partition isolation goal. |
| B | One Signature per partition sub-zone (prod.ardamails.com, etc.); each carries its own DKIM key; leaves under each partition (per-tenant sub-domains) inherit. | Per-partition reputation independence. Matches the Postmark account split (Prod vs NonProd). Future per-tenant Signatures (for stricter isolation) can be added in Phase 5b without changing this layer. |
| C | One Signature per tenant from day one. | Strictest isolation. But: thousands of Signatures to manage; per-tenant verification cost; premature when tenant volume is zero. |

Recommendation: Option B — per-partition Signature, parent-verified at the partition sub-zone, leaves inherit.

Decision: Option B. Phase 4 registers one Postmark Sender Signature per active partition at the partition’s sub-zone ({partition}.ardamails.com). The Signature is anchored at the partition apex; per-tenant sub-domains within the partition inherit DKIM via the partition’s signing key. Production partitions (prod, demo) land on the PostmarkProd account; non-production partitions (dev, stage) on PostmarkNonProd. The first non-prod Signature (dev.ardamails.com) also satisfies Postmark Compliance’s pending approval for arda-nonprod. Per-tenant Signature granularity is deferred to Phase 5b when tenant volume exists.

Applied to:

  • phases.md § Phase 4 Scope and Deliverables (Postmark Sender Signature rows).
  • 4-runtime-platform-updates/goal.md Success Criteria #5 (first non-prod Signature verified).
  • Phase 5b Email module design (whether to add per-tenant Signatures becomes a tractable choice once tenants exist).

DQ-R1-018: corporate-drift Rename and Scope

Context: Phase 3 introduced tools/corporate-drift.ts and .github/workflows/corporate-drift.yml — a scheduled drift check that asserts Postmark account state and DNS state for the Corporate instance group, with cross-seam Postmark↔placement-function assertions added by the DQ-R1-009 fix. Phase 4 adds per-partition Postmark Sender Signatures that need equivalent drift coverage. The question is whether corporate-drift is renamed and generalized (e.g., to mail-drift) to cover Corporate + every partition Signature, or kept as Corporate-only with a parallel new workflow added for the partition surfaces.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Rename corporate-drift to mail-drift; one workflow asserts Corporate + every partition Signature. | One workflow to maintain. Single failure-issue stream. But: future runtime-platform drift checks unrelated to email (e.g., asserting CloudFront cache configuration, asserting Lambda function counts) would need their own naming; mail-drift is mail-centric. |
| B | Keep corporate-drift unchanged. Add a new runtime-platform-drift workflow in parallel, covering partition surfaces. Share logic via reusable shell scripts or GitHub Actions composite actions. | Names reflect scope (Corporate is one instance group; runtime-platform is another). Future non-mail runtime-platform drift checks plug into runtime-platform-drift without mail-centric naming. Two workflows to maintain, but shared logic minimizes drift between them. |

Recommendation: Option B — parallel workflows with shared logic.

Decision: Option B. corporate-drift stays as-is. A new .github/workflows/runtime-platform-drift.yml and driver under tools/ (Phase 4 deliverable) asserts the cross-seam Postmark↔DNS↔placement-function invariants for every active partition Signature. The two workflows share reusable shell scripts or GitHub Actions composite actions so the drift-check logic doesn’t drift between them. Future runtime-platform drift checks unrelated to email plug into the same workflow without renaming.

Applied to:


DQ-R1-019: Per-Partition Email Server-Token Encryption Key

Context: DQ-012 decided that per-tenant Postmark server tokens are encrypted application-side with a partition-wide symmetric key before INSERT, with the key in AWS Secrets Manager and delivered via ESO. DQ-202 fixed the on-disk format as an AES-256-GCM versioned envelope; DQ-203 specified that the SM value is a 64-byte HKDF input. Phase 4 must close three open sub-questions: (1) how the SM secret is named and declared in CDK, (2) what the envelope’s version prefix tracks (algorithm version, secret material version, or both), (3) how rotation works.

The full design is documented in 4-runtime-platform-updates/design/email-server-key-encryption.md. This entry summarizes the three sub-decisions.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Single-axis envelope vN, with vN coupling algorithm and secret material. Sibling SM secrets per rotation (-v1, -v2, …). | Operationally simple at the data-model layer. But: every rotation churns the code-side dispatch table, conflating algorithm cadence (rare) with material cadence (frequent). |
| B | Two-axis envelope a{N}.k{SM-VERSION-ID}. One SM secret per partition; rotation via update-secret (SM-native versioning). Hot-swap via two ExternalSecret mounts (AWSCURRENT + AWSPREVIOUS). Lazy + coroutine migration. SDK fallback for rare older versions. | Algorithm and material lifecycles cleanly separated. SM’s native versioning enables future AWS Rotation Lambdas natively. Operationally clean. But: the dispatch model is slightly more elaborate than Option A. |

Recommendation: Option B.

Decision: Option B. Phase 4 deploys one aws_secretsmanager.Secret per partition named {fqn}-I-EmailEncryptionKey (the -I- marker matches the convention as practiced for intra-partition resources), passwordLength: 64, RemovalPolicy.RETAIN. The Phase 5b on-disk envelope is a{N}.k{SM-VERSION-ID}:<base64-payload>; a{N} is the algorithm version (code-indexed; bumps require a release; never retired); k{...} is the AWS SM versionId of the SM version used at write time (runtime-indexed via two ExternalSecret mounts for AWSCURRENT and AWSPREVIOUS, plus a SM SDK fallback for rare older versions). Rotation is aws secretsmanager update-secret; migration is lazy on the first non-up-to-date read + a per-pod coroutine mop-up for the rest of the partition. Automated rotation via AWS SM Rotation Lambdas is enabled by this design and deferred to a future deliverable.
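
As a reading aid, a small TypeScript sketch of splitting that envelope; the field names are illustrative, and the authoritative format is defined in 4-runtime-platform-updates/design/email-server-key-encryption.md:

```ts
// Illustrative parser for the a{N}.k{SM-VERSION-ID}:<base64-payload> envelope (DQ-R1-019).
// Whether the key material for kmVersionId is already mounted (AWSCURRENT / AWSPREVIOUS)
// or must be fetched via the SM SDK fallback is decided by the caller, not here.
interface TokenEnvelope {
  algorithmVersion: number; // a{N}: code-indexed; bumped only by a release; never retired
  kmVersionId: string;      // k{...}: AWS Secrets Manager versionId used at write time
  payload: Buffer;          // AES-256-GCM ciphertext (exact layout per the design doc)
}

function parseEnvelope(envelope: string): TokenEnvelope {
  const match = /^a(\d+)\.k([^:]+):(.+)$/.exec(envelope);
  if (!match) throw new Error("Unrecognized email-server-token envelope");
  return {
    algorithmVersion: Number(match[1]),
    kmVersionId: match[2],
    payload: Buffer.from(match[3], "base64"),
  };
}
```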

Applied to:


DQ-R1-020: DNS-Provisioning + SM-Fallback IAM Roles

Context: Phase 4 introduces two new AWS capabilities that the operations component’s pod must exercise at runtime in each partition:

  1. Route53 ChangeResourceRecordSets on the partition’s mail sub-zone ({partition}.ardamails.com) — consumed by the Phase 5b Email module for per-tenant DKIM / Return-Path / DMARC record provisioning.
  2. secretsmanager:GetSecretValue on {fqn}-I-EmailEncryptionKey — consumed by the Phase 5b TokenCipher SDK-fallback path (DQ-R1-019) for the rare case of decrypting envelopes whose k{SM-VERSION-ID} is older than AWSPREVIOUS.

Both permissions target partition-scoped resources and need to be available to the same workload (the operations pod). The decision is the IAM topology: which mechanism authenticates the pod to AWS, and where the permissions live.

Codebase precedent: A search of infrastructure/src/main/cdk/ shows IRSA is the sole adopted pod-identity mechanism (the partition EksStack already configures an OpenIdConnectProvider and exports {fqn}-EksPodRoleArn). Crucially, the exported pod role is never extended with workload-specific permissions anywhere in the codebase. The established pattern — exemplified by infrastructure/src/main/cdk/constructs/storage/image-asset-bucket.ts and public-upload-bucket.ts — is to create a fresh purpose-specific role with a trust policy that lets the pod role assume it via STS:

const preSigningRole = new iam.Role(this, "ImageUploadPreSigningRole", {
  roleName: `${fqn}-ImageUploadPreSigningRole`,
  assumedBy: new iam.AccountPrincipal(account).withConditions({
    ArnLike: { "aws:PrincipalArn": clientRoleArnPattern }, // e.g. `arn:aws:iam::<acct>:role/<fqn>-*`
  }),
});
preSigningRole.addToPolicy(new iam.PolicyStatement({ /* purpose-specific perms */ }));

The pod federates into the partition pod role via IRSA at pod startup; the application code then performs sts:AssumeRole into the purpose-specific role at the call site (DQ-204 STS chain). Permissions live on the purpose role, not the pod role.

| Option | Description | Trade-offs |
| --- | --- | --- |
| α | Extend the existing {fqn}-EksPodRole with the new Route53 and SM GetSecretValue statements. | Single role to audit per partition; one stack change. But: not how anything else in the codebase is structured — the pod role is treated as an STS-chain origin, not a permission accumulator. Adopting α here would diverge from established practice. |
| β | Create two fresh per-purpose roles ({fqn}-EmailDnsProvisioningRole, {fqn}-EmailEncryptionKeyFallbackRole) with trust policies that allow the partition pod role to assume them via STS. Mirrors ImageUploadPreSigningRole. | Aligns with codebase precedent. Cleanest least-privilege: each call path can only chain into the role it needs. Two new CDK roles and two new exports — a normal Phase 4 cost (Phase 4’s purpose is precisely to provision the partition infrastructure 5b needs). |
| γ | Adopt EKS Pod Identity (pods.eks.amazonaws.com) for these new roles. | Simpler trust-policy shape. But: not used anywhere else in Arda; introduces a second pod-identity mechanism alongside IRSA; no concrete benefit for this use case. Reject. |
| δ | Node-level instance profile / long-lived static keys. | Violates DQ-204; reject. |

Recommendation: Option β.

Decision: Option β. Phase 4 declares two fresh per-purpose IAM roles in each partition:

  1. {fqn}-EmailDnsProvisioningRole — permissions: route53:ChangeResourceRecordSets, route53:ListResourceRecordSets on the partition’s mail hosted-zone ARN ({partition}.ardamails.com). Exported as {fqn}-EmailDnsProvisioningRoleArn. (route53:GetChange is intentionally omitted: it requires arn:aws:route53:::change/* resource scope rather than the hosted-zone ARN, and the Email module does not wait on Route53 propagation — Postmark verification is API-driven via verifyDkim / verifyReturnPath, which probe DNS from Postmark’s side.)
  2. {fqn}-EmailEncryptionKeyFallbackRole — permission: secretsmanager:GetSecretValue on ${encryptionKeySecret.secretArn}* (full SM-secret ARN; the trailing wildcard tolerates the SM-appended random 6-character suffix — SM versions are selected at API call time via VersionId/VersionStage, not encoded in the resource ARN). Exported as {fqn}-EmailEncryptionKeyFallbackRoleArn.

Both roles share the same trust-policy shape:

assumedBy: new iam.AccountPrincipal(account).withConditions({
  ArnLike: { "aws:PrincipalArn": `arn:aws:iam::${account}:role/${fqn}-*` },
}),

This mirrors ImageUploadPreSigningRole: any role in the partition that matches the {fqn}-* name prefix may assume the role. In practice, the partition’s pod role ({fqn}-EksPodRole) is the only such role that an operations-component pod can federate into; the ArnLike condition limits the blast radius to the partition without coupling the role declaration to the pod role’s exact name.

The Phase 5b Email module performs sts:AssumeRole into these roles at the call site — same DQ-204 STS-chain pattern that operations already uses for the image-upload presign flow.
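
A sketch of that call-site chain, under the assumption that the Email module uses the AWS SDK for JavaScript v3; the role ARN, hosted-zone id, and record values are placeholders:

```ts
import { STSClient, AssumeRoleCommand } from "@aws-sdk/client-sts";
import { Route53Client, ChangeResourceRecordSetsCommand } from "@aws-sdk/client-route-53";

// Sketch only: the pod already carries IRSA credentials for the partition pod role;
// at the call site it chains into the purpose-specific role (DQ-204 pattern).
async function upsertTenantDnsRecord(dnsRoleArn: string, hostedZoneId: string) {
  const sts = new STSClient({});
  const { Credentials } = await sts.send(
    new AssumeRoleCommand({ RoleArn: dnsRoleArn, RoleSessionName: "email-dns-provisioning" }),
  );
  const route53 = new Route53Client({
    credentials: {
      accessKeyId: Credentials!.AccessKeyId!,
      secretAccessKey: Credentials!.SecretAccessKey!,
      sessionToken: Credentials!.SessionToken,
    },
  });
  await route53.send(
    new ChangeResourceRecordSetsCommand({
      HostedZoneId: hostedZoneId,
      ChangeBatch: {
        Changes: [{
          Action: "UPSERT",
          ResourceRecordSet: {
            Name: "selector._domainkey.tenant.dev.ardamails.com", // placeholder tenant record
            Type: "TXT",
            TTL: 300,
            ResourceRecords: [{ Value: '"k=rsa; p=..."' }], // placeholder DKIM value
          },
        }],
      },
    }),
  );
}
```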

Implementation route — construct reuse with byte-identical Root output. The decision above pins the behavior of the DNS-provisioning role (STS-chained, account-principal + ArnLike trust, partition-scoped permissions). The implementation route refined during analysis: rather than hand-rolling a fresh role, reuse the existing AllowCreatingNSRecordsRole construct (Phase 2; constructs/oam/allow-creating-ns-records-role.ts). Despite the name, the construct’s permissions are already generic Route53 record-set CRUD (ChangeResourceRecordSets, ListResourceRecordSets, ListHostedZonesByName) with allowedParentHostedZoneIds scope-tightening. What needs to change: the trust principal, today hard-coded to iam.ServicePrincipal("lambda.amazonaws.com") with an OrgID condition, must be parameterizable so the Phase-4 instantiation can supply iam.AccountPrincipal(account).withConditions({ ArnLike: ... }).

This generalization carries two hard constraints that must hold simultaneously:

  1. Byte-identical Root-account output. The existing Root-account instantiation in root-dns-stack.ts must produce a CloudFormation template that is byte-identical before and after the construct change. Guarded by a CDK Template.fromStack() snapshot equality unit test (in root-dns-stack.test.ts or allow-creating-ns-records-role.test.ts) that pins the Root resource shape; fails closed if the generalization regresses Root output. A sketch of such a test follows this list.
  2. Verified zero drift in deployed Root. A post-deploy verification step (operator-driven; tracked as V-PART-NNN in verification.md) diffs the Root account’s currently-deployed CFN template against the synthesized output post-generalization. Expected diff is empty. Runs before any partition-mail deploy so the Root assertion holds with the construct-as-of-Phase-4 code.
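
A minimal sketch of the snapshot guard from item 1, assuming Jest plus aws-cdk-lib/assertions; the import path and stack props are placeholders:

```ts
import { App } from "aws-cdk-lib";
import { Template } from "aws-cdk-lib/assertions";
import { RootDnsStack } from "../root-dns-stack"; // path is illustrative

// Pins the Root-account resource shape: any change to the generalized construct that
// alters the synthesized Root template fails this snapshot and blocks the PR.
test("RootDnsStack output is unchanged by the construct generalization", () => {
  const app = new App();
  const stack = new RootDnsStack(app, "RootDnsStack" /* , props as deployed */);
  expect(Template.fromStack(stack).toJSON()).toMatchSnapshot();
});
```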

The optional construct rename (e.g., AllowCreatingNSRecordsRole → AllowCreatingDnsRecordsRole) is name-only and reflects the construct’s already-generic Route53 record-set CRUD permissions (the “NSRecords” suffix is a Phase-2 historical artefact).

Update (2026-05-12, applied at design time): the rename can land in the same PR as the construct generalization, provided the CDK construct ID at the call site is preserved. CloudFormation logical IDs derive from the construct’s path (parent ID + construct ID), not from the class name. Concretely: the Root call site new AllowCreatingNSRecordsRole(this, "AllowCreatingNSRecordsRole", …) becomes new AllowCreatingDnsRecordsRole(this, "AllowCreatingNSRecordsRole", …) — the second argument (the construct ID string) stays unchanged, so the synthesized template’s logical IDs are unchanged, and the byte-identity guarantee holds.

The earlier note above (“If the rename is desired, it lands as a separate change after Phase 4’s role-reuse work is verified stable”) is superseded by this update. The Phase 4 design (analysis.md G-IAM-1 + specification.md T-I1 step 2) bundles the rename into PR #1 alongside the byte-identity guard (T-I2), with the call-site mitigation above documented inline. No cascading effect on Phase 4’s spec, requirements, or verification regime — the byte-identity test (T-I2 / V-IAC-002) catches any logical-ID regression regardless of whether it originates in the rename or elsewhere.

Open follow-ups (Phase 4 specification, not blocking the decision):

  • Confirm the arn:aws:iam::${account}:role/${fqn}-* ArnLike pattern matches the partition’s actual pod-role naming convention in every partition (Alpha001 + Alpha002; spot-check both exports during specification).
  • Decide whether the two roles live in the same partition-email stack or split (recommend: same stack — both are Phase 4 partition-mail deliverables, same lifecycle, same RemovalPolicy).
  • Confirm whether route53:ListResourceRecordSets is needed in addition to Change* for the Phase 5b idempotency / pre-check path (recommend: yes; the Email module checks existing records before issuing changes).

Applied to:


DQ-R1-021: Order of Partition Rollout

Context: Phase 4 fans out across four active partitions. The question is the order of the rollout waves; whether to include the kyle partition (which is suspended at Phase 4 start); and how this aligns with Phase 5b’s deployment cadence.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | dev, kyle first; then stage, demo; then prod. Per the original phases.md Phase 5b recommendation. | Standard non-prod-first cascade. But: kyle is suspended at Phase 4 start; including it would mean provisioning a partition that has no operational use case. |
| B | dev → stage → demo → prod. Exclude kyle entirely. | Matches operational reality (kyle has no live use). dev first still satisfies the arda-nonprod Postmark account-approval prerequisite. Production lands last after non-prod wave validates the pattern. |

Recommendation: Option B.

Decision: Option B. Phase 4 rolls out to dev, stage, demo, prod in that order. The kyle partition is excluded from Phase 4 (suspended; the kyle.ardamails.com sub-zone is not provisioned). kyle stays reserved at the ardamails.com level so it cannot be appropriated as a tenant slug while the partition is suspended; the partition can be re-introduced later by replaying the per-partition deploy procedure if it resumes operation. Phase 5b inherits the same order.

Applied to:


DQ-R1-022: Operator CLI Shape for Phase 4

Context: Phase 3 introduced tools/corporate-cli.ts (a TypeScript CLI for the Corporate instance group’s two-phase Postmark + DNS provisioning). Phase 4 needs an equivalent operator surface for per-partition mail provisioning. The question is whether to generalize corporate-cli over a partition argument, introduce a parallel partition-mail-cli, or integrate the Phase 4 work into the existing amm.sh operator script that already deploys partition-level resources.

| Option | Description | Trade-offs |
| --- | --- | --- |
| A | Generalize corporate-cli to take an asset+partition pair. Both Corporate and partition mail work flow through the same CLI. | One CLI surface. But: stretches corporate-cli beyond its Corporate-instance scope; the partition path mixes with the Corporate path in implementation. |
| B | Introduce a parallel tools/partition-mail-cli.ts. Each instance group has its own CLI. | Scope-aligned naming. But: duplicates corporate-cli’s structure (idempotency, retries, redaction, conflict checks); adds a maintenance surface. |
| C | Integrate the Phase 4 partition-mail work into amm.sh (the existing Application Runtime deploy script). Phase 4 work follows amm.sh’s rules (idempotency, security, pre-flight checks, partition selection). Extract reusable bash + TypeScript utilities from corporate-cli so both amm.sh and corporate-cli share logic. | Aligns with existing operator surface for partition deploys. Familiar workflow. Reusable utilities prevent duplication across the two scripts. Requires refactoring Phase 3 deliverables to extract the shared utilities. |

Recommendation: Option C — amm.sh integration with shared utilities.

Decision: Option C. Phase 4 partition-mail provisioning is part of the product runtime platform deployment and is invoked through amm.sh (and its rules: idempotency, security, pre-flight checks, partition selection). Not a standalone partition-mail-cli. Reusable sub-scripts / utilities are extracted from corporate-cli so both amm.sh’s partition path and corporate-cli can share logic; this includes refactoring Phase 3 deliverables as needed to keep each script’s complexity bounded.

Implementation route — TypeScript helpers under tools/, invoked from amm.sh via ts-node. Phase 4 stays with Phase 3’s imperative-then-declarative (Phase A / Phase B) pattern:

  • The extracted utilities (Postmark Account API client, idempotent list-then-create, retry / backoff, output redaction, conflict-check) live as TypeScript modules under tools/lib/ (or equivalent shared location).
  • A new entry script — tools/register-partition-mail-signature.ts — composes these utilities into Phase 4’s partition-mail Phase-A flow (sketched after this list): read the Postmark account-level token from the partition’s Arda-{Env}OAM 1P vault (using the Phase 1 1P SDK helper), call the Postmark Account API to register the {partition}.ardamails.com Sender Signature (idempotent: list-then-create), capture the DKIM selector / public key / Return-Path target, and write those values into cdk.context.json (committed). The same utilities back corporate-cli’s Phase-A flow.
  • amm.sh’s direct calls collapse to three: (i) op read the Postmark account-level token (bash; remains in amm.sh for GHA ::add-mask:: hygiene); (ii) npx ts-node tools/register-partition-mail-signature.ts <infrastructure> <partition> (Phase A — Postmark API + context write); (iii) cdk deploy ${infrastructure}-${partition}-Email --parameters PostmarkAccountToken=… (Phase B — declarative CDK deploy).
  • No bash reimplementation of Postmark / 1P logic. amm.sh stays a thin orchestrator; the TS scripts hold the imperative logic. corporate-cli retains its TS entry-point and its Corporate-specific responsibilities (Free Kanban Tool server provisioning, 1P writes for the server token); only the shared helpers move into tools/lib/.
  • CR Lambda migration explicitly deferred (the “future architecture” called out in Phase 3 — the PostmarkSendingDomain thin-wrapper’s public surface is designed to be invariant under that migration). Phase 4 does not pull it forward; doing so would materially expand scope without a forcing function. Future migration is a construct-internals change isolated to platform/constructs/postmark/.
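
A hedged sketch of the register-partition-mail-signature.ts flow referenced above; every helper, response field, vault path, and context key below is a placeholder for the extracted tools/lib/ utilities and the eventual cdk.context.json schema:

```ts
// Illustrative Phase-A flow (DQ-R1-022). The declared helpers stand in for the shared
// utilities extracted from corporate-cli; their names and shapes are assumptions.
interface SignatureInfo { Name: string; DkimHost: string; DkimValue: string; ReturnPathCname: string; }
declare function readFromVault(ref: string): Promise<string>;            // Phase 1 1P SDK helper
declare function listPostmarkDomains(accountToken: string): Promise<SignatureInfo[]>;
declare function createPostmarkDomain(accountToken: string, name: string): Promise<SignatureInfo>;
declare function writeCdkContext(keys: Record<string, string>): Promise<void>; // updates cdk.context.json

export async function registerPartitionMailSignature(partition: string): Promise<void> {
  const domain = `${partition}.ardamails.com`;
  // Vault reference is illustrative; the item lives in the partition's Arda-{Env}OAM vault.
  const accountToken = await readFromVault(`op://Arda-{Env}OAM/<postmark-account-token-item>/credential`);

  // Idempotent list-then-create: re-running Phase A must not duplicate the Signature.
  const existing = await listPostmarkDomains(accountToken);
  const signature =
    existing.find((d) => d.Name === domain) ?? (await createPostmarkDomain(accountToken, domain));

  // Public DNS values only; committed per the DQ-R1-014 precedent. Key names are illustrative.
  await writeCdkContext({
    [`postmark.${partition}.dkim-host`]: signature.DkimHost,
    [`postmark.${partition}.dkim-value`]: signature.DkimValue,
    [`postmark.${partition}.return-path-cname`]: signature.ReturnPathCname,
  });
}
```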

Applied to:

  • phases.md § Phase 4 Scope and Deliverables — “Operator surfaces integrated into amm.sh” bullet; “amm.sh-integrated partition-mail steps” deliverable row.
  • 4-runtime-platform-updates/goal.md Open Design Questions row 6.
  • Phase 4 implementation work — includes refactoring Phase 3’s corporate-cli to extract reusable utilities consumed by both amm.sh and corporate-cli.

Pre-design follow-ups closed (Round R1-Phase4)

After DQ-R1-017..022 were resolved, planning surfaced eight smaller follow-ups (B1..B5, C1..C3) that needed pinning before Phase 4 design could start. Each is “pick the default and move on” rather than load-bearing; collectively they are recorded here for traceability without individual DQ-R1-NNN entries. Full text in 4-runtime-platform-updates/goal.md § Pre-design follow-ups.

| ID | Item | Resolution |
| --- | --- | --- |
| B1 | Phase 5a TokenCipher location | Ships in common-module as a general-purpose encrypted-field utility (not Email-specific) |
| B2 | Postmark account-token deploy-time delivery | δ.1 — amm.sh reads via op, passes to cdk deploy as NoEcho parameter; partition-email stack uses SecretValue.cfnParameter(). Mirrors partitionSecrets.cfn.yaml |
| B3 | amm.sh extraction scope from corporate-cli | Minimal: extract only what amm.sh’s partition-mail steps need; backfill on demand |
| B4 | kyle reservation registry | Extend the Phase 3 mechanism used to reserve arda at the ardamails.com level |
| B5 | Cross-partition deploy gating in CI | Operator-enforced via amm.sh; no tools/cdk-runner.js matrix change |
| C1 | CDK stack name | ${infrastructure}-${partition}-Email (parallels existing -Secrets, -Amplify stacks); immutable — locked at first deploy |
| C2 | Per-partition DMARC reporting mailbox | Reuse dmarc-reports@arda.cards for all partitions (DMARC report content already identifies the source domain) |
| C3 | runtime-platform-drift schedule + labels | Daily cron; failure-issue labels drift + runtime-platform; mirrors corporate-drift shape |

These resolutions also drive a new Phase 4 deliverable: current-system/oam/security/secret-delivery-pattern.md, documenting the canonical op → amm.sh → CFN NoEcho parameter → SM secret → consumer flow with partitionSecrets.cfn.yaml and the Phase 4 Postmark token as worked examples.
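
For orientation, a CDK fragment illustrating the B2 delivery mechanics (NoEcho parameter wrapped via SecretValue.cfnParameter()); the class, construct ID, and parameter names are placeholders rather than the Phase 4 stack as specified:

```ts
import { CfnParameter, SecretValue, Stack, StackProps } from "aws-cdk-lib";
import { Construct } from "constructs";

// Sketch of the B2 delivery path: amm.sh reads the token via `op` and passes it to
// `cdk deploy --parameters PostmarkAccountToken=...`; the stack never stores it in plain text.
export class PartitionEmailStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const tokenParam = new CfnParameter(this, "PostmarkAccountToken", {
      type: "String",
      noEcho: true, // masked in the CloudFormation console and describe-stacks output
    });

    // Wrap the parameter so downstream constructs treat it as secret material.
    const accountToken: SecretValue = SecretValue.cfnParameter(tokenParam);
    // ... hand `accountToken` to whichever construct consumes the Postmark token.
  }
}
```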


DQ-R1-023: Per-Tenant Postmark Sender Signature Introduction (Phase 5b)

Status: Open — to be confirmed at Phase 5b planning. No Phase 4 dependency; Phase 4 provisions the enabling infrastructure (EmailDnsProvisioningRole, partition mail sub-zone) regardless of which way this decision goes.

Context: DQ-R1-017 (Round R1-Phase4) decided that Phase 4 ships one Postmark Sender Signature per partition ({partition}.ardamails.com) and defers per-tenant Signatures to Phase 5b. The Phase 4 design works for sending — tenants sending from {config}.{tenant}.{partition}.ardamails.com use the partition’s DKIM key via Postmark sub-domain inheritance and DMARC relaxed alignment. The trade-off: all tenants in a partition share the partition’s DKIM-domain reputation at the receiver side (Gmail, Microsoft, Yahoo, etc., track reputation by the DKIM d= domain, not by the Postmark Server identifier).

The question is whether Phase 5b should introduce per-tenant Sender Signatures to give per-tenant reputation isolation, and if so, on what schedule.

| Option | Description | Trade-offs |
| --- | --- | --- |
| α | Status quo — all tenants in a partition share the partition Signature; per-tenant Servers exist for token / activity-log isolation but DKIM-domain reputation is shared. | No additional Phase 5b work for sending. But: one bad tenant degrades reputation for every tenant in that partition. No remediation path for tenants with persistent bounce / spam issues. |
| β | Per-tenant Signature from v1 — every tenant onboarded in Phase 5b gets its own Sender Signature registered via the Postmark Account API; per-tenant DKIM TXT + Return-Path CNAME records written at tenant onboarding via EmailDnsProvisioningRole. | Best reputation isolation. But: additional tenant-onboarding cost (Postmark API call + DNS write per tenant); operational surface grows linearly with tenant count. |
| γ | Hybrid — opt-in per-tenant Signature — Phase 5b ships with partition Signature as the default; tenants flagged as high-volume or reputation-sensitive (operator-driven or automated based on send volume) are migrated to per-tenant Signatures on demand. | Balances cost and isolation. But: introduces an operator decision per tenant; migration path needs design. |
| δ | Remediation-only per-tenant Signature — partition Signature is the default; per-tenant Signature is the remediation when a tenant generates a reputation incident. | Lowest operational cost. But: by the time remediation is needed, reputation damage has already affected siblings. |

Recommendation: To be made at Phase 5b planning, informed by:

  • Actual tenant send volume and bounce / spam rates in Phase 5b’s pilot phase.
  • Postmark’s own guidance at the time (their best practices may evolve).
  • Compliance / contractual requirements specific to tenant cohorts (e.g., enterprise tenants may contractually require reputation isolation).

Phase 4 work that this affects: None. The EmailDnsProvisioningRole (G-IAM-3 in 4-runtime-platform-updates/design/analysis.md) is provisioned regardless — it is the explicit enabler for whichever way this decision goes. Phase 4 ships the infrastructure; Phase 5b decides when to exercise it.

Applied to:


| # | Summary | Status | Downstream Impact | Decision |
| --- | --- | --- | --- | --- |
| DQ-001 | Tenant sending domain shape | Resolved | DNS zone structure, CDK stacks, tenant provisioning scripts, supplier-facing FQDNs | <tenant>.<partition>.{mail-root-domain} uniformly (revised per DQ-010) |
| DQ-002 | Multi-config domain strategy for v2+ | Resolved | tenant_email_config schema (nullable config_slug), DNS record structure, v2+ provisioning | Sub-subdomain (<conf>.<tenant>.<partition>.{mail-root-domain}); v1 provisions at tenant level only |
| DQ-003 | Tenant slug source | Resolved | Provisioning request shape, slug derivation logic | From request (tenantEId, tenantName, tenantSlug); derivation algorithm deferred |
| DQ-004 | Reply-To editability in send dialog | Resolved | Send dialog UI, BFF route contract, GEN::EML and PRO::EML use cases | Read-only; system-resolved from procurement contact or user email |
| DQ-005 | Email order send paths (copy-paste vs system) | Resolved | SPA side panel UX, backend submit signal handler, PRO::EML use cases | Both coexist; copy-paste preserved for email orders, system send added as new path |
| DQ-006 | CS alerting scope in v1 | Resolved | Observability infrastructure, GEN::EML::0004 use case scoping | ESP OOTB alerting in v1; Arda-built is v2+ |
| DQ-007 | Document generation responsibility | Resolved | Email service interface contract, PO submit workflow, GEN::EML::0002 use case | Calling feature generates document, passes Blob/URL to email capability |
| DQ-008 | Send dialog interaction model | Resolved | SPA dialog component, GEN::EML::0001 scenario structure | Single-step dialog; cancel prompts if edits were made |
| DQ-009 | Mail root domain choice | Resolved | DNS zone creation, registrar delegation, all tenant FQDNs, infrastructure.md parameter resolution | ardamails.com (standalone, already owned); implementation parametric |
| DQ-010 | Prod tenant zone placement | Resolved | Root zone content, IAM scoping, cross-account access, DQ-001 FQDN shape | Own partition zone; root zone stays static/CDK-only |
| DQ-011 | Webhook authentication mechanism | Resolved | Postmark-events endpoint auth, provisioning flow (Step 5), Webhooks API usage | Bearer token via modern Webhooks API; reuses existing ARDA_API_KEY validation |
| DQ-012 | Per-tenant server token storage | Resolved | Secrets Manager scope, IAM roles, provisioning flow, emailConfiguration service, DB schema | Encrypted in DB with partition-wide key (via ESO); no per-tenant SM writes; emailConfiguration decrypts for emailJob |
| DQ-013 | IAM role extraction from root stack | Resolved | Root CDK application structure, deployment procedure | Do not extract; role stays in RootDnsStack (CF name: RootConfiguration). Extraction deferred to future need. |
| DQ-R1-006 | Locus of cross-zone NS-delegation writes | Resolved | Phase 2 / Phase 3 / Phase 4 ownership boundaries; deploy-order dependency between Root and child stacks | Child stack writes upstream via WriteNSRecordsToUpstreamDns; Root only owns the assume-role IAM target |
| DQ-R1-007 | Vault separation for Free Kanban Tool server token | Resolved | Phase 1 typed surface (item removed); Phase 3 reintroduces with new location; threat model — credential out of OP_SERVICE_ACCOUNT_TOKEN blast radius | op://Arda-CorporateOAM/Free-Kanban-Generator-Postmark-Server/credential (separate vault from Arda-SystemsOAM) |
| DQ-R1-008 | Adopt vs create the existing ardamails.com zone | Resolved | RootConfiguration stack composition; deployment workflow (IMPORT change-set + normal deploy); registrar NS chain preserved | Adopt via cdk import against Z0721066239FWCD47EJDX; CDK code mirrors the live AWS-default comment to keep the import read-only; RemovalPolicy.RETAIN defends against accidental destroy |
| DQ-R1-009 | Postmark domain-verification target (parent vs leaf) | Resolved | PostmarkSendingDomain configuration; operator companion; future Corporate-consumer onboarding | Verify at the Corporate-zone parent (arda.ardamails.com); leaves inherit DKIM |
| DQ-R1-010 | Locus of Corporate’s NS-delegation write (same-account) | Resolved | CorporateMailDns stack composition; behavior under future Corporate-account migration | Always go through WriteNSRecordsToUpstreamDns and assume the Root role even when same-account |
| DQ-R1-011 | route-53-hosted-zone.ts → dns-zone.ts migration shape | Resolved | Construct catalogue; Phase 3 PR scope; existing callers (partitions + Root) | Rename in place; existing callers updated in the same PR |
| DQ-R1-012 | Corporate drift-workflow filename and scope | Resolved | .github/workflows/ shape; tools/corporate-drift.ts driver design; future Corporate-asset onboarding | corporate-drift.yml — one workflow per instance group, exercising every asset listed in instances/Corporate/ |
| DQ-R1-013 | Phase A failure ordering for the Postmark server token | Resolved | corporate-cli.ts Phase A semantics; recovery path on 1P-write failure; testability | In-memory buffer + retries on the 1P write; fail loud with redacted summary on permanent failure; manual operator action to recover |
| DQ-R1-014 | cdk.context.json commit policy | Resolved | Repo .gitignore; CI re-synth determinism | Commit cdk.context.json with the postmark.free-kanban.* keys (public values) |
| DQ-R1-015 | DMARC reporting mailbox | Resolved | DMARC TXT record content at _dmarc.arda.ardamails.com; operator companion (mailbox provisioning prerequisite) | rua=mailto:dmarc-reports@arda.cards; operator provisions the mailbox in Arda’s Google Workspace before Phase B deploy |
| DQ-R1-016 | Reserved-name registry scope at arda.ardamails.com | Resolved | Cross-instance-group import coupling; corporate-cli.ts Phase A acceptance criteria | Documentation-only registry; CLI enforces locally via a Phase-A conflict-check against pre-existing Sender Signatures, servers, and 1P items |
| DQ-R1-017 | Postmark Sender Signature granularity per partition | Resolved | Phase 4 partition-email stacks; Postmark account split; per-tenant deferral to Phase 5b | One Signature per partition sub-zone; leaves inherit DKIM; per-tenant Signatures deferred |
| DQ-R1-018 | corporate-drift rename and scope | Resolved | .github/workflows/ shape; future runtime-platform drift checks unrelated to email | Keep corporate-drift; add parallel runtime-platform-drift with shared reusable scripts |
| DQ-R1-019 | Per-partition email server-token encryption key | Resolved | Phase 4 SM secret; Phase 5b TokenCipher + Helm ExternalSecret mounts; future AWS Rotation Lambda | Single SM secret per partition with native versioning; two-axis envelope a{N}.k{SM-VERSION-ID}; hot-swap dual-mount; lazy + coroutine migration; SDK fallback |
| DQ-R1-020 | DNS-provisioning + SM-fallback IAM roles | Resolved | Phase 4 partition-email stack IAM declarations; Phase 5b STS-chain call sites in the Email module; AllowCreatingNSRecordsRole construct generalization (R-4) with Root no-drift guard | Two per-purpose roles per partition: DNS-records role via reuse of the existing AllowCreatingNSRecordsRole construct (generalized for a configurable trust principal; Root output byte-identical, guarded by unit test + verification); EmailEncryptionKeyFallbackRole fresh. Both STS-chained from the partition pod role; trust policy = account principal + ArnLike on {fqn}-*; mirrors the ImageUploadPreSigningRole pattern |
| DQ-R1-021 | Order of partition rollout | Resolved | Phase 4 + Phase 5b deploy order; kyle suspension | dev → stage → demo → prod; kyle excluded |
| DQ-R1-022 | Operator CLI shape for Phase 4 | Resolved | Phase 4 operator surface; refactoring of Phase 3 corporate-cli to extract shared utilities | Integrate into amm.sh; share utilities with corporate-cli; no standalone partition-mail-cli |
| DQ-R1-023 | Per-tenant Postmark Sender Signature introduction (Phase 5b) | Open — TBC at Phase 5b planning | Phase 5b tenant-onboarding flow; per-tenant reputation isolation strategy; whether EmailDnsProvisioningRole is exercised per-tenant or held in reserve | Four options (α / β / γ / δ); no Phase 4 dependency. To be confirmed when Phase 5b sees pilot data on tenant send volume and bounce / spam rates. |