
Email Integration -- Revision 1: Phase Structure

This document defines the revised phase structure for the email-integration project, replacing the prior Phase 0 / Phase 1 / Phases 2-6 numbering. The revision is driven by:

The revision keeps the project’s overall goal unchanged. It restructures how the work is split and ordered so that each phase produces a deployable artifact.


| # | Phase | Instance group / target | Depends on | Output |
|---|---|---|---|---|
| 1 | External Resources Provisioning | Platform-level | | Postmark accounts created; 1Password items populated; platform/ references populated. |
| 2 | Root Updates | Root instance | 1 (platform/ reference shape) | NS-delegation entry for arda.ardamails.com in the ardamails.com zone; apps/Root/ rename in place. |
| 3 | Corporate Updates | Corporate instance | 1 (Postmark account refs); 2 (arda NS delegation in place) | arda.ardamails.com zone live; Free Kanban Tool sending from freekanban.arda.ardamails.com; Corporate App + CLI in place. |
| 4 | Runtime Platform Updates | Application Runtime instances (Alpha001 / Alpha002 / SandboxKyle002) | 1 (Postmark account refs); 2 (NS-delegation pattern) | Per-partition mail sub-zones ({partition}.ardamails.com), per-partition Postmark token secrets, encryption-key secrets, IAM roles for DNS provisioning. |
| 5a | Component Library Updates | common-module repository | | Cross-cutting library additions (sanitizeHeader, AppError.Application, etc.) consumed by the email module. |
| 5b | Email Module | operations repository | 4 (per-partition secrets + IAM); 5a (common-module minor release) | Backend ShopAccess/Email module: per-tenant server provisioning, sending, bounce / complaint handling. |

Phases must be deployed in the dependency order shown above; once a phase has been deployed, the platform is in a coherent state without requiring later phases to land. Phases 3 and 4 do not depend on each other and may proceed in either order, or in parallel, after Phase 2.
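The dependency column can be read as a small graph. The following is a minimal sketch, assuming the edges exactly as listed in the table (5a is treated as having no prerequisite because its cell is empty). The names `dependsOn`, `isValidDeployOrder`, and `readyPhases` are hypothetical and do not refer to repository code.

```typescript
// Phase ids and their direct dependencies, transcribed from the phase table.
type PhaseId = "1" | "2" | "3" | "4" | "5a" | "5b";

const dependsOn: Record<PhaseId, PhaseId[]> = {
  "1": [],
  "2": ["1"],
  "3": ["1", "2"],
  "4": ["1", "2"],
  "5a": [], // assumption: the empty "Depends on" cell means no prerequisite
  "5b": ["4", "5a"],
};

// Hypothetical helper: true when `order` deploys every phase exactly once
// and only after all of its dependencies.
function isValidDeployOrder(order: PhaseId[]): boolean {
  const deployed = new Set<PhaseId>();
  for (const phase of order) {
    if (deployed.has(phase)) return false;
    if (!dependsOn[phase].every((dep) => deployed.has(dep))) return false;
    deployed.add(phase);
  }
  return deployed.size === Object.keys(dependsOn).length;
}

// Hypothetical helper: phases that may start once `done` has been deployed.
function readyPhases(done: PhaseId[]): PhaseId[] {
  const deployed = new Set<PhaseId>(done);
  return (Object.keys(dependsOn) as PhaseId[]).filter(
    (p) => !deployed.has(p) && dependsOn[p].every((d) => deployed.has(d)),
  );
}
```

With Phases 1 and 2 deployed, `readyPhases(["1", "2"])` includes both 3 and 4, which is the parallelism noted above.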


Phase 1 — External Resources Provisioning


Goal: ensure all third-party resources Arda consumes exist and have their references captured in the repository.

  • Postmark accounts (PostmarkProd, PostmarkNonProd) created with Platform plan, owner mailbox 2FA enabled, account-level API tokens generated.
  • 1Password items populated:
    • Postmark-Prod and Postmark-NonProd in Arda-SystemsOAM vault (qualified names for the global-utility items; both Postmark accounts in one vault require the qualifier for disambiguation).
    • IAC-SCRIPTS Service Account Token in Arda-SystemsOAM (used by CI for unattended 1Password access).
    • Per-partition copies under item title Postmark in each Arda-{Env}OAM vault are created during Phase 4 partition provisioning (not Phase 1). They follow the standard partition-vault convention: service-name-only title, vault name carries the environment.
  • src/main/cdk/platform/postmark-service.ts populated with the typed account references (POSTMARK_PROD_ACCOUNT, POSTMARK_NONPROD_ACCOUNT).
  • src/main/cdk/platform/one-password.ts populated with the vault and item names that downstream code will reference.
  • GitHub Actions secret provisioned for unattended CI access to 1Password:
    • OP_SERVICE_ACCOUNT_TOKEN
  • Operator runbook sign-off recorded.

Postmark account tokens are not provisioned as separate GHA secrets. CI workflows resolve them at runtime via the 1Password SDK using OP_SERVICE_ACCOUNT_TOKEN and the op://... references in platform/postmark-service.ts. This avoids checking in token values, eliminates a redundant GHA-secret-vs-1P-reference indirection, and matches how local-dev resolves the same tokens (DesktopAuth + 1P SDK).

| Artifact | Path |
|---|---|
| Operator runbook | current-system/oam/postmark-service/operator-runbook.md (canonical operator runbook in the documentation repo) |
| Postmark service-references file | src/main/cdk/platform/postmark-service.ts |
| 1Password references file | src/main/cdk/platform/one-password.ts |
| GHA-secret transition tool | tools/gha-secret.ts (per scratch.md N1+ context, kept as a transition utility) |
| Drift-detection workflow scaffold | .github/workflows/<external-resources>.yml (asserts that the external references resolve and the accounts respond) |
| Postmark service overview + API observations note | current-system/oam/postmark-service/index.md and current-system/oam/postmark-service/postmark-api-observations.md (in the documentation repo). The observations note captures authentication models for the API and Webhooks, the error model, idempotency / retry conventions, webhook payload shapes, and version-pin assumptions. |

Exactly one repository-scoped secret is required in Arda-cards/infrastructure:

| Secret name | Source (1Password reference) | Purpose |
|---|---|---|
| OP_SERVICE_ACCOUNT_TOKEN | op://Arda-SystemsOAM/IAC-SCRIPTS Service Account Token/credential | 1Password service-account auth for unattended CI access; scoped read-only to Arda-SystemsOAM. CI uses this to resolve all other downstream secret references at runtime. |

Postmark account tokens (Postmark-Prod / Postmark-NonProd credentials) are resolved by CI at runtime via the 1Password SDK; they are never persisted as GHA secrets, env vars in checked-in files, or context values. Local-dev resolves the same tokens via DesktopAuth + 1P SDK, so the resolution path is identical across environments.
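Because every secret is addressed by an op:// reference, small checks on the reference string itself can catch mistakes before any resolution is attempted. The sketch below assumes the three-part `op://<vault>/<item>/<field>` form used throughout this document (it does not handle references with an extra section segment). `parseOpReference` and `isInCiVault` are hypothetical helpers, not part of the 1Password SDK or the repository.

```typescript
// Parsed form of a three-part 1Password secret reference.
interface OpReference {
  vault: string;
  item: string;
  field: string;
}

// Hypothetical helper: splits op://<vault>/<item>/<field> into its parts.
// Item titles may contain spaces, so only "/" is treated as a separator.
function parseOpReference(ref: string): OpReference {
  const prefix = "op://";
  if (!ref.startsWith(prefix)) {
    throw new Error(`not a 1Password reference: ${ref}`);
  }
  const parts = ref.slice(prefix.length).split("/");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`expected op://<vault>/<item>/<field>, got: ${ref}`);
  }
  const [vault = "", item = "", field = ""] = parts;
  return { vault, item, field };
}

// Hypothetical guard: the CI service account is scoped to Arda-SystemsOAM,
// so a reference into any other vault cannot be resolved by CI.
function isInCiVault(ref: string): boolean {
  return parseOpReference(ref).vault === "Arda-SystemsOAM";
}
```

A pre-flight step could run `isInCiVault` over the references a workflow intends to resolve and fail early when one points outside the vault the token can read.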

POSTMARK_PROD_ACCOUNT_TOKEN access is restricted to local-dev operator runs and to the future Custom-Resource Lambda’s IAM-scoped secret retrieval — never to CI workflows.

Provisioning of OP_SERVICE_ACCOUNT_TOKEN is performed via tools/gha-secret.ts (libsodium-encrypted upload via Octokit). Rotation is a manual operator step using the same tool.

Exit criteria:

  • All operator runbook B.1-B.4 items checked.
  • platform/postmark-service.ts and platform/one-password.ts content reviewed and merged.
  • OP_SERVICE_ACCOUNT_TOKEN provisioned in the repository; the secret fail-fast precondition step in CI passes.
  • An operator can read each Postmark account token via the 1Password reference (validated by a thin connectivity test invoking client.secrets.resolve(...)); CI can perform the same resolution using OP_SERVICE_ACCOUNT_TOKEN.
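The thin connectivity test in the last criterion could take roughly the following shape. This is a sketch under assumptions: the `SecretsClient` interface only mirrors the `client.secrets.resolve(...)` call named above, and `checkReferencesResolve` and `fakeClient` are hypothetical names. A real run would pass an authenticated SDK client instead of the in-memory stand-in.

```typescript
// Minimal structural view of the client; only the resolve call is assumed.
interface SecretsClient {
  secrets: { resolve(reference: string): Promise<string> };
}

interface ResolveResult {
  reference: string;
  ok: boolean;
  error?: string;
}

// Resolves each reference and reports success or failure. The secret value
// is checked for presence only; it is never returned or logged.
async function checkReferencesResolve(
  client: SecretsClient,
  references: string[],
): Promise<ResolveResult[]> {
  const results: ResolveResult[] = [];
  for (const reference of references) {
    try {
      const value = await client.secrets.resolve(reference);
      results.push(
        value.length > 0
          ? { reference, ok: true }
          : { reference, ok: false, error: "empty value" },
      );
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ reference, ok: false, error: message });
    }
  }
  return results;
}

// In-memory stand-in for local experimentation (hypothetical).
function fakeClient(store: Record<string, string>): SecretsClient {
  return {
    secrets: {
      async resolve(reference: string): Promise<string> {
        const value = store[reference];
        if (value === undefined) throw new Error("reference not found");
        return value;
      },
    },
  };
}
```

Reporting per-reference results rather than throwing on the first failure lets one run show every unresolved reference at once.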

Dependencies: none. This is the foundation.


Phase 2: Root Updates

Goal: prepare the Root instance to accept and delegate the new Corporate sub-domain.

CFN stack-name immutability rule. Renames in TypeScript (folder names, class names, file names) must not change the underlying CloudFormation stack name — the id parameter passed to the Stack constructor stays as published. Changing it forces CFN to delete and recreate the stack, destroying its resources. Any TypeScript-side rename in this phase carries an inline comment at the construct site that calls out the preserved CFN name. This applies generally across every phase and is restated here because Phase 2’s scope contains the most renames.

  • apps/rootConfiguration/ → apps/Root/ folder rename, with the corresponding minimal edit to deploy-root.sh (path-only, per the §2 scope constraint in runtime-design-review.md). The CDK App’s published CFN stack name ("RootConfiguration") is unchanged.
  • TS class rename: RootConfigurationStack → RootDnsStack (at src/main/cdk/stacks/root/root-dns-stack.ts, formerly root-configuration-stack.ts). The constructor’s id argument continues to pass "RootConfiguration" so the CFN stack name is preserved. An inline comment immediately above the constructor call documents the constraint:
    // CFN stack name MUST remain "RootConfiguration" -- changing it would
    // force CloudFormation to delete and recreate the stack.
    new RootDnsStack(app, "ROOT", "RootConfiguration", { ... });
  • instances/Root/dns.ts populated with the declaration of zones owned by Root: the existing arda.cards family (app, io, auth, assets) and the new ardamails.com mail-root zone.
  • ardamails.com PublicHostedZone added in RootDnsStack and exported as arda-ardamails-zone so Phase 3 (Corporate) and Phase 4 (per-partition) can address it as the upstream parent.
  • AllowCreatingNSRecordsRole preserved through the rename, including its export arda-allow-create-ns-record-role. No NS-delegation records for child zones are written by Phase 2 — the child zone owner writes upstream using WriteNSRecordsToUpstreamDns against the Root role, matching the existing per-partition pattern. See DQ-R1-006.
  • No other Root-level resource changes in this phase; Phase 3 writes the arda.ardamails.com NS record into Phase 2’s ardamails.com zone, and Phase 4 does the same per partition sub-zone.

| Artifact | Path |
|---|---|
| App folder rename | src/main/cdk/apps/rootConfiguration/ → src/main/cdk/apps/Root/ (CFN stack name "RootConfiguration" preserved) |
| TS class + file rename | src/main/cdk/stacks/root/root-configuration-stack.ts → root-dns-stack.ts; class RootConfigurationStack → RootDnsStack. Inline comment documents CFN-name preservation at the constructor site. |
| Script update | deploy-root.sh: path-only update for the new app folder |
| Root instance declaration | src/main/cdk/instances/Root/dns.ts: typed configuration consumed by apps/Root/r53-zones.ts (zone names, expected exports) |
| ardamails.com zone declaration | New r53.PublicHostedZone for ardamails.com in RootDnsStack, with export arda-ardamails-zone (zone ID) for downstream phases. |
| AllowCreatingNSRecordsRole | Preserved through the rename; export arda-allow-create-ns-record-role unchanged. Phase 2 writes no NS-delegation records; child zone owners (Phase 3, Phase 4) write upstream themselves. |

Exit criteria:

  • apps/Root/ synthesises and deploys successfully against a non-prod environment.
  • cdk diff against the deployed Root stack shows only an additive ardamails.com zone (and its export); no deletions or replacements.
  • Existing Root resources (root zones, IAM role, exports) remain unchanged in behaviour.

The live dig NS arda.ardamails.com assertion belongs to Phase 3 — Phase 3 is what creates the arda.ardamails.com zone and writes the parent NS record using WriteNSRecordsToUpstreamDns against Phase 2’s ardamails.com zone and AllowCreatingNSRecordsRole. See DQ-R1-006.

Dependencies:

  • Phase 1 (Postmark account references are not strictly needed by Phase 2, but the platform/postmark-service.ts / platform/one-password.ts files are used as conventions; merging Phase 1 first keeps the instance-declaration shape consistent).

Phase 3: Corporate Updates

Goal: stand up the Corporate instance group with its first asset (free-kanban-tool) using the new declarative pattern.

  • platform/constructs/postmark/: thin-wrapper constructs (Construct-like shape; lowercase provider folder; leaner than full L2 CDK Constructs).
    • Initial: PostmarkServer, PostmarkSendingDomain. Each follows a Configuration / Built shape but is not a full CDK Construct.
  • constructs/xgress/dns-zone.ts: generalised hosted-zone construct (extends current route-53-hosted-zone.ts to cleanly support arda.ardamails.com).
  • constructs/xgress/dns-email-records.ts: relocated and generalised from constructs/email/free-platform-server-records.ts. Generic for any sending sub-domain.
  • stacks/corporate/corporate-mail-dns.ts: CorporateMailDns Stack class. Owns the arda.ardamails.com zone via DnsZone. Future: SPF/DMARC at the corporate-zone root.
  • stacks/corporate/free-kanban-tool-mail-dns.ts: FreeKanbanToolMailDns Stack class. Composes DnsEmailRecords + PostmarkServer thin-wrapper construct. The Stack passes Built values from PostmarkServer to DnsEmailRecords; neither construct depends on the other.
  • apps/Corporate/index.ts — reusable CorporateApp class; no side effects at module load. Entry script: tools/cdk-corporate.ts — calls new cdk.App() and wires both stacks.
  • instances/Corporate/free-kanban-tool.ts — declarative configuration: Postmark account reference (from platform/postmark-service.ts), sending sub-domain (freekanban.arda.ardamails.com), 1Password item reference for the server token, plan attributes.
  • tools/corporate-cli.ts — operator entry point. Per class of resource. Implements the two-phase orchestration described in the J1 decision:
    • Phase A: invoke the Postmark thin-wrapper (creates / reconciles the Postmark server in the configured account, captures DKIM and Return-Path values). Idempotent. Writes the resulting public values to cdk.context.json. Writes the Postmark server token directly to 1Password as Free-Kanban-Generator-Postmark-Server in the Arda-CorporateOAM vault (canonical ref op://Arda-CorporateOAM/Free-Kanban-Generator-Postmark-Server/credential); the token never traverses CDK context, file artifacts, env vars in the deploy pipeline, or Arda-SystemsOAM (the OAM vault OP_SERVICE_ACCOUNT_TOKEN reads from). Vault separation is recorded in DQ-R1-007.
    • Phase B: invoke cdk deploy on apps/Corporate/; the stacks read the captured values from cdk.context.json and emit the DNS records.
    • The two phases are conceptually two Apps deployed in sequence; the CLI does not formally declare them as CDK Apps. Comments in the CLI source name the phases explicitly so the eventual migration to Custom Resources is unambiguous.
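The two-phase orchestration described above can be sketched with injected dependencies so that the ordering and the separation of secret and public values are visible. Everything here is an assumption for illustration: the `Deps` interface, `ensureServer`, and the other member names are hypothetical, and the real CLI may be structured differently. Only the context keys, the 1Password reference, and the apps/Corporate/ path come from this document.

```typescript
// Public values captured from Postmark (safe to place in cdk.context.json).
interface PublicValues {
  serverId: string;
  dkimSelector: string;
  dkimKey: string;
  returnPathTarget: string;
}

// Hypothetical seams around Postmark, 1Password, the context file, and CDK.
interface Deps {
  postmark: {
    // Returns the existing server for `name` or creates it (idempotent).
    ensureServer(name: string): Promise<{ values: PublicValues; serverToken: string }>;
  };
  secrets: { write(reference: string, value: string): Promise<void> };
  context: { read(): Record<string, string>; write(ctx: Record<string, string>): void };
  cdk: { deploy(appPath: string): Promise<void> };
}

const TOKEN_REF =
  "op://Arda-CorporateOAM/Free-Kanban-Generator-Postmark-Server/credential";

// Phase A: reconcile the Postmark server, store the token in 1Password,
// then record the public values. Re-running repeats the same writes.
async function phaseA(deps: Deps, serverName: string): Promise<void> {
  const { values, serverToken } = await deps.postmark.ensureServer(serverName);
  await deps.secrets.write(TOKEN_REF, serverToken); // token never enters context
  deps.context.write({
    ...deps.context.read(),
    "postmark.free-kanban.serverId": values.serverId,
    "postmark.free-kanban.dkimSelector": values.dkimSelector,
    "postmark.free-kanban.dkimKey": values.dkimKey,
    "postmark.free-kanban.returnPathTarget": values.returnPathTarget,
  });
}

// Phase B: deploy the Corporate app, which reads the values Phase A captured.
async function phaseB(deps: Deps): Promise<void> {
  await deps.cdk.deploy("apps/Corporate/");
}
```

Keeping the two functions separate mirrors the requirement that each phase be independently re-runnable, and the narrow `Deps` seams are where a later Custom Resource implementation would replace the interim calls.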

J1 interim mechanism — decision recorded

| Aspect | Choice |
|---|---|
| Orchestration locus | CLI (option α from the J1 evaluation), not instance-logic. Keeps cdk synth offline-safe and CI-credential-free, preserves the declarative instances/ convention, and bounds the migration cost to (a). |
| Value-transfer channel | cdk.context.json (CDK’s native context mechanism). No invented file format, deterministic re-synth, idempotent across runs. |
| Channel content | Public values only: postmark.free-kanban.serverId, postmark.free-kanban.dkimSelector, postmark.free-kanban.dkimKey, postmark.free-kanban.returnPathTarget. The DKIM key is the public half (safe to commit; published in DNS anyway). The Postmark server token is written by Phase A directly to 1Password and does not enter CDK context. |
| Construct shape | The PostmarkServer thin-wrapper at platform/constructs/postmark/server.ts exposes its values via a Built interface. In the interim it surfaces the values from CDK context. In the target (a) the construct’s internals emit a Custom Resource Lambda; the Built interface is unchanged. Stack composition code is identical between interim and target. |
| Migration trigger | When Lambda-backed Custom Resources become a wider repo pattern. Only the PostmarkServer construct’s internals change; no caller code is touched. |

The Stack composition is therefore:

// stacks/corporate/free-kanban-tool-mail-dns.ts -- identical between interim and target
const server = new PostmarkServer(this, 'Server', config);
const records = new DnsEmailRecords(this, 'Records', {
  zone,
  dkimSelector: server.built.dkimSelector,
  dkimKey: server.built.dkimKey,
  returnPathTarget: server.built.returnPathTarget,
});
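For the interim mechanism, the `built` values consumed above have to come from CDK context. The following is a minimal sketch of that lookup, assuming a plain key/value map stands in for the construct's context (inside a CDK construct the values could be fetched per key with `this.node.tryGetContext(key)`). The `PostmarkServerBuilt` field list follows the context keys named in the decision table; `builtFromContext` is a hypothetical helper.

```typescript
// Values the PostmarkServer wrapper exposes to callers (the "Built" shape).
interface PostmarkServerBuilt {
  serverId: string;
  dkimSelector: string;
  dkimKey: string;
  returnPathTarget: string;
}

// Hypothetical interim lookup: reads the public values from a context map
// and fails fast when Phase A has not populated a key.
function builtFromContext(
  context: Record<string, unknown>,
  prefix = "postmark.free-kanban",
): PostmarkServerBuilt {
  const read = (key: string): string => {
    const raw = context[`${prefix}.${key}`];
    // Tolerate numeric ids, since JSON context may store them as numbers.
    const value = typeof raw === "number" ? String(raw) : raw;
    if (typeof value !== "string" || value.length === 0) {
      throw new Error(`missing context value ${prefix}.${key}; run Phase A first`);
    }
    return value;
  };
  return {
    serverId: read("serverId"),
    dkimSelector: read("dkimSelector"),
    dkimKey: read("dkimKey"),
    returnPathTarget: read("returnPathTarget"),
  };
}
```

Because callers depend only on the `PostmarkServerBuilt` shape, swapping this lookup for Custom Resource attributes later would not change the composition code.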

The apps/Corporate/index.ts (CorporateApp class), instances/Corporate/corporate.ts, and the instance-level asset files are pure declarative configuration; orchestration logic does not leak into them.

  • Drift-detection workflow for Corporate — monthly schedule, auto-issue on failure, follows the Phase-0 postmark-foundations:integration template.
  • Reserved-words update: arda is added to the list of zone-names reserved at the ardamails.com level so future tenants in partitions cannot collide with it.
  • Documentation: new pages under current-system/oam/corporate/ (Free Kanban Tool service page, runbook, drift notes); update of runtime/ pages per the runtime-design-review proposal.

| Artifact | Path |
|---|---|
| Postmark thin-wrappers | src/main/cdk/platform/constructs/postmark/{server,sending-domain}.ts (+ tests) |
| Postmark Sender Signature | arda.ardamails.com registered in PostmarkProd via PostmarkSendingDomain thin-wrapper at CLI Phase A; verified at the parent (per DQ-R1-009). Leaves inherit DKIM. |
| Generic DNS zone construct | src/main/cdk/constructs/xgress/dns-zone.ts (rename-in-place of route-53-hosted-zone.ts per DQ-R1-011; 5 callers in ingress-stack.ts migrated in the same PR) |
| DNS email records construct | src/main/cdk/constructs/xgress/dns-email-records.ts (new, generic for any sending sub-domain; props-driven, no env-var bridge) |
| CorporateMailDns stack | src/main/cdk/stacks/corporate/corporate-mail-dns.ts (also instantiates WriteNSRecordsToUpstreamDns against Phase 2’s ardamails.com zone, subdomain: "arda", nameServers from the Corporate zone’s own hostedZoneNameServers, per DQ-R1-006) |
| FreeKanbanToolMailDns stack | src/main/cdk/stacks/corporate/free-kanban-tool-mail-dns.ts |
| Corporate App class | src/main/cdk/apps/Corporate/index.ts |
| Corporate App entry script | src/main/cdk/tools/cdk-corporate.ts |
| Corporate Instance declaration | src/main/cdk/instances/Corporate/free-kanban-tool.ts |
| Corporate CLI | tools/corporate-cli.ts |
| Drift-detection workflow | .github/workflows/corporate-drift.yml (instance-group-scoped per DQ-R1-012) |
| Reserved-words update | src/main/cdk/platform/ari-configuration.ts |
| Typed source-of-truth for sender-domain placement | src/main/cdk/platform/constructs/postmark/sending-domain.ts: sendingDomainPlacement() plus dkimRecordFqdn() and returnPathRecordFqdn() helpers. Encodes DQ-R1-009 as a typed function consumed identically by the CLI, the CDK construct, and the drift check. Added during implementation after the placement divergence surfaced in post-deploy verification (see 3-corporate-updates/implementation/dqr1009-divergence.md). |
| Cross-seam drift assertions | tools/corporate-drift.ts: postmark:sender-signature-name:*, postmark-dns-agreement:dkim-host, postmark-dns-agreement:return-path-domain checks comparing Postmark’s reported Name / DKIMPendingHost / DKIMHost / ReturnPathDomain against the placement function. Closes the structural test gap that hid the DQ-R1-009 divergence. |
| Phase 3 implementation byproducts | roadmap/in-progress/email-integration/3-corporate-updates/implementation/: phase-b-deploy.md, dqr1009-divergence.md, learnings.md, suggestions.md. Run-time record of what was built, what diverged, what was learned, and what should follow. |
| Documentation | current-system/oam/corporate/{index,free-kanban-tool}.md and updates per the runtime-design-review |
| Decision-log entry | New DQ for the J1 tradeoff (interim mechanism (b), target mechanism (a), migration trigger) |

Preconditions:

  • Phase 1 has merged: platform/postmark-service.ts (POSTMARK_PROD_ACCOUNT, POSTMARK_NONPROD_ACCOUNT) and platform/one-password.ts (vault + item references) are present on main. OP_SERVICE_ACCOUNT_TOKEN resolves at CI runtime.
  • Phase 2 has merged: RootDnsStack is deployed; the ardamails.com zone is exported as arda-ardamails-zone; the AllowCreatingNSRecordsRole is exported as arda-allow-create-ns-record-role. cdk diff against the deployed Root stack reports zero differences.
  • The Arda-CorporateOAM 1Password vault exists (provisioned 2026-05-05; per DQ-R1-007). The Free-Kanban-Generator-Postmark-Server item does not yet exist; Phase A creates it.
  • The Postmark PostmarkProd account is reachable via POSTMARK_PROD_ACCOUNT_TOKEN from the operator workstation; no Free Kanban server has been created in any Postmark account yet.

Exit criteria:

  • apps/Corporate/ synthesises and deploys successfully against the Root account (where the arda.ardamails.com zone resides for now).
  • dig confirms arda.ardamails.com is delegated and the freekanban.arda.ardamails.com records resolve.
  • Postmark verifies DKIM and Return-Path for the Free Kanban Tool sending domain.
  • Drift-detection workflow is registered on main and reports its state on its first scheduled trigger (success or a structured failure that the workflow surfaces via the auto-issue path). First-run failures attributable to CI environment / token format are tracked under PDEV-455 and do not re-open Phase 3.
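The cross-seam drift assertions listed among the Phase 3 artifacts compare what Postmark reports against what the placement function expects. The sketch below shows one possible shape for that comparison. It assumes the expected values are supplied by the caller (in the repository they would come from sendingDomainPlacement() and its helpers, whose real signatures are not shown in this document); `ExpectedPlacement`, `crossSeamChecks`, `sameDnsName`, and the `label` argument are hypothetical. Falling back to DKIMPendingHost when DKIMHost is empty is also an assumption about how a not-yet-verified domain is reported.

```typescript
// Subset of the fields Postmark reports, using the names cited above.
interface ReportedSignature {
  Name: string;
  DKIMHost: string;
  DKIMPendingHost: string;
  ReturnPathDomain: string;
}

// Expected placement; field names are hypothetical.
interface ExpectedPlacement {
  signatureName: string;
  dkimRecordFqdn: string;
  returnPathRecordFqdn: string;
}

interface CheckResult {
  id: string;
  ok: boolean;
  detail: string;
}

// DNS names compare case-insensitively; a trailing dot is not significant.
function sameDnsName(a: string, b: string): boolean {
  const norm = (s: string) => s.trim().toLowerCase().replace(/\.$/, "");
  return norm(a) === norm(b);
}

function crossSeamChecks(
  label: string,
  expected: ExpectedPlacement,
  reported: ReportedSignature,
): CheckResult[] {
  // Assumption: an unverified domain reports its DKIM host as pending.
  const dkimHost = reported.DKIMHost || reported.DKIMPendingHost;
  const check = (id: string, want: string, got: string): CheckResult => ({
    id,
    ok: sameDnsName(want, got),
    detail: `expected ${want}, Postmark reports ${got || "(empty)"}`,
  });
  return [
    check(`postmark:sender-signature-name:${label}`, expected.signatureName, reported.Name),
    check("postmark-dns-agreement:dkim-host", expected.dkimRecordFqdn, dkimHost),
    check(
      "postmark-dns-agreement:return-path-domain",
      expected.returnPathRecordFqdn,
      reported.ReturnPathDomain,
    ),
  ];
}
```

Returning every check result, rather than stopping at the first mismatch, gives the auto-issue path a complete picture of which seam drifted.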

The Corporate CLI runs in two phases (J1 interim mechanism, this section); each is independently re-runnable.

  • Phase A failure (Postmark thin-wrapper). Phase A is idempotent: re-running with an existing Postmark server returns the captured DKIM / Return-Path values from the Postmark API without creating a duplicate, and re-writes cdk.context.json and the 1Password item to the same values. If Phase A fails after creating the Postmark server but before writing the 1Password item, re-run — the Postmark API call returns the existing server, and the 1Password write completes. If Phase A fails after writing the 1Password item but before updating cdk.context.json, re-run — the captured values are deterministic from the existing server.
  • Phase A succeeds, Phase B not yet run. Coherent intermediate state: the Postmark server exists, the 1Password token is written, but no DNS records exist and no email is delivered. Re-run Phase B (cdk deploy apps/Corporate/) against the same cdk.context.json to complete.
  • Phase B failure (cdk deploy). CFN rolls the stack back automatically. Diagnose (WriteNSRecordsToUpstreamDns Lambda CR is the most failure-prone step — it assumes the Root role across accounts; check the CW log group for the CR Lambda). Re-run.
  • Roll-back. The DNS records and the arda.ardamails.com zone are deletable via cdk destroy apps/Corporate/; the NS-delegation record in ardamails.com is removed by the same WriteNSRecordsToUpstreamDns CR on stack delete. Deleting the Postmark server is supported by the Account API (DELETE /servers/<id>) but is destructive (irrecoverable history loss) — the operator invokes it deliberately, not as part of an automatic rollback. The 1Password item is preserved across cdk destroy and is removed manually by the operator only when the Free Kanban Tool is being decommissioned permanently.
Dependencies:

  • Phase 1 (external references in platform/postmark-service.ts).
  • Phase 2 (Root accepts the arda.ardamails.com NS delegation).

Phase 4: Runtime Platform Updates

Goal: bring per-partition mail capability online for the Application Runtime instance group.

Scope (largely the original “Phase 1” of email-integration, refit to the new structure)

  • Per-partition mail sub-zones: prod.ardamails.com, demo.ardamails.com, dev.ardamails.com, stage.ardamails.com. Each created in the partition’s AWS account via the new DnsZone xgress construct. kyle.ardamails.com is deferred per DQ-R1-021 (kyle partition suspended at Phase 4 start); kyle stays reserved at the ardamails.com level so it cannot be appropriated as a tenant slug while the partition is suspended.
  • NS-delegation entries in the ardamails.com zone (Root) for each new partition sub-zone. Reuses Phase 2 mechanisms.
  • Per-partition Postmark Sender Signatures. One Postmark Sender Signature per partition, anchored at the partition sub-zone ({partition}.ardamails.com). Production partitions (prod, demo) on the PostmarkProd account; non-production partitions (dev, stage) on PostmarkNonProd. Each partition has its own DKIM key (independent receiver-side reputation per environment); leaves under each partition (per-tenant sub-domains) inherit DKIM by default. The granularity decision is pinned in DQ-R1-017 (proposed in the Phase 4 goal artefact under 4-runtime-platform-updates/); the first non-prod Signature also satisfies Postmark Compliance’s pending approval for arda-nonprod.
  • Per-partition Postmark account-token secrets in Secrets Manager (encrypted, ESO-projected to pods). Token sourced from platform/postmark-service.ts references at deploy time via the new partition-aware postmarkCredentialOpReference(partition) accessor.
  • Per-partition encryption-key secrets for tenant token encryption (per DQ-012).
  • IAM roles for DNS provisioning by the runtime emailConfiguration service.
  • Updates to apps/Al1x/partition.ts to instantiate the new partition-mail stacks.
  • Updates to amm.sh — minimal, per §2 scope constraint, individually flagged.
  • Parallel runtime-platform-drift workflow. A new .github/workflows/runtime-platform-drift.yml plus driver under tools/, running alongside the existing corporate-drift (which is not renamed). The new workflow asserts the cross-seam Postmark↔DNS↔placement invariants for every active partition Signature. Logic shared between the two workflows is factored into reusable shell scripts or GitHub Actions composite actions, so subsequent runtime-platform drift checks unrelated to email can plug into the same workflow without mail-centric naming (DQ-R1-018).
  • Operator surfaces integrated into amm.sh. Per DQ-R1-022, Phase 4’s partition-mail provisioning is part of the product runtime platform deployment and is invoked through amm.sh (and its rules: idempotency, security, pre-flight checks, partition selection). Phase 4 does not introduce a standalone partition-mail CLI. Reusable sub-scripts / utilities (bash or TypeScript) are extracted from corporate-cli so both amm.sh’s partition path and corporate-cli can share logic; this includes refactoring Phase 3 deliverables as needed to keep each script’s complexity bounded.

| Artifact | Path |
|---|---|
| Partition email stack | src/main/cdk/stacks/purpose/partition-email.ts |
| Partition email instance config | extension of src/main/cdk/instances/Alpha001/{prod,demo}.ts and Alpha002/{dev,stage}.ts |
| Updated apps | src/main/cdk/apps/Al1x/partition.ts |
| Partition-aware Postmark credential accessor | src/main/cdk/platform/postmark-service.ts: postmarkCredentialOpReference(partition: PartitionId): string returning the op://Arda-{Env}OAM/Postmark/credential reference for the partition’s environment. Consumed by amm.sh (via op read), not by CDK; the resolved value is passed to cdk deploy as a NoEcho parameter (δ.1 pattern, mirrors partitionSecrets.cfn.yaml). |
| Partition Postmark Sender Signatures | One per partition, registered via PostmarkSendingDomain thin-wrapper using sendingDomainPlacement() with partition-shaped inputs |
| Per-partition Postmark account-token SM secret | aws_secretsmanager.Secret per partition, name {fqn}-I-EmailPostmarkAccountToken, RemovalPolicy.RETAIN. Declared in partition-email.ts (CDK); value populated via SecretValue.cfnParameter() from a NoEcho CFN parameter that amm.sh supplies on cdk deploy after reading the 1Password reference returned by postmarkCredentialOpReference(partition). Same SM secret serves both the CR Lambda (Sender Signature registration at deploy time) and the runtime ESO mount (operations pod at request time). Pattern documented in current-system/oam/security/secret-delivery-pattern.md. |
| Per-partition email-token encryption-key SM secret | aws_secretsmanager.Secret per partition, name {fqn}-I-EmailEncryptionKey, passwordLength: 64, RemovalPolicy.RETAIN. Single SM secret per partition; rotation uses AWS SM native versioning (AWSCURRENT / AWSPREVIOUS stages). Full design in 4-runtime-platform-updates/design/email-server-key-encryption.md per DQ-R1-019. |
| Per-partition DNS-records role (via generalized AllowCreatingNSRecordsRole) | Reuses the existing AllowCreatingNSRecordsRole construct (Phase 2; constructs/oam/allow-creating-ns-records-role.ts), generalized to accept a configurable trust principal. Instantiated in partition-email.ts with the pod-STS-chain trust principal (iam.AccountPrincipal(account).withConditions({ ArnLike: ... })) and allowedParentHostedZoneIds scoped to the partition’s mail sub-zone. Permissions are already generic Route53 record-set CRUD on the construct side: route53:ChangeResourceRecordSets, route53:ListResourceRecordSets, route53:ListHostedZonesByName. (DQ-R1-020.) route53:GetChange is intentionally omitted (requires arn:aws:route53:::change/* scope; the Email module does not wait on Route53 propagation because Postmark verification is API-driven). The existing Root-account instantiation must remain byte-identical post-generalization; this is guarded by a CDK Template-equality unit test and a post-deploy Root no-drift verification. |
| Per-partition EmailEncryptionKeyFallbackRole | Fresh purpose-specific IAM role declared in partition-email.ts. Same trust-policy shape as EmailDnsProvisioningRole. Permission: secretsmanager:GetSecretValue on ${encryptionKeySecret.secretArn}* (full SM-secret ARN; the trailing wildcard tolerates the SM-appended random 6-character suffix; SM versions are selected at API call time via VersionId/VersionStage, not encoded in the resource ARN). Used by the Phase 5b TokenCipher SDK-fallback path for envelopes older than AWSPREVIOUS (DQ-R1-019). The operations pod role is not extended; permissions live on the purpose-specific role. (DQ-R1-020.) |
| runtime-platform-drift workflow (parallel) | .github/workflows/runtime-platform-drift.yml + driver under tools/. Shares reusable scripts / composite actions with corporate-drift; corporate-drift is not renamed (DQ-R1-018) |
| amm.sh-integrated partition-mail steps | Phase 4 operator work lives inside amm.sh (or its callees) per DQ-R1-022, following its idempotency / security / check rules. Reusable bash + TypeScript utilities extracted from corporate-cli are shared between amm.sh and corporate-cli; includes refactoring Phase 3 deliverables as needed |
| amm.sh minimal updates | repo root |
| Decision-log entries | DQ-R1-017..022 (Round R1-Phase4): Sender Signature granularity, drift workflow shape, encryption-key derivation, STS-chained IAM roles for DNS provisioning + SM fallback, partition rollout order, operator CLI shape |
| Documentation | partition mail sections in current-system/runtime/ |
| Secret-delivery pattern doc | new current-system/oam/security/secret-delivery-pattern.md documenting the op → amm.sh → CFN NoEcho parameter → SM secret → consumer flow (worked examples: partitionSecrets.cfn.yaml + Phase 4 Postmark token); cross-linked from secrets-vault.md |

Phase 4 infrastructure prerequisite: partition-aware Postmark credential reference


platform/postmark-service.ts currently exposes a single credentialReference per Postmark account (op://Arda-SystemsOAM/Postmark-{Prod,NonProd}/credential). Before Phase 4’s deploy tooling can consume per-partition vault copies, the file must be extended with a partition-aware accessor — for example:

postmarkCredentialOpReference(partition: PartitionId): string
// returns "op://Arda-ProdOAM/Postmark/credential" for prod,
// "op://Arda-DevOAM/Postmark/credential" for dev, etc.

This change is scoped to platform/postmark-service.ts in the infrastructure repository; it does not affect Phase 3. The accessor is consumed by amm.sh (which calls op read on the returned reference and supplies the resolved value to cdk deploy as a NoEcho parameter); CDK itself has no 1Password dependency. Pattern: see current-system/oam/security/secret-delivery-pattern.md.
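A possible body for the accessor is sketched below. It rests on a loudly stated assumption: that each partition's vault is named after the partition itself, following the Arda-{Env}OAM pattern. Only the prod and dev vault names appear in this document, so the demo and stage entries in `VAULT_ENV` are illustrative and would need to be checked against the actual vaults. `postmarkAccountFor` is a hypothetical companion that encodes the account split stated in the Phase 4 scope.

```typescript
// Active partitions named in this document (kyle is deferred).
type PartitionId = "prod" | "demo" | "dev" | "stage";

// ASSUMPTION: vault environment names mirror the partition names. Only
// "Prod" and "Dev" are confirmed by the text; "Demo" and "Stage" are guesses.
const VAULT_ENV: Record<PartitionId, string> = {
  prod: "Prod",
  demo: "Demo",
  dev: "Dev",
  stage: "Stage",
};

// Returns the per-partition reference, following the service-name-only
// item title ("Postmark") with the environment carried by the vault name.
function postmarkCredentialOpReference(partition: PartitionId): string {
  return `op://Arda-${VAULT_ENV[partition]}OAM/Postmark/credential`;
}

// Hypothetical companion: production partitions use PostmarkProd,
// non-production partitions use PostmarkNonProd.
function postmarkAccountFor(partition: PartitionId): "PostmarkProd" | "PostmarkNonProd" {
  return partition === "prod" || partition === "demo" ? "PostmarkProd" : "PostmarkNonProd";
}
```

Since the accessor returns only a reference string, it stays safe to call during synthesis or in tests; the secret itself is read later by amm.sh.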

Preconditions:

  • Phases 1 and 2 have merged (Postmark account references in platform/postmark-service.ts; ardamails.com zone + AllowCreatingNSRecordsRole exported from Root).
  • platform/postmark-service.ts exposes a partition-aware credential accessor (see prerequisite above).
  • Each partition’s Arda-{Env}OAM vault holds an item titled Postmark with credential field set to the relevant Postmark account token (following the service-name-only convention; provisioned as part of Phase 4 operator work, analogous to Phase 1’s provisioning of the Arda-SystemsOAM global-utility items).
  • Each partition’s AWS account has the IAM permissions to (a) create hosted zones, (b) create / read Secrets Manager entries, and (c) assume the AllowCreatingNSRecordsRole in the Root account.
  • The ESO ClusterSecretStore in each partition cluster is configured to read from the partition’s AWS Secrets Manager (already in place; restated as a precondition).

Exit criteria:

  • Each partition’s mail sub-zone is delegated and populated with required base records.
  • Per-partition Postmark token + encryption-key secrets are accessible to the operations service via ESO.
  • IAM role allows runtime DNS provisioning for tenant sub-domains.
  • Component repositories (Phase 5) can read the cross-stack exports.

Phase 4 fans out across four active partitions (prod, demo, dev, stage) hosted in two AWS accounts: Alpha001 carries prod + demo; Alpha002 carries dev + stage. Each partition’s mail stack deploys independently within its Infrastructure’s account; partitions sharing an account remain CFN-independent. The kyle partition is suspended at Phase 4 start (DQ-R1-021) and is not included in the rollout; replay the per-partition deploy procedure when/if kyle resumes operation.

  • Partition independence. A failed deploy in one partition does not block the others. Re-run cdk deploy apps/Al1x/<partition> for the affected partition only; the remaining partitions are unaffected by the failure or the retry.
  • NS-delegation atomicity. Each partition’s stack writes its NS-delegation record into the Root ardamails.com zone on stack creation via WriteNSRecordsToUpstreamDns (per DQ-R1-006). If the stack create succeeds but the CR Lambda fails the cross-account assume-role, the partition zone exists in its Infrastructure’s account but is not delegated — dig NS <partition>.ardamails.com returns NODATA. Re-deploying the stack invokes the CR again on stack update; once the assume-role succeeds, the NS record set is created and propagation is normal.
  • Secrets Manager retention. Per-partition Postmark account-token secrets and encryption-key secrets carry RemovalPolicy.RETAIN so cdk destroy does not delete them. Intentional removal is a deliberate operator step (delete the Secrets Manager entry through the AWS console or CLI). This defends against accidental loss of the encryption key, which would render every encrypted-at-rest tenant token unrecoverable.
  • Recommended deploy order (DQ-R1-021): dev → stage → demo → prod. dev first because it also satisfies the arda-nonprod Postmark account-approval prerequisite (a Sender Signature on dev.ardamails.com is the account’s first); the lower-blast-radius non-production partitions land before production. The order can be condensed if early partitions are clean; do not skip the validation cycles. kyle is excluded from the rollout per DQ-R1-021.
  • Phase 1 (Postmark account references).
  • Phase 2 (Root accepts new NS delegations).
  • Independent of Phase 3 (Corporate). Phases 3 and 4 can land in either order or in parallel.
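The rollout rules above (recommended order, partition-to-account mapping, and partition independence) can be sketched as a small driver. This is only an illustration: `rollOut`, `DeployFn`, and the boolean return convention are hypothetical, and the real deploy step is the `cdk deploy` invocation described in the text, not a function call.

```typescript
type RolloutPartition = "dev" | "stage" | "demo" | "prod";

// Recommended order and account mapping, both taken from the text.
const ROLLOUT_ORDER: RolloutPartition[] = ["dev", "stage", "demo", "prod"];
const ACCOUNT_OF: Record<RolloutPartition, string> = {
  prod: "Alpha001",
  demo: "Alpha001",
  dev: "Alpha002",
  stage: "Alpha002",
};

// Hypothetical stand-in for the per-partition deploy; returns true on success.
type DeployFn = (partition: RolloutPartition, account: string) => boolean;

// Attempts every partition in order. A failure is recorded, not propagated,
// so one partition's failed deploy does not block the others.
function rollOut(deploy: DeployFn): { succeeded: RolloutPartition[]; failed: RolloutPartition[] } {
  const succeeded: RolloutPartition[] = [];
  const failed: RolloutPartition[] = [];
  for (const partition of ROLLOUT_ORDER) {
    let ok = false;
    try {
      ok = deploy(partition, ACCOUNT_OF[partition]);
    } catch {
      ok = false;
    }
    (ok ? succeeded : failed).push(partition);
  }
  return { succeeded, failed };
}
```

The `failed` list is what an operator would re-run partition by partition. In practice an operator may still choose to pause before prod when an earlier partition fails validation; the sketch only shows that nothing technically forces that.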

Goal: deliver the runtime email capability in the application stack.

Cross-cutting library additions consumed by the email module: sanitizeHeader, AppError.Application, idempotent-key helpers, etc. Released as a common-module minor version. The operations repository consumes this library.

  • New / updated classes and helpers in common-module.
  • Unit tests + version bump + CHANGELOG.
  • common-module main is at a state where the new helpers can be added without conflicting with in-flight work; no library work is required to land first.
  • common-module published with the new APIs.
  • operations builds against the new version without regression.
  • None on Phases 1-4 (the library work is purely Kotlin). Its consumers in Phase 5b, however, need it merged and published first.

Backend ShopAccess/Email module: per-tenant Postmark-server provisioning, sending APIs, bounce / complaint webhook handling, suppression-list maintenance.

  • Module code (shopaccess/email/...) + Flyway migrations + integration tests + Helm chart updates (apis.system.shopAccess.email entry; ESO ExternalSecret entries).
  • API Gateway routes declared in the operations repo’s CloudFormation files for the three L4 endpoints (email-job, email-configuration, postmark-events webhook). API-gateway-level authorisers are out of scope for this project; authentication for each route is performed in-component by the receiving Ktor server (Bearer-token validation for the webhook route, the Application Runtime’s existing scheme for consumer routes).
  • Gradle property updates in the operations repo (and in common-module if applicable) to provide the property values needed for linting, local deployments, and CI test runs of the new module.
  • Phase 4 has deployed to every partition that 5b will target: per-partition Postmark token secrets, encryption-key secrets, and DNS-provisioning IAM roles exist in Secrets Manager / IAM and are projected through ESO into the partition cluster.
  • Phase 5a has published a common-module minor with the new APIs to Artifactory; the operations repo’s gradle.properties is ready to bump to that version.
  • API Gateway path slots for email-job, email-configuration, postmark-events are reservable in the partition’s API Gateway CFN stacks (no path collision with existing routes).
  • Module deployable to all partitions; passes integration tests against a real PostmarkNonProd surface.
  • Webhooks reachable from Postmark to the partition’s API gateway.
  • Webhook-route registration failure. If the postmark-events route fails to register at API Gateway during deploy, Postmark’s webhook calls return 404. Postmark retries failed webhook calls for 7 days; events accumulate in its retry queue and replay automatically once the route comes up. Re-deploy the API Gateway CFN stack; verify with a manual Postmark test event from the Postmark Console.
  • Flyway migration failure mid-deploy. Flyway runs forward-only. A migration that fails partway leaves the partition’s database in an intermediate schema state. Recovery is fix-forward (author the next migration to restore invariants and deploy it); manual SQL intervention is an escalation path, not a routine deploy step. Production migrations run in a single transaction where possible to constrain the blast radius.
  • Per-partition rollout. 5b deploys to one partition at a time; a failure in one partition does not affect the others. Re-deploy the failed partition once the cause is fixed. Recommended order mirrors Phase 4 (per DQ-R1-021): dev → stage → demo → prod. kyle is excluded for as long as the partition is suspended.
  • Phase 4 (per-partition Postmark token + encryption-key secrets exist in Secrets Manager).
  • Phase 5a (common-module minor with the new APIs is published).
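The in-component Bearer-token check mentioned for the webhook route can be illustrated as follows. The real check lives in the Kotlin/Ktor server; this TypeScript sketch only shows the general idea, and the function names and the use of a constant-time comparison are assumptions for the example, not a description of the module's implementation.

```typescript
// Compares two strings without returning early on the first mismatch,
// so response timing does not reveal how many leading characters matched.
function constantTimeEquals(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : 0;
    const cb = i < b.length ? b.charCodeAt(i) : 0;
    diff |= ca ^ cb;
  }
  return diff === 0;
}

// Hypothetical validator for the postmark-events route: accepts only an
// "Authorization: Bearer <token>" header whose token equals the expected one.
function isAuthorizedWebhook(authorizationHeader: string | undefined, expectedToken: string): boolean {
  if (!authorizationHeader || expectedToken.length === 0) return false;
  const match = /^Bearer (.+)$/.exec(authorizationHeader);
  if (match === null) return false;
  const presented = match[1] ?? "";
  return constantTimeEquals(presented, expectedToken);
}
```

Rejecting an empty expected token guards against a misprojected secret silently turning the route into an open endpoint.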

Open design questions (to be confirmed at Phase 5b planning)

  • DQ-R1-023 — Per-tenant Postmark Sender Signature introduction. Phase 4 ships one Signature per partition ({partition}.ardamails.com) per DQ-R1-017. All tenants in a partition share that Signature’s DKIM-domain reputation under DMARC relaxed alignment. Phase 5b decides whether to introduce per-tenant Signatures (and the associated per-tenant DKIM TXT + Return-Path CNAME records) for per-tenant reputation isolation. Four options: α status quo (partition Signature for all), β per-tenant from v1, γ hybrid opt-in, δ remediation-only. No Phase 4 dependency — Phase 4 provisions EmailDnsProvisioningRole (the runtime DNS-write capability) regardless, so whichever way DQ-R1-023 resolves, no Phase 4 re-work is needed.

PlantUML diagram

  • Phase 1 must land before any other phase (its platform/ references and OP_SERVICE_ACCOUNT_TOKEN are dependencies for every later phase).
  • Phase 2 must land before Phase 3 and Phase 4 (both consume Root’s NS-delegation mechanism).
  • Phase 3 and Phase 4 are independent of each other and may run in either order or in parallel.
  • Phase 5a is independent of Phases 1-4 (Kotlin library work).
  • Phase 5b depends on both Phase 4 (per-partition infrastructure) and Phase 5a (consumed library APIs).
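The ordering constraints listed above can be checked mechanically with a small topological sort. The sketch below is illustrative only: `DEPENDS_ON` and `deployOrder` are hypothetical names, and Phase 5a is given no incoming edges, following the statement that it is independent of Phases 1-4.

```typescript
// Dependency edges transcribed from the list above.
const DEPENDS_ON: Record<string, string[]> = {
  "1": [],
  "2": ["1"],
  "3": ["1", "2"],
  "4": ["1", "2"],
  "5a": [],
  "5b": ["4", "5a"],
};

// Repeatedly emits every phase whose dependencies have all been emitted.
// Phases emitted in the same pass (such as 3 and 4) have no mutual ordering.
function deployOrder(deps: Record<string, string[]>): string[] {
  const phases = Object.keys(deps);
  const done: string[] = [];
  while (done.length < phases.length) {
    const ready = phases
      .filter((p) => done.indexOf(p) === -1)
      .filter((p) => (deps[p] ?? []).every((d) => done.indexOf(d) !== -1))
      .sort();
    if (ready.length === 0) throw new Error("dependency cycle or unknown phase");
    done.push(...ready);
  }
  return done;
}
```

Calling `deployOrder(DEPENDS_ON)` produces one valid order among several; any order that keeps each phase after its dependencies is equally acceptable.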

| Prior label | New mapping |
| --- | --- |
| Phase 0 (Postmark Foundations) | Distributed across Phases 1, 2, 3. The current PR #445 corresponds approximately to a partial Phase 1 + partial Phase 3 (with the partition coupling that needs unwinding). PR-#445 disposition is deferred per scratch.md E. |
| Phase 1 (Infrastructure: partition zones, secrets, IAM) | Phase 4 in the new numbering, with the per-partition mail sub-zones still owned by the partition (per the Q5 answer in runtime-design-review.md §7). |
| Phases 2-6 (Backend module) | Phase 5b. |
| Common-module additions | Phase 5a. |

The detailed contents of the current 0-postmark-foundations/ and 1-infrastructure/ specification trees will be merged / reorganised under the new phase folders as part of the consolidation in follow-up-updates.md.