Email Integration -- Revision 1: Phase Structure
This document defines the revised phase structure for the email-integration project, replacing the prior Phase 0 / Phase 1 / Phases 2-6 numbering. The revision is driven by:
- The runtime-design principles adopted in the Runtime Overview.
- The architectural decisions captured in the design reviews under reviews/R1/ (see runtime-design-review.md, phase-0-infrastructure-pr-review.md, and scratch.md).
- The need to sequence work so that each phase is independently codable and deployable, with no “hold” dependencies on subsequent phases.
The revision keeps the project’s overall goal unchanged. It restructures how the work is split and ordered so that each phase produces a deployable artifact.
Phase Sequence
| # | Phase | Instance group / target | Depends on | Output |
|---|---|---|---|---|
| 1 | External Resources Provisioning | Platform-level | — | Postmark accounts created; 1Password items populated; platform/ references populated. |
| 2 | Root Updates | Root instance | 1 (platform/ reference shape) | NS-delegation entry for arda.ardamails.com in the ardamails.com zone; apps/Root/ rename in place. |
| 3 | Corporate Updates | Corporate instance | 1 (Postmark account refs); 2 (arda NS delegation in place) | arda.ardamails.com zone live; Free Kanban Tool sending from freekanban.arda.ardamails.com; Corporate App + CLI in place. |
| 4 | Runtime Platform Updates | Application Runtime instances (Alpha001 / Alpha002 / SandboxKyle002) | 1 (Postmark account refs); 2 (NS-delegation pattern) | Per-partition mail sub-zones ({partition}.ardamails.com), per-partition Postmark token secrets, encryption-key secrets, IAM roles for DNS provisioning. |
| 5a | Component Library Updates | common-module repository | — | Cross-cutting library additions (sanitizeHeader, AppError.Application, etc.) consumed by the email module. |
| 5b | Email Module | operations repository | 4 (per-partition secrets + IAM); 5a (common-module minor release) | Backend ShopAccess/Email module: per-tenant server provisioning, sending, bounce / complaint handling. |
Phases must be deployed in the dependency order shown above; once a phase has been deployed, the platform is in a coherent state without requiring later phases to land. Phases 3 and 4 do not depend on each other and may proceed in either order, or in parallel, after Phase 2.
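The ordering rule can be checked mechanically. The following is a minimal sketch, not part of the project tooling: it transcribes the "Depends on" column into a graph and groups phases into waves whose members may run in either order or in parallel. The names `dependsOn` and `deploymentWaves` are illustrative assumptions.

```typescript
// Phase dependency graph, transcribed from the "Depends on" column above.
type PhaseId = "1" | "2" | "3" | "4" | "5a" | "5b";

const dependsOn: Record<PhaseId, PhaseId[]> = {
  "1": [],
  "2": ["1"],
  "3": ["1", "2"],
  "4": ["1", "2"],
  "5a": [],
  "5b": ["4", "5a"],
};

// Group phases into waves: every phase in a wave has all of its dependencies
// satisfied by earlier waves, so phases sharing a wave are order-independent.
function deploymentWaves(graph: Record<PhaseId, PhaseId[]>): PhaseId[][] {
  const remaining = new Set(Object.keys(graph) as PhaseId[]);
  const done = new Set<PhaseId>();
  const waves: PhaseId[][] = [];
  while (remaining.size > 0) {
    const ready = Array.from(remaining)
      .filter((phase) => graph[phase].every((dep) => done.has(dep)))
      .sort();
    if (ready.length === 0) {
      throw new Error("dependency cycle detected");
    }
    ready.forEach((phase) => {
      remaining.delete(phase);
      done.add(phase);
    });
    waves.push(ready);
  }
  return waves;
}

console.log(deploymentWaves(dependsOn));
```

Phases 3 and 4 land in the same wave, which matches the statement that they may proceed in either order after Phase 2.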
Phase 1 — External Resources Provisioning
Goal: ensure all third-party resources Arda consumes exist and have their references captured in the repository.
- Postmark accounts (PostmarkProd, PostmarkNonProd) created with Platform plan, owner mailbox 2FA enabled, account-level API tokens generated.
- 1Password items populated:
  - Postmark-Prod and Postmark-NonProd in the Arda-SystemsOAM vault (qualified names for the global-utility items; both Postmark accounts in one vault require the qualifier for disambiguation).
  - IAC-SCRIPTS Service Account Token in Arda-SystemsOAM (used by CI for unattended 1Password access).
  - Per-partition copies under item title Postmark in each Arda-{Env}OAM vault are created during Phase 4 partition provisioning (not Phase 1). They follow the standard partition-vault convention: service-name-only title, vault name carries the environment.
- src/main/cdk/platform/postmark-service.ts populated with the typed account references (POSTMARK_PROD_ACCOUNT, POSTMARK_NONPROD_ACCOUNT).
- src/main/cdk/platform/one-password.ts populated with the vault and item names that downstream code will reference.
- GitHub Actions secret provisioned for unattended CI access to 1Password:
  - OP_SERVICE_ACCOUNT_TOKEN
- Operator runbook sign-off recorded.
Postmark account tokens are not provisioned as separate GHA secrets. CI workflows resolve them at runtime via the 1Password SDK using OP_SERVICE_ACCOUNT_TOKEN and the op://... references in platform/postmark-service.ts. This avoids checking in token values, eliminates a redundant GHA-secret-vs-1P-reference indirection, and matches how local-dev resolves the same tokens (DesktopAuth + 1P SDK).
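To illustrate the resolution path, here is a minimal sketch that validates an op:// reference before handing it to a resolver. The names `parseOpReference`, `SecretResolver`, and `resolveAccountToken` are hypothetical, and the parser is a simplification that accepts only the three-segment vault/item/field form used in this document. The resolver is injected so the sketch runs offline; in CI it would be backed by the 1Password SDK authenticated with OP_SERVICE_ACCOUNT_TOKEN.

```typescript
// Parsed form of a 1Password secret reference: op://<vault>/<item>/<field>.
interface OpReference {
  vault: string;
  item: string;
  field: string;
}

// Simplified parser: accepts exactly three non-empty segments after "op://".
function parseOpReference(ref: string): OpReference {
  const prefix = "op://";
  if (ref.indexOf(prefix) !== 0) {
    throw new Error(`not a 1Password reference: ${ref}`);
  }
  const parts = ref.slice(prefix.length).split("/");
  if (parts.length !== 3 || parts.some((part) => part.length === 0)) {
    throw new Error(`expected op://<vault>/<item>/<field>, got: ${ref}`);
  }
  const [vault = "", item = "", field = ""] = parts;
  return { vault, item, field };
}

// Stand-in for whatever turns a reference into its secret value (assumption).
type SecretResolver = (ref: string) => Promise<string>;

async function resolveAccountToken(
  ref: string,
  resolve: SecretResolver,
): Promise<string> {
  parseOpReference(ref); // fail fast on malformed references before any I/O
  const value = await resolve(ref);
  if (value.length === 0) {
    throw new Error(`empty secret for ${ref}`);
  }
  return value;
}
```

Because the same function accepts any resolver, a local-dev run and a CI run would differ only in how the resolver authenticates, not in the reference being resolved.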
Deliverables
| Artifact | Path |
|---|---|
| Operator runbook | current-system/oam/postmark-service/operator-runbook.md (canonical operator runbook in the documentation repo) |
| Postmark service-references file | src/main/cdk/platform/postmark-service.ts |
| 1Password references file | src/main/cdk/platform/one-password.ts |
| GHA-secret transition tool | tools/gha-secret.ts (per scratch.md N1+ context, kept as a transition utility) |
| Drift-detection workflow scaffold | .github/workflows/<external-resources>.yml (asserts that the external references resolve and the accounts respond) |
| Postmark service overview + API observations note | current-system/oam/postmark-service/index.md and current-system/oam/postmark-service/postmark-api-observations.md (in the documentation repo). The observations note captures authentication models for the API and Webhooks, the error model, idempotency / retry conventions, webhook payload shapes, and version-pin assumptions. |
Required GitHub Actions secrets
Exactly one repository-scoped secret is required in Arda-cards/infrastructure:
| Secret name | Source (1Password reference) | Purpose |
|---|---|---|
| OP_SERVICE_ACCOUNT_TOKEN | op://Arda-SystemsOAM/IAC-SCRIPTS Service Account Token/credential | 1Password service-account auth for unattended CI access; scoped read-only to Arda-SystemsOAM. CI uses this to resolve all other downstream secret references at runtime. |
Postmark account tokens (Postmark-Prod / Postmark-NonProd credentials) are resolved by CI at runtime via the 1Password SDK; they are never persisted as GHA secrets, env vars in checked-in files, or context values. Local-dev resolves the same tokens via DesktopAuth + 1P SDK, so the resolution path is identical across environments.
POSTMARK_PROD_ACCOUNT_TOKEN access is restricted to local-dev operator runs and to the future Custom-Resource Lambda’s IAM-scoped secret retrieval — never to CI workflows.
Provisioning of OP_SERVICE_ACCOUNT_TOKEN is performed via tools/gha-secret.ts (libsodium-encrypted upload via Octokit). Rotation is a manual operator step using the same tool.
Exit criteria
- All operator runbook B.1-B.4 items checked.
- platform/postmark-service.ts and platform/one-password.ts content reviewed and merged.
- OP_SERVICE_ACCOUNT_TOKEN provisioned in the repository; the secret fail-fast precondition step in CI passes.
- An operator can read each Postmark account token via the 1Password reference (validated by a thin connectivity test invoking client.secrets.resolve(...)); CI can perform the same resolution using OP_SERVICE_ACCOUNT_TOKEN.
Dependencies
None. This is the foundation.
Phase 2 — Root Updates
Goal: prepare the Root instance to accept and delegate the new Corporate sub-domain.
CFN stack-name immutability rule. Renames in TypeScript (folder names, class names, file names) must not change the underlying CloudFormation stack name: the id parameter passed to the Stack constructor stays as published. Changing it forces CFN to delete and recreate the stack, destroying its resources. Any TypeScript-side rename in this phase carries an inline comment at the construct site that calls out the preserved CFN name. This applies generally across every phase and is restated here because Phase 2’s scope contains the most renames.
- apps/rootConfiguration/ → apps/Root/ folder rename, with the corresponding minimal edit to deploy-root.sh (path-only, per the §2 scope constraint in runtime-design-review.md). The CDK App’s published CFN stack name ("RootConfiguration") is unchanged.
- TS class rename: RootConfigurationStack → RootDnsStack (at src/main/cdk/stacks/root/root-dns-stack.ts, formerly root-configuration-stack.ts). The constructor’s id argument continues to pass "RootConfiguration" so the CFN stack name is preserved. An inline comment immediately above the constructor call documents the constraint:

  ```ts
  // CFN stack name MUST remain "RootConfiguration" -- changing it would
  // force CloudFormation to delete and recreate the stack.
  new RootDnsStack(app, "ROOT", "RootConfiguration", { ... });
  ```

- instances/Root/dns.ts populated with the declaration of zones owned by Root: the existing arda.cards family (app, io, auth, assets) and the new ardamails.com mail-root zone.
- ardamails.com PublicHostedZone added in RootDnsStack and exported as arda-ardamails-zone so Phase 3 (Corporate) and Phase 4 (per-partition) can address it as the upstream parent.
- AllowCreatingNSRecordsRole preserved through the rename, including its export arda-allow-create-ns-record-role. No NS-delegation records for child zones are written by Phase 2; the child zone owner writes upstream using WriteNSRecordsToUpstreamDns against the Root role, matching the existing per-partition pattern. See DQ-R1-006.
- No other Root-level resource changes in this phase; Phase 3 writes the arda.ardamails.com NS record into Phase 2’s ardamails.com zone, and Phase 4 does the same per partition sub-zone.
Deliverables
| Artifact | Path |
|---|---|
| App folder rename | src/main/cdk/apps/rootConfiguration/ → src/main/cdk/apps/Root/ (CFN stack name "RootConfiguration" preserved) |
| TS class + file rename | src/main/cdk/stacks/root/root-configuration-stack.ts → root-dns-stack.ts; class RootConfigurationStack → RootDnsStack. Inline comment documents CFN-name preservation at the constructor site. |
| Script update | deploy-root.sh — path-only update for the new app folder |
| Root instance declaration | src/main/cdk/instances/Root/dns.ts — typed configuration consumed by apps/Root/r53-zones.ts (zone names, expected exports) |
| ardamails.com zone declaration | New r53.PublicHostedZone for ardamails.com in RootDnsStack, with export arda-ardamails-zone (zone ID) for downstream phases. |
| AllowCreatingNSRecordsRole | Preserved through the rename; export arda-allow-create-ns-record-role unchanged. Phase 2 writes no NS-delegation records; child zone owners (Phase 3, Phase 4) write upstream themselves. |
Exit criteria
- apps/Root/ synthesises and deploys successfully against a non-prod environment.
- cdk diff against the deployed Root stack shows only an additive ardamails.com zone (and its export); no deletions or replacements.
- Existing Root resources (root zones, IAM role, exports) remain unchanged in behaviour.

The live dig NS arda.ardamails.com assertion belongs to Phase 3: Phase 3 is what creates the arda.ardamails.com zone and writes the parent NS record using WriteNSRecordsToUpstreamDns against Phase 2’s ardamails.com zone and AllowCreatingNSRecordsRole. See DQ-R1-006.
Dependencies
- Phase 1 (Postmark account references are not strictly needed by Phase 2, but the platform/postmark-service.ts / platform/one-password.ts files are used as conventions; merging Phase 1 first keeps the instance-declaration shape consistent).
Phase 3 — Corporate Updates
Goal: stand up the Corporate instance group with its first asset (free-kanban-tool) using the new declarative pattern.
- platform/constructs/postmark/ thin-wrapper constructs (Construct-line shape; lowercase provider folder; leaner than full L2 CDK Constructs).
  - Initial: PostmarkServer, PostmarkSendingDomain. Each follows a Configuration/Built shape but is not a full CDK Construct.
- constructs/xgress/dns-zone.ts generalised hosted-zone construct (extends current route-53-hosted-zone.ts to cleanly support arda.ardamails.com).
- constructs/xgress/dns-email-records.ts: relocated and generalised from constructs/email/free-platform-server-records.ts. Generic for any sending sub-domain.
- stacks/corporate/corporate-mail-dns.ts: CorporateMailDnsStack class. Owns the arda.ardamails.com zone via DnsZone. Future: SPF/DMARC at the corporate-zone root.
- stacks/corporate/free-kanban-tool-mail-dns.ts: FreeKanbanToolMailDnsStack class. Composes DnsEmailRecords + PostmarkServer thin-wrapper construct. The Stack passes Built values from PostmarkServer to DnsEmailRecords; neither construct depends on the other.
- apps/Corporate/index.ts: reusable CorporateApp class; no side effects at module load. Entry script: tools/cdk-corporate.ts, which calls new cdk.App() and wires both stacks.
- instances/Corporate/free-kanban-tool.ts: declarative configuration covering the Postmark account reference (from platform/postmark-service.ts), sending sub-domain (freekanban.arda.ardamails.com), 1Password item reference for the server token, and plan attributes.
- tools/corporate-cli.ts: operator entry point. Per class of resource. Implements the two-phase orchestration described in the J1 decision:
  - Phase A: invoke the Postmark thin-wrapper (creates / reconciles the Postmark server in the configured account, captures DKIM and Return-Path values). Idempotent. Writes the resulting public values to cdk.context.json. Writes the Postmark server token directly to 1Password as Free-Kanban-Generator-Postmark-Server in the Arda-CorporateOAM vault (canonical ref op://Arda-CorporateOAM/Free-Kanban-Generator-Postmark-Server/credential); the token never traverses CDK context, file artifacts, env vars in the deploy pipeline, or Arda-SystemsOAM (the OAM vault OP_SERVICE_ACCOUNT_TOKEN reads from). Vault separation is recorded in DQ-R1-007.
  - Phase B: invoke cdk deploy on apps/Corporate/; the stacks read the captured values from cdk.context.json and emit the DNS records.
  - The two phases are conceptually two Apps deployed in sequence; the CLI does not formally declare them as CDK Apps. Comments in the CLI source name the phases explicitly so the eventual migration to Custom Resources is unambiguous.
J1 interim mechanism — decision recorded
| Aspect | Choice |
|---|---|
| Orchestration locus | CLI (option α from the J1 evaluation), not instance-logic. Keeps cdk synth offline-safe and CI-credential-free, preserves the declarative instances/ convention, and bounds the migration cost to (a). |
| Value-transfer channel | cdk.context.json — CDK’s native context mechanism. No invented file format, deterministic re-synth, idempotent across runs. |
| Channel content | Public values only: postmark.free-kanban.serverId, postmark.free-kanban.dkimSelector, postmark.free-kanban.dkimKey, postmark.free-kanban.returnPathTarget. The DKIM key is the public half (safe to commit; published in DNS anyway). The Postmark server token is written by Phase A directly to 1Password and does not enter CDK context. |
| Construct shape | The PostmarkServer thin-wrapper at platform/constructs/postmark/server.ts exposes its values via a Built interface. In the interim it surfaces the values from CDK context. In the target (a) the construct’s internals emit a Custom Resource Lambda; the Built interface is unchanged. Stack composition code is identical between interim and target. |
| Migration trigger | When Lambda-backed Custom Resources become a wider repo pattern. Only the PostmarkServer construct’s internals change; no caller code is touched. |
The Stack composition is therefore:
```ts
// stacks/corporate/free-kanban-tool-mail-dns.ts -- identical between interim and target
const server = new PostmarkServer(this, 'Server', config);
const records = new DnsEmailRecords(this, 'Records', {
  zone,
  dkimSelector: server.built.dkimSelector,
  dkimKey: server.built.dkimKey,
  returnPathTarget: server.built.returnPathTarget,
});
```

The apps/Corporate/index.ts (CorporateApp class), instances/Corporate/corporate.ts, and the instance-level asset files are pure declarative configuration; orchestration logic does not leak into them.
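As an illustration of the interim mechanism, the sketch below shows how a Built value could be assembled from CDK context using the keys listed in the J1 table. The helper names `requireContext` and `builtFromContext` and the plain-object `Context` type are assumptions made for this example; inside a real construct the lookup would go through the construct node's context rather than a bare object.

```typescript
// Public values Phase A writes to cdk.context.json (keys from the J1 table).
interface PostmarkServerBuilt {
  serverId: string;
  dkimSelector: string;
  dkimKey: string;
  returnPathTarget: string;
}

// Stand-in for CDK context, modelled as a plain key/value object (assumption).
type Context = Record<string, unknown>;

// Hypothetical helper: reads one required value and fails with a hint
// pointing at Phase A when the value has not been captured yet.
function requireContext(context: Context, key: string): string {
  const value = context[key];
  if (typeof value === "number") {
    return String(value);
  }
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(`missing context value "${key}"; run CLI Phase A first`);
  }
  return value;
}

function builtFromContext(
  context: Context,
  prefix = "postmark.free-kanban",
): PostmarkServerBuilt {
  return {
    serverId: requireContext(context, `${prefix}.serverId`),
    dkimSelector: requireContext(context, `${prefix}.dkimSelector`),
    dkimKey: requireContext(context, `${prefix}.dkimKey`),
    returnPathTarget: requireContext(context, `${prefix}.returnPathTarget`),
  };
}
```

Only the body of such a helper would change when the target Custom Resource mechanism arrives; callers keep reading the same four Built fields.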
- Drift-detection workflow for Corporate: monthly schedule, auto-issue on failure, follows the Phase-0 postmark-foundations:integration template.
- Reserved-words update: arda is added to the list of zone-names reserved at the ardamails.com level so future tenants in partitions cannot collide with it.
- Documentation: new pages under current-system/oam/corporate/ (Free Kanban Tool service page, runbook, drift notes); update of runtime/ pages per the runtime-design-review proposal.
Deliverables
| Artifact | Path |
|---|---|
| Postmark thin-wrappers | src/main/cdk/platform/constructs/postmark/{server,sending-domain}.ts (+ tests) |
| Postmark Sender Signature | arda.ardamails.com registered in PostmarkProd via PostmarkSendingDomain thin-wrapper at CLI Phase A; verified at the parent (per DQ-R1-009). Leaves inherit DKIM. |
| Generic DNS zone construct | src/main/cdk/constructs/xgress/dns-zone.ts (rename-in-place of route-53-hosted-zone.ts per DQ-R1-011; 5 callers in ingress-stack.ts migrated in the same PR) |
| DNS email records construct | src/main/cdk/constructs/xgress/dns-email-records.ts (new, generic for any sending sub-domain; props-driven, no env-var bridge) |
| CorporateMailDns stack | src/main/cdk/stacks/corporate/corporate-mail-dns.ts (also instantiates WriteNSRecordsToUpstreamDns against Phase 2’s ardamails.com zone, subdomain: "arda", nameServers from the Corporate zone’s own hostedZoneNameServers, per DQ-R1-006) |
| FreeKanbanToolMailDns stack | src/main/cdk/stacks/corporate/free-kanban-tool-mail-dns.ts |
| Corporate App class | src/main/cdk/apps/Corporate/index.ts |
| Corporate App entry script | src/main/cdk/tools/cdk-corporate.ts |
| Corporate Instance declaration | src/main/cdk/instances/Corporate/free-kanban-tool.ts |
| Corporate CLI | tools/corporate-cli.ts |
| Drift-detection workflow | .github/workflows/corporate-drift.yml (instance-group-scoped per DQ-R1-012) |
| Reserved-words update | src/main/cdk/platform/ari-configuration.ts |
| Typed source-of-truth for sender-domain placement | src/main/cdk/platform/constructs/postmark/sending-domain.ts — sendingDomainPlacement() plus dkimRecordFqdn() and returnPathRecordFqdn() helpers. Encodes DQ-R1-009 as a typed function consumed identically by the CLI, the CDK construct, and the drift check. Added during implementation after the placement divergence surfaced in post-deploy verification (see 3-corporate-updates/implementation/dqr1009-divergence.md). |
| Cross-seam drift assertions | tools/corporate-drift.ts — postmark:sender-signature-name:*, postmark-dns-agreement:dkim-host, postmark-dns-agreement:return-path-domain checks comparing Postmark’s reported Name / DKIMPendingHost / DKIMHost / ReturnPathDomain against the placement function. Closes the structural test gap that hid the DQ-R1-009 divergence. |
| Phase 3 implementation byproducts | roadmap/in-progress/email-integration/3-corporate-updates/implementation/: phase-b-deploy.md, dqr1009-divergence.md, learnings.md, suggestions.md. Run-time record of what was built, what diverged, what was learned, and what should follow. |
| Documentation | current-system/oam/corporate/{index,free-kanban-tool}.md and updates per the runtime-design-review |
| Decision-log entry | New DQ for the J1 tradeoff (interim mechanism (b), target mechanism (a), migration trigger) |
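The placement helpers and the cross-seam drift assertions listed above can be pictured with a small sketch. Everything here is an assumption for illustration: the function signatures, the rule that the signature sits at the immediate parent of the leaf sending domain, and the default Return-Path label are not taken from the repository. Only the helper names, the check names, and the Postmark field names (Name, DKIMHost, ReturnPathDomain) come from the table.

```typescript
// Hypothetical shape: the domain where the Postmark Sender Signature is anchored.
interface Placement {
  signatureDomain: string;
}

// Lower-case and strip a trailing dot so DNS names compare reliably.
function normalizeFqdn(name: string): string {
  return name.trim().toLowerCase().replace(/\.$/, "");
}

// Assumption: the signature lives at the immediate parent of the leaf domain.
function sendingDomainPlacement(sendingDomain: string): Placement {
  const labels = normalizeFqdn(sendingDomain).split(".");
  if (labels.length < 3) {
    throw new Error(`no parent zone to anchor a signature: ${sendingDomain}`);
  }
  return { signatureDomain: labels.slice(1).join(".") };
}

// DKIM public keys are published at <selector>._domainkey.<domain>.
function dkimRecordFqdn(placement: Placement, selector: string): string {
  return normalizeFqdn(`${selector}._domainkey.${placement.signatureDomain}`);
}

// Assumption: the Return-Path record is one label under the signature domain.
function returnPathRecordFqdn(placement: Placement, label = "pm-bounces"): string {
  return normalizeFqdn(`${label}.${placement.signatureDomain}`);
}

// Subset of the fields the drift check reads from Postmark's response.
interface ReportedSignature {
  Name: string;
  DKIMHost: string;
  ReturnPathDomain: string;
}

// Returns the names of the checks that disagree with the placement function.
function driftFailures(
  reported: ReportedSignature,
  placement: Placement,
  selector: string,
): string[] {
  const failures: string[] = [];
  if (normalizeFqdn(reported.Name) !== placement.signatureDomain) {
    failures.push("postmark:sender-signature-name");
  }
  if (normalizeFqdn(reported.DKIMHost) !== dkimRecordFqdn(placement, selector)) {
    failures.push("postmark-dns-agreement:dkim-host");
  }
  if (normalizeFqdn(reported.ReturnPathDomain) !== returnPathRecordFqdn(placement)) {
    failures.push("postmark-dns-agreement:return-path-domain");
  }
  return failures;
}
```

Routing the CLI, the construct, and the drift check through one placement function is what lets a divergence show up as a failed comparison instead of going unnoticed until post-deploy verification.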
Entry criteria
- Phase 1 has merged: platform/postmark-service.ts (POSTMARK_PROD_ACCOUNT, POSTMARK_NONPROD_ACCOUNT) and platform/one-password.ts (vault + item references) are present on main. OP_SERVICE_ACCOUNT_TOKEN resolves at CI runtime.
- Phase 2 has merged: RootDnsStack is deployed; the ardamails.com zone is exported as arda-ardamails-zone; the AllowCreatingNSRecordsRole is exported as arda-allow-create-ns-record-role. cdk diff against the deployed Root stack reports zero differences.
- The Arda-CorporateOAM 1Password vault exists (provisioned 2026-05-05; per DQ-R1-007). The Free-Kanban-Generator-Postmark-Server item does not yet exist; Phase A creates it.
- The Postmark PostmarkProd account is reachable via POSTMARK_PROD_ACCOUNT_TOKEN from the operator workstation; no Free Kanban server has been created in any Postmark account yet.
Exit criteria
- apps/Corporate/ synthesises and deploys successfully against the Root account (where the arda.ardamails.com zone resides for now).
- dig confirms arda.ardamails.com is delegated and the freekanban.arda.ardamails.com records resolve.
- Postmark verifies DKIM and Return-Path for the Free Kanban Tool sending domain.
- Drift-detection workflow is registered on main and reports its state on its first scheduled trigger (success or a structured failure that the workflow surfaces via the auto-issue path). First-run failures attributable to CI environment / token format are tracked under PDEV-455 and do not re-open Phase 3.
Recovery / partial-failure handling
The Corporate CLI runs in two phases (J1 interim mechanism, this section); each is independently re-runnable.
- Phase A failure (Postmark thin-wrapper). Phase A is idempotent: re-running with an existing Postmark server returns the captured DKIM / Return-Path values from the Postmark API without creating a duplicate, and re-writes cdk.context.json and the 1Password item to the same values. If Phase A fails after creating the Postmark server but before writing the 1Password item, re-run: the Postmark API call returns the existing server, and the 1Password write completes. If Phase A fails after writing the 1Password item but before updating cdk.context.json, re-run: the captured values are deterministic from the existing server.
- Phase A succeeds, Phase B not yet run. Coherent intermediate state: the Postmark server exists, the 1Password token is written, but no DNS records exist and no email is delivered. Re-run Phase B (cdk deploy apps/Corporate/) against the same cdk.context.json to complete.
- Phase B failure (cdk deploy). CFN rolls the stack back automatically. Diagnose (the WriteNSRecordsToUpstreamDns Lambda CR is the most failure-prone step because it assumes the Root role across accounts; check the CW log group for the CR Lambda). Re-run.
- Roll-back. The DNS records and the arda.ardamails.com zone are deletable via cdk destroy apps/Corporate/; the NS-delegation record in ardamails.com is removed by the same WriteNSRecordsToUpstreamDns CR on stack delete. Deleting the Postmark server is supported by the Account API (DELETE /servers/<id>) but is destructive (irrecoverable history loss), so the operator invokes it deliberately, not as part of an automatic rollback. The 1Password item is preserved across cdk destroy and is removed manually by the operator only when the Free Kanban Tool is being decommissioned permanently.
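The Phase A recovery behaviour described above rests on a find-or-create step followed by repeatable writes. A minimal sketch under stated assumptions follows: `PostmarkAccountApi`, `PhaseASinks`, `runPhaseA`, and the in-memory fake are hypothetical stand-ins, the calls are synchronous for brevity, and only the server ID is written to context to keep the example short.

```typescript
interface ServerRecord {
  id: number;
  name: string;
  token: string;
}

// Hypothetical, minimal view of the Postmark account API used by Phase A.
interface PostmarkAccountApi {
  findServerByName(name: string): ServerRecord | undefined;
  createServer(name: string): ServerRecord;
}

// Hypothetical write targets: the 1Password item and cdk.context.json.
interface PhaseASinks {
  writeSecret(ref: string, value: string): void;
  writeContext(values: Record<string, string>): void;
}

function runPhaseA(
  api: PostmarkAccountApi,
  sinks: PhaseASinks,
  serverName: string,
  secretRef: string,
): ServerRecord {
  // Find-or-create: a re-run after a partial failure reuses the existing server.
  const existing = api.findServerByName(serverName);
  const server = existing !== undefined ? existing : api.createServer(serverName);
  // Both writes are overwrites of deterministic values, so repeating them is safe.
  sinks.writeSecret(secretRef, server.token);
  sinks.writeContext({ "postmark.free-kanban.serverId": String(server.id) });
  return server;
}

// In-memory fake used to exercise the re-run behaviour without network access.
class InMemoryPostmark implements PostmarkAccountApi {
  createCalls = 0;
  private servers: ServerRecord[] = [];

  findServerByName(name: string): ServerRecord | undefined {
    return this.servers.find((server) => server.name === name);
  }

  createServer(name: string): ServerRecord {
    this.createCalls += 1;
    const server = { id: this.servers.length + 1, name, token: `token-for-${name}` };
    this.servers.push(server);
    return server;
  }
}
```

With this shape, a run that fails between server creation and the 1Password write leaves nothing to clean up: the next run finds the server and completes the remaining writes.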
Dependencies
- Phase 1 (external references in platform/postmark-service.ts).
- Phase 2 (Root accepts the arda.ardamails.com NS delegation).
Phase 4 — Runtime Platform Updates
Goal: bring per-partition mail capability online for the Application Runtime instance group.
Scope (largely the original “Phase 1” of email-integration, refit to the new structure)
- Per-partition mail sub-zones: prod.ardamails.com, demo.ardamails.com, dev.ardamails.com, stage.ardamails.com. Each created in the partition’s AWS account via the new DnsZone xgress construct. kyle.ardamails.com is deferred per DQ-R1-021 (kyle partition suspended at Phase 4 start); kyle stays reserved at the ardamails.com level so it cannot be appropriated as a tenant slug while the partition is suspended.
- NS-delegation entries in the ardamails.com zone (Root) for each new partition sub-zone. Reuses Phase 2 mechanisms.
- Per-partition Postmark Sender Signatures. One Postmark Sender Signature per partition, anchored at the partition sub-zone ({partition}.ardamails.com). Production partitions (prod, demo) on the PostmarkProd account; non-production partitions (dev, stage) on PostmarkNonProd. Each partition has its own DKIM key (independent receiver-side reputation per environment); leaves under each partition (per-tenant sub-domains) inherit DKIM by default. The granularity decision is pinned in DQ-R1-017 (proposed in the Phase 4 goal artefact under 4-runtime-platform-updates/); the first non-prod Signature also satisfies Postmark Compliance’s pending approval for arda-nonprod.
- Per-partition Postmark account-token secrets in Secrets Manager (encrypted, ESO-projected to pods). Token sourced from platform/postmark-service.ts references at deploy time via the new partition-aware postmarkCredentialOpReference(partition) accessor.
- Per-partition encryption-key secrets for tenant token encryption (per DQ-012).
- IAM roles for DNS provisioning by the runtime emailConfiguration service.
- Updates to apps/Al1x/partition.ts to instantiate the new partition-mail stacks.
- Updates to amm.sh: minimal, per §2 scope constraint, individually flagged.
- Parallel runtime-platform-drift workflow. A new .github/workflows/runtime-platform-drift.yml plus driver under tools/, running alongside the existing corporate-drift (which is not renamed). The new workflow asserts the cross-seam Postmark↔DNS↔placement invariants for every active partition Signature. Logic shared between the two workflows is factored into reusable shell scripts or GitHub Actions composite actions, so subsequent runtime-platform drift checks unrelated to email can plug into the same workflow without mail-centric naming (DQ-R1-018).
- Operator surfaces integrated into amm.sh. Per DQ-R1-022, Phase 4’s partition-mail provisioning is part of the product runtime platform deployment and is invoked through amm.sh (and its rules: idempotency, security, pre-flight checks, partition selection). Phase 4 does not introduce a standalone partition-mail CLI. Reusable sub-scripts / utilities (bash or TypeScript) are extracted from corporate-cli so both amm.sh’s partition path and corporate-cli can share logic; this includes refactoring Phase 3 deliverables as needed to keep each script’s complexity bounded.
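The partition-to-account and partition-to-zone relationships in this scope can be summarised as a typed lookup. This is a sketch for orientation only: the type and function names are assumptions, not the shape of the instance files, and the suspended kyle partition is deliberately left out. The Alpha001 / Alpha002 assignments follow the instance-group layout stated elsewhere in this document.

```typescript
type PartitionId = "prod" | "demo" | "dev" | "stage";
type Infrastructure = "Alpha001" | "Alpha002";
type PostmarkAccount = "PostmarkProd" | "PostmarkNonProd";

interface PartitionMailConfig {
  partition: PartitionId;
  infrastructure: Infrastructure;
  postmarkAccount: PostmarkAccount;
  mailZone: string;
}

const MAIL_ROOT_ZONE = "ardamails.com";

// Production partitions use PostmarkProd; non-production use PostmarkNonProd.
const PARTITION_HOSTING: Record<
  PartitionId,
  { infrastructure: Infrastructure; postmarkAccount: PostmarkAccount }
> = {
  prod: { infrastructure: "Alpha001", postmarkAccount: "PostmarkProd" },
  demo: { infrastructure: "Alpha001", postmarkAccount: "PostmarkProd" },
  dev: { infrastructure: "Alpha002", postmarkAccount: "PostmarkNonProd" },
  stage: { infrastructure: "Alpha002", postmarkAccount: "PostmarkNonProd" },
};

// Builds the per-partition view, including the {partition}.ardamails.com zone.
function partitionMailConfig(partition: PartitionId): PartitionMailConfig {
  const hosting = PARTITION_HOSTING[partition];
  return {
    partition,
    infrastructure: hosting.infrastructure,
    postmarkAccount: hosting.postmarkAccount,
    mailZone: `${partition}.${MAIL_ROOT_ZONE}`,
  };
}
```

Keeping the mapping in one typed table means that adding a partition later is a single entry, and the compiler flags any consumer that does not handle it.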
Deliverables
| Artifact | Path |
|---|---|
| Partition email stack | src/main/cdk/stacks/purpose/partition-email.ts |
| Partition email instance config | extension of src/main/cdk/instances/Alpha001/{prod,demo}.ts and Alpha002/{dev,stage}.ts |
| Updated apps | src/main/cdk/apps/Al1x/partition.ts |
| Partition-aware Postmark credential accessor | src/main/cdk/platform/postmark-service.ts — postmarkCredentialOpReference(partition: PartitionId): string returning the op://Arda-{Env}OAM/Postmark/credential reference for the partition’s environment. Consumed by amm.sh (via op read), not by CDK — the resolved value is passed to cdk deploy as a NoEcho parameter (δ.1 pattern, mirrors partitionSecrets.cfn.yaml). |
| Partition Postmark Sender Signatures | One per partition, registered via PostmarkSendingDomain thin-wrapper using sendingDomainPlacement() with partition-shaped inputs |
| Per-partition Postmark account-token SM secret | aws_secretsmanager.Secret per partition, name {fqn}-I-EmailPostmarkAccountToken, RemovalPolicy.RETAIN. Declared in partition-email.ts (CDK); value populated via SecretValue.cfnParameter() from a NoEcho CFN parameter that amm.sh supplies on cdk deploy after reading the 1Password reference returned by postmarkCredentialOpReference(partition). Same SM secret serves both the CR Lambda (Sender Signature registration at deploy time) and the runtime ESO mount (operations pod at request time). Pattern documented in current-system/oam/security/secret-delivery-pattern.md. |
| Per-partition email-token encryption-key SM secret | aws_secretsmanager.Secret per partition, name {fqn}-I-EmailEncryptionKey, passwordLength: 64, RemovalPolicy.RETAIN. Single SM secret per partition; rotation uses AWS SM native versioning (AWSCURRENT / AWSPREVIOUS stages). Full design in 4-runtime-platform-updates/design/email-server-key-encryption.md per DQ-R1-019. |
| Per-partition DNS-records role (via generalized AllowCreatingNSRecordsRole) | Reuses the existing AllowCreatingNSRecordsRole construct (Phase 2; constructs/oam/allow-creating-ns-records-role.ts), generalized to accept a configurable trust principal. Instantiated in partition-email.ts with the pod-STS-chain trust principal (iam.AccountPrincipal(account).withConditions({ ArnLike: ... })) and allowedParentHostedZoneIds scoped to the partition’s mail sub-zone. Permissions are already generic Route53 record-set CRUD on the construct side: route53:ChangeResourceRecordSets, route53:ListResourceRecordSets, route53:ListHostedZonesByName. (DQ-R1-020.) route53:GetChange is intentionally omitted (requires arn:aws:route53:::change/* scope; the Email module does not wait on Route53 propagation because Postmark verification is API-driven). The existing Root-account instantiation must remain byte-identical post-generalization, guarded by a CDK Template-equality unit test and a post-deploy Root no-drift verification. |
| Per-partition EmailEncryptionKeyFallbackRole | Fresh purpose-specific IAM role declared in partition-email.ts. Same trust-policy shape as EmailDnsProvisioningRole. Permission: secretsmanager:GetSecretValue on ${encryptionKeySecret.secretArn}* (full SM-secret ARN; the trailing wildcard tolerates the SM-appended random 6-character suffix; SM versions are selected at API call time via VersionId/VersionStage, not encoded in the resource ARN). Used by the Phase 5b TokenCipher SDK-fallback path for envelopes older than AWSPREVIOUS (DQ-R1-019). The operations pod role is not extended; permissions live on the purpose-specific role. (DQ-R1-020.) |
| runtime-platform-drift workflow (parallel) | .github/workflows/runtime-platform-drift.yml + driver under tools/. Shares reusable scripts / composite actions with corporate-drift; corporate-drift is not renamed (DQ-R1-018) |
| amm.sh-integrated partition-mail steps | Phase 4 operator work lives inside amm.sh (or its callees) per DQ-R1-022, following its idempotency / security / check rules. Reusable bash + TypeScript utilities extracted from corporate-cli are shared between amm.sh and corporate-cli; includes refactoring Phase 3 deliverables as needed |
| amm.sh minimal updates | repo root |
| Decision-log entries | DQ-R1-017..022 (Round R1-Phase4): Sender Signature granularity, drift workflow shape, encryption-key derivation, STS-chained IAM roles for DNS provisioning + SM fallback, partition rollout order, operator CLI shape |
| Documentation | partition mail sections in current-system/runtime/ |
| Secret-delivery pattern doc | new current-system/oam/security/secret-delivery-pattern.md documenting the op → amm.sh → CFN NoEcho parameter → SM secret → consumer flow (worked examples: partitionSecrets.cfn.yaml + Phase 4 Postmark token); cross-linked from secrets-vault.md |
Phase 4 infrastructure prerequisite: partition-aware Postmark credential reference
platform/postmark-service.ts currently exposes a single credentialReference per Postmark account (op://Arda-SystemsOAM/Postmark-{Prod,NonProd}/credential). Before Phase 4’s deploy tooling can consume per-partition vault copies, the file must be extended with a partition-aware accessor, for example:

```ts
postmarkCredentialOpReference(partition: PartitionId): string
// returns "op://Arda-ProdOAM/Postmark/credential" for prod,
// "op://Arda-DevOAM/Postmark/credential" for dev, etc.
```

This change is scoped to platform/postmark-service.ts in the infrastructure repository; it does not affect Phase 3. The accessor is consumed by amm.sh (which calls op read on the returned reference and supplies the resolved value to cdk deploy as a NoEcho parameter); CDK itself has no 1Password dependency. Pattern: see current-system/oam/security/secret-delivery-pattern.md.
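One possible body for the accessor is sketched below. The vault names for prod and dev follow the examples in the text; the names for demo and stage are assumed by analogy with the Arda-{Env}OAM convention and would need to be confirmed against the actual vaults. The `VAULT_ENV` table is a hypothetical helper.

```typescript
type PartitionId = "prod" | "demo" | "dev" | "stage";

// Environment segment of the Arda-{Env}OAM vault name per partition.
// "Prod" and "Dev" come from the examples; "Demo" and "Stage" are assumptions.
const VAULT_ENV: Record<PartitionId, string> = {
  prod: "Prod",
  demo: "Demo",
  dev: "Dev",
  stage: "Stage",
};

// Returns the 1Password reference for the partition's Postmark account token.
// The item title is the service name only; the vault carries the environment.
function postmarkCredentialOpReference(partition: PartitionId): string {
  return `op://Arda-${VAULT_ENV[partition]}OAM/Postmark/credential`;
}

console.log(postmarkCredentialOpReference("prod"));
```

The function returns a reference string only and never a secret value, which is what keeps CDK free of any 1Password dependency.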
Entry criteria
- Phases 1 and 2 have merged (Postmark account references in platform/postmark-service.ts; ardamails.com zone + AllowCreatingNSRecordsRole exported from Root).
- platform/postmark-service.ts exposes a partition-aware credential accessor (see prerequisite above).
- Each partition’s Arda-{Env}OAM vault holds an item titled Postmark with credential field set to the relevant Postmark account token (following the service-name-only convention; provisioned as part of Phase 4 operator work, analogous to Phase 1’s provisioning of the Arda-SystemsOAM global-utility items).
- Each partition’s AWS account has the IAM permissions to (a) create hosted zones, (b) create / read Secrets Manager entries, and (c) assume the AllowCreatingNSRecordsRole in the Root account.
- The ESO ClusterSecretStore in each partition cluster is configured to read from the partition’s AWS Secrets Manager (already in place; restated as a precondition).
Exit criteria
- Each partition’s mail sub-zone is delegated and populated with required base records.
- Per-partition Postmark token + encryption-key secrets are accessible to the operations service via ESO.
- IAM role allows runtime DNS provisioning for tenant sub-domains.
- Component repositories (Phase 5) can read the cross-stack exports.
Recovery / partial-failure handling
Phase 4 fans out across four active partitions (prod, demo, dev, stage) hosted in two AWS accounts: Alpha001 carries prod + demo; Alpha002 carries dev + stage. Each partition’s mail stack deploys independently within its Infrastructure’s account; partitions sharing an account remain CFN-independent. The kyle partition is suspended at Phase 4 start (DQ-R1-021) and is not included in the rollout; replay the per-partition deploy procedure if and when kyle resumes operation.
- Partition independence. A failed deploy in one partition does not block the others. Re-run cdk deploy apps/Al1x/<partition> for the affected partition only; the remaining partitions are unaffected by the failure or the retry.
- NS-delegation atomicity. Each partition’s stack writes its NS-delegation record into the Root ardamails.com zone on stack creation via WriteNSRecordsToUpstreamDns (per DQ-R1-006). If the stack create succeeds but the CR Lambda fails the cross-account assume-role, the partition zone exists in its Infrastructure’s account but is not delegated: dig NS <partition>.ardamails.com returns NODATA. Re-deploying the stack invokes the CR again on stack update; once the assume-role succeeds, the NS record set is created and propagation is normal.
- Secrets Manager retention. Per-partition Postmark account-token secrets and encryption-key secrets carry RemovalPolicy.RETAIN so cdk destroy does not delete them. Intentional removal is a deliberate operator step (delete the Secrets Manager entry through the AWS console or CLI). This defends against accidental loss of the encryption key, which would render every encrypted-at-rest tenant token unrecoverable.
- Recommended deploy order (DQ-R1-021): dev → stage → demo → prod. dev goes first because it also satisfies the arda-nonprod Postmark account-approval prerequisite (a Sender Signature on dev.ardamails.com is the account’s first); the lower-blast-radius non-production partitions land before production. The order can be condensed if early partitions are clean; do not skip the validation cycles. kyle is excluded from the rollout per DQ-R1-021.
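The rollout rules above (fixed recommended order, failures isolated to one partition, retries limited to the failed partition) can be sketched as a small planning helper. This is an illustration only, not part of amm.sh; the type names, the status values, and the function names are all hypothetical.

```typescript
type PartitionId = "dev" | "stage" | "demo" | "prod";
type DeployStatus = "pending" | "succeeded" | "failed";

// Recommended order per DQ-R1-021; kyle is suspended and therefore absent.
const DEPLOY_ORDER: readonly PartitionId[] = ["dev", "stage", "demo", "prod"];

// Hosting accounts as described above.
const ACCOUNT_OF: Record<PartitionId, "Alpha001" | "Alpha002"> = {
  prod: "Alpha001",
  demo: "Alpha001",
  dev: "Alpha002",
  stage: "Alpha002",
};

// Partitions whose deploy should be re-run: only the ones that failed.
function retryTargets(status: Record<PartitionId, DeployStatus>): PartitionId[] {
  return DEPLOY_ORDER.filter((p) => status[p] === "failed");
}

// Next partition to deploy for the first time. A failed partition does not
// block later ones, so only "pending" entries are considered.
function nextPartition(status: Record<PartitionId, DeployStatus>): PartitionId | undefined {
  return DEPLOY_ORDER.find((p) => status[p] === "pending");
}

// Example state: dev is done, stage failed, demo and prod not yet attempted.
const exampleStatus: Record<PartitionId, DeployStatus> = {
  dev: "succeeded",
  stage: "failed",
  demo: "pending",
  prod: "pending",
};
const toRetry = retryTargets(exampleStatus);
const upNext = nextPartition(exampleStatus);
```

With the example state, toRetry contains only stage and upNext is demo, matching the rule that a failure in one partition neither blocks nor re-runs the others.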
Dependencies
- Phase 1 (Postmark account references).
- Phase 2 (Root accepts new NS delegations).
- Independent of Phase 3 (Corporate). Phases 3 and 4 can land in either order or in parallel.
Phase 5 — Component Updates
Goal: deliver the runtime email capability in the application stack.
5a — Library Updates (common-module)
Cross-cutting library additions consumed by the email module: sanitizeHeader, AppError.Application, idempotent-key helpers, etc. Released as a common-module minor version. The operations repository consumes this library.
Deliverables
- New / updated classes and helpers in common-module.
- Unit tests + version bump + CHANGELOG.
Entry criteria
- common-module main is at a state where the new helpers can be added without conflicting with in-flight work; no library work is required to land first.
Exit criteria
- common-module published with the new APIs.
- operations builds against the new version without regression.
Dependencies
- None on Phases 1-4 (library work is purely Kotlin), but its consumers in 5b need it merged first.
5b — Email Module (operations)
Backend ShopAccess/Email module: per-tenant Postmark-server provisioning, sending APIs, bounce / complaint webhook handling, suppression-list maintenance.
Deliverables
- Module code (shopaccess/email/...) + Flyway migrations + integration tests + Helm chart updates (apis.system.shopAccess.email entry; ESO ExternalSecret entries).
- API Gateway routes declared in the operations repo’s CloudFormation files for the three L4 endpoints (email-job, email-configuration, postmark-events webhook). API-gateway-level authorisers are out of scope for this project; authentication for each route is performed in-component by the receiving Ktor server (Bearer-token validation for the webhook route, the Application Runtime’s existing scheme for consumer routes).
- Gradle property updates in the operations repo (and in common-module if applicable) to provide the property values needed for linting, local deployments, and CI test runs of the new module.
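The in-component Bearer-token check for the webhook route could look roughly like the sketch below. The real check lives in the Ktor server (Kotlin); TypeScript is used here only to illustrate the logic, and every function name is hypothetical. The comparison avoids an early exit on the first mismatching character, which is a common precaution but not a timing guarantee in a managed runtime.

```typescript
// Extracts the token from an "Authorization: Bearer <token>" header value.
// Returns undefined when the header is missing or uses another scheme.
function bearerToken(headerValue: string | undefined): string | undefined {
  if (headerValue === undefined) return undefined;
  const match = /^Bearer +([^\s]+)$/i.exec(headerValue.trim());
  return match ? match[1] : undefined;
}

// Compares two strings without stopping at the first difference, so the
// amount of work does not depend on where the mismatch occurs.
function constantTimeEquals(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end of a string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// True only when the header carries a Bearer token equal to the expected one.
function isAuthorizedWebhook(headerValue: string | undefined, expectedToken: string): boolean {
  const token = bearerToken(headerValue);
  return token !== undefined && constantTimeEquals(token, expectedToken);
}
```

A request without an Authorization header, with a different scheme, or with a wrong token is rejected before any event payload is parsed.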
Entry criteria
- Phase 4 has deployed to every partition that 5b will target: per-partition Postmark token secrets, encryption-key secrets, and DNS-provisioning IAM roles exist in Secrets Manager / IAM, and the secrets are projected through ESO into the partition cluster.
- Phase 5a has published a common-module minor with the new APIs to Artifactory; the operations repo’s gradle.properties is ready to bump to that version.
- API Gateway path slots for email-job, email-configuration, and postmark-events are reservable in the partition’s API Gateway CFN stacks (no path collision with existing routes).
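The "no path collision" criterion can be checked mechanically before deploy. The sketch below is one way to do it, assuming the existing routes are available as a list of path strings; the function names and the sample existing routes are hypothetical.

```typescript
// Normalises a route path to a single leading slash and no trailing slash.
// Case is preserved on purpose; paths are compared exactly after normalisation.
function normalisePath(path: string): string {
  const trimmed = path.trim().replace(/^\/+|\/+$/g, "");
  return "/" + trimmed;
}

// Returns the proposed paths that already exist among the current routes.
function collidingPaths(existing: string[], proposed: string[]): string[] {
  const taken = new Set(existing.map(normalisePath));
  return proposed.map(normalisePath).filter((p) => taken.has(p));
}

// The three L4 endpoints introduced by Phase 5b.
const EMAIL_ROUTES = ["email-job", "email-configuration", "postmark-events"];

// Hypothetical existing routes, one of which deliberately collides.
const collisions = collidingPaths(["/orders", "/email-job/"], EMAIL_ROUTES);
```

An empty result means the three slots are free in that partition's API Gateway stack; here the sample data yields one collision on /email-job.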
Exit criteria
- Module deployable to all partitions; passes integration tests against a real Postmark NonProd surface.
- Webhooks reachable from Postmark to the partition’s API gateway.
Recovery / partial-failure handling
- Webhook-route registration failure. If the postmark-events route fails to register at API Gateway during deploy, Postmark’s webhook calls return 404. Postmark retries failed webhook calls for 7 days; events accumulate in its retry queue and replay automatically once the route comes up. Re-deploy the API Gateway CFN stack; verify with a manual Postmark test event from the Postmark Console.
- Flyway migration failure mid-deploy. Flyway runs forward-only. A migration that fails partway leaves the partition’s database in an intermediate schema state. Recovery is fix-forward (author the next migration to restore invariants and deploy it); manual SQL intervention is an escalation path, not a routine deploy step. Production migrations run in a single transaction where possible to constrain the blast radius.
- Per-partition rollout. 5b deploys to one partition at a time; a failure in one partition does not affect the others. Re-deploy the failed partition once the cause is fixed. Recommended order mirrors Phase 4 (per DQ-R1-021): dev → stage → demo → prod. kyle is excluded for as long as the partition is suspended.
Dependencies
- Phase 4 (per-partition Postmark token + encryption-key secrets exist in Secrets Manager).
- Phase 5a (common-module minor with the new APIs is published).
Open design questions (to be confirmed at Phase 5b planning)
- DQ-R1-023: Per-tenant Postmark Sender Signature introduction. Phase 4 ships one Signature per partition ({partition}.ardamails.com) per DQ-R1-017. All tenants in a partition share that Signature’s DKIM-domain reputation under DMARC relaxed alignment. Phase 5b decides whether to introduce per-tenant Signatures (and the associated per-tenant DKIM TXT + Return-Path CNAME records) for per-tenant reputation isolation. Four options: α status quo (partition Signature for all), β per-tenant from v1, γ hybrid opt-in, δ remediation-only. No Phase 4 dependency: Phase 4 provisions EmailDnsProvisioningRole (the runtime DNS-write capability) regardless, so whichever way DQ-R1-023 resolves, no Phase 4 re-work is needed.
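To make the per-tenant option concrete, the sketch below lists the two DNS records a per-tenant Signature would add. It is an assumption-laden illustration: the tenant domain shape, the function name, and the sample selector and key are hypothetical, and the pm-bounces label and pm.mtasv.net target reflect Postmark's commonly documented Return-Path defaults, which should be confirmed against the values Postmark returns when the Signature is created.

```typescript
interface DnsRecord {
  name: string;
  type: "TXT" | "CNAME";
  value: string;
}

// Builds the DKIM TXT and Return-Path CNAME records for one tenant domain.
// The DKIM selector and public key are issued by Postmark when the Signature
// is created; the caller passes them in. The default Return-Path label and
// CNAME target are assumptions to verify against Postmark's response.
function tenantSignatureRecords(
  tenantDomain: string,
  dkimSelector: string,
  dkimPublicKey: string,
  returnPathLabel: string = "pm-bounces",
): DnsRecord[] {
  return [
    {
      name: `${dkimSelector}._domainkey.${tenantDomain}`,
      type: "TXT",
      value: `k=rsa; p=${dkimPublicKey}`,
    },
    {
      name: `${returnPathLabel}.${tenantDomain}`,
      type: "CNAME",
      value: "pm.mtasv.net",
    },
  ];
}

// Hypothetical tenant sub-domain, selector, and key, for illustration only.
const records = tenantSignatureRecords("acme.dev.ardamails.com", "selector1pm", "PUBLICKEY");
```

Under option α none of these records exist per tenant; under β, γ, or δ they would be written at runtime through EmailDnsProvisioningRole.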
Phase ordering and parallelism
- Phase 1 must land before any other phase except 5a (its platform/ references and OP_SERVICE_ACCOUNT_TOKEN are dependencies for every later infrastructure phase).
- Phase 2 must land before Phase 3 and Phase 4 (both consume Root’s NS-delegation mechanism).
- Phase 3 and Phase 4 are independent of each other and may run in either order or in parallel.
- Phase 5a is independent of Phases 1-4 (Kotlin library work).
- Phase 5b depends on both Phase 4 (per-partition infrastructure) and Phase 5a (consumed library APIs).
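The ordering rules can be cross-checked by encoding the "Depends on" column of the Phase Sequence table as a graph and grouping phases into waves, where every phase in a wave has all of its dependencies in earlier waves. The sketch below does this with a simple repeated scan; the type and function names are illustrative.

```typescript
type Phase = "1" | "2" | "3" | "4" | "5a" | "5b";

// Dependencies as listed in the Phase Sequence table.
const DEPENDS_ON: Record<Phase, Phase[]> = {
  "1": [],
  "2": ["1"],
  "3": ["1", "2"],
  "4": ["1", "2"],
  "5a": [],
  "5b": ["4", "5a"],
};

// Groups phases into waves: each wave holds the phases whose dependencies
// are all satisfied by earlier waves, so phases in one wave may run in parallel.
function deploymentWaves(deps: Record<Phase, Phase[]>): Phase[][] {
  const done = new Set<Phase>();
  const remaining = new Set<Phase>(Object.keys(deps) as Phase[]);
  const waves: Phase[][] = [];
  while (remaining.size > 0) {
    const ready = Array.from(remaining).filter((p) => deps[p].every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("dependency cycle detected");
    for (const p of ready) {
      done.add(p);
      remaining.delete(p);
    }
    waves.push(ready);
  }
  return waves;
}

const waves = deploymentWaves(DEPENDS_ON);
```

For the table as written this yields four waves: 1 and 5a, then 2, then 3 and 4, then 5b. 5a appears in the first wave only because it has no dependencies; it can land at any point before 5b.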
Disposition of prior numbering
Section titled “Disposition of prior numbering”| Prior label | New mapping |
|---|---|
| Phase 0 (Postmark Foundations) | Distributed across Phases 1, 2, 3. The current PR #445 corresponds approximately to a partial Phase 1 + partial Phase 3 (with the partition coupling that needs unwinding). Disposition of PR #445 is deferred per scratch.md E. |
| Phase 1 (Infrastructure — partition zones, secrets, IAM) | Phase 4 in the new numbering, with the per-partition mail sub-zones still owned by the partition (per the Q5 answer in runtime-design-review.md §7). |
| Phases 2-6 (Backend module) | Phase 5b. |
| Common-module additions | Phase 5a. |
The detailed contents of the current 0-postmark-foundations/ and 1-infrastructure/ specification trees will be merged / reorganised under the new phase folders as part of the consolidation in follow-up-updates.md.
References
- Runtime Overview (principles)
- Runtime documentation design review
- Phase 0 PR review
- Action scratch
- Decision log (existing project-wide decisions)
Copyright: © Arda Systems 2025-2026, All rights reserved