Email Integration -- Cross-Cutting Design (Revision 1)

Security and operations concerns that span the whole email-integration project: authentication, authorisation, secret management, drift detection, OAM (operations / administration / management), and compliance. Aligned with the Revision 1 goal.md, architecture-overview.md, and phases.md.

Revision 1 note. This document supersedes the prior cross-cutting design. Substantive changes:

Authentication for the email service is performed in-component (Ktor server configuration of each route), not at the API Gateway. API-gateway-level authorisers for the email service are out of scope.

Postmark Console is acknowledged as the primary OAM surface for the email service; Arda-side metrics, logs, and runbooks supplement it but do not replace it.

Credential resolution uses 1Password as the system of record. CI and operator runs resolve credentials via the 1Password SDK; runtime pods receive partition-scoped credentials via Secrets Manager + ESO.

Drift detection is added as a routine, scheduled assertion of declared-vs-observed state for external resources.

Corporate Resource Group is treated as a peer security domain to Application Runtime tenants, with its own credential scope, sending zone, and operational surface.

1. Threat model summary

The email integration adds defence-in-depth on top of the existing platform-level security posture. This layer specifically protects against:

DB exposure with read-only privilege (backups leaving the trust boundary, SQL-injection-style read leaks, analyst sessions): per-tenant Postmark server tokens are application-encrypted (AES-256-GCM) before persistence, so a SELECT * on email_configuration yields ciphertext, not usable tokens.
Provisioning replay attacks: pre-flight checkAvailability plus the persist-first lifecycle (DQ-205) prevent silent orphan creation.
Webhook spoofing: in-component Bearer-token validation on the postmark-events route (DQ-011).
Free Kanban Tool sending integrity: the Corporate consumer’s Postmark server token is held in 1Password (Free-Kanban-Generator-Postmark-Server in the Arda-CorporateOAM vault, distinct from the Arda-SystemsOAM vault that holds deploy-time / OAM credentials), never persisted to the infrastructure repository, and never transmitted through CDK context or CI environment variables.
Drift between declared and observed state: scheduled drift-detection asserts that external resources match the declarations in the infrastructure repository; surfaced divergences trigger an automated investigation issue.

Out of scope at this layer (covered elsewhere or accepted):

Pod / process compromise — an attacker with access to pod memory holds both the encryption key and the DB connection; the application-layer encryption does not defend against this. Platform-level controls (IRSA, network segmentation, container hardening) own this.
Postmark account compromise — an attacker holding the Postmark account-level token can read tenant tokens directly from Postmark and send arbitrary email. Platform-level secret management owns this; the project mitigates by sourcing the account token only from 1Password (directly via the 1Password SDK, or through the Secrets Manager copy provisioned from it for runtime pods, § 2.3) and never from a long-lived environment variable.
1Password compromise (service-account or operator) — an attacker holding OP_SERVICE_ACCOUNT_TOKEN (CI) or operator credentials (local-dev) reads every Postmark and tenant credential reachable from Arda-SystemsOAM. The token is scoped read-only to that vault; this project does not introduce write paths to 1Password from automated systems.
Insider with both extras.email.encryptionKey HOCON access and DB write privilege — equivalent to pod compromise.
DDoS / rate-limit abuse — handled by API Gateway and Postmark’s own controls. The API Gateway remains a forwarding layer for inbound traffic even though it does not authenticate the email-service routes.

2. Authentication

2.1 Inbound HTTP — `email-configuration` and `email-job`

The email-configuration and email-job HTTP endpoints reuse the existing operations-component in-component authentication pipeline: Ktor server configuration of each route validates ARDA_API_KEY plus context headers (X-Tenant-Id, X-Author) forwarded by the BFF. No JWT path; no email-specific authentication mechanism. The API Gateway is a passthrough for these routes; no gateway-level authoriser is attached.

2.2 Inbound HTTP — Postmark webhook

The postmark-events endpoint is an inbound webhook from Postmark’s infrastructure. Like other email-service routes it reaches the platform via API Gateway (forwarding only), and authentication is performed in-component by the receiving Ktor route: the route validates the ARDA_API_KEY value Postmark sends in the Authorization: Bearer ... header per DQ-011. Reusing the existing ARDA_API_KEY avoids introducing a new credential type.
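The Bearer check itself lives in the Ktor route (Kotlin). Purely to illustrate the comparison logic, a minimal TypeScript sketch follows; `isValidBearer` is a hypothetical name, and hashing both sides to fixed-length SHA-256 digests before a constant-time compare is one common technique, not necessarily what the route does.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash to a fixed 32-byte digest so both inputs to timingSafeEqual have equal
// length (timingSafeEqual throws on a length mismatch) and the key length is not leaked.
function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Hypothetical helper: accepts only "Bearer <key>" where <key> equals the expected ARDA_API_KEY.
function isValidBearer(authorization: string | undefined, expectedApiKey: string): boolean {
  const prefix = "Bearer ";
  if (authorization === undefined || !authorization.startsWith(prefix)) return false;
  const presented = authorization.slice(prefix.length);
  return timingSafeEqual(sha256(presented), sha256(expectedApiKey));
}
```

A request with no Authorization header, or with any other scheme, is rejected before any comparison happens.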

The provisioning flow configures this header on each per-tenant webhook when it calls Postmark’s POST /webhooks, setting the HttpHeaders field to Authorization: Bearer <ARDA_API_KEY>. Postmark then includes the header on every webhook request it delivers.
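For illustration, a sketch of the request the provisioning flow might send. The `HttpHeaders` array of `Name` / `Value` pairs follows Postmark's webhooks API as we understand it; the trigger selection, the `outbound` message stream, the callback URL in the usage, and the helper name `buildCreateWebhookRequest` are assumptions. The sketch only builds the request and does not send it.

```typescript
interface PostmarkWebhookRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical builder for a per-tenant POST /webhooks call.
function buildCreateWebhookRequest(
  serverToken: string,
  callbackUrl: string,
  ardaApiKey: string,
): PostmarkWebhookRequest {
  const body = {
    Url: callbackUrl,
    MessageStream: "outbound", // assumed stream name
    // Postmark attaches these headers to every webhook call it delivers.
    HttpHeaders: [{ Name: "Authorization", Value: `Bearer ${ardaApiKey}` }],
    // Trigger selection is an assumption for illustration.
    Triggers: {
      Delivery: { Enabled: true },
      Bounce: { Enabled: true, IncludeContent: false },
      SpamComplaint: { Enabled: true, IncludeContent: false },
    },
  };
  return {
    url: "https://api.postmarkapp.com/webhooks",
    init: {
      method: "POST",
      headers: {
        Accept: "application/json",
        "Content-Type": "application/json",
        // Webhook configuration is a Server API call, so it uses the per-tenant server token.
        "X-Postmark-Server-Token": serverToken,
      },
      body: JSON.stringify(body),
    },
  };
}
```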

If and when the platform later adopts gateway-level authorisation as a cross-cutting concern, the email-service routes (consumer-facing and webhook) migrate alongside the rest. Until then, in-component validation is the contract.

Optional defence-in-depth: Postmark publishes a small set of webhook source IPs; these can be allowlisted at the network layer. IPs may change; treat as supplementary, not authoritative.

2.3 Outbound — Postmark Account API (Application Runtime, runtime)

The Postmark Account API (server / domain / webhook CRUD, DKIM / Return-Path verification) is authenticated with the X-Postmark-Account-Token header. For Application Runtime tenants, the runtime path is:

The token is created in the Postmark console (manual, one-time per account; partition-pair scoped) and stored in 1Password in two places: the global-utility items Postmark-Prod and Postmark-NonProd in Arda-SystemsOAM, for cross-partition tooling; and a per-partition item named Postmark in each Arda-{Env}OAM vault, for partition-scoped deploy tooling. Phase 4 prerequisite: platform/postmark-service.ts must expose postmarkCredentialOpReference(partition: PartitionId): string, so that Phase 4 deploy tooling resolves the token from the partition vault rather than from the global item.
A per-partition Secrets Manager entry ({fqn}-I-EmailPostmarkAccountToken) holds the runtime copy. amm.sh reads the value from the partition’s 1Password vault at deploy time (op read "$(postmarkCredentialOpReference <partition>)") and passes it to cdk deploy as a NoEcho CFN parameter; the CDK partition-email stack declares the parameter and creates the SM secret via SecretValue.cfnParameter(). CDK has no 1Password dependency. See current-system/oam/security/secret-delivery-pattern.md for the canonical pattern.
ESO synchronises the secret into the partition’s Kubernetes namespace; the operations pod consumes it as the HOCON property extras.email.postmarkAccountToken at startup.
Read by postmarkAccountProxy only, in-process; never logged.

Two Postmark accounts: PostmarkProd (used by prod / demo partitions, real delivery) and PostmarkNonProd (used by dev / stage partitions, sandbox delivery).
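A minimal sketch of the reference lookups described above, assuming lowercase partition ids (`dev`, `stage`, `demo`, `prod`) that capitalise into the `Arda-{Env}OAM` vault names. The `PartitionId` type here is a stand-in for the real one, and `postmarkAccountFor` and `globalPostmarkOpReference` are hypothetical helpers; only the `postmarkCredentialOpReference` signature comes from this document.

```typescript
// Stand-in for the real PartitionId type (assumed lowercase ids).
type PartitionId = "dev" | "stage" | "demo" | "prod";

// Per-partition item "Postmark" in the Arda-{Env}OAM vault, e.g. op://Arda-ProdOAM/Postmark/credential.
function postmarkCredentialOpReference(partition: PartitionId): string {
  const env = partition.charAt(0).toUpperCase() + partition.slice(1);
  return `op://Arda-${env}OAM/Postmark/credential`;
}

// prod / demo use PostmarkProd (real delivery); dev / stage use PostmarkNonProd (sandbox).
function postmarkAccountFor(partition: PartitionId): "PostmarkProd" | "PostmarkNonProd" {
  return partition === "prod" || partition === "demo" ? "PostmarkProd" : "PostmarkNonProd";
}

// Matching global-utility item in Arda-SystemsOAM, used by cross-partition tooling.
function globalPostmarkOpReference(partition: PartitionId): string {
  const item = postmarkAccountFor(partition) === "PostmarkProd" ? "Postmark-Prod" : "Postmark-NonProd";
  return `op://Arda-SystemsOAM/${item}/credential`;
}
```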

2.4 Outbound — Postmark Server API (per-tenant, runtime)

The Postmark Server API (send email, configure webhook) is authenticated with the X-Postmark-Server-Token header. The token is per-tenant (issued by Postmark when the tenant’s server is created):

Captured at provisioning time from the POST /servers response.
Encrypted application-side (AES-256-GCM versioned envelope per DQ-202) before persistence.
Stored in email_configuration.server_token_encrypted (text column, base64 envelope).
Decrypted on demand by EmailConfigurationService.getUnlockedConfiguration().
Passed by reference (in-memory only) through L2 (EmailSender) and L1 (postmarkServerProxy) as a method argument; never persisted again, never logged.
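A minimal sketch of the seal / open round trip, assuming the envelope is the `a1.k{versionId}:` prefix (visible in the rotation queries in § 5.6) followed by base64 of nonce, ciphertext, and GCM tag. The payload layout, the 12-byte nonce, and binding the prefix as additional authenticated data are assumptions rather than the DQ-202 specification; `sealEnvelope` and `openEnvelope` are hypothetical names. The key stands for the 32-byte HKDF-derived key (§ 4.1).

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const NONCE_BYTES = 12; // 96-bit nonce, the usual size for GCM
const TAG_BYTES = 16; // default GCM authentication tag length

// Seal a plaintext server token into "a1.k<versionId>:<base64(nonce | ciphertext | tag)>".
function sealEnvelope(plaintext: string, key: Buffer, keyVersionId: string): string {
  const header = `a1.k${keyVersionId}:`;
  const nonce = randomBytes(NONCE_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  cipher.setAAD(Buffer.from(header, "utf8")); // header is authenticated (assumption)
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const payload = Buffer.concat([nonce, ciphertext, cipher.getAuthTag()]);
  return header + payload.toString("base64");
}

// Open an envelope; decipher.final() throws if the key is wrong or the data was altered.
function openEnvelope(envelope: string, key: Buffer): string {
  const sep = envelope.indexOf(":");
  if (!envelope.startsWith("a1.k") || sep < 0) throw new Error("unsupported envelope");
  const raw = Buffer.from(envelope.slice(sep + 1), "base64");
  if (raw.length < NONCE_BYTES + TAG_BYTES) throw new Error("truncated envelope");
  const nonce = raw.subarray(0, NONCE_BYTES);
  const tag = raw.subarray(raw.length - TAG_BYTES);
  const ciphertext = raw.subarray(NONCE_BYTES, raw.length - TAG_BYTES);
  const decipher = createDecipheriv("aes-256-gcm", key, nonce);
  decipher.setAAD(Buffer.from(envelope.slice(0, sep + 1), "utf8"));
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because the nonce is random per call, sealing the same token twice yields different envelopes; only the prefix is stable.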

2.5 Outbound — Postmark Server API (Free Kanban Tool, deploy-time and runtime)

The Free Kanban Tool’s Postmark server is provisioned once by the Corporate CLI at deploy time. Its token follows a different lifecycle than the per-tenant Application Runtime tokens:

The server is created in PostmarkProd (the Corporate consumer is bound to the production-grade Postmark account).
Phase A of the Corporate CLI (per phases.md Phase 3 § J1 interim mechanism) writes the resulting Server API token directly into the Free-Kanban-Generator-Postmark-Server 1Password item in the Arda-CorporateOAM vault. Canonical reference: op://Arda-CorporateOAM/Free-Kanban-Generator-Postmark-Server/credential.
The token never enters CDK context, file artefacts, environment variables in the deploy pipeline, or GitHub Actions secrets. The CDK Stack composes the DNS records using only the public DKIM and Return-Path values surfaced through cdk.context.json.
The Free Kanban Tool itself reads its server token from the 1Password item via its own resolution path (out of scope of this project’s runtime).

This isolation gives the Free Kanban Tool a bounded blast radius: a leak of CDK context, a cloud-runtime credential, a CI secret, or even OP_SERVICE_ACCOUNT_TOKEN does not yield the Free Kanban Tool’s sending credential. OP_SERVICE_ACCOUNT_TOKEN is scoped read-only to Arda-SystemsOAM; the Free Kanban server token lives in a separate vault (Arda-CorporateOAM) reachable only by the Free Kanban Tool’s own runtime resolution path. See DQ-R1-007.

2.6 Outbound — Postmark API (deploy-time, thin-wrappers)

The Postmark thin-wrapper constructs at src/main/cdk/platform/constructs/postmark/ (Phase 3 deliverables) make Postmark Account API calls during operator-driven deploys (Phase A of the Corporate CLI) and during drift-detection runs in CI. In both cases the account token is resolved through the 1Password SDK rather than from the runtime Secrets Manager copy:

Local-dev operator: DesktopAuth biometric unlock against the operator’s 1Password app. No service-account token leaves the operator’s machine.
CI: the OP_SERVICE_ACCOUNT_TOKEN GitHub Actions secret authenticates the SDK, which then resolves op://Arda-SystemsOAM/Postmark-Prod/credential (or …/Postmark-NonProd/credential).

The same dual-auth path is used by every operator-facing tool that needs Postmark account access at deploy time.

2.7 Outbound — AWS (runtime)

Route53 access traverses an STS role chain settled in DQ-204:

The pod’s IRSA service-account role (existing operations-component pattern) provides base AWS credentials.
The AWS SDK’s StsAssumeRoleCredentialsProvider (configured at module startup with 15-minute session duration) auto-chains an AssumeRole call to the partition’s EmailDnsProvisioningRole.
The provider caches credentials lazily and refreshes on first use after expiry. The pod possesses Route53 write credentials for at most ~15 minutes after the last call.
The L1 route53ZoneProxy makes Route53 SDK calls without per-call AssumeRole; STS is handled by the credentials provider transparently.
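The caching behaviour can be illustrated with a small sketch. This is not the AWS SDK's implementation (StsAssumeRoleCredentialsProvider is an AWS SDK for Java class and may refresh ahead of expiry); `LazyAssumeRoleCache` and the synchronous `assumeRole` callback are hypothetical simplifications, and the injected clock exists only to make the expiry behaviour testable.

```typescript
interface SessionCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
  expiresAtMs: number;
}

// Holds at most one set of session credentials and calls assumeRole only when
// none are cached or the cached set has expired.
class LazyAssumeRoleCache {
  private cached: SessionCredentials | undefined = undefined;
  private readonly assumeRole: () => SessionCredentials;
  private readonly nowMs: () => number;

  constructor(assumeRole: () => SessionCredentials, nowMs: () => number = Date.now) {
    this.assumeRole = assumeRole;
    this.nowMs = nowMs;
  }

  get(): SessionCredentials {
    if (this.cached === undefined || this.nowMs() >= this.cached.expiresAtMs) {
      this.cached = this.assumeRole();
    }
    return this.cached;
  }
}
```

With a 15-minute session, credentials fetched on the last Route53 call stop being usable at most 15 minutes later, which is the bound stated above.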

Aurora PostgreSQL access uses the existing operations-component DB credential pattern; no email-specific change.

2.8 Outbound — 1Password (deploy-time, CI)

The 1Password service-account token (OP_SERVICE_ACCOUNT_TOKEN) is the only GitHub Actions repository secret provisioned for this project. CI workflows resolve every other downstream credential at runtime via the 1Password SDK. The token is:

Issued from the 1Password admin console as a service-account token.
Scoped read-only to the Arda-SystemsOAM vault. The token cannot read other vaults and cannot write anywhere.
Provisioned into the Arda-cards/infrastructure repository’s secrets via the tools/gha-secret.ts operator utility (libsodium-encrypted upload via Octokit).
Rotatable through the same operator utility; the new token replaces the old in a single API call.

The token is never logged, never echoed in CI output (workflow steps mask it via the standard GitHub Actions secret-masking), and never carried into pod runtime.

2.9 Outbound — GitHub (deploy-time, operator tools)

The tools/gha-secret.ts operator CLI provisions GitHub Actions repository secrets. It is operator-driven (not part of any automated pipeline), takes --repo, --name, and --op-ref as inputs, and is the only outbound-to-GitHub credential-write surface in this project. Authentication to GitHub uses an operator-supplied PAT or OAuth token; encryption of the secret value uses libsodium against the repository’s public key.
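A sketch of how the tool's inputs might be validated before any secret is read, using Node's `util.parseArgs`. The flag names come from this document; the `owner/repo` format check, the secret-name pattern, and the helper names are assumptions. The endpoint is GitHub's REST route for creating or updating a repository secret (a PUT carrying the sealed `encrypted_value` and the public key's `key_id`); the libsodium sealing step is omitted.

```typescript
import { parseArgs } from "node:util";

interface GhaSecretArgs {
  owner: string;
  repo: string;
  name: string;
  opRef: string;
}

// Hypothetical validation for tools/gha-secret.ts inputs.
function parseGhaSecretArgs(argv: string[]): GhaSecretArgs {
  const { values } = parseArgs({
    args: argv,
    options: {
      repo: { type: "string" },
      name: { type: "string" },
      "op-ref": { type: "string" },
    },
    strict: true, // unknown flags are an error
  });
  const repoArg = values.repo;
  const name = values.name;
  const opRef = values["op-ref"];
  if (typeof repoArg !== "string" || typeof name !== "string" || typeof opRef !== "string") {
    throw new Error("--repo, --name and --op-ref are all required");
  }
  const parts = repoArg.split("/");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") {
    throw new Error("--repo must be <owner>/<repo>");
  }
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) throw new Error("invalid secret name");
  if (!opRef.startsWith("op://")) throw new Error("--op-ref must be an op:// secret reference");
  return { owner: parts[0], repo: parts[1], name, opRef };
}

// GitHub REST route (PUT) that receives the sealed encrypted_value and the key_id.
function secretEndpoint(args: GhaSecretArgs): string {
  return `https://api.github.com/repos/${args.owner}/${args.repo}/actions/secrets/${args.name}`;
}
```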

This tool is a transition-state utility (scratch.md N1+ context). When declarative GHA-secret management is adopted as a wider repo pattern, the tool is retired.

3. Authorisation

3.1 `email-configuration` endpoint

The email-configuration endpoint is CS-only in v1. It is accessed by CS scripts directly using ARDA_API_KEY, validated in-component. The endpoint is not exposed through the BFF in v1.

Future state: a CS administration UI may surface this endpoint behind a privileged role check. v1 accepts that any caller with ARDA_API_KEY can invoke the endpoint, consistent with the existing platform pattern. The Postmark Console covers operational visibility in the meantime (§ 5.1).

3.2 `email-job` endpoint

The email-job endpoint is tenant-scoped via the X-Tenant-Id header forwarded by the BFF after JWT validation. The L3 EmailJobService:

Trusts the X-Tenant-Id header (the BFF is the trust boundary, per existing platform conventions).
Verifies that the requested emailConfigurationId belongs to the asserted tenant (otherwise 403).

The trust-boundary delegation matches the existing pattern in other modules; revalidating the tenant on every request would duplicate work the BFF already performs.
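A minimal sketch of that ownership check, with a hypothetical in-memory map standing in for the email_configuration lookup. An unknown configuration id also yields 403 here, so the endpoint does not reveal whether another tenant's configuration exists; that choice is an assumption.

```typescript
type JobAuthorisation = { status: 200; configurationId: string } | { status: 403 };

// Hypothetical check: X-Tenant-Id is trusted as asserted by the BFF; only ownership is verified.
function authoriseJob(
  tenantIdHeader: string,
  emailConfigurationId: string,
  configOwner: ReadonlyMap<string, string>, // configurationId -> tenantId
): JobAuthorisation {
  return configOwner.get(emailConfigurationId) === tenantIdHeader
    ? { status: 200, configurationId: emailConfigurationId }
    : { status: 403 };
}
```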

3.3 `postmark-events` webhook

The postmark-events route authenticates via Bearer token only (in-component validation). There is no caller-identity concept beyond “the Bearer token is valid”; the route trusts that a request with a valid Bearer token comes from Postmark (Postmark IP allowlisting is the optional second factor).

Tenant scoping at the webhook is implicit: the inbound payload contains a MessageID that maps to an EmailJob row, which carries the tenant via email_configuration_id. No header-level tenant assertion.
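The implicit scoping can be sketched as a two-step lookup. The maps stand in for the email_job and email_configuration tables, and `tenantForWebhook` and the row shapes are hypothetical; only the `MessageID` payload field and the job-to-configuration link come from this document.

```typescript
interface EmailJobRow { messageId: string; emailConfigurationId: string }
interface ConfigOwnerRow { id: string; tenantId: string }

// MessageID -> EmailJob -> EmailConfiguration -> tenant; undefined when nothing matches.
function tenantForWebhook(
  payload: { MessageID?: string },
  jobsByMessageId: ReadonlyMap<string, EmailJobRow>,
  configsById: ReadonlyMap<string, ConfigOwnerRow>,
): string | undefined {
  if (!payload.MessageID) return undefined;
  const job = jobsByMessageId.get(payload.MessageID);
  if (job === undefined) return undefined; // unknown message: nothing to attribute
  return configsById.get(job.emailConfigurationId)?.tenantId;
}
```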

3.4 Free Kanban Tool sending

The Free Kanban Tool’s send operations are governed by the application running on the Free Kanban Tool’s own infrastructure (out of scope of this project). The Postmark server-token-based authorisation to the Postmark Server API matches the per-tenant model: possession of the server token authorises sending from freekanban.arda.ardamails.com. Token issuance and lifecycle are owned by the Corporate Resource Group (§ 4.1).

4. Secret management

4.1 Secret inventory and lifecycle

Secret	Issued by	Stored in	Delivered as	Used by	Rotation
Postmark account token (per Postmark account)	Postmark console (manual)	1Password: global-utility items `Postmark-Prod` / `Postmark-NonProd` in `Arda-SystemsOAM` (qualified names — both accounts in one vault); per-partition copies under item title `Postmark` in each `Arda-{Env}OAM` vault (e.g., `op://Arda-ProdOAM/Postmark/credential`); per-partition runtime copy in AWS Secrets Manager (`{fqn}-I-EmailPostmarkAccountToken`)	ESO → HOCON `extras.email.postmarkAccountToken` (runtime); 1Password SDK (deploy-time / CI / drift)	`postmarkAccountProxy` (runtime); `tools/corporate-cli.ts`, drift workflows (deploy-time)	Manual: regenerate in Postmark console → update 1Password (both the `Arda-SystemsOAM` item and the affected `Arda-{Env}OAM` item) → rerun deploy-time tooling to refresh Secrets Manager → refresh ESO
1Password service-account token	1Password admin console (manual)	Local: operator’s 1Password client. CI: `OP_SERVICE_ACCOUNT_TOKEN` repository secret in `Arda-cards/infrastructure`	Environment variable to CI workflows; read-only access to `Arda-SystemsOAM`	All deploy-time and drift-detection workflows	Manual: regenerate in 1Password admin → rerun `tools/gha-secret.ts` to update the GitHub Actions secret
Per-partition encryption key	CDK `GeneratedSecret` (`passwordLength: 64`, DQ-203.c)	AWS Secrets Manager (`{fqn}-I-EmailEncryptionKey` per partition); native SM versioning carries rotation history (`AWSCURRENT`, `AWSPREVIOUS`, historical versionIds) per DQ-R1-019	ESO → two `ExternalSecret` resources (`AWSCURRENT` and `AWSPREVIOUS`) → HOCON `extras.email.encryptionKeys` (list keyed by SM versionId) → HKDF derivation in `TokenCipher`. Rare rows older than `AWSPREVIOUS` fall back to a direct AWS SM SDK fetch via the `EmailEncryptionKeyFallbackRole`, which the operations pod assumes via STS from its IRSA-bound pod role; the pod role itself does not carry `secretsmanager:GetSecretValue` (DQ-R1-020).	`EmailConfigurationService` (only)	`aws secretsmanager put-secret-value` creates a new versionId → ESO refreshes both mounts → the pod’s `TokenCipher` holds both the AWSCURRENT and AWSPREVIOUS keys → the first read of a row still under the old version synchronously re-encrypts that row and launches a per-pod coroutine to mop up the rest of the partition → operator verifies completion (`SELECT count(*) FROM email_configuration WHERE server_token_encrypted NOT LIKE 'a1.k${currentVersionId}:%'` returns zero) → optionally retires the prior SM version (DQ-R1-019; full design in `4-runtime-platform-updates/design/email-server-key-encryption.md`).
Per-tenant Postmark server token	Postmark API (`POST /servers` response)	DB `email_configuration.server_token_encrypted` (AES-256-GCM versioned envelope)	Decrypted on demand by `EmailConfigurationService.getUnlockedConfiguration()`; passed in-memory through L2 / L1	`postmarkServerProxy` (via method argument)	Per-tenant via Postmark `POST /servers/{id}/rotateToken`; deferred to v2
Free Kanban Tool Postmark server token	Postmark API (Phase A of Corporate CLI; one-time, idempotent)	1Password (`Free-Kanban-Generator-Postmark-Server` in `Arda-CorporateOAM`; ref `op://Arda-CorporateOAM/Free-Kanban-Generator-Postmark-Server/credential`)	1Password SDK at the Free Kanban Tool’s runtime resolution (out of scope of this project)	Free Kanban Tool runtime (out of scope)	Manual: regenerate in Postmark console → update 1Password item
`ARDA_API_KEY`	Existing platform mechanism	Existing platform store	Existing platform delivery	Inbound HTTP auth (in-component) + outbound Postmark webhook Bearer header	Existing platform rotation procedure

Note: the encryption-key secret is HKDF-derived in the application before use (DQ-203). The HKDF info string is "arda.email.serverToken.a{N}" where a{N} is the algorithm version (the first axis of the two-axis envelope per DQ-R1-019). v1 of the algorithm uses info = "arda.email.serverToken.a1". Future algorithm changes bump a{N} (rare, code-released); secret-material rotations are tracked independently by AWS SM’s native versionId mechanism and surface in the envelope as the k{...} axis.
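A minimal sketch of the derivation using Node's `crypto.hkdfSync`. The info string follows the format above; SHA-256 as the HKDF hash and an empty salt are assumptions (DQ-203 is not reproduced here), and `deriveTokenKey` is a hypothetical name.

```typescript
import { hkdfSync } from "node:crypto";

// Derive the 32-byte AES-256-GCM key for algorithm a{N} from the raw SM secret value.
function deriveTokenKey(secretValue: string, algorithmVersion: number): Buffer {
  const info = `arda.email.serverToken.a${algorithmVersion}`;
  const salt = Buffer.alloc(0); // assumed: no salt
  return Buffer.from(hkdfSync("sha256", Buffer.from(secretValue, "utf8"), salt, info, 32));
}
```

Because the algorithm version is part of the info string, bumping a{N} yields a different key even from the same secret material, which keeps the two envelope axes independent.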

4.2 Logging and redaction

Three classes of values must never appear in logs:

The per-partition encryption key (raw secret value or HKDF-derived key bytes).
Per-tenant Postmark server tokens (plaintext or any decrypted form).
Email body content (HTML / text body).

Deploy-time tooling must additionally keep these out of its logs:

The 1Password service-account token (OP_SERVICE_ACCOUNT_TOKEN).
Postmark account-level tokens.
The Free Kanban Tool’s Postmark server token.

Redaction is enforced at:

L1 proxies: per-surface application logging contract pinned in DQ-220.h. Highlights: postmarkServerProxy.sendEmail logs only recipient, subject, and MessageID; postmarkAccountProxy.* logs only operation name, resource ids, and HTTP status.
Transport layer (Ktor Logging plugin): common-lib httpClient (in common-module) installs sanitizeHeader { ... } covering Authorization, X-Postmark-Account-Token, X-Postmark-Server-Token, and api_key. One-line addition that benefits every proxy in the platform.
L3 services: structured-log frameworks must mark relevant fields as redacted; entity field-by-field log helpers exclude serverTokenEncrypted and serverTokenPlaintext.
Deploy-time tooling: a cross-cutting redact() utility under src/main/cdk/utils/logging.ts is consumed by the Postmark thin-wrapper constructs and the Corporate CLI. The same utility is the basis for drift-workflow log scrubbing.
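A sketch of what a key-based `redact()` could look like. The real utility's signature is not specified in this document; the key list below is assembled from the header and field names in this section plus Postmark's `ApiTokens`, `HtmlBody`, and `TextBody` fields (as we understand Postmark's payloads), and is illustrative rather than complete.

```typescript
// Illustrative key list (matched case-insensitively); not the utility's actual configuration.
const SENSITIVE_KEYS = new Set([
  "authorization",
  "x-postmark-account-token",
  "x-postmark-server-token",
  "api_key",
  "op_service_account_token",
  "apitokens", // assumed Postmark server-response field carrying the server token
  "servertokenencrypted",
  "servertokenplaintext",
  "htmlbody",
  "textbody",
]);

// Return a deep copy of value with sensitive keys replaced by "[REDACTED]".
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(child);
    }
    return out;
  }
  return value;
}
```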

What may be logged (acceptable): Postmark resource IDs (serverId, domainId, webhookId), MessageID, sending domain FQDN, recipient email addresses (these are PII; see § 6.2), bounce diagnostics.

4.3 Drift detection

The project introduces scheduled drift detection that asserts external resources match their declared state:

A monthly GitHub Actions workflow exercises the live Postmark Account API surface for each Postmark account, enumerates servers and sending domains, and asserts that the visible state matches what the infrastructure repository declares.
The workflow authenticates to 1Password via OP_SERVICE_ACCOUNT_TOKEN and resolves Postmark account tokens at runtime; no Postmark token is persisted as a GitHub Actions secret.
On any divergence, the workflow opens a labelled GitHub issue with the run URL and the observed-vs-declared diff. The repository’s existing on-call routing handles triage.
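The core comparison can be sketched as a set difference over resource identifiers. Reducing each resource to a single string key (for example a server name or sending-domain FQDN) is a simplifying assumption, a real check would also compare attributes, and the function names are hypothetical.

```typescript
interface DriftReport {
  missing: string[]; // declared in the repository but not observed in Postmark
  unexpected: string[]; // observed in Postmark but not declared
}

function diffDeclared(declared: Iterable<string>, observed: Iterable<string>): DriftReport {
  const d = new Set(declared);
  const o = new Set(observed);
  return {
    missing: [...d].filter((x) => !o.has(x)).sort(),
    unexpected: [...o].filter((x) => !d.has(x)).sort(),
  };
}

// Any non-empty side is a divergence that should open an issue.
function hasDrift(report: DriftReport): boolean {
  return report.missing.length > 0 || report.unexpected.length > 0;
}
```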

The same pattern applies to other external surfaces (1Password vault contents; GitHub repository configuration); the initial implementation covers the Postmark surface and serves as the template.

5. OAM (operations, administration, management)

5.1 Postmark Console — the primary OAM surface

Aggregate operational management of the email service is performed through the Postmark Console, not through Arda-built tooling. The Postmark Console is the source of truth for:

Server-by-server delivery / bounce / complaint statistics.
Per-domain DKIM / SPF / DMARC verification status.
Suppression-list management.
Message-level diagnostics (search by recipient, by subject, by MessageID).
Webhook configuration changes (post-provisioning).

Arda-side telemetry (§ 5.2, § 5.3) supplements the Postmark Console with information that is specific to Arda’s use of Postmark (per-tenant aggregates, polling-task health, lifecycle transitions). It does not replace the Postmark Console.

This project does not build an Arda-side OAM UI for the email service; that is explicitly out of scope (goal.md Out of scope).

5.2 Logging conventions

Structured JSON via the existing operations-component logging stack. Standard fields:

Correlation ID (per request).
Tenant ID (where relevant).
Configuration ID, Job ID (where relevant).
Layer marker (L1 / L2 / L3) for traceability across boundaries.
Severity per the standard SLF4J levels.

Logging contract per layer:

L1 proxies: INFO per remote call (request path + response status); WARN on non-2xx; ERROR on parse / connection failures. Request and response bodies that carry tokens or email content are never logged.
L2 capability composers: INFO per capability operation (start / success); WARN on Result.failure(PartialProgress) with the failedAt step.
L3 services: INFO on lifecycle transitions; WARN on retry-with-backoff (DQ-205.e step-9 path); ERROR on persistent UPDATE failures with diagnostic naming the orphans.

Deploy-time tooling (Corporate CLI, Postmark thin-wrappers, drift workflows) follows the same redaction contract via the cross-cutting src/main/cdk/utils/logging.ts utility.

5.3 Metrics

Metric	Source	Use
`email_send_total{tenant, status}`	EmailJob transitions	Per-tenant send volume and outcome distribution
`email_delivery_rate{tenant}`	DELIVERED / total	Deliverability tracking
`email_bounce_rate{tenant, type}`	BOUNCED breakdown	Bounce-rate alerting (in v2; ESP-OOTB in v1 per DQ-006)
`email_complaint_rate{tenant}`	COMPLAINED / total	Spam-complaint alerting (v2)
`email_provisioning_duration_seconds`	provision call latency	Provisioning health
`email_dns_verification_attempts_total{outcome}`	bounded polling rounds	Verification health
`email_polling_active_count{pod}`	per-pod activePolling map size	In-flight verification visibility

Drift-detection workflows do not emit pod-level metrics; their signal is the absence of a failing run plus the absence of an open drift issue in the repository.

v1 emits the metrics above via the existing operations metrics pipeline; no email-specific Prometheus exporter or CloudWatch namespace.

5.4 Operator alerts and runbooks

Alert	Query	Threshold	Recipient	Runbook
`email_configuration_pending_stale`	`count(*) WHERE status='PENDING_VERIFICATION' AND verification_started_at < now() - interval '15 minutes'`	result > 0	CS / on-call	(1) Inspect `diagnostic_message`. (2) Verify Postmark domain status via `GET /domains/{id}` (or in the Postmark Console). (3) Hit `PUT /retry-verification` if DNS is now ready, or `DELETE` if known-broken. (4) Confirm alert clears within ~15 min. (DQ-207.j)
`email_configuration_provisioning_stuck`	`count(*) WHERE status='PROVISIONING' AND provisioning_started_at < now() - interval '5 minutes'`	result > 0	on-call	(1) Identify orphan external resources (server name pattern, sending-domain FQDN) — the Postmark Console is the source of truth here. (2) Manually transition row to `PROVISIONING_FAILED` with diagnostic. (3) Run `DELETE` to invoke best-effort decommission. (DQ-205.f)
Drift-detection workflow failure	Auto-issue opened by the drift workflow	issue created	on-call	(1) Read the run logs at the link in the issue body. (2) Compare observed Postmark state to the declared state in `infrastructure`. (3) Either reconcile manually (Postmark Console + tooling) or open a follow-up to update the declarations. (4) Close the issue when reconciled.
Future v2: `email_bounce_rate_high`	bounce rate > 5% per tenant per hour	exceeded	CS	Postmark Console for diagnostics; tenant outreach (DQ-006)
Future v2: `email_complaint_rate_high`	complaint rate > 0.1% per tenant per hour	exceeded	CS	Same

5.5 Manual operations

CS / on-call workflows surfaced as endpoints, operational queries, or operator scripts:

PUT /email-configuration/<configId>/retry-verification — kicks off a fresh bounded DNS verification round (DQ-207.b). Allowed from PENDING_VERIFICATION or VERIFICATION_FAILED; refreshes verification_started_at.
PUT /email-configuration/<configId>/lock and /unlock — pure DB transitions; lock prevents new sends through that configuration (Scenario 6).
DELETE /email-configuration/<configId> — runs best-effort decommission (R53 deletes first, then Postmark per DQ-205.k); deletes the row unconditionally (DQ-205.d).
Manual stuck-row triage (no endpoint) — operator runs the stuck-row queries above; for PROVISIONING rows, manually transitions to PROVISIONING_FAILED then DELETE. The Postmark Console is the source of truth for whether orphans exist on the Postmark side.
Postmark API change watch (continuous) — the #dev-team Slack channel is subscribed to https://postmarkapp.com/updates/type/api. Triage rule: any post mentioning Servers, Domains, Webhooks, Email, or transport-layer changes is reviewed against the L1 proxy implementations and the Postmark thin-wrapper constructs; if a contract change affects either surface, a follow-up issue is filed.
First-deploy credential verification (per partition, per setup) — on the first deploy of the email module to a new partition (or after rotating any of the Postmark account token / encryption key / DNS provisioning role ARN), the operator runs an ad-hoc smoke test: provision a known-disposable test tenant via POST /email-configuration, observe the row reach PENDING_VERIFICATION, exercise /retry-verification, observe verification success, send a test email, observe the webhook firing, then DELETE the row. The runbook substitutes for live integration testing of the L1 proxies (per DQ-220.g).
Corporate CLI operator runs — the Corporate CLI (tools/corporate-cli.ts) invokes Phase A (Postmark thin-wrapper calls) followed by Phase B (cdk deploy). On a Phase-A failure the operator can re-run Phase A; on a Phase-B failure the operator can re-run Phase B without re-issuing Phase A’s Postmark calls (idempotent reconcile).
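A sketch of the retry-verification precondition and the two stuck-row predicates from § 5.4, written as plain functions over a hypothetical row shape. That retry-verification leaves the row in PENDING_VERIFICATION is an assumption; the document states only that it refreshes verification_started_at.

```typescript
interface EmailConfigurationRow {
  status: string; // only the statuses named in this document are checked
  verificationStartedAt?: Date;
  provisioningStartedAt?: Date;
}

const MINUTE_MS = 60_000;
const RETRY_ALLOWED_FROM = new Set(["PENDING_VERIFICATION", "VERIFICATION_FAILED"]);

// PUT .../retry-verification: reject other states, restart the verification clock.
function retryVerification(row: EmailConfigurationRow, now: Date): EmailConfigurationRow {
  if (!RETRY_ALLOWED_FROM.has(row.status)) {
    throw new Error(`retry-verification is not allowed from ${row.status}`);
  }
  return { ...row, status: "PENDING_VERIFICATION", verificationStartedAt: now };
}

// Milliseconds elapsed since t; a missing timestamp never counts as stale.
function ageMs(t: Date | undefined, now: Date): number {
  return t === undefined ? 0 : now.getTime() - t.getTime();
}

// Mirrors the two alert queries: PENDING_VERIFICATION over 15 min, PROVISIONING over 5 min.
function isStuck(row: EmailConfigurationRow, now: Date): boolean {
  if (row.status === "PENDING_VERIFICATION") return ageMs(row.verificationStartedAt, now) > 15 * MINUTE_MS;
  if (row.status === "PROVISIONING") return ageMs(row.provisioningStartedAt, now) > 5 * MINUTE_MS;
  return false;
}
```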

5.6 Rotation procedures

Postmark account token — out-of-band, manual:

Regenerate the token in the Postmark Console.
Update both 1Password copies: the qualified item in Arda-SystemsOAM (Postmark-Prod or Postmark-NonProd) and the per-partition item Postmark in each affected Arda-{Env}OAM vault (e.g., Arda-ProdOAM/Postmark for a PostmarkProd rotation). The two stores are independent; both must be updated so CI drift-detection and partition runtime tooling resolve the same token.
Re-run the per-partition deploy (or a dedicated reconcile command) to refresh the Secrets Manager copy in each affected partition.
Refresh ESO sync (or wait for the 1h interval).
Restart the operations pods. The token is read at pod startup, so existing pods keep using the cached value until they restart.

Drift-detection workflows continue to authenticate via the 1Password service-account token after the rotation (no separate update required).

1Password service-account token — manual:

Regenerate the service-account token in the 1Password admin console.
Run tools/gha-secret.ts with the new value to update the OP_SERVICE_ACCOUNT_TOKEN repository secret.
Confirm the next CI run authenticates successfully.

Encryption key — hot-swap via SM-native versioning + lazy migration (DQ-R1-019; full design in 4-runtime-platform-updates/design/email-server-key-encryption.md):

Generate new key material with get-random-password and write it to the SM secret as a new version (update-secret has no option to generate a random value itself):
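```
# Generate 64 characters of key material (matching passwordLength: 64).
NEW=$(aws secretsmanager get-random-password \
  --password-length 64 --exclude-characters '"@/\' \
  --require-each-included-type \
  --output text --query RandomPassword)
# Store it as a new version. The "=" form keeps a value that happens to start
# with "-" from being parsed as a CLI option.
aws secretsmanager put-secret-value \
  --secret-id "{fqn}-I-EmailEncryptionKey" --secret-string="$NEW"
unset NEW
```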
put-secret-value creates a new versionId, promotes it to AWSCURRENT, and demotes the prior version to AWSPREVIOUS.
Within the ESO refreshInterval (~1 min), both ExternalSecret resources (AWSCURRENT and AWSPREVIOUS) refresh; the corresponding Kubernetes Secrets update with the new versionIds and material.
The operations pod picks up the change on its next refresh (rolling restart, or an in-pod TokenCipher.reload() tick). TokenCipher then holds both the old and the new derived keys; new writes encrypt as a1.k{new-versionId}:….
Migration runs automatically. The first EmailConfigurationService.getUnlockedConfiguration() call that encounters a row still tagged with the old versionId synchronously re-encrypts that row and launches a per-pod coroutine that mops up the rest. Idempotent and self-healing across pod restarts.
Operator verifies completion: SELECT COUNT(*) FROM email_configuration WHERE server_token_encrypted NOT LIKE 'a1.k${currentVersionId}:%' returns zero (an admin endpoint will expose this in Phase 5b).
(Optional) aws secretsmanager update-secret-version-stage --secret-id "{fqn}-I-EmailEncryptionKey" --version-stage AWSPREVIOUS --remove-from-version-id <old> retires the AWSPREVIOUS label. The version stays in SM history (still SDK-fetchable via the EmailEncryptionKeyFallbackRole STS hop). Secrets Manager offers no per-version deletion: delete-secret removes the entire secret (after a 7 to 30 day recovery window), so it is not a way to retire a single version.

A row encrypted under a version older than AWSPREVIOUS (rare; happens only after two consecutive rotations within an un-migrated window) triggers a one-off secretsmanager:GetSecretValue SDK call from the pod, cached for the pod’s lifetime. If the SM version has been deleted, TokenCipher throws RetiredSecretVersion; the operator runbook covers manual remediation.
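The key-selection logic behind these steps can be sketched as a small key ring: keys mounted via ESO (AWSCURRENT and AWSPREVIOUS) are used directly, older versions are fetched once and cached for the pod's lifetime, and a missing version surfaces as RetiredSecretVersion. `KeyRing` and its methods are hypothetical names, `fetchVersion` stands in for the GetSecretValue call made through the EmailEncryptionKeyFallbackRole, and the real call would be asynchronous.

```typescript
// Split "a1.k<versionId>:<payload>" into its algorithm and key-version axes.
function parseEnvelopeHeader(envelope: string): { algorithm: string; versionId: string } {
  const match = /^(a\d+)\.k([^:]+):/.exec(envelope);
  if (match === null) throw new Error("malformed envelope");
  return { algorithm: match[1], versionId: match[2] };
}

class KeyRing {
  private readonly currentVersionId: string;
  private readonly mounted: ReadonlyMap<string, Buffer>; // AWSCURRENT + AWSPREVIOUS via ESO
  private readonly fetchVersion: (versionId: string) => Buffer | undefined;
  private readonly fetched = new Map<string, Buffer>(); // cached for the pod's lifetime

  constructor(
    currentVersionId: string,
    mounted: ReadonlyMap<string, Buffer>,
    fetchVersion: (versionId: string) => Buffer | undefined,
  ) {
    this.currentVersionId = currentVersionId;
    this.mounted = mounted;
    this.fetchVersion = fetchVersion;
  }

  // Key for decrypting a row tagged with versionId.
  keyFor(versionId: string): Buffer {
    const known = this.mounted.get(versionId) ?? this.fetched.get(versionId);
    if (known !== undefined) return known;
    const fetched = this.fetchVersion(versionId); // one-off fallback fetch
    if (fetched === undefined) throw new Error(`RetiredSecretVersion: ${versionId}`);
    this.fetched.set(versionId, fetched);
    return fetched;
  }

  // True when the row should be re-encrypted under AWSCURRENT.
  needsReencrypt(envelope: string): boolean {
    return parseEnvelopeHeader(envelope).versionId !== this.currentVersionId;
  }
}
```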

Automated rotation (AWS SM Rotation Lambda) is enabled by this design and deferred to a future deliverable.

Free Kanban Tool Postmark server token — out-of-band, manual:

Regenerate the server token in the Postmark Console (or via the Postmark Server API).
Update the Free-Kanban-Generator-Postmark-Server 1Password item in the Arda-CorporateOAM vault.
Restart / refresh the Free Kanban Tool runtime so it re-resolves the token.

Per-tenant Postmark server tokens — v2 only. Postmark’s POST /servers/{id}/rotateToken issues a new token and invalidates the old. v1 does not exercise this.

6. Compliance and audit

6.1 Bitemporal audit trail

Both email_configuration and email_job are persisted via Arda’s bitemporal DataAuthority pattern. Every state transition produces a new version with valid_from / valid_to and transaction_time columns. Operators can query “what was the state of this config at time T” or “what was the latest state as known at time T” without losing history.
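The two questions above differ only in which versions are visible. The following minimal sketch assumes an append-only history in which, among versions valid at time t and recorded no later than asOf, the most recently recorded one wins. Version and stateAt are hypothetical names; the actual DataAuthority schema and what counts as a state change remain open (DQ-240.a / DQ-250.c).

```kotlin
import java.time.Instant

// Hypothetical in-memory model of one bitemporal version row.
data class Version<T>(
    val validFrom: Instant,
    val validTo: Instant?,        // null = still valid (open-ended)
    val transactionTime: Instant, // when this version was recorded
    val value: T,
)

// State at valid time t, as known at transaction time asOf (default: everything recorded so far).
fun <T> stateAt(history: List<Version<T>>, t: Instant, asOf: Instant = Instant.MAX): T? =
    history
        .filter { it.transactionTime <= asOf }
        .filter { it.validFrom <= t && (it.validTo == null || t < it.validTo) }
        .maxByOrNull { it.transactionTime }
        ?.value
```

stateAt(history, T) answers "what was the state of this config at time T" with current knowledge, while stateAt(history, T, asOf = T) answers "what was the latest state as known at time T", ignoring corrections recorded later.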

The exact bitemporal application (what counts as a state-change event) is a substantive open decision — see DQ-240.a / DQ-250.c in phased-design-requirements.md.

6.2 PII handling

The system handles the following PII classes:

Recipient email addresses (to, cc on EmailJob): stored plaintext in DB. Required for resend / audit.
Reply-To address (often the user’s own email): stored plaintext.
Email body (HTML / text): stored plaintext for resend support.
Bounce / complaint diagnostics: may contain recipient address fragments and bounce reason codes.

GDPR-shaped data-subject rights flows (right to erasure, right to access) are out of scope for v1. v2 will address retention and erasure procedures; v1 retains all data indefinitely via the bitemporal history.

6.3 Data retention

EmailJob: retained indefinitely in v1. Bitemporal history preserves all status transitions.
EmailConfiguration: retained indefinitely while the row exists; on DELETE, the row is removed but bitemporal versions of the deleted row persist in history (this is the DataAuthority default).
Postmark-side data: Postmark retains messages per its own retention policies; we do not control or override.

A retention policy (e.g., archive jobs older than N years; purge after M years) is deferred to v2.

6.4 Free Kanban Tool data scope

The Free Kanban Tool’s send-side data resides in the Free Kanban Tool’s own runtime (out of scope of this project). On the Postmark side, the Free Kanban Tool’s server stores send history per Postmark’s retention policies. The infrastructure repository declares only the sending-domain DNS records and the Postmark server’s identifying metadata (server name, signing identity); no per-message data crosses into Arda’s email-integration scope.

7. Defence-in-depth posture

Layer	Protection
Network	TLS everywhere (HTTPS for inbound, HTTPS / SDK-over-TLS for outbound). No plaintext traffic, even within the VPC.
Application — in-component authentication	Each route validates its own credential in Ktor (no API-gateway-level authoriser dependency for the email service).
Application — error pathway	All proxy methods return `Result<T>`; failures captured via `runCatching`; no exception propagation outside `Result.failure`. Token redaction at the transport and L1 layers.
Application — secret handling	Encryption key held only by L3 service. Per-tenant tokens encrypted at rest; plaintext lives only in the in-memory call stack during a send.
Storage — Aurora	KMS-backed volume encryption (existing platform default).
Storage — application-level encryption	AES-256-GCM versioned envelope on per-tenant tokens (DQ-202). Defence against DB dumps, SQL-injection-style read leaks, analyst sessions.
Credential resolution	1Password as system of record; SDK-mediated reads at deploy time and CI; ESO-projected at runtime. No long-lived token environments outside 1Password.
IAM	Pod IRSA service-account role; STS-chained credentials with 15-min session duration; no long-lived AWS credentials in pods (DQ-204).
External-resource state	Drift-detection workflows assert declared-vs-observed state on a schedule; divergences raise an automated investigation issue.
Operational	Operator alerts on stale / stuck rows; manual triage workflow; Postmark Console as source of truth for delivery diagnostics; bitemporal audit trail for forensics.

The combination addresses the threat model in § 1; gaps acknowledged in that section (pod compromise, Postmark account compromise, 1Password compromise) are out of scope at this layer.
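Two rows of the table, in-component authentication and the error pathway, can be sketched in plain Kotlin. This is a hedged sketch: isAuthorized, Transport, PostmarkProxy, RedactedFailure and redact are hypothetical names. A Ktor route would pass the request's Authorization header to the check; that wiring is omitted so the example carries no Ktor dependency.

```kotlin
import java.security.MessageDigest

// In-component Bearer check (e.g. for the postmark-events route). MessageDigest.isEqual
// examines all bytes, so timing does not reveal where the tokens first differ.
fun isAuthorized(authorizationHeader: String?, expectedToken: String): Boolean {
    if (expectedToken.isEmpty()) return false
    if (authorizationHeader == null || !authorizationHeader.startsWith("Bearer ")) return false
    val presented = authorizationHeader.removePrefix("Bearer ")
    return MessageDigest.isEqual(presented.toByteArray(Charsets.UTF_8), expectedToken.toByteArray(Charsets.UTF_8))
}

// Hypothetical transport seam; the real one would call Postmark over HTTPS.
fun interface Transport {
    fun post(token: String, body: String): String
}

class RedactedFailure(message: String) : RuntimeException(message)

// Replace every occurrence of the secret before a message leaves this layer.
fun redact(text: String?, secret: String): String =
    if (secret.isEmpty()) text.orEmpty() else text.orEmpty().replace(secret, "[REDACTED]")

class PostmarkProxy(private val transport: Transport) {
    // Failures stay inside Result.failure. The original exception is not chained as a cause,
    // because its message or stack may still contain the token.
    // (In suspend code, rethrow CancellationException instead of wrapping it.)
    fun send(token: String, body: String): Result<String> =
        runCatching { transport.post(token, body) }.fold(
            onSuccess = { Result.success(it) },
            onFailure = { Result.failure(RedactedFailure(redact(it.message, token))) },
        )
}
```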

7a. Message stream discipline

All Arda outbound email is transactional. Every Postmark server provisioned for Arda (the Free Kanban Tool server, future per-partition operations servers, future per-tenant servers) uses the default Transactional Message Stream and must not be repurposed for marketing, newsletter, or bulk/broadcast sends. Postmark’s policy (Best Practices for Bulk/Broadcast Sending) requires bulk traffic to flow through a dedicated Broadcast Message Stream; co-mingling transactional and broadcast on the same stream both violates that policy and risks degrading transactional reputation.

If a future use case requires bulk send (e.g., tenant announcements, product updates), provision a separate Broadcast stream on the relevant Postmark server, treat it as a distinct OAM surface (its own throttle, suppression list, reputation), and expose it through a separate L2 API in the operations component — do not route through the transactional EmailSender interface. Phase 5b’s EmailSender is explicitly transactional-only.
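One way to keep EmailSender transactional-only is to make the stream unselectable by callers and give any future bulk path its own interface. The following is a minimal sketch under stated assumptions: OutboundMessage, TransactionalEmailSender, BroadcastSender and the dispatch function are hypothetical, and the EmailSender signature shown is illustrative rather than the Phase 5b interface. It relies on Postmark's send API selecting the stream through the MessageStream field, with "outbound" as the default transactional stream ID.

```kotlin
// Hypothetical request model; messageStream maps to Postmark's MessageStream field.
data class OutboundMessage(
    val to: List<String>,
    val subject: String,
    val htmlBody: String,
    val messageStream: String,
)

// Transactional-only contract: the signature gives callers no way to pick a stream.
interface EmailSender {
    fun send(to: List<String>, subject: String, htmlBody: String)
}

class TransactionalEmailSender(
    private val dispatch: (OutboundMessage) -> Unit, // stand-in for the HTTPS call to Postmark
) : EmailSender {
    override fun send(to: List<String>, subject: String, htmlBody: String) {
        require(to.isNotEmpty()) { "at least one recipient is required" }
        // The stream is pinned here, never caller-supplied.
        dispatch(OutboundMessage(to, subject, htmlBody, messageStream = TRANSACTIONAL_STREAM))
    }

    companion object {
        const val TRANSACTIONAL_STREAM = "outbound" // Postmark's default transactional stream ID
    }
}

// A future bulk path gets its own interface bound to a Broadcast stream,
// rather than a stream parameter added to EmailSender.
interface BroadcastSender {
    fun broadcast(audience: List<String>, subject: String, htmlBody: String)
}
```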

8. References

Project goal
Architecture overview
Phase structure
Decision log: upstream decisions (DQ-001 to DQ-013) and Revision-1 decisions (DQ-R1-NNN)
Application-layer open decisions (DQ-201 to DQ-208)
Application-layer functional design
Application-layer decision log (DQ-220 series)