Email Integration -- Functional Design
First-pass identification of the functionality needed in each subsystem to implement the email features defined in the product feature documents product/features/general-behaviors/email-communications.md and product/features/procurement/email-orders.md (not yet authored).
Subsystem Overview
Arda is a three-tier platform. Requests flow from the SPA through the BFF to the Backend; the Backend is the only layer that communicates with external services (ESP, AWS).
Frontend SPA (React / Next.js Pages)
The user-facing web interface. Renders UI components, manages client-side state (Redux), and collects user input. Does not call the Backend directly; all API traffic routes through the BFF. Authentication state (Cognito JWT) is managed client-side and passed to the BFF on every request.
Frontend BFF (Next.js API Routes)
The security and orchestration layer between the SPA and the Backend. Validates the user’s Cognito JWT, extracts user context (userId, tenantId, email, role), and forwards requests to the Backend with system credentials (ARDA_API_KEY) and context headers (X-Tenant-Id, X-Author). The BFF is the trust boundary: the Backend trusts the headers the BFF provides.
Backend (Kotlin / Ktor — Operations Service)
Core business logic, persistence, and external service integration. Organized as a modular monolith where each functional area (procurement/orders, reference/items, resources/kanban) is a module with its own routes, services, persistence, and configuration. Follows the Data Authority pattern with bitemporal persistence (Exposed ORM, PostgreSQL). All domain validation, lifecycle management, and transactional guarantees live here.
ESP (Postmark)
External Email Service Provider. Accepts messages via REST API; handles MTA routing, DKIM signing, and deliverability; and reports delivery status through incoming REST (webhook) requests to the Backend. Arda does not run its own mail infrastructure; Postmark is the mail delivery network.
Frontend SPA
Email Composition Surface
- Side panel component for editing email content before send
  - For email orders (`orderMethod=EMAIL`): renders full order data as editable HTML
  - For purchase orders (`orderMethod=PURCHASE_ORDER`): renders brief introduction text, editable
- “Send via System” action button (invokes send dialog via BFF)
- “Copy to Clipboard” action button (existing path, email orders only — no system involvement)
- Content editing: rich-text or structured-field editing of the email body
Send Dialog
- Modal or panel implementing `GEN::EML::0001::0003` (defined in the future `product/use-cases/general-behaviors/email-communications.md` use-case document).
- Fields:
  - To (editable, pre-populated from order’s sales contact email)
  - Cc (editable, initially empty)
  - Reply-To (read-only, resolved by system)
  - Subject (read-only in v1)
  - Body preview (read-only in dialog, reflects edits from composition step)
  - Attachment list (read-only, shows filename and size)
  - Sender identity line (read-only)
- “Send” and “Cancel” buttons
- Cancel prompts for confirmation if any editable field was modified
- Address validation (syntactic) before allowing Send
- Send button disabled if To is empty
Send Status Display
- Status indicator on order detail view: Queued, Sent, Delivered, Bounced, Complained, Failed
- For adverse statuses (Bounced, Complained, Failed): diagnostic message + CS reference ID
- For Queued with retries: retry count and last attempt info
- “Re-send” action available on Bounced or Failed orders (reopens send dialog with previous addresses)
State Management
- Send status fetched via polling from the BFF or on entity detail load (no WebSocket/SSE; neither is available in the current platform)
- Redux slice for email send state (per entity): status, diagnosticMessage, referenceId, lastUpdated
Frontend BFF
New API Routes
BFF routes mirror the backend ShopAccess/Email module endpoints for the generic email capability:
- `POST /api/arda/shop-access/email/email-job`: Create an email sending job
  - Receives: addressing (To, Cc, Reply-To), subject, body content, attachments
  - Validates JWT, extracts user context
  - Forwards to Backend with system credentials and context headers
  - Returns: EmailJob record with initial status
- `GET /api/arda/shop-access/email/email-job/{jobId}`: Get email job status
  - Proxies to Backend
  - Returns: status, diagnostic info, retry info
- `PUT /api/arda/shop-access/email/email-job/{jobId}/resend`: Re-send after bounce/failure
  - Same shape as create, creates new EmailJob
- `POST /api/arda/shop-access/email/email-job/query`: Query email jobs
  - Proxies to Backend
Note: Detailed BFF route definitions and SPA component specifications are deferred to the definition of the Procurement use cases that exercise the email capability. The email general behaviors are specified at the backend Endpoint API level. The BFF and SPA sections here provide context for how the email module will be consumed.
BFF Responsibilities (email-specific)
- No email-specific business logic: pure proxy with JWT validation and header forwarding
- Passes the user’s email (from JWT claims) to Backend so Backend can resolve Reply-To
- Does not interact with ESP or generate PDFs
- The `email-configuration` endpoint is not exposed through the BFF in v1 (CS-only, accessed directly)
- The `postmark-events` webhook endpoint is called by Postmark directly, not through the BFF
Backend
ShopAccess/Email Module
Organized into three layers, three endpoints, and a small DataAuthority. Layering is the binding architectural decision; see DQ-201:
- L1: Protocol proxies. Stateless. One per external API surface; one credential strategy each. `runCatching`-bodied methods returning `Result<T>`.
- L2: Capability composers. Stateless. Choreograph L1 calls into capability operations (`provision`, `decommission`, `verifyDns`, `sendOne`). Map external errors to capability errors. No DB access. Hide nothing: external IDs and credentials flow through to L3 (DQ-201.c).
- L3: Application services. Hold all DB access via DataAuthorities. Hold the encryption key (DQ-202, DQ-203). Encrypt before INSERT; decrypt on demand for sending. Spawn bounded DNS-verification polling rounds via per-pod `activePolling` map (DQ-207).
Three triggers feed the same bounded-polling primitive in L3; see Scenarios 1b.1, 1b.2, 1b.3 for the trigger-specific flows.
Endpoint: email-job
Provides the interface for creating, querying, and managing email sending jobs. See Scenario 2.
Routes
| Method | Path | Description |
|---|---|---|
| POST | /v1/shop-access/email/email-job | Create and send an email job |
| GET | /v1/shop-access/email/email-job/<jobId> | Get job status and summary (no body/attachments) |
| GET | /v1/shop-access/email/email-job/<jobId>/details | Get full job details including body and attachments |
| PUT | /v1/shop-access/email/email-job/<jobId>/cancel | Cancel a pending email job (action verb; job row is transitioned, not deleted) |
| PUT | /v1/shop-access/email/email-job/<jobId>/resend | Re-send a bounced or failed job |
| POST | /v1/shop-access/email/email-job/query | Query email jobs by criteria |
| POST | /v1/shop-access/email/email-job/<jobId>/history | Get bitemporal history of a job |
Request/Response Sketches
`POST /email-job` (create):

```json
// Request
{
  "emailConfigurationId": "<UUID>",
  "to": "supplier@example.com",
  "cc": "manager@example.com",
  "replyTo": "procurement@tenant.com",
  "subject": "Order ORD-00123 from TenantName",
  "htmlBody": "<html>...</html>",
  "textBody": "Plain text fallback (optional)",
  "attachments": [
    {
      "name": "PO-ORD-00123.pdf",
      "contentType": "application/pdf",
      "content": "<base64>"  // or "url": "https://..."
    }
  ]
}

// Response 201 Created
{
  "jobId": "<UUID>",
  "status": "NEW",
  "messageId": null,
  "diagnosticMessage": null,
  "createdAt": "2026-04-21T10:30:00Z"
}
```

`GET /email-job/<jobId>` (status):

```json
// Response 200 OK
{
  "jobId": "<UUID>",
  "status": "QUEUED",
  "messageId": "b7bc2f4a-...",
  "to": "supplier@example.com",
  "subject": "Order ORD-00123 from TenantName",
  "diagnosticMessage": null,
  "createdAt": "2026-04-21T10:30:00Z",
  "updatedAt": "2026-04-21T10:30:01Z"
}
```

`PUT /email-job/<jobId>/resend`:

```json
// Request (optional overrides)
{
  "to": "new-supplier@example.com",
  "cc": null
}

// Response 201 Created (new job)
{
  "jobId": "<new UUID>",
  "status": "NEW",
  "originalJobId": "<UUID>"
}
```

Endpoint: email-configuration
DataAuthority endpoint for managing tenant email configurations. See Scenario 1.
Routes
| Method | Path | Description |
|---|---|---|
| POST | /v1/shop-access/email/email-configuration | Provision a new tenant email configuration |
| GET | /v1/shop-access/email/email-configuration/<configId> | Get configuration and current status |
| PUT | /v1/shop-access/email/email-configuration/<configId>/retry-verification | Kick off a fresh bounded DNS verification polling round; allowed from PENDING_VERIFICATION or VERIFICATION_FAILED; refreshes verification_started_at (DQ-207.b) |
| PUT | /v1/shop-access/email/email-configuration/<configId>/lock | Lock configuration (only from UNLOCKED) |
| PUT | /v1/shop-access/email/email-configuration/<configId>/unlock | Unlock configuration (only from LOCKED) |
| DELETE | /v1/shop-access/email/email-configuration/<configId> | Delete configuration (allowed from any non-PROVISIONING terminal-or-stable state: PENDING_VERIFICATION, UNLOCKED, LOCKED, VERIFICATION_FAILED, PROVISIONING_FAILED); runs best-effort decommission of external resources (DQ-205.d) |
| POST | /v1/shop-access/email/email-configuration/query | Query configurations |
Request/Response Sketches
`POST /email-configuration` (provision):

```json
// Request
{
  "tenantEId": "<UUID>",
  "tenantName": "Acme Manufacturing",
  "tenantSlug": "acme",
  "configSlug": "orders"
}

// Response 201 Created
{
  "configId": "<UUID>",
  "status": "PENDING_VERIFICATION",
  "tenantSlug": "acme",
  "configSlug": "orders",
  "sendingDomain": "orders.acme.prod.ardamails.com",
  "dkimVerified": false,
  "returnPathVerified": false,
  "dmarcPolicy": "none",
  "provisionedAt": "2026-04-21T10:30:00Z"
}
```

`GET /email-configuration/<configId>`:

```json
// Response 200 OK
{
  "configId": "<UUID>",
  "status": "UNLOCKED",
  "tenantSlug": "acme",
  "configSlug": "orders",
  "sendingDomain": "orders.acme.prod.ardamails.com",
  "postmarkServerId": 12345,
  "dkimVerified": true,
  "returnPathVerified": true,
  "dmarcPolicy": "none",
  "diagnosticMessage": null,
  "provisionedAt": "2026-04-21T10:30:00Z"
}
```

`PUT /email-configuration/<configId>/retry-verification`:

```
// Response 200 OK
// - From PENDING_VERIFICATION: refreshes verification_started_at; kicks off a fresh bounded polling round
// - From VERIFICATION_FAILED: transitions to PENDING_VERIFICATION; kicks off bounded polling
// Response 409 Conflict (if status is PROVISIONING, PROVISIONING_FAILED, UNLOCKED, or LOCKED)
```

Endpoint: Message Events (webhook)
Receives delivery status events from Postmark. Authenticated via the `Authorization: Bearer` header configured on each Postmark server’s webhook. See Scenario 3 and Postmark Service Design.
Routes
| Method | Path | Description |
|---|---|---|
| POST | /v1/shop-access/email/postmark-events | Receive Postmark delivery event |
Request/Response
```json
// Request (from Postmark, varies by RecordType)
// Headers: Authorization: Bearer <ARDA_API_KEY>
{
  "RecordType": "Delivery",
  "MessageID": "b7bc2f4a-...",
  "Recipient": "supplier@example.com",
  "DeliveredAt": "2026-04-21T10:30:05Z"
}

// Response: 200 OK (processed)
// Response: 403 Forbidden (auth failure, stops retries)
// Response: 500 Internal Server Error (transient, triggers retry)
```

Service: emailJob
Provider-agnostic service for creating and managing email sending jobs. Delegates to the `emailConfiguration` service for tenant configuration and credential access.
Interface
```kotlin
interface EmailJobService {

    /** Create and send an email job. See Scenario 2. */
    suspend fun createAndSend(request: CreateEmailJobRequest): Result<EmailJob>

    /** Get job by ID. */
    suspend fun getJob(jobId: UUID): Result<EmailJob>

    /** Get job with full details (body, attachments). */
    suspend fun getJobDetails(jobId: UUID): Result<EmailJobDetails>

    /** Cancel a pending job. */
    suspend fun cancelJob(jobId: UUID): Result<EmailJob>

    /** Re-send a bounced or failed job, optionally with address overrides. */
    suspend fun resendJob(jobId: UUID, overrides: ResendOverrides?): Result<EmailJob>

    /** Query jobs by criteria. */
    suspend fun queryJobs(query: EmailJobQuery): Result<PageResult<EmailJob>>

    /** Handle a delivery event from the ESP. See Scenario 3. */
    suspend fun handleDeliveryEvent(event: PostmarkEvent): Result<Unit>
}
```

Data Types
```kotlin
data class CreateEmailJobRequest(
    val emailConfigurationId: UUID,
    val to: String,
    val cc: String? = null,
    val replyTo: String,
    val subject: String,
    val htmlBody: String,
    val textBody: String? = null,
    val attachments: List<EmailAttachment> = emptyList()
)

data class EmailAttachment(
    val name: String,
    val contentType: String,
    val content: String? = null,  // base64-encoded blob
    val url: String? = null       // or URL to fetch
)

data class ResendOverrides(
    val to: String? = null,
    val cc: String? = null
)
```

Event Notifications
The `emailJob` service receives delivery events from the Message Events endpoint:

```kotlin
/** Normalized event from Postmark webhook payload. */
data class PostmarkEvent(
    val recordType: PostmarkEventType,
    val messageId: String,
    val recipient: String,
    val deliveredAt: Instant? = null,
    val bouncedAt: Instant? = null,
    val bounceType: String? = null,
    val bounceDescription: String? = null,
    val complaintType: String? = null
)

enum class PostmarkEventType {
    DELIVERY,
    BOUNCE,
    SPAM_COMPLAINT
}
```

Domain: EmailJob
A bitemporal entity representing an email sending job. Persisted in the `email_job` table.
NEW
: The email job has been acknowledged by the ShopAccess/Email module but it has not yet been transmitted to the ESP.
QUEUED
: The email job has been sent to the ESP but the ESP has not confirmed it has sent it through to the email network (SMTP or equivalent).
SENT
: The ESP has confirmed that it has sent the email through to the email network (SMTP or equivalent).
DELIVERED
: The ESP has confirmed that the email has been delivered, that is, accepted by the recipient’s mail server. Placement in the recipient’s inbox is not observable from the ESP.
BOUNCED
: The ESP has confirmed that the email has been bounced.
COMPLAINED
: The ESP has confirmed that the email has been marked as spam.
FAILED
: The email job has failed due to unavailability of the ESP or internal ESP errors.
CANCELLED
: The email job was cancelled by the user before it was sent to the ESP. Only jobs in NEW status can be cancelled.
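The status rules above can be captured in a small enum. The following is a minimal sketch, not the module's actual code: the helper names (`isCancellable`, `isResendable`, `isAdverse`, `statusFor`) are hypothetical, and the event-to-status mapping is an assumption inferred from the status definitions and the `PostmarkEventType` values.

```kotlin
// Hypothetical sketch of the EmailJob status rules described in this section.
enum class EmailJobStatus {
    NEW, QUEUED, SENT, DELIVERED, BOUNCED, COMPLAINED, FAILED, CANCELLED;

    /** Only jobs that have not been transmitted to the ESP can be cancelled. */
    fun isCancellable(): Boolean = this == NEW

    /** Re-send is offered on bounced or failed jobs; it creates a new job. */
    fun isResendable(): Boolean = this == BOUNCED || this == FAILED

    /** Adverse statuses carry a diagnostic message and CS reference ID in the UI. */
    fun isAdverse(): Boolean = this == BOUNCED || this == COMPLAINED || this == FAILED
}

// Redeclared here only to keep the sketch self-contained.
enum class PostmarkEventType { DELIVERY, BOUNCE, SPAM_COMPLAINT }

/** Assumed mapping from a normalized webhook event to the resulting job status. */
fun statusFor(event: PostmarkEventType): EmailJobStatus = when (event) {
    PostmarkEventType.DELIVERY -> EmailJobStatus.DELIVERED
    PostmarkEventType.BOUNCE -> EmailJobStatus.BOUNCED
    PostmarkEventType.SPAM_COMPLAINT -> EmailJobStatus.COMPLAINED
}
```

Keeping these predicates on the enum would let the cancel and resend routes reject invalid requests with one check each, but where the checks actually live is a design choice this document does not fix.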
Persistence
A DataAuthority containing EmailJob entities.
Service: emailConfiguration
Manages tenant email configurations, including provisioning, DNS verification, and secure credential storage. The only service that handles the encryption key and Postmark account token.
Interface
```kotlin
interface EmailConfigurationService {

    /**
     * Provision a new tenant email configuration. See Scenario 1.
     * Returns immediately with PENDING_VERIFICATION status.
     * DNS verification proceeds asynchronously.
     */
    suspend fun provision(request: ProvisionRequest): Result<EmailConfiguration>

    /** Get a configuration by ID (any status). */
    suspend fun getConfiguration(configId: UUID): Result<EmailConfiguration>

    /**
     * Get an UNLOCKED configuration with decrypted server token.
     * Returns error if configuration is not UNLOCKED.
     * Called by emailJob service (internal, not an HTTP endpoint). See Scenario 2.
     */
    suspend fun getUnlockedConfiguration(
        tenantId: UUID,
        configurationId: UUID? = null
    ): Result<UnlockedEmailConfiguration>

    /**
     * Kick off a fresh bounded DNS verification polling round.
     * Allowed from PENDING_VERIFICATION (refreshes verificationStartedAt) or
     * VERIFICATION_FAILED (transitions to PENDING_VERIFICATION).
     * Returns 409-equivalent error from any other status. See DQ-207.
     */
    suspend fun retryVerification(configId: UUID): Result<EmailConfiguration>

    /** Lock a configuration (UNLOCKED -> LOCKED). Prevents email sending. */
    suspend fun lock(configId: UUID): Result<EmailConfiguration>

    /** Unlock a configuration (LOCKED -> UNLOCKED). Re-enables email sending. */
    suspend fun unlock(configId: UUID): Result<EmailConfiguration>

    /** Delete a configuration (from UNLOCKED or VERIFICATION_FAILED). */
    suspend fun delete(configId: UUID): Result<Unit>

    /** Query configurations by criteria. */
    suspend fun queryConfigurations(query: EmailConfigQuery): Result<PageResult<EmailConfiguration>>
}
```

Data Types
```kotlin
data class ProvisionRequest(
    val tenantEId: UUID,
    val tenantName: String? = null,
    val tenantSlug: String? = null,  // if null, derived from tenantEId + tenantName
    val configSlug: String
)

/** Returned by getUnlockedConfiguration -- includes decrypted server token. */
data class UnlockedEmailConfiguration(
    val configId: UUID,
    val sendingDomain: String,
    val serverToken: String,  // decrypted, in-memory only
    val postmarkServerId: Int,
    val postmarkDomainId: Int
)

/**
 * Full configuration entity (without decrypted token).
 * External IDs are nullable because the persist-first lifecycle (DQ-205) inserts the row
 * in PROVISIONING status before any external mutation; IDs are populated as resources are created.
 */
data class EmailConfiguration(
    val configId: UUID,
    val status: EmailConfigurationStatus,
    val tenantSlug: String,
    val configSlug: String,
    val sendingDomain: String,
    val postmarkServerId: Int? = null,   // null while in PROVISIONING; nullable on PROVISIONING_FAILED
    val postmarkDomainId: Int? = null,   // null while in PROVISIONING; nullable on PROVISIONING_FAILED
    val postmarkWebhookId: Int? = null,  // null until createWebhook succeeds; nullable on PROVISIONING_FAILED
    val dkimVerified: Boolean = false,
    val returnPathVerified: Boolean = false,
    val dmarcPolicy: String = "none",
    val diagnosticMessage: String? = null,
    val provisioningStartedAt: Instant,  // set at initial INSERT, always populated
    val provisionedAt: Instant? = null,  // set on transition from PROVISIONING to PENDING_VERIFICATION
    // Set on entry to PENDING_VERIFICATION; refreshed by /retry-verification.
    // Used solely for the operator-alert query (DQ-207.j); not for any automatic transition.
    val verificationStartedAt: Instant? = null
)

enum class EmailConfigurationStatus {
    PROVISIONING,
    PENDING_VERIFICATION,
    UNLOCKED,
    VERIFICATION_FAILED,
    LOCKED,
    PROVISIONING_FAILED
}
```

EmailConfiguration Lifecycle
The lifecycle uses a persist-first model: the row is inserted in PROVISIONING status before any external mutation, so the database always has an anchor for the in-flight operation. See DQ-205.
DNS verification is trigger-driven rather than continuously polled: bounded polling rounds (5 attempts × 60 s by default) are kicked off by three events — successful provisioning, manual /retry-verification, or a send attempt against a PENDING_VERIFICATION row. A row that exhausts a bounded round without verifying stays in PENDING_VERIFICATION (no automatic transition to VERIFICATION_FAILED) and recovers when the next trigger fires. See DQ-207.
PROVISIONING
: Entry state. Row inserted before any external mutation. No manual transitions out of this state — only the emailConfiguration service moves it (to PENDING_VERIFICATION on success or PROVISIONING_FAILED on failure). A row stuck in this state for more than ~5 minutes is presumed orphaned (e.g., pod crashed mid-flight) and requires operator triage.
PROVISIONING_FAILED
: Terminal state for operations that failed during external resource creation. The row carries whichever external IDs were captured before the failure, plus a diagnosticMessage describing the failure point. Operator deletes via the standard DELETE endpoint, which runs best-effort decommission of any captured external resources.
PENDING_VERIFICATION
: Provisioning succeeded. Postmark server, domain, and webhook are created; DNS records are published. DNS verification has not yet completed. Email sending is not allowed in this state. The row may have an active bounded polling task on some pod, or no active task at all — this is operationally invisible at the row level. The next trigger (provision-success on a related row, manual retry, or a send attempt) restarts a bounded polling round.
UNLOCKED
: DNS verification succeeded. The configuration is ready for email sending. Only configurations in this status are returned by getUnlockedConfiguration().
VERIFICATION_FAILED
: Reserved state. Not entered automatically in v1 (DQ-207 does not auto-transition out of PENDING_VERIFICATION). Reserved for v2 operator-marked-failed and async-reconciler scenarios. /retry-verification accepts it as a source for forward compatibility.
LOCKED
: Administratively disabled. CS or admin action. Email sending is not allowed. Can be re-enabled.
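The lifecycle above can be read as a small transition table. The sketch below is illustrative only: the member functions are hypothetical names, and the transition set is inferred from the state descriptions in this section (deletion removes the row and is therefore not modelled as a transition).

```kotlin
// Hypothetical sketch of the EmailConfiguration lifecycle as described above.
enum class EmailConfigurationStatus {
    PROVISIONING, PENDING_VERIFICATION, UNLOCKED, VERIFICATION_FAILED, LOCKED, PROVISIONING_FAILED;

    /** States reachable from this one, per the lifecycle descriptions. */
    fun nextStates(): Set<EmailConfigurationStatus> = when (this) {
        PROVISIONING -> setOf(PENDING_VERIFICATION, PROVISIONING_FAILED) // service-driven only
        PENDING_VERIFICATION -> setOf(UNLOCKED)                          // on DNS verification success
        VERIFICATION_FAILED -> setOf(PENDING_VERIFICATION)               // via /retry-verification
        UNLOCKED -> setOf(LOCKED)                                        // via /lock
        LOCKED -> setOf(UNLOCKED)                                        // via /unlock
        PROVISIONING_FAILED -> emptySet()                                // terminal; operator deletes
    }

    fun canTransitionTo(target: EmailConfigurationStatus): Boolean = target in nextStates()

    /** Email sending is allowed only when the configuration is UNLOCKED. */
    fun allowsSending(): Boolean = this == UNLOCKED

    /** /retry-verification is accepted from these two states; others get a 409. */
    fun allowsRetryVerification(): Boolean =
        this == PENDING_VERIFICATION || this == VERIFICATION_FAILED
}
```

Note that a retry from PENDING_VERIFICATION refreshes `verificationStartedAt` without changing the status, which is why it appears in `allowsRetryVerification()` but not as a self-transition in `nextStates()`.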
Secret Storage
Per-tenant Postmark server tokens are stored in the database, encrypted with a partition-wide encryption key. This avoids per-tenant Secrets Manager writes and runtime Secrets Manager calls.
- Encryption key: A single symmetric key per partition, created by CDK in Secrets Manager and delivered to the pod via the External Secrets Operator (ESO) mechanism (see infrastructure.md). Available to the `emailConfiguration` service as a HOCON config property (`email.encryption.key`) at startup.
- Encrypt on write: During provisioning, the `emailConfiguration` service encrypts the server token returned by Postmark before persisting it in the `serverTokenEncrypted` column.
- Decrypt on read: When the `emailJob` service calls `getUnlockedConfiguration()`, the `emailConfiguration` service decrypts the server token and returns it as part of the `UnlockedEmailConfiguration` response. The `emailJob` service never handles the encryption key directly.
- Key rotation: Rotating the encryption key is a single operation: read all encrypted tokens, re-encrypt with the new key, update in a transaction. No per-tenant Secrets Manager updates.
The Postmark account-level API token (used for provisioning, not per-tenant) is also delivered via ESO as a HOCON config property (email.postmark.accountToken). Only the emailConfiguration service accesses it.
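As an illustration of the encrypt-on-write and decrypt-on-read steps, here is a minimal AES-256-GCM sketch using the JDK's `javax.crypto` API. The function names and the storage layout (a random 12-byte IV prepended to the ciphertext, then Base64-encoded) are assumptions made for this example; DQ-202 remains the authority on the actual format.

```kotlin
import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

private const val IV_BYTES = 12   // assumed IV size (the common choice for GCM)
private const val TAG_BITS = 128  // assumed authentication tag length

/** Encrypts a server token; output layout (assumed) is Base64(IV || ciphertext+tag). */
fun encryptToken(plainToken: String, key: ByteArray): String {
    require(key.size == 32) { "AES-256 requires a 32-byte key" }
    // A fresh random IV per encryption; GCM must never reuse an IV with the same key.
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, SecretKeySpec(key, "AES"), GCMParameterSpec(TAG_BITS, iv))
    val cipherText = cipher.doFinal(plainToken.toByteArray(Charsets.UTF_8))
    return Base64.getEncoder().encodeToString(iv + cipherText)
}

/** Reverses encryptToken; throws if the key is wrong or the data was tampered with. */
fun decryptToken(stored: String, key: ByteArray): String {
    require(key.size == 32) { "AES-256 requires a 32-byte key" }
    val raw = Base64.getDecoder().decode(stored)
    require(raw.size > IV_BYTES) { "stored value is too short to contain an IV" }
    val iv = raw.copyOfRange(0, IV_BYTES)
    val cipherText = raw.copyOfRange(IV_BYTES, raw.size)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, SecretKeySpec(key, "AES"), GCMParameterSpec(TAG_BITS, iv))
    return String(cipher.doFinal(cipherText), Charsets.UTF_8)
}
```

Under this layout, key rotation amounts to calling `decryptToken` with the old key and `encryptToken` with the new one for every row inside a single transaction.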
Persistence
A DataAuthority containing EmailConfiguration entities.
Tenant Provisioning
Synchronous steps (interactive, returns immediately):
1. Validate input locally: slugs DNS-safe, not reserved, FQDN within DNS limits.
2. Open DB transaction:
   - Check no row exists for `(tenantId, configSlug)` or `sendingDomain`.
   - Pre-flight external state checks (`tenantProvisioning.checkAvailability(spec)`):
     - Postmark Account: no server with the planned name exists.
     - Postmark Account: no domain with the planned FQDN exists.
     - Route53: no records exist at the three target names.
   - On any conflict: ROLLBACK and return 409 with diagnostic identifying the orphan/collision.
   - On clear: INSERT row with `status=PROVISIONING`, `provisioning_started_at=now()`, all external IDs null. COMMIT.
3. Run external mutations in this order (Postmark first, Route53 second, per DQ-205.k):
   - Postmark Account: `createServer` → `serverId`, `serverToken`.
   - Postmark Account: `createDomain` → `domainId`, DKIM/return-path values.
   - Postmark Server: `createWebhook` → `webhookId`.
   - Route53: UPSERT DKIM TXT record.
   - Route53: UPSERT Return-Path CNAME record.
   - Route53: UPSERT DMARC TXT record.

   (Record writes use `ChangeResourceRecordSets` action=UPSERT for idempotency; see DQ-205.m.)
4. Encrypt the server token with the partition-wide encryption key (AES-256-GCM, see DQ-202).
5. UPDATE row with all external IDs + encrypted token + status `PENDING_VERIFICATION`. (Retry with bounded backoff; if persistent failure, leave row in PROVISIONING with diagnostic naming the orphans for manual reconciliation.)
6. Return to client.
Failure handling: if any step in (3) fails, L2 returns a structured Failure(PartialProgress) with whichever external IDs were captured. L3 updates the row with the partial IDs + status PROVISIONING_FAILED + diagnostic. No automatic external cleanup; operator triages via the DELETE endpoint, which runs best-effort decommission.
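The local input validation in step 1 could look roughly like the sketch below. It is an assumption-laden illustration: the function names are hypothetical, lowercase-only slugs are assumed, the sending-domain shape `configSlug.tenantSlug.partition.rootDomain` is inferred from the `orders.acme.prod.ardamails.com` example, and the reserved list is applied to the tenant slug only, following the `email.tenantConfig.reservedSlugs` description. The 63-character label and 253-character name limits are the standard DNS limits.

```kotlin
// One DNS label: 1 to 63 characters, alphanumeric at both ends, hyphens allowed inside.
private val DNS_LABEL = Regex("^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")
const val MAX_FQDN_LENGTH = 253

fun isDnsSafeLabel(slug: String): Boolean = DNS_LABEL.matches(slug)

/** Assumed sending-domain shape, inferred from the example responses. */
fun sendingDomain(configSlug: String, tenantSlug: String, partition: String, rootDomain: String): String =
    "$configSlug.$tenantSlug.$partition.$rootDomain"

/** Returns a list of problems; an empty list means the input passed local validation. */
fun validateSlugs(
    tenantSlug: String,
    configSlug: String,
    partition: String,
    rootDomain: String,
    reservedSlugs: Set<String>
): List<String> {
    val problems = mutableListOf<String>()
    if (!isDnsSafeLabel(tenantSlug)) problems += "tenantSlug is not a DNS-safe label"
    if (!isDnsSafeLabel(configSlug)) problems += "configSlug is not a DNS-safe label"
    if (tenantSlug in reservedSlugs) problems += "tenantSlug is reserved"
    val fqdn = sendingDomain(configSlug, tenantSlug, partition, rootDomain)
    if (fqdn.length > MAX_FQDN_LENGTH) problems += "sending domain exceeds $MAX_FQDN_LENGTH characters"
    return problems
}
```

Collecting every problem instead of failing on the first lets the endpoint return one diagnostic that covers all input errors.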
Asynchronous DNS verification (trigger-driven, see DQ-207):
- Successful provisioning kicks off a bounded polling round on the pod that handled the request: 5 attempts × 60 seconds by default (HOCON-configurable).
- Each attempt calls `tenantProvisioning.verifyDns(postmarkDomainId)`, which delegates to `postmarkAccountProxy.verifyDkim` and `verifyReturnPath`.
- On verification success: update row to `UNLOCKED` (idempotent UPDATE guarded by `WHERE status = 'PENDING_VERIFICATION'`).
- On bounded round exhaustion: row stays in `PENDING_VERIFICATION`. Recovery is trigger-driven: a subsequent send attempt (`getUnlockedConfiguration`) or a manual `PUT .../retry-verification` will kick off a fresh bounded round. There is no automatic transition to `VERIFICATION_FAILED` in v1.
- Pod restart during a polling round drops the in-flight task (a known coverage gap, mitigated by an operator alert on stale `PENDING_VERIFICATION` rows; see Observability).
See Scenario 1, Scenario 1b, Postmark Service Design, and DQ-205 / DQ-207 for the detailed provisioning flow, API endpoints, and failure-recovery design.
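The bounded polling round can be sketched as follows. This is a simplified, blocking illustration rather than the module's implementation: `BoundedVerifier`, `RoundResult`, and the injected `sleep` function are hypothetical, a concurrent set stands in for the per-pod `activePolling` map, and the real service would presumably run the round in a coroutine or background task. Whether the first attempt runs immediately or after one interval is also an assumption.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

enum class RoundResult { VERIFIED, EXHAUSTED, ALREADY_ACTIVE }

class BoundedVerifier(
    private val attempts: Int = 5,                // default taken from the text above
    private val intervalMillis: Long = 60_000,    // 60 s between attempts
    private val sleep: (Long) -> Unit = { Thread.sleep(it) }  // injectable for tests
) {
    // Per-pod record of configurations with a polling round in flight.
    private val activePolling: MutableSet<UUID> = ConcurrentHashMap.newKeySet<UUID>()

    /** Runs one bounded round; verifyDns returns true once DNS is verified. */
    fun runRound(configId: UUID, verifyDns: () -> Boolean): RoundResult {
        // A second trigger for the same configuration does not start a duplicate round.
        if (!activePolling.add(configId)) return RoundResult.ALREADY_ACTIVE
        try {
            repeat(attempts) { attempt ->
                if (verifyDns()) return RoundResult.VERIFIED
                // No sleep after the final attempt.
                if (attempt < attempts - 1) sleep(intervalMillis)
            }
            // The row stays in PENDING_VERIFICATION; the next trigger starts a new round.
            return RoundResult.EXHAUSTED
        } finally {
            activePolling.remove(configId)
        }
    }
}
```

On `VERIFIED` the caller would issue the guarded UPDATE to `UNLOCKED`; on `EXHAUSTED` it does nothing and relies on the next trigger or the stale-row alert.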
Tenant Decommission
DELETE on a configuration in any terminal state (UNLOCKED, VERIFICATION_FAILED, PROVISIONING_FAILED) runs best-effort decommission before removing the DB row:
- Read the row’s captured external IDs (some may be null in the `PROVISIONING_FAILED` case).
- Delete external resources in this order (Route53 first, Postmark second; the inverse of provisioning, per DQ-205.k):
  - Route53: delete DMARC, Return-Path, DKIM records (any that exist).
  - Postmark Account: delete domain.
  - Postmark Account: delete server. (Cascade-deletes the webhook.)
- DELETE the DB row unconditionally, regardless of decommission outcomes.
- Return a result describing which deletions succeeded and which failed; failed deletions are surfaced to the caller for manual cleanup.
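The best-effort behaviour described above (attempt every deletion, never stop on a failure, report both outcomes) can be sketched as a small runner. `DecommissionReport` and `decommissionBestEffort` are hypothetical names; the real L2 composer would call the Route53 and Postmark proxies where this example takes plain lambdas.

```kotlin
/** Outcome of a best-effort decommission: which steps succeeded and why others failed. */
data class DecommissionReport(
    val succeeded: List<String>,
    val failed: Map<String, String>
)

/** Runs every step in order; a failing step is recorded and does not stop later steps. */
fun decommissionBestEffort(steps: List<Pair<String, () -> Unit>>): DecommissionReport {
    val succeeded = mutableListOf<String>()
    val failed = linkedMapOf<String, String>()
    for ((name, action) in steps) {
        try {
            action()
            succeeded.add(name)
        } catch (e: Exception) {
            failed[name] = e.message ?: e.toString()
        }
    }
    return DecommissionReport(succeeded, failed)
}
```

A caller would build the step list only from the external IDs that are non-null on the row, in the order given above (Route53 records, then the Postmark domain, then the Postmark server), and delete the DB row afterwards regardless of the report's contents.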
Local Development Stub
- Stub implementation of `EmailJobService` interface
- Logs send intent to console / in-memory store
- Synthesizes fake delivery events for testing the full lifecycle
- Zero external dependency
Module Configuration
The ShopAccess/Email module runs within the `operations` component. The following configuration is required at the component level for the module to function.
Helm Values
New entry in the component’s `values.yaml` under `apis:` to register the module’s routes with the Ingress:

```yaml
apis:
  system.shopAccess.email:
    name: "shop-access/email"
    version: "v1"
```

This generates the Ingress path `/v1/shop-access/email/*`, routing all email module traffic (including the Postmark webhook endpoint) through the component’s Kubernetes Service.

Note: The Postmark webhook endpoint (`/v1/shop-access/email/postmark-events`) currently has no external authorizer at the API Gateway level. Authentication is handled by the endpoint itself via Bearer token validation. In the future, a Bearer Token authorizer may be added at the API Gateway for this route.
ESO ExternalSecret Entries
New entries in the component’s Helm `secrets.yaml` template to sync email secrets from Secrets Manager to Kubernetes:
| Secret Name Pattern | HOCON Property | Used By |
|---|---|---|
| `<infrastructure>-<partition>-I-EmailPostmarkAccountToken` | `email.postmark.accountToken` | `emailConfiguration` service (provisioning) |
| `<infrastructure>-<partition>-I-EmailEncryptionKey` | `email.encryption.key` | `emailConfiguration` service (encrypt/decrypt server tokens) |
These are templated into secrets.properties alongside existing database credentials, following the same ESO pattern. See infrastructure.md for the delivery mechanism.
HOCON Configuration
Module-specific configuration in `application.conf` (or `shopaccess/email/application.conf`):
| Key | Type | Description |
|---|---|---|
| `email.postmark.accountToken` | String | From ESO. Postmark account-level API token for provisioning. |
| `email.postmark.baseUrl` | String | `https://api.postmarkapp.com`. Overridable for testing. |
| `email.encryption.key` | String | From ESO. Partition-wide symmetric encryption key for server tokens. |
| `email.sending.rootDomain` | String | The resolved `{mail-root-domain}` (e.g., `ardamails.com`). |
| `email.sending.partition` | String | Current partition (`prod`, `demo`, `dev`, `stage`). |
| `email.sending.senderFunction` | String | Local-part for From address (e.g., `procurement`). |
| `email.dns.hostedZoneId` | String | Route53 Hosted Zone ID for the current partition’s mail zone. |
| `email.dns.provisioningRoleArn` | String | ARN of the Route53 DNS provisioning role to assume via STS. |
| `email.tenantConfig.reservedSlugs` | List | Slugs blocked from use as tenant identifiers. |
Database
The module creates its own database using the idempotent init container pattern (same as other modules). No infrastructure-level database provisioning is needed beyond the existing Aurora cluster.
Flyway migration location: shopaccess/email/database/migrations/
ESP (Postmark)
See Postmark Service Design for the full Postmark API surface (provisioning, sending, delivery events, query/inspection) and infrastructure.md for DNS zone structure and account mapping.
Cross-Cutting Concerns
DNS, Secrets, and IAM
See infrastructure.md for the full DNS zone structure (root zone in platformRoot, partition zones in Alpha001/Alpha002), Secrets Manager paths, and IAM role scoping.
Observability
- Per-tenant send counts, delivery rates, bounce rates (from EmailJob data)
- Dispatcher health: queue depth, retry rates, failure rates
- ESP event processing latency
- Existing Arda observability patterns (CloudWatch metrics, dashboards)
Operator alerts (DQ-207)
Because DNS verification is trigger-driven (no continuous loop), a row that exhausts a bounded polling round without verifying is invisible at the row level until the next trigger fires. The system relies on an operational alert to surface “configurations stuck pending”:
| Alert | Query | Trigger | Runbook |
|---|---|---|---|
| `email_configuration_pending_stale` | `SELECT count(*) FROM email_configuration WHERE status = 'PENDING_VERIFICATION' AND verification_started_at < now() - interval '15 minutes'` | result > 0 | Page CS / on-call. Triage steps: (1) inspect `diagnostic_message` if present; (2) verify Postmark domain status via `GET /domains/{id}` to confirm DNS is or is not propagated; (3) hit `PUT /email-configuration/{id}/retry-verification` to restart bounded polling, or `DELETE /email-configuration/{id}` if the configuration is known-broken; (4) confirm the alert clears within ~15 min. |
| `email_configuration_provisioning_stuck` | `SELECT count(*) FROM email_configuration WHERE status = 'PROVISIONING' AND provisioning_started_at < now() - interval '5 minutes'` | result > 0 | Page on-call. Provisioning should complete in seconds; anything > 5 min indicates a pod crash mid-flight (see DQ-205.f). Triage steps: identify orphan external resources (server name pattern, sending-domain FQDN); manually transition the row to `PROVISIONING_FAILED` with diagnostic; run DELETE to invoke best-effort decommission. |
These alerts replace the implicit observability provided by a continuous verification loop.
Functional/Runtime Mapping
| Functional | Runtime Resources | Notes |
|---|---|---|
| Frontend | | |
| SPA | Browser, Amplify | Runs in the browser, served by Amplify. |
| BFF | Amplify | Next.js server running on Amplify. |
| ShopAccess/Email Module | | |
| Endpoints | API Gateway, EKS Ingress & Service | Routes in API Gateway routed through VPC Link, NLB, and EKS Ingress. |
| Services | EKS Deployment, EKS Pod | Specific deployment to be defined. Initially most likely `operations`. |
| Persistence | Aurora RDS (Postgres) | Currently a database per module; database server provisioned at the partition layer. |
| Postmark Proxy | EKS Pod, ESP (Postmark Account) | Implemented as a Ktor Client within the EKS Pod that accesses the ESP via an HTTP API. |
Copyright: © Arda Systems 2025-2026, All rights reserved