Email Integration -- Functional Design

First-pass identification of the functionality needed in each subsystem to implement the email features defined in the product feature documents product/features/general-behaviors/email-communications.md and product/features/procurement/email-orders.md (not yet authored).

Arda is a three-tier platform. Requests flow from the SPA through the BFF to the Backend; the Backend is the only layer that communicates with external services (ESP, AWS).

PlantUML diagram

The SPA is the user-facing web interface. It renders UI components, manages client-side state (Redux), and collects user input. It does not call the Backend directly; all API traffic routes through the BFF. Authentication state (Cognito JWT) is managed client-side and passed to the BFF on every request.

The BFF is the security and orchestration layer between the SPA and the Backend. It validates the user’s Cognito JWT, extracts user context (userId, tenantId, email, role), and forwards requests to the Backend with system credentials (ARDA_API_KEY) and context headers (X-Tenant-Id, X-Author). The BFF is the trust boundary: the Backend trusts the headers the BFF provides.

Backend (Kotlin / Ktor — Operations Service)

Core business logic, persistence, and external service integration. Organized as a modular monolith where each functional area (procurement/orders, reference/items, resources/kanban) is a module with its own routes, services, persistence, and configuration. Follows the Data Authority pattern with bitemporal persistence (Exposed ORM, PostgreSQL). All domain validation, lifecycle management, and transactional guarantees live here.

Postmark is the external Email Service Provider (ESP). It accepts messages via REST API, handles MTA routing, DKIM signing, and deliverability, and reports delivery status through incoming REST requests (webhooks) to the Backend. Arda does not run its own mail infrastructure; Postmark is the mail delivery network.


  • Side panel component for editing email content before send
    • For email orders (orderMethod=EMAIL): renders full order data as editable HTML
    • For purchase orders (orderMethod=PURCHASE_ORDER): renders brief introduction text, editable
  • “Send via System” action button (invokes send dialog via BFF)
  • “Copy to Clipboard” action button (existing path, email orders only — no system involvement)
  • Content editing: rich-text or structured-field editing of the email body
  • Modal or panel implementing GEN::EML::0001::0003 (defined in the future product/use-cases/general-behaviors/email-communications.md use-case document).
  • Fields:
    • To (editable, pre-populated from order’s sales contact email)
    • Cc (editable, initially empty)
    • Reply-To (read-only, resolved by system)
    • Subject (read-only in v1)
    • Body preview (read-only in dialog, reflects edits from composition step)
    • Attachment list (read-only, shows filename and size)
    • Sender identity line (read-only)
  • “Send” and “Cancel” buttons
  • Cancel prompts for confirmation if any editable field was modified
  • Address validation (syntactic) before allowing Send
  • Send button disabled if To is empty
  • Status indicator on order detail view: Queued, Sent, Delivered, Bounced, Complained, Failed
  • For adverse statuses (Bounced, Complained, Failed): diagnostic message + CS reference ID
  • For Queued with retries: retry count and last attempt info
  • “Re-send” action available on Bounced or Failed orders (reopens send dialog with previous addresses)
  • Send status fetched via polling from the BFF or on entity detail load (no WebSocket/SSE — not available in the current platform)
  • Redux slice for email send state (per entity): status, diagnosticMessage, referenceId, lastUpdated

BFF routes mirror the backend ShopAccess/Email module endpoints for the generic email capability:

  • POST /api/arda/shop-access/email/email-job — Create an email sending job

    • Receives: addressing (To, Cc, Reply-To), subject, body content, attachments
    • Validates JWT, extracts user context
    • Forwards to Backend with system credentials and context headers
    • Returns: EmailJob record with initial status
  • GET /api/arda/shop-access/email/email-job/{jobId} — Get email job status

    • Proxies to Backend
    • Returns: status, diagnostic info, retry info
  • PUT /api/arda/shop-access/email/email-job/{jobId}/resend — Re-send after bounce/failure

    • Same shape as create, creates new EmailJob
  • POST /api/arda/shop-access/email/email-job/query — Query email jobs

    • Proxies to Backend

Note: Detailed BFF route definitions and SPA component specifications are deferred to the definition of the Procurement use cases that exercise the email capability. The email general behaviors are specified at the backend Endpoint API level. The BFF and SPA sections here provide context for how the email module will be consumed.

  • No email-specific business logic — pure proxy with JWT validation and header forwarding
  • Passes the user’s email (from JWT claims) to Backend so Backend can resolve Reply-To
  • Does not interact with ESP or generate PDFs
  • The email-configuration endpoint is not exposed through the BFF in v1 (CS-only, accessed directly)
  • The postmark-events webhook endpoint is called by Postmark directly, not through the BFF

The ShopAccess/Email module is organized into three layers, three endpoints, and a small DataAuthority. Layering is the binding architectural decision (see DQ-201):

  • L1 — Protocol proxies. Stateless. One per external API surface; one credential strategy each. runCatching-bodied methods returning Result<T>.
  • L2 — Capability composers. Stateless. Choreograph L1 calls into capability operations (provision, decommission, verifyDns, sendOne). Map external errors to capability errors. No DB access. Hide nothing — external IDs and credentials flow through to L3 (DQ-201.c).
  • L3 — Application services. Hold all DB access via DataAuthorities. Hold the encryption key (DQ-202, DQ-203). Encrypt before INSERT; decrypt on demand for sending. Spawn bounded DNS-verification polling rounds via per-pod activePolling map (DQ-207).
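The split between L1 and L2 can be illustrated with a minimal sketch. All type and class names below (ServerCreated, CapabilityError, PostmarkAccountProxy, TenantProvisioning) are hypothetical and chosen only for illustration; the real proxies would presumably be suspend functions backed by a Ktor client, and L3 (not shown) would own persistence and encryption.

```kotlin
// Hypothetical types for illustration only; none of these names are defined by the design.
data class ServerCreated(val serverId: Int, val serverToken: String)

sealed class CapabilityError(message: String) : Exception(message) {
    class Conflict(detail: String) : CapabilityError("Conflict: $detail")
    class ExternalUnavailable(detail: String) : CapabilityError("ESP unavailable: $detail")
}

// L1: stateless protocol proxy. The method body is wrapped in runCatching and returns Result<T>.
// The external call is injected as a lambda so the sketch has no HTTP dependency.
class PostmarkAccountProxy(private val call: (String) -> ServerCreated) {
    fun createServer(name: String): Result<ServerCreated> = runCatching { call(name) }
}

// L2: stateless composer. Maps external errors to capability errors and lets
// external IDs and credentials flow through to the caller (L3) untouched.
class TenantProvisioning(private val account: PostmarkAccountProxy) {
    fun provision(serverName: String): Result<ServerCreated> =
        account.createServer(serverName).recoverCatching { cause ->
            throw when (cause) {
                is IllegalStateException -> CapabilityError.Conflict(cause.message ?: "unknown")
                else -> CapabilityError.ExternalUnavailable(cause.message ?: "unknown")
            }
        }
}
```

The mapping from IllegalStateException to a conflict is an arbitrary choice for the sketch; the actual error taxonomy is defined by DQ-201.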

PlantUML diagram

Three triggers feed the same bounded-polling primitive in L3; see Scenarios 1b.1, 1b.2, 1b.3 for the trigger-specific flows.


The email-job endpoint provides the interface for creating, querying, and managing email sending jobs. See Scenario 2.

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/shop-access/email/email-job | Create and send an email job |
| GET | /v1/shop-access/email/email-job/<jobId> | Get job status and summary (no body/attachments) |
| GET | /v1/shop-access/email/email-job/<jobId>/details | Get full job details including body and attachments |
| PUT | /v1/shop-access/email/email-job/<jobId>/cancel | Cancel a pending email job (action verb; job row is transitioned, not deleted) |
| PUT | /v1/shop-access/email/email-job/<jobId>/resend | Re-send a bounced or failed job |
| POST | /v1/shop-access/email/email-job/query | Query email jobs by criteria |
| POST | /v1/shop-access/email/email-job/<jobId>/history | Get bitemporal history of a job |

POST /email-job (create):

// Request
{
  "emailConfigurationId": "<UUID>",
  "to": "supplier@example.com",
  "cc": "manager@example.com",
  "replyTo": "procurement@tenant.com",
  "subject": "Order ORD-00123 from TenantName",
  "htmlBody": "<html>...</html>",
  "textBody": "Plain text fallback (optional)",
  "attachments": [
    {
      "name": "PO-ORD-00123.pdf",
      "contentType": "application/pdf",
      "content": "<base64>" // or "url": "https://..."
    }
  ]
}

// Response 201 Created
{
  "jobId": "<UUID>",
  "status": "NEW",
  "messageId": null,
  "diagnosticMessage": null,
  "createdAt": "2026-04-21T10:30:00Z"
}

GET /email-job/<jobId> (status):

// Response 200 OK
{
  "jobId": "<UUID>",
  "status": "QUEUED",
  "messageId": "b7bc2f4a-...",
  "to": "supplier@example.com",
  "subject": "Order ORD-00123 from TenantName",
  "diagnosticMessage": null,
  "createdAt": "2026-04-21T10:30:00Z",
  "updatedAt": "2026-04-21T10:30:01Z"
}

PUT /email-job/<jobId>/resend:

// Request (optional overrides)
{
  "to": "new-supplier@example.com",
  "cc": null
}

// Response 201 Created (new job)
{
  "jobId": "<new UUID>",
  "status": "NEW",
  "originalJobId": "<UUID>"
}

The email-configuration endpoint is a DataAuthority endpoint for managing tenant email configurations. See Scenario 1.

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/shop-access/email/email-configuration | Provision a new tenant email configuration |
| GET | /v1/shop-access/email/email-configuration/<configId> | Get configuration and current status |
| PUT | /v1/shop-access/email/email-configuration/<configId>/retry-verification | Kick off a fresh bounded DNS verification polling round; allowed from PENDING_VERIFICATION or VERIFICATION_FAILED; refreshes verification_started_at (DQ-207.b) |
| PUT | /v1/shop-access/email/email-configuration/<configId>/lock | Lock configuration (only from UNLOCKED) |
| PUT | /v1/shop-access/email/email-configuration/<configId>/unlock | Unlock configuration (only from LOCKED) |
| DELETE | /v1/shop-access/email/email-configuration/<configId> | Delete configuration (allowed from any non-PROVISIONING terminal-or-stable state: PENDING_VERIFICATION, UNLOCKED, LOCKED, VERIFICATION_FAILED, PROVISIONING_FAILED); runs best-effort decommission of external resources (DQ-205.d) |
| POST | /v1/shop-access/email/email-configuration/query | Query configurations |

POST /email-configuration (provision):

// Request
{
  "tenantEId": "<UUID>",
  "tenantName": "Acme Manufacturing",
  "tenantSlug": "acme",
  "configSlug": "orders"
}

// Response 201 Created
{
  "configId": "<UUID>",
  "status": "PENDING_VERIFICATION",
  "tenantSlug": "acme",
  "configSlug": "orders",
  "sendingDomain": "orders.acme.prod.ardamails.com",
  "dkimVerified": false,
  "returnPathVerified": false,
  "dmarcPolicy": "none",
  "provisionedAt": "2026-04-21T10:30:00Z"
}

GET /email-configuration/<configId>:

// Response 200 OK
{
  "configId": "<UUID>",
  "status": "UNLOCKED",
  "tenantSlug": "acme",
  "configSlug": "orders",
  "sendingDomain": "orders.acme.prod.ardamails.com",
  "postmarkServerId": 12345,
  "dkimVerified": true,
  "returnPathVerified": true,
  "dmarcPolicy": "none",
  "diagnosticMessage": null,
  "provisionedAt": "2026-04-21T10:30:00Z"
}

PUT /email-configuration/<configId>/retry-verification:

// Response 200 OK
// - From PENDING_VERIFICATION: refreshes verification_started_at; kicks off a fresh bounded polling round
// - From VERIFICATION_FAILED: transitions to PENDING_VERIFICATION; kicks off bounded polling
// Response 409 Conflict (if status is PROVISIONING, PROVISIONING_FAILED, UNLOCKED, or LOCKED)
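
The status rules for /retry-verification can be read as a small pure function. The following is a sketch under stated assumptions: RetryOutcome and the function name are hypothetical, and the status enum is repeated only so the block compiles on its own.

```kotlin
import java.time.Instant

// Repeated from the service definition so the sketch is self-contained.
enum class EmailConfigurationStatus {
    PROVISIONING, PENDING_VERIFICATION, UNLOCKED, VERIFICATION_FAILED, LOCKED, PROVISIONING_FAILED
}

// Hypothetical result type; not part of the documented API.
sealed interface RetryOutcome {
    // The row is (or becomes) PENDING_VERIFICATION and a new bounded round should start.
    data class Restarted(
        val status: EmailConfigurationStatus,
        val verificationStartedAt: Instant
    ) : RetryOutcome

    // Any other status; the endpoint would answer 409 Conflict.
    data class Conflict(val current: EmailConfigurationStatus) : RetryOutcome
}

fun retryVerification(current: EmailConfigurationStatus, now: Instant): RetryOutcome =
    when (current) {
        EmailConfigurationStatus.PENDING_VERIFICATION,
        EmailConfigurationStatus.VERIFICATION_FAILED ->
            RetryOutcome.Restarted(EmailConfigurationStatus.PENDING_VERIFICATION, now)
        else -> RetryOutcome.Conflict(current)
    }
```

Keeping the decision free of I/O makes it straightforward to unit-test independently of the polling machinery.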

The postmark-events endpoint receives delivery status events from Postmark. Requests are authenticated via the Authorization: Bearer header configured on each Postmark server’s webhook. See Scenario 3 and Postmark Service Design.

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/shop-access/email/postmark-events | Receive Postmark delivery event |

// Request (from Postmark, varies by RecordType)
// Headers: Authorization: Bearer <ARDA_API_KEY>
{
  "RecordType": "Delivery",
  "MessageID": "b7bc2f4a-...",
  "Recipient": "supplier@example.com",
  "DeliveredAt": "2026-04-21T10:30:05Z"
}
// Response: 200 OK (processed)
// Response: 403 Forbidden (auth failure, stops retries)
// Response: 500 Internal Server Error (transient, triggers retry)
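
The three response codes above suggest a simple decision: reject bad credentials with 403, and let processing failures surface as 500 so that Postmark retries. A minimal sketch, assuming hypothetical helper names (isAuthorized, webhookStatus) and an injected process callback in place of the real event handler:

```kotlin
import java.security.MessageDigest

// Checks the Authorization header against the expected bearer token.
fun isAuthorized(authorizationHeader: String?, expectedToken: String): Boolean {
    val prefix = "Bearer "
    if (authorizationHeader == null || !authorizationHeader.startsWith(prefix)) return false
    val presented = authorizationHeader.substring(prefix.length).toByteArray(Charsets.UTF_8)
    // MessageDigest.isEqual avoids short-circuiting on the first mismatching byte.
    return MessageDigest.isEqual(presented, expectedToken.toByteArray(Charsets.UTF_8))
}

// Returns the HTTP status code the endpoint would answer with.
fun webhookStatus(authorizationHeader: String?, expectedToken: String, process: () -> Unit): Int {
    if (!isAuthorized(authorizationHeader, expectedToken)) return 403 // auth failure, stops retries
    return runCatching(process).fold(
        onSuccess = { 200 },  // processed
        onFailure = { 500 }   // treated as transient, triggers a retry
    )
}
```

Treating every processing exception as transient is a simplification for the sketch; the actual endpoint may distinguish permanent failures.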

The emailJob service is a provider-agnostic service for creating and managing email sending jobs. It delegates to the emailConfiguration service for tenant configuration and credential access.

interface EmailJobService {
    /** Create and send an email job. See Scenario 2. */
    suspend fun createAndSend(request: CreateEmailJobRequest): Result<EmailJob>

    /** Get job by ID. */
    suspend fun getJob(jobId: UUID): Result<EmailJob>

    /** Get job with full details (body, attachments). */
    suspend fun getJobDetails(jobId: UUID): Result<EmailJobDetails>

    /** Cancel a pending job. */
    suspend fun cancelJob(jobId: UUID): Result<EmailJob>

    /** Re-send a bounced or failed job, optionally with address overrides. */
    suspend fun resendJob(jobId: UUID, overrides: ResendOverrides?): Result<EmailJob>

    /** Query jobs by criteria. */
    suspend fun queryJobs(query: EmailJobQuery): Result<PageResult<EmailJob>>

    /** Handle a delivery event from the ESP. See Scenario 3. */
    suspend fun handleDeliveryEvent(event: PostmarkEvent): Result<Unit>
}

data class CreateEmailJobRequest(
    val emailConfigurationId: UUID,
    val to: String,
    val cc: String? = null,
    val replyTo: String,
    val subject: String,
    val htmlBody: String,
    val textBody: String? = null,
    val attachments: List<EmailAttachment> = emptyList()
)

data class EmailAttachment(
    val name: String,
    val contentType: String,
    val content: String? = null, // base64-encoded blob
    val url: String? = null      // or URL to fetch
)

data class ResendOverrides(
    val to: String? = null,
    val cc: String? = null
)

The emailJob service receives delivery events from the Message Events endpoint:

/** Normalized event from Postmark webhook payload. */
data class PostmarkEvent(
    val recordType: PostmarkEventType,
    val messageId: String,
    val recipient: String,
    val deliveredAt: Instant? = null,
    val bouncedAt: Instant? = null,
    val bounceType: String? = null,
    val bounceDescription: String? = null,
    val complaintType: String? = null
)

enum class PostmarkEventType {
    DELIVERY, BOUNCE, SPAM_COMPLAINT
}
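
Normalizing the webhook payload starts with mapping Postmark's RecordType string to the enum above. A minimal sketch, assuming a hypothetical parseRecordType helper; returning null for record types the module does not model is an assumption, not documented behavior.

```kotlin
// Repeated from the definition above so the sketch is self-contained.
enum class PostmarkEventType { DELIVERY, BOUNCE, SPAM_COMPLAINT }

// Postmark webhooks carry RecordType values such as "Delivery", "Bounce", and "SpamComplaint".
// Record types this module does not model (for example "Open") map to null here.
fun parseRecordType(recordType: String): PostmarkEventType? = when (recordType) {
    "Delivery" -> PostmarkEventType.DELIVERY
    "Bounce" -> PostmarkEventType.BOUNCE
    "SpamComplaint" -> PostmarkEventType.SPAM_COMPLAINT
    else -> null
}
```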

EmailJob is a bitemporal entity representing an email sending job. It is persisted in the email_job table.

PlantUML diagram

NEW : The email job has been acknowledged by the ShopAccess/Email module but it has not yet been transmitted to the ESP.

QUEUED : The email job has been sent to the ESP but the ESP has not confirmed it has sent it through to the email network (SMTP or equivalent).

SENT : The ESP has confirmed that it has sent the email through to the email network (SMTP or equivalent).

DELIVERED : The ESP has confirmed that the email has been delivered, that is, accepted by the recipient’s mail server.

BOUNCED : The ESP has confirmed that the email has been bounced.

COMPLAINED : The ESP has confirmed that the email has been marked as spam.

FAILED : The email job has failed due to unavailability of the ESP or internal ESP errors.

CANCELLED : The email job was cancelled by the user before it was sent to the ESP. Only jobs in NEW status can be cancelled.
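
Two rules stated in this document can be captured as guards on the status: only NEW jobs can be cancelled, and only BOUNCED or FAILED jobs can be re-sent. The sketch below also maps delivery events to the statuses described above. The enum name EmailJobStatus and the helper names are hypothetical; the full transition graph is the one in the diagram and is not reproduced here.

```kotlin
// Hypothetical enum name; the values are the statuses listed above.
enum class EmailJobStatus { NEW, QUEUED, SENT, DELIVERED, BOUNCED, COMPLAINED, FAILED, CANCELLED }

// Repeated so the sketch is self-contained.
enum class PostmarkEventType { DELIVERY, BOUNCE, SPAM_COMPLAINT }

// Only jobs that have not yet been transmitted to the ESP can be cancelled.
fun EmailJobStatus.canCancel(): Boolean = this == EmailJobStatus.NEW

// Re-send is offered for bounced or failed jobs.
fun EmailJobStatus.canResend(): Boolean =
    this == EmailJobStatus.BOUNCED || this == EmailJobStatus.FAILED

// Status a job moves to when the ESP reports the given event.
fun statusFor(event: PostmarkEventType): EmailJobStatus = when (event) {
    PostmarkEventType.DELIVERY -> EmailJobStatus.DELIVERED
    PostmarkEventType.BOUNCE -> EmailJobStatus.BOUNCED
    PostmarkEventType.SPAM_COMPLAINT -> EmailJobStatus.COMPLAINED
}
```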

A DataAuthority containing EmailJob entities.


The emailConfiguration service manages tenant email configurations, including provisioning, DNS verification, and secure credential storage. It is the only service that handles the encryption key and the Postmark account token.

interface EmailConfigurationService {
    /** Provision a new tenant email configuration. See Scenario 1.
     * Returns immediately with PENDING_VERIFICATION status.
     * DNS verification proceeds asynchronously. */
    suspend fun provision(request: ProvisionRequest): Result<EmailConfiguration>

    /** Get a configuration by ID (any status). */
    suspend fun getConfiguration(configId: UUID): Result<EmailConfiguration>

    /** Get an UNLOCKED configuration with decrypted server token.
     * Returns error if configuration is not UNLOCKED.
     * Called by emailJob service (internal, not an HTTP endpoint). See Scenario 2. */
    suspend fun getUnlockedConfiguration(
        tenantId: UUID,
        configurationId: UUID? = null
    ): Result<UnlockedEmailConfiguration>

    /** Kick off a fresh bounded DNS verification polling round.
     * Allowed from PENDING_VERIFICATION (refreshes verificationStartedAt) or
     * VERIFICATION_FAILED (transitions to PENDING_VERIFICATION).
     * Returns 409-equivalent error from any other status. See DQ-207. */
    suspend fun retryVerification(configId: UUID): Result<EmailConfiguration>

    /** Lock a configuration (UNLOCKED -> LOCKED). Prevents email sending. */
    suspend fun lock(configId: UUID): Result<EmailConfiguration>

    /** Unlock a configuration (LOCKED -> UNLOCKED). Re-enables email sending. */
    suspend fun unlock(configId: UUID): Result<EmailConfiguration>

    /** Delete a configuration (from any status except PROVISIONING, matching the DELETE endpoint).
     * Runs best-effort decommission of external resources. See DQ-205.d. */
    suspend fun delete(configId: UUID): Result<Unit>

    /** Query configurations by criteria. */
    suspend fun queryConfigurations(query: EmailConfigQuery): Result<PageResult<EmailConfiguration>>
}

data class ProvisionRequest(
    val tenantEId: UUID,
    val tenantName: String? = null,
    val tenantSlug: String? = null, // if null, derived from tenantEId + tenantName
    val configSlug: String
)

/** Returned by getUnlockedConfiguration -- includes decrypted server token. */
data class UnlockedEmailConfiguration(
    val configId: UUID,
    val sendingDomain: String,
    val serverToken: String, // decrypted, in-memory only
    val postmarkServerId: Int,
    val postmarkDomainId: Int
)

/** Full configuration entity (without decrypted token).
 * External IDs are nullable because the persist-first lifecycle (DQ-205) inserts the row
 * in PROVISIONING status before any external mutation; IDs are populated as resources are created. */
data class EmailConfiguration(
    val configId: UUID,
    val status: EmailConfigurationStatus,
    val tenantSlug: String,
    val configSlug: String,
    val sendingDomain: String,
    val postmarkServerId: Int? = null,  // null while in PROVISIONING; nullable on PROVISIONING_FAILED
    val postmarkDomainId: Int? = null,  // null while in PROVISIONING; nullable on PROVISIONING_FAILED
    val postmarkWebhookId: Int? = null, // null until createWebhook succeeds; nullable on PROVISIONING_FAILED
    val dkimVerified: Boolean = false,
    val returnPathVerified: Boolean = false,
    val dmarcPolicy: String = "none",
    val diagnosticMessage: String? = null,
    val provisioningStartedAt: Instant, // set at initial INSERT, always populated
    val provisionedAt: Instant? = null, // set on transition from PROVISIONING to PENDING_VERIFICATION
    // Set on entry to PENDING_VERIFICATION; refreshed by /retry-verification.
    // Used solely for the operator-alert query (DQ-207.j); not for any automatic transition.
    val verificationStartedAt: Instant? = null
)

enum class EmailConfigurationStatus {
    PROVISIONING, PENDING_VERIFICATION, UNLOCKED, VERIFICATION_FAILED, LOCKED, PROVISIONING_FAILED
}

The lifecycle uses a persist-first model: the row is inserted in PROVISIONING status before any external mutation, so the database always has an anchor for the in-flight operation. See DQ-205.

DNS verification is trigger-driven rather than continuously polled: bounded polling rounds (5 attempts × 60 s by default) are kicked off by three events — successful provisioning, manual /retry-verification, or a send attempt against a PENDING_VERIFICATION row. A row that exhausts a bounded round without verifying stays in PENDING_VERIFICATION (no automatic transition to VERIFICATION_FAILED) and recovers when the next trigger fires. See DQ-207.

PlantUML diagram

PROVISIONING : Entry state. Row inserted before any external mutation. No manual transitions out of this state — only the emailConfiguration service moves it (to PENDING_VERIFICATION on success or PROVISIONING_FAILED on failure). A row stuck in this state for more than ~5 minutes is presumed orphaned (e.g., pod crashed mid-flight) and requires operator triage.

PROVISIONING_FAILED : Terminal state for operations that failed during external resource creation. The row carries whichever external IDs were captured before the failure, plus a diagnosticMessage describing the failure point. Operator deletes via the standard DELETE endpoint, which runs best-effort decommission of any captured external resources.

PENDING_VERIFICATION : Provisioning succeeded. Postmark server, domain, and webhook are created; DNS records are published. DNS verification has not yet completed. Email sending is not allowed in this state. The row may have an active bounded polling task on some pod, or no active task at all — this is operationally invisible at the row level. The next trigger (provision-success on a related row, manual retry, or a send attempt) restarts a bounded polling round.

UNLOCKED : DNS verification succeeded. The configuration is ready for email sending. Only configurations in this status are returned by getUnlockedConfiguration().

VERIFICATION_FAILED : Reserved state. Not entered automatically in v1 (DQ-207 does not auto-transition out of PENDING_VERIFICATION). Reserved for v2 operator-marked-failed and async-reconciler scenarios. /retry-verification accepts it as a source for forward compatibility.

LOCKED : Administratively disabled. CS or admin action. Email sending is not allowed. Can be re-enabled.
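
The per-status rules for sending, locking, unlocking, and deleting can be summarized as predicates on the status. This is a sketch; the extension function names are hypothetical, the delete rule follows the DELETE row of the endpoint table, and the enum is repeated so the block compiles on its own.

```kotlin
// Repeated from the service definition so the sketch is self-contained.
enum class EmailConfigurationStatus {
    PROVISIONING, PENDING_VERIFICATION, UNLOCKED, VERIFICATION_FAILED, LOCKED, PROVISIONING_FAILED
}

// Email sending is allowed only once DNS verification has succeeded.
fun EmailConfigurationStatus.canSend(): Boolean = this == EmailConfigurationStatus.UNLOCKED

// lock: UNLOCKED -> LOCKED
fun EmailConfigurationStatus.canLock(): Boolean = this == EmailConfigurationStatus.UNLOCKED

// unlock: LOCKED -> UNLOCKED
fun EmailConfigurationStatus.canUnlock(): Boolean = this == EmailConfigurationStatus.LOCKED

// delete: any status except PROVISIONING, which only the service itself may leave.
fun EmailConfigurationStatus.canDelete(): Boolean = this != EmailConfigurationStatus.PROVISIONING
```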

Per-tenant Postmark server tokens are stored in the database, encrypted with a partition-wide encryption key. This avoids per-tenant Secrets Manager writes and runtime Secrets Manager calls.

  • Encryption key: A single symmetric key per partition, created by CDK in Secrets Manager and delivered to the pod via the External Secrets Operator (ESO) mechanism (see infrastructure.md). Available to the emailConfiguration service as a HOCON config property (email.encryption.key) at startup.
  • Encrypt on write: During provisioning, the emailConfiguration service encrypts the server token returned by Postmark before persisting it in the serverTokenEncrypted column.
  • Decrypt on read: When the emailJob service calls getUnlockedConfiguration(), the emailConfiguration service decrypts the server token and returns it as part of the UnlockedEmailConfiguration response. The emailJob service never handles the encryption key directly.
  • Key rotation: Rotating the encryption key is a single operation: read all encrypted tokens, re-encrypt with the new key, update in a transaction. No per-tenant Secrets Manager updates.
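
The encrypt-on-write and decrypt-on-read steps can be sketched with the JDK's AES-GCM support. The function names and the storage layout (Base64 of a 12-byte IV followed by the ciphertext and tag) are assumptions made for illustration; the actual column format is defined by DQ-202.

```kotlin
import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

val IV_BYTES = 12   // 96-bit IV, the size commonly recommended for GCM
val TAG_BITS = 128  // authentication tag length

// Encrypts a server token with a 32-byte (AES-256) key.
// Assumed layout of the stored value: Base64(iv || ciphertext-with-tag).
fun encryptToken(plainToken: String, key: ByteArray): String {
    val iv = ByteArray(IV_BYTES).also { SecureRandom().nextBytes(it) } // fresh IV per encryption
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, SecretKeySpec(key, "AES"), GCMParameterSpec(TAG_BITS, iv))
    val cipherText = cipher.doFinal(plainToken.toByteArray(Charsets.UTF_8))
    return Base64.getEncoder().encodeToString(iv + cipherText)
}

// Reverses encryptToken; throws if the key is wrong or the value was tampered with.
fun decryptToken(stored: String, key: ByteArray): String {
    val bytes = Base64.getDecoder().decode(stored)
    val iv = bytes.copyOfRange(0, IV_BYTES)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, SecretKeySpec(key, "AES"), GCMParameterSpec(TAG_BITS, iv))
    return String(cipher.doFinal(bytes, IV_BYTES, bytes.size - IV_BYTES), Charsets.UTF_8)
}
```

Under this layout, key rotation amounts to calling decryptToken with the old key and encryptToken with the new key for each row inside one transaction.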

The Postmark account-level API token (used for provisioning, not per-tenant) is also delivered via ESO as a HOCON config property (email.postmark.accountToken). Only the emailConfiguration service accesses it.

A DataAuthority containing EmailConfiguration entities.


Synchronous steps (interactive, returns immediately):

  1. Validate input locally — slugs DNS-safe, not reserved, FQDN within DNS limits.

  2. Open DB transaction:

    • Check no row exists for (tenantId, configSlug) or sendingDomain.
    • Pre-flight external state checks (tenantProvisioning.checkAvailability(spec)):
      • Postmark Account: no server with the planned name exists.
      • Postmark Account: no domain with the planned FQDN exists.
      • Route53: no records exist at the three target names.
    • On any conflict: ROLLBACK and return 409 with diagnostic identifying the orphan/collision.
    • On clear: INSERT row with status=PROVISIONING, provisioning_started_at=now(), all external IDs null. COMMIT.
  3. Run external mutations in this order (Postmark first, Route53 second, per DQ-205.k):

    1. Postmark Account: createServer → serverId, serverToken.
    2. Postmark Account: createDomain → domainId, DKIM/return-path values.
    3. Postmark Server: createWebhook → webhookId.
    4. Route53: UPSERT DKIM TXT record.
    5. Route53: UPSERT Return-Path CNAME record.
    6. Route53: UPSERT DMARC TXT record.

    (Record writes use ChangeResourceRecordSets action=UPSERT for idempotency; see DQ-205.m.)

  4. Encrypt the server token with the partition-wide encryption key (AES-256-GCM, see DQ-202).

  5. UPDATE row with all external IDs + encrypted token + status PENDING_VERIFICATION. (Retry with bounded backoff; if persistent failure, leave row in PROVISIONING with diagnostic naming the orphans for manual reconciliation.)

  6. Return to client.
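
Step 1 (local validation) can be sketched as follows. The helper names are hypothetical, the lowercase-only label rule is an assumption, and the FQDN layout configSlug.tenantSlug.partition.rootDomain is inferred from the example orders.acme.prod.ardamails.com rather than stated as a rule.

```kotlin
// A DNS label: 1 to 63 characters, letters/digits/hyphens, no leading or trailing hyphen.
// Restricting to lowercase is an assumption of this sketch.
val DNS_LABEL = Regex("[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")

// Returns a diagnostic message, or null when the slug is acceptable.
fun validateSlug(slug: String, reserved: Set<String>): String? = when {
    !DNS_LABEL.matches(slug) -> "slug '$slug' is not a DNS-safe label"
    slug in reserved -> "slug '$slug' is reserved"
    else -> null
}

// Builds the sending domain and checks the overall DNS name limit of 253 characters.
fun sendingDomain(
    configSlug: String,
    tenantSlug: String,
    partition: String,
    rootDomain: String
): Result<String> = runCatching {
    val fqdn = "$configSlug.$tenantSlug.$partition.$rootDomain"
    require(fqdn.length <= 253) { "FQDN exceeds 253 characters" }
    fqdn
}
```

A real check would likely need a tighter length budget, because the DKIM, Return-Path, and DMARC records live at names below the sending domain.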

Failure handling: if any step in (3) fails, L2 returns a structured Failure(PartialProgress) with whichever external IDs were captured. L3 updates the row with the partial IDs + status PROVISIONING_FAILED + diagnostic. No automatic external cleanup; operator triages via the DELETE endpoint, which runs best-effort decommission.
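
The partial-progress behavior can be sketched as a loop that stops at the first failing mutation and returns whatever IDs were captured up to that point. CapturedIds, ProvisionOutcome, and runMutations are hypothetical names for illustration; the actual Failure(PartialProgress) shape is defined by DQ-205.

```kotlin
// Hypothetical container for the external IDs captured so far.
data class CapturedIds(
    val serverId: Int? = null,
    val domainId: Int? = null,
    val webhookId: Int? = null
)

// Hypothetical outcome type standing in for the structured L2 result.
sealed interface ProvisionOutcome {
    data class Success(val ids: CapturedIds) : ProvisionOutcome
    data class PartialProgress(
        val ids: CapturedIds,
        val failedStep: String,
        val diagnostic: String
    ) : ProvisionOutcome
}

// Runs the named steps in order. Each step receives the IDs captured so far
// and returns them extended with its own result.
fun runMutations(steps: List<Pair<String, (CapturedIds) -> CapturedIds>>): ProvisionOutcome {
    var ids = CapturedIds()
    for ((name, step) in steps) {
        ids = runCatching { step(ids) }.getOrElse { cause ->
            // Stop at the first failure; no automatic cleanup is attempted.
            return ProvisionOutcome.PartialProgress(ids, name, cause.message ?: cause.toString())
        }
    }
    return ProvisionOutcome.Success(ids)
}
```

L3 would persist the PartialProgress IDs and diagnostic on the row together with the PROVISIONING_FAILED status.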

Asynchronous DNS verification (trigger-driven, see DQ-207):

  • Successful provisioning kicks off a bounded polling round on the pod that handled the request — 5 attempts × 60 seconds by default (HOCON-configurable).
  • Each attempt calls tenantProvisioning.verifyDns(postmarkDomainId), which delegates to postmarkAccountProxy.verifyDkim and verifyReturnPath.
  • On verification success: update row to UNLOCKED (idempotent UPDATE guarded by WHERE status = 'PENDING_VERIFICATION').
  • On bounded round exhaustion: row stays in PENDING_VERIFICATION. Recovery is trigger-driven: a subsequent send attempt (getUnlockedConfiguration) or a manual PUT .../retry-verification will kick off a fresh bounded round. There is no automatic transition to VERIFICATION_FAILED in v1.
  • Pod restart during a polling round drops the in-flight task (a known coverage gap, mitigated by an operator alert on stale PENDING_VERIFICATION rows; see Observability).
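
A bounded polling round and the per-pod de-duplication can be sketched as below. The function names are hypothetical, verify and sleep are injected so the block has no Postmark or coroutine dependency, and running the first attempt immediately (with the interval applied between attempts) is an assumption. A real implementation would launch the round asynchronously rather than block the caller.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Runs one bounded round. Returns true as soon as verification succeeds,
// or false when all attempts are exhausted (the row then stays in PENDING_VERIFICATION).
fun runBoundedRound(
    maxAttempts: Int = 5,
    intervalMillis: Long = 60_000,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    verify: () -> Boolean
): Boolean {
    repeat(maxAttempts) { attempt ->
        // A verification call that throws counts as "not verified yet".
        if (runCatching(verify).getOrDefault(false)) return true
        if (attempt < maxAttempts - 1) sleep(intervalMillis)
    }
    return false
}

// Per-pod registry of configurations that currently have a round in flight.
val activePolling = ConcurrentHashMap<UUID, Boolean>()

// Starts a round unless one is already running for this configuration on this pod.
// Returns null when the trigger was ignored because a round is already active.
fun startRoundIfIdle(configId: UUID, round: () -> Boolean): Boolean? {
    if (activePolling.putIfAbsent(configId, true) != null) return null
    return try {
        round()
    } finally {
        activePolling.remove(configId)
    }
}
```

Because the registry lives in process memory, it is lost on pod restart, which is the coverage gap the operator alert compensates for.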

See Scenario 1, Scenario 1b, Postmark Service Design, and DQ-205 / DQ-207 for the detailed provisioning flow, API endpoints, and failure-recovery design.

DELETE on a configuration in any status other than PROVISIONING (PENDING_VERIFICATION, UNLOCKED, LOCKED, VERIFICATION_FAILED, PROVISIONING_FAILED, as listed for the DELETE endpoint) runs best-effort decommission before removing the DB row:

  1. Read the row’s captured external IDs (some may be null in the PROVISIONING_FAILED case).
  2. Delete external resources in this order (Route53 first, Postmark second — the inverse of provisioning, per DQ-205.k):
    1. Route53: delete DMARC, Return-Path, DKIM records (any that exist).
    2. Postmark Account: delete domain.
    3. Postmark Account: delete server. (Cascade-deletes the webhook.)
  3. DELETE the DB row unconditionally, regardless of decommission outcomes.
  4. Return a result describing which deletions succeeded and which failed; failed deletions are surfaced to the caller for manual cleanup.
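
The best-effort semantics of the steps above can be sketched as a loop that records each outcome and then removes the row no matter what. DecommissionReport and decommission are hypothetical names; the caller is assumed to build the deletion list only from external IDs that are non-null on the row.

```kotlin
// Hypothetical report returned to the caller: which deletions worked, and which failed and why.
data class DecommissionReport(val succeeded: List<String>, val failed: Map<String, String>)

// Runs the named deletions in the given order (Route53 first, then Postmark),
// continues past failures, and always deletes the DB row at the end.
fun decommission(
    deletions: List<Pair<String, () -> Unit>>,
    deleteRow: () -> Unit
): DecommissionReport {
    val succeeded = mutableListOf<String>()
    val failed = linkedMapOf<String, String>()
    for ((name, delete) in deletions) {
        runCatching(delete)
            .onSuccess { succeeded += name }
            .onFailure { failed[name] = it.message ?: it.toString() }
    }
    deleteRow() // unconditional, regardless of decommission outcomes
    return DecommissionReport(succeeded, failed)
}
```

The failed map is what would be surfaced to the operator for manual cleanup.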

  • Stub implementation of EmailJobService interface
  • Logs send intent to console / in-memory store
  • Synthesizes fake delivery events for testing the full lifecycle
  • Zero external dependency

The ShopAccess/Email module runs within the operations component. The following configuration is required at the component level for the module to function.

New entry in the component’s values.yaml under apis: to register the module’s routes with the Ingress:

apis:
  system.shopAccess.email:
    name: "shop-access/email"
    version: "v1"

This generates the Ingress path /v1/shop-access/email/*, routing all email module traffic (including the Postmark webhook endpoint) through the component’s Kubernetes Service.

Note: The Postmark webhook endpoint (/v1/shop-access/email/postmark-events) currently has no external authorizer at the API Gateway level. Authentication is handled by the endpoint itself via Bearer token validation. In the future, a Bearer Token authorizer may be added at the API Gateway for this route.

New entries in the component’s Helm secrets.yaml template to sync email secrets from Secrets Manager to Kubernetes:

| Secret Name Pattern | HOCON Property | Used By |
| --- | --- | --- |
| <infrastructure>-<partition>-I-EmailPostmarkAccountToken | email.postmark.accountToken | emailConfiguration service (provisioning) |
| <infrastructure>-<partition>-I-EmailEncryptionKey | email.encryption.key | emailConfiguration service (encrypt/decrypt server tokens) |

These are templated into secrets.properties alongside existing database credentials, following the same ESO pattern. See infrastructure.md for the delivery mechanism.

Module-specific configuration in application.conf (or shopaccess/email/application.conf):

| Key | Type | Description |
| --- | --- | --- |
| email.postmark.accountToken | String | From ESO. Postmark account-level API token for provisioning. |
| email.postmark.baseUrl | String | https://api.postmarkapp.com. Overridable for testing. |
| email.encryption.key | String | From ESO. Partition-wide symmetric encryption key for server tokens. |
| email.sending.rootDomain | String | The resolved {mail-root-domain} (e.g., ardamails.com). |
| email.sending.partition | String | Current partition (prod, demo, dev, stage). |
| email.sending.senderFunction | String | Local-part for From address (e.g., procurement). |
| email.dns.hostedZoneId | String | Route53 Hosted Zone ID for the current partition’s mail zone. |
| email.dns.provisioningRoleArn | String | ARN of the Route53 DNS provisioning role to assume via STS. |
| email.tenantConfig.reservedSlugs | List | Slugs blocked from use as tenant identifiers. |

The module creates its own database using the idempotent init container pattern (same as other modules). No infrastructure-level database provisioning is needed beyond the existing Aurora cluster.

Flyway migration location: shopaccess/email/database/migrations/


See Postmark Service Design for the full Postmark API surface (provisioning, sending, delivery events, query/inspection) and infrastructure.md for DNS zone structure and account mapping.


See infrastructure.md for the full DNS zone structure (root zone in platformRoot, partition zones in Alpha001/Alpha002), Secrets Manager paths, and IAM role scoping.

  • Per-tenant send counts, delivery rates, bounce rates (from EmailJob data)
  • Dispatcher health: queue depth, retry rates, failure rates
  • ESP event processing latency
  • Existing Arda observability patterns (CloudWatch metrics, dashboards)

Because DNS verification is trigger-driven (no continuous loop), a row that exhausts a bounded polling round without verifying is invisible at the row level until the next trigger fires. The system relies on an operational alert to surface “configurations stuck pending”:

| Alert | Query | Trigger | Runbook |
| --- | --- | --- | --- |
| email_configuration_pending_stale | SELECT count(*) FROM email_configuration WHERE status = 'PENDING_VERIFICATION' AND verification_started_at < now() - interval '15 minutes' | result > 0 | Page CS / on-call. Triage steps: (1) inspect diagnostic_message if present; (2) verify Postmark domain status via GET /domains/{id} to confirm DNS is or is not propagated; (3) hit PUT /email-configuration/{id}/retry-verification to restart bounded polling, or DELETE /email-configuration/{id} if the configuration is known-broken; (4) confirm the alert clears within ~15 min. |
| email_configuration_provisioning_stuck | SELECT count(*) FROM email_configuration WHERE status = 'PROVISIONING' AND provisioning_started_at < now() - interval '5 minutes' | result > 0 | Page on-call. Provisioning should complete in seconds; anything > 5 min indicates a pod crash mid-flight (see DQ-205.f). Triage steps: identify orphan external resources (server name pattern, sending-domain FQDN); manually transition the row to PROVISIONING_FAILED with diagnostic; run DELETE to invoke best-effort decommission. |

These alerts replace the implicit observability provided by a continuous verification loop.


| Functional | Runtime Resources | Notes |
| --- | --- | --- |
| Frontend | | |
| SPA | Browser, Amplify | Runs in the browser, served by Amplify. |
| BFF | Amplify | Next.js server running on Amplify. |
| ShopAccess/Email Module | | |
| Endpoints | API Gateway, EKS Ingress & Service | Routes in API Gateway routed through VPC Link, NLB, and EKS Ingress. |
| Services | EKS Deployment, EKS Pod | Specific deployment to be defined. Initially most likely the operations component. |
| Persistence | Aurora RDS (Postgres) | Currently a database per module; the database server is provisioned at the Partition layer. |
| Postmark Proxy | EKS Pod, ESP (Postmark Account) | Implemented as a Ktor Client within the EKS Pod that accesses the ESP via an HTTP API. |