Email Integration -- Architectural Scenarios

Functional-level sequence diagrams for the key use cases of the ShopAccess/Email module. Participants represent functional components as defined in functional.md.

Scenario 1: Provision Tenant Email Configuration

A client system (CS script or future admin UI) provisions a new tenant email configuration with a unique tenant slug and config slug. The L3 application service runs pre-flight checks inside a DB transaction before any external mutation. It then orchestrates the L2 capability composer to create external resources in a fixed order (Postmark first, Route53 second), persists the captured IDs, and triggers the bounded DNS-verification polling round.

Decisions reflected in this scenario:

DQ-201: two-level server module (L1 protocol proxies + L2 capability composers).
DQ-202 / DQ-203: AES-256-GCM with versioned envelope; key derived via HKDF.
DQ-204: STS auto-chained at module startup; the L1 Route53 proxy makes calls without per-call AssumeRole.
DQ-205: persist-first lifecycle (PROVISIONING entry state); pre-flight checks; structured Failure(PartialProgress) on partial failure.
DQ-205.k: all Postmark mutations before any Route53 mutation.
DQ-205.m: Route53 record writes use UPSERT.
DQ-206: slug resolution.

Pre-conditions:

Tenant exists in the system with a valid tenantEId.
Postmark account token, encryption key, Route53 role ARN, and zone ID available via HOCON (delivered by ESO at startup).
Partition Route53 zone exists (infrastructure prerequisite).

Post-conditions on success:

Postmark server, sending domain, and webhook created (in that order).
DKIM TXT, Return-Path CNAME, and DMARC TXT records UPSERTed in the partition Route53 zone.
email_configuration row persisted with encrypted server token, all external IDs, status PENDING_VERIFICATION.
Client receives the configuration immediately; DNS verification proceeds asynchronously via Scenario 1b.1.

Post-conditions on partial failure (any step in section “Run External Mutations” fails):

Row persisted with status PROVISIONING_FAILED, partial external IDs captured, diagnostic message describing the failure point.
Operator triages via DELETE (best-effort decommission, see Scenario 4 / DQ-205.d).

PlantUML diagram
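
The mutation ordering and partial-failure handling in Scenario 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: `provision`, `ProvisionOutcome`, the step names, and the stand-in callables are all hypothetical, and no real Postmark or Route53 client is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical step: a label plus a callable returning the external ID it created.
Step = Tuple[str, Callable[[], str]]


@dataclass
class ProvisionOutcome:
    status: str                                   # PENDING_VERIFICATION or PROVISIONING_FAILED
    external_ids: Dict[str, str] = field(default_factory=dict)
    diagnostic: Optional[str] = None


def provision(postmark_steps: List[Step], route53_steps: List[Step]) -> ProvisionOutcome:
    """Run every Postmark mutation before any Route53 mutation (DQ-205.k).

    On the first failure, stop and return the IDs captured so far, so the row
    can be persisted as PROVISIONING_FAILED with partial progress.
    """
    captured: Dict[str, str] = {}
    for name, call in postmark_steps + route53_steps:
        try:
            captured[name] = call()
        except Exception as exc:  # stand-in for a structured Failure(PartialProgress)
            return ProvisionOutcome("PROVISIONING_FAILED", captured, f"{name} failed: {exc}")
    return ProvisionOutcome("PENDING_VERIFICATION", captured)


def _failing_route53_call() -> str:
    raise RuntimeError("zone not reachable")


# Usage with stand-in calls: Postmark succeeds, the first Route53 write fails.
outcome = provision(
    postmark_steps=[("server", lambda: "srv-1"), ("domain", lambda: "dom-1"), ("webhook", lambda: "wh-1")],
    route53_steps=[("dkim_txt", _failing_route53_call)],
)
```

In this run the outcome carries the three Postmark IDs and a diagnostic naming the failed step, which is the information an operator would need before issuing the DELETE described in Scenario 4.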

Scenario 1b: Async DNS Verification (trigger-driven)

DNS verification is trigger-driven rather than continuously polled. Three triggers feed a single shared primitive — a bounded polling round of up to 5 attempts × 60 seconds — inside EmailConfigurationService. Verification success transitions the row to UNLOCKED; round exhaustion leaves it in PENDING_VERIFICATION until the next trigger fires. There is no automatic transition to VERIFICATION_FAILED in v1. See DQ-207.

The bounded polling round itself is identical across triggers; the three sub-scenarios below differ only in how the round is initiated. Scenario 1b.1 shows the full round; Scenarios 1b.2 and 1b.3 show only the trigger and reference 1b.1 for the polling block.

Scenario 1b.1: DNS Verification triggered by provisioning success

Provisioning’s tail (Scenario 1, “Trigger bounded DNS verification”) kicks off a fire-and-forget bounded polling round on the pod that handled the provision request.

Pre-conditions:

Row was just transitioned to PENDING_VERIFICATION (from PROVISIONING).
verification_started_at set to now().
postmarkDomainId populated.

Post-conditions on success:

Row transitions to UNLOCKED; idempotent UPDATE guarded by WHERE status = 'PENDING_VERIFICATION'.

Post-conditions on round exhaustion:

Row stays in PENDING_VERIFICATION. Pod-local activePolling entry removed. Recovery awaits the next trigger (Scenario 1b.2 or 1b.3).

PlantUML diagram

Scenario 1b.2: DNS Verification triggered by manual `/retry-verification`

CS or an operator hits the retry endpoint to kick off a fresh bounded round, typically in response to an operator-alert page or to recover from a previous round’s exhaustion.

Pre-conditions:

Row in PENDING_VERIFICATION or VERIFICATION_FAILED state.

Post-conditions:

verification_started_at refreshed; if from VERIFICATION_FAILED, status transitions to PENDING_VERIFICATION.
A bounded polling round is kicked off (deduplicated via activePolling).
Endpoint returns 200; further state transitions happen asynchronously per Scenario 1b.1.

PlantUML diagram
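
The retry trigger can be pictured as a small state refresh followed by a hand-off to the polling round. The sketch below is illustrative only: `retry_verification` and the dictionary-shaped row are hypothetical, and the 409 response for other statuses is an assumption borrowed from Scenarios 5 and 6, since this scenario only states the 200 path.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

RETRYABLE = {"PENDING_VERIFICATION", "VERIFICATION_FAILED"}


def retry_verification(
    row: Dict[str, object],
    start_round: Callable[[str], None],
) -> Tuple[int, Dict[str, object]]:
    """Return (http_status, row) for a retry request against one configuration row."""
    if row["status"] not in RETRYABLE:
        return 409, row                        # assumption: not stated in this scenario
    updated = dict(
        row,
        status="PENDING_VERIFICATION",
        verification_started_at=datetime.now(timezone.utc),
    )
    start_round(str(updated["config_id"]))     # fire-and-forget in the described flow
    return 200, updated


# Usage: one retryable row and one row in a state the endpoint does not accept.
started: List[str] = []
code_ok, row_ok = retry_verification(
    {"config_id": "cfg-1", "status": "VERIFICATION_FAILED", "verification_started_at": None},
    started.append,
)
code_rejected, _ = retry_verification({"config_id": "cfg-2", "status": "LOCKED"}, started.append)
```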

Scenario 1b.3: DNS Verification triggered by send-time precondition fail

A send attempt against a PENDING_VERIFICATION row fails fast (the send is not delayed by a synchronous verify), but kicks off a fresh bounded polling round as a fire-and-forget side effect so the next send attempt is more likely to succeed. See DQ-207.b.

Pre-conditions:

An EmailJob create / send call landed.
Looked-up EmailConfiguration is in PENDING_VERIFICATION (not UNLOCKED).

Post-conditions:

The current send attempt fails fast with PreconditionFailed.
A bounded polling round is kicked off (deduplicated via activePolling).
EmailJob row transitions to FAILED with diagnostic per Scenario 2.

PlantUML diagram
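
The fail-fast behavior with its side effect can be sketched as below. This is an assumption-laden illustration: the real `getUnlockedConfiguration()` signature is not given in this document, so the Python function, its tuple return shape, and the `start_round` callback are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def get_unlocked_configuration(
    config: Dict[str, str],
    start_round: Callable[[str], None],
) -> Tuple[Optional[Dict[str, str]], Optional[str]]:
    """Return (config, None) when UNLOCKED, else (None, error).

    The call never waits on DNS verification: a PENDING_VERIFICATION row only
    schedules a polling round as a side effect and still fails this call.
    """
    status = config["status"]
    if status == "UNLOCKED":
        return config, None
    if status == "PENDING_VERIFICATION":
        start_round(config["config_id"])       # deduplication happens inside the round
    return None, f"PreconditionFailed(status={status})"


# Usage: only the PENDING_VERIFICATION row schedules a round.
started: List[str] = []
pending = get_unlocked_configuration({"config_id": "c1", "status": "PENDING_VERIFICATION"}, started.append)
locked = get_unlocked_configuration({"config_id": "c2", "status": "LOCKED"}, started.append)
unlocked = get_unlocked_configuration({"config_id": "c3", "status": "UNLOCKED"}, started.append)
```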

Scenario 2: Send Email

A client system submits an email job with addressing, subject, body, and optional attachments. EmailJobService (L3) resolves the tenant’s email configuration via EmailConfigurationService.getUnlockedConfiguration(), hands the decrypted server token to EmailSender (L2), which calls postmarkServerProxy.sendEmail (L1) and reports back. If the configuration is in PENDING_VERIFICATION, the precondition check fails fast AND fires off a bounded DNS-verification polling round (Scenario 1b.3) so the next send attempt is more likely to succeed.

Decisions reflected in this scenario:

DQ-201: two-level server module; sending uses EmailSender (L2) and postmarkServerProxy (L1).
DQ-202 / DQ-203: server token decryption.
DQ-207.b: send-time precondition-fail kicks off Scenario 1b.3.

Pre-conditions:

Tenant has a provisioned EmailConfiguration (any status; behavior branches per status below).
Client provides: To, Cc (optional), Reply-To, Subject, Body (HTML), attachments (optional, as Blob or URL).
Client provides: emailConfigurationId — a UUID; resolved at runtime through EmailConfigurationService’s interface, not via DB FK (cross-Universe; see information-model.md § 7.1).

Post-conditions:

On UNLOCKED config: EmailJob persisted as NEW, then transitioned to QUEUED on Postmark acceptance. MessageID stored for webhook correlation. Client receives the record.
On non-UNLOCKED config: EmailJob persisted as FAILED with diagnostic. If config is PENDING_VERIFICATION, a bounded polling round is kicked off (Scenario 1b.3).

Transaction boundaries. Each EmailJob write (persistJob(NEW), the final transition to QUEUED / FAILED) is its own transaction in EmailJobUniverse. The intervening getUnlockedConfiguration call is a separate transaction in EmailConfigurationUniverse — the two services do not share a transaction. External HTTP calls to Postmark sit between transactions. See functional-design.md § 5 for the binding service-isolation rule.

PlantUML diagram
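
To make the transaction boundaries concrete, here is a small sketch in which each entry appended to `tx_log` stands for one independent transaction. The function `send_email`, its parameters, and the log strings are hypothetical. Mapping a Postmark send error to FAILED is an assumption consistent with the "QUEUED / FAILED" wording but not spelled out here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, str]
Job = Dict[str, Optional[str]]


def send_email(
    job: Job,
    resolve_config: Callable[[], Tuple[Optional[Config], Optional[str]]],
    postmark_send: Callable[[Config, Job], str],
    tx_log: List[str],
) -> Job:
    job = dict(job, status="NEW")
    tx_log.append("EmailJobUniverse: persist NEW")

    config, error = resolve_config()
    tx_log.append("EmailConfigurationUniverse: resolve configuration")

    if config is None:
        job = dict(job, status="FAILED", diagnostic=error)
    else:
        try:
            # External HTTP call: made between transactions, never inside one.
            message_id = postmark_send(config, job)
            job = dict(job, status="QUEUED", message_id=message_id)
        except Exception as exc:
            job = dict(job, status="FAILED", diagnostic=str(exc))
    tx_log.append(f"EmailJobUniverse: transition to {job['status']}")
    return job


# Usage: one accepted send and one precondition failure.
log: List[str] = []
queued = send_email({"subject": "hi"}, lambda: ({"token": "t"}, None), lambda c, j: "msg-1", log)
failed = send_email(
    {"subject": "hi"}, lambda: (None, "PreconditionFailed(status=LOCKED)"), lambda c, j: "unused", []
)
```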

Scenario 3: Receive Message Event via Webhook

Postmark sends a delivery status event (Delivery, Bounce, or SpamComplaint) to the Arda webhook endpoint. The endpoint authenticates the request, correlates the event to an existing EmailJob via MessageID, and updates the job’s status.

Pre-conditions:

Webhook configured on the Postmark server with Bearer token authentication (see DQ-011).
An EmailJob exists with a matching MessageID, in QUEUED or SENT status (or DELIVERED for a SpamComplaint event; see Event Type Mapping).

Post-conditions:

EmailJob status updated to SENT, DELIVERED, BOUNCED, or COMPLAINED based on the event type.
Diagnostic information stored for adverse events (bounce reason, complaint type).

PlantUML diagram

Event Type Mapping

Postmark `RecordType`	Source Status	Target Status	Diagnostic Fields
`Delivery`	QUEUED / SENT	DELIVERED	`DeliveredAt`, `Recipient`
`Bounce`	QUEUED / SENT	BOUNCED	`Type` (HardBounce, SoftBounce, …), `Description`, `BouncedAt`
`SpamComplaint`	DELIVERED	COMPLAINED	`Type`, `Recipient`

Note: Postmark may send a Delivery event before the internal status has transitioned from QUEUED to SENT. The service handles this by accepting valid forward transitions regardless of intermediate states (QUEUED to DELIVERED is valid if the SENT event was missed or arrived out of order).
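
The Event Type Mapping table translates directly into a lookup, sketched here under assumptions. The name `next_status` is hypothetical, and returning `None` (that is, ignoring the event) when the current status is not a listed source is an assumption; this document does not say how such events are handled.

```python
from typing import Dict, Optional, Set, Tuple

# RecordType -> (allowed source statuses, target status), copied from the table.
TRANSITIONS: Dict[str, Tuple[Set[str], str]] = {
    "Delivery": ({"QUEUED", "SENT"}, "DELIVERED"),
    "Bounce": ({"QUEUED", "SENT"}, "BOUNCED"),
    "SpamComplaint": ({"DELIVERED"}, "COMPLAINED"),
}


def next_status(current: str, record_type: str) -> Optional[str]:
    """Return the new job status, or None when the event does not apply."""
    rule = TRANSITIONS.get(record_type)
    if rule is None:
        return None                      # unknown RecordType
    sources, target = rule
    return target if current in sources else None


# Usage: QUEUED -> DELIVERED is accepted even though SENT was never observed.
skipped_sent = next_status("QUEUED", "Delivery")
duplicate_delivery = next_status("DELIVERED", "Delivery")
complaint = next_status("DELIVERED", "SpamComplaint")
```

Listing QUEUED as a valid source for Delivery and Bounce is what makes out-of-order or missed SENT events harmless in this sketch.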

Scenario 4: Tenant Decommission

DELETE /email-configuration/<configId> removes a tenant configuration. The L3 service runs best-effort decommission of the external resources via the L2 capability composer, then deletes the DB row unconditionally, and returns an aggregated success/failure result. The deletion order at L2 is the inverse of provisioning’s mutation order: Route53 records first, then Postmark resources. See DQ-205.d and DQ-205.k.

Decisions reflected in this scenario:

DQ-205.d: best-effort decommission; row deleted unconditionally; aggregated result returned.
DQ-205.k: Route53 deletes precede Postmark deletes (inverse of provisioning’s order to avoid leaving DNS records pointing at a deleted Postmark domain).
DQ-201.d: structured DecommissionResult carrying per-resource success/failure.

Pre-conditions:

Row exists in any state except PROVISIONING (in-flight provisioning is not directly DELETE-able; operator must wait or manually triage stuck rows per DQ-205.f).
Captured external IDs may be partial (especially for PROVISIONING_FAILED rows).

Post-conditions:

DB row deleted.
Best-effort attempts made to delete each external resource we have an ID for.
Caller receives aggregated DecommissionResult listing which deletions succeeded and which failed (so any leftovers can be cleaned up manually).

PlantUML diagram
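
Best-effort decommission with an aggregated result might look like the sketch below. The resource labels, the `DecommissionResult` fields, and the callables are hypothetical. The ordering inside each group (reverse of creation) is an assumption, since the document only fixes that Route53 deletes precede Postmark deletes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Route53 records first, then Postmark resources (DQ-205.k).
DELETE_ORDER = ["dmarc_txt", "return_path_cname", "dkim_txt", "webhook", "domain", "server"]


@dataclass
class DecommissionResult:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)    # resource -> error message
    row_deleted: bool = False


def decommission(
    external_ids: Dict[str, Optional[str]],
    delete_external: Callable[[str, str], None],
    delete_row: Callable[[], None],
) -> DecommissionResult:
    result = DecommissionResult()
    for resource in DELETE_ORDER:
        ext_id = external_ids.get(resource)
        if ext_id is None:
            continue                          # never created (partial provisioning)
        try:
            delete_external(resource, ext_id)
            result.succeeded.append(resource)
        except Exception as exc:
            result.failed[resource] = str(exc)    # record and keep going
    delete_row()                              # unconditional, whatever failed above
    result.row_deleted = True
    return result


# Usage with partial IDs and a stand-in deleter that fails on the domain.
attempted: List[str] = []
deleted_rows: List[str] = []


def _fake_delete(resource: str, ext_id: str) -> None:
    attempted.append(resource)
    if resource == "domain":
        raise RuntimeError("domain not found")


result = decommission(
    {"server": "s1", "domain": "d1", "dkim_txt": "r1"},
    _fake_delete,
    lambda: deleted_rows.append("row"),
)
```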

Scenario 5: EmailJob lifecycle edges — Cancel and Resend (narrative)

Send (Scenario 2) is the happy path. The two non-trivial lifecycle edges are described below. They are documented in narrative form because their architectural shape matches Scenario 2 with different state checks; downstream design (BFF, SPA UX, integration tests) can treat this section as the source of truth for behavior.

5.a — Cancel an EmailJob

Endpoint: PUT /v1/shop-access/email/email-job/<jobId>/cancel.

Allowed only when the job is in status NEW. NEW means the job has been persisted but the L1 send call has not yet been issued. Once the job is QUEUED or beyond, Postmark already owns the message and we cannot recall it.

Behavior:

EmailJobService.cancelJob(jobId) reads the job row.
If status NEW: UPDATE status='CANCELLED' with idempotency guard WHERE job_id = ? AND status = 'NEW'. Returns the updated EmailJob.
Any other status: returns Result.failure(PreconditionFailed("status=<status>")) → HTTP 409.

No external systems are touched. No L2 / L1 calls. The cancellation is a pure DB transition.
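
The guarded UPDATE can be demonstrated with an in-memory SQLite database. This is a sketch, not the service's persistence code: the table name `email_job` and the function `cancel_job` are hypothetical, and the sketch folds "job not found" into 409, whereas a real service would likely distinguish it after reading the row.

```python
import sqlite3


def cancel_job(conn: sqlite3.Connection, job_id: str) -> int:
    """Return an HTTP-style status: 200 if cancelled, 409 otherwise."""
    cur = conn.execute(
        "UPDATE email_job SET status = 'CANCELLED' WHERE job_id = ? AND status = 'NEW'",
        (job_id,),
    )
    conn.commit()
    # rowcount is 0 when the job has already left NEW, so a concurrent send
    # that moved it to QUEUED wins and the cancel is rejected.
    return 200 if cur.rowcount == 1 else 409


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_job (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany("INSERT INTO email_job VALUES (?, ?)", [("j1", "NEW"), ("j2", "QUEUED")])

first = cancel_job(conn, "j1")     # NEW -> CANCELLED
second = cancel_job(conn, "j1")    # already CANCELLED, guard matches no row
third = cancel_job(conn, "j2")     # QUEUED cannot be cancelled
j1_status = conn.execute("SELECT status FROM email_job WHERE job_id = 'j1'").fetchone()[0]
```

Putting the status check in the WHERE clause means the check and the write happen in one statement, so no separate lock is needed for this transition.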

5.b — Resend a previously sent EmailJob

Endpoint: PUT /v1/shop-access/email/email-job/<jobId>/resend.

Allowed when the job is in status BOUNCED or FAILED. Creates a new EmailJob row referencing the original via originalJobId. The original row is left intact (audit trail).

Behavior:

EmailJobService.resendJob(jobId, overrides?) reads the original job.
If original status is BOUNCED or FAILED:
- Construct a new job spec from the original’s content + caller-supplied overrides (typically to / cc).
- Run the same flow as Scenario 2 (Create EmailJob → Resolve Configuration → Compose and Send) for the new job.
- The new job has originalJobId set to the original’s id; its lifecycle proceeds independently.
Otherwise: Result.failure(PreconditionFailed) → HTTP 409.

The configuration check at send-time is identical to Scenario 2: a non-UNLOCKED config will fail fast and (if PENDING_VERIFICATION) trigger Scenario 1b.3.
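
Constructing the new job from the original can be sketched as follows. The field names in `CONTENT_FIELDS`, the dictionary-shaped job, and `build_resend_job` are hypothetical; only `originalJobId` and the BOUNCED / FAILED precondition come from this document.

```python
import uuid
from typing import Dict, Optional

RESENDABLE = {"BOUNCED", "FAILED"}
# Hypothetical content fields carried over to the new job.
CONTENT_FIELDS = ("to", "cc", "reply_to", "subject", "body", "emailConfigurationId")


def build_resend_job(
    original: Dict[str, object],
    overrides: Optional[Dict[str, object]] = None,
) -> Optional[Dict[str, object]]:
    """Return a new job spec, or None when the original is not resendable (HTTP 409)."""
    if original["status"] not in RESENDABLE:
        return None
    new_job = {k: original[k] for k in CONTENT_FIELDS if k in original}
    new_job.update(overrides or {})
    new_job["job_id"] = str(uuid.uuid4())
    new_job["originalJobId"] = original["job_id"]
    new_job["status"] = "NEW"              # then proceeds through the Scenario 2 flow
    return new_job


# Usage: a bounced job is resent to a corrected address; a delivered job is rejected.
original = {
    "job_id": "job-1", "status": "BOUNCED", "to": "old@example.com",
    "subject": "Invoice", "diagnostic": "HardBounce",
}
resent = build_resend_job(original, {"to": "new@example.com"})
rejected = build_resend_job({"job_id": "job-2", "status": "DELIVERED"})
```

Copying only content fields keeps the original's diagnostic and delivery state out of the new row, and the original dictionary is never mutated.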

Scenario 6: EmailConfiguration admin operations — Lock and Unlock (narrative)

Endpoints:

PUT /v1/shop-access/email/email-configuration/<configId>/lock
PUT /v1/shop-access/email/email-configuration/<configId>/unlock

These are pure DB transitions used by CS / admin tooling to disable or re-enable email sending for a tenant configuration. No external systems are touched.

Behavior:

lock(configId): allowed only from UNLOCKED. UPDATE status='LOCKED' with guard WHERE config_id = ? AND status = 'UNLOCKED'. Other statuses return 409.
unlock(configId): allowed only from LOCKED. Symmetric.
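
The two admin transitions form a small symmetric table. The sketch below is illustrative: `ADMIN_TRANSITIONS` and `apply_admin_op` are hypothetical names, and in the real service the check would live in the guarded UPDATE rather than in application code.

```python
from typing import Dict, Tuple

# operation -> (required current status, resulting status)
ADMIN_TRANSITIONS: Dict[str, Tuple[str, str]] = {
    "lock": ("UNLOCKED", "LOCKED"),
    "unlock": ("LOCKED", "UNLOCKED"),
}


def apply_admin_op(status: str, op: str) -> Tuple[int, str]:
    """Return (http_status, resulting_status); 409 leaves the status unchanged."""
    required, target = ADMIN_TRANSITIONS[op]
    if status != required:
        return 409, status
    return 200, target


# Usage: lock succeeds from UNLOCKED only; unlock succeeds from LOCKED only.
locked = apply_admin_op("UNLOCKED", "lock")
relocked = apply_admin_op("LOCKED", "lock")
unlock_pending = apply_admin_op("PENDING_VERIFICATION", "unlock")
```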

Send-time interaction: EmailConfigurationService.getUnlockedConfiguration() checks status == UNLOCKED. A LOCKED configuration causes send-time precondition failure exactly like other non-UNLOCKED statuses (per Scenario 2’s else-branch). No DNS-verification kick-off happens for LOCKED (different from PENDING_VERIFICATION per Scenario 1b.3).

Race-condition note for downstream design: a lock that lands between L3’s status check and the actual Postmark send (i.e., during Scenario 2’s “Compose and Send” block) does not abort the in-flight send — the send proceeds with the already-decrypted token. At v1 concurrency this race window is small (sub-second) and the correctness impact is bounded (one email might be sent after a lock was issued). If this becomes a concern in v2, options include SELECT FOR UPDATE on the config row during send, or a generation-counter check at the L3 boundary. v1 accepts the small race as a known trade-off.
