Design: Email Integration Phase 5a -- Component Library Updates
Overview
Phase 5a ships five additive helpers in common-module, consumed by the Phase 5b Email module. This design document covers four of them; the fifth, the idempotency helpers, has its own carved-out design. The four covered here are:
- AppError.Application (§ 1): a new third top-level branch under AppError, peer to Internal and Invocation. Three concrete subtypes (PreconditionFailed, PolicyRejected, ConflictingState) carry the “well-formed call, healthy system, application state doesn’t permit this operation” category. Today that category gets misclassified as Invocation.GeneralValidation or Internal.IncompatibleState.
- Internal.IncompatibleState reclassification sweep (§ 2): a methodology-driven sweep through the 62+ construction sites of Internal.IncompatibleState in common-module/lib/src/main. It classifies each site into one of three outcomes:
  - kept (genuine bug-class);
  - moved to Application.ConflictingState (recoverable application outcome);
  - moved to Invocation.GeneralValidation (caller error).
- sanitizeHeader (§ 3): value-cleaning primitive for inbound HTTP headers. It composes downstream of the existing HeadersAllowList (name-based observability scoping) and provides value-based hard-rejection, silent-drop, or cleaning for persistence.
- TokenCipher + Hmac (§ 4): application-layer encrypted-field primitive implementing the DQ-R1-019 two-axis envelope. A small Hmac helper accompanies it and DRYs two existing JDK-Mac call sites.
The idempotency helpers (§ 5) live in their own design document. They were a deep enough design exercise to warrant separation: a RawIdempotencyStore operating natively on JsonElement with a typed wrapper IdempotencyStore<Req, Res> produced by an inline fun typedAs() extension. Schema-evolution becomes a per-caller responsibility via the typed wrapper’s Json configuration.
The four helpers in §§ 1-4 are largely independent, so each section can be read on its own. The one shared foundation is § 1: § 2 applies the Application branch it introduces, and the idempotency design references Application.ConflictingState. Reviewers wanting a phased pass can take §§ 1-4 in order.
Decision Summary
| # | Decision | Choice |
|---|---|---|
| DQ-R1-019 | Per-partition email server-token encryption key | Two-axis envelope a{N}.k{SM-VERSION-ID}; SM-native versioning; MaterialRegistry populated from a single ESO-projected JSON map carrying every live key-material version. TokenCipher ships in common-module. |
| DQ-R1-027 | AppError.Application shape | sealed class Application with three concrete subtypes; reportable() = emptyList() at branch root; no HTTP-status hints on subtypes. |
| DQ-R1-028 | Sweep methodology and PR sequencing | Discovery-then-classify; sweep lands as the final Phase 5a PR; common-module scope only; major bump. |
| DQ-R1-029 | sanitizeHeader placement and shape | New lib/api/headers/ package; Result<String?> for accept / silent-drop / hard-reject; composes downstream of HeadersAllowList. |
| DQ-R1-030 | TokenCipher factory + decrypt-failure classification + Hmac DRY | companion operator fun invoke(info, materials, currentVersionId) returning Result<TokenCipher>; auth-tag failure -> Internal.IncompatibleState; unknown versionId -> Transient.FailoverFailed; Hmac extracted and shared. |
All Kotlin sketches in this document conform to the workspace kotlin-coding standards: Result<T> on every fallible method, single-exit, when over if, no !! / getOrThrow / getOrNull, DI for dependencies, @JvmInline value classes for primitive type-safety.
1. AppError.Application
1.1 Frame
The existing AppError hierarchy splits caller error from system error:
- Internal: bugs and operational signals (Implementation, Infrastructure, IncompatibleState, InternalService, InternalTimeout, Transient). reportable() returns listOf(this), so these page on-call (Transient is the exception; see § 4.7).
- Invocation: the caller’s fault (ArgumentValidation, NullArgument, NotFound, Duplicate, Authorization). Not bug-worthy; reportable() returns an empty list.
Neither captures the third real category: the call was well-formed, the system is healthy, but the application’s current state does not allow this operation right now. Today these get squeezed into:
- Invocation.GeneralValidation, which lies: the caller did nothing wrong, the application’s state did.
- Internal.IncompatibleState, which lies in the other direction: the system is fine, just not in the state the caller assumed, yet it pages on-call when it shouldn’t.
Per DQ-R1-027, AppError.Application is added as a third top-level branch with three concrete subtypes:
- PreconditionFailed: the operation requires prior state the system doesn’t have (“cannot send before tenant is verified”, “cannot ship before order is paid”).
- PolicyRejected: the operation is disallowed by policy (“tenant suspended”, “rate limit exceeded for partition”).
- ConflictingState: the operation lost a race or its expectation drifted (“expected status pending, found committed”, “version conflict on optimistic update”).
1.2 Public API
Extends the existing cards.arda.common.lib.lang.errors.AppError.kt:
```kotlin
package cards.arda.common.lib.lang.errors

// existing imports + existing AppError sealed hierarchy unchanged ...

sealed class AppError(...) : Throwable(...) {

    // ... existing Composite / Generic / Internal / Invocation branches ...

    /**
     * Expected application-domain outcomes that are neither caller errors nor
     * system bugs. The call was well-formed, the system is healthy, but the
     * application's current state does not allow this operation.
     *
     * [reportable] returns an empty list for every [Application] subtype:
     * these are not bug-worthy and must not page on-call.
     *
     * REST mapping is the single responsibility of the L4 mapping table
     * (typically `HttpErrorResponses.kt`); [Application] subtypes do NOT
     * carry HTTP-status hints.
     */
    sealed class Application(
        override val message: String,
        override val context: LazyMessage? = null,
        override val cause: Throwable? = null,
    ) : AppError(message, cause, context) {

        override fun reportable(): List<Throwable> = emptyList()

        // Subtypes are nested inside Application so they resolve as
        // AppError.Application.PreconditionFailed, and so on.

        /**
         * The operation requires prior state the system does not have.
         * Example: "cannot send notification before tenant email configuration is verified".
         */
        data class PreconditionFailed(
            override val message: String,
            override val context: LazyMessage? = null,
            override val cause: Throwable? = null,
        ) : Application(message, context, cause)

        /**
         * The operation is disallowed by policy.
         * Example: "tenant suspended", "rate limit exceeded for partition".
         */
        data class PolicyRejected(
            override val message: String,
            override val context: LazyMessage? = null,
            override val cause: Throwable? = null,
        ) : Application(message, context, cause)

        /**
         * The operation lost a race or its expectation drifted.
         * Example: "expected status `pending`, found `committed`".
         */
        data class ConflictingState(
            override val message: String,
            override val context: LazyMessage? = null,
            override val cause: Throwable? = null,
        ) : Application(message, context, cause)
    }
}
```

1.3 L4 mapping
The L4 mapping table in HttpErrorResponses.kt gains entries:
- Application.PreconditionFailed -> HTTP 409 Conflict (or 412 Precondition Failed, depending on the resource semantics).
- Application.PolicyRejected -> HTTP 403 Forbidden.
- Application.ConflictingState -> HTTP 409 Conflict.
The exact mapping is the L4 layer’s responsibility; AppError.Application subtypes do not carry HTTP-status hints. A future L4 dispatcher (gRPC, SQS) maps differently without changing the AppError types.
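The test plan requires that no Application subtype falls through to a default 500. One way to guarantee this is to write the L4 table as an exhaustive when with no else branch. The sketch below is minimal and rests on these assumptions:
- It uses simplified stand-in types (message only, no context or cause) instead of the real AppError.
- The function name httpStatusFor is hypothetical.
- PreconditionFailed is mapped to 409 here, although a resource may choose 412 as noted above.

```kotlin
// Simplified stand-ins for AppError.Application and its subtypes (message only).
sealed class Application(override val message: String) : Exception(message) {
    // Mirrors the design: the whole branch is non-reportable, so nothing pages on-call.
    fun reportable(): List<Throwable> = emptyList()

    class PreconditionFailed(message: String) : Application(message)
    class PolicyRejected(message: String) : Application(message)
    class ConflictingState(message: String) : Application(message)
}

// Hypothetical L4 mapping: an exhaustive `when` with no `else` branch.
fun httpStatusFor(error: Application): Int = when (error) {
    is Application.PreconditionFailed -> 409 // 412 is the alternative for conditional-request semantics
    is Application.PolicyRejected -> 403
    is Application.ConflictingState -> 409
}
```

Because the when has no else branch, adding a fourth Application subtype causes a compile error at this table rather than a silent 500.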
1.4 Test plan
| Surface | Test type | What it asserts |
|---|---|---|
| Application.PreconditionFailed ctor | Pure Kotlin | Construction with message, message + context, message + cause, and all three. |
| Application.PolicyRejected ctor | Pure Kotlin | Same shapes as PreconditionFailed. |
| Application.ConflictingState ctor | Pure Kotlin | Same shapes. |
| Application.reportable() | Pure Kotlin | All three subtypes return emptyList<Throwable>(). |
| AppErrorReportableTest.kt extension | Pure Kotlin | Existing tests pass; new tests in the same file cover the three subtypes. |
| L4 mapping (HttpErrorResponses.kt) | Pure Kotlin | Each subtype maps to the documented HTTP status; no subtype falls through to a default 500. |
2. Internal.IncompatibleState reclassification sweep
2.1 Frame
DQ-R1-027 introduces AppError.Application; this sweep applies it across the codebase. The 62+ existing construction sites of Internal.IncompatibleState in common-module/lib/src/main each need a per-site judgement.
Per DQ-R1-028, the sweep:
- Lands as the last Phase 5a PR (sequenced after the four Added-only helpers).
- Is the major-bump PR. Consumers that match on AppError.Internal (for example, in an exhaustive when) stop seeing the reclassified errors there, which makes this a Changed-category release.
- Covers common-module only. The matching sweep within operations is Phase 5b’s consumer adoption work.
2.2 Three buckets
Each site classifies into one of three buckets:
- Bucket A: keep as Internal.IncompatibleState. This is a genuine bug-class invariant violation: the system’s internal state contradicts an invariant the code expects. Examples:
  - a Persistence op finds an entity with two head versions when the bitemporal invariant forbids it;
  - a Universe finds an orphaned reference;
  - a StateEngine finds a transition the state graph doesn’t allow.
- Bucket B: move to Application.ConflictingState. This is a recoverable application outcome: the caller asked for an operation against an expected state, and the system’s current state disagrees, but neither side is buggy. Examples:
  - an optimistic-update version mismatch;
  - an idempotency-key replay that finds a different prior request body (the Mismatch case, though that is surfaced as an outcome, not an error);
  - a draft commit that finds the draft already discarded by another session.
- Bucket C: move to Invocation.GeneralValidation. The caller passed input that the system can detect as invalid. This is less common than the other two buckets, but it exists. Example: a state-transition request naming a transition that the state graph defines but that is not available from the current state (the caller knew the transitions and picked the wrong one).
2.3 Discovery and classification methodology
The sweep PR’s first task is discovery, not modification. Build the inventory of Internal.IncompatibleState call sites with a grep pass, then walk each:
```shell
# From the common-module worktree root; the sites live under lib/src/main.
mkdir -p scratch
grep -rn 'IncompatibleState(' lib/src/main/kotlin > scratch/incompatible-state-inventory.txt
```

For each site, the classifying engineer writes a one-line rationale next to the site number (in a scratch file, before any code change). The rationale answers two questions:
- What invariant is being checked? If the answer involves “the system’s own data shape” or “the code’s own contract”, that’s bucket A.
- Whose fault is the failure? If the caller asked for something against an expected state -> bucket B. If the system is internally inconsistent -> bucket A. If the caller passed input the call signature accepted but the body rejects -> bucket C.
The rationale per site is preserved in the PR description — the PR’s blast radius matters; reviewers need to see why each site moved (or didn’t).
2.4 Refactoring scope
Each reclassified site changes:
- The IncompatibleState(...) constructor call -> the appropriate replacement.
- The error message text may need adjustment. An Application.ConflictingState message reads more naturally as “Expected status pending, found committed” than the terser internal-error wording.
- The context and cause parameters carry forward unchanged.
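Here is a before/after sketch of one bucket-B site. The draft-commit check it uses is hypothetical: DraftStatus, commitBefore, and commitAfter are invented for illustration. The two AppError subtypes are replaced by simplified stand-ins that omit context. Only the error type and the message wording change; the condition being checked stays the same.

```kotlin
// Simplified stand-ins for the two AppError subtypes involved (context omitted).
class IncompatibleState(message: String, cause: Throwable? = null) : Exception(message, cause) {
    fun reportable(): List<Throwable> = listOf(this) // pages on-call
}

class ConflictingState(message: String, cause: Throwable? = null) : Exception(message, cause) {
    fun reportable(): List<Throwable> = emptyList() // does not page
}

// Hypothetical domain state for the example.
enum class DraftStatus { PENDING, COMMITTED, DISCARDED }

// Before: a draft discarded by another session was reported as an internal bug.
fun commitBefore(current: DraftStatus): Result<DraftStatus> = when (current) {
    DraftStatus.PENDING -> Result.success(DraftStatus.COMMITTED)
    DraftStatus.COMMITTED, DraftStatus.DISCARDED ->
        Result.failure(IncompatibleState("Draft state invariant violated: $current"))
}

// After (bucket B): same check, new type, message phrased as expected-vs-found.
fun commitAfter(current: DraftStatus): Result<DraftStatus> = when (current) {
    DraftStatus.PENDING -> Result.success(DraftStatus.COMMITTED)
    DraftStatus.COMMITTED, DraftStatus.DISCARDED ->
        Result.failure(ConflictingState("Expected status PENDING, found $current"))
}
```

The matching test update makes two changes. It flips the type assertion from IncompatibleState to ConflictingState. It also expects reportable() to be empty instead of listOf(error).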
The migrated test sites get assertion updates:
- Tests that asserted is Internal.IncompatibleState change to is Application.ConflictingState (or whichever bucket applies).
- Tests that asserted reportable() = listOf(this) change to reportable() = emptyList() for migrated sites (the on-call paging change).
2.5 Out-of-scope for this PR
- The operations IncompatibleState sweep: Phase 5b’s consumer adoption owns it.
- infrastructure and arda-frontend-app: not consumers of AppError; no impact.
3. sanitizeHeader
3.1 Frame
HeadersAllowList (existing, in lib/runtime/observability/) controls which headers are safe to log. sanitizeHeader controls which values are safe to read into business logic or persist. The two concerns are independent and compose at L4 inbound.
The composition pattern at L4 inbound has two stages. The allowlist filters by name first for observability scoping; sanitizeHeader then cleans or rejects by value before any survivor enters L3 / persistence.
3.2 Public API
New package cards.arda.common.lib.api.headers:
```kotlin
package cards.arda.common.lib.api.headers

import cards.arda.common.lib.lang.errors.AppError

/**
 * Clean an inbound HTTP header value for persistence or business-logic use.
 *
 * Returns:
 * - [Result.success] of [String] -- cleaned value, safe to persist or pass to L3.
 * - [Result.success] of `null`  -- header is policy-rejected; caller drops it
 *                                 silently (no error). Use for headers that
 *                                 the application doesn't accept but that
 *                                 common HTTP clients may emit (e.g.,
 *                                 opportunistic correlation headers).
 * - [Result.failure]            -- value violates a hard constraint (control
 *                                 characters, oversize, charset). Caller
 *                                 MUST reject the request.
 *
 * Hard-rejection categories return [AppError.Invocation.GeneralValidation]
 * with the offending header name in the message.
 */
fun sanitizeHeader(name: String, value: String): Result<String?> = ...
```

Cleaning rules (v1):
- Trim leading and trailing whitespace.
- Reject (Result.failure) if the trimmed value contains any of the following, with the exception of HTAB (U+0009):
  - a C0 control character (U+0000-U+001F);
  - DEL (U+007F);
  - a C1 control character (U+0080-U+009F).
- Reject (Result.failure) if the trimmed value’s length exceeds the per-header cap (default 1024 chars; configurable in v2+).
- Reject (Result.failure) if the value is not valid UTF-8.
- Otherwise return Result.success(cleanedValue).
The function signature returns Result<String?>; the null channel is reserved for the future silent-drop path. v1 only emits Result.success(cleanedValue) (accept) or Result.failure (hard-reject) — the null return is never produced by v1. Callers may flatten Result<String?> to Result<String> defensively in v1, or treat null as “drop silently” once v2+ activates the path.
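This is a minimal sketch of the v1 rules. It settles two points the rules above leave open:
- "Whitespace" is read as HTTP optional whitespace (space and HTAB). Kotlin's default String.trim() would also strip CR, LF, and a few other C0 characters from the edges, and so accept them silently.
- The value arrives as a Kotlin String (UTF-16), so "not valid UTF-8" is read as "cannot be encoded as UTF-8". In practice that means an unpaired surrogate.

GeneralValidation is a stand-in for AppError.Invocation.GeneralValidation, and MAX_HEADER_LENGTH is a hypothetical name for the v1 cap.

```kotlin
// Stand-in for AppError.Invocation.GeneralValidation.
class GeneralValidation(message: String) : Exception(message)

// v1 per-header cap (configurable only from v2+).
const val MAX_HEADER_LENGTH = 1024

// Assumption: "whitespace" means HTTP optional whitespace (SP and HTAB) only.
private fun Char.isOptionalWhitespace(): Boolean = this == ' ' || this == '\t'

// C0 (U+0000..U+001F), DEL (U+007F) and C1 (U+0080..U+009F), with HTAB exempt.
private fun Char.isForbiddenControl(): Boolean =
    this != '\t' && (this in '\u0000'..'\u001F' || this in '\u007F'..'\u009F')

fun sanitizeHeader(name: String, value: String): Result<String?> {
    val trimmed = value.trim { it.isOptionalWhitespace() }
    val problem: String? = when {
        trimmed.any { it.isForbiddenControl() } -> "contains control characters"
        trimmed.length > MAX_HEADER_LENGTH -> "exceeds $MAX_HEADER_LENGTH characters"
        // A UTF-16 String fails UTF-8 encoding only on unpaired surrogates.
        !Charsets.UTF_8.newEncoder().canEncode(trimmed) -> "is not valid UTF-8"
        else -> null
    }
    // v1 never returns success(null); that channel is reserved for silent-drop.
    return when (problem) {
        null -> Result.success(trimmed)
        else -> Result.failure(GeneralValidation("Header '$name' $problem"))
    }
}
```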
3.3 Composition with HeadersAllowList
HeadersAllowList continues to operate exactly as today (observability scoping; no changes). At L4 inbound, callers compose them:
```kotlin
// L4 inbound handler -- inside the transaction
val filtered = HeadersAllowList.filter(rawHeaders)
val cleaned: Result<Map<String, String>> = filtered.headers
    .toList()
    // Left fold: keeps header order and surfaces the first failing header.
    .fold(Result.success(emptyMap<String, String>())) { acc, (name, value) ->
        // flatMap is the workspace Result extension (not part of the Kotlin stdlib).
        acc.flatMap { soFar ->
            sanitizeHeader(name, value).map { cleanedValue ->
                // v1 never produces success(null); v2+ silent-drop collapses here
                cleanedValue?.let { soFar + (name to it) } ?: soFar
            }
        }
    }
// L4 maps a Result.failure to HTTP 400 via the standard error-mapping pipeline.
// cleaned.getOrElse { ... } is the L4 entry point's responsibility, not this helper's.
```

The two helpers are deliberately independent. HeadersAllowList is a Sentry-shaping concern; sanitizeHeader is a persistence-safety concern. Future work that needs only one (e.g., a non-HTTP transport adopting the value cleaning) reuses sanitizeHeader alone without dragging in observability defaults.
3.4 Test plan
| Surface | Test type | What it asserts |
|---|---|---|
| Happy path | Pure Kotlin | Plain ASCII values, mixed-case values, and values with leading or trailing whitespace all return Result.success(cleaned). |
| Control characters | Pure Kotlin | Embedded NUL, ESC, BEL, DEL all return Result.failure(AppError.Invocation.GeneralValidation). HTAB (0x09) is preserved. |
| Length cap | Pure Kotlin | 1024-char value accepted; 1025-char value rejected. |
| UTF-8 | Pure Kotlin | Valid multi-byte UTF-8 accepted; invalid byte sequences rejected. |
| Composition example | Pure Kotlin | A canned input map containing one allow-listed clean header, one allow-listed dirty header, one disallowed header passes through HeadersAllowList.filter + sanitizeHeader correctly. |
4. TokenCipher + Hmac
4.1 Frame
DQ-R1-019 pinned the per-partition encryption-key design:
- Two-axis envelope a{N}.k{SM-VERSION-ID}:<base64-payload>.
  - a{N}: algorithm version (v1 ships only a1, AES-256-GCM + HKDF-SHA256). Code-indexed; never retired.
  - k{SM-VERSION-ID}: AWS Secrets Manager versionId of the source key material. It is runtime-indexed via a single ExternalSecret mount, which projects a JSON map of every live key-material version into the in-memory MaterialRegistry. The cipher does not make any application-side calls to AWS Secrets Manager.
- HKDF-SHA256 derivation from a 64-byte SM input (DQ-203 in the application-layer set).
TokenCipher is the Phase 5a primitive implementing this envelope. Hmac is a small adjacent helper that DRYs two existing JDK-Mac call sites and serves as TokenCipher’s internal HKDF building block.
Per DQ-R1-030:
- Factory shape: companion operator fun invoke(...): Result<TokenCipher>. Constructor-shaped call site; Result<T> carries validation failures.
- Auth-tag failure classification: AppError.Internal.IncompatibleState. Bug-worthy; pages on-call.
- Unknown-versionId classification: AppError.Transient.FailoverFailed. Bounded transient; does not page (see § 4.7).
- Hmac extraction: shared between TokenCipher, OpaqueId.kt, and S3AssetService.kt.
4.2 Package layout
New package cards.arda.common.lib.crypto:
```text
crypto/
├── TokenCipher.kt          -- envelope cipher; public
├── Hmac.kt                 -- HmacSHA256 wrapper; public
├── EnvelopeAlgorithm.kt    -- internal interface for algorithm-version dispatch
└── EnvelopeAlgorithmA1.kt  -- internal v1 implementation
```

Hmac is exposed as a sibling helper because the two existing JDK-Mac call sites (OpaqueId and S3AssetService, both outside lib/crypto) need the same primitive. HKDF stays internal to TokenCipher in v1 (DT-003 deferred).
4.3 Hmac public API
```kotlin
package cards.arda.common.lib.crypto

import cards.arda.common.lib.lang.errors.AppError

/**
 * Thin wrapper over [javax.crypto.Mac] for HmacSHA256.
 *
 * Existing call sites that this wrapper replaces:
 * - cards.arda.common.lib.runtime.observability.OpaqueId (HMAC of the
 *   tenant identifier with the Sentry-scrub salt).
 * - cards.arda.common.lib.infra.storage.S3AssetService (HMAC of the
 *   S3 object key with a per-bucket secret).
 *
 * v1 exposes HmacSHA256 only. Other HMAC algorithms can be added as
 * additional public companion factories without breaking existing callers.
 */
class Hmac private constructor(...) {

    fun mac(input: ByteArray): Result<ByteArray> = ...

    companion object {
        /** SHA-256 HMAC keyed with [key]. Returns failure on empty key. */
        fun sha256(key: ByteArray): Result<Hmac> = ...
    }
}
```

4.4 TokenCipher public API
```kotlin
package cards.arda.common.lib.crypto

import cards.arda.common.lib.lang.errors.AppError
import java.util.UUID

/**
 * Application-layer encrypted-field primitive matching the DQ-R1-019
 * two-axis envelope: `a{N}.k{SM-VERSION-ID}:<base64-payload>`.
 *
 * The envelope's algorithm-version axis (`a{N}`) is code-indexed; v1 ships
 * only `a1` (AES-256-GCM + HKDF-SHA256). The material-version axis
 * (`k{SM-VERSION-ID}`) is runtime-indexed via the [MaterialRegistry]
 * supplied at construction time.
 */
class TokenCipher private constructor(
    private val info: String,
    private val materials: MaterialRegistry,
    private val currentVersionId: UUID,
    private val algorithms: EnvelopeAlgorithmRegistry,
) {

    fun encrypt(plaintext: ByteArray): Result<String> = ...

    fun decrypt(envelope: String): Result<ByteArray> = ...

    companion object {
        /**
         * Construct a [TokenCipher] for a given purpose.
         *
         * @param info             HKDF `info` constant (per-purpose; "email-server-token",
         *                         "card-claim-token", etc.). Non-empty.
         * @param materials        Registry mapping `versionId` -> 64-byte key material. Holds
         *                         every key-material version the cipher must decrypt against.
         *                         Pre-populated by the caller; may be mutated at runtime
         *                         (e.g., by a caller-managed file watcher reacting to ESO
         *                         refreshes). Must have at least one entry.
         * @param currentVersionId The `versionId` to use when encrypting new envelopes. Must
         *                         be present in [materials] at construction time.
         */
        operator fun invoke(
            info: String,
            materials: MaterialRegistry,
            currentVersionId: UUID,
        ): Result<TokenCipher> = ...
    }
}

/** Registry mapping `versionId` -> 64-byte key material. */
class MaterialRegistry private constructor(...) {
    fun get(versionId: UUID): ByteArray? = ...
    fun add(versionId: UUID, material: ByteArray): Result<Unit> = ...

    companion object {
        fun of(initial: Map<UUID, ByteArray>): Result<MaterialRegistry> = ...
    }
}
```

The cipher does not consult any external system at runtime, and the MaterialRegistry is the single source of key material. The caller populates the registry at construction time. Typically that caller is Phase 5b’s EmailConfigurationService, which parses a JSON map that ESO projects from the partition’s EmailEncryptionKey AWS Secrets Manager secret. The caller may mutate the registry at runtime in response to ESO refresh events (file watcher, periodic re-read); the cipher reads only what is currently in the registry.
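A refresh path (for example a file watcher) writes to the registry while decrypts read from it concurrently. Its internals therefore need to be safe for concurrent access. This is a minimal sketch built on a ConcurrentHashMap with defensive copies. Only the of / get / add shape comes from this design. IllegalArgumentException stands in for whichever AppError subtype the real helper returns, and MATERIAL_BYTES is a hypothetical constant name.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

/** Registry mapping `versionId` -> 64-byte key material (sketch). */
class MaterialRegistry private constructor(
    private val byVersion: ConcurrentHashMap<UUID, ByteArray>,
) {
    // Defensive copy: callers can never mutate the stored material.
    fun get(versionId: UUID): ByteArray? = byVersion[versionId]?.copyOf()

    // Called from the caller's refresh path when ESO projects a new version.
    fun add(versionId: UUID, material: ByteArray): Result<Unit> =
        checkSize(versionId, material).map {
            byVersion[versionId] = material.copyOf()
        }

    companion object {
        const val MATERIAL_BYTES = 64

        private fun checkSize(versionId: UUID, material: ByteArray): Result<Unit> =
            when (material.size) {
                MATERIAL_BYTES -> Result.success(Unit)
                else -> Result.failure(
                    IllegalArgumentException("Material $versionId is ${material.size} bytes; expected $MATERIAL_BYTES"),
                )
            }

        fun of(initial: Map<UUID, ByteArray>): Result<MaterialRegistry> {
            val wrongSize = initial.filterValues { it.size != MATERIAL_BYTES }.keys
            return when {
                wrongSize.isEmpty() -> Result.success(
                    MaterialRegistry(ConcurrentHashMap(initial.mapValues { it.value.copyOf() })),
                )
                else -> Result.failure(IllegalArgumentException("Versions $wrongSize are not $MATERIAL_BYTES bytes"))
            }
        }
    }
}
```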
4.5 Envelope format
The envelope is a single string composed of three segments. The algorithm version and the SM versionId are joined with .; the prefix and the base64 payload are joined with :. Example:
```text
a1.k01234567-89ab-cdef-0123-456789abcdef:<base64-of(IV || ciphertext || tag)>
```

| Segment | Example | Meaning |
|---|---|---|
| Algorithm version | a1 | Envelope-algorithm version. v1 ships only a1 (AES-256-GCM + HKDF-SHA256). |
| SM versionId | k01234567-89ab-cdef-0123-456789abcdef | AWS Secrets Manager versionId of the source material, k-prefixed. UUID format. |
| Payload | base64 of (IV \|\| ciphertext \|\| tag) | 12-byte IV, then the AES-256-GCM ciphertext, then the 16-byte auth tag; base64 without padding. |
Round-trip property: decrypt(encrypt(plaintext)) == plaintext for any plaintext, given the material referenced by the envelope’s versionId is reachable.
Parsing: split on :, then split the prefix on .. Reject anything that doesn’t match the expected three-segment shape with AppError.Invocation.GeneralValidation.
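This is a minimal parsing sketch that uses a strict regular expression for the three segments. Base64 never produces : or ., and a UUID contains neither, so the separators are unambiguous. The sketch rests on these assumptions:
- ParsedEnvelope and parseEnvelope are hypothetical names.
- GeneralValidation stands in for AppError.Invocation.GeneralValidation.
- The sketch checks shape only. This design does not settle which error an unknown algorithm version (say a2) maps to, so any a{N} is accepted and left to algorithm dispatch.

```kotlin
import java.util.Base64
import java.util.UUID

// Stand-in for AppError.Invocation.GeneralValidation.
class GeneralValidation(message: String) : Exception(message)

// Hypothetical parsed form of `a{N}.k{SM-VERSION-ID}:<base64-payload>`.
class ParsedEnvelope(val algorithm: String, val versionId: UUID, val payload: ByteArray)

// a{N} . k{UUID} : unpadded standard base64
private val ENVELOPE = Regex(
    """(a[1-9][0-9]*)\.k([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}):([A-Za-z0-9+/]+)""",
)

fun parseEnvelope(envelope: String): Result<ParsedEnvelope> {
    val match = ENVELOPE.matchEntire(envelope)
    return when (match) {
        null -> Result.failure(GeneralValidation("Malformed envelope: expected a{N}.k{versionId}:<base64>"))
        else -> runCatching {
            val (algorithm, version, payload) = match.destructured
            // Base64 decoding throws on, e.g., a dangling single character.
            ParsedEnvelope(algorithm, UUID.fromString(version), Base64.getDecoder().decode(payload))
        }.fold(
            onSuccess = { Result.success(it) },
            onFailure = { Result.failure<ParsedEnvelope>(GeneralValidation("Malformed envelope: ${it.message}")) },
        )
    }
}
```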
4.6 Key-derivation and encryption
Encryption:
- Take currentVersionId (supplied at construction; typically the versionId carrying the AWSCURRENT staging label) and look up its material in [MaterialRegistry].
- HKDF-SHA256 over (material, info, salt = empty) -> 32-byte AES key.
- Generate a random 12-byte IV.
- AES-256-GCM with the derived key, the IV, and the plaintext; output = 12-byte IV || ciphertext || 16-byte tag.
- Base64-encode the output (no padding) and prepend the a1.k{versionId}: prefix.
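The derivation step has one subtlety worth pinning down: salt = empty. RFC 5869 defines an absent salt as HashLen (32) zero bytes. An HMAC keyed with 32 zero bytes produces the same output as one keyed with an empty key. However, Hmac.sha256 rejects an empty key (§ 4.3), and javax.crypto.spec.SecretKeySpec refuses one too. The internal HKDF must therefore substitute the zero salt explicitly.

Below is a minimal HKDF-SHA256 sketch that uses javax.crypto.Mac directly rather than the Hmac wrapper. hkdfSha256 and its helpers are hypothetical names, not the internal API.

```kotlin
import java.io.ByteArrayOutputStream
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

private const val HMAC_SHA256 = "HmacSHA256"
private const val HASH_LEN = 32 // SHA-256 output size in bytes

private fun hmacSha256(key: ByteArray, data: ByteArray): ByteArray =
    Mac.getInstance(HMAC_SHA256).run {
        init(SecretKeySpec(key, HMAC_SHA256))
        doFinal(data)
    }

// HKDF-Extract: PRK = HMAC(salt, IKM); an absent salt is HashLen zero bytes (RFC 5869 section 2.2).
private fun extract(salt: ByteArray, ikm: ByteArray): ByteArray =
    hmacSha256(
        when {
            salt.isEmpty() -> ByteArray(HASH_LEN)
            else -> salt
        },
        ikm,
    )

// HKDF-Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated to length.
private fun expand(prk: ByteArray, info: ByteArray, length: Int): ByteArray {
    val okm = ByteArrayOutputStream(length)
    var block = ByteArray(0)
    var counter = 1
    while (okm.size() < length) {
        block = hmacSha256(prk, block + info + byteArrayOf(counter.toByte()))
        okm.write(block)
        counter++
    }
    return okm.toByteArray().copyOf(length)
}

/** HKDF-SHA256 (RFC 5869). For TokenCipher a1: length = 32, salt = empty. */
fun hkdfSha256(ikm: ByteArray, info: ByteArray, length: Int, salt: ByteArray = ByteArray(0)): Result<ByteArray> =
    when (length) {
        in 1..255 * HASH_LEN -> runCatching { expand(extract(salt, ikm), info, length) }
        else -> Result.failure(IllegalArgumentException("HKDF-SHA256 length must be in 1..${255 * HASH_LEN}: $length"))
    }
```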
Decryption:
- Parse the envelope; extract versionId and the base64 payload.
- Look up versionId in [MaterialRegistry]. If missing, return Result.failure(AppError.Transient.FailoverFailed(...)) (see § 4.7).
- HKDF-SHA256 over (material, info, salt = empty) -> 32-byte AES key.
- Split the payload into IV (12 bytes), ciphertext (variable), and tag (16 bytes).
- AES-256-GCM decrypt with the derived key. If the auth-tag check fails, return Result.failure(AppError.Internal.IncompatibleState(...)) (see § 4.7).
- Return Result.success(plaintext).
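The encryption and decryption steps, together with the § 4.7 failure split, fit in one small sketch of the a1 path. The sketch rests on these assumptions:
- A plain Map replaces MaterialRegistry.
- decrypt takes the already-parsed versionId and payload, since envelope parsing is the § 4.5 concern.
- Stand-in exception classes replace AppError.Internal.IncompatibleState and AppError.Transient.FailoverFailed.
- SketchCipher and deriveAesKey are hypothetical names.

For a 32-byte key, HKDF-Expand needs only its first block, T(1) = HMAC(PRK, info || 0x01), which is what deriveAesKey computes.

```kotlin
import java.security.SecureRandom
import java.util.Base64
import java.util.UUID
import javax.crypto.AEADBadTagException
import javax.crypto.Cipher
import javax.crypto.Mac
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

// Stand-ins for AppError.Internal.IncompatibleState and AppError.Transient.FailoverFailed.
class IncompatibleState(message: String, cause: Throwable? = null) : Exception(message, cause)
class FailoverFailed(cause: Throwable) : Exception(cause.message, cause)

private const val IV_BYTES = 12
private const val TAG_BITS = 128 // 16-byte GCM tag
private const val AES_GCM = "AES/GCM/NoPadding"

private fun hmacSha256(key: ByteArray, data: ByteArray): ByteArray =
    Mac.getInstance("HmacSHA256").run {
        init(SecretKeySpec(key, "HmacSHA256"))
        doFinal(data)
    }

// HKDF-SHA256, empty salt (= 32 zero bytes), 32-byte output: Extract, then one Expand block.
private fun deriveAesKey(material: ByteArray, info: String): SecretKeySpec {
    val prk = hmacSha256(ByteArray(32), material)
    return SecretKeySpec(hmacSha256(prk, info.toByteArray() + byteArrayOf(1)), "AES")
}

// Hypothetical a1-only cipher; a Map stands in for MaterialRegistry.
class SketchCipher(
    private val info: String,
    private val materials: Map<UUID, ByteArray>,
    private val currentVersionId: UUID,
) {
    private val random = SecureRandom()

    fun encrypt(plaintext: ByteArray): Result<String> = runCatching {
        val material = requireNotNull(materials[currentVersionId]) { "No material for $currentVersionId" }
        val iv = ByteArray(IV_BYTES).also { random.nextBytes(it) }
        val cipher = Cipher.getInstance(AES_GCM)
        cipher.init(Cipher.ENCRYPT_MODE, deriveAesKey(material, info), GCMParameterSpec(TAG_BITS, iv))
        // The JDK appends the tag to the ciphertext, so this is IV || ciphertext || tag.
        val payload = iv + cipher.doFinal(plaintext)
        "a1.k$currentVersionId:" + Base64.getEncoder().withoutPadding().encodeToString(payload)
    }

    fun decrypt(versionId: UUID, payload: ByteArray): Result<ByteArray> {
        val material = materials[versionId]
        return when (material) {
            // Unknown version: bounded transient, does not page.
            null -> Result.failure(
                FailoverFailed(IllegalStateException("Key material for version $versionId not present in registry")),
            )
            else -> runCatching {
                val cipher = Cipher.getInstance(AES_GCM)
                cipher.init(
                    Cipher.DECRYPT_MODE,
                    deriveAesKey(material, info),
                    GCMParameterSpec(TAG_BITS, payload, 0, IV_BYTES),
                )
                cipher.doFinal(payload, IV_BYTES, payload.size - IV_BYTES)
            }.fold(
                onSuccess = { Result.success(it) },
                onFailure = { e ->
                    // Auth-tag mismatch: bug-class, pages on-call. Anything else passes through.
                    Result.failure<ByteArray>(
                        when (e) {
                            is AEADBadTagException -> IncompatibleState("Auth-tag check failed for $versionId", e)
                            else -> e
                        },
                    )
                },
            )
        }
    }
}
```

On the JDK, a failed GCM tag check surfaces as javax.crypto.AEADBadTagException, and the sketch maps that to IncompatibleState. Every other decrypt exception passes through unchanged.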
4.7 Decrypt failure classification
Two failure modes on decrypt, classified into different AppError families because the operational response differs.
Auth-tag mismatch → Result.failure(AppError.Internal.IncompatibleState(...)). Genuinely bug-class:
- The versionId was found, so we’re using the correct key material.
- The auth tag failed, so one of three things happened: (a) the ciphertext was corrupted in storage, (b) the IV or tag bytes were truncated or shifted, or (c) the envelope was tampered with.
- None of these are normal operational outcomes. Auth-tag failure is rare; when it happens, an engineer needs to look.
- Internal.IncompatibleState’s reportable() returns listOf(this), so this surfaces in on-call alerting.
Unknown versionId → Result.failure(AppError.Transient.FailoverFailed(cause)) where cause is a synthetic Throwable whose message names the missing versionId (e.g. IllegalStateException("Key material for version $versionId not present in registry")). Bounded transient:
- The pod’s MaterialRegistry has not yet observed a version that the secret store knows about. In production this happens during the brief window between an ESO refresh of the projected JSON map and the application’s reaction to that refresh. Less commonly, it happens during pod startup before the first projection has been read.
- Existing retry layers fire after timescales that exceed ESO’s reconciliation interval. These layers are Postmark webhook delivery retries, the application’s outbound idempotency + retry on AppError.Transient.*, and L4 client retries on 5xx. By the time the retry attempt arrives, the registry has been refreshed and the decrypt succeeds.
- Transient’s reportable() returns an empty list, so this does NOT page on-call.
- Class-name caveat: AppError.Transient.FailoverFailed was originally defined for Aurora failover scenarios, so its name is observability noise here. Diagnostic information lives in the cause’s message and in structured logging at the catch site.
  - A more specific subtype (Transient.PropagationLag or similar) is intentionally NOT added. Adding to the sealed Transient hierarchy would force consumers’ exhaustive when blocks to update, a breaking change disproportionate to the observability gain.
  - Reusing the existing subtype is the deliberate trade-off.
If operational reality reveals spurious auth-tag failures (e.g., a real ESO sync producing brief windows of stale key material in a mount), the auth-tag classification can be revisited — but the starting position is bug-worthy.
4.8 Adjacent refactor — OpaqueId and S3AssetService
Two existing call sites inline the JDK-Mac dance for HmacSHA256:
```kotlin
// OpaqueId.kt:67
val mac = Mac.getInstance("HmacSHA256").apply { init(SecretKeySpec(salt, "HmacSHA256")) }

// S3AssetService.kt:143-144
Mac.getInstance("HmacSHA256")
    .apply { init(SecretKeySpec(key, "HmacSHA256")) }
```

Both migrate to:

```kotlin
// Stays inside the existing Result chain; no getOrThrow (flatMap is the workspace Result extension).
val macResult: Result<ByteArray> = Hmac.sha256(key).flatMap { hmac -> hmac.mac(input) }
```

The behaviour at both sites is byte-identical before and after; the migration is a private call-site refactor with no external API change. The PR’s CHANGELOG entry stays Added-only (the new helper is the addition; the migration is internal).
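This is a minimal sketch of the § 4.3 Hmac shape plus a migrated call site. It rests on these assumptions:
- A flatMap extension on Result exists. Kotlin's stdlib has none, but the workspace standards imply one.
- opaqueTag is a hypothetical call-site name.

The sketch creates a fresh javax.crypto.Mac per call because Mac instances are not thread-safe. If the real Hmac caches a Mac instead, calls on a shared instance would need to be serialised.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Sketch of the Hmac shape. A fresh Mac per call, since javax.crypto.Mac is not thread-safe.
class Hmac private constructor(private val key: SecretKeySpec) {

    fun mac(input: ByteArray): Result<ByteArray> = runCatching {
        Mac.getInstance(ALGORITHM).run {
            init(key)
            doFinal(input)
        }
    }

    companion object {
        private const val ALGORITHM = "HmacSHA256"

        /** SHA-256 HMAC keyed with [key]. Returns failure on empty key. */
        fun sha256(key: ByteArray): Result<Hmac> = when {
            key.isEmpty() -> Result.failure(IllegalArgumentException("HMAC key must not be empty"))
            else -> Result.success(Hmac(SecretKeySpec(key, ALGORITHM)))
        }
    }
}

// Assumed workspace extension: Kotlin's stdlib Result has no flatMap.
inline fun <T, R> Result<T>.flatMap(transform: (T) -> Result<R>): Result<R> =
    fold(onSuccess = transform, onFailure = { Result.failure(it) })

// Hypothetical migrated call site (e.g. OpaqueId): the failure stays in the Result chain.
fun opaqueTag(salt: ByteArray, input: ByteArray): Result<ByteArray> =
    Hmac.sha256(salt).flatMap { hmac -> hmac.mac(input) }
```

The wrapper performs the same Mac.getInstance / init / doFinal sequence as the inline code. Its output is therefore byte-identical to the inline version for the same key and input, and the regression tests rely on that.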
4.9 Test plan
| Surface | Test type | What it asserts |
|---|---|---|
| Hmac.sha256 ctor | Pure Kotlin | Empty key returns Result.failure; non-empty key returns Result.success. |
| Hmac.mac known answer | Pure Kotlin | Known-answer test against a canned (key, input, expected) triple. |
| TokenCipher.invoke ctor | Pure Kotlin | Empty info returns Result.failure; empty registry returns Result.failure; currentVersionId not in materials returns Result.failure; valid args return Result.success. |
| TokenCipher encrypt-decrypt round-trip | Pure Kotlin | Random plaintexts of various lengths (0, 1, 16, 1024, 65536 bytes) round-trip cleanly within the same MaterialRegistry. |
| Material-version transition | Pure Kotlin | After a rotation that adds versionId=B and makes it current, an envelope written with versionId=A still decrypts while A remains in the registry. |
| Tampered envelope | Pure Kotlin | Flipping a single bit of the decoded payload (IV, ciphertext, or tag) and re-encoding causes decrypt to return Result.failure(AppError.Internal.IncompatibleState). |
| Unknown versionId | Pure Kotlin | Envelope referencing a versionId not in the registry returns Result.failure(AppError.Transient.FailoverFailed) whose cause’s message names the missing version. |
| Bad envelope shape | Pure Kotlin | Envelopes missing the . or : separators return Result.failure(AppError.Invocation.GeneralValidation). |
| OpaqueId regression | Pure Kotlin | Existing OpaqueIdTest continues to pass post-migration. |
| S3AssetService regression | Pure Kotlin | Existing S3AssetService tests continue to pass post-migration. |
5. Idempotency helpers
Carved out into a separate design document due to depth and surface area. See idempotency-design.md for:
- API sketches (RawIdempotencyStore, IdempotencyStore<Req, Res>, IdempotencyStoreFactory, IdempotencyKeyMinter).
- Package layout (cards.arda.common.lib.runtime.idempotency).
- DB schema and concurrency strategy (INSERT ON CONFLICT + follow-up SELECT, no row locks).
- Canonical-JSON helper for stable hashing.
- Test plan (ContainerizedPostgres lifecycle, Mismatch detection, InFlight, purgeExpired).
Phase 5b L3 services consume the typed view (IdempotencyStore<EmailSendRequest, EmailJob>) and the IdempotencyKeyMinter for outbound Postmark retry safety. The idempotency_record Flyway migration ships in Phase 5b’s consumer adoption (operations), not in common-module.
6. Cross-cutting items
6.1 Workspace kotlin-coding compliance
All five helpers follow the workspace kotlin-coding standards:
- Every fallible method returns Result<T>; single-exit composition with flatMap / mapCatching.
- No !!, getOrThrow, getOrNull. Tests are the exception (Result<T>.getOrThrow() is permissible to surface unexpected failures as test failures).
- when over if for branching on type or status.
- DI for all external dependencies (Postgres connection, AWS SDK clients, JsonConfig).
- @JvmInline value classes for primitive type-safety (ConsumerNamespace, IdempotencyKey in the idempotency design). Hmac and TokenCipher are regular classes because they hold mutable internal state.
6.2 JsonConfig usage
JsonConfig.standardJson (at cards.arda.common.lib.lang.serialization.Json.kt) is the canonical Json instance for all kotlinx serialization in common-module. Phase 5a’s error_payload projection and the idempotency canonical-JSON helper use it directly. For canonical hashing (where prettyPrint = false is needed), refine via:
```kotlin
private val canonicalJson = JsonConfig.refine { prettyPrint = false }
```

JsonConfig.refine(block) returns a fresh Json instance with the standard configuration applied first, then block applied. See Json.kt:31.
6.3 GitHub Packages publishing
common-module publishes to the workspace’s GitHub Packages repository. The version bump recorded in each Phase 5a PR’s CHANGELOG entry becomes the published artefact version. Phase 5b’s gradle.properties bump consumes the published version once Phase 5a’s final PR has merged.
Consumer-side authentication uses the workspace GITHUB_TOKEN pattern (per workspace memory: GITHUB_TOKEN=$(gh auth token) npm install for npm; equivalent Gradle pattern for common-module).
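For reference, the Gradle side of that pattern typically looks like the following build.gradle.kts fragment. The fragment rests on these assumptions:
- OWNER/REPO is a placeholder for the workspace’s actual repository; GitHub Packages’ Maven endpoint has the form https://maven.pkg.github.com/OWNER/REPO.
- The username is supplied by reading GITHUB_ACTOR from the environment.

```kotlin
// build.gradle.kts (consumer). OWNER/REPO is a placeholder, not the real repository path.
repositories {
    mavenCentral()
    maven {
        url = uri("https://maven.pkg.github.com/OWNER/REPO")
        credentials {
            // Assumed environment variables, populated from the gh CLI locally or by CI.
            username = System.getenv("GITHUB_ACTOR")
            password = System.getenv("GITHUB_TOKEN")
        }
    }
}
```

Locally this pairs with GITHUB_ACTOR=$(gh api user --jq .login) GITHUB_TOKEN=$(gh auth token) ./gradlew build. The token needs the read:packages scope, which for a gh login may require gh auth refresh --scopes read:packages.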
6.4 Inherited constraints honoured
Phase 5a’s helpers must compose cleanly with constraints set in earlier rounds of the project’s design. The four below are not restated as decisions in this document (they were settled before Phase 5a started), but every helper in this design honours them:
- DQ-204: STS role chain for outbound AWS calls (decision-log, DQ-R1-020). TokenCipher makes no outbound AWS calls: MaterialRegistry is populated by the caller from an ESO-projected JSON map (see § 4.4), so common-module stays AWS-SDK-agnostic. AppError classification for STS-class failures (authorization-shaped) is the L1 / L2 caller’s responsibility in Phase 5b; Phase 5a’s Application.PolicyRejected is available for that classification but is not wired here.
- DQ-206: Outbound encryption-key handling (decision-log). Plaintext lives only on the call stack. TokenCipher.encrypt and TokenCipher.decrypt neither log nor cache plaintext; the MaterialRegistry caches derived key material keyed by SM versionId, never plaintext. The HKDF derivation runs on the stack per call, and its output keys are not cached.
- DQ-208: Async-tx boundaries (decision-log). L3 services own transactions; common-module helpers must not open or close transactions on the caller’s behalf. The IdempotencyStoreFactory’s inTransaction(connection) / inConnection(connection) / withTx(tx) shape (mirroring DatabaseBackedMap) binds the store to the caller’s transaction without owning it.
- Cross-Universe rule (information-model-design). Entities owned by different services must not share foreign keys or transactions. The shared idempotency_record table is partitioned by ConsumerNamespace (see idempotency-design.md § 3.1); the schema has no foreign keys to consumer-owned tables, and per-consumer rows never cross service boundaries.
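The DQ-208 "bind without owning" shape can be illustrated with a small model. The factory returns a store whose writes go through the caller's transaction handle, and the store never commits or rolls back. The Tx, InMemoryTx, BoundStore and StoreFactory types are hypothetical. Only the withTx name comes from the text above, and the real IdempotencyStoreFactory and DatabaseBackedMap signatures are not reproduced. The key prefix loosely echoes the ConsumerNamespace partitioning.

```kotlin
// Hypothetical transaction handle owned by the caller (an L3 service).
interface Tx {
    fun write(key: String, value: String)
    fun read(key: String): String?
}

// Hypothetical in-memory transaction: writes are staged until the owner commits.
class InMemoryTx(private val committed: MutableMap<String, String>) : Tx {
    private val staged = mutableMapOf<String, String>()
    var commits = 0
        private set

    override fun write(key: String, value: String) { staged[key] = value }
    override fun read(key: String): String? = staged[key] ?: committed[key]

    // Only the transaction's owner calls this; the bound store never does.
    fun commit() {
        committed.putAll(staged)
        staged.clear()
        commits++
    }
}

// A store bound to a transaction it does not own: no begin/commit/rollback here.
class BoundStore(private val tx: Tx, private val namespace: String) {
    fun put(key: String, value: String) = tx.write("$namespace/$key", value)
    fun get(key: String): String? = tx.read("$namespace/$key")
}

// Hypothetical factory mirroring the withTx(tx) shape named above.
class StoreFactory(private val namespace: String) {
    fun withTx(tx: Tx): BoundStore = BoundStore(tx, namespace)
}
```

The caller opens the transaction, uses the store inside it, and commits (or abandons) it itself. Until the caller commits, the store's writes are visible only through that transaction.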
7. PR sequencing
Five PRs land in common-module. Per DQ-R1-028, four are Added-only minors and one (the sweep) is Changed major. The sweep lands last so consumers absorb one combined gradle.properties bump.
| # | Deliverable | Source design | Release | Independence |
|---|---|---|---|---|
| 1 | AppError.Application introduction (the three subtypes + reportable() override) | § 1 | Added; 9.2.0 | No predecessors. PR #2 depends on this. |
| 3 | sanitizeHeader (lib/api/headers/) | § 3 | Added; 9.3.0 | No predecessors. Parallelisable with #1, #4, #5. |
| 4 | TokenCipher + Hmac (lib/crypto/) + OpaqueId / S3AssetService migration | § 4 | Added; 9.4.0 | No predecessors. Parallelisable with #1, #3, #5. |
| 5 | Idempotency helpers (lib/runtime/idempotency/) | idempotency-design.md | Added; 9.5.0 | No predecessors. Parallelisable with #1, #3, #4. |
| 2 | Internal.IncompatibleState reclassification sweep | § 2 | Changed; 10.0.0 | Requires PR #1 merged. Lands last. |
PRs #1, #3, #4, #5 are parallelisable in any order; the minor versions in the table assume they merge in the order listed. PR #2 is sequenced last so it lands as the major-bump consolidation.
PR-by-PR base: each PR opens off origin/main. There is no integration branch in common-module; the five PRs are separate contributions that merge in their own order, subject only to PR #2 following PR #1. The Phase 5b consumer adoption PR sees the cumulative effect via a single gradle.properties lift to 10.0.0.
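The release arithmetic can be checked mechanically. The sketch below assumes hypothetical Version and ReleaseKind types that encode the mapping stated above (Added → minor, Changed → major). It also assumes a hypothetical 9.1.0 starting point, because the pre-Phase-5a version is not stated here.

```kotlin
// Semantic version triple; a hypothetical helper for illustration only.
data class Version(val major: Int, val minor: Int, val patch: Int) {
    override fun toString() = "$major.$minor.$patch"
}

// CHANGELOG categories used by the Phase 5a PRs.
enum class ReleaseKind { ADDED, CHANGED }

// Added-only -> minor bump; Changed -> major bump (minor and patch reset).
fun Version.bump(kind: ReleaseKind): Version = when (kind) {
    ReleaseKind.ADDED -> Version(major, minor + 1, 0)
    ReleaseKind.CHANGED -> Version(major + 1, 0, 0)
}

// Applies PR releases in merge order and returns every published version.
fun publish(start: Version, merges: List<ReleaseKind>): List<Version> =
    merges.runningFold(start) { v, kind -> v.bump(kind) }.drop(1)
```

Because each Added PR bumps only the minor, the merge order among #1, #3, #4, #5 decides which PR receives which minor version. As long as the Changed PR lands last, the sequence still ends at 10.0.0.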
8. Risks and mitigations
| Risk | Mitigation |
|---|---|
| Adding AppError.Application as a sealed-class peer of Internal/Invocation causes source incompatibility for consumers doing exhaustive when over AppError. | This is a known cost of sealed-class additions, and the workspace already lives with it. The PR is Added-only because it adds a new branch; consumers extend their exhaustive when clauses when they adopt the release. (The sweep PR, which reclassifies sites into the new branch, is the Changed/major-bump release.) |
The IncompatibleState sweep mis-classifies a site (kept when it should have moved, or vice versa). | Per-site one-line rationale captured in the PR description; the sweep is reviewable as a checklist; mis-classifications are correctable in a follow-up PR (additive within the same release line). |
| TokenCipher auth-tag failures fire spuriously in production due to ESO sync gaps producing brief windows of stale key material. | Operational dashboards (Sentry on Internal.IncompatibleState) surface the failure rate; if it is non-trivial, the classification is revisited with operational data. The genuine “key not present yet” case is the Transient.FailoverFailed path (§ 4.7), not the auth-tag path; spurious ESO-sync windows should not produce auth-tag mismatches, because the material itself is intact when projected. |
The Hmac extraction breaks the OpaqueId / S3AssetService behaviour subtly. | Existing tests for both files continue to pass after the migration; known-answer tests in HmacTest.kt confirm byte-equivalent output for canned inputs. |
Phase 5b consumes the new helpers before common-module 10.0.0 is published. | Phase 5b’s implementation merge is gated on the publication. The Phase 5a release sequencing ensures 10.0.0 is the final release of Phase 5a; Phase 5b’s gradle.properties bumps to that exact version. |
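A known-answer test pins a helper to published vectors rather than to its own prior output. The sketch below assumes a hypothetical hmacSha256(key, message) signature; the real Hmac helper's API is defined in § 4. It is checked against RFC 4231 test case 2.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical shape of the Hmac helper: wraps the JDK Mac in a Result,
// following the workspace's Result<T> convention instead of throwing.
fun hmacSha256(key: ByteArray, message: ByteArray): Result<ByteArray> = runCatching {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(key, "HmacSHA256"))
    mac.doFinal(message)
}

// Lower-case hex encoding, two digits per byte.
fun ByteArray.toHex(): String = joinToString("") { "%02x".format(it.toInt() and 0xff) }

// RFC 4231, test case 2: key "Jefe", data "what do ya want for nothing?".
val rfc4231Tc2HmacSha256 = "5bdcc146bf60754e6a042426089575c75a003f089d2739839dec58b964ec3843"

// True when the helper reproduces the published vector byte for byte.
fun knownAnswerPasses(): Boolean =
    hmacSha256("Jefe".toByteArray(), "what do ya want for nothing?".toByteArray())
        .map { it.toHex() == rfc4231Tc2HmacSha256 }
        .getOrDefault(false)
```

Wrapping the JDK call in runCatching also converts failures, such as SecretKeySpec rejecting an empty key, into a failed Result instead of an exception.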
9. References
Documents
- goal.md: Phase 5a goal, success criteria, repository scope.
- task-plan.md: six-PR execution plan with worktree strategy.
- idempotency-design.md: carved-out idempotency-helpers design.
- Inherited decisions (DQ-201..208, DQ-012, DQ-R1-019) live in ../../decision-log.md; see § 6.4 above for the four constraints this design honours implicitly.
- decision-log.md: DQ-R1-027 through DQ-R1-031 for this phase; DQ-R1-019 for the encryption-envelope source.
Existing common-module references
- cards.arda.common.lib.lang.errors.AppError (AppError.kt): existing hierarchy that § 1 extends.
- cards.arda.common.lib.lang.serialization.JsonConfig (Json.kt): canonical Json instance; refine(block) helper at line 31.
- cards.arda.common.lib.runtime.observability.HeadersAllowList (HeadersAllowList.kt): composition counterpart for § 3.
- cards.arda.common.lib.runtime.observability.OpaqueId (OpaqueId.kt:67): HmacSHA256 migration target.
- cards.arda.common.lib.infra.storage.S3AssetService (S3AssetService.kt:143): second HmacSHA256 migration target.
- cards.arda.common.lib.persistence.keystore.DatabaseBackedMap (DatabaseBackedMap.kt): factory pattern (inTransaction / inConnection / withTx) that the idempotency factory mirrors.
Workspace standards
- kotlin-coding: Result<T>, single exit, when over if, no !! / getOrThrow / getOrNull.
- plantuml-guide: diagram conventions (validated; named colors; prose summary).
Copyright: © Arda Systems 2025-2026, All rights reserved