Token Cipher Capability

The Token Cipher capability lets an L3 service encrypt and decrypt small secrets stored at rest — primarily per-tenant tokens whose plaintext must never leave L3 — under a versioned envelope that survives both algorithm rotation and key-material rotation without re-encrypting historical rows. It ships as a self-contained unit in cards.arda.common.lib.crypto, with no AWS SDK dependency: key material is supplied to the cipher by the caller from an ESO-projected source.

The capability provides three abstractions:

A1 — Encrypt to a versioned envelope. Given a UTF-8 info constant (the per-purpose HKDF salt, so that different consumers derive non-overlapping keys from the same source material), a MaterialRegistry, and a currentVersionId: UUID, encrypt arbitrary plaintext bytes into an envelope string a{N}.k{SM-VERSION-ID}:<base64url-payload>, where the payload is base64url without padding (see the on-disk shape below). The envelope captures both the algorithm version and the source-material version, so historical envelopes remain decryptable when either axis rotates.

A2 — Decrypt a versioned envelope. Given an envelope produced under any prior algorithm version that is still implemented and any source-material version still present in the caller’s registry, recover the original plaintext bytes. Decrypt is read-only against the registry.

A3 — HmacSHA256 wrapper. A small helper (Hmac) wraps javax.crypto.Mac with a uniform Result<T>-shaped API. The cipher uses it internally for HKDF derivation; two pre-existing JDK-Mac call sites in common-module (OpaqueId, S3AssetService) migrate to it for byte-identical behaviour.

The capability has four invariants the L3 service can rely on:

  • No application-side calls to AWS Secrets Manager. TokenCipher consults only the in-memory MaterialRegistry. The caller (not the cipher) populates the registry from a single ESO-projected JSON map of every live key-material version, and may mutate the registry at runtime in response to ESO refresh events.
  • Plaintext never logged or cached. Per DQ-206, the cipher logs nothing about plaintext, derives the AES key in-stack per call, and caches no derived keys. The registry holds source material (the 64-byte SM input), never derived keys.
  • Auth-tag failure is bug-class, not application-recoverable. AES-GCM tag-verification failure on decrypt indicates storage corruption, key-material desync, or active tampering. Surfaced as AppError.Internal.IncompatibleState so it pages on-call.
  • Unknown versionId on decrypt is bounded-transient. It is surfaced as AppError.Transient.FailoverFailed so that the existing transient-retry layer (e.g. an L3 service’s coroutine retry, a webhook re-delivery, a job re-enqueue) absorbs the bounded ESO propagation lag between AWS Secrets Manager and the pod’s projection.

The on-disk envelope shape is a{N}.k{SM-VERSION-ID}:<base64url-payload>:

  • a{N} — algorithm version, code-indexed; v1 ships only a1 (AES-256-GCM + HKDF-SHA256). Never retired; bumping N requires a release.
  • k{SM-VERSION-ID} — AWS Secrets Manager versionId (UUID) of the source material used at write time. Runtime-indexed via the MaterialRegistry.
  • <base64url-payload> — base64url-encoded IV(12) || ciphertext || auth-tag(16).

Malformed envelope shapes — missing :, missing ., missing k prefix, invalid UUID, invalid base64, too-short ciphertext — surface as AppError.Invocation.GeneralValidation. They are caller-input errors, not corruption.
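The shape checks listed above can be sketched as a small parser. This is a minimal illustration, not the shipped code: parseEnvelope and ParsedEnvelope are hypothetical names, IllegalArgumentException stands in for AppError.Invocation.GeneralValidation, and the rule that N is a positive integer is an assumption.

```kotlin
import java.util.Base64
import java.util.UUID

// Hypothetical holder for the three envelope fields; not the library's type.
class ParsedEnvelope(val algorithm: String, val versionId: UUID, val payload: ByteArray)

// Parses a{N}.k{UUID}:<base64url-payload>. Every malformed shape ends up as a
// failed Result (IllegalArgumentException here, GeneralValidation in the real API).
fun parseEnvelope(envelope: String): Result<ParsedEnvelope> = runCatching {
    val colon = envelope.indexOf(':')
    require(colon > 0) { "missing ':' separator" }
    val header = envelope.substring(0, colon)

    val dot = header.indexOf('.')
    require(dot > 0) { "missing '.' separator" }
    val algorithm = header.substring(0, dot)
    // Assumption: N is a positive decimal integer.
    require(Regex("a[1-9][0-9]*").matches(algorithm)) { "bad algorithm segment" }

    val keySegment = header.substring(dot + 1)
    require(keySegment.startsWith("k")) { "missing 'k' prefix" }
    // UUID.fromString throws IllegalArgumentException on an invalid UUID.
    val versionId = UUID.fromString(keySegment.substring(1))

    // The URL-safe decoder accepts unpadded input and throws on illegal characters.
    val payload = Base64.getUrlDecoder().decode(envelope.substring(colon + 1))
    require(payload.size >= 12 + 16) { "payload shorter than IV + auth tag" }

    ParsedEnvelope(algorithm, versionId, payload)
}
```

An unknown but well-formed a{N} passes this parser; rejecting it belongs to the algorithm dispatch step.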

The capability is one public package (lib/crypto/) containing three caller-facing types and two internal types behind a sealed-interface dispatch on a{N}.

PlantUML diagram

TokenCipher is a non-generic class with a private constructor and a companion operator fun invoke(info, materials, currentVersionId): Result<TokenCipher> factory. The factory validates that info is non-blank and that currentVersionId is present in the supplied MaterialRegistry; otherwise it returns Result.failure(AppError.Invocation.GeneralValidation). Once constructed, the cipher exposes encrypt(plaintext) and decrypt(envelope). Both only read the registry’s current contents: neither mutates the registry, and neither calls out to any external system.

MaterialRegistry is a thread-safe registry mapping UUID (SM versionId) to 64-byte source material, backed by a ConcurrentHashMap. of(initial) rejects an empty map and any value whose size is not 64 bytes; add(versionId, material) enforces the same length invariant on subsequent additions. get returns a defensive copy. The caller populates the registry at construction time with every live key-material version and may mutate it at runtime in response to ESO refresh events; TokenCipher itself is read-only against it.
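The registry contract described above can be illustrated with a small stand-in. SketchRegistry is a hypothetical name and its signatures are assumptions; the shipped MaterialRegistry may differ in return types and error mapping.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Illustrative stand-in for MaterialRegistry, not the shipped class.
class SketchRegistry private constructor(
    private val entries: ConcurrentHashMap<UUID, ByteArray>
) {
    // Rejects material that is not exactly 64 bytes and stores a private copy.
    fun add(versionId: UUID, material: ByteArray): Result<Unit> {
        if (material.size != MATERIAL_BYTES) {
            return Result.failure(IllegalArgumentException("material must be $MATERIAL_BYTES bytes"))
        }
        entries[versionId] = material.copyOf()
        return Result.success(Unit)
    }

    // Returns a defensive copy so callers cannot mutate the stored bytes.
    fun get(versionId: UUID): ByteArray? = entries[versionId]?.copyOf()

    fun contains(versionId: UUID): Boolean = entries.containsKey(versionId)

    companion object {
        val MATERIAL_BYTES = 64

        // Rejects an empty map and any value whose size is not 64 bytes.
        fun of(initial: Map<UUID, ByteArray>): Result<SketchRegistry> = runCatching {
            require(initial.isNotEmpty()) { "registry needs at least one version" }
            val registry = SketchRegistry(ConcurrentHashMap())
            initial.forEach { (id, material) -> registry.add(id, material).getOrThrow() }
            registry
        }
    }
}
```

Copying on both write and read means neither the caller's original array nor a returned array aliases the stored material.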

Hmac is a thin wrapper over javax.crypto.Mac for HmacSHA256 with a Result<T> API. Hmac.sha256(key) validates a non-empty key and returns Result<Hmac>; mac(input) returns the 32-byte tag as Result<ByteArray>. It is used internally by TokenCipher for the HKDF Extract + Expand steps, and externally by OpaqueId and S3AssetService (migrated from inline Mac.getInstance("HmacSHA256") for DRY and consistent error handling).
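A sketch of how an HmacSHA256 wrapper and the HKDF Extract + Expand steps (RFC 5869) fit together is shown here. SketchHmac and hkdf32 are hypothetical names. The text calls info a per-purpose salt; whether the shipped cipher feeds it to Extract as the salt or to Expand as the RFC 5869 info field is not stated, so this sketch takes both as parameters.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Minimal HmacSHA256 helper with a Result-shaped API (illustrative only).
class SketchHmac private constructor(private val key: ByteArray) {

    // Computes the 32-byte tag. A fresh Mac per call keeps the helper thread-safe.
    fun mac(input: ByteArray): Result<ByteArray> = runCatching {
        val mac = Mac.getInstance("HmacSHA256")
        mac.init(SecretKeySpec(key, "HmacSHA256"))
        mac.doFinal(input)
    }

    companion object {
        // Validates a non-empty key before constructing the helper.
        fun sha256(key: ByteArray): Result<SketchHmac> =
            if (key.isEmpty()) Result.failure(IllegalArgumentException("key must not be empty"))
            else Result.success(SketchHmac(key.copyOf()))
    }
}

// HKDF-SHA256 limited to one 32-byte output block, which is enough for an AES-256 key.
// Assumption: the caller decides what goes in `salt` and what goes in `info`.
fun hkdf32(material: ByteArray, salt: ByteArray, info: ByteArray): Result<ByteArray> = runCatching {
    // Extract: PRK = HMAC(salt, material)
    val prk = SketchHmac.sha256(salt).getOrThrow().mac(material).getOrThrow()
    // Expand: T(1) = HMAC(PRK, info || 0x01)
    SketchHmac.sha256(prk).getOrThrow().mac(info + byteArrayOf(0x01)).getOrThrow()
}
```

Because the derivation is deterministic and cheap, it can run on every call, which matches the invariant that no derived keys are cached.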

FE-4: EnvelopeAlgorithm (internal sealed interface)

Dispatch type for the a{N} axis. Each implementation declares its version string and provides encrypt(derivedKey, plaintext) / decrypt(derivedKey, ciphertext) over a 32-byte derived key. Adding a2 is a new object EnvelopeAlgorithmA2 : EnvelopeAlgorithm plus a single when arm in TokenCipher.decrypt — encrypt continues to use the current algorithm; decrypt remains backwards-compatible for as long as the old algorithm object is present in the package.
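The dispatch pattern can be sketched as follows. The names are hypothetical, the interface is reduced to the version string for brevity, and SketchA2 exists only to show where the extra when arm would go; none of this is the shipped internal API.

```kotlin
// Illustrative shapes only; the shipped interface also carries encrypt and decrypt.
sealed interface SketchAlgorithm {
    val version: String
}

object SketchA1 : SketchAlgorithm {
    override val version = "a1"
}

// Hypothetical future algorithm, shown only to illustrate the additional `when` arm.
object SketchA2 : SketchAlgorithm {
    override val version = "a2"
}

// Decrypt-side routing: each known prefix maps to its object.
// An unknown prefix is a validation failure (IllegalArgumentException in this sketch).
fun algorithmFor(prefix: String): Result<SketchAlgorithm> = when (prefix) {
    SketchA1.version -> Result.success(SketchA1)
    SketchA2.version -> Result.success(SketchA2)
    else -> Result.failure(IllegalArgumentException("unknown algorithm $prefix"))
}

// Encrypt side: new envelopes are always written with the newest algorithm.
val currentAlgorithm: SketchAlgorithm = SketchA2
```

Removing an object from the when makes its envelopes undecryptable, which is why retiring an algorithm needs a data drain first.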

FE-5: EnvelopeAlgorithmA1 (internal object)

V1 implementation: AES-256-GCM with a 12-byte IV (cryptographically random per encrypt), 16-byte (128-bit) authentication tag, and a 32-byte derived key. Validates derived-key length on both encrypt and decrypt. Raw output layout is IV(12) || ciphertext || tag(16). Auth-tag verification failure is mapped to AppError.Internal.IncompatibleState; too-short ciphertext (below the minimum IV + tag length) is mapped to AppError.Invocation.GeneralValidation (malformed shape — the cipher hasn’t been invoked yet).
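The raw layout above can be reproduced with the JDK's AES/GCM/NoPadding transformation, whose output already carries the tag appended to the ciphertext. This is a sketch under stated assumptions: gcmEncrypt and gcmDecrypt are hypothetical names, and plain JDK exceptions stand in for the AppError mapping (AEADBadTagException for IncompatibleState, IllegalArgumentException for GeneralValidation).

```kotlin
import java.security.SecureRandom
import javax.crypto.Cipher
import javax.crypto.spec.GCMParameterSpec
import javax.crypto.spec.SecretKeySpec

val IV_BYTES = 12
val TAG_BITS = 128
val secureRandom = SecureRandom()

// Returns IV(12) || ciphertext || tag(16), with a fresh random IV on every call.
fun gcmEncrypt(derivedKey: ByteArray, plaintext: ByteArray): ByteArray {
    require(derivedKey.size == 32) { "derived key must be 32 bytes" }
    val iv = ByteArray(IV_BYTES).also { secureRandom.nextBytes(it) }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, SecretKeySpec(derivedKey, "AES"), GCMParameterSpec(TAG_BITS, iv))
    return iv + cipher.doFinal(plaintext)
}

// Splits the IV off the front and verifies the tag while decrypting.
// A tag mismatch surfaces as javax.crypto.AEADBadTagException inside the Result.
fun gcmDecrypt(derivedKey: ByteArray, raw: ByteArray): Result<ByteArray> = runCatching {
    require(derivedKey.size == 32) { "derived key must be 32 bytes" }
    // Shape check happens before the cipher is touched.
    require(raw.size >= IV_BYTES + TAG_BITS / 8) { "input shorter than IV + tag" }
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    val spec = GCMParameterSpec(TAG_BITS, raw, 0, IV_BYTES)
    cipher.init(Cipher.DECRYPT_MODE, SecretKeySpec(derivedKey, "AES"), spec)
    cipher.doFinal(raw, IV_BYTES, raw.size - IV_BYTES)
}
```

Keeping the length check separate from the tag check is what lets the two failures map to different error classes.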

3.1 Encrypt — current version, fresh IV per call

PlantUML diagram

3.2 Decrypt — registry-resolved, no external call

PlantUML diagram

Until the operator decides to introduce a2, every envelope produced by the system carries the a1. prefix. When a2 lands as a new EnvelopeAlgorithm object:

  • Encrypt switches to a2 on the next deploy (the cipher uses the latest EnvelopeAlgorithm for new writes).
  • Decrypt routes by the a{N} axis: a1 envelopes continue to decrypt under EnvelopeAlgorithmA1; a2 envelopes decrypt under EnvelopeAlgorithmA2.
  • No re-encryption pass is required. Historical envelopes remain readable for as long as their algorithm object remains in the package; retiring a1 requires a coordinated data drain (out of scope for the cipher itself).

The deployed SM secret holds a JSON map of every live key-material version, projected to the pod as a single ESO mount. On rotation:

  1. Operator (or future Rotation Lambda — tracked separately, see PDEV-659) writes a new version to the JSON map and updates the current-version pointer.
  2. ESO projects the refreshed map into the pod’s mounted secret.
  3. The caller reads the refreshed map (file watcher, scheduled re-read, or pod restart — the choreography is the caller’s choice) and calls MaterialRegistry.add(versionId, material) for new entries and reconstructs TokenCipher if currentVersionId changes.
  4. New encrypts use the new current; old envelopes continue to decrypt because their materials remain in the registry until the operator explicitly drops them.
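Step 3 of the rotation sequence can be sketched as a small refresh function. Everything here is hypothetical: applyRefresh is not part of common-module, a plain ConcurrentHashMap stands in for MaterialRegistry, and the projected map is assumed to be already parsed from the ESO mount.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Hypothetical refresh step. `registry` stands in for MaterialRegistry and
// `projected` for the already-parsed ESO map; both are assumptions of this sketch.
// Returns true when the caller should rebuild TokenCipher for a new current version.
fun applyRefresh(
    registry: ConcurrentHashMap<UUID, ByteArray>,
    projected: Map<UUID, ByteArray>,
    projectedCurrent: UUID,
    activeCurrent: UUID
): Boolean {
    projected.forEach { (id, material) ->
        require(material.size == 64) { "material for $id must be 64 bytes" }
        // Only unseen versions are added; existing entries stay so old envelopes still decrypt.
        registry.putIfAbsent(id, material.copyOf())
    }
    // Rebuild only if the pointer moved and the new version is actually available.
    return projectedCurrent != activeCurrent && registry.containsKey(projectedCurrent)
}
```

The function never removes entries, mirroring the rule that material stays in the registry until the operator explicitly drops it.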

Between step 1 and step 2 above, AWS Secrets Manager holds the new version but the pod’s mount has not refreshed yet. If an L3 service tries to decrypt an envelope that references the new versionId, TokenCipher.decrypt returns Result.failure(AppError.Transient.FailoverFailed(...)) whose cause message names the missing version. The L3 caller’s existing transient-retry layer — Postmark webhook retries, outbound idempotency replays, L4 client retries — fires after timescales that exceed ESO’s reconciliation interval; the next attempt finds the registry refreshed and decrypt succeeds. FailoverFailed is the least-bad fit semantically (existing sealed Transient hierarchy with Aurora-failover-shaped names); a more specific subtype is intentionally not added to avoid the breaking change of extending the sealed hierarchy.
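For a caller that has no retry layer of its own, a bounded in-process retry could look like the sketch below. It is an illustration only: TransientFailure stands in for AppError.Transient.FailoverFailed, retryTransient is a hypothetical helper, and the attempt count and delay are placeholders rather than recommended values.

```kotlin
// Stand-in for AppError.Transient.FailoverFailed; the real type lives in common-module.
class TransientFailure(message: String) : RuntimeException(message)

// Retries `attempt` only while it fails with TransientFailure, up to `maxAttempts` tries.
// Any other failure, such as a bug-class error, is returned immediately without retrying.
fun <T> retryTransient(maxAttempts: Int, delayMillis: Long, attempt: () -> Result<T>): Result<T> {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var last = attempt()
    var tries = 1
    while (last.exceptionOrNull() is TransientFailure && tries < maxAttempts) {
        Thread.sleep(delayMillis)
        last = attempt()
        tries++
    }
    return last
}
```

The total wait (maxAttempts times delayMillis) would need to exceed the ESO reconciliation interval for the retry to observe the refreshed registry.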

| Condition on decrypt | Surfaced as | Behaviour |
| --- | --- | --- |
| Auth-tag mismatch | AppError.Internal.IncompatibleState | Bug-class. Pages on-call. Indicates corruption / desync / tampering. |
| Unknown versionId (registry miss) | AppError.Transient.FailoverFailed | Bounded transient. Caller’s existing retry absorbs. |
| Malformed envelope (missing : / . / k, bad UUID, bad base64, too-short ciphertext) | AppError.Invocation.GeneralValidation | Caller-input error. Surfaces to the caller of the caller. |
| Unknown algorithm a{N} | AppError.Invocation.GeneralValidation | Caller-input error (envelope carries an a{N} the deployment does not implement). |

| Test scope | What it asserts |
| --- | --- |
| TokenCipherTest | Factory: blank info → GeneralValidation; missing currentVersionId → GeneralValidation. Round-trips at 0/1/16/1024/65536 bytes. Material-version transition: envelope produced under one version decrypts when that version remains in the registry alongside a newer one. Auth-tag failure on a tampered base64 byte → IncompatibleState. Unknown versionId on decrypt → Transient.FailoverFailed whose message names the missing UUID. Malformed shape (missing :, missing ., invalid UUID, empty input) → GeneralValidation. |
| MaterialRegistryTest | of rejects empty map, rejects non-64-byte values, accepts valid maps. add rejects non-64-byte material. get returns defensive copies. contains reports membership accurately. |
| HmacTest | Empty-key rejection. RFC 4231 known-answer vector for HmacSHA256. Round-trips and deterministic output for fixed inputs. |
| OpaqueIdTest (existing) | After migration to Hmac.sha256, byte-identical output confirmed against canned inputs. |
| S3AssetServiceTest (existing) | After migration to Hmac.sha256, behaviour preserved. |

All paths relative to Arda-cards/common-module/.

| File | Role |
| --- | --- |
| lib/src/main/kotlin/cards/arda/common/lib/crypto/TokenCipher.kt | Public class + companion operator fun invoke(info, materials, currentVersionId) factory; envelope parsing; HKDF derivation; dispatch into EnvelopeAlgorithmA1. |
| lib/src/main/kotlin/cards/arda/common/lib/crypto/MaterialRegistry.kt | Thread-safe versionId → 64-byte material store; length-enforced of / add. |
| lib/src/main/kotlin/cards/arda/common/lib/crypto/Hmac.kt | HmacSHA256 wrapper over javax.crypto.Mac with Result<T> API. |
| lib/src/main/kotlin/cards/arda/common/lib/crypto/EnvelopeAlgorithm.kt | Internal sealed interface for the a{N} axis. |
| lib/src/main/kotlin/cards/arda/common/lib/crypto/EnvelopeAlgorithmA1.kt | Internal object; AES-256-GCM with 12-byte IV + 128-bit tag; auth-tag failure → IncompatibleState, too-short ciphertext → GeneralValidation. |
| lib/src/main/kotlin/cards/arda/common/lib/runtime/observability/OpaqueId.kt | Modified: inline Mac.getInstance("HmacSHA256") replaced by Hmac.sha256(...); byte-identical output. |
| lib/src/main/kotlin/cards/arda/common/lib/infra/storage/S3AssetService.kt | Modified: inline Mac.getInstance("HmacSHA256") replaced by Hmac.sha256(...); byte-identical output. |

| File | Scope |
| --- | --- |
| lib/src/test/kotlin/cards/arda/common/lib/crypto/TokenCipherTest.kt | Factory invariants; encrypt/decrypt round-trips across sizes; material-version transition; tampered envelope; unknown-versionId; malformed-shape variants. |
| lib/src/test/kotlin/cards/arda/common/lib/crypto/MaterialRegistryTest.kt | Constructor + add length invariant; get defensive copy; contains. |
| lib/src/test/kotlin/cards/arda/common/lib/crypto/HmacTest.kt | Empty-key rejection; RFC 4231 KAT; deterministic output. |

| Release | Subject | PR |
| --- | --- | --- |
| common-module 11.2.0 | TokenCipher + Hmac + MaterialRegistry in lib/crypto/; OpaqueId and S3AssetService migrated to Hmac. | #183 |

Caller responsibilities (not shipped by common-module)

The caller owns:

  • Key delivery. ESO ExternalSecret projecting a single JSON map of every live versionId → material into the pod.
  • Registry population. Parsing the projected map and calling MaterialRegistry.of(initial) (or add for runtime refresh).
  • Refresh choreography. File watch / scheduled re-read / pod-restart — whichever fits the deployment.
  • info constant. A per-purpose UTF-8 string (e.g. "arda.email.serverToken.a1") so different consumers derive non-overlapping AES keys from the same source material.
  • Failure handling. Mapping AppError.Transient.FailoverFailed to the consumer’s transient-retry layer; surfacing IncompatibleState to on-call alerting.

Rotation tooling — JSON-map schema, operator rotation script, AWS SM Rotation Lambda, and the disposition of the deployed EmailEncryptionKeyFallbackRole — is tracked separately as PDEV-659.