
Runbook: Email Encryption-Key Rotation

Author: Miguel Pinilla
Last Verified: not yet (the first rotation has not been performed in production)
Applies to: all four active partitions (dev, stage, demo, prod)

This runbook is the operator-facing procedure for rotating the EmailEncryptionKey AWS Secrets Manager secret that lives in each Application Runtime partition. The encryption key is the symmetric value the operations component’s TokenCipher uses to encrypt per-tenant Postmark server tokens before persisting them in the partition database (see DQ-R1-019 in the Email Integration decision log).

For the design of the two-axis envelope (a{N}.k{SM-VERSION-ID}), the dual ESO mounts (AWSCURRENT + AWSPREVIOUS), and the lazy + coroutine migration model, see DQ-R1-019. This runbook is the operator’s hands-on procedure; it assumes you already understand the design.

Rotate the encryption key when any of the following hold:

  • A scheduled rotation interval has elapsed (no formal interval is set today; expected to be every 90 days once the migration model is exercised at scale).
  • A suspected compromise of the key value (key material exposed in a log, an operator’s laptop disk image leaked, etc.).
  • A compliance-driven rotation requirement from an auditor or customer contract.

Routine “nothing’s wrong, just hygiene” rotations are expected once Phase 5b ships and the operations component is processing real tenant data. Before Phase 5b the key has no consumers, so a rotation is only a smoke test.

aws secretsmanager put-secret-value creates a new version of the SM secret. When the call does not pass --version-stages, Secrets Manager attaches AWSCURRENT to the new version and moves AWSPREVIOUS to the version that held AWSCURRENT before the call. (update-secret-version-stage never creates a version; it only moves labels between existing versions, which is why the rollback procedure uses it.) Both labels are projected into the partition’s Kubernetes namespace by the dual ExternalSecret resources defined in the operations Helm chart, so once ESO refreshes, all pods see both the current and previous key material with no restart required.
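The label bookkeeping can be pictured as a tiny state machine. The shell sketch below is a toy model for intuition only, not a Secrets Manager client: `sm_put_version`, `CURRENT`, `PREVIOUS`, and `DEPRECATED` are hypothetical names standing in for the staging labels, and the model ignores custom labels and version quotas.

```shell
# POSIX sh. Toy model (hypothetical names) of how put-secret-value moves staging labels.
# CURRENT / PREVIOUS hold the VersionId carrying AWSCURRENT / AWSPREVIOUS;
# DEPRECATED accumulates VersionIds left with no staging label at all.
CURRENT=""
PREVIOUS=""
DEPRECATED=""

sm_put_version() {
  new_id=$1
  # The version losing AWSPREVIOUS is left unlabeled: deprecated, reachable only
  # through the SDK fallback, and subject to deletion by Secrets Manager.
  if [ -n "$PREVIOUS" ]; then
    DEPRECATED="${DEPRECATED:+$DEPRECATED }$PREVIOUS"
  fi
  # AWSPREVIOUS follows AWSCURRENT off the old version...
  PREVIOUS=$CURRENT
  # ...and AWSCURRENT lands on the newly created version.
  CURRENT=$new_id
}

sm_put_version v1
sm_put_version v2
sm_put_version v3
printf 'AWSCURRENT=%s AWSPREVIOUS=%s deprecated=%s\n' "$CURRENT" "$PREVIOUS" "$DEPRECATED"
# prints: AWSCURRENT=v3 AWSPREVIOUS=v2 deprecated=v1
```

After three puts the first version carries no label, which is exactly the case the TokenCipher fallback path exists for.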

The operations component’s TokenCipher:

  • Encrypts new writes with AWSCURRENT and tags them a{N}.k{<new-versionId>}.
  • Decrypts reads by parsing the k{...} prefix and selecting the key version named there. If that version is still mounted (AWSCURRENT or AWSPREVIOUS), the key is already in memory; otherwise the cipher falls back to an SM SDK call, gated by the EmailEncryptionKeyFallbackRole, to fetch the older version (rare, and slower because of the extra API round trip).
  • A per-pod background coroutine mops up older versions: it re-encrypts tokens still stored under an older version with AWSCURRENT and persists the result, so traffic to older versions drains away over time.
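The read-side key choice in the bullets above reduces to a three-way comparison on the envelope’s version id. The sketch expresses that decision in POSIX shell purely as an illustration; `select_key_source`, `needs_reencrypt`, and the words `current`, `previous`, and `fallback` are hypothetical and are not the TokenCipher API.

```shell
# POSIX sh. Hypothetical illustration of TokenCipher's key selection on read.
# $1: VersionId parsed from the envelope's k{...} segment
# $2: VersionId currently projected from AWSCURRENT
# $3: VersionId currently projected from AWSPREVIOUS
select_key_source() {
  if [ "$1" = "$2" ]; then
    echo current      # in-memory key from the AWSCURRENT mount
  elif [ -n "$3" ] && [ "$1" = "$3" ]; then
    echo previous     # in-memory key from the AWSPREVIOUS mount
  else
    echo fallback     # SM SDK call via EmailEncryptionKeyFallbackRole (logs at WARN)
  fi
}

# Mop-up rule: anything not written under AWSCURRENT is a re-encryption candidate.
needs_reencrypt() {
  [ "$(select_key_source "$1" "$2" "$3")" != current ]
}
```

Keeping the previous mount means a freshly rotated partition serves every pre-rotation token from memory; only tokens two or more rotations old pay the fallback cost.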

The full migration model is documented in roadmap/completed/email-integration/4-runtime-platform-updates/design/email-server-key-encryption.md.

Before running the rotation:

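  • Credentials for the partition’s AWS account that can update the secret (confirm which identity you are using with aws sts get-caller-identity --profile <profile>). Required profile: Alpha002-Admin for dev / stage, Admin-Alpha1 for demo / prod.
  • Verify the SM secret exists: aws secretsmanager describe-secret --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey --profile <profile> returns the secret’s metadata (ARN, Name, VersionIdsToStages) rather than a ResourceNotFoundException, and the response has no DeletedDate (which would mean the secret is scheduled for deletion).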
  • Verify the current AWSCURRENT version: aws secretsmanager list-secret-version-ids --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey --profile <profile> shows at least one version with ["AWSCURRENT"] in VersionStages.
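  • Note the current AWSCURRENT VersionId (it goes into the rotation log) and, if the operations component is deployed, the sha256 digest of the projected email-encryption-key-current Secret (used in the post-rotation digest comparison).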
  • Phase 5b’s operations component is either:
    • not yet deployed (rotation is a no-op smoke test), or
    • deployed with dual ExternalSecret mounts for AWSCURRENT + AWSPREVIOUS active (the only safe state for a rotation that affects live traffic).
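The profile mapping and secret naming convention from the checklist can be captured in two small helpers so operators do not have to remember which profile belongs to which partition. These are hypothetical conveniences (`profile_for_partition` and `email_key_secret_id` are not part of any existing tooling), and the `{infrastructure}` value still has to be supplied by the caller.

```shell
# POSIX sh. Hypothetical helpers for the pre-rotation checklist.

# Map a partition to the AWS CLI profile named in the prerequisites.
profile_for_partition() {
  case "$1" in
    dev|stage) echo Alpha002-Admin ;;
    demo|prod) echo Admin-Alpha1 ;;
    *) echo "unknown partition: $1" >&2; return 1 ;;
  esac
}

# Build the SM secret id: {infrastructure}-{partition}-I-EmailEncryptionKey.
email_key_secret_id() {
  printf '%s-%s-I-EmailEncryptionKey\n' "$1" "$2"
}

# Example (needs AWS credentials, so left commented out):
# aws secretsmanager describe-secret \
#   --secret-id "$(email_key_secret_id <infrastructure> stage)" \
#   --profile "$(profile_for_partition stage)"
```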

1. Generate a new key value

Generate a fresh 64-character alphanumeric value matching the original GenerateSecretString shape (SecretStringTemplate: "{}", GenerateStringKey: "key", PasswordLength: 64, ExcludePunctuation: true):

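```shell
# tr -dc keeps only [A-Za-z0-9] (ExcludePunctuation) and also drops openssl's line breaks
NEW_KEY=$(openssl rand -base64 96 | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c 64)
[ "${#NEW_KEY}" -eq 64 ] || echo "unexpected key length: ${#NEW_KEY}" >&2
NEW_PAYLOAD=$(jq -n -c --arg key "$NEW_KEY" '{key: $key}')
# Validate the payload shape without echoing key material to the terminal
printf '%s' "$NEW_PAYLOAD" | jq -e '(keys == ["key"]) and ((.key | length) == 64)'
```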

The payload structure must match what CDK originally provisioned — a JSON object with a single key field — otherwise the operations component’s deserialiser will reject the new version.

2. Push the new version to AWS Secrets Manager

```shell
NEW_VERSION_ID=$(aws secretsmanager put-secret-value \
  --profile <profile> \
  --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey \
  --secret-string "$NEW_PAYLOAD" \
  --query VersionId --output text)
echo "$NEW_VERSION_ID"
```

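Capture the returned VersionId; that version now carries AWSCURRENT. Secrets Manager moves AWSPREVIOUS to the version that held AWSCURRENT before the call, and the version that previously held AWSPREVIOUS loses its label. From then on that oldest version is reachable only through SDK retrieval via the EmailEncryptionKeyFallbackRole chain, and because it has no staging label, Secrets Manager treats it as deprecated and subject to deletion.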
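To confirm the label move without reading raw JSON, the list-secret-version-ids output can be flattened to one `VersionId<TAB>labels` row per version and checked mechanically. A minimal sketch: `check_rotation_stages` is a hypothetical helper, and the JMESPath query in its comment is an assumed way of producing its input.

```shell
# POSIX sh. Hypothetical post-put check. Reads "VersionId<TAB>comma,separated,labels"
# rows on stdin, for example produced by:
#   aws secretsmanager list-secret-version-ids \
#     --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey \
#     --query "Versions[].[VersionId, join(',', VersionStages)]" \
#     --output text --profile <profile>
# $1: VersionId that held AWSCURRENT before the rotation
# $2: VersionId returned by put-secret-value
check_rotation_stages() {
  old_id=$1
  new_id=$2
  ok_new=no
  ok_old=no
  while IFS="$(printf '\t')" read -r id labels; do
    case ",$labels," in
      *,AWSCURRENT,*) [ "$id" = "$new_id" ] && ok_new=yes ;;
    esac
    case ",$labels," in
      *,AWSPREVIOUS,*) [ "$id" = "$old_id" ] && ok_old=yes ;;
    esac
  done
  # Succeed only if the new version is AWSCURRENT and the old one AWSPREVIOUS.
  [ "$ok_new" = yes ] && [ "$ok_old" = yes ]
}
```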

3. Refresh the ExternalSecret mounts

If the operations component is deployed:

```shell
# Force-reconcile both ExternalSecret resources so the new version reaches the pods promptly
kubectl annotate externalsecret email-encryption-key-current \
  --namespace <partition-namespace> \
  force-sync=$(date +%s) --overwrite
kubectl annotate externalsecret email-encryption-key-previous \
  --namespace <partition-namespace> \
  force-sync=$(date +%s) --overwrite
# Digest the key held in each projected Kubernetes Secret (compare digests; never print the key)
kubectl get secret email-encryption-key-current --namespace <partition-namespace> \
  -o jsonpath='{.data.key}' | base64 -d | sha256sum
kubectl get secret email-encryption-key-previous --namespace <partition-namespace> \
  -o jsonpath='{.data.key}' | base64 -d | sha256sum
```

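The current digest should differ from the one you captured before the rotation, and the previous digest should equal that pre-rotation current digest. ESO reconciles asynchronously, so if current has not changed yet, wait briefly and re-run the check.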
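The comparison rule is easy to get backwards under pressure, so it can help to script it. A minimal sketch, assuming you saved the pre-rotation digest of email-encryption-key-current before pushing the new version; `key_digest` and `verify_rotation_digests` are hypothetical helpers, and `key_digest` takes the base64 `.data.key` value exactly as Kubernetes stores it.

```shell
# POSIX sh. Hypothetical helpers for the post-rotation digest check.

# sha256 hex digest of a base64-encoded Secret value, without echoing the key.
key_digest() {
  printf '%s' "$1" | base64 -d | sha256sum | cut -d ' ' -f 1
}

# $1: digest of "current" captured before the rotation
# $2: digest of "current" after the rotation
# $3: digest of "previous" after the rotation
verify_rotation_digests() {
  if [ "$2" = "$1" ]; then
    echo "current still matches the pre-rotation key; ESO may not have synced yet" >&2
    return 1
  fi
  if [ "$3" != "$1" ]; then
    echo "previous does not hold the pre-rotation key" >&2
    return 1
  fi
  echo "digests OK"
}

# Example input (needs cluster access, so left commented out):
# key_digest "$(kubectl get secret email-encryption-key-current \
#   --namespace <partition-namespace> -o jsonpath='{.data.key}')"
```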

4. Verify end-to-end

If the operations component is deployed, run an end-to-end write and read for a non-customer-facing tenant in the partition (a test tenant maintained for this purpose). Confirm that the new write’s envelope prefix is a{N}.k{<new-versionId>} and that subsequent reads succeed without invoking the SDK fallback path (which logs at WARN).
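For the envelope check, a small helper can pull the version id out of a stored envelope and compare it with the VersionId returned by put-secret-value. This sketch assumes a concrete layout the runbook does not spell out: the literal text `a<N>.k<versionId>.` followed by ciphertext (for example `a1.k<uuid>.<ciphertext>`). Check the real format in DQ-R1-019 before relying on it; `envelope_version_id` and `envelope_uses_version` are hypothetical.

```shell
# POSIX sh. Hypothetical parser for an envelope ASSUMED to look like
# a<N>.k<versionId>.<ciphertext>. Prints the versionId or fails.
envelope_version_id() {
  envelope=$1
  case "$envelope" in
    a[0-9]*.k*.*) ;;
    *) return 1 ;;
  esac
  rest=${envelope#*.k}    # strip the leading "a<N>.k"
  id=${rest%%.*}          # keep everything before the next "."
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}

# Succeeds when the envelope was written under the expected VersionId.
envelope_uses_version() {
  [ "$(envelope_version_id "$1")" = "$2" ]
}
```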

5. Record the rotation

Append a row to the partition’s rotation log (location TBD as part of the Run-7 docs; likely the partition’s verification sign-off table or a dedicated rotations.md):

| Date | Partition | Operator | Previous VersionId | New VersionId | Reason |
|------|-----------|----------|--------------------|---------------|--------|

If the rotation introduces a problem (decryption failures on previously-written tokens, ESO not picking up the new version, etc.):

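  1. Do NOT push a third version. Secrets Manager already keeps the known-good version under the AWSPREVIOUS label; promoting it back is what you want. A third put-secret-value would move AWSPREVIOUS onto the bad version and leave the known-good version unlabeled, so every token written before the rotation would need the SDK fallback.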
  2. Promote AWSPREVIOUS back to AWSCURRENT:
    ```shell
    PREV_VERSION_ID=$(aws secretsmanager list-secret-version-ids \
      --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey \
      --query "Versions[?VersionStages[?contains(@, 'AWSPREVIOUS')]].VersionId | [0]" \
      --output text \
      --profile <profile>)
    BAD_VERSION_ID=$(aws secretsmanager list-secret-version-ids \
      --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey \
      --query "Versions[?VersionStages[?contains(@, 'AWSCURRENT')]].VersionId | [0]" \
      --output text \
      --profile <profile>)
    # Text output prints "None" when nothing matched; only move the label if both lookups succeeded
    [ "$PREV_VERSION_ID" != None ] && [ "$BAD_VERSION_ID" != None ] && \
    aws secretsmanager update-secret-version-stage \
      --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey \
      --version-stage AWSCURRENT \
      --move-to-version-id "$PREV_VERSION_ID" \
      --remove-from-version-id "$BAD_VERSION_ID" \
      --profile <profile>
    ```
    The --remove-from-version-id flag is required: AWS Secrets Manager refuses to move AWSCURRENT to $PREV_VERSION_ID while the label is still attached to $BAD_VERSION_ID, so the call has to name both ends of the move explicitly.
  3. Force-reconcile the ExternalSecret resources again. Confirm the digests have swapped.
  4. Diagnose the root cause of the original rotation failure before re-attempting.

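After the rollback, the bad version carries AWSPREVIOUS, so it stays mounted through the email-encryption-key-previous ExternalSecret and anything written under it remains readable from memory. The next rotation moves AWSPREVIOUS off it; from then on it has no staging label, is reachable only through the fallback role, and is treated by Secrets Manager as deprecated and subject to deletion. (Secrets Manager has no per-secret setting that ages versions out after a fixed number of days.)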
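The label state described above follows from how update-secret-version-stage behaves: whenever AWSCURRENT moves, Secrets Manager moves AWSPREVIOUS to the version AWSCURRENT was removed from. A toy model of the rollback step, with hypothetical names (`sm_move_current`, `CURRENT`, `PREVIOUS`) and none of the real API’s error handling:

```shell
# POSIX sh. Toy model (hypothetical names) of moving AWSCURRENT during a rollback.
# Starting state right after the bad rotation:
CURRENT=bad-version
PREVIOUS=good-version

# Move AWSCURRENT to $1, removing it from $2 (mirrors --move-to-version-id and
# --remove-from-version-id). AWSPREVIOUS follows to the version AWSCURRENT left.
sm_move_current() {
  if [ "$2" != "$CURRENT" ]; then
    # This model refuses when the label is not where the caller says it is.
    echo "AWSCURRENT is not on $2" >&2
    return 1
  fi
  PREVIOUS=$2
  CURRENT=$1
}

sm_move_current good-version bad-version
printf 'AWSCURRENT=%s AWSPREVIOUS=%s\n' "$CURRENT" "$PREVIOUS"
# prints: AWSCURRENT=good-version AWSPREVIOUS=bad-version
```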

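Bounding how many old key versions stay retrievable reduces the blast radius of a leak, but Secrets Manager has no per-secret retention setting. A version that still carries a staging label is kept; a version with no staging label is deprecated and subject to deletion by Secrets Manager on its own schedule. The most this runbook does is record the intended policy in the secret’s description: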

```shell
aws secretsmanager update-secret \
  --profile <profile> \
  --secret-id {infrastructure}-{partition}-I-EmailEncryptionKey \
  --description "AWSCURRENT + AWSPREVIOUS only; older versions purged after 30 days"
```

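(The description is documentation only; Secrets Manager does not enforce anything written there, including the 30-day figure. There is also no API call that deletes a single secret version, so a strict purge window cannot be enforced from the account side. Designing anything beyond the built-in deprecation behaviour is out of scope for this runbook.)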

  • DQ-R1-019 (decision log) — the two-axis envelope and ESO dual-mount design.
  • DQ-R1-024 (decision log) — why the secret is provisioned via CFN-native GenerateSecretString rather than a Custom Resource.
  • Partition Mail Topology — where the encryption-key secret sits in the broader Phase-4 surface.
  • Email Server Key Encryption design — full envelope + migration model.
  • partition-email.ts — the CDK stack that provisions the secret.

Copyright: (c) Arda Systems 2025-2026, All rights reserved