Phase 1 -- Implementation Learnings
Substantive insights from Phase 1 implementation that future phases (and future operator walkthroughs) should benefit from. Each learning ties back to a concrete moment in the walkthrough and an artefact that captures it.
L-1: Postmark GET /servers requires both count AND offset
The first run of tools/drift-check.ts against the live Postmark accounts returned HTTP 422 for both PostmarkProd and PostmarkNonProd. The error body revealed {"ErrorCode": 600, "Message": "Parameter 'offset' is required but has been left out"}. The Postmark API rejects the call when either query parameter is missing.
The drift-check code originally sent ?count=1. Corrected to ?count=1&offset=0. The fix is in PR #446 commit 691ba1d; the API observations note (postmark-api-observations.md) on PR #69 was updated in the same patch to record the requirement explicitly.
Take-away: read API help-doc statements like “paginated via count/offset” as “both parameters required” rather than “either or both”. When in doubt, probe a known-good token with curl to capture the actual error body before debugging client code.
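A minimal sketch of the corrected request URL, assuming the public Postmark API host https://api.postmarkapp.com. The helper name buildServersUrl is hypothetical and is not the code in tools/drift-check.ts; it only illustrates that requiring both arguments in the signature makes it hard to omit offset by accident.

```typescript
// Hypothetical helper, for illustration only.
// Assumption: the Postmark API is served from this host.
const POSTMARK_API_BASE = "https://api.postmarkapp.com";

// Build the GET /servers URL. Both pagination parameters are mandatory
// arguments because the API answered HTTP 422 when offset was left out.
function buildServersUrl(count: number, offset: number): string {
  if (Math.floor(count) !== count || count < 1) {
    throw new RangeError("count must be a positive integer");
  }
  if (Math.floor(offset) !== offset || offset < 0) {
    throw new RangeError("offset must be a non-negative integer");
  }
  return `${POSTMARK_API_BASE}/servers?count=${count}&offset=${offset}`;
}

// The smallest possible probe: one server, starting at the first record.
const serversProbeUrl = buildServersUrl(1, 0);
```

A unit test can then assert that the resulting query string is exactly count=1&offset=0, which keeps the requirement from regressing silently.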
L-2: 1Password items with duplicate names break SDK resolution
When the operator examined the Arda-SystemsOAM vault, two items had the title Postmark-Prod (one with the bearer-token glyph, one with the AWS-style “P” glyph from a Postmark sign-in template). The SDK’s secret-reference resolution returned "more than one item matched the secret reference query" rather than picking one.
The disambiguation is operator-driven: the right policy is one canonical item per typed reference, with duplicates either renamed or deleted. The runbook documents this; the drift-check’s named-failure diagnostic (error resolving secret reference: more than one item matched) makes the condition diagnosable at a glance.
Take-away: the typed op:// reference encodes the contract; if 1Password contains more than one item matching the title, the SDK is correct to refuse rather than pick. Discovered cleanly because drift-check probes every reference declared in platform/one-password.ts.
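The duplicate-title condition can be checked before any reference is resolved. The sketch below assumes the item titles of a vault have already been listed by some means; the VaultItem shape and the function name are hypothetical and do not represent the 1Password SDK's types.

```typescript
// Hypothetical shape: only the title matters for this check.
interface VaultItem {
  id: string;
  title: string;
}

// Return every title carried by more than one item, sorted for stable output.
function findDuplicateTitles(items: readonly VaultItem[]): string[] {
  const counts = new Map<string, number>();
  for (const item of items) {
    counts.set(item.title, (counts.get(item.title) ?? 0) + 1);
  }
  const duplicates: string[] = [];
  counts.forEach((count, title) => {
    if (count > 1) {
      duplicates.push(title);
    }
  });
  return duplicates.sort();
}
```

An empty result means each title maps to exactly one item, which is the "one canonical item per typed reference" policy described above.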
L-3: 1Password DesktopAuth requires a one-time authorization in the desktop app
The first run of tools/drift-check.ts (after npm install brought in @1password/sdk@0.4.0) returned Denied authorization for SDK client. The integration name (arda-infrastructure-drift-check) was unknown to the desktop app and the SDK’s request was rejected silently.
Resolution: re-running with the 1Password desktop app focused triggered the standard “allow this SDK to access the vault?” prompt; once accepted, subsequent runs worked without further interaction.
Take-away: when a new SDK integration name is introduced, the first run on each operator’s workstation needs a one-time approval. The runbook’s pre-flight checklist should mention this so future operators know to expect (and accept) the prompt.
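One way to make first-run failures self-explanatory is to map the observed error text to an operator hint. This is a sketch only: the message fragments are the ones quoted in this document, while the function name, the category names, and the hint wording are hypothetical and not taken from tools/drift-check.ts.

```typescript
type Diagnosis = "duplicate-item" | "authorization-needed" | "unknown";

// Classify a resolution error by the message fragments seen in the walkthrough.
function diagnoseResolutionError(message: string): Diagnosis {
  const text = message.toLowerCase();
  if (text.includes("more than one item matched")) {
    return "duplicate-item";
  }
  if (text.includes("denied authorization")) {
    return "authorization-needed";
  }
  return "unknown";
}

// Hypothetical operator-facing hints for each category.
const OPERATOR_HINTS: Record<Diagnosis, string> = {
  "duplicate-item": "Rename or delete duplicate items so each title is unique in the vault.",
  "authorization-needed": "Focus the 1Password desktop app, re-run, and accept the authorization prompt.",
  "unknown": "Inspect the raw error message.",
};
```

Matching on message text is brittle if the SDK changes its wording, so the fallback category deliberately points the operator back at the raw message.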
L-4: The Free Kanban Tool’s Postmark server token belongs in a separate vault
OP_SERVICE_ACCOUNT_TOKEN is scoped read-only to Arda-SystemsOAM. The original cross-cutting design placed the Free Kanban Tool’s Postmark server token in that same vault, which meant any compromise of OP_SERVICE_ACCOUNT_TOKEN (CI side) or DesktopAuth (operator side) would yield the live sending credential. That contradicted the bounded-blast-radius framing in cross-cutting-design.md § 2.5.
The decision (DQ-R1-007) moves the Free Kanban Tool’s server-token item to a new Arda-CorporateOAM vault. The Free Kanban Tool’s runtime resolves the credential via its own SDK auth path; OP_SERVICE_ACCOUNT_TOKEN does not have read access to the Corporate vault. The token’s blast radius shrinks accordingly.
The new vault was provisioned during the walkthrough; the 1Password item itself is created by Phase 3’s Corporate CLI (Phase A) when the Postmark server is provisioned. Phase 1 does not declare or require the item.
Take-away: blast-radius is a real, structurable concern — if a single auth token grants read on every credential in a single vault, separating credentials by trust tier into separate vaults is cheap and effective. Vault naming convention adopted: Arda-<InstanceGroup>OAM for runtime sending credentials owned by that instance group (matching the existing Arda-DevOAM, Arda-StageOAM, etc.).
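The naming convention and the read-scope argument can be expressed in a few lines. This is an illustrative sketch: the function names and the grants table are hypothetical, and the only grant recorded is the one stated above (OP_SERVICE_ACCOUNT_TOKEN reads Arda-SystemsOAM).

```typescript
// Vault name for an instance group, following the Arda-<InstanceGroup>OAM convention.
function vaultNameFor(instanceGroup: string): string {
  if (!/^[A-Z][A-Za-z0-9]*$/.test(instanceGroup)) {
    throw new Error(`unexpected instance group name: ${instanceGroup}`);
  }
  return `Arda-${instanceGroup}OAM`;
}

// Hypothetical table of which auth path can read which vaults.
const READABLE_VAULTS: Record<string, readonly string[]> = {
  OP_SERVICE_ACCOUNT_TOKEN: ["Arda-SystemsOAM"],
};

// True when a compromise of the given auth path would expose the given vault.
function canRead(authPath: string, vault: string): boolean {
  return (READABLE_VAULTS[authPath] ?? []).includes(vault);
}
```

With the server token stored in vaultNameFor("Corporate"), canRead("OP_SERVICE_ACCOUNT_TOKEN", ...) is false for that vault, which is the reduction in blast radius the decision aims for.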
L-5: bash -x against secret-resolving code leaks the secret
Mid-walkthrough, bash -x was used to debug a transient resolution failure on tools/set-gha-repo-secret.sh. The trace echoed each variable assignment to stderr, including the resolved OP_SERVICE_ACCOUNT_TOKEN value (852-character base64 blob starting with ops_eyJ). The trace then landed in the conversation log on the operator’s machine.
Resolution:
- Redacted the local conversation log file (~/.claude/projects/.../*.jsonl): 4 occurrences replaced with [REDACTED-OP_SERVICE_ACCOUNT_TOKEN].
- Operator rotated OP_SERVICE_ACCOUNT_TOKEN in the 1Password Developer-Tools page; updated the IAC-SCRIPTS Service Account Token 1Password item with the new value; re-pushed the GHA secret using tools/set-gha-repo-secret.sh.
Take-away: never use bash -x against scripts that handle secrets. Use targeted set -x blocks that exclude the secret-bearing lines, or rely on the script’s own structured logging (which never echoes resolved values). Adopted as a session convention.
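The redaction step can be sketched as a pure text transformation. The pattern below is an assumption based only on the observation that the leaked value started with ops_ and looked like a base64 blob; it is not an official description of the token format, and the function name is hypothetical.

```typescript
// Assumption: tokens start with "ops_" followed by base64 or base64url characters.
const SERVICE_TOKEN_PATTERN = /ops_[A-Za-z0-9_+\/=-]+/g;
const REDACTION_PLACEHOLDER = "[REDACTED-OP_SERVICE_ACCOUNT_TOKEN]";

// Replace every token-shaped substring and report how many were found.
function redactServiceTokens(text: string): { redacted: string; occurrences: number } {
  let occurrences = 0;
  const redacted = text.replace(SERVICE_TOKEN_PATTERN, () => {
    occurrences += 1;
    return REDACTION_PLACEHOLDER;
  });
  return { redacted, occurrences };
}
```

Redaction only cleans the local copy of the log. The leaked value still has to be treated as compromised and rotated, as was done here.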
L-6: Phase 1’s typed surface should not include items only Phase 3 creates
FREE_KANBAN_POSTMARK_ITEM was originally declared in platform/one-password.ts as one of four typed references. The drift-check probed all four. Since Phase 1 doesn’t create the Free Kanban item (Phase 3 does, when the Corporate CLI provisions the Postmark server), the probe failed with “no item matched the secret reference query”.
The fix removes the typed reference from Phase 1. Phase 3 reintroduces it with the new Arda-CorporateOAM vault per DQ-R1-007. Decision rationale: a typed reference exists when the resource exists; placing the constant ahead of the resource creates a phase-spanning dependency that drift-check can’t satisfy.
Take-away: the typed surface should grow phase by phase, in lockstep with the resources each phase creates. “Forward-declare in the earliest phase” is a tempting pattern but introduces premature coupling and noise in the verification surface.
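The failure mode is easy to reproduce in miniature: a probe loop that visits every declared reference will report any reference whose item does not exist yet. In this sketch the resolver is injected and synchronous so the code runs without 1Password (a real resolver would call the SDK and would be asynchronous); probeAll and ProbeFailure are hypothetical names, not the drift-check's own.

```typescript
// Injected resolver: returns the secret or throws when the reference cannot be resolved.
type Resolver = (reference: string) => string;

interface ProbeFailure {
  reference: string;
  reason: string;
}

// Try every declared reference and collect the ones that fail to resolve.
function probeAll(references: readonly string[], resolve: Resolver): ProbeFailure[] {
  const failures: ProbeFailure[] = [];
  for (const reference of references) {
    try {
      resolve(reference);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      failures.push({ reference, reason });
    }
  }
  return failures;
}
```

Because every declared reference is probed, a reference declared ahead of its resource shows up as a permanent failure until the later phase runs, which is the noise the fix removes.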
L-7: Postmark domain verification is a separate dance from account creation
The Postmark help article on domain verification (https://postmarkapp.com/support/article/how-do-i-verify-a-domain) describes a per-sending-domain workflow involving DKIM TXT, Return-Path CNAME, optional DMARC, and a manual “Verify” click in the Postmark Console. Until at least one sending domain is verified, a new Postmark account is effectively in sandbox mode for end-recipient delivery.
Phase 1 implements account creation + token capture; it does not address sending-domain verification. The verification belongs to Phase 3 (Corporate’s arda.ardamails.com parent zone) and Phase 4 (per-partition sub-zones).
The Phase 1 operator runbook now carries a “Looking Ahead” section pointing at the Postmark Console’s signature_domains page. The Phase 3 stub at 3-corporate-updates/operator-domain-verification-checklist.md captures the verification dance for future expansion.
Take-away: account creation is necessary but not sufficient for live mail delivery. Operators need to see this so they can prepare DNS access and Postmark Console access before the Phase 3 / Phase 4 deploys.
L-8: The 2FA toggle is per-user, not per-account, and lives on the user profile
Step 3.2 of the runbook (“enable 2FA on the owner mailbox” for PostmarkNonProd) hit a snag: the https://account.postmarkapp.com/account page (which exists and shows account-level settings) does not expose a 2FA toggle. The user-list view at /account/users shows each user’s 2FA status as Off / On, but it is a status indicator, not an enable surface.
The 2FA enable action lives on the user’s own profile, reached via the top-right user-avatar menu in the Postmark Console (separate URL, browser-session-dependent). For PostmarkProd, this URL was already discovered and entered in the runbook earlier in the walkthrough. For PostmarkNonProd, the URL was not located on the walkthrough date; the runbook records this in the troubleshooting table as a known gap.
REQ-EXT-003 is therefore recorded as Partial in the sign-off table — account exists, on Platform plan, account-level token captured and resolves, drift-check probe returns HTTP 200; the only outstanding item is the user-side 2FA toggle.
Take-away: SaaS consoles may place the 2FA control under the user profile rather than the account settings; check both flows when documenting auth setup. Record the URL once it is found so that a future walkthrough can update the runbook.
L-9: GitHub Actions blocks workflow_dispatch for files not yet on the default branch
Pre-merge attempts to dispatch the new external-resources-drift.yml workflow against --ref jmpicnic/email-integration-phase-1-infra returned HTTP 404: workflow not found on the default branch. GitHub registers workflows when they appear on the repository’s default branch; until then, workflow_dispatch cannot find them.
Resolution: defer T-C5 (first workflow run) until PR #446 merges to main. The post-merge step is documented in the runbook’s new “Post-Merge: First Drift-Workflow Run (T-C5) and GHA Secret Audit (T-C7)” section, with the exact command sequence.
Take-away: any new GitHub Actions workflow has a chicken-and-egg situation pre-merge. Plan for a post-merge operator action whenever a phase introduces a workflow file. Live first-run validation is a post-merge gate, not a pre-merge gate.
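A post-merge gate can confirm that the workflow is registered before attempting the first dispatch. The sketch below builds the GitHub REST lookup URL for a workflow addressed by file name and interprets the response status. The helper names are hypothetical, and treating Arda-cards/infrastructure as the repository that hosts the workflow is an assumption. Performing the HTTP call itself is left out.

```typescript
type WorkflowState = "registered" | "not-registered" | "unexpected";

// URL of the "get a workflow" REST endpoint, addressing the workflow by file name.
function workflowLookupUrl(owner: string, repo: string, workflowFile: string): string {
  const file = encodeURIComponent(workflowFile);
  return `https://api.github.com/repos/${owner}/${repo}/actions/workflows/${file}`;
}

// Interpret the HTTP status of the lookup.
// Note: a 404 can also mean the caller's token cannot see the repository.
function interpretLookupStatus(status: number): WorkflowState {
  if (status === 200) {
    return "registered";
  }
  if (status === 404) {
    return "not-registered";
  }
  return "unexpected";
}

// Assumption: the workflow lives in the Arda-cards/infrastructure repository.
const driftWorkflowUrl = workflowLookupUrl(
  "Arda-cards",
  "infrastructure",
  "external-resources-drift.yml",
);
```

Only a "registered" result should unblock the first workflow_dispatch run; anything else means the post-merge step is still pending or the credentials need checking.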
L-10: gh secret list audit surfaced a leftover from the prior implementation
T-C7 ran gh -R Arda-cards/infrastructure secret list | grep -i postmark to verify zero Postmark-token-named GHA secrets (V-CI-103 / REQ-CI-003). The audit returned POSTMARK_NONPROD_ACCOUNT_TOKEN set on 2026-04-30, leftover from the prior Phase-0 implementation. The rev1 design moved away from Postmark tokens as GHA secrets entirely; the leftover violated the design.
Resolution: gh secret delete POSTMARK_NONPROD_ACCOUNT_TOKEN. Re-audit returned zero matches.
Take-away: pre-existing GHA secrets are easy to forget when redesigning. A one-shot grep audit at end of each phase that touches CI catches this cleanly. Worth replicating in Phase 4 (when partition CI surfaces evolve).
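The same audit can be written as a small check over a list of secret names, which makes it easy to repeat at the end of later phases. This sketch assumes the names have already been obtained (for example from the gh secret list output); the function name is hypothetical.

```typescript
// Return the secret names that match a forbidden pattern.
// The pattern must not use the "g" flag: RegExp.test is stateful for global patterns.
function findForbiddenSecrets(secretNames: readonly string[], forbidden: RegExp): string[] {
  return secretNames.filter((name) => forbidden.test(name));
}

// Case-insensitive match, mirroring grep -i postmark.
const POSTMARK_SECRET_PATTERN = /postmark/i;
```

The audit passes when the result is empty. Any returned name is a candidate for gh secret delete after confirming that nothing still depends on it.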
Cross-cutting note: the project-level CLAUDE.md per-phase worktree convention
Mid-Phase-2 planning, the worktree convention switched from a single flat layout to per-phase subdirectories (projects/email-integration-worktrees/phase-N/<repo>/). The change is documented in the project-level CLAUDE.md. It enables multiple phases to coexist locally without branch-switching contention; for example, this Phase 1 byproducts file is being written on phase-2-docs while Phase 1 PR #69 awaits review on the phase-1 worktree. The convention scales naturally to Phase 3, 4, 5a, 5b without further changes.
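For scripts that need to locate a worktree, the per-phase layout reduces to a one-line path rule. The helper below is a hypothetical illustration of that rule, and the repository name in the usage line is an assumption.

```typescript
const WORKTREE_ROOT = "projects/email-integration-worktrees";

// Path of a repository's worktree for a given phase label such as "1", "3", or "5a".
function worktreePath(phase: string, repo: string): string {
  if (!/^[0-9]+[a-z]?$/.test(phase)) {
    throw new Error(`unexpected phase label: ${phase}`);
  }
  return `${WORKTREE_ROOT}/phase-${phase}/${repo}/`;
}

// Assumption: "infrastructure" is one of the repositories checked out per phase.
const phaseOneInfraPath = worktreePath("1", "infrastructure");
```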
What worked well
- Drift-check as a smoke test, not just a CI workflow. Running it locally during the walkthrough caught L-1 (Postmark URL bug), L-2 (duplicate items), L-6 (FREE_KANBAN typed reference) immediately. The “dual-purpose” framing in DQ-R1-002 paid off.
- <operator: confirm ...> placeholders in the runbook draft. Six placeholders surfaced during authoring; all six were filled in by the operator in one bulk message during the walkthrough. The pattern made the operator-vs-author handoff explicit and bounded.
- Per-task STOP points in the spec. Implementers and the operator could pause-and-confirm at each task boundary rather than running through end-to-end and discovering issues only at the end.
- The CFN stack-name preservation comment + grep test (V-IAC-003). Phase 2 leans heavily on this pattern, but Phase 1’s tools/drift-check.ts adopted a similar discipline (the URL count=1&offset=0 is asserted in tools/drift-check.test.ts).
Copyright: © Arda Systems 2025-2026, All rights reserved