Phase 1 -- Implementation Alternatives

Decisions reached during Phase 1 implementation, with the alternatives considered and the trade-offs weighed. Decisions persistent enough to warrant a DQ-R1-NNN log entry are also recorded in ../../decision-log.md; this document captures the ones that didn’t earn a formal entry plus the implementation context for the ones that did.

A-1: GHA-secret tool — TypeScript+Octokit+libsodium vs shell+`gh`

Choice: shell (tools/set-gha-repo-secret.sh).

Considered:

Option	Approach	LOC	Dependencies	Tests
A	Full port from the prior Phase-0 `scripts/gha-secrets/` implementation: TS CLI + `@octokit/rest` + `libsodium-wrappers` + shared `scripts/lib/`	~1370	New: `@octokit/rest`, `libsodium-wrappers` (binary-distributed)	Jest, mocked-mode + lib tests
B	Slim TS port: single-file `tools/gha-secret.ts`, subprocess `gh` for the GitHub side, `@1password/sdk` for op	~250	Reuses `@1password/sdk` (already a dep)	Jest, mocked spawn + mocked SDK
C	Shell: `tools/set-gha-repo-secret.sh`, `op read | gh secret set`	~130	Reuses already-required `op` and `gh` CLIs	Manual dry-run + bats (skipped)

Trade-off: B and C achieve the same operator semantics with very different code surfaces. C wins on:

LOC (roughly 10x smaller than A; roughly 2x smaller than B).
Zero new dependencies.
Pattern parity with the existing tools/sync-secrets-from-1password.sh (same op read | gh secret set body).
Operator-friendliness: an operator already comfortable with op and gh can read the script and confirm what it does in 60 seconds.

C trades off:

No type checking on the args.
Tests are manual (--dry-run smoke test); a bats suite was an option but not added.

Why C and not A: spec § 4 explicitly carves out “tools/gha-secret.ts utility itself is assumed in place” as out of scope. Adopting A would have been a roughly 1370-line scope expansion to bring in lib/, the two new dependencies (@octokit/rest and libsodium-wrappers), and a Jest suite, all for code that already has a working shell substitute. The cost-benefit didn’t justify it.

Why C and not B: B’s main justification would have been reusing drift-check’s already-imported @1password/sdk in a sibling helper. But the GitHub side still needs either Octokit+libsodium or a gh subprocess; either way, C’s “use what’s already on the operator’s PATH” is cleaner.

A-2: 2FA URL discovery — placeholder pattern vs API-based discovery

Choice: <operator: confirm exact step> placeholder pattern in the runbook draft, with the operator filling in URLs during the walkthrough; troubleshooting table entry for the dead-end case.

Considered:

Option	Mechanism
A Manual placeholder	Author drafts the runbook with `<operator: confirm exact step>` markers; the operator fills them in during the walkthrough; a follow-up commit records the actual URLs.
B Postmark API discovery script	A small script reads the Postmark account-token, queries `GET /users` and surfaces the user-profile URL.
C Hard-code URLs from the API observations note	Use canonical URLs that match the Postmark Developer Documentation.

Trade-off: B makes user-profile-URL discovery a script-runnable step, which would have helped with the PostmarkNonProd dead-end. A is faster to author and adds less overhead per walkthrough. C is fragile: Postmark UI URLs are not part of the public API contract and can change.

Why A: the placeholder pattern is generic; it works for any UI URL that depends on the operator’s account, role, or browser session. Its cost showed up during the walkthrough: PostmarkProd’s URL was found; PostmarkNonProd’s was not. The runbook now records the dead-end in the troubleshooting table so that a future walkthrough can fill it in.

B is recorded as suggestion S-4. Worth considering if a future operator hits the same dead-end.
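
The placeholder pattern also lends itself to a mechanical completeness check. As a minimal sketch (not part of the repository; `findUnresolvedPlaceholders` and `PlaceholderHit` are hypothetical names, and every marker is assumed to start with `<operator:` and end at the next `>`), a helper could list the markers still left in a runbook after a walkthrough:

```typescript
// Hypothetical helper, for illustration only.
interface PlaceholderHit {
  line: number;   // 1-based line number within the runbook text
  marker: string; // the full marker, e.g. "<operator: confirm exact step>"
}

function findUnresolvedPlaceholders(runbookText: string): PlaceholderHit[] {
  const hits: PlaceholderHit[] = [];
  const lines = runbookText.split(/\r?\n/);
  lines.forEach((text, index) => {
    // A fresh regex per line keeps lastIndex from leaking between lines.
    const pattern = /<operator:[^>]*>/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      hits.push({ line: index + 1, marker: match[0] });
    }
  });
  return hits;
}
```

An empty result would mean every marker was filled in; a remaining hit, such as the PostmarkNonProd step, would correspond to an entry in the troubleshooting table.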

A-3: Free Kanban vault location — `Arda-SystemsOAM` vs `Arda-CorporateOAM`

Choice: Arda-CorporateOAM (per DQ-R1-007).

Considered:

Option	Vault	Trade-off
A	`Arda-SystemsOAM` (original cross-cutting design)	One vault to manage. Free Kanban Tool’s runtime credential is reachable by `OP_SERVICE_ACCOUNT_TOKEN`.
B	`Arda-CorporateOAM` (separate vault)	One additional vault. Free Kanban server token isolated from the OAM-tier credentials; CI / `OP_SERVICE_ACCOUNT_TOKEN` compromise does not yield it.

Why B: the blast radius is bounded structurally; the cost (one extra vault) is low; the benefit (a CI-side compromise of OP_SERVICE_ACCOUNT_TOKEN does not yield the runtime sending credential) is meaningful and matches the design intent in cross-cutting-design.md § 2.5. The future vault-naming convention (Arda-<InstanceGroup>OAM) extends naturally.

The full text and consequences are in DQ-R1-007.

A-4: Drift-check probe of `FREE_KANBAN_POSTMARK_ITEM` — include vs exclude

Choice: exclude. Remove the typed reference from platform/one-password.ts for Phase 1.

Considered:

Option	Phase 1 includes the typed reference?	Drift-check probes it?
A	Yes	Yes — fails with “no item matched” until Phase 3 creates the item
B	Yes	No — phase-bucketing in drift-check filters items by their `phase` annotation
C	No (chosen)	N/A — the constant doesn’t exist in Phase 1
D	Yes	Yes, but allow “missing-by-design” results to be informational (warnings, not failures)

Why C: the typed surface should grow phase by phase. A typed reference exists when the resource exists. Forward-declaring creates noise in verification and a maintenance burden (anyone reading the code wonders if the item should exist now). B and D both keep the constant alive but require drift-check complexity that pays off only after multiple typed surfaces overlap — not yet. C is the simplest; Phase 3 reintroduces the constant with the new vault.

C is recorded inside DQ-R1-007 as a consequence; D is recorded as suggestion S-1 (revisit during Phase 3 planning if the phase-bucketing concern comes back).
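
To make the difference between the options concrete, the following sketch models how a missing item would be reported under A, B, and D. It is illustrative only: `ItemRef`, `introducedIn`, and `classifyProbe` are hypothetical names, not the actual drift-check API. Option C needs none of this logic, because the constant is simply absent in Phase 1.

```typescript
type Phase = 1 | 2 | 3;

// Hypothetical shape of a typed reference carrying a phase annotation.
interface ItemRef {
  name: string;        // e.g. "FREE_KANBAN_POSTMARK_ITEM"
  introducedIn: Phase; // the phase in which the item is expected to exist
}

type ProbeOutcome = "ok" | "skipped" | "warning" | "failure";

function classifyProbe(
  ref: ItemRef,
  currentPhase: Phase,
  found: boolean,
  option: "A" | "B" | "D",
): ProbeOutcome {
  const notYetDue = ref.introducedIn > currentPhase;
  // Option B: phase-bucketing filters the item out before it is probed.
  if (option === "B" && notYetDue) return "skipped";
  if (found) return "ok";
  // Option D: a missing-by-design item is informational, not a failure.
  if (option === "D" && notYetDue) return "warning";
  // Option A (and any item that is already due): a missing item fails.
  return "failure";
}
```

Under these assumptions, a Phase 3 item probed during Phase 1 yields "failure" for A, "skipped" for B, and "warning" for D, which is the extra branching that C avoids.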

A-5: Drift workflow filename — describe-the-invariant vs describe-the-phase

Choice: external-resources-drift.yml (per DQ-R1-001).

Considered:

Option	Filename	Notes
A	`phase-1-drift.yml`	Anchors to phase scope.
B	`op-drift.yml`	Short.
C	`external-resources-drift.yml`	Describes the asserted invariant (“external resources”). Stable across project phases.

Why C: the workflow asserts an invariant that outlives Phase 1; the filename should describe the invariant, not the phase that introduced it. Stable convention for future drift workflows (e.g., corporate-resources-drift.yml, partition-resources-drift.yml).

A-6: Drift-check module location — `scripts/` vs `tools/`

Choice: tools/drift-check.ts (per DQ-R1-002).

Considered:

Option	Location	Notes
A	`infrastructure/scripts/drift-check.ts`	Alongside legacy script utilities; matches the prior implementation’s layout.
B	`infrastructure/tools/drift-check.ts`	Operator-runnable + CI-runnable; modern convention.

Why B: the module is operator-runnable in addition to CI-runnable, and tools/ matches that dual-purpose nature better than scripts/ (which the prior implementation largely used for one-shot orchestrators). The tools/ convention is also forward-compatible with the eventual tools/gha-secret.ts migration (out of scope for this project but on the trajectory).

A-7: Operator runbook sign-off mechanism — code block vs frontmatter vs Markdown table

Choice: designated ## Operator Sign-off Markdown section with a small table (per DQ-R1-003).

Considered:

Option	Mechanism	Notes
A	Code block	Easy to read inline.
B	YAML frontmatter field	Machine-parseable.
C	Markdown table	Human-readable, diff-friendly under git, no new tooling.

Why C: the runbook is a human-driven artefact; the sign-off needs to be operator-fillable mid-walkthrough. The Markdown table is the natural choice; YAML frontmatter would have been a foreign convention.

A-8: HUMAN-STEPS parser disposition — delete in Phase 1 vs defer

Choice: delete in Phase 1 (per DQ-R1-004).

Considered: defer the parser-code deletion to a later phase.

Why “delete in Phase 1”: REQ-OPS-004 states no parser gate remains; the runbook in documentation/ is the canonical operator artefact. Deferring deletion would create a transient state where the parser still exists but isn’t authoritative — ambiguous.

In practice the deletion was a no-op: the parser code never made it onto main (PR #445 closed without merging). T-C6 in the task plan therefore had nothing to delete.

A-9: API-surface freshness cadence — annual vs per-update vs failure-triggered

Choice: per-failure (with a soft annual review backstop) — per DQ-R1-005.

Considered:

Option	Trigger
A	Annually
B	At each Postmark major-update post
C	First drift-test failure attributable to surface drift, plus annual review

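Why C: an annual-only schedule would let regressions sit unnoticed for up to a year; a per-update schedule would create unnecessary documentation churn, since most Postmark updates do not affect the small surface Arda uses. Combining the failure trigger with an annual backstop responds to surface drift when it actually breaks something while capping how long an unnoticed change can persist.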
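
The chosen cadence can be expressed as a small predicate. This is a sketch under stated assumptions: `isSurfaceReviewDue` is a hypothetical name, and the 365-day window is an assumed reading of “annual”; neither comes from DQ-R1-005.

```typescript
// Assumed meaning of "annual": 365 days, expressed in milliseconds.
const ANNUAL_WINDOW_MS = 365 * 24 * 60 * 60 * 1000;

// Returns true when the API-surface notes should be refreshed.
function isSurfaceReviewDue(
  lastReview: Date,
  now: Date,
  surfaceDriftFailure: boolean, // a drift-test failure attributed to surface drift
): boolean {
  // Primary trigger: a failure attributable to surface drift.
  if (surfaceDriftFailure) return true;
  // Backstop: at least a year has passed since the last review.
  return now.getTime() - lastReview.getTime() >= ANNUAL_WINDOW_MS;
}
```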
