Feature Flags — Spike Analysis

This spike answers one question: what is the best way to gate functionality per tenant / user / environment and toggle it without a deploy, given how Arda’s frontend, BFF, backend, and Cognito identity actually work today.

The findings below are drawn from reading the source (arda-frontend-app, operations, and common-module, the shared Kotlin library where auth lives), not from design docs. Where a documented design and the running code disagree, the code wins and the discrepancy is noted.

1. Goal, restated

Ship code dormant in production, then reveal it to a chosen audience — one user for a demo, a beta cohort, or a specific tenant that gets a bespoke module — and flip it live or dark from a dashboard, with no rebuild and no redeploy.

Mapped to the industry toggle taxonomy (see Section 4), the ask spans two categories with very different lifespans:

Release / preview toggles — “show this in-progress feature to user X or cohort Y.” Short-lived; deleted at GA.
Permission toggles / entitlements — “tenant Z owns custom module M.” Long-lived; effectively part of the product’s commercial shape.

Keeping these two apart is the single most important design decision. More in Section 7.

2. Verified current state

2.1 Authentication is real; authorization is “are you logged in”

The JWT authenticator in common-module (lib/.../runtime/auth/JwtAuthn.kt) verifies the token signature plus exactly four claims (issuer, audience, token_use, and the presence of sub), then stops:

JWT.require(algorithm)
  .withIssuer(issuerUrl)
  .withAudience(audience)
  .withClaim("token_use", tokenUse.name)
  .withClaimPresence("sub")     // only requires that a subject exists
  .build()

The principal it produces carries only the subject (Authentication.kt):

is AuthenticationEvidence.JwtToken ->
  Result.success(AuthPrincipal.Authenticated(evidence.jwtCredential.subject!!, evidence))

There are two authenticators: this JWT one, and an opaque Bearer API-key authenticator (BearerKeyAuthn).

2.2 There is no RBAC

A Realm enum exists (Realm.kt: PUBLIC, USER, ADMIN, ARDA), but it is used only as Ktor’s challenge-realm label string in the WWW-Authenticate header — not as an access check. operations assigns a single Realm.USER once at startup (runtime/Main.kt); no endpoint varies by role.
custom:role is read nowhere in the backend. The ServiceContextPlugin extracts only sub, email, and custom:tenant from the token.
In the frontend, custom:role exists in the token type but is referenced only to default to 'User' (src/lib/jwt.ts); nothing gates UI on it.
The PUBLIC / FREE / LICENSED / ARDA realm model documented in “Realms, Scopes, and Permissions” is an MVP0 design that never shipped; the Realm enum in the code does not even match it.

2.3 Tenant scoping (ABAC) is real and enforced

Tenant isolation is the one genuine access control:

Entities use AbstractScopedUniverse / ScopedMetadata with a tenantId column (common-module: lib/.../persistence/universe/), and queries are auto-filtered by the context tenant.
The active scope is one of Unauthenticated / Global / Tenant (ApplicationContext.kt → ServiceScope), carried through the coroutine context.
requireTenantScoped(tenantId) cross-checks a request’s tenant parameter against the context scope and fails on mismatch.
CdnUrlResolver even rejects asset URLs whose path prefix is not the caller’s tenant.

2.4 Where the tenant identity comes from — the trust model

ServiceContextPlugin.kt derives the tenant differently per authenticator:

Direct JWT request → tenant comes from the cryptographically verified custom:tenant claim. The caller cannot spoof it.
Opaque API-key request → tenant comes from the X-Tenant-Id header (and subject from X-oidc-subject).

In production the browser never calls the backend directly. The flow is:

Browser → BFF (validates the user's Cognito JWT)
        → backend, authenticated with the system API key,
          forwarding X-Tenant-Id / X-Author / X-oidc-subject derived from that JWT

So the BFF is the trust boundary for tenant isolation: the backend trusts X-Tenant-Id on the API-key path. That is internally consistent, but it means flag evaluation on the backend should take its identity from the same ApplicationContext scope (tenant + subject), never from a role claim that nothing on the backend parses.
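
The header hand-off in the flow above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not Arda's actual BFF code: `VerifiedClaims`, `buildBackendHeaders`, the example values, and the use of the email claim for X-Author are hypothetical; only the header names and the Bearer API-key scheme come from the findings above.

```typescript
// Claims the BFF has already verified on the user's Cognito JWT.
interface VerifiedClaims {
  sub: string;
  email: string;
  "custom:tenant": string;
}

// Build headers for a BFF -> backend call. Identity comes only from the
// verified claims, never from anything the browser sent directly.
function buildBackendHeaders(
  claims: VerifiedClaims,
  systemApiKey: string,
): Record<string, string> {
  const tenant = claims["custom:tenant"];
  if (!claims.sub || !tenant) {
    // Fail closed: without subject and tenant the backend cannot scope the call.
    throw new Error("JWT is missing sub or custom:tenant");
  }
  return {
    Authorization: `Bearer ${systemApiKey}`,
    "X-Tenant-Id": tenant,
    "X-oidc-subject": claims.sub,
    "X-Author": claims.email, // assumption: the author is identified by email
  };
}

// Example values are made up for illustration.
const exampleHeaders = buildBackendHeaders(
  { sub: "user-123", email: "ada@example.com", "custom:tenant": "tenant-abc" },
  "example-system-key",
);
```

The point of the sketch is that the tenant the backend sees is whatever this function writes, which is why flag targeting on the backend inherits the same trust boundary.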

2.5 What this means for flags

Targeting dimension	Available on backend today	Source
User	✅	`ServiceScope.subject` (= Cognito `sub`)
Tenant	✅	`ServiceScope.Tenant.tenant` (verified claim or BFF header)
Role / tier	❌	not parsed by the backend; do not target on it
Environment	✅	deploy config (HOCON / Helm per partition)

The dormant custom:role claim is raw material a flag or entitlement layer could finally put to use — but nothing reads it today, so the spike must not assume it.

3. Where flags must be evaluated

A flag has to be evaluable in three places, and they must agree:

Browser (hide/show UI)  →  BFF (gate routes/SSR)  →  Kotlin backend (gate business logic + API)

The backend is the real enforcement point — hiding a button does not stop the API call. The frontend evaluation is UX only. This is non-negotiable for any “ship dormant, reveal to some users” feature that touches data.

4. How this is done elsewhere

Pete Hodgson’s taxonomy (hosted on martinfowler.com) is the canonical reference and sorts flags by intent, because intent dictates lifespan — and mismatched lifespan is the root of flag debt:

Release toggles — gate in-progress work merged to trunk; days-to-weeks; delete at GA.
Experiment toggles — split traffic to measure; live until significance; the layer under A/B testing.
Ops toggles — operational kill-switches / load-shedding; long-lived by design.
Permission toggles — expose features to subsets of users/tenants; longest-lived (shade into entitlements).

The modern build-vs-buy landscape:

SaaS platforms — LaunchDarkly, PostHog, ConfigCat, DevCycle, Statsig. Dashboard toggling, targeting rules, SDKs, percentage rollout out of the box.
Self-hosted OSS — Unleash, Flagsmith. Same model, you run the control plane.
OpenFeature — a vendor-neutral standard (CNCF) with a provider model and JVM + JS/React SDKs. You code against one API and swap the backing provider later. The standard way to avoid lock-in.
Roll your own — a flags table + admin UI + SDKs. Full control, real ongoing cost; only justified by requirements a vendor can’t meet (e.g. bitemporal audit native to your stack).

5. Build vs. buy for Arda

Option	Fit	No-deploy toggle	Tenant / user / env targeting	Backend (Kotlin) eval	Effort	Lock-in
PostHog flags ⭐	Already in the stack	✅ dashboard	✅ group = tenant, person = user, project/property = env	✅ `posthog-java`, local eval	Low	Medium (mitigate via abstraction)
Unleash / Flagsmith (self-host)	OSS, multi-tenant native	✅	✅	✅ JVM SDK	Medium (run infra)	Low
LaunchDarkly / Statsig	Best-in-class	✅	✅	✅	Low	High + cost
Build in-house (Data Authority + bitemporal)	Native tenant scope + audit	❌ build admin UI too	✅	✅ native	High	None
Cognito claims (`custom:flags` / pre-token Lambda)	❌ as primary	❌ stale until refresh	coarse only	reads claim	Low	—

6. Recommendation

Adopt PostHog feature flags behind a thin internal abstraction. Evaluate in the frontend, the BFF, and the backend. Defer building anything bespoke.

Why PostHog leads:

It is already integrated in arda-frontend-app (PostHog instrumentation is wired in the provider tree), so there is no new vendor relationship and no new infra.
Its model maps cleanly onto Arda’s verified identity:
- PostHog distinct_id ← Cognito sub
- tenant targeting ← tenant UUID (see 6.3 for the group-vs-person-property cost trade-off)
- environment ← one PostHog project per env (dev / stage / prod), or an env property
It supports everything the goal needs: boolean + multivariate flags, percentage rollout, per-user and per-group targeting, server-side local evaluation (Java SDK), and client-side bootstrapping (React) to avoid first-paint flicker.

Mitigate lock-in from day one: put a one-method interface in front of it (isEnabled(key, ctx)) on each side, or adopt OpenFeature with a PostHog provider. Either way, never scatter posthog.isFeatureEnabled(...) across the codebase — swapping vendors should be a one-file change.
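
A minimal TypeScript sketch of that one-method abstraction follows. Everything here is illustrative: `FlagContext`, `VendorClient`, `VendorBackedFlags`, and the fake client are hypothetical names, and `VendorClient.evaluate` is a stand-in shape rather than the PostHog SDK's real API. Failing closed (treating unknown flags and vendor errors as off) is a design assumption, not something the spike has decided.

```typescript
// Identity a flag may target: only the dimensions verified as available.
interface FlagContext {
  subject: string;     // Cognito sub
  tenantId?: string;   // absent for global / unauthenticated scopes
  environment: string; // e.g. "dev", "stage", "prod"
}

// The only flag API application code should ever call.
interface FeatureFlags {
  isEnabled(key: string, ctx: FlagContext): boolean;
}

// Hypothetical minimal view of a vendor SDK. A real adapter would translate
// this call into whatever the chosen SDK actually exposes.
interface VendorClient {
  evaluate(
    key: string,
    distinctId: string,
    properties: Record<string, string>,
  ): boolean | undefined;
}

class VendorBackedFlags implements FeatureFlags {
  private readonly client: VendorClient;

  constructor(client: VendorClient) {
    this.client = client;
  }

  isEnabled(key: string, ctx: FlagContext): boolean {
    const properties: Record<string, string> = { environment: ctx.environment };
    if (ctx.tenantId !== undefined) properties["tenant"] = ctx.tenantId;
    try {
      // Unknown flag (undefined) is treated as off.
      return this.client.evaluate(key, ctx.subject, properties) ?? false;
    } catch (_err) {
      // Vendor failure is treated as off as well (fail closed).
      return false;
    }
  }
}

// In-memory stand-in for tests and local development.
const fakeClient: VendorClient = {
  evaluate: (key, _distinctId, properties) =>
    key === "pdev-679-thing" ? properties["tenant"] === "tenant-abc" : undefined,
};

const flags: FeatureFlags = new VendorBackedFlags(fakeClient);
```

With this shape, swapping vendors means writing another `VendorClient` adapter; call sites keep using `flags.isEnabled(key, ctx)` unchanged.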

Why not the alternatives now: self-hosted OSS adds infra you don’t need yet; LaunchDarkly is excellent but paid and redundant with PostHog; Cognito claims are the wrong mechanism (a claim stays stale until the token refreshes, so there is no live kill-switch, and custom attributes are strings capped at 2048 characters); a bespoke bitemporal service is weeks of work to reinvent a solved problem. Revisit the in-house build only when governance needs as-of audit of who flipped what, and even then keep it behind the same abstraction.

6.1 Integration design (mapped to the code)

Frontend (arda-frontend-app) — UX only:

Bootstrap flags at login in AuthInit (once sub / tenant are known from JWTContext): identify the PostHog user + group, fetch flags, store in a new featureFlagsSlice (mirrors the existing itemsFilterSort slice, persisted via redux-persist to avoid flicker).
Expose a useFeatureFlag(key): boolean hook — same consumption shape as useAuth().
Slot a FeatureFlagProvider into layout.tsx right after AuthInit, alongside ItemCardsProvider.
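
The state behind such a slice can be sketched as a plain reducer and selector, without committing to Redux Toolkit specifics. This is an assumption-laden outline: the state shape, the action names, and `selectFeatureFlag` are hypothetical, and the real slice would follow whatever conventions the existing slices use.

```typescript
// Hypothetical shape of the featureFlagsSlice state.
interface FeatureFlagsState {
  values: Record<string, boolean>;
  fetchedAt: number | null; // epoch ms of the last successful fetch
}

type FeatureFlagsAction =
  | {
      type: "featureFlags/loaded";
      payload: { values: Record<string, boolean>; at: number };
    }
  | { type: "featureFlags/cleared" }; // e.g. dispatched on logout

const initialFlagsState: FeatureFlagsState = { values: {}, fetchedAt: null };

function featureFlagsReducer(
  state: FeatureFlagsState = initialFlagsState,
  action: FeatureFlagsAction,
): FeatureFlagsState {
  switch (action.type) {
    case "featureFlags/loaded":
      // Replace rather than merge, so a flag deleted upstream disappears here too.
      return {
        values: { ...action.payload.values },
        fetchedAt: action.payload.at,
      };
    case "featureFlags/cleared":
      return initialFlagsState;
    default:
      return state;
  }
}

// Selector a useFeatureFlag(key) hook could wrap. Unknown flags read as off.
function selectFeatureFlag(state: FeatureFlagsState, key: string): boolean {
  return state.values[key] === true;
}
```

Clearing the state on logout matters if the slice is persisted: otherwise the next user on the same browser would briefly see the previous user's flags.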

BFF (src/app/api/): already extracts identity in processJWTForArda() and injects X-Tenant-Id / X-Author / X-oidc-subject. Optionally add a GET /api/features route (using the getBffAuthHeaders() pattern) to serve evaluated flags for SSR / first paint. Add X-Env if the backend needs the environment explicitly.

Backend (Kotlin / common-module) — the real enforcement:

Add a small FeatureFlags interface in common-module, with a posthog-java local-evaluation implementation (polls flag definitions, evaluates in-process → no per-request network hop).
Feed identity straight from ApplicationContext → ServiceScope: subject as distinct_id, tenant as the group key (or as a person property if the Group Analytics add-on is unavailable; see 6.3). Both are already on the context for every request.
Gate code with if (flags.isEnabled("pdev-679-thing", ctx)) { ... }. Flip it in PostHog → running pods pick it up on the next poll, no redeploy.

6.2 Consistency rules

One shared, typed registry of flag keys referenced by both frontend and backend (an enum on each side, same string values).
The backend is authoritative; the frontend mirror is for UX.
Define refresh policy explicitly: frontend cache TTL + refresh trigger (navigation / visibilitychange); backend local-eval poll interval (the kill-switch SLA sets this).
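
On the TypeScript side, the first and third rules could look roughly like this. The registry layout, `isFlagKey`, `needsRefresh`, and the 60-second TTL are assumptions for illustration; the two key strings are the example keys used elsewhere in this document, and the Kotlin enum would carry the same values.

```typescript
// Single source of truth for flag keys on the TypeScript side.
const FLAG_KEYS = {
  spikeDemo: "pdev-679-spike",
  thing: "pdev-679-thing",
} as const;

// Union of the registered key strings.
type FlagKey = (typeof FLAG_KEYS)[keyof typeof FLAG_KEYS];

// Narrow an arbitrary string (e.g. from an API response) to a registered key.
function isFlagKey(value: string): value is FlagKey {
  return Object.keys(FLAG_KEYS).some(
    (name) => FLAG_KEYS[name as keyof typeof FLAG_KEYS] === value,
  );
}

// Frontend refresh policy: refetch once the cached copy is older than the TTL.
// Assumption: 60 s. The real value should come out of triage.
const FLAG_CACHE_TTL_MS = 60 * 1000;

function needsRefresh(
  fetchedAt: number | null,
  now: number,
  ttlMs: number = FLAG_CACHE_TTL_MS,
): boolean {
  // Never fetched, or fetched at least ttlMs ago.
  return fetchedAt === null || now - fetchedAt >= ttlMs;
}
```

Typing hook and helper parameters as `FlagKey` rather than `string` turns a misspelled flag name into a compile error instead of a silently disabled feature.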

6.3 PostHog plan and cost, verified

Confirmed against PostHog’s published docs and pricing (June 2026). The live project’s plan and add-on state could not be read directly: the connected MCP key is scope-limited to generate-app-url / llma-personal-spend / user, so the flag, organization (billing), and project scopes are all gated. The one item marked unverified for our org needs a quick check in the PostHog UI.

Capability	Status	Note
Feature flags	✅ free tier	First 1M flag requests/month free, then pay-as-you-go from `$0.0001`/request, dropping with volume
Local evaluation (backend)	✅ available, Java SDK	Uses a Feature Flags Secure API key; bills only the periodic definition fetch (counted as 10 requests each), not per check — cheap at scale
Per-user / per-cohort targeting	✅ no add-on	Target on `distinct_id` or a person property
Per-tenant targeting by group	⚠️ needs Group Analytics add-on (paid, from `$0.000071`/event)	Required to set “Match by → Group” and target group properties / group-level rollout %
Group Analytics enabled on our org	❓ unverified	Could not read org billing via MCP — check in PostHog UI
Bootstrapping (frontend, React)	✅	Backend returns flags; frontend seeds them to avoid first-paint flicker
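
To make the local-evaluation billing row concrete, here is a rough worked calculation in TypeScript. It uses only the figures from the table (10 requests per definition fetch, 1M free requests, `$0.0001` as the first paid rate) and treats them as flat, which overstates cost because the real price drops with volume. The assumption that every backend pod polls independently, and the 30-second interval, are illustrative.

```typescript
const REQUESTS_PER_DEFINITION_FETCH = 10; // from the table above
const FREE_REQUESTS_PER_MONTH = 1000000;  // from the table above
const FIRST_TIER_PRICE_USD = 0.0001;      // later tiers are cheaper

// Flag requests generated per month by local-evaluation polling alone.
// Assumption: each poller (e.g. each backend pod) fetches independently.
function monthlyPollingRequests(
  pollIntervalSeconds: number,
  pollers: number,
  daysInMonth: number = 30,
): number {
  if (pollIntervalSeconds <= 0) {
    throw new Error("pollIntervalSeconds must be positive");
  }
  const secondsInMonth = daysInMonth * 24 * 60 * 60;
  const fetchesPerPoller = Math.floor(secondsInMonth / pollIntervalSeconds);
  return fetchesPerPoller * pollers * REQUESTS_PER_DEFINITION_FETCH;
}

// Upper bound: everything above the free tier billed at the first paid rate.
function monthlyCostUpperBoundUsd(requests: number): number {
  const billable = Math.max(0, requests - FREE_REQUESTS_PER_MONTH);
  return billable * FIRST_TIER_PRICE_USD;
}

// One poller every 30 s: 86,400 fetches x 10 = 864,000 requests.
const onePollerAt30s = monthlyPollingRequests(30, 1);
// Three pollers every 30 s: 2,592,000 requests, above the free tier.
const threePollersAt30s = monthlyPollingRequests(30, 3);
```

Under these assumptions a single poller at 30 seconds stays inside the free tier, while three pollers do not. Halving the poll interval doubles the request count, so the kill-switch SLA and the bill are directly coupled.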

7. The flag-vs-entitlement split

The brain-dump’s two examples are different animals:

“Show it to some user” → a release/preview toggle. Short-lived, delete at GA. PostHog targeting by sub or a beta group. Clean fit.
“Very custom modules per tenant” → a permission toggle / entitlement. Long-lived, commercially meaningful, expected to persist and be billed against. PostHog tenant-group targeting works for the MVP, but entitlements eventually want a durable home (a tenant-entitlements record in the backend, possibly the future use for custom:role / a tier claim).

Decide the boundary in triage. A practical rule: if removing the flag in six months would be a bug (someone paid for that module), it’s an entitlement, not a release flag — don’t let it live forever in the experimentation tool.
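
One way to make that boundary mechanical is to record the kind of every key in the registry and require a removal date on release flags only. The sketch below is a hypothetical shape, not a decided design: `FlagDefinition`, `overdueReleaseFlags`, the owners, the dates, and the `tenant-module-m` key are all invented for illustration.

```typescript
// A release flag must name a removal date; an entitlement has none by design.
type FlagDefinition =
  | { kind: "release"; key: string; owner: string; removeBy: string } // YYYY-MM-DD
  | { kind: "entitlement"; key: string; owner: string };

// Keys of release flags whose removal date has passed.
// Dates in YYYY-MM-DD form order correctly under plain string comparison.
function overdueReleaseFlags(
  definitions: FlagDefinition[],
  todayIso: string,
): string[] {
  const overdue: string[] = [];
  for (const definition of definitions) {
    if (definition.kind === "release" && definition.removeBy < todayIso) {
      overdue.push(definition.key);
    }
  }
  return overdue;
}

// Example registry entries (owners, dates, and the module key are made up).
const exampleDefinitions: FlagDefinition[] = [
  { kind: "release", key: "pdev-679-spike", owner: "platform", removeBy: "2026-07-31" },
  { kind: "entitlement", key: "tenant-module-m", owner: "product" },
];
```

A check like `overdueReleaseFlags` could run in CI, so a release flag that outlives its date fails the build, while entitlements are exempt and visibly tagged as candidates for a durable home.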

8. Risks & open questions for triage

Flag vs. entitlement boundary — settle first; everything else follows.
Frontend/backend consistency — shared key registry; backend authoritative.
Kill-switch SLA — how fast must a bad flag die? Sets the backend poll interval.
Data residency / PII — sending sub + tenant to PostHog. Events already flow there, but confirm with the owner of that integration; lean on PostHog’s PII scrubbing conventions already present in common-module observability.
Stale-flag hygiene — a removal ritual; PostHog surfaces stale flags.
Group Analytics add-on — verify in the PostHog UI whether it is enabled on our org. It is required only if we want group-keyed tenant targeting / group-level rollout; the free person-property approach covers the immediate goal (see 6.3). Java local evaluation and the free 1M-request flag tier are already confirmed available.
MCP key scopes — the connected PostHog key cannot read flags/org/projects. Re-auth with feature_flag:read, organization:read, project:read (at least) to let tooling introspect the live setup.

9. Proof of concept (≈ a few days)

One PostHog flag (pdev-679-spike) targeted by tenant (group-keyed if the Group Analytics add-on is enabled, otherwise via a tenant person property) + a 10% user rollout.
Frontend: useFeatureFlag gating one trivial UI element, bootstrapped at login.
Backend: a FeatureFlags interface + posthog-java local eval gating one endpoint branch, using ApplicationContext identity.
Demo: flip it in the PostHog dashboard and watch the frontend and a running backend pod change behavior with no redeploy.
Deliverable: the flag-vs-entitlement boundary written up for triage.

This de-risks the whole effort and answers “is buy good enough?” concretely before any large commitment.
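
For the demo it helps to understand why a percentage rollout is sticky: a given user lands in the same bucket on every evaluation, in every process, because the bucket is a hash of the flag key and the user id. The TypeScript sketch below shows the idea with a 32-bit FNV-1a hash, which is swapped in purely for illustration and is not PostHog's actual hashing scheme; the function names are hypothetical.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Illustrative only.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Map (flag, user) to a stable bucket in [0, 1).
function rolloutBucket(flagKey: string, distinctId: string): number {
  return fnv1a32(`${flagKey}.${distinctId}`) / 4294967296; // divide by 2^32
}

// A user is in the rollout when their bucket falls below the percentage.
function inRollout(
  flagKey: string,
  distinctId: string,
  percentage: number,
): boolean {
  return rolloutBucket(flagKey, distinctId) < percentage / 100;
}
```

Two properties follow from the construction: the same (flag, user) pair always gets the same answer, and raising the percentage only ever adds users. Including the flag key in the hash input means different flags select different slices of users.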