Investigation: PDEV-610 Stale-Data Banner Cross-User Signal
Working notebook for the PDEV-610 sub-project. Captures problem framing, options, current-state findings, and live discussion threads. Promoted content lands in goal.md, the inline Decision Log section of design.md, or a future specification.md.
Problem
The items detail panel renders cards via useFreshRead, which on mount snapshots the set of rId values it sees and only flips isStale=true when its own next refresh finds a different set. The banner is wired to that flag.
Flow today:
- User A edits → A’s save handler refreshes A’s cache → A’s snapshot diff trips → A sees the banner. ✅
- User B has the same item open in another browser → nothing in B’s process is told anything happened → B’s useFreshRead never refreshes, never sees an rId change → no banner. ❌
The gap is not in the banner or in useFreshRead; both work correctly given their inputs. The gap is that no transport at all carries “this item moved” from A to B. No BroadcastChannel, no polling, no SSE, no WebSocket. The refactor that introduced the banner exposed this gap; it didn’t create it.
Every option below is some flavor of “give User B a reason to revalidate.”
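The snapshot-and-diff behaviour described above can be sketched as a pair of pure functions. This is an illustration only, not the actual useFreshRead code: the Card shape and the helper names snapshotRIds and isStaleAgainst are assumptions made for the example.

```typescript
// Hypothetical card shape: only the rId field matters for the staleness check.
interface Card {
  rId: string;
}

// Capture the set of rIds visible at mount time.
function snapshotRIds(cards: Card[]): Set<string> {
  return new Set(cards.map((c) => c.rId));
}

// A later read is stale relative to the snapshot when the two rId sets differ.
function isStaleAgainst(snapshot: Set<string>, next: Card[]): boolean {
  const nextIds = snapshotRIds(next);
  if (nextIds.size !== snapshot.size) return true;
  // Sizes match, so the sets are equal exactly when every next rId was in the snapshot.
  return next.some((c) => !snapshot.has(c.rId));
}

const atMount = snapshotRIds([{ rId: "r1" }, { rId: "r2" }]);
const unchanged = isStaleAgainst(atMount, [{ rId: "r1" }, { rId: "r2" }]); // false
const edited = isStaleAgainst(atMount, [{ rId: "r1" }, { rId: "r3" }]); // true
```

The check only runs when a refresh supplies a new card list, which is why User B, whose process never refreshes, never reaches the `true` branch.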
Options, least → most architectural impact
1. Manual “Refresh” affordance
- Where: arda-frontend-app only, one component.
- Mechanism: User B clicks Refresh → existing refreshCardsForItems() runs → useFreshRead’s next diff trips the banner if rIds changed.
- Pros: trivial; zero new infrastructure; zero recurring cost.
- Cons: doesn’t actually solve the reported bug — User B still has to know to refresh. UX bandage.
- Verdict: acceptable only as a fallback paired with another option.
2. Window-focus / visibility revalidation
- Where: arda-frontend-app only. Listen for visibilitychange and focus in ItemCardsProvider (or a tiny hook on the detail panel); on visible-again, call enqueueStaleRefresh for currently-mounted items.
- Mechanism: B switches tabs/windows and comes back → FE refetches → useFreshRead trips the banner.
- Pros: zero backend change; no long-lived connections; very small surface; matches “I came back from somewhere else” mental model.
- Cons: doesn’t help while B is actively staring at the screen without switching focus. Misses the two-screens-side-by-side case.
- Verdict: cheap baseline; usually combined with one of the polling options.
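A minimal sketch of how option 2 could be wired, under stated assumptions: the function name attachFocusRevalidation and the getMountedEids accessor are hypothetical, and the DocumentLike / EventTargetLike interfaces are structural stand-ins for the browser's document and window so the example stays self-contained.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface EventTargetLike {
  addEventListener(type: string, listener: () => void): void;
  removeEventListener(type: string, listener: () => void): void;
}
interface DocumentLike extends EventTargetLike {
  visibilityState: string;
}

// Hypothetical wiring: getMountedEids and enqueueStaleRefresh are supplied by the provider.
function attachFocusRevalidation(
  doc: DocumentLike,
  win: EventTargetLike,
  getMountedEids: () => string[],
  enqueueStaleRefresh: (eid: string) => void,
): () => void {
  const revalidate = (): void => {
    // Skip hidden tabs: nobody is looking at the banner there.
    if (doc.visibilityState !== "visible") return;
    getMountedEids().forEach((eid) => enqueueStaleRefresh(eid));
  };
  doc.addEventListener("visibilitychange", revalidate);
  win.addEventListener("focus", revalidate);
  // Return a cleanup function, suitable for a React effect teardown.
  return () => {
    doc.removeEventListener("visibilitychange", revalidate);
    win.removeEventListener("focus", revalidate);
  };
}
```

In the browser the first two arguments would be `document` and `window`. Both events can fire for a single return to the tab, so this leans on enqueueStaleRefresh being cheap to call repeatedly.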
3. BroadcastChannel for same-browser tabs
- Where: arda-frontend-app only. ItemCardsProvider posts {type: "item-changed", entityId, rId} on its own writes; on receive, invokes enqueueStaleRefresh for that entity.
- Mechanism: only covers the same browser (multi-tab single user). Does not flow across browsers or machines.
- Pros: zero backend change; near-zero latency; robust.
- Cons: ticket repro explicitly requires two users in separate browsers — BroadcastChannel does not help that case.
- Verdict: free correctness win for multi-tab, but does not close PDEV-610 on its own.
4. Active polling of the items detail panel (and/or cache)
- Where: arda-frontend-app only. While items are mounted (and tab is visible), call enqueueStaleRefresh every N seconds.
- Mechanism: B’s FE periodically refetches; useFreshRead’s rId-diff trips the banner. Tunable interval; can apply exponential backoff if idle.
- Pros: frontend-only; no new backend surface; latency bounded by the interval; no connection state to manage.
- Cons: request volume scales with (#concurrent users × #open items / interval). Cost depends on whether the endpoint supports conditional GETs.
- Verdict: simplest mechanism that actually solves the cross-user case. Strong default candidate.
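One possible shape for option 4 is sketched below. Everything named here is an assumption for illustration: createItemPoller, PollerDeps, the 30 s base and 5 min cap are placeholders, and "idle" is interpreted as "no user activity reported via noteActivity since the backoff last reset".

```typescript
// Exponential backoff: double the delay for each consecutive idle poll, up to a cap.
function nextPollDelayMs(baseMs: number, maxMs: number, idleTicks: number): number {
  return Math.min(baseMs * Math.pow(2, idleTicks), maxMs);
}

interface PollerDeps {
  getMountedEids: () => string[]; // hypothetical: eids with an active subscriber
  enqueueStaleRefresh: (eid: string) => void;
  isTabVisible: () => boolean;
}

function createItemPoller(deps: PollerDeps, baseMs = 30000, maxMs = 300000) {
  let idleTicks = 0;
  let timer: ReturnType<typeof setTimeout> | undefined;

  // One polling cycle; returns the delay to wait before the next cycle.
  const tick = (): number => {
    if (deps.isTabVisible()) {
      deps.getMountedEids().forEach((eid) => deps.enqueueStaleRefresh(eid));
    }
    const delay = nextPollDelayMs(baseMs, maxMs, idleTicks);
    idleTicks += 1;
    return delay;
  };

  const schedule = (delayMs: number): void => {
    timer = setTimeout(() => schedule(tick()), delayMs);
  };

  return {
    tick,
    // Call from user-interaction handlers to reset the backoff.
    noteActivity: (): void => {
      idleTicks = 0;
    },
    // Waits one base interval before the first poll; items were just fetched at mount.
    start: (): void => {
      if (timer === undefined) schedule(baseMs);
    },
    stop: (): void => {
      if (timer !== undefined) clearTimeout(timer);
      timer = undefined;
    },
  };
}
```

Because the poller only calls enqueueStaleRefresh, it adds no new fetch path; request volume is governed by the interval and by how many eids getMountedEids returns.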
5. Conditional-GET polling with backend ETag / rId support
- Where: mostly arda-frontend-app; operations may need to confirm/expose ETag headers the FE can echo as If-None-Match.
- Mechanism: as 4, but no-change responses return 304 Not Modified instead of full payload.
- Pros: option 4 simplicity, much lower cost; sets up reuse for other staleness surfaces (PDEV-588 etc.).
- Cons: small backend audit to confirm caching headers; mild FE-BE coupling.
- Verdict: option 4 done well. Right pick when this pattern will recur.
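The ETag round trip in option 5 might look like the sketch below. It assumes the backend returns an ETag header and honours If-None-Match, which is exactly what the backend audit would need to confirm. The names pollWithEtag, CachedEntry and FetchLike are hypothetical, and the endpoint URL is left to the caller.

```typescript
// Minimal response shape so the sketch does not depend on DOM or Node fetch typings.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<ResponseLike>;

interface CachedEntry {
  etag: string | null;
  body: unknown;
}

// Echo the last seen ETag so the server can answer 304 when nothing changed.
function conditionalHeaders(cached: CachedEntry | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  if (cached && cached.etag) headers["If-None-Match"] = cached.etag;
  return headers;
}

// Returns the entry to keep, plus a flag telling the caller whether to re-run the rId diff.
async function pollWithEtag(
  fetchFn: FetchLike,
  url: string,
  cached: CachedEntry | undefined,
): Promise<{ entry: CachedEntry | undefined; changed: boolean }> {
  const res = await fetchFn(url, { headers: conditionalHeaders(cached) });
  if (res.status === 304) {
    // Nothing changed: keep the cached entry, no payload was transferred.
    return { entry: cached, changed: false };
  }
  if (res.status !== 200) {
    throw new Error(`Unexpected status ${res.status}`);
  }
  return { entry: { etag: res.headers.get("ETag"), body: await res.json() }, changed: true };
}
```

Injecting fetchFn keeps the sketch testable; in the app it would presumably be a thin wrapper around the existing HTTP client.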
6. Server-Sent Events from operations
- Where: operations (new SSE endpoint per tenant or per entity scope) and arda-frontend-app (EventSource subscription wired into ItemCardsProvider).
- Mechanism: backend publishes “item changed” events; FE subscribes once per session and forwards into the existing cache invalidation path.
- Pros: push-based, sub-second latency, fits the unidirectional invalidation use case; lighter than WebSocket.
- Cons: new long-lived HTTP connections → ALB/EKS/idle-timeout considerations; auth-on-stream; backend needs an internal event bus (Kotlin coroutines SharedFlow, Postgres LISTEN/NOTIFY, or similar) to fan writes out across subscribers; non-trivial multi-pod scaling.
- Verdict: right answer when multiple surfaces will need real-time invalidation and polling cost becomes painful.
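On the frontend side, the option 6 subscription could be as small as the following sketch. The event name "item-changed", the payload shape and the function name subscribeToItemChanges are assumptions; the real schema and endpoint would be defined by operations. EventSourceLike is a structural stand-in for the browser's EventSource.

```typescript
// Structural stand-in for EventSource, so the sketch compiles without DOM typings.
interface EventSourceLike {
  addEventListener(type: string, listener: (ev: { data: string }) => void): void;
  close(): void;
}

// Hypothetical payload; the real event schema is not defined yet.
interface ItemChangedEvent {
  entityId: string;
  rId: string;
}

function subscribeToItemChanges(
  openStream: (url: string) => EventSourceLike,
  url: string,
  enqueueStaleRefresh: (eid: string) => void,
): () => void {
  const source = openStream(url);
  source.addEventListener("item-changed", (ev) => {
    try {
      const payload = JSON.parse(ev.data) as Partial<ItemChangedEvent> | null;
      // Forward into the existing invalidation path; ignore anything malformed.
      if (payload && typeof payload.entityId === "string") {
        enqueueStaleRefresh(payload.entityId);
      }
    } catch {
      // Malformed JSON: drop the event rather than break the stream handler.
    }
  });
  // The caller closes the stream on sign-out or provider unmount.
  return () => source.close();
}
```

In the browser, openStream would typically be `(u) => new EventSource(u)`. The sketch covers only the subscriber; the fan-out, auth and multi-pod concerns listed under Cons are untouched by it.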
7. WebSockets
- Where: operations + arda-frontend-app.
- Mechanism: full bidirectional channel.
- Pros: flexible; supports future real-time features (presence, live cursors, push notifications).
- Cons: strictly more infrastructure than SSE for a strictly unidirectional invalidation use case.
- Verdict: only justifiable if a separate product reason wants bidirectional real-time. Don’t introduce it just for this ticket.
8. Cross-service pub/sub fabric (SNS/SQS, Redis pub-sub, Kafka, …) + WS/SSE gateway
- Where: infrastructure, operations, possibly a new gateway component, arda-frontend-app.
- Mechanism: every mutation publishes a domain event; many subscribers (web gateway, other services, analytics) consume. Cross-user invalidation is just one consumer.
- Pros: correct long-term shape if Arda heads toward event-driven, multi-consumer architecture.
- Cons: disproportionate for a single banner; this would be a platform decision, not a bug fix.
- Verdict: out of scope here; mention only as the asymptote.
Where this points
- Self-contained items-page fix → options 2–5.
- Reusable invalidation channel investment → option 6.
- Reject for this ticket: 1 (doesn’t fix it), 7 (overkill for unidirectional), 8 (platform decision).
Current-state architecture (relevant to option 4)
Confirmed from arda-frontend-app/src/app/items/ItemCardsContext.tsx:
- ItemCardsProvider is mounted at the root layout (post-PDEV-597). One provider, one in-memory Map<entityId, {cards, fetchedAt}> per browser session.
- enqueueStaleRefresh(eid) has a microtask coalescer (ItemCardsContext.tsx:650-675): every entityId enqueued in the same JS tick merges into a single batched refreshCardsForItems(eids) call, that is, one HTTP request with the eids batched in the payload.
- refreshCardsForItems is plural-by-design; the network shape cardsForItems(eids) is already batch-friendly.
Caveat: today’s TTL refresh is not one-per-browser. It is useStaleCheck (ItemCardsContext.tsx:197-221), which runs per cell and schedules its own setTimeout at each cell’s own fetchedAt + TTL (default 30 s, env-tunable). Cells with staggered fetchedAt fire at staggered moments, so they typically do not land in the same microtask → multiple smaller batches per TTL window.
To get one-batched-request-per-browser-per-cycle, add a single provider-level setInterval that enqueues every currently-cached (or currently-mounted) eid each tick. The existing microtask coalescer turns that into one refreshCardsForItems(allEids) call. The per-cell useStaleCheck could remain as a safety net or be removed.
Cost knobs to keep in mind:
- Cache size vs. payload size. Cache is LRU-capped via itemCardsMapLru.ts; one batched POST sends every cached eid. Worth checking the LRU cap and whether cardsForItems has a server-side limit.
- Visible vs. cached. It may be cheaper to poll only currently-mounted (useFreshRead-subscribed) items rather than every cached entry. Same coalescing, smaller payload, banner correctness preserved (banner only matters where a user is looking).
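The interaction between a per-tick coalescer and a single provider-level tick can be illustrated as follows. This is not the code at ItemCardsContext.tsx:650-675; createCoalescer and pollMountedOnce are hypothetical names, and the injectable schedule parameter exists only so the behaviour can be exercised deterministically.

```typescript
// Sketch of a per-tick coalescer. `schedule` defaults to a microtask but is injectable.
function createCoalescer(
  refreshCardsForItems: (eids: string[]) => void,
  schedule: (flush: () => void) => void = (flush) => {
    Promise.resolve().then(flush);
  },
): (eid: string) => void {
  let pending = new Set<string>();
  let scheduled = false;
  return (eid: string): void => {
    pending.add(eid); // the Set deduplicates repeated eids within one tick
    if (scheduled) return;
    scheduled = true;
    schedule(() => {
      const batch = Array.from(pending);
      pending = new Set<string>();
      scheduled = false;
      refreshCardsForItems(batch); // one call, hence one request, per tick
    });
  };
}

// Provider-level tick: enqueue every mounted eid in one synchronous pass,
// so they all land in the same coalesced batch.
function pollMountedOnce(
  getMountedEids: () => string[],
  enqueueStaleRefresh: (eid: string) => void,
): void {
  getMountedEids().forEach((eid) => enqueueStaleRefresh(eid));
}
```

A provider could then run `setInterval(() => pollMountedOnce(getMountedEids, enqueue), intervalMs)`, where `enqueue` is the function returned by createCoalescer. Because all eids are enqueued in one synchronous pass, they share a batch, unlike per-cell timers that fire at staggered moments.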
Combining 1 + 2 + 3 as a local invalidation bus
Proposal: instead of treating options 1, 2, 3 as three separate features, treat them as producers on a small in-app event bus, with the provider as the primary consumer. The bus uses BroadcastChannel as its transport, which gives same-browser cross-tab propagation for free.
producers ──► bus ──► consumer
- Producers: focus/visibility; save success; bulk-action completion; scan / state change; manual Refresh click; [future] SSE event handler.
- Consumer: ItemCardsProvider → enqueueStaleRefresh(eid) → microtask coalescer → batched refreshCardsForItems(eids).
This separates “something might have changed” (many sources) from “go fetch and reconcile” (one owner). Other consumers (list cells, detail pane, future surfaces) can subscribe if they need to react locally without going through the provider.
Coverage map
| Layer | Catches | Misses |
|---|---|---|
| 1 (manual Refresh) | User-initiated escape hatch | Anything the user doesn’t know to escape |
| 2 (focus / visibility) | “I came back from another tab / window” | Active multi-screen viewing |
| 3 (BroadcastChannel) | Same browser, multi-tab same user | Different browsers, machines, incognito |
| Combined 1+2+3 + action-triggers | Same-browser staleness; post-action freshness for the acting user and their other tabs | The PDEV-610 primary repro — two users in separate browsers |
The reported bug explicitly says “User A and User B both sign in… in separate browsers.” BroadcastChannel does not cross browser instances. So 1+2+3 alone do not close PDEV-610.
How it composes with option 4
1+2+3 do not replace option 4; they let option 4 be tuned much lower.
- Without 1+2+3: option 4 carries everything, so the interval must be aggressive (5–10 s) to feel responsive. That dominates request volume.
- With 1+2+3: the perceived-latency cases (“I just acted”, “I just came back”, “my other tab acted”) are caught locally at ≈0 ms. Polling only has to carry the genuinely-remote case (other browser, other user) within an SLA acceptable for that case — comfortably 30–60 s. Request volume drops 5–10×.
Framing: 1+2+3 cover the same-browser layer; #4 (or #6 later) carries the cross-process layer. They occupy different parts of the staleness surface and reinforce each other rather than overlap.
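The effect of relaxing the interval can be checked with a few lines of arithmetic. The sketch assumes one batched request per browser per polling interval (the provider-level scheme discussed earlier) and uses an illustrative figure of 200 concurrent browsers, which is not a measurement.

```typescript
// Requests per minute when each browser sends one batched request per polling interval.
function batchedRequestsPerMinute(concurrentBrowsers: number, intervalSeconds: number): number {
  return concurrentBrowsers * (60 / intervalSeconds);
}

const browsers = 200; // illustrative, not measured
const aggressive = batchedRequestsPerMinute(browsers, 5); // 2400 requests/min at a 5 s interval
const relaxed = batchedRequestsPerMinute(browsers, 30); // 400 requests/min at a 30 s interval
const reduction = aggressive / relaxed; // 6
```

The ratio depends only on the two intervals, not on the number of browsers: moving from 5 s to 30 s, or from 10 s to 60 s, is a 6× reduction.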
Design risks to watch
- Cycle / amplification. If the provider both consumes the bus and publishes when its own fetches complete, it can loop. Discipline: only write-originating intents publish (save, bulk action, manual refresh, focus-resume). Cache-update completions do not.
- Same-tab self-receive. BroadcastChannel does not echo back to the sending context. A save handler that only posts to the channel will not notify its own tab. Wrap publish in a helper that both calls the local consumer and posts to the channel — producers then have one API and don’t think about tab boundaries.
- Focus-resume scope. On focus return, which eids? Currently-mounted (anything with an active useFreshRead / useStaleCheck subscriber) is the natural answer: bounded and relevant.
- Throttling. Bursts of user actions can produce bursts of messages; the provider’s existing microtask coalescer already deduplicates a tick’s worth, so this comes nearly free. Worth verifying under bulk-action flows that touch many eids at once.
- Producer specificity. “Anything the user did” can trip refreshes for items an action did not affect. Each producer should publish the specific eids it knows about (save handler knows its own eid; bulk action knows its set; scan knows the scanned eid) rather than a blanket “refresh everything.”
Suggested shape (if we go this way)
- Tiny useItemStaleSignal (or similar) exposing markItemStale(eid | eid[]) that does the call-locally + post-to-channel dance.
- One BroadcastChannel('arda-item-stale') instance owned by the provider.
- Producers (save handlers, bulk actions, scan, manual Refresh button) call markItemStale(...).
- Focus / visibility handler in the provider calls markItemStale(currentlyMountedEids).
- The provider’s channel listener simply calls enqueueStaleRefresh(...); there is no new fetch path, and the existing coalescer handles batching.
- Option 4 (or 6) plugs in later as one more producer: a setInterval posts markItemStale(mountedEids) every N seconds; an SSE handler posts markItemStale(eid) on push. Consumer side never changes.
The architectural payoff: the bus is the seam where any future transport plugs in without rewiring producers.
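A sketch of the call-locally + post-to-channel helper, under stated assumptions: the message shape {type: "item-stale", eids} and the factory name createItemStaleSignal are hypothetical, and ChannelLike is a structural stand-in for BroadcastChannel so the example is self-contained.

```typescript
// Structural stand-in for BroadcastChannel, so the sketch compiles without DOM typings.
interface ChannelLike {
  postMessage(message: unknown): void;
  addEventListener(type: "message", listener: (ev: { data: unknown }) => void): void;
  close(): void;
}

// Hypothetical wire format for the bus.
interface ItemStaleMessage {
  type: "item-stale";
  eids: string[];
}

function isItemStaleMessage(value: unknown): value is ItemStaleMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as { type?: unknown; eids?: unknown };
  return v.type === "item-stale" && Array.isArray(v.eids) && v.eids.every((e) => typeof e === "string");
}

function createItemStaleSignal(channel: ChannelLike, enqueueStaleRefresh: (eid: string) => void) {
  const consume = (eids: string[]): void => {
    eids.forEach((eid) => enqueueStaleRefresh(eid));
  };

  // Messages from other tabs: consume only, never re-post, so messages cannot loop.
  channel.addEventListener("message", (ev) => {
    if (isItemStaleMessage(ev.data)) consume(ev.data.eids);
  });

  return {
    // Single producer API: notify this tab directly, then tell the other tabs.
    markItemStale(eid: string | string[]): void {
      const eids = Array.isArray(eid) ? eid : [eid];
      if (eids.length === 0) return;
      consume(eids); // the channel does not echo to the sender, so call locally
      const message: ItemStaleMessage = { type: "item-stale", eids };
      channel.postMessage(message);
    },
    dispose(): void {
      channel.close();
    },
  };
}
```

In the browser, the channel argument would be the provider-owned `new BroadcastChannel('arda-item-stale')`. A polling timer or SSE handler added later would call the same markItemStale, leaving the listener untouched.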
Open discussion threads
- Interval and scope for option 4 layered on top of the bus. What polling interval, applied to which eids (mounted vs. cached), with what backoff when idle. Whether to require ETag / If-None-Match (option 5) from the outset or as a follow-up.
- PDEV-613 (bulk print trailing {}). Independent investigation; not yet started.
Copyright: (c) Arda Systems 2025-2026, All rights reserved