ADR-002: Cache-Invalidation Coalescing

Author: Miguel Pinilla
Date: 2026-06-02
Status: Accepted

Context

The invalidation mechanism from ADR-001 attracts bursty publication patterns: a bulk-delete handler publishes one eid per deleted item in a tight loop; a focus event publishes every grid-displayed and detail-panel eid in a single call; a sibling tab can deliver a multi-eid message while a local producer is also publishing. Without explicit shaping, every published eid would produce a separate refreshCardsForItems call against the BFF.

The mechanism also attracts publication for “this eid might exist” cases: a card-state event handler that does not statically know whether the row carries an item id (the order-queue case), or a delete handler whose deleted-ids array can legitimately be empty. Without a tolerant input contract, every publishing site has to carry an if (eid) ... guard, and forgetting one is a silent bug rather than a visible one.

This ADR records how those two concerns are addressed without introducing additional moving parts.

Decision Drivers

No amplification under burst. Many producer calls in one tick must collapse to one network refresh per affected eid set.
Predictable batching boundary. Whatever shapes the burst must drain at a well-defined point a synchronous caller can reason about.
Producer ergonomics. Producers should not carry inline guards for empty / missing input. Forgetting a guard should be a no-op, not a TypeError or a wasted request.
One refresh per batched intent on the active tab. A producer that owns its own local refresh path should not also be charged for the bus’s local-notify branch firing the same refresh.

Options Considered

Option A: No coalescing; rely on cache dedup

Description: Every markItemStale(eid) produces a direct refreshCardsForItems([eid]). Reduce the cost by deduping in-flight requests per eid at the cache layer.
Pros: Simplest model. The cache is the single point of coordination.
Cons: Each call carries a single eid, so the BFF receives one POST per producer call. In-flight dedup helps when concurrent calls target the exact same eid, but not when two calls in the same tick target different eids. A bulk delete of N items becomes N requests.
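To make the Option A failure mode concrete, here is a minimal sketch of per-eid in-flight dedup with no coalescing. The refreshCardsForItems function below is a hypothetical stand-in that records each POST instead of sending it; its signature is an assumption.

```typescript
// Hypothetical stand-in for the BFF refresh call: records each POST instead of sending it.
const posts: string[][] = [];
async function refreshCardsForItems(eids: readonly string[]): Promise<void> {
  posts.push([...eids]); // runs synchronously, before the first await
  await Promise.resolve(); // simulate the asynchronous round-trip
}

// Option A: no coalescing, only per-eid in-flight dedup at the cache layer.
const inFlight = new Map<string, Promise<void>>();

function markItemStale(eid: string): Promise<void> {
  const existing = inFlight.get(eid);
  if (existing !== undefined) return existing; // same eid already in flight: share it
  const request = refreshCardsForItems([eid]).finally(() => inFlight.delete(eid));
  inFlight.set(eid, request);
  return request;
}

// Two concurrent calls on the same eid share one POST...
markItemStale("a");
markItemStale("a");
// ...but a bulk delete of three different eids in the same tick still issues three.
for (const eid of ["b", "c", "d"]) markItemStale(eid);
```

Dedup removes exact-duplicate concurrency, but the request count still grows with the number of distinct eids.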

Option B: Time-windowed debounce at the bus

Description: Buffer published eids for a short interval (e.g., 50 ms) and flush the union as one batch.
Pros: Caller-agnostic; combines bursts from independent producers.
Cons: Introduces an arbitrary timing constant that has to be tuned. Delays the refresh on the active tab even when only one producer fired. Adds a setTimeout to test against, complicating fake-timer tests.
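For comparison, here is a minimal sketch of Option B. It assumes a fixed 50 ms window that the first publication opens. A trailing debounce would instead reset the timer on every call. The names createWindowedBus and WINDOW_MS are hypothetical. The setTimeout is the timing constant and the test dependency that the cons above refer to.

```typescript
// Option B sketch: buffer eids at the bus and flush the union after a fixed window.
// WINDOW_MS is the arbitrary tuning constant this option requires (50 ms, as in the text).
const WINDOW_MS = 50;

type RefreshBatch = (eids: readonly string[]) => void;

function createWindowedBus(refresh: RefreshBatch) {
  const pending = new Set<string>(); // union of eids published during the window
  let timer: ReturnType<typeof setTimeout> | undefined;

  return {
    markItemStale(eid: string): void {
      pending.add(eid);
      if (timer !== undefined) return; // a window is already open
      // The first publication opens the window; even a lone call waits WINDOW_MS.
      timer = setTimeout(() => {
        timer = undefined;
        const batch = Array.from(pending);
        pending.clear();
        refresh(batch);
      }, WINDOW_MS);
    },
  };
}

const batches: string[][] = [];
const bus = createWindowedBus((eids) => batches.push([...eids]));
bus.markItemStale("a");
bus.markItemStale("b");
bus.markItemStale("a");
```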

Option C: Microtask coalescer at the consumer (this ADR, layer 1)

Description: The bus’s consumer (ItemCardsProvider) maintains a queue and schedules a single queueMicrotask flush. Every enqueueStaleRefresh(eid) call within one tick adds to the queue; the flush issues one batched refreshCardsForItems(eids) call and clears the queue. The bus itself stays direct.
Pros: No arbitrary timing constant. The boundary is the microtask flush, which JavaScript guarantees runs after the current synchronous frame completes. A burst of N publications in one tick produces exactly one batched fetch. A caller can step past the boundary with a single await Promise.resolve(). Tests need no real-time waits, provided the test environment leaves queueMicrotask real (Jest's modern fake timers fake it by default unless it is listed in doNotFake).
Cons: One layer below the bus. Anyone looking at the bus alone would not see the batching; the doc has to explain where the coalescer lives.
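Here is a minimal sketch of the consumer-side coalescer. It assumes that refreshCardsForItems takes a readonly string[]. It also assumes the queue is a Set, so an eid repeated within one tick is deduplicated as well. createStaleRefreshCoalescer is a hypothetical factory. In the ADR, this logic lives inside ItemCardsProvider.

```typescript
// Assumed shape of the batched refresh primitive.
type RefreshCardsForItems = (eids: readonly string[]) => void;

// Hypothetical factory; in the ADR this state lives inside ItemCardsProvider.
function createStaleRefreshCoalescer(refreshCardsForItems: RefreshCardsForItems) {
  const queue = new Set<string>(); // eids enqueued during the current tick
  let flushScheduled = false;

  function flush(): void {
    flushScheduled = false;
    const eids = Array.from(queue);
    queue.clear(); // clear before calling out, so re-entrant enqueues start a new batch
    if (eids.length > 0) refreshCardsForItems(eids);
  }

  return function enqueueStaleRefresh(eid: string): void {
    queue.add(eid);
    if (!flushScheduled) {
      flushScheduled = true;
      queueMicrotask(flush); // runs once the current synchronous frame has finished
    }
  };
}

// Usage: a bulk delete publishing in a tight loop, including a repeated eid.
const calls: string[][] = [];
const enqueueStaleRefresh = createStaleRefreshCoalescer((eids) => {
  calls.push([...eids]);
});
for (const eid of ["e1", "e2", "e3", "e2"]) enqueueStaleRefresh(eid);
// calls is still empty here; one batch ["e1", "e2", "e3"] is issued in the next microtask.
```

Keeping the bus direct and putting the queue here means every producer, local or cross-tab, funnels through the same single flush.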

Option D: Tolerant input contract on the bus (this ADR, layer 2)

Description: markItemStale and markItemStaleRemoteOnly accept string | readonly string[] | undefined. undefined, empty strings, and arrays of empty strings normalize to a no-op (no consumer call, no channel post). Producers publish whatever they have without pre-guards.
Pros: Removes a class of producer-site bugs by making the silent case explicit and centralized. Cuts repetitive if (eid) ... boilerplate at every publishing call site. One place to inspect the no-op semantics.
Cons: Slightly wider input type than the strict “string of non-zero length” some callers would prefer. The bus is the wrong place to enforce “non-empty eid” as a precondition.
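Here is a minimal sketch of the normalization rules. It assumes a single helper applies them at the bus boundary; normalizeStaleInput is a hypothetical name. Empty elements are filtered out of arrays, so an array of only empty strings becomes an empty list. The bus treats an empty list as a no-op.

```typescript
// The input type accepted by markItemStale and markItemStaleRemoteOnly.
type StaleInput = string | readonly string[] | undefined;

// Hypothetical helper: normalizes any accepted input to a list of non-empty eids.
function normalizeStaleInput(input: StaleInput): readonly string[] {
  if (input === undefined) return [];
  const list: readonly string[] = typeof input === "string" ? [input] : input;
  return list.filter((eid) => eid.length > 0);
}

// Producers pass whatever they have; an empty result means "no consumer call, no channel post".
normalizeStaleInput(undefined);        // []
normalizeStaleInput("");               // []
normalizeStaleInput(["", ""]);         // []
normalizeStaleInput(["e1", "", "e2"]); // ["e1", "e2"]
```

With this in place, an expression such as card.item?.entityId can be passed straight to the bus.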

Option E: Cross-tab-only publication variant (this ADR, layer 2)

Description: A second bus method, markItemStaleRemoteOnly, posts on the channel but does not invoke the local consumer. Used at producer sites that own their own local refresh callback (the detail-panel Refresh button, CardStateDropdown with onTriggerRefresh).
Pros: Active tab refreshes exactly once instead of twice when the producer already has a local refresh path. Surfaces the intent (“notify siblings, I’m refreshing locally myself”) at the call site.
Cons: A second method on the bus that callers have to choose between. Picking the wrong one is a performance bug (double fetch), not a correctness bug.
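Here is a minimal sketch of the two publication variants side by side. ChannelLike stands in for a BroadcastChannel so the sketch runs outside a browser. The message shape { type: "items-stale", eids } and the createStaleBus factory are hypothetical. The normalization mirrors the Option D rules.

```typescript
// Minimal channel surface; a BroadcastChannel satisfies it in the browser.
interface ChannelLike {
  postMessage(message: unknown): void;
}

type StaleInput = string | readonly string[] | undefined;

function toEids(input: StaleInput): readonly string[] {
  if (input === undefined) return [];
  const list: readonly string[] = typeof input === "string" ? [input] : input;
  return list.filter((eid) => eid.length > 0);
}

// Hypothetical factory; notifyLocal is the local consumer (the coalescer's entry point).
function createStaleBus(channel: ChannelLike, notifyLocal: (eids: readonly string[]) => void) {
  const post = (eids: readonly string[]) => channel.postMessage({ type: "items-stale", eids });
  return {
    // Default: refresh on this tab and notify sibling tabs.
    markItemStale(input: StaleInput): void {
      const eids = toEids(input);
      if (eids.length === 0) return;
      notifyLocal(eids);
      post(eids);
    },
    // For producers that already own a local refresh path: notify siblings only.
    markItemStaleRemoteOnly(input: StaleInput): void {
      const eids = toEids(input);
      if (eids.length === 0) return;
      post(eids);
    },
  };
}

const posted: unknown[] = [];
const localBatches: string[][] = [];
const staleBus = createStaleBus(
  { postMessage: (message) => posted.push(message) },
  (eids) => localBatches.push([...eids]),
);
staleBus.markItemStale("e1");                // local refresh + channel post
staleBus.markItemStaleRemoteOnly("e2");      // channel post only; caller refreshes e2 itself
staleBus.markItemStaleRemoteOnly(undefined); // no-op
```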

Decision

We chose a two-layer combination of C, D, and E:

The consumer-side microtask coalescer (C) absorbs publication bursts into one batched refresh per tick.

The bus’s tolerant input contract (D) lets producers publish without inline guards.

The cross-tab-only variant (E) lets the two producer sites that own a local refresh path skip the bus’s local-notify branch.

The two layers are deliberately at different ends of the pipeline. The coalescer is at the consumer (where it has the cleanest microtask boundary and the cleanest path to the batched refresh primitive). The tolerant input is at the bus (where it removes per-callsite boilerplate). Putting both at the bus would conflate “what arrives” with “what fires”; putting both at the consumer would push input-normalization concerns out of the bus’s contract.

We rejected Option A because the no-coalescing failure mode is real (bulk-delete of 50 items would issue 50 POSTs). We rejected Option B because the timing constant is arbitrary and the test surface is worse than the microtask alternative.

Consequences

Positive

A burst of markItemStale publications across multiple producer sites within one tick collapses to one batched refresh per affected eid set.
Producer call sites do not carry inline if (eid) guards. A missing eid (e.g., card.item?.entityId evaluating to undefined) is a clean no-op handled by the bus’s normalization, not a TypeError.
Active-tab double refresh is avoided at the two sites that own their own local refresh callback.
Test surface is simple: await Promise.resolve() is sufficient to flush the coalescer. No real-time waits and no fake-timer dependencies.
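The claim that one await Promise.resolve() is enough can be checked with a plain async test. This sketch uses no test framework and assumes a real (unfaked) queueMicrotask. The inline makeCoalescer is a compressed, hypothetical version of the consumer logic.

```typescript
// Compressed stand-in for the consumer-side coalescer (hypothetical).
function makeCoalescer(refresh: (eids: readonly string[]) => void) {
  const queue = new Set<string>();
  let scheduled = false;
  return (eid: string): void => {
    queue.add(eid);
    if (scheduled) return;
    scheduled = true;
    queueMicrotask(() => {
      scheduled = false;
      const eids = Array.from(queue);
      queue.clear();
      refresh(eids);
    });
  };
}

// Test body: publish a burst, then cross the microtask boundary exactly once.
async function burstCollapsesAfterOneMicrotask(): Promise<{ before: number; after: number }> {
  let refreshCount = 0;
  const enqueue = makeCoalescer(() => {
    refreshCount += 1;
  });
  enqueue("e1");
  enqueue("e2");
  enqueue("e3");
  const before = refreshCount; // the flush has not run yet
  await Promise.resolve(); // the flush microtask was queued first, so it runs before this resumes
  return { before, after: refreshCount };
}
```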

Negative

Two methods on the bus (markItemStale, markItemStaleRemoteOnly) instead of one. Picking the wrong one is a silent perf bug.
The coalescer’s behavior is invisible at the bus layer. The system reference documents this; readers who skim the bus surface alone will miss the batching.

Neutral

The coalescer uses queueMicrotask. This was a deliberate choice over Promise.resolve().then(...) for test ergonomics with Jest fake timers; both produce equivalent observable behavior when queueMicrotask is not faked.
The bus’s input contract is wider than the consumer’s. Inputs are normalized at the bus boundary, so the consumer always receives a plain readonly string[] with no empty elements.

Follow-Up Actions

If a future producer site is added that owns its own local refresh callback, document the choice between markItemStale and markItemStaleRemoteOnly at that site.
If a non-item-cards subsystem adopts this bus pattern, reconsider whether the input-normalization rules should be hoisted to a shared helper rather than re-implemented per bus.