Specification: Product Slow — Front-End Items Page Performance

This specification pins the design decisions identified in goal.md as load-bearing for the PDEV-489 sub-issue stack: the shape of ItemCardsContext (which PDEV-235 introduces and PDEV-548 / PDEV-549 consume), the AG Grid SSRM integration point (which determines when the batched kanban-card query fires and how it interacts with the block cache), and the freshness model (which determines how cross-session staleness is bounded without re-introducing the per-row fan-out the project is here to eliminate).

Everything else — file lists, line-by-line edits, test additions — is implementation detail and lives in the per-sub-issue PRs.

1. `ItemCardsContext`

1.1 What it is

ItemCardsContext is the page-scoped shared store of kanban-card data keyed by item eid, owned by the /items page. It exists so that every component on the page (grid row cells, detail panel, bulk-action handlers) reads the same kanban-card dataset for a given item without each issuing its own network request.

“Page-scoped” means one mount of /items in one browser tab, not “one user session”. The context is React state inside the page component tree; its lifetime is the lifetime of that React root for /items:

One user, two tabs of /items → two independent stores, two independent batched fetches. No cross-tab sharing.
Navigate away and back (e.g. /items → /scan → /items) → the page unmounts, the store is dropped, the next mount starts cold.
Reload → cold.
Across users → there is no sharing surface; each user’s browser has its own React tree.

ItemCardsContext is the single source of truth on the page, not a freshness guarantee against the backend. It enforces consistency within the page (all consumers see the same cards for a given eid) but does not, on its own, bound staleness vs. the backend. The freshness model in §3 is what bounds staleness; the context is the substrate it operates on.

No kanban-card caching wider than the page mount exists today, and this project explicitly adds none: kanban-card data is too change-heavy to cache at the BFF or in localStorage. The items list’s separate Next.js unstable_cache caches items/query-ssrm responses on the BFF, not kanban-cards, and is not affected by this work.

1.2 Why it is needed

Three independent code paths on /items need the same kanban-card data for the same items, at overlapping times:

Grid row cells (QuickActionsCell in columnPresets.tsx) read safeCards.length, inOrderQueueCount, printedCount, and pick the candidateCard for the print/preview/order-queue buttons on every row.
ItemDetailsPanel reads the same cards (full card list) when the panel opens for a row.
Bulk handlers (handleDeleteMultipleItems, handlePrintSelectedCards, handlePreviewSelectedCards in page.tsx) read cards for every selected item to gate deletion, choose labels to print, and compose preview sheets.

Without a shared store, each path issues its own request, and the grid path issues one per row. Routing all three through ItemCardsContext turns the rendered page into one batched read per SSRM block, and turns later consumers (panel, bulk action) into reads against an already-warm store on which the freshness model (§3) then layers refresh behavior.

The context is not a general-purpose kanban-card cache: it is the backing store of the /items page’s rendered state. Off-page consumers (e.g. the standalone kanban page, the print preview window) own their own data acquisition.

1.3 Design criteria

The shape is constrained by the consumers, the SSRM lifecycle, the acceptance criteria in goal.md, and the freshness model in §3. The criteria are pinned here explicitly so that the four PRs in the stack do not drift:

Page-scoped lifetime, not request-scoped. The store outlives any single SSRM block fetch and any single panel-open event. It is cleared only when the /items page unmounts or when the tenant / active-tab / filter-tokens combination changes (any change that invalidates the set of items the grid is showing).
Keyed by item eid (entity ID), not row index. The grid is server-side; row indices are not stable across sorts, filter changes, or pagination. eid is the only stable handle the BFF and operations share with the frontend.
Entry shape carries a client-side fetchedAt. Each entry is { cards: KanbanCardResult[], fetchedAt: number }. fetchedAt is written from Date.now() on the client at the moment the response resolves — never from asOf.recorded or any server timestamp — so clock skew between client and server cannot make every entry permanently “stale” and induce a refresh loop. fetchedAt is the substrate for the TTL check in §3.
Populated by batched fetch, not per-item. The introducing PR (PDEV-235) replaces the per-row ensureCardsForItem(eid) call with a single ensureCardsForItems(eids[]) call per SSRM block. Per-item entry points remain on the surface as thin wrappers for callers that genuinely act on one eid (the detail panel’s refresh-after-mutation path); they must not be the population path for the page load.
Idempotent on overlap, in-flight deduplicated. Two SSRM blocks (or a block plus a panel open plus a focus-refresh sweep) may request overlapping eid sets concurrently. The context must deduplicate in-flight requests by eid and never issue two parallel kanban-card queries for the same eid. The existing cardFetchPromisesRef per-item dedup generalizes naturally to a per-eid in-flight map that the batched call writes into once per member.
Invalidation on per-item mutation, not on every change. When a user prints, previews, moves a card through the order queue, deletes an item, or receives cards in the panel, the context refreshes only the affected eids (via refreshCardsForItem(eid) or its batched form refreshCardsForItems(eids[])). The grid does not invalidate the entire store on a single mutation.
No persistence. Kanban-card state is too change-heavy to persist; the context is in-memory React state for the lifetime of the page mount.
Empty-result semantics are first-class. An item with zero cards has its eid mapped to { cards: [], fetchedAt }, not absent from the store. Consumers distinguish “not yet fetched” (map[eid] === undefined) from “fetched, no cards” (map[eid].cards.length === 0). PDEV-490 K12 (withTotal = false) guarantees the batched query returns an empty array rather than the legacy IncompatibleState 500, so the absent-vs-empty distinction is meaningful.
Failure is sticky-empty by default, with retry on mutation, panel-open, or focus. If the batched kanban-card query fails for a block, the context records the eids as { cards: [], fetchedAt: now } and logs the error; consumers render as if the items had no cards rather than spinning indefinitely. The next refresh path that touches one of those items (mutation, panel-open, focus-refresh) can recover.
Freshness at every edit surface, staleness tolerated only at display. Every code path where a user is about to update state — opening the detail panel for an item, initiating a bulk mutation — refreshes the affected eids before acting (subject to the policy in §3). Display-only reads (grid cells, bulk print/preview) may read stale data. This preserves the pre-project safety net for editing while keeping the page-load latency win.
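A minimal sketch of three of the criteria above (the { cards, fetchedAt } entry shape, per-eid deduplication against in-flight requests, and the absent-vs-empty and sticky-empty writes), assuming a plain record store outside React. `planFetch`, `recordBatchResult`, `CARD_TTL_MS`, and the trimmed `KanbanCardResult` are hypothetical names for illustration, not the PDEV-235 code; a real ensureCardsForItems would also await the existing promise for any eid that is already in flight.

```typescript
// Trimmed stand-in for the real KanbanCardResult (assumption: only rId is needed here).
interface KanbanCardResult {
  rId: string;
}

// Entry shape: fetchedAt is the client's Date.now() when the response resolved.
interface CardEntry {
  cards: KanbanCardResult[];
  fetchedAt: number;
}
type CardMap = Record<string, CardEntry>;

const CARD_TTL_MS = 30_000; // hypothetical constant holding the §3.2 default

// Which eids a batched populate must fetch: absent or stale-by-TTL entries,
// minus eids already in flight, each eid at most once.
function planFetch(
  eids: string[],
  map: CardMap,
  inFlight: ReadonlySet<string>,
  now: number,
  ttlMs: number = CARD_TTL_MS,
): string[] {
  const toFetch = new Set<string>();
  for (const eid of eids) {
    if (inFlight.has(eid)) continue;
    const entry = map[eid];
    if (entry === undefined || now - entry.fetchedAt > ttlMs) toFetch.add(eid);
  }
  return Array.from(toFetch);
}

// Writes a batch result back without mutating the previous map. Every requested
// eid gets an entry, so "fetched, no cards" is { cards: [] }, never absent.
// On failure every requested eid is recorded sticky-empty.
function recordBatchResult(
  map: CardMap,
  requested: string[],
  // Assumption: the batched response has already been grouped by item eid.
  result: Record<string, KanbanCardResult[]> | Error,
  now: number,
): CardMap {
  const next: CardMap = { ...map };
  for (const eid of requested) {
    const cards = result instanceof Error ? [] : result[eid] ?? [];
    next[eid] = { cards, fetchedAt: now };
  }
  return next;
}
```

Keeping the planning step pure makes the TTL and dedup rules unit-testable without mounting the page.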

1.4 Public surface

The context value (after this project):

interface ItemCardsContextType {
  /** eid → { cards, fetchedAt }. Undefined means "not yet fetched"; a present
   *  entry with cards.length === 0 means "fetched, no cards". */
  itemCardsMap: Record<string, { cards: KanbanCardResult[]; fetchedAt: number }>;

  /** Batched populate. Fetches eids that are absent or whose entries are
   *  older than the TTL (§3). Deduplicates concurrent calls per eid. No-op
   *  for already-fresh eids. */
  ensureCardsForItems: (itemEntityIds: string[]) => Promise<void>;

  /** Batched refresh. Always fetches, ignoring TTL. Overwrites entries for
   *  the given eids. Used by mutation completion, focus-refresh, and the
   *  panel's parallel refresh on open. */
  refreshCardsForItems: (itemEntityIds: string[]) => Promise<void>;

  /** Single-item wrappers over the batched calls. Kept on the surface for
   *  callers that genuinely act on one eid (e.g. the detail panel's
   *  refresh-after-receive-card path). */
  ensureCardsForItem: (itemEntityId: string) => Promise<void>;
  refreshCardsForItem: (itemEntityId: string) => Promise<void>;

  onOpenItemDetails?: (item: items.Item) => void;
  bulkPrintingCards?: Set<string>;
  bulkPrintingLabels?: Set<string>;
}

Alongside the context, the same module exports the freshness hook used by edit surfaces:

/** Read with optional debounce. Returns cached data immediately if present;
 *  triggers refreshCardsForItem(eid) on mount/eid-change; holds caller's
 *  paint for up to debounceMs (default 0) waiting for the refresh to land.
 *  After debounceMs (or immediately if 0), returns cached + isStale flag;
 *  consumers can render a banner when isStale flips and rId differs. */
declare function useFreshRead(
  itemEntityId: string,
  opts?: { debounceMs?: number },
): { cards: KanbanCardResult[] | undefined; isStale: boolean; refresh: () => Promise<void> };

The single-item wrappers exist because the detail panel’s “refresh-after-receive-card” path is genuinely per-item; forcing every caller to wrap an eid in an array would be noise.
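The paint hold in useFreshRead can be reduced to racing the refresh promise against a debounceMs timer. The sketch below is framework-free and assumes the hook awaits it before deciding what to paint; `holdForRefresh` and `HoldOutcome` are hypothetical names, and the real hook would keep the result in React state.

```typescript
// "fresh": the refresh landed within debounceMs, so paint once with new data.
// "fellThrough": paint cached data now; the refresh keeps running in the background.
type HoldOutcome = "fresh" | "fellThrough";

async function holdForRefresh(refresh: Promise<unknown>, debounceMs: number): Promise<HoldOutcome> {
  // Attach handlers first so a rejected refresh is never unhandled. A failed
  // refresh is treated like a slow one: the caller paints from cache.
  const landed = refresh.then(
    (): HoldOutcome => "fresh",
    (): HoldOutcome => "fellThrough",
  );
  if (debounceMs <= 0) return "fellThrough"; // default 0: paint from cache immediately

  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<HoldOutcome>((resolve) => {
    timer = setTimeout(() => resolve("fellThrough"), debounceMs);
  });
  try {
    return await Promise.race([landed, deadline]);
  } finally {
    clearTimeout(timer); // do not leave the deadline timer pending after a fast refresh
  }
}
```

In the panel, "fresh" means a single paint with refreshed cards; "fellThrough" means painting the cached cards with isStale: false and running the rId comparison (§3.4) when the refresh eventually lands.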

1.5 Consumer contract

Per-consumer behavior under the freshness model (§3):

Consumer	Today	After this project
`QuickActionsCell` (per row, display-only)	`useEffect` calls `ensureCardsForItem(eid)` on mount + dependency change	Reads `itemCardsMap[eid]?.cards` synchronously. No effects. Block-level fetch is owned by the SSRM datasource. If the entry is stale-by-TTL at read time, the read enqueues a coalesced batch refresh (§3) without blocking the paint.
`ItemDetailsPanel` — on open	Local `fetchCards` calls `cardsForItem` directly	`useFreshRead(eid, { debounceMs: 200 })`. Reads the cached cards, holds the paint for up to 200ms waiting for the refresh, then paints (fresh data if the refresh landed in time, cached data otherwise). If the refresh lands after the paint and the `rId`s differ from the cached cards, surface the banner (§3.4). Net round-trip count: 1 (same as today).
`ItemDetailsPanel` — after in-panel mutation (`Add to order queue`, `onReceiveCard`, etc.)	`fetchCards` + delayed refetches at 300ms / 1000ms / 500ms / 1500ms	`refreshCardsForItem(eid)` + the same delayed refetches. Existing behavior preserved; only the call surface changes.
`ItemDetailsPanel` — `refreshItemCards` window event	`fetchCards` + delayed refetches	`refreshCardsForItem(eid)` + same delayed refetches.
`handleDeleteMultipleItems` (mutating)	Loops `cardsForItem` per selected `eid`	`await refreshCardsForItems(selectedEids)` before proceeding, with a visible progress indicator. If any selected `eid`’s `rId` set differs from the cached version, abort with a banner (“Selection changed — refresh and retry?”) and let the user re-trigger.
`handlePrintSelectedCards`, `handlePreviewSelectedCards` (non-mutating)	Loops `cardsForItem` per selected `eid`	Reads from `itemCardsMap`. No refresh, no debounce. Worst case: a duplicate label sheet or a preview of a since-changed card. Accepted risk.

2. AG Grid SSRM integration point

2.1 Where the batched call lives

The batched kanban-card/query call is issued inside the SSRM datasource’s getRows callback, after the /items/query-ssrm response resolves and before params.success is called for the block. The sequence per block:

getRows(params) invoked by AG Grid with the block’s startRow, endRow, sort model, and filter model.
Datasource calls /items/query-ssrm → receives rows for the block plus total count + filter options.
Datasource extracts the eid set from the returned rows and calls ensureCardsForItems(eids) on the page-level context.
Once the kanban-card store is populated, datasource calls params.success({ rowData, rowCount }).

The last two steps run in series (params.success waits on the kanban-card fetch) because the per-row cells read itemCardsMap[eid]?.cards synchronously on render. If success fired before the store was populated, every row would briefly render with safeCards = [] and then re-render, flickering the buttons and triggering spurious cell-render telemetry.
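A sketch of a datasource whose getRows follows that sequence, including the error and empty-block rules from §2.3 and §2.4. The params shape is a simplified stand-in for AG Grid’s SSRM getRows params (only request, success, and fail are modeled), and `queryItemsSsrm`, `ItemRow`, and `logError` are hypothetical injected dependencies, not existing page code.

```typescript
// Simplified stand-ins for AG Grid's SSRM getRows params (assumption: only the
// fields used here; the real params object carries much more).
interface GetRowsRequest {
  startRow?: number;
  endRow?: number;
  sortModel: unknown[];
  filterModel: unknown;
}
interface GetRowsParams<TRow> {
  request: GetRowsRequest;
  success(result: { rowData: TRow[]; rowCount?: number }): void;
  fail(): void;
}

interface ItemRow {
  eid: string;
}

// Hypothetical dependencies injected by the /items page.
interface DatasourceDeps {
  queryItemsSsrm(req: GetRowsRequest): Promise<{ rows: ItemRow[]; totalCount: number }>;
  ensureCardsForItems(eids: string[]): Promise<void>;
  logError(err: unknown): void;
}

function createItemsDatasource(deps: DatasourceDeps) {
  return {
    async getRows(params: GetRowsParams<ItemRow>): Promise<void> {
      let page: { rows: ItemRow[]; totalCount: number };
      try {
        page = await deps.queryItemsSsrm(params.request);
      } catch (err) {
        deps.logError(err);
        params.fail(); // no rows, so no kanban-card call
        return;
      }
      // Use the eids actually returned (partial last block) and skip empty blocks.
      const eids = page.rows.map((row) => row.eid);
      if (eids.length > 0) {
        try {
          await deps.ensureCardsForItems(eids); // one batched call; success waits on it
        } catch (err) {
          // Assumption: the context has already recorded these eids sticky-empty.
          deps.logError(err);
        }
      }
      // Rows always render once the items query succeeded.
      params.success({ rowData: page.rows, rowCount: page.totalCount });
    },
  };
}
```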

Alternative considered and rejected: fire-and-forget the kanban-card call after params.success, with rows initially showing a “loading” state. Rejected because (a) the counts column is the same width whether data is present or not, so there is no layout cost to waiting; and (b) the SSRM block fetch itself is the long pole — waiting an extra ~50–300ms for the kanban-card call is dominated by the items/query-ssrm latency and barely changes the user-visible time-to-paint.

2.2 Block cache interaction

AG Grid’s SSRM block cache may evict and re-request a block when the user scrolls back to it. The itemCardsMap is page-scoped, not block-scoped: when a block is re-fetched, the datasource still calls ensureCardsForItems(eids), which is a no-op for eids already in the store with a fresh-by-TTL fetchedAt. Block evictions do not themselves trigger kanban-card re-fetches.

Cache invalidation (full grid refresh — sort change, filter change, tenant switch) clears the SSRM block cache and the itemCardsMap. Partial refresh (single-item mutation) clears neither; it calls refreshCardsForItem(eid) which overwrites just that entry.

2.3 Error handling

If /items/query-ssrm fails, the SSRM datasource calls params.fail() and the kanban-card call is not issued — no rows exist to fetch cards for.

If the batched kanban-card/query fails after items/query-ssrm succeeds:

The datasource still calls params.success with the row data — the grid must render rows, since the items themselves are loaded.
The context records each eid in the failed batch as { cards: [], fetchedAt: now } (per the sticky-empty failure criterion in §1.3).
The failure is logged via the existing error-logging path (not per-row console.error).
The next user mutation, panel-open, or focus-refresh on an affected row refreshes that eid (via refreshCardsForItem, or refreshCardsForItems for the focus sweep), which can recover.

The current IncompatibleState 500 → “no cards” branch in getKanbanCardsForItem becomes dead code once operations#173 lands (PDEV-490 K12); PDEV-235 removes it.

2.4 Block boundary edge cases

Block size mismatch. AG Grid’s default SSRM block size (cacheBlockSize) is 100. The 60-row test tenant fits in one block, so the project’s “2 round-trips per block” target becomes “2 round-trips total” in the common case. A 500-row tenant, once the user has scrolled through all five blocks, costs 5 items/query-ssrm + 5 batched kanban-card/query = 10 round-trips: still O(blocks), not O(rows).
Partial last block. The last block may return fewer rows than requested. The batched call uses the actual eid set from the response, not the requested range — no empty-eid calls.
Empty block. If items/query-ssrm returns zero rows (filtered-to-empty state), the datasource skips the ensureCardsForItems call entirely and calls params.success({ rowData: [], rowCount }) directly.
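The round-trip arithmetic above can be checked with a small helper, assuming every block is eventually loaded (SSRM requests blocks lazily as they scroll into view, so blocks never scrolled into view are not requested). `roundTripsToLoadAll` is a hypothetical name; the empty case counts the items/query-ssrm call that still fires even though the kanban-card call is skipped.

```typescript
const DEFAULT_SSRM_BLOCK_SIZE = 100; // AG Grid's default cacheBlockSize

// One items/query-ssrm call per block plus one batched kanban-card/query call per
// non-empty block. A partial last block still counts as one block.
function roundTripsToLoadAll(rowCount: number, blockSize: number = DEFAULT_SSRM_BLOCK_SIZE): number {
  if (rowCount === 0) return 1; // filtered-to-empty: items query only
  const blocks = Math.ceil(rowCount / blockSize);
  return 2 * blocks;
}
```

For example, a 250-row tenant has three blocks (the last one partial) and costs six round-trips.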

3. Freshness and concurrency model

3.1 The trade-off and the boundary

The pre-project code refreshed kanban-card data on every panel open and every bulk action — expensive but safe; cross-session staleness was bounded to the time between user interactions on a row. The naïve “replace fetches with cache reads” form of this project would have collapsed page-load traffic and dropped that safety. The freshness model below restores the safety at the edit surfaces while keeping the page-load batching.

The honest claim is:

Cross-session staleness on /items is bounded by (a) the user’s next interaction with a row (open panel, bulk action, mutation), (b) the next scroll into a block whose entries are stale-by-TTL, (c) the next sort/filter/tab change, (d) the next time the browser tab regains focus, or (e) the TTL window for a row that someone is actively reading — whichever comes first. There is no continuous polling; sub-second freshness for stationary views is out of scope and tracked separately under PDEV-442.

3.2 TTL — on-read, per-`eid`, coalesced

Each entry in itemCardsMap carries a client-side fetchedAt. The default TTL is 30 seconds, exposed as a constant so it can be tuned without reshaping the context. The TTL is tracked per eid rather than globally, so that an entry just refreshed by a panel-open or mutation is not re-fetched by an adjacent grid read.

Eviction trigger is on-read, not timer-based:

When a consumer reads map[eid] and finds the entry stale-by-TTL (now - fetchedAt > ttlMs), the read enqueues eid into a coalesced batch.
The coalescer flushes on the next microtask / animation frame as one Filter.In call for the union of enqueued eids. At most one refresh batch is in flight at any time; eids enqueued while a batch is in flight are held for the next flush.
Display-only reads (grid cells) do not block on the refresh — they paint with the stale data; the cell re-renders when the batch resolves and the map updates (stale-while-revalidate).
Edit-surface reads (useFreshRead) participate in the same coalescer but may also hold their paint for up to debounceMs.
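A minimal coalescer matching the rules above: same-tick enqueues flush together, at most one batch is in flight, and eids enqueued during a flight wait for the next flush. It flushes on a microtask (the text allows a microtask or an animation frame) and additionally ignores eids already in the in-flight batch, so re-renders during a flight do not schedule a redundant refresh; that skip is an assumption beyond the text. `RefreshCoalescer` and `BatchRefresh` are hypothetical names.

```typescript
type BatchRefresh = (eids: string[]) => Promise<void>;

// Coalesces stale-by-TTL reads into batched refreshes.
class RefreshCoalescer {
  private readonly refresh: BatchRefresh;
  private pending = new Set<string>();
  private inFlightEids = new Set<string>();
  private inFlight = false;
  private scheduled = false;

  constructor(refresh: BatchRefresh) {
    this.refresh = refresh;
  }

  enqueue(eid: string): void {
    if (this.inFlightEids.has(eid)) return; // already being refreshed (assumption)
    this.pending.add(eid);
    this.schedule();
  }

  private schedule(): void {
    if (this.scheduled || this.inFlight) return; // a finishing flight reschedules itself
    this.scheduled = true;
    void Promise.resolve().then(() => this.flush()); // flush on the next microtask
  }

  private flush(): void {
    this.scheduled = false;
    if (this.inFlight || this.pending.size === 0) return;
    const batch = Array.from(this.pending); // union of enqueued eids, one call
    this.pending.clear();
    this.inFlight = true;
    this.inFlightEids = new Set(batch);
    this.refresh(batch)
      .catch(() => undefined) // assumption: refresh logs and records failures itself
      .then(() => {
        this.inFlight = false;
        this.inFlightEids.clear();
        if (this.pending.size > 0) this.schedule(); // drain eids queued during the flight
      });
  }
}
```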

Consequences worth naming:

A stationary grid does not refresh on TTL alone. With on-read triggers, only rows someone is reading produce refreshes. This is intentional and fits the editing-workbench use case; the dashboard-watcher use case needs push and is out of scope.
Visibility-API gating is free. A backgrounded tab is not rendering, so it is not reading, so it is not refreshing. No separate visibilitychange check is needed for traffic control.
Scroll into a stale block produces exactly one batched refresh for the entries the block contains — not one per row.
TTL writes are client-clock-only. fetchedAt = Date.now() on the client at response resolution. Never asOf.recorded or any server timestamp; client/server clock skew cannot induce a permanent-stale loop.

Rejected alternative: a timer-based sweep (setInterval scanning the map). It costs more code, requires visibility-API gating to avoid background traffic, and produces refreshes for rows nobody is reading. Its one advantage, refreshing stationary rows, buys little, because the edit-surface refresh in §3.4 covers every case where freshness actually matters.

3.3 Refresh-on-focus

A single visibilitychange handler on the /items page calls refreshCardsForItems(visibleEids) once when the tab transitions from hidden to visible. visibleEids is the union of eids for rows in the SSRM block cache, plus the open detail panel’s eid if any. The panel contribution covers the “left panel open, scrolled row out, switched tab, came back” case — without it the panel can outlive its block-cache entry and miss the refresh.

This handler is the one explicit non-interaction refresh trigger in the model. It catches the dominant “user came back from lunch / another tab” pattern without committing to continuous polling. Cost is ~5 LoC plus the coalescer already in place from §3.2.

Behavior:

On hidden → visible transition, one batched refresh is issued for the visible block(s).
Coalescing applies — if a scroll-triggered or interaction-triggered refresh is already in flight, the focus refresh joins the next batch rather than racing.
TTL is irrelevant to this trigger — the focus event always refreshes, on the assumption that any time spent backgrounded is enough to warrant a check.
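A sketch of the hidden → visible handler, written against an injected visibility value so the transition logic can be tested without a DOM. `createFocusRefresh` and the two getter callbacks are hypothetical; in the browser the returned function would be called from a visibilitychange listener with document.visibilityState, and refreshCardsForItems would be routed through the same coalescer as the TTL reads so that the focus refresh joins the next batch instead of racing.

```typescript
type Visibility = "hidden" | "visible";

// Returns a handler to call on every visibility change. It refreshes only on a
// hidden → visible transition, ignores TTL, and includes the open panel's eid even
// if that row has left the SSRM block cache.
function createFocusRefresh(
  refreshCardsForItems: (eids: string[]) => void,
  getBlockCacheEids: () => string[], // hypothetical: eids of rows in the SSRM block cache
  getOpenPanelEid: () => string | undefined, // hypothetical: the detail panel's eid, if open
  initial: Visibility = "visible",
): (next: Visibility) => void {
  let previous = initial;
  return (next) => {
    const cameBack = previous === "hidden" && next === "visible";
    previous = next;
    if (!cameBack) return;
    const eids = new Set(getBlockCacheEids());
    const panelEid = getOpenPanelEid();
    if (panelEid !== undefined) eids.add(panelEid); // deduplicated by the Set
    if (eids.size > 0) refreshCardsForItems(Array.from(eids)); // one batched refresh
  };
}
```

Browser wiring would look like `document.addEventListener("visibilitychange", () => onVisibility(document.visibilityState === "hidden" ? "hidden" : "visible"))`.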

3.4 Edit-surface refresh and reconciliation by `rId`

Verified prerequisite: KanbanCardResult already exposes rId (per-version identifier) and asOf: { effective, recorded } (bitemporal coordinates) at the top of every card, and the BFF route src/app/api/arda/kanban/kanban-card/query/route.ts is a pure passthrough (forwardAsNextResponse(upstream, data)). No BFF change is required.

Detail panel — open with debounce:

Panel opens → useFreshRead(eid, { debounceMs: 200 }) reads the cached cards so they are ready to paint without a loading state.
Hook issues refreshCardsForItem(eid) in parallel.
If refresh resolves within 200ms (the expected P50–P70 case given the index from operations#173), the panel paints once with fresh data — no flicker, no banner.
If refresh takes longer (the slower tail of the distribution), the panel falls through at 200ms: it paints with cached data and sets isStale: false.
When refresh eventually lands, compare the rId set of cached cards to the rId set of fetched cards. Three cases:
- Identical rId sets → silently update fetchedAt, no banner.
- Differing rId for one or more cards, added cards, or removed cards → set isStale: true, render the banner.
- All cards removed (item now has none) → render the banner with adjusted copy.
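The three-way comparison can be sketched as a pure function over rId sets. `classifyRefresh` and `RefreshVerdict` are hypothetical names, and `KanbanCardResult` is trimmed to the rId field; the panel would map "identical" to a silent fetchedAt update, "changed" to isStale: true plus the banner, and "allRemoved" to the banner with adjusted copy.

```typescript
// Trimmed stand-in for the real KanbanCardResult (assumption: only rId matters here).
interface KanbanCardResult {
  rId: string;
}

type RefreshVerdict = "identical" | "changed" | "allRemoved";

// Compares the rId set of the cards on screen with the rId set the refresh returned.
// Order is irrelevant; any changed, added, or removed rId counts as a change.
function classifyRefresh(cached: KanbanCardResult[], fetched: KanbanCardResult[]): RefreshVerdict {
  const before = new Set(cached.map((card) => card.rId));
  const after = new Set(fetched.map((card) => card.rId));
  if (before.size > 0 && after.size === 0) return "allRemoved";
  const same = before.size === after.size && Array.from(before).every((rId) => after.has(rId));
  return same ? "identical" : "changed";
}
```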

Banner — sticky, dismissible, with [Refresh]:

Copy: “This item was updated. [Refresh]”
Default: the displayed cards are not replaced. User input fields in the panel are untouched.
[Refresh] action:
- If the panel has unsaved edits, show a small confirm: “Discard unsaved changes and load the latest?” On confirm, apply server state to the form; on cancel, dismiss the confirm but leave the banner.
- If no unsaved edits, apply server state immediately.
Dismiss action: hide the banner. If the user later saves, the save proceeds with last-write-wins semantics — they were warned.

This is the simple correctness contract: the user always knows when they’re looking at superseded data; clobbering is possible but never silent. Field-level merge UI is explicitly out of scope and would land as a separate ticket under PDEV-442.

Bulk-mutation handlers — refresh-then-act:

handleDeleteMultipleItems (and any other bulk path that mutates card state):

Show progress indicator: “Checking selection…”.
await refreshCardsForItems(selectedEids).
Compare the rId set per eid against the cached version (snapshotted immediately before the refresh call, since the refresh overwrites the entries).
If any rId set differs (a card was added, removed, or moved), abort with a banner: “Selection changed — refresh and retry?” and do not proceed with the mutation.
If all rId sets match, proceed with the mutation as today.

Bulk handlers do not use useFreshRead — they need a definite before/after on the refresh, not a debounce.
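A sketch of the refresh-then-act guard for a mutating bulk handler, assuming the handler can read the current itemCardsMap through a callback. `refreshThenAct`, `readMap`, and `rIdKey` are hypothetical names; showing the progress indicator and the abort banner is left to the caller.

```typescript
interface KanbanCardResult {
  rId: string;
}
type CardMap = Record<string, { cards: KanbanCardResult[]; fetchedAt: number } | undefined>;

type BulkOutcome = "mutated" | "aborted";

// Order-independent fingerprint of an eid's rId set.
function rIdKey(cards: KanbanCardResult[] | undefined): string {
  return (cards ?? []).map((card) => card.rId).sort().join("|");
}

async function refreshThenAct(
  selectedEids: string[],
  readMap: () => CardMap, // hypothetical accessor for the current itemCardsMap
  refreshCardsForItems: (eids: string[]) => Promise<void>,
  mutate: (eids: string[]) => Promise<void>,
): Promise<BulkOutcome> {
  // Snapshot before refreshing, because the refresh overwrites the entries.
  const before = readMap();
  const snapshot = new Map(
    selectedEids.map((eid): [string, string] => [eid, rIdKey(before[eid]?.cards)]),
  );
  await refreshCardsForItems(selectedEids);
  const after = readMap();
  const changed = selectedEids.some((eid) => snapshot.get(eid) !== rIdKey(after[eid]?.cards));
  if (changed) return "aborted"; // caller shows the "Selection changed" banner
  await mutate(selectedEids);
  return "mutated";
}
```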

3.5 Display-only reads stay simple

Grid cells render map[eid]?.cards ?? [] directly. They participate in the on-read TTL coalescer (so scrolling into stale blocks triggers a batched refresh) but do not debounce or block, and they do not surface any banner on rId mismatch — the visible count changing IS the freshness signal.
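A sketch of the display-only read, assuming the cell can call an enqueue function from the §3.2 coalescer. `readCellCards` is a hypothetical name, and absent entries are deliberately not enqueued on the assumption that initial population belongs to the SSRM datasource (§2.1). In React the enqueue would more likely run from an effect than during render; the sketch keeps it inline for testability.

```typescript
interface KanbanCardResult {
  rId: string;
}
type CardMap = Record<string, { cards: KanbanCardResult[]; fetchedAt: number }>;

const CARD_TTL_MS = 30_000; // hypothetical constant holding the §3.2 default

// Never blocks and never surfaces a banner: a stale-by-TTL entry is enqueued for
// the coalesced batch refresh and still rendered (stale-while-revalidate).
function readCellCards(
  map: CardMap,
  eid: string,
  enqueueRefresh: (eid: string) => void,
  now: number = Date.now(),
  ttlMs: number = CARD_TTL_MS,
): KanbanCardResult[] {
  const entry = map[eid];
  if (entry !== undefined && now - entry.fetchedAt > ttlMs) enqueueRefresh(eid);
  return entry?.cards ?? [];
}
```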

handlePrintSelectedCards and handlePreviewSelectedCards follow the same pattern: cache-read, no refresh. Accepted risk: a card may be printed or previewed that has since moved server-side. Worst case is a duplicate label sheet or a preview of a stale state — both operationally tolerable, neither a correctness issue.

4. Stack consumer sketch

Brief sketch of how each PR in the stack consumes the API and the freshness model above, so the surface is right the first time:

PDEV-235 — Introduces ensureCardsForItems / refreshCardsForItems with the { cards, fetchedAt } entry shape, the on-read TTL coalescer (§3.2), the refresh-on-focus handler (§3.3), and the SSRM datasource integration (§2.1). Removes the per-row useEffect → ensureCardsForItem chain in QuickActionsCell. Removes the IncompatibleState 500 dead branch.
PDEV-548 — Introduces the useFreshRead hook (§1.4) and the banner component (§3.4). Routes ItemDetailsPanel through useFreshRead(eid, { debounceMs: 200 }) for open; keeps the existing mutation-driven fetchCards calls and rewires them onto refreshCardsForItem(eid). Implements the rId-set diff and the banner with [Refresh] action including the unsaved-edits confirm.
PDEV-549 — Splits bulk handlers by mutation intent. Rewires mutating handlers (handleDeleteMultipleItems) onto await refreshCardsForItems(selectedEids) + the rId-set check and abort-with-banner pattern from §3.4. Rewires non-mutating handlers (handlePrintSelectedCards, handlePreviewSelectedCards) as cache-only reads (§3.5).
PDEV-550 — Independent of the context API and the freshness model; removes the 10+ console.log lines per request in api/arda/kanban/query-details-by-item and api/arda/kanban/query. Parks at the top of the stack so the log-cleanup review does not gate the latency-critical changes.

5. Out of scope for this specification

Latency baseline numbers. Measure with the existing Sentry + CloudWatch surface; record results in the per-PR verification notes, not here.
getKanbanCardsForItems BFF client signature. A cardsForItems(eids[]) helper that issues a single Filter.In POST to /v1/kanban/kanban-card/query belongs in src/lib/ardaClient.ts; its exact signature is implementation detail for the PDEV-235 PR.
TTL tuning. 30s is the starting default; the value is a tuning decision based on observed staleness pain, not a specification decision. Exposed as a constant for adjustment.
Field-level merge UI on concurrent edits. The banner + [Refresh] pattern is the agreed correctness contract; a richer merge UI is a separate ticket under PDEV-442.
Periodic polling and server push (SSE / WebSocket). Sub-second freshness for stationary views is the dashboard-watcher use case, not the editor’s workbench /items serves. Tracked under PDEV-442 as separate work.
Test strategy. Each PR carries its own unit-test additions; the acceptance criteria in goal.md are the verification surface, not a test plan.