Spike #855 — Item List Filter & Sort: Proposal (Historical)
This document is historical. It describes the originally proposed architecture (AG Grid native column filters, client-side row model, custom `useAllItemsQuery` hook). After team review, the implementation pivoted to CloudScape PropertyFilter + AG Grid Server-Side Row Model (SSRM) + BFF-side filter/sort engines. For the canonical reference, read architecture.md. For what actually shipped, see implementation-notes.md.
Constants quoted in this document (e.g., `HARD_MAX_ITEMS = 8_000`, `ITEMS_ALL_TTL_SECONDS = 7200`, the 1MB CloudFront limit framing) reflect the original proposal. The shipped design is different: `HARD_MAX_ITEMS` is a BFF-memory cap (not CloudFront-derived), `CACHE_TTL_SECONDS = 300` (5 min), tenant-scoped invalidation tag `'items-all:' + tenantId`, capacity additionally bounded by the 2MB `unstable_cache` per-entry limit. Don’t copy-paste from this file — see architecture.md for the canonical values. The core BFF caching architecture (`unstable_cache`, `revalidateTag`, pagination loop) is exactly what this proposal designed — only the filter UI and data delivery layers changed.
Ticket: Arda-cards/management#855
Branch: feature/items-sorting-filtering
Date: 2026-04-14
Author: David Quintanilla
This document was the single deliverable for spike #855. It combined the investigation findings, architecture proposal, sprint scope, feasibility prototype plan, and migration path into one file. It was the input for sprint planning.
0. TL;DR
- Implement AG Grid Enterprise multi-column filter + multi-sort for `/items`, entirely client-side, against a full-tenant dataset fetched once per session and cached at the BFF.
- No backend changes required. The current API does not support server-side sort or extended filter operators, so the feature is designed to ship independently.
- BFF implements a fat-fetch endpoint that loops the existing cursor-paginated backend API in 500-row pages up to an 8k safety cap (bounded by the Amplify CloudFront 1MB response limit, not browser performance), strips each item down to grid-relevant columns, and caches the result per tenant for 2 hours with four explicit refresh triggers (write, login, list mount, TTL ceiling).
- AG Grid client-side row model runs filter, sort, pagination, and search in the browser with sub-100ms latency over ~5k items.
- Filter and sort models are persisted in a small Redux slice (Published Items tab only). Rows are kept in a lightweight custom hook with a module-scoped cache — no new state-management library. Cursor pagination disappears from the client UI entirely.
- One sprint for one engineer, plus buffer. No blocking dependencies. Phase 2 (real backend filter/sort) and phase 3 (shared Redis cache) are documented escape hatches gated on production metrics, not planned work.
- Every decision in the project’s Goal document is honored. §2 maps each adopted decision to its concrete implementation.
1. Context and constraints
1.1 What the ticket asks for
- Architecture document covering data flow, state management, column filter type mapping, and migration path.
- Feasibility prototype in the `ux-prototype` Storybook demonstrating at least single-column filter + sort with AG Grid Enterprise.
- Sprint proposal with task-level estimates, identified dependencies, and test plan.
Acceptance criteria include validating the BFF caching approach against the current API pagination model and validating the architectural decisions in the Goal document.
1.2 Authoritative sources read for this proposal
- Goal document — adopted decisions for BFF caching, AG Grid native features, 500-page loop, ~5k items per tenant, 2-hour refresh ceiling
- Application vs Canary Audit — capability and implementation diff between `arda-frontend-app` and the canary `createEntityDataGrid`
- Getting Started guide
- Tickets: #742 parent epic, #781 phase 1/2 filter plan, #824 Query DSL search, #611 rename Filter→Search, #749 AG Grid Enterprise license
- `arda-frontend-app` codebase: items page, `columnPresets`, `ArdaGrid`, `ItemTableAGGrid`, `ardaClient`, BFF routes, store slices, `amplify.yml`
1.3 The hard constraint
The backend API does not currently support server-side sort or extended filter operators. This feature is designed to ship independently, with no backend changes required. The architecture follows directly from that constraint.
1.4 Baseline facts about the existing app
- Next.js 16.1.6 App Router, React 19.2.4, TS 5.9.3. Redux Toolkit + redux-persist.
- Auth: AWS Cognito SDK direct (not Amplify Auth despite the name).
- Hosting: AWS Amplify Hosting (CI/CD + Lambda for SSR routes). Three apps (dev/stage/prod), each with inline build specs that override the committed `amplify.yml`.
- AG Grid Community + Enterprise 34.3.1 installed and registered. License active on deployed environments via `NEXT_PUBLIC_AG_GRID_LICENSE_KEY` set in each Amplify Console and picked up by `amplify.yml:16`. Local dev shows a watermark — harmless.
- Current `/items` flow: `ItemsPage` → `ardaClient.queryItems()` → POST `/api/arda/items/query` → POST `{BASE_URL}/v1/item/item/query`. No caching anywhere — `cache: 'no-store'` hardcoded in every route.
- Backend filter DSL today: `and`/`or`/`eq`/`regex`. The existing search box already uses this via regex clauses and works in production. No sort field. No range/in/null/not operators.
- Pagination is cursor-based (`thisPage`/`nextPage`/`previousPage` tokens). Default 50 items per page.
- AG Grid wiring: `sortable: true, filter: false` at the default col def. Sort is client-side over the current page only. `enableFiltering`, `onFilterChanged`, `enableMultiSort`, `onSortChanged` props exist on `ArdaGrid` but are never passed.
- Tenant item counts estimated ~5k. Not measured, not guaranteed.
- `@arda-cards/api-proxy` and canary `createEntityDataGrid` primitives are NOT consumed by this app today. Canary migration is a separate future project.
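Since the only DSL operators are `and`/`or`/`eq`/`regex`, the production search box presumably reduces a search term to an or-of-regex clause. A minimal sketch of that shape — the clause grammar mirrors `buildTabFilter` in §5.3.2, but the `fields` list and the escaping policy here are illustrative assumptions, not the app's actual code:

```ts
// Illustrative sketch of the and/or/eq/regex filter DSL shape.
// Field names and escaping are assumptions for illustration only.
type FilterClause =
  | { locator: string; eq: string }
  | { locator: string; regex: string };
type Filter = true | { and: Array<{ or: FilterClause[] }> };

export function buildSearchFilter(term: string, fields: string[]): Filter {
  if (!term) return true; // empty search → match everything
  // Escape regex metacharacters so user input is treated literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return {
    and: [{ or: fields.map((locator) => ({ locator, regex: escaped })) }],
  };
}
```

The nesting (`and` of `or` groups) matches the tab filters shown later in §5.3.2, so a search clause can be combined with a tab clause by appending to the same `and` array.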
1.5 Amplify Hosting runtime reality
- API routes run as Lambda functions. Cold starts matter. Memory and `/tmp` are per-instance.
- Any `NEXT_PUBLIC_*` env var set in the Amplify Console flows into `.env` at build time via the `amplify.yml:16` grep pattern. No build-spec change needed to add public env vars.
- Inline build specs in each Amplify app override the committed `amplify.yml`. Pipeline changes must be applied per-app via `aws amplify update-app` or they silently do nothing.
- Unit tests run in the build pipeline (`amplify.yml:26`). Flaky tests block deployments, not just CI runs.
- Amplify Hosting SSR responses are gated by CloudFront’s ~1MB response body + headers limit, NOT the looser API Gateway 6MB limit. Confirmed by production bug report aws-amplify/amplify-hosting#3214. The fat-fetch endpoint must respect this — §5.3.2 addresses it via column stripping plus explicit gzip compression. This is the single tightest constraint on the design (see §6.1).
1.6 Confirmed answers to earlier unknowns
- Backend sort? No. The API has no sort field.
- Backend DSL extension? No. `and`/`or`/`eq`/`regex` is what we have.
- Does the existing server-side filter path work today? Yes. Search box uses it in production.
- Items per tenant? Estimated ~5k. Some tenants likely larger.
- Why frontend-only? Server-side sort/filter is not available in the current API.
- Backend `paginate.size` cap? Unknown. The internal loop handles either case.
- Product freshness expectations? Unknown. 2-hour TTL + explicit triggers is a reasonable default; confirm before shipping.
- Shared Redis on the table? Unknown. Not blocking phase 1.
- Lambda response size at 5k items? Must be validated in the prototype (§8). Column stripping is the mitigation.
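The column stripping called out above is just a fixed allowlist copy. A sketch of what `stripToGridColumns` might look like — the trimmed `GridItem` shape and field names are assumptions for illustration (the real list belongs in `itemsGridMapper.ts` and runs to ~15-20 fields):

```ts
// Illustrative sketch of stripToGridColumns. Field names are assumptions;
// the point is the mitigation: copy an allowlist, drop everything else.
export interface GridItem {
  id: string;
  name: string;
  status: string;
  cost: number | null;
}

export function stripToGridColumns(payload: Record<string, unknown>): GridItem {
  const classification = payload.classification as { status?: string } | undefined;
  // Only allowlisted fields survive, so the serialized tenant dataset
  // stays under the CloudFront response ceiling.
  return {
    id: String(payload.id ?? ''),
    name: String(payload.name ?? ''),
    status: classification?.status ?? '',
    cost: typeof payload.cost === 'number' ? payload.cost : null,
  };
}
```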
2. Alignment with the Goal document
2.1 Decisions adopted by the Goal document
The Goal document in the Arda-cards/documentation repo adopted these decisions before this spike started. This proposal implements all of them. Quoted phrases are from the Goal doc’s “General Guidance & Adopted Decisions” section.
- BFF-level tenant cache. “The BFF will retrieve the complete set of items for a tenant from the back end and will keep a cache in its memory, keyed by tenant id.” → Next.js `unstable_cache` wrapping a BFF-internal cursor loop, keyed by tenant (§5.3.2).
- AG Grid native sort/filter/pagination with Redux integration. “The SPA and BFF will use the AG Grid native Sort, Filter and Pagination with its integration with Redux.” → AG Grid client-side row model with per-column `filter: true`, Shift-click multi-sort (AG Grid’s default multi-sort gesture), and built-in `pagination`. Filter/sort model persisted in a new per-tab Redux slice (§5.1, §5.4). Note: the project’s `ArdaGrid` wrapper exposes these as `enableFiltering`/`enableMultiSort` props that translate to the underlying AG Grid options — see §5.4.2 for the wrapper-prop clarification.
- 500-row-page backend loop until exhausted. “The BFF will read from the back end in pages of 500 items until all items for a tenant are retrieved.” → `BACKEND_PAGE_SIZE = 500` with an internal cursor loop in the BFF fat-fetch route (§5.3.2).
- ~5k items per tenant assumption. “Current estimates (2026-04-02) are < 5000 items per tenant.” → Drives the client-side row model choice (comfortable at this scale) and the `HARD_MAX_ITEMS = 8_000` safety cap in §5.3.2 — note this cap is bounded by the Amplify CloudFront 1MB response limit (see §6.1), NOT by AG Grid browser performance.
- Cache refresh triggers. “The BFF cache will be refreshed from the back end when: (1) there is an update, creation or deletion of an item, (2) a user for the tenant logs in, (3) an Item List is mounted in a user session, (4) at a minimum once every 2 hours (should be configurable).” → `revalidateTag` on every write (§5.3.3), a refetch-on-mount effect in the custom `useAllItemsQuery` hook (§5.2), explicit login-time cache bust via `invalidateAllItemsCache` (§5.3.3), `ITEMS_ALL_TTL_SECONDS = 7200` background TTL (§5.3.2).
- Cache eviction deferred. “Cache Eviction will need to be designed once we have a better understanding of the memory pressure we have in production.” → Acknowledged. `unstable_cache` on Lambda handles eviction implicitly via Lambda instance lifecycle.
- AG Grid native UX is an acceptable starting point. “It is also O.K. to start with a subset of those capabilities.” → Sprint 1 enables the filter types listed in §5.4.1; floating filters, the advanced filter expression builder, and saved views are deferred.
2.2 Interpretation note: “server side rendering of the page”
The Goal document also contains this phrase:
“BFF will do a server side rendering of the page to be shown to the user against its cached items.”
In context, this means the BFF serves data from its in-memory cache to the SPA — not that the BFF implements AG Grid’s Server-Side Row Model (SSRM) API. This proposal reads “rendering” as “serving,” and implements it accordingly: the BFF returns the full tenant dataset from cache in one response, and AG Grid’s client-side row model handles filter, sort, and pagination in the browser without further round-trips.
Why this interpretation at 5k items:
- Client-side row model is the idiomatic “native” mode the Goal document calls for. Sub-100ms interaction latency, zero network round-trips per filter change.
- SSRM against the BFF cache would require reimplementing AG Grid’s filter model semantics in Node, a dedicated set-filter-values endpoint per column, and a network round trip for every filter/sort event. At 5k items this is complexity with no user-visible benefit.
- Cold starts are cheaper. One BFF cursor loop pays for the whole user session. SSRM mode would potentially pay the loop cost on every Lambda cold start a user hits.
- Future AG Grid features (advanced filter, quickFilter, grouping) work natively in client-side mode. Many need custom server implementations in SSRM.
- The “refresh on list mount” and “2-hour ceiling” policies in the Goal document only make sense if the client holds a snapshot across interactions. SSRM makes every interaction a live query and the refresh policy becomes meaningless.
If HARD_MAX_ITEMS fires in production (tenant counts grow past ~8k, CloudFront-bound), phase 2 migrates to real backend filter/sort. That is where SSRM becomes the right tool.
2.3 What this proposal does NOT change from the Goal document
- Tenant cache lives in the BFF, not the client. ✓
- Cache is refreshed on mutations, login, list mount, and 2-hour ceiling. ✓
- AG Grid native sort/filter/pagination drive the UX. ✓
- Redux persists the filter/sort state across reloads and tabs. ✓
- Cursor pagination to the real backend happens in the BFF, not the SPA. ✓
3. Phased roadmap
| Phase | Scope | Ships when | Backend dep | Infra dep |
|---|---|---|---|---|
| Phase 1 | AG Grid native filter/sort/search/pagination client-side over full tenant dataset; Redux-persisted filter/sort model (Published Items tab); fat-fetch BFF route with unstable_cache + revalidateTag; custom useAllItemsQuery hook with module-scoped cache; HARD_MAX safety metric | Sprint 1 | None | None |
| Phase 2 (escape hatch) | Migrate to AG Grid Server-Side Row Model against the real backend. Requires backend sort + richer filter DSL | Only if HARD_MAX fires in production | Yes — full backend filter/sort | None |
| Phase 3 (escape hatch) | Custom Next.js cacheHandler backed by Redis/ElastiCache or DynamoDB; cluster-wide revalidateTag | Only if per-instance hit rate is too low | None | Redis/ElastiCache |
Phases 2 and 3 are documented paths, not planned work.
4. Data flow
```
User navigates to /items
  │
  ▼
ItemsPage mounts
  │
  ▼
useAllItemsQuery(tenantId, tab)      [custom hook, module-scoped cache, refetch on mount]
  │
  ▼  (cache miss at client layer)
GET /api/arda/items/all?tab=<tab>
  │
  ▼
getCachedAllItems()                  [BFF, unstable_cache, TTL 2h, tag items:{tenantId}]
  │
  ▼  (cache miss at BFF layer)
Loop ArdaQueryItemsRequest with { filter: buildTabFilter(tab), paginate: { size: 500 } }
until nextPage is empty OR HARD_MAX reached
  │
  ▼
Concatenate pages → stripToGridColumns → return full list
  │
  ▼  (cached at BFF and at client)
AG Grid client-side row model receives all rows
  │
  ▼
User applies filter / sort / search → ZERO additional fetches
All interactions happen in memory
  │
  ▼
onFilterChanged / onSortChanged → dispatch to Redux for persistence
```

Mutations (create / update / delete / publish draft):
- Send the write to the backend as today.
- On success, call `revalidateTag('items:{tenantId}')` in the API route — drops the server-side BFF cache for subsequent reads by other Lambda instances / browser tabs.
- On success, the client mutation hook patches the cached item in place via `patchItemInCache`/`addItemToCache`/`removeItemFromCache`. No refetch, no loading state — the grid re-renders only the affected row.
- Only login, list mount, and the 2-hour TTL trigger a full refetch of the whole tenant dataset. Routine edits don’t.
5. Detailed design
5.1 Redux slice — deliberately small
Scope: Published Items tab only. That’s the only active tab today. If Draft or Recently Uploaded tabs are added later, extending the slice to per-tab state is straightforward (wrap in `Record<tab, ...>`), but we don’t build that complexity now.
File: src/store/slices/itemsFilterSortSlice.ts (new)
Stores `filterModel`, `sortModel`, and `schemaVersion`. Persisted via redux-persist. Deliberately excludes:
- Rows. Rows live in the module-scoped cache inside `useAllItemsQuery` (§5.2). Putting 5k items in redux-persist would serialize them to localStorage on every dispatch — a performance disaster.
- Pagination state. AG Grid’s built-in client-side pagination owns it. No cursor tokens anywhere in client state.
- Search string. AG Grid `quickFilter` via local component state.
```ts
import { createSlice, PayloadAction } from '@reduxjs/toolkit';

// AG Grid FilterModel is opaque — it's a per-column object keyed by colId:
// { name: { filterType: 'text', type: 'contains', filter: 'bolt' },
//   'classification.type': { filterType: 'set', values: [...] },
//   cost: { filterType: 'number', type: 'lessThan', filter: 50 } }
export type ItemsFilterModel = Record<string, unknown>;

export type ItemsSortModel = Array<{
  colId: string;
  sort: 'asc' | 'desc';
  sortIndex?: number; // multi-column sort priority
}>;

interface ItemsFilterSortState {
  filterModel: ItemsFilterModel;
  sortModel: ItemsSortModel;
  schemaVersion: number;
}

export const CURRENT_SCHEMA_VERSION = 1;

const initialState: ItemsFilterSortState = {
  filterModel: {},
  sortModel: [],
  schemaVersion: CURRENT_SCHEMA_VERSION,
};

const itemsFilterSortSlice = createSlice({
  name: 'itemsFilterSort',
  initialState,
  reducers: {
    setFilterModel(state, action: PayloadAction<ItemsFilterModel>) {
      state.filterModel = action.payload;
    },
    setSortModel(state, action: PayloadAction<ItemsSortModel>) {
      state.sortModel = action.payload;
    },
    clearFilters(state) {
      state.filterModel = {};
    },
    clearSort(state) {
      state.sortModel = [];
    },
    resetAll() {
      return initialState;
    },
  },
});

export const {
  setFilterModel,
  setSortModel,
  clearFilters,
  clearSort,
  resetAll,
} = itemsFilterSortSlice.actions;
export default itemsFilterSortSlice.reducer;
```

The `schemaVersion` is a hedge against column renames — a version bump in code plus a `migrate` entry in `persistConfig` resets persisted state on the next hydrate, so users don’t restore filter state that references unknown column IDs.
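How that hedge might plug into redux-persist: a minimal sketch of the `migrate` logic as a standalone function (in the app it would sit in the slice's `persistConfig`; redux-persist's real `migrate` is async, and the trimmed types here are illustrative):

```ts
// Standalone sketch of the persistConfig migrate hedge. redux-persist calls
// migrate with the rehydrated state; returning undefined makes it fall back
// to the reducer's initialState.
export const CURRENT_SCHEMA_VERSION = 1;

interface PersistedFilterSortState {
  filterModel: Record<string, unknown>;
  sortModel: Array<{ colId: string; sort: 'asc' | 'desc' }>;
  schemaVersion: number;
}

export function migrateItemsFilterSort(
  state: PersistedFilterSortState | undefined,
): PersistedFilterSortState | undefined {
  // Any version mismatch drops the persisted state, so filters referencing
  // renamed or deleted column IDs never reach the grid.
  if (state && state.schemaVersion !== CURRENT_SCHEMA_VERSION) {
    return undefined;
  }
  return state;
}
```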
5.2 Custom useAllItemsQuery hook — lean, no new library
File: src/hooks/useAllItemsQuery.ts (new)
This feature has exactly one cached endpoint. Adding React Query (or any new state library) for a single use case is overkill, introduces a second state-management mental model alongside Redux, and adds ~12KB to the bundle for features we don’t need (window-focus refetch, devtools, global query invalidation). Instead, we ship a ~90-line custom hook with a module-scoped cache.
Reads and behaves like React Query from the caller’s perspective, but the whole thing fits in one file you can understand in two minutes.
```ts
import { useCallback, useEffect, useState, useSyncExternalStore } from 'react';
import { useAppSelector } from '@/store/hooks';
import { selectTenantId } from '@/store/slices/authSlice';
import { selectActiveTab } from '@/store/slices/itemsSlice';
import { ardaClient } from '@/lib/ardaClient';
import type { GridItem } from '@/lib/mappers/itemsGridMapper';

// Module-scoped cache. Lives for the lifetime of the browser tab.
// Keyed by `${tenantId}:${tab}` so each tenant and each list tab has its own entry.
const cache = new Map<string, GridItem[]>();
// In-flight promises keyed the same way — lets two components mounting at the
// same time share a single network request instead of double-fetching.
const inflight = new Map<string, Promise<GridItem[]>>();
// Subscribers per key so useSyncExternalStore can re-render consumers on updates.
const subscribers = new Map<string, Set<() => void>>();

function cacheKey(tenantId: string, tab: string): string {
  return `${tenantId}:${tab}`;
}

function notify(key: string) {
  subscribers.get(key)?.forEach((cb) => cb());
}

function subscribe(key: string, cb: () => void) {
  let set = subscribers.get(key);
  if (!set) {
    set = new Set();
    subscribers.set(key, set);
  }
  set.add(cb);
  return () => {
    set!.delete(cb);
    if (set!.size === 0) subscribers.delete(key);
  };
}

async function fetchAndCache(key: string, tab: string): Promise<GridItem[]> {
  // Dedupe concurrent fetches for the same key.
  const existing = inflight.get(key);
  if (existing) return existing;

  const promise = ardaClient
    .fetchAllItems({ tab })
    .then((rows) => {
      cache.set(key, rows);
      inflight.delete(key);
      notify(key);
      return rows;
    })
    .catch((err) => {
      inflight.delete(key);
      throw err;
    });

  inflight.set(key, promise);
  return promise;
}

// Exported invalidation API for mutation handlers and login flow.
// Drops every cache entry for the given tenant across all tabs.
export function invalidateAllItemsCache(tenantId: string) {
  for (const key of Array.from(cache.keys())) {
    if (key.startsWith(`${tenantId}:`)) {
      cache.delete(key);
      notify(key);
    }
  }
}

export function useAllItemsQuery() {
  const tenantId = useAppSelector(selectTenantId);
  const tab = useAppSelector(selectActiveTab);
  const key = tenantId ? cacheKey(tenantId, tab) : null;

  // Subscribe to cache changes for this key so the component re-renders
  // when another mutation or another hook instance updates the cache.
  const data = useSyncExternalStore(
    useCallback((cb) => (key ? subscribe(key, cb) : () => {}), [key]),
    useCallback(() => (key ? cache.get(key) : undefined), [key]),
    () => undefined,
  );

  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<Error | null>(null);

  const refetch = useCallback(async () => {
    if (!key || !tenantId) return;
    setIsLoading(true);
    setError(null);
    try {
      await fetchAndCache(key, tab);
    } catch (err) {
      setError(err as Error);
    } finally {
      setIsLoading(false);
    }
  }, [key, tenantId, tab]);

  // Goal doc: refresh on list mount. Every time this hook mounts (new ItemsPage
  // navigation, or tab change causing a re-key), trigger a background fetch.
  // The BFF unstable_cache with a 2h TTL absorbs the cost on the server side —
  // "always refetch" is just a BFF round-trip, not a full backend loop.
  useEffect(() => {
    refetch();
  }, [refetch]);

  return { data, isLoading, error, refetch };
}
```

What this gives us:
- Mount refetch via `useEffect(refetch, [refetch])` — fires on every `ItemsPage` mount and every `tab` change, per the Goal document’s refresh policy.
- Request dedupe via the `inflight` map — two components mounting simultaneously share one network request.
- Cross-hook reactivity via `useSyncExternalStore` — if another component (or a mutation handler) updates the cache, every mounted consumer re-renders with the new data.
- Cached across component mounts within the same browser tab session — opening the details panel and closing it doesn’t re-fetch, because the module-scoped cache survives the `ItemsPage` unmount/remount cycle.
- Per-tenant isolation — each tenant gets its own cache entry.
- Tenant-wide invalidation API via `invalidateAllItemsCache(tenantId)` — called from the login flow (§5.3.3) and from mutation handlers (§5.3.4).
What we give up vs React Query:
- Window-focus refetch (not in the Goal doc’s requirements)
- Devtools (the cache is a `Map` — `console.log` works fine)
- Stale-while-revalidate UX niceties like `placeholderData` — but since the cache entry is kept while a refetch runs, the grid never sees `undefined` data on a subsequent refetch anyway
- Shared machinery across other cached endpoints — irrelevant since this is our only one
If the app later grows three or more cached endpoints that all want this behavior, revisit React Query then. For this feature, the custom hook is the right amount of code.
5.3 BFF fat-fetch endpoint and cache
5.3.1 Route
File: src/app/api/arda/items/all/route.ts (new)
```ts
import { NextRequest, NextResponse } from 'next/server';
import { processJWTForArda } from '@/lib/jwt';
import { extractErrorMessage } from '@/lib/errors';
import { generateRequestId } from '@/lib/api-route-utils';
import { getCachedAllItems } from '@/lib/cache/itemsAllCache';

export async function GET(request: NextRequest) {
  try {
    const jwtResult = await processJWTForArda(request);
    if (!jwtResult.success) {
      return NextResponse.json(
        { ok: false, error: jwtResult.error },
        { status: jwtResult.statusCode },
      );
    }

    const { userContext } = jwtResult;
    const url = new URL(request.url);
    const tab = url.searchParams.get('tab') ?? 'all';
    const requestId = generateRequestId();

    const cachedFetch = getCachedAllItems(userContext.tenantId, userContext);
    const { status, data, metadata } = await cachedFetch(tab, requestId);

    const response = NextResponse.json(data, { status });
    response.headers.set('X-Items-Count', String(metadata.count));
    response.headers.set('X-Items-Hard-Max-Hit', metadata.hardMaxHit ? 'true' : 'false');
    return response;
  } catch (error) {
    console.error('ARDA All Items Request Error:', error);
    return NextResponse.json(
      { ok: false, error: 'Upstream request failed', details: extractErrorMessage(error) },
      { status: 500 },
    );
  }
}
```

5.3.2 Cache wrapper with internal cursor loop
File: src/lib/cache/itemsAllCache.ts (new)
```ts
import { unstable_cache } from 'next/cache';
import { env } from '@/lib/env';
import type {
  ArdaQueryItemsRequest,
  ArdaQueryResponse,
  ArdaItemPayload,
} from '@/types/arda-api';
import type { UserContext } from '@/lib/jwt';
import { stripToGridColumns, type GridItem } from '@/lib/mappers/itemsGridMapper';

// Background TTL ceiling — matches the Goal document's "at minimum once every
// 2 hours" refresh policy. Explicit refresh triggers (write mutations, user login,
// list mount) cover freshness during an active session; this TTL is the safety net
// for sessions that stay open without interaction.
export const ITEMS_ALL_TTL_SECONDS = 2 * 60 * 60; // 7200s = 2h
export const BACKEND_PAGE_SIZE = 500; // backend page size for the internal loop

// HARD_MAX is bound by the CloudFront 1MB response limit on Amplify Hosting,
// not by AG Grid browser performance (which tolerates ~20k). Math: ~0.5KB per
// stripped item × 5x gzip compression ≈ 100 bytes per item on the wire. At 1MB
// total, that's ~10k items theoretical, ~8k with safety margin.
// Prototype (§8) validates the real number against a live Amplify deployment
// before sprint 1 starts — this constant may move up or down based on what
// actual compression ratio we get for realistic item data.
export const HARD_MAX_ITEMS = 8_000; // phase 2 trigger (CloudFront-bound)

export function itemsCacheTag(tenantId: string): string {
  return `items:${tenantId}`;
}

async function fetchPage(
  body: ArdaQueryItemsRequest,
  userContext: UserContext,
  requestId: string,
): Promise<ArdaQueryResponse> {
  const upstream = await fetch(`${env.BASE_URL}/v1/item/item/query`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${env.ARDA_API_KEY}`,
      'X-Request-ID': requestId,
      'X-Author': userContext.author,
      'X-Tenant-Id': userContext.tenantId,
      'X-oidc-subject': userContext.userId,
    },
    body: JSON.stringify(body),
    cache: 'no-store',
  });
  if (!upstream.ok) {
    throw new Error(`Upstream page fetch failed: ${upstream.status}`);
  }
  return upstream.json() as Promise<ArdaQueryResponse>;
}

function buildTabFilter(tab: string): ArdaQueryItemsRequest['filter'] {
  // Tab → filter mapping. Extend this switch as new tabs are added.
  switch (tab) {
    case 'active':
      return { and: [{ or: [{ locator: 'classification.status', eq: 'active' }] }] };
    case 'archived':
      return { and: [{ or: [{ locator: 'classification.status', eq: 'archived' }] }] };
    case 'all':
    default:
      return true;
  }
}

async function fetchAllItemsUncached(
  tab: string,
  userContext: UserContext,
  requestId: string,
): Promise<{
  status: number;
  data: GridItem[];
  metadata: { count: number; hardMaxHit: boolean };
}> {
  const all: ArdaItemPayload[] = [];
  let pageToken: string | undefined;
  let pageIndex = 0;

  while (true) {
    const page = await fetchPage(
      {
        filter: buildTabFilter(tab),
        paginate: { index: pageIndex, size: BACKEND_PAGE_SIZE },
        ...(pageToken ? { pageToken } : {}),
      },
      userContext,
      requestId,
    );

    all.push(...page.results.map((r) => r.payload));

    if (!page.nextPage || all.length >= HARD_MAX_ITEMS) break;
    pageToken = page.nextPage;
    pageIndex += 1;
  }

  const hardMaxHit = all.length >= HARD_MAX_ITEMS;
  if (hardMaxHit) {
    console.warn(
      `[itemsAllCache] HARD_MAX_ITEMS (${HARD_MAX_ITEMS}) reached for tenant ${userContext.tenantId} tab ${tab}. ` +
        'This is the phase 2 trigger — escalate to Server-Side Row Model.',
    );
  }

  const gridItems = all.map(stripToGridColumns);
  return {
    status: 200,
    data: gridItems,
    metadata: { count: gridItems.length, hardMaxHit },
  };
}

// Tenant-scoped. Key is explicit: (tenantId, tab). userContext is closed over but
// NOT part of the key — item data is tenant-scoped, so users from the same tenant
// share cache entries. If this endpoint ever gains user-specific personalization,
// add userId to the key array.
export function getCachedAllItems(tenantId: string, userContext: UserContext) {
  return async (tab: string, requestId: string) => {
    const cached = unstable_cache(
      async () => fetchAllItemsUncached(tab, userContext, requestId),
      ['items-all', tenantId, tab],
      {
        tags: [itemsCacheTag(tenantId)],
        revalidate: ITEMS_ALL_TTL_SECONDS,
      },
    );
    return cached();
  };
}
```

Three critical design choices:
- The loop runs inside `unstable_cache`. The cache stores the result of the entire loop, not individual page responses. On a cache hit the loop is skipped entirely — the cached full list is returned in one shot.
- `stripToGridColumns` is the CloudFront response size mitigation. The grid renders ~15-20 fields out of the full item schema; everything else is omitted. The details panel still loads the full item payload on demand via the existing single-item `GET` endpoint. See §6.1 for the 1MB limit that forces this.
- The response is explicitly compressed. The API route must set `Content-Encoding: gzip` on the response — either via Next.js’s default `compress: true` setting (verify it applies on Amplify’s SSR runtime) or by manually gzipping the body with `zlib.gzipSync()`. Without compression the 1MB CloudFront limit binds at ~2000 items instead of ~8000. See §6.1.
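If the built-in compression turns out not to apply on the Amplify SSR runtime, the manual fallback is plain Node `zlib`. A sketch — the helper name and response-assembly shape are illustrative, not the app's code:

```ts
import { gzipSync } from 'node:zlib';

// Illustrative helper: gzip a JSON payload and produce the headers the route
// would attach. In the route this would feed `new NextResponse(body, { headers })`.
export function gzipJsonBody(payload: unknown): {
  body: Buffer;
  headers: Record<string, string>;
} {
  const json = JSON.stringify(payload);
  const body = gzipSync(Buffer.from(json, 'utf8'));
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'Content-Encoding': 'gzip',
      'Content-Length': String(body.byteLength),
    },
  };
}
```

Repetitive JSON (thousands of rows sharing the same keys) is exactly the payload shape gzip compresses best, which is what the ~5x ratio assumed by `HARD_MAX_ITEMS` relies on.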
Note on `unstable_cache` deprecation: Next.js 16 marks `unstable_cache` as deprecated in favor of the new `use cache` directive (introduced stable in Next.js 16.0 as part of Cache Components). The deprecation is documentation-only — `unstable_cache` still works and will remain functional for the foreseeable future. We’re using it for sprint 1 because:
- Migrating to `use cache` requires enabling the `cacheComponents: true` flag in `next.config.ts`, which affects the entire app’s caching behavior, not just this feature. That’s out of scope for a filter/sort sprint.
- The serverless runtime story is the same either way. Both `unstable_cache` and `use cache` default to per-instance in-memory caching; neither gives cluster-wide persistence on Lambda without a custom `cacheHandlers`/`use cache: remote` configuration. Switching the directive name doesn’t change the Amplify Hosting cache-hit-rate characteristics (see §6.2).
- A follow-up task can migrate `unstable_cache` → `use cache` + `cacheTag` + `cacheLife` once `cacheComponents` is enabled app-wide. The migration is a mechanical rename for our use case.
5.3.3 Four cache refresh triggers
All four Goal document refresh triggers are implemented:
| Trigger | Where | How | Client-side effect |
|---|---|---|---|
| Item mutation (create/update/delete/publish) | BFF write routes + client mutation hook | revalidateTag(itemsCacheTag(tenantId)) on BFF; client mutation hook calls patchItemInCache / addItemToCache / removeItemFromCache | Surgical patch — the single affected row is updated in place; grid re-renders only that row. No refetch, no loading state. See §5.3.4 |
| User login | Sign-in flow | invalidateAllItemsCache(tenantId) on the client + a server action calling revalidateTag | Full drop. Next read triggers a fresh fat fetch |
| List mount | Items page | useAllItemsQuery’s mount effect runs every time ItemsPage mounts or the active tab changes | Background refetch. Existing cached rows stay visible during the fetch so there’s no flash |
| 2-hour ceiling | BFF unstable_cache | ITEMS_ALL_TTL_SECONDS = 7200 background TTL | Next request after expiry re-loops the backend |
Write-path invalidation — one line per route, applied in items/route.ts POST, items/[entityId]/route.ts PUT/DELETE, and items/[entityId]/draft/route.ts on publish:
```ts
import { revalidateTag } from 'next/cache';
import { itemsCacheTag } from '@/lib/cache/itemsAllCache';

if (upstream.ok) {
  revalidateTag(itemsCacheTag(userContext.tenantId));
}
```

Tenant-wide scope. Any write invalidates the full tenant dataset; the next read re-loops once. Simple, correct, cheap.
Login-path invalidation — after authThunks.signIn succeeds:
- Call a new server action `revalidateItemsCache(tenantId)` that wraps `revalidateTag(itemsCacheTag(tenantId))` on the BFF side.
- Call `invalidateAllItemsCache(tenantId)` on the client to drop every cached tab entry for that tenant.
Rationale: a user logging back in after hours expects fresh data regardless of whether a warm Lambda still has a snapshot.
List-mount refresh — useAllItemsQuery’s mount effect fires on every navigation to /items and on every active-tab change (because the grid is keyed on activeTab, forcing a remount per tab — see §5.4.3). If the BFF tag is fresh the BFF returns a cache hit in one shot. If stale, the BFF loops the backend once. This is the important compromise between the Goal doc’s “refresh on list mount” and Lambda per-instance cache reality: we guarantee the BFF is consulted on every mount, and the BFF honors staleness correctly.
5.3.4 Client-side mutation integration — surgical cache updates
Every item mutation (create / update / delete / publish draft) touches exactly one item. Full tenant-wide invalidation on every edit would trigger a 5-15 second refetch of 5,000 items every time a user saves a single field — that's bad UX and wasted backend load. Mutations patch the cached array in place instead.
This is the default behavior, not an optional optimization. The module-scoped cache in §5.2 already has everything needed; we just export three small helpers and call them from mutation handlers.
Exported from useAllItemsQuery.ts:
```ts
/**
 * Surgically update one item in every cached tab entry for a tenant.
 * Called from mutation handlers after the backend confirms a write.
 */
export function patchItemInCache(
  tenantId: string,
  entityId: string,
  patch: Partial<GridItem>,
): void {
  for (const key of Array.from(cache.keys())) {
    if (!key.startsWith(`${tenantId}:`)) continue;
    const rows = cache.get(key);
    if (!rows) continue;
    const idx = rows.findIndex((r) => r.entityId === entityId);
    if (idx === -1) continue;
    const updated = [...rows];
    updated[idx] = { ...updated[idx], ...patch };
    cache.set(key, updated);
    notify(key);
  }
}

/** Surgically remove one item from every cached tab entry. */
export function removeItemFromCache(tenantId: string, entityId: string): void {
  for (const key of Array.from(cache.keys())) {
    if (!key.startsWith(`${tenantId}:`)) continue;
    const rows = cache.get(key);
    if (!rows) continue;
    const filtered = rows.filter((r) => r.entityId !== entityId);
    if (filtered.length !== rows.length) {
      cache.set(key, filtered);
      notify(key);
    }
  }
}

/**
 * Append a newly-created item to a specific tab's cache entry. No-op if
 * that tab hasn't been visited yet — the next mount will fetch it fresh.
 */
export function addItemToCache(
  tenantId: string,
  tab: string,
  item: GridItem,
): void {
  const key = `${tenantId}:${tab}`;
  const rows = cache.get(key);
  if (!rows) return;
  cache.set(key, [...rows, item]);
  notify(key);
}
```

Mutation hooks:
```ts
// src/hooks/useItemMutations.ts (new)
import { useAppSelector } from '@/store/hooks';
import { selectTenantId } from '@/store/slices/authSlice';
import {
  patchItemInCache,
  removeItemFromCache,
  addItemToCache,
} from '@/hooks/useAllItemsQuery';
import { ardaClient } from '@/lib/ardaClient';
import { stripToGridColumns } from '@/lib/mappers/itemsGridMapper';

export function useUpdateItem() {
  const tenantId = useAppSelector(selectTenantId);
  return async (entityId: string, patch: ItemPatch) => {
    const updated = await ardaClient.updateItem(entityId, patch);
    if (tenantId) {
      patchItemInCache(tenantId, entityId, stripToGridColumns(updated));
    }
    return updated;
  };
}

export function useCreateItem() {
  const tenantId = useAppSelector(selectTenantId);
  return async (input: CreateItemInput, tab: string) => {
    const created = await ardaClient.createItem(input);
    if (tenantId) {
      addItemToCache(tenantId, tab, stripToGridColumns(created));
    }
    return created;
  };
}

export function useDeleteItem() {
  const tenantId = useAppSelector(selectTenantId);
  return async (entityId: string) => {
    await ardaClient.deleteItem(entityId);
    if (tenantId) {
      removeItemFromCache(tenantId, entityId);
    }
  };
}
```

The BFF `revalidateTag` call still happens on the server side (see §5.3.3 write-path invalidation). This is defense-in-depth, not redundancy:
- The client-side surgical patch gives instant UX — user sees their edit reflected without a loading state.
- The BFF `revalidateTag` ensures server-side cache consistency for subsequent reads by a different Lambda instance, a fresh browser tab that doesn't have the module-scoped cache populated yet, or another user in the same tenant.
Both are needed. Without the client-side patch, users see 5-15 seconds of loading on every edit. Without the BFF `revalidateTag`, stale BFF cache entries could serve old data to other browser tabs or other users in the same tenant for up to 2 hours.
Error handling: the code above is pessimistic — it only patches the cache after the backend confirms success. If the save fails, the cache stays consistent and the user sees an error with their unsaved change still in the editor. This is the safer default for sprint 1.
A more responsive optimistic variant (patch the cache immediately, roll back on failure) is a sprint 2 candidate — it’s a small addition on top of the infrastructure above but adds rollback complexity that isn’t worth it until we’ve measured whether pessimistic updates feel fast enough.
When full-tenant invalidation IS still the right call: login, list mount (via useAllItemsQuery’s mount effect), and the 2-hour TTL ceiling. Those events want a fresh view of the whole dataset, not surgical patches. Those paths use invalidateAllItemsCache(tenantId) (also exported from useAllItemsQuery.ts) to drop every cache entry for the tenant. See §5.3.3 for the four refresh triggers and which path each one takes.
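The full-invalidation helper itself isn't shown in the code blocks above. A minimal sketch of what `invalidateAllItemsCache` could look like against the same module-scoped cache, assuming the `cache` Map and `notify` subscriber mechanism from §5.3.4 (the simplified `GridItem` shape and the `listeners` map here are illustrative):

```typescript
// Illustrative sketch — assumes the module-scoped cache and notify()
// subscriber mechanism from useAllItemsQuery.ts (§5.3.4).
type GridItem = { entityId: string; name: string }; // simplified for the sketch

const cache = new Map<string, GridItem[]>();          // key: `${tenantId}:${tab}`
const listeners = new Map<string, Set<() => void>>(); // per-key subscribers

function notify(key: string): void {
  for (const fn of listeners.get(key) ?? []) fn();
}

/** Drop every cached tab entry for a tenant. Next read triggers a fresh fat fetch. */
export function invalidateAllItemsCache(tenantId: string): void {
  for (const key of Array.from(cache.keys())) {
    if (!key.startsWith(`${tenantId}:`)) continue;
    cache.delete(key);
    notify(key); // subscribers re-read, see a miss, and refetch
  }
}
```

Seeding two tabs for one tenant and one for another, then invalidating the first, should leave only the second tenant's entry in place.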
5.3.5 TTL reasoning
The numbers come from the Goal document, not arbitrary tuning:
- BFF TTL: 2 hours — Goal doc’s stated minimum refresh ceiling. Explicit triggers (mutation, login, list mount) cover active-session staleness; this TTL is the safety net for idle sessions. Should be made configurable via env var per the Goal doc’s “should be configurable” qualifier — defer env plumbing to a follow-up unless sprint capacity allows.
- Client cache: browser-tab lifetime, no TTL — the module-scoped `Map` in `useAllItemsQuery` persists for as long as the browser tab is open. It's dropped by the explicit refresh triggers (mutation, login, list mount) and rebuilt on next use. No client-side TTL is needed because the BFF TTL is the source of truth for background freshness.
- Brief-navigation optimization — because the cache is module-scoped (not component state), opening the details panel and closing it, or navigating to `/order-queue` and back, returns instantly with zero network activity. The cache entry is only invalidated by a mutation or an explicit `invalidateAllItemsCache` call.
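The deferred env plumbing is small. A sketch, assuming the env var is named after the constant (the `resolveTtlSeconds` helper is hypothetical):

```typescript
// Hypothetical env plumbing for the BFF TTL — the constant stays the default.
const DEFAULT_ITEMS_ALL_TTL_SECONDS = 7200; // the Goal doc's 2-hour ceiling

function resolveTtlSeconds(raw: string | undefined): number {
  const parsed = Number(raw);
  // Fall back to the default on missing, non-numeric, or non-positive values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_ITEMS_ALL_TTL_SECONDS;
}

export const ITEMS_ALL_TTL_SECONDS = resolveTtlSeconds(process.env.ITEMS_ALL_TTL_SECONDS);
```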
5.3.6 Metrics for phase 1
Emit response headers on every fat-fetch request:
- `X-Cache-Source: hit|miss` (from a lightweight wrapper around the cached function call)
- `X-Items-Count: <n>`
- `X-Items-Hard-Max-Hit: true|false`
Log (not header): upstream loop duration, page count when uncached, total response body size (informs the 1MB CloudFront/Amplify limit watch).
Without these, phase 2 and phase 3 cannot be justified or ruled out on evidence.
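One way to derive `X-Cache-Source`: since the cached call doesn't report hit/miss directly, a thin wrapper can infer it from whether the loader actually ran. A runnable sketch with a plain `Map` standing in for the cached layer (the real BFF version would wrap the async `unstable_cache` call; all names here are illustrative):

```typescript
// Illustrative hit/miss probe for the X-Cache-Source header. A Map stands in
// for the cached layer so the mechanics are runnable standalone; in the BFF
// the cached call is unstable_cache and this wrapper would be async.
const fakeCache = new Map<string, string[]>();

function loadWithSource(
  key: string,
  loader: () => string[],
): { rows: string[]; source: 'hit' | 'miss' } {
  const cached = fakeCache.get(key);
  if (cached) return { rows: cached, source: 'hit' };
  const rows = loader(); // the upstream cursor loop in the real route
  fakeCache.set(key, rows);
  return { rows, source: 'miss' };
}

// In the route handler, the result feeds the response headers, e.g.:
//   headers.set('X-Cache-Source', source);
//   headers.set('X-Items-Count', String(rows.length));
```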
5.3.7 Why not a shared Redis cache today
- Requires new infrastructure (VPC, secrets, cost, ops).
- No hit-rate data yet to justify the investment.
- 2-hour TTL plus tenant-wide `revalidateTag` on writes keeps staleness tolerable.
Phase 3 trigger: per-instance hit rate below ~30% after one production week, or product reports of cross-session staleness that TTL tuning cannot fix.
5.4 AG Grid wiring
5.4.1 Column definitions — filter and sort live on EACH column
File: src/components/table/columnPresets.tsx
Every filter and sort control in this feature is attached to an individual column header, not to a global toolbar. AG Grid’s column menu exposes:
- A sort toggle (asc / desc / none) per column, with Shift-click adding secondary columns for multi-column sort.
- A filter icon per column that opens a filter popover. The popover’s contents depend on the column’s filter type (text input, checkbox list, number range, date range, etc.).
Turning this on is literally two lines:
```ts
export const itemsDefaultColDef: ColDef<items.Item> = {
  sortable: true,        // per-column sort toggle in the header
  filter: true,          // was false — per-column filter icon in the header
  floatingFilter: false, // off for sprint 1; can be added later as an always-visible row under headers
  resizable: true,
  suppressMovable: false,
  sortingOrder: ['asc', 'desc', null],
};
```

Then each column chooses its filter type based on the data it displays:
| Column data shape | AG Grid filter | Operators the user gets | Example columns |
|---|---|---|---|
| Free text | agTextColumnFilter | contains, equals, starts with, ends with, not contains, not equal | name, internalSKU, description |
| Enum / fixed set | agSetColumnFilter (Enterprise) | checkbox list of every unique value in the column with “select all” | classification.type, classification.subtype, status |
| Number | agNumberColumnFilter | equals, not equal, greater than, less than, between, blank | quantity, cost, sellPrice |
| Date | agDateColumnFilter | equals, not equal, before, after, between | createdAt, updatedAt |
| Boolean | agSetColumnFilter with true/false values | two-checkbox list | isActive, isArchived |
Select, action, and computed columns keep filter: false, sortable: false — they have nothing meaningful to filter or sort on.
All operators for all filter types work out of the box, because AG Grid runs against the full in-memory dataset. There is no “supported vs unsupported” matrix — every operator the popover exposes is functional. When a user sets “cost < 50” on one column and “supplier is Acme” on another and sorts by “updatedAt desc”, AG Grid combines all three with AND + the multi-sort priority and updates the visible rows instantly. None of that combination logic is code we write — it’s what we get from AG Grid’s filter/sort model.
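Applied to the mapping above, the per-column wiring is plain data. A sketch of a few column definitions under that table (fields follow §5.5's `GridItem`; the loose `ColDefSketch` type and the exact column list are illustrative, not the final `columnPresets.tsx`):

```typescript
// Illustrative colDefs following the data-shape → filter-type table above.
// ColDefSketch is a loose stand-in for AG Grid's ColDef so the sketch is standalone.
type ColDefSketch = { field: string; filter: string | boolean; sortable?: boolean };

export const exampleColumnDefs: ColDefSketch[] = [
  { field: 'name',      filter: 'agTextColumnFilter' },   // free text
  { field: 'type',      filter: 'agSetColumnFilter' },    // enum / fixed set (Enterprise)
  { field: 'cost',      filter: 'agNumberColumnFilter' }, // number range
  { field: 'updatedAt', filter: 'agDateColumnFilter' },   // date range
  { field: 'isActive',  filter: 'agSetColumnFilter' },    // boolean as a two-value set
  { field: 'actions',   filter: false, sortable: false }, // action column: nothing to filter or sort
];
```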
5.4.2 Grid component props
File: src/app/items/ItemTableAGGrid.tsx
```tsx
<ArdaGrid
  ref={gridRef}
  rowData={rows}                        // from useAllItemsQuery() — full published items array
  columnDefs={columnDefs}
  defaultColDef={itemsDefaultColDef}    // filter: true + sortable: true per column (§5.4.1)
  rowModelType="clientSide"             // explicit — the whole architecture depends on it
  enableFiltering                       // turns on the per-column filter menu in the header
  enableMultiSort                       // lets the user hold Shift to add secondary sort columns
  pagination                            // AG Grid built-in client-side pagination
  paginationPageSize={50}
  paginationPageSizeSelector={[25, 50, 100, 200]}
  onFilterChanged={handleFilterChanged} // writes filterModel to Redux
  onSortChanged={handleSortChanged}     // writes sortModel to Redux
  quickFilterText={searchValue}         // global search across all rendered columns
  initialState={initialGridState}       // restores filter + sort from Redux
  loading={isLoading}
  overlayLoadingTemplate={LOADING_TEMPLATE}
  // ... existing props (selection, cell editing, notes handlers, etc.) ...
/>
```

Note: `rows` is the full array of published items from `useAllItemsQuery`, not a page. AG Grid's built-in pagination handles displaying it 50 at a time. Each column in `columnDefs` gets its own filter icon, filter popover, and sort controls — nothing about "per-column" is custom code, it's the default AG Grid behavior once `filter: true` and `sortable: true` are set.
Important clarification on prop names: enableFiltering and enableMultiSort are props on the project’s ArdaGrid wrapper, not on the raw AgGridReact component. AG Grid itself uses different mechanisms:
- Filtering is enabled per-column via `filter: true` in the column definition (§5.4.1) — there is no top-level `enableFiltering` prop in AG Grid.
- Multi-column sort is enabled by default via the `multiSortKey` grid option (defaults to `'shift'`, meaning users hold Shift and click column headers to add secondary sort columns). If we want multi-sort without any modifier key, we'd set `alwaysMultiSort: true`. To disable it entirely, `suppressMultiSort: true`. There is no `enableMultiSort` prop in AG Grid.
Task 8 of the sprint plan (§7.1) includes wiring these wrapper props through to the correct AG Grid options so they actually do what the prop names imply. Don’t grep AG Grid’s docs for enableMultiSort — you won’t find it.
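That wrapper wiring amounts to translating the `ArdaGrid` prop names into real AG Grid options. A hedged sketch of what task 8 could produce (the `toAgGridOptions` helper and its shape are assumptions, not the shipped wrapper):

```typescript
// Illustrative translation of ArdaGrid wrapper props into AG Grid grid options.
interface ArdaGridFilterSortProps {
  enableFiltering?: boolean;
  enableMultiSort?: boolean;
}

function toAgGridOptions(props: ArdaGridFilterSortProps) {
  return {
    // There is no top-level enableFiltering in AG Grid: filtering is per-column,
    // so the wrapper folds the flag into defaultColDef instead.
    defaultColDef: { filter: props.enableFiltering === true },
    // Shift-click multi-sort is AG Grid's default behavior; the wrapper prop
    // only needs to suppress it when the flag is off.
    suppressMultiSort: props.enableMultiSort !== true,
  };
}
```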
5.4.3 Handlers and state restoration
```ts
import {
  setFilterModel,
  setSortModel,
} from '@/store/slices/itemsFilterSortSlice';

const dispatch = useAppDispatch();

const handleFilterChanged = useCallback((event: FilterChangedEvent) => {
  dispatch(setFilterModel(event.api.getFilterModel()));
}, [dispatch]);

const handleSortChanged = useCallback((event: SortChangedEvent) => {
  const sortModel = event.api.getColumnState()
    .filter(c => c.sort != null)
    .map(c => ({
      colId: c.colId,
      sort: c.sort as 'asc' | 'desc',
      sortIndex: c.sortIndex ?? undefined,
    }));
  dispatch(setSortModel(sortModel));
}, [dispatch]);

const persistedFilter = useAppSelector(s => s.itemsFilterSort.filterModel);
const persistedSort = useAppSelector(s => s.itemsFilterSort.sortModel);

const initialGridState: GridState = useMemo(() => ({
  filter: { filterModel: persistedFilter },
  sort: { sortModel: persistedSort },
}), []); // mount-only
```

Neither handler triggers a refetch — the whole point of the architecture. Filter and sort events update Redux for persistence; AG Grid handles everything else in memory. The empty dep array on `initialGridState` is deliberate: after mount AG Grid owns the live state, and Redux is updated via the change handlers.
Note on initialState: AG Grid’s initialState prop is only applied on mount (confirmed by AG Grid v34 docs: “It is only read once when the grid is created”). If other tabs (Draft, Recently Uploaded) become active in the future and need separate filter/sort state, the grid should be re-keyed on the active tab (key={activeTab}) to force a fresh mount per tab. For now with only the Published tab active, this isn’t needed.
5.4.4 Search box migration
The existing search box in `src/app/items/page.tsx` builds `ArdaQueryItemsRequest` filters with regex clauses on every keystroke. With the full dataset in memory, we replace this with AG Grid's `quickFilter`:
```tsx
const [searchValue, setSearchValue] = useState('');
// ...
<SearchInput value={searchValue} onChange={setSearchValue} />
// ItemTableAGGrid receives searchValue as the quickFilterText prop.
```

Instant, covers every rendered column, simpler code. `items-search.spec.ts` becomes a regression test against the new implementation (§7.4).
5.5 GridItem type and column stripper
File: src/lib/mappers/itemsGridMapper.ts (new)
```ts
import type { Item } from '@/types/items';

export interface GridItem {
  entityId: string;
  name: string;
  internalSKU: string;
  status: string;
  type: string;
  subtype: string;
  quantity: number;
  cost: number;
  sellPrice: number;
  supplier: string;
  location: string;
  createdAt: string;
  updatedAt: string;
  isActive: boolean;
  isArchived: boolean;
  // Deliberately narrow. No long descriptions, no extended attributes,
  // no file attachments. Full item payload is loaded on-demand by the
  // details panel via the existing single-item GET endpoint.
}

export function stripToGridColumns(item: Item): GridItem {
  return {
    entityId: item.entityId,
    name: item.name,
    internalSKU: item.internalSKU,
    status: item.classification?.status ?? '',
    type: item.classification?.type ?? '',
    subtype: item.classification?.subtype ?? '',
    quantity: item.quantity ?? 0,
    cost: item.cost ?? 0,
    sellPrice: item.sellPrice ?? 0,
    supplier: item.supplier?.name ?? '',
    location: item.location?.name ?? '',
    createdAt: item.createdAt,
    updatedAt: item.updatedAt,
    isActive: item.isActive ?? true,
    isArchived: item.isArchived ?? false,
  };
}
```

The exact field list will be refined against `columnPresets.tsx` during implementation. The goal is "exactly what the grid renders, nothing more."
6. Amplify Hosting considerations
6.1 CloudFront 1MB response size — the principal risk
Amplify Hosting routes all SSR/API-route responses through CloudFront, which enforces a ~1MB limit on the combined response body and headers. This is confirmed by a real production bug report (aws-amplify/amplify-hosting#3214) where a Next.js SSR app hit: "The Lambda function returned an invalid response, the length of the body and header was 1.5mb bytes which exceeds the cloudfront limit of 1mb bytes."
This is meaningfully tighter than a raw Lambda function limit — it applies to every Amplify Hosting SSR response, including our fat-fetch endpoint. This is the hardest technical constraint on the architecture.
Sizing math for 5k items:
- Full `Item` JSON: ~1-3KB per row → 5k items ≈ 5-15MB raw → blows the limit by 5-15x uncompressed.
- With `stripToGridColumns` (~0.5KB per grid row): 5k × 0.5KB ≈ 2.5MB raw → still over the limit uncompressed.
- With stripping + `Content-Encoding: gzip` on the response: ~500KB gzipped → fits under 1MB with headroom, assuming CloudFront measures post-compression.
The compression piece is load-bearing. The architecture only works if:
- The API route response is compressed (`Content-Encoding: gzip`) before it reaches CloudFront, AND
- CloudFront measures the post-compression size for its 1MB check.
Next.js has compress: true in next.config.ts by default, but the Amplify Hosting Compute runtime’s behavior with that setting is not guaranteed to “just work” — it depends on how the runtime intermediates Lambda responses to CloudFront. This is the single thing the prototype must validate against a real Amplify dev deployment before sprint 1 starts.
Mitigations built into the design:
- `stripToGridColumns` returns only grid-relevant fields. Target: ≤0.5KB per item.
- If default Next.js compression doesn't apply on Amplify, explicitly gzip the response body in the API route using `zlib` and set `Content-Encoding: gzip` headers. Gives ~3-4x compression on JSON. Works regardless of the runtime's default behavior.
- `HARD_MAX_ITEMS = 8_000` cap in the BFF as a primary safety net — 8k is the estimated ceiling under the 1MB CloudFront limit assuming ~0.5KB per stripped item and ~5x gzip compression. The prototype will validate the real number on a live Amplify deployment.
- Prototype (§8) MUST validate actual response size on the dev Amplify environment, not just local. Local math doesn't tell us whether CloudFront will accept the response.
6.1.1 Escalation plan if the 1MB limit is hit in production
In order of increasing complexity:
1. More aggressive stripping — cut `GridItem` to <200 bytes per row (entityId + name + a few IDs). Richer columns move to lazy-load on row hover/expand. Feasible if product accepts reduced grid density. Buys us headroom to ~5k-8k items.
2. Manual compression in the API route — explicitly gzip the response body with `zlib` and set `Content-Encoding: gzip` headers. Works regardless of whether the runtime compresses by default. Should land as part of task 3 in the sprint so we don't depend on implicit behavior.
3. Streaming responses — Amplify Hosting Compute supports Next.js streaming responses (`ReadableStream` / `IterableReadableStream`), which have documented exemptions from the CloudFront body-size check. Moderate implementation cost: `useAllItemsQuery` needs to consume a stream progressively.
4. Migrate to AG Grid Server-Side Row Model against the BFF cache — instead of sending the whole dataset, respond to SSRM datasource requests with one page at a time. This is effectively phase 2 pulled forward without the backend dependency, but it's a significant architectural shift and loses most of the client-side-everything benefits.
Recommendation: land mitigation #2 (manual compression) as part of the initial BFF route implementation in task 3, not as a later fix. Treat it as default architecture, not contingency. Reserve #3 and #4 as escape hatches if the prototype shows #1 + #2 still exceeds the limit.
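Mitigation #2 is small enough to sketch. A hedged version of the manual-compression step using Node's built-in `zlib` (the `gzipJsonBody` helper is illustrative; the real code would live in the fat-fetch route):

```typescript
import { gzipSync } from 'node:zlib';

// Illustrative manual compression for the fat-fetch route body.
// Returns the gzipped bytes plus the headers the response would carry.
function gzipJsonBody(payload: unknown): { body: Buffer; headers: Record<string, string> } {
  const raw = Buffer.from(JSON.stringify(payload), 'utf8');
  const body = gzipSync(raw);
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'Content-Encoding': 'gzip',
      'Content-Length': String(body.byteLength),
    },
  };
}

// In a Next.js route handler this would become something like:
//   const { body, headers } = gzipJsonBody(rows);
//   return new Response(body, { headers });
```

Repetitive JSON (the same keys on every row) is exactly where gzip earns its multi-x ratio, which is why the sizing math in §6.1 leans on it.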
HARD_MAX_ITEMS = 8_000 reflects the CloudFront constraint. The cap is lower than AG Grid’s browser performance ceiling (~20k) because the response-size limit binds first. Updated in the §5.3.2 code block.
6.2 Per-instance cache on Lambda
`unstable_cache` defaults to a filesystem cache handler, which on Lambda is ephemeral `/tmp`. Cache hits only span warm-invocation reuse on the same Lambda instance. For this feature that's acceptable: a 2-hour TTL plus one query per tenant means the hit rate should be reasonable. Measure and escalate to phase 3 only if production metrics demand it.
6.3 Inline build spec gotcha
Three Amplify apps with inline build specs override the committed `amplify.yml`. Any pipeline change must be applied per-app via `aws amplify update-app` or it silently does nothing. Not a feature concern — just a repeat reminder for anyone touching CI as part of this feature.
6.4 Unit tests in the build pipeline
New unit tests must be reliable or they block deployments. All new tests in this design target pure functions or mocked grid APIs — no timing assertions.
7. Sprint scope
7.1 Task breakdown
Estimates: S ≈ 0.5 day, M ≈ 1-2 days, L ≈ 3-5 days. Round up when in doubt — filter/sort work in a mature grid is historically underestimated.
| # | Task | Size | Files |
|---|---|---|---|
| 1 | itemsFilterSortSlice + persist config + schema version migration | S | itemsFilterSortSlice.ts (new), rootReducer.ts |
| 2 | GridItem type + stripToGridColumns mapper | S | itemsGridMapper.ts (new), types/items.ts |
| 3 | Fat-fetch BFF route with cursor loop, HARD_MAX guard, response headers | L | app/api/arda/items/all/route.ts (new), lib/cache/itemsAllCache.ts (new) |
| 4 | revalidateTag calls in write routes | S | items/route.ts, items/[entityId]/route.ts, draft route |
| 5 | fetchAllItems in ardaClient | S | ardaClient.ts |
| 6 | Custom useAllItemsQuery hook (module-scoped cache, dedupe, useSyncExternalStore) + invalidateAllItemsCache + mutation helpers | S | useAllItemsQuery.ts (new), useItemMutations.ts (new) |
| 7 | Enable filter: true + per-column filter types on every applicable column | M | columnPresets.tsx |
| 8 | Wire onFilterChanged / onSortChanged handlers + initialState restoration | M | ItemTableAGGrid.tsx |
| 9 | Items page migration from cursor fetching to useAllItemsQuery; remove cursor pagination UI; search box → quickFilter | L | items/page.tsx, itemsSlice.ts (remove pagination fields) |
| 10 | MSW handler for /api/arda/items/all | M | src/mocks/handlers/, src/mocks/data/ |
| 11 | Unit tests: slice + migration, cursor loop, HARD_MAX guard, column stripper, useAllItemsQuery dedupe + invalidation | M | *.test.ts files |
| 12 | Unit tests: grid handlers dispatch + quickFilter propagation | S | ItemTableAGGrid.test.tsx |
| 13 | E2E: new items-filter.spec.ts covering all filter types + persistence across reloads | L | new file |
| 14 | E2E: extend items-grid-interactions.spec.ts for multi-sort + client-side pagination | S | existing |
| 15 | E2E: update items-search.spec.ts as a regression test against client-side quickFilter | S | existing |
| 16 | Post-implementation update to this proposal | S | this file |
Rough total: 3 L + 4 M + 9 S (16 tasks). Shrunk vs the original plan because task 6 is now a self-contained custom hook instead of a new dependency + provider setup. Realistic for one sprint with one engineer plus buffer, or two engineers parallelizing BFF/API work against grid/UI work.
7.2 Dependencies
- None blocking.
- Soft risk: unknown backend cap on `paginate.size`. The internal cursor loop handles either case — if the backend returns everything in one page, the loop is a harmless one-iteration pass.
- Hard risk over time: tenant item counts growing past ~8k (CloudFront-bound). Mitigated by the HARD_MAX metric and the phase 2 escape hatch.
- DX nicety: add `NEXT_PUBLIC_AG_GRID_LICENSE_KEY` to `.env.example` with a pointer to a shared dev key so local developers don't see a watermark. Optional.
7.3 Risks and mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lambda response size exceeds limit with real data | Medium | Feature cannot ship | stripToGridColumns; prototype validates payload size first |
| Tenant has >8k items | Low-Medium | HARD_MAX triggers, grid shows incomplete data | HARD_MAX metric + documented phase 2 path. Prototype validates the real 8k threshold against live Amplify. |
| Initial fat fetch feels slow to users | Medium | UX regression from perceived latency | Aggressive BFF + client caching, loading skeleton, cached data shown during background refetch |
| Persisted filter model references renamed column IDs | Low | Restored state breaks grid | schemaVersion bump resets state on rehydrate |
| AG Grid quickFilter differs from old regex search | Low | Results shift slightly | Regression spec covers expected matches; document in PR |
| Backend paginate.size cap forces many loop iterations | Low | Slow fat-fetch on first miss | Metrics; tune BACKEND_PAGE_SIZE if allowed |
| Per-instance BFF hit rate is poor in production | Medium | Feature works but backend load higher than hoped | Phase 3 (shared cache) escalation, gated on metrics |
| Amplify build blocks on flaky tests | Low | Deploys blocked | Pure-function + mocked-grid tests only, no timing assertions |
7.4 Test plan
Unit tests:
- `itemsFilterSortSlice.test.ts` — reducer correctness, schema version migration.
- `itemsAllCache.test.ts` — cursor loop terminates on empty `nextPage`, HARD_MAX guard fires at the right count, tag is present, `stripToGridColumns` is applied.
- `itemsGridMapper.test.ts` — stripped item has exactly the grid-needed fields, no PII leakage beyond them.
- `useAllItemsQuery.test.ts` — query key composition by `(tenantId, tab)`.
- `ItemTableAGGrid.test.tsx` additions — filter/sort handlers dispatch; `quickFilter` propagates; `initialState` restoration from Redux.
E2E tests (Playwright, mock mode):
- `items-filter.spec.ts` (new):
  - Apply text filter → rows narrow in place, no network request.
  - Apply set filter with 2 values → verify rows.
  - Combine text + set filter.
  - Apply filter → reload → filter restored.
  - Switch tabs → filter on original tab persists.
  - Clear filters button resets to unfiltered.
- `items-grid-interactions.spec.ts` (extend):
  - Multi-column sort → verify row order.
  - Sort persists across reload.
  - Pagination page size selector works against the in-memory dataset.
- `items-search.spec.ts` (regression):
  - Search filters grid via `quickFilter`.
  - Search clears on navigation away.
Manual smoke test (pre-deploy):
- Local with MSW: load 5k mock items, confirm time-to-interactive, exercise filter/sort/pagination.
- Dev Amplify env: verify fat-fetch response size, cache hit/miss headers, HARD_MAX header absent under normal load, no watermark visible.
7.5 Out of sprint 1 scope
- Server-side filter/sort (phase 2, not available in current API).
- AG Grid Server-Side Row Model migration (phase 2 escape hatch, only if HARD_MAX triggers).
- Shared Redis BFF cache (phase 3, metrics-gated).
- Saved filter views / shareable URLs.
- Floating filters.
- Advanced filter expression builder.
- Optimistic updates for inline edits (optional, sprint 2 candidate).
- Canary `@arda-cards/api-proxy` `EntityDataGrid` migration.
- Env var plumbing for `ITEMS_ALL_TTL_SECONDS` (the constant is fine for sprint 1).
8. Feasibility prototype (in ux-prototype Storybook)
The prototype's job is to validate that 5k rows with filter + sort + pagination is comfortable in AG Grid's client-side row model, and that a stripped-column payload stays under the Lambda response limit.
8.1 Minimum prototype surface
- One Storybook story rendering an AG Grid with 5,000 mock items in `GridItem` shape.
- At least four columns with different filter types: `agTextColumnFilter`, `agSetColumnFilter`, `agNumberColumnFilter`, `agDateColumnFilter`.
- Multi-column sort enabled.
- AG Grid built-in pagination, page size 50.
- A JSON panel showing the live filter model and sort model.
- A diagnostics panel showing: total rows, payload size estimate (`JSON.stringify(rows).length`, optionally gzipped), initial render time, time-to-filter-applied after a filter change.
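The 5k-row mock data and the diagnostics panel's size estimate can share one helper. A sketch (the generator's field values are invented; only the rough `GridItem` shape from §5.5 matters):

```typescript
import { gzipSync } from 'node:zlib';

// Illustrative 5k-row mock generator in roughly GridItem shape (§5.5),
// plus the payload-size estimate the diagnostics panel would display.
function makeMockItems(count: number) {
  return Array.from({ length: count }, (_, i) => ({
    entityId: `item-${i}`,
    name: `Mock item ${i}`,
    internalSKU: `SKU-${String(i).padStart(5, '0')}`,
    status: i % 7 === 0 ? 'draft' : 'published',
    quantity: i % 100,
    cost: Math.round((i % 500) * 1.37 * 100) / 100,
    updatedAt: new Date(Date.UTC(2026, 0, 1 + (i % 90))).toISOString(),
    isActive: i % 11 !== 0,
  }));
}

function payloadEstimate(rows: unknown[]) {
  const json = JSON.stringify(rows);
  return {
    rawBytes: json.length,                                // JSON.stringify(rows).length
    gzippedBytes: gzipSync(Buffer.from(json)).byteLength, // the optional gzipped figure
  };
}
```

In the Storybook story, `makeMockItems(5000)` would feed `rowData` and `payloadEstimate` would feed the diagnostics panel; in Node the gzip figure gives a rough stand-in for the wire size §6.1 worries about.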
8.2 What the prototype validates
- Performance at 5k. Initial render, filter/sort responsiveness, memory footprint.
- Payload size reality check. Does stripped `GridItem` × 5k fit comfortably inside the gzipped Lambda response limit?
- Filter model stability. Confirms `FilterModel` is JSON-serializable and restorable via `initialState`.
- AG Grid pagination UX. Confirms the built-in pagination is an acceptable replacement for the current cursor UI.
8.3 What the prototype does NOT do
- Does not talk to the real backend.
- Does not test BFF cache or cursor loop (Next.js-side concerns).
- Does not validate Redux persistence (unit tested in `arda-frontend-app`).
- Does not migrate to canary `EntityDataGrid` primitives.
8.4 Fail-fast conditions
If the prototype shows any of the following at 5k rows, stop sprint 1 planning and revisit the design:
- Initial render > 2s.
- Filter/sort interaction > 200ms.
- Stripped payload > 1MB gzipped (the CloudFront response-size limit from §6.1 — measured post-compression on the dev Amplify deployment, not locally).
Any of these suggests Server-Side Row Model or partial loading is needed, which means a backend conversation and a different sprint shape.
9. Migration path
PR sequence, each independently shippable and targeting dev:
- PR 1 — Slice + persistence + `GridItem` mapper. Purely additive. No UI changes.
- PR 2 — BFF fat-fetch route + cache wrapper + MSW handler. Additive. New endpoint nothing consumes yet. Unit tested.
- PR 3 — `revalidateTag` calls in write routes. Additive, one line per route.
- PR 4 — `useAllItemsQuery` custom hook + `fetchAllItems` in `ardaClient` + mutation helpers. Additive client plumbing. No new dependencies.
- PR 5 — Enable filter UI + wire handlers + `initialState` restoration. User-visible but transitional: filters work client-side over the still-cursor-paginated dataset. Users can interact with the filter UI, but it only narrows the visible page. Ship behind a feature flag if stakeholders want rollback headroom.
- PR 6 — The flip. Switch the items page from cursor fetching to `useAllItemsQuery`, remove the cursor pagination UI, enable AG Grid built-in pagination, migrate the search box to `quickFilter`. Architecturally significant PR — deserves extra review. E2E tests updated alongside.
- PR 7 — Remove dead code. Cursor fields from `itemsSlice`, old server-side search logic from `items/page.tsx`, any orphaned handlers.
- PR 8 — Docs update to this file capturing anything that changed between design and implementation.
PR 5 is the feature-flag moment if the team wants progressive rollout. Otherwise PRs 5 and 6 can land together.
Each PR targets dev. No force-pushes. Conventional commit footer per project norms.
10. Answering the ticket acceptance criteria
| Criterion | Met by |
|---|---|
| Architecture document covers data flow, state management, column filter type mapping, and migration path | §4 (data flow), §5 (state + grid wiring + column filter types), §9 (migration path) |
| Feasibility prototype demonstrates at least single-column filter + sort with AG Grid Enterprise | §8 — Storybook story with 5k rows and four filter types |
| Sprint proposal has task-level estimates and identifies dependencies | §7.1 (tasks with sizes), §7.2 (no blocking deps) |
| Prototype validates that BFF caching approach works with the current API pagination model | §5.3 — BFF absorbs cursor pagination internally, caches the full list per tenant for 2 hours with four explicit refresh triggers |
| Validates the architectural decisions in the Goal document | §2 — every adopted decision is mapped to a concrete implementation; §2.2 explicitly interprets the “server side rendering of the page against cached items” phrase |
Honest restatement of the cache criterion: the existing cursor pagination is incompatible with client-side filter/sort over any dataset larger than one page, so we abandon cursor pagination as a client-facing concept and let the BFF absorb it. The client asks for “all items for this tenant and tab,” the BFF loops the backend cursor internally until exhausted, strips each item to grid columns, and caches the result as a single entity under a 2-hour TTL with four explicit refresh triggers. This matches the Goal document’s adopted BFF-cache design and satisfies the ticket’s validation requirement.
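The absorption loop described above can be sketched as follows; `fetchPage`, `toGridItem`, and the `Page` shape are illustrative assumptions, while the 500-row page size and 8k cap are the proposal-era constants (the header note explains how the shipped values differ):

```typescript
// Sketch of the BFF cursor-absorption loop: loop the backend cursor until
// exhausted, strip each item to grid columns, stop at the safety cap.
const PAGE_SIZE = 500;
const HARD_MAX_ITEMS = 8_000; // proposal-era cap (see header note)

type Page<T> = { items: T[]; nextCursor: string | null };

async function fetchAllItems<T, G>(
  fetchPage: (cursor: string | null, limit: number) => Promise<Page<T>>,
  toGridItem: (item: T) => G,
): Promise<G[]> {
  const all: G[] = [];
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor, PAGE_SIZE);
    // Strip each item down to grid-relevant columns before accumulating.
    for (const item of page.items) all.push(toGridItem(item));
    cursor = page.nextCursor;
  } while (cursor !== null && all.length < HARD_MAX_ITEMS);
  return all.slice(0, HARD_MAX_ITEMS); // enforce the cap even mid-page
}
```

In the proposal this runs inside the cached BFF route, so the loop executes at most once per tenant per TTL window rather than once per client request.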
11. Decisions recorded
| Decision | Alternative considered | Why chosen |
|---|---|---|
| Client-side everything (fat fetch + AG Grid in-memory) | Hybrid with server-side narrowing via translator | Server-side sort and richer filter operators are not available in the current API; 5k items is within AG Grid’s comfort zone |
| BFF absorbs cursor loop internally | Expose cursor pagination to the client | Client-side filter/sort over a single page is broken UX; cursor mechanics are an implementation detail |
| Stripped `GridItem` payload, not full `Item` | Return full items | Lambda response size ~1MB cap — stripping keeps us safely under it |
| `HARD_MAX_ITEMS = 8_000` | No cap, or 20k (browser perf) | Bounded by the Amplify CloudFront 1MB response limit (~0.5KB per stripped item × 5x gzip compression ≈ 10k theoretical, 8k with safety margin). Prototype validates against real Amplify dev. This is tighter than the browser-performance ceiling of ~20k |
| Custom `useAllItemsQuery` hook, not React Query | Add `@tanstack/react-query` as a new dep | Only one cached endpoint in this feature. A ~90-line module-scoped hook delivers mount-refetch, dedupe, cross-component reactivity, and tenant-wide invalidation with no new library or provider. React Query is worth adopting when the app has three or more cached endpoints sharing the machinery — for one endpoint it’s overhead |
| Published-only scope for sprint 1 | Per-tab `Record<tab, filterModel>` | Only the Published Items tab is active today. Building per-tab state infrastructure for tabs that don’t exist yet is premature complexity. If Draft/Recently Uploaded tabs go live later, extending the slice to `Record<tab, ...>` is a small follow-up |
| No `key={activeTab}` remount needed for sprint 1 | Re-key the grid on tab change | Only one tab is active (Published), so no tab switching to handle. If tabs are added later, `key={activeTab}` forces a clean remount per tab |
| Separate `itemsFilterSortSlice` from `itemsSlice` | Extend `itemsSlice` | Avoids merge friction with in-flight `itemsSlice` work, keeps persistence targeted |
| Schema version in the slice | No version | Cheap hedge against column renames invalidating persisted state |
| AG Grid `quickFilter` for search | Keep server-side regex | With the full dataset in memory, client-side search is instant and covers every column |
| AG Grid built-in client-side pagination | Custom pagination UI | Native to client-side row model; custom would fight the grid |
| `rowModelType: 'clientSide'` explicit | Rely on default | The whole architecture depends on it; make it visible in code |
| `unstable_cache` per-Lambda-instance BFF cache | In-memory LRU | Ergonomic, composes with `revalidateTag`, same per-instance limitation as LRU but with better primitives |
| 2-hour BFF TTL | 30s / 3 min | Matches Goal doc’s stated refresh ceiling; explicit triggers cover active-session staleness |
| Tenant-wide `revalidateTag` on writes | Per-entity tags | Simple, correct, cheap; the TTL bounds staleness anyway |
| Phase 2 reserved for Server-Side Row Model + backend | Ship phase 2 alongside | Server-side sort/filter is a future API capability; phase 2 is scoped for when it becomes available |
| Phase 3 reserved for shared Redis cache | Ship phase 3 in sprint 1 | No metrics to justify infra cost; per-instance cache + short TTL is enough for phase 1 |
| Surgical cache patches on mutation, NOT full invalidation | Full tenant-wide invalidate + refetch on every edit | A 5-15s loading spinner after saving a single field is bad UX and wasted backend load. Mutation hooks patch one row in place via patchItemInCache / addItemToCache / removeItemFromCache. BFF revalidateTag still fires for cross-tab / cross-user consistency. Full invalidation reserved for login, list mount, and 2h TTL |
| Pessimistic surgical updates in sprint 1 | Optimistic (patch immediately, roll back on error) | Simpler default; backend confirms before the cache mutates. Optimistic is a sprint 2 candidate once pessimistic is measured and if UX demands it |
| Feature flag on PR 5, flip on PR 6 | All-at-once | Lets stakeholders preview the filter UI before the cursor-to-full-list flip; rollback headroom |
| Interpret “server side rendering of the page” as “serve data from cache” | Implement AG Grid SSRM against BFF cache | At 5k items, client-side is simpler, faster, and matches the Goal doc’s refresh-policy semantics |
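The surgical-patch decision can be sketched as a module-scoped store. The helper names follow the decision table (`patchItemInCache`, `addItemToCache`, `removeItemFromCache`); the `GridItem` fields are placeholders, and the subscriber notification the real hook needs for cross-component reactivity is omitted:

```typescript
// Minimal sketch of surgical cache mutation instead of full invalidation.
// GridItem here is a placeholder shape; the real hook also notifies
// subscribed components so every mounted grid re-renders with the patch.
type GridItem = { id: string; name: string; status: string };

let cache: GridItem[] | null = null; // module scope = shared across components

function setCache(items: GridItem[]): void {
  cache = items;
}

function patchItemInCache(id: string, patch: Partial<GridItem>): void {
  if (cache === null) return; // nothing cached yet; next mount refetches
  cache = cache.map((it) => (it.id === id ? { ...it, ...patch } : it));
}

function addItemToCache(item: GridItem): void {
  if (cache !== null) cache = [item, ...cache];
}

function removeItemFromCache(id: string): void {
  if (cache !== null) cache = cache.filter((it) => it.id !== id);
}
```

Each mutation touches exactly one row, so saving a single field never triggers a full refetch spinner; the BFF-side `revalidateTag` from the table above still fires for cross-tab and cross-user consistency.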
End of proposal.
Copyright: © Arda Systems 2025-2026, All rights reserved