Spike #855 — Item List Filter & Sort: Proposal (Historical)
This document is historical. It describes the originally proposed architecture (AG Grid native column filters, client-side row model, custom `useAllItemsQuery` hook). After team review, the implementation pivoted to CloudScape PropertyFilter + AG Grid Server-Side Row Model (SSRM) + BFF-side filter/sort engines. For the canonical reference, read architecture.md. For what actually shipped, see implementation-notes.md.
Constants quoted in this document (e.g., `HARD_MAX_ITEMS = 8_000`, `ITEMS_ALL_TTL_SECONDS = 7200`, the 1MB CloudFront limit framing) reflect the original proposal. The shipped design is different: `HARD_MAX_ITEMS` is a BFF-memory cap (not CloudFront-derived), `CACHE_TTL_SECONDS = 300` (5 min), tenant-scoped invalidation tag `'items-all:' + tenantId`, capacity additionally bounded by the 2MB `unstable_cache` per-entry limit. Don’t copy-paste from this file — see architecture.md for the canonical values. The core BFF caching architecture (`unstable_cache`, `revalidateTag`, pagination loop) is exactly what this proposal designed — only the filter UI and data delivery layers changed.
Ticket: Arda-cards/management#855
Branch: feature/items-sorting-filtering
Date: 2026-04-14
Author: David Quintanilla
This document was the single deliverable for spike #855. It combined the investigation findings, architecture proposal, sprint scope, feasibility prototype plan, and migration path into one file. It was the input for sprint planning.
0. TL;DR
- Implement AG Grid Enterprise multi-column filter + multi-sort for `/items`, entirely client-side, against a full-tenant dataset fetched once per session and cached at the BFF.
- No backend changes required. The current API does not support server-side sort or extended filter operators, so the feature is designed to ship independently.
- BFF implements a fat-fetch endpoint that loops the existing cursor-paginated backend API in 500-row pages up to an 8k safety cap (bounded by the Amplify CloudFront 1MB response limit, not browser performance), strips each item down to grid-relevant columns, and caches the result per tenant for 2 hours with four explicit refresh triggers (write, login, list mount, TTL ceiling).
- AG Grid client-side row model runs filter, sort, pagination, and search in the browser with sub-100ms latency over ~5k items.
- Filter and sort models are persisted in a small Redux slice (Published Items tab only). Rows are kept in a lightweight custom hook with a module-scoped cache — no new state-management library. Cursor pagination disappears from the client UI entirely.
- One sprint for one engineer, plus buffer. No blocking dependencies. Phase 2 (real backend filter/sort) and phase 3 (shared Redis cache) are documented escape hatches gated on production metrics, not planned work.
- Every decision in the project’s Goal document is honored. §2 maps each adopted decision to its concrete implementation.
1. Context and constraints
1.1 What the ticket asks for
- Architecture document covering data flow, state management, column filter type mapping, and migration path.
- Feasibility prototype in the `ux-prototype` Storybook demonstrating at least single-column filter + sort with AG Grid Enterprise.
- Sprint proposal with task-level estimates, identified dependencies, and test plan.
Acceptance criteria include validating the BFF caching approach against the current API pagination model and validating the architectural decisions in the Goal document.
1.2 Authoritative sources read for this proposal
- Goal document — adopted decisions for BFF caching, AG Grid native features, 500-page loop, ~5k items per tenant, 2-hour refresh ceiling
- Application vs Canary Audit — capability and implementation diff between `arda-frontend-app` and the canary `createEntityDataGrid`
- Getting Started guide
- Tickets: #742 parent epic, #781 phase 1/2 filter plan, #824 Query DSL search, #611 rename Filter→Search, #749 AG Grid Enterprise license
- `arda-frontend-app` codebase: items page, `columnPresets`, `ArdaGrid`, `ItemTableAGGrid`, `ardaClient`, BFF routes, store slices, `amplify.yml`
1.3 The hard constraint
The backend API does not currently support server-side sort or extended filter operators. This feature is designed to ship independently, with no backend changes required. The architecture follows directly from that constraint.
1.4 Baseline facts about the existing app
- Next.js 16.1.6 App Router, React 19.2.4, TS 5.9.3. Redux Toolkit + redux-persist.
- Auth: AWS Cognito SDK direct (not Amplify Auth despite the name).
- Hosting: AWS Amplify Hosting (CI/CD + Lambda for SSR routes). Three apps (dev/stage/prod), each with inline build specs that override the committed `amplify.yml`.
- AG Grid Community + Enterprise 34.3.1 installed and registered. License active on deployed environments via `NEXT_PUBLIC_AG_GRID_LICENSE_KEY` set in each Amplify Console and picked up by `amplify.yml:16`. Local dev shows a watermark — harmless.
- Current `/items` flow: `ItemsPage` → `ardaClient.queryItems()` → POST `/api/arda/items/query` → POST `{BASE_URL}/v1/item/item/query`. No caching anywhere — `cache: 'no-store'` hardcoded in every route.
- Backend filter DSL today: `and`/`or`/`eq`/`regex`. The existing search box already uses this via regex clauses and works in production. No sort field. No range/in/null/not operators.
- Pagination is cursor-based (`thisPage`/`nextPage`/`previousPage` tokens). Default 50 items per page.
- AG Grid wiring: `sortable: true, filter: false` at the default col def. Sort is client-side over the current page only. `enableFiltering`, `onFilterChanged`, `enableMultiSort`, `onSortChanged` props exist on `ArdaGrid` but are never passed.
- Tenant item counts estimated ~5k. Not measured, not guaranteed.
- `@arda-cards/api-proxy` and canary `createEntityDataGrid` primitives are NOT consumed by this app today. Canary migration is a separate future project.
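Since the only DSL operators are `and`/`or`/`eq`/`regex`, the production search box presumably reduces a search term to an or-of-regex clause. A minimal sketch of that shape — the clause grammar mirrors `buildTabFilter` in §5.3.2, but the `fields` list and the escaping policy here are illustrative assumptions, not the app's actual code:

```ts
// Illustrative sketch of the and/or/eq/regex filter DSL shape.
// Field names and escaping are assumptions for illustration only.
type FilterClause =
  | { locator: string; eq: string }
  | { locator: string; regex: string };
type Filter = true | { and: Array<{ or: FilterClause[] }> };

export function buildSearchFilter(term: string, fields: string[]): Filter {
  if (!term) return true; // empty search → match everything
  // Escape regex metacharacters so user input is treated literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return {
    and: [{ or: fields.map((locator) => ({ locator, regex: escaped })) }],
  };
}
```

The nesting (`and` of `or` groups) matches the tab filters shown later in §5.3.2, so a search clause can be combined with a tab clause by appending to the same `and` array.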
1.5 Amplify Hosting runtime reality
- API routes run as Lambda functions. Cold starts matter. Memory and `/tmp` are per-instance.
- Any `NEXT_PUBLIC_*` env var set in the Amplify Console flows into `.env` at build time via the `amplify.yml:16` grep pattern. No build-spec change needed to add public env vars.
- Inline build specs in each Amplify app override the committed `amplify.yml`. Pipeline changes must be applied per-app via `aws amplify update-app` or they silently do nothing.
- Unit tests run in the build pipeline (`amplify.yml:26`). Flaky tests block deployments, not just CI runs.
- Amplify Hosting SSR responses are gated by CloudFront’s ~1MB response body + headers limit, NOT the looser API Gateway 6MB limit. Confirmed by production bug report aws-amplify/amplify-hosting#3214. The fat-fetch endpoint must respect this — §5.3.2 addresses it via column stripping plus explicit gzip compression. This is the single tightest constraint on the design (see §6.1).
1.6 Confirmed answers to earlier unknowns
- Backend sort? No. The API has no sort field.
- Backend DSL extension? No. `and`/`or`/`eq`/`regex` is what we have.
- Does the existing server-side filter path work today? Yes. Search box uses it in production.
- Items per tenant? Estimated ~5k. Some tenants likely larger.
- Why frontend-only? Server-side sort/filter is not available in the current API.
- Backend `paginate.size` cap? Unknown. The internal loop handles either case.
- Product freshness expectations? Unknown. 2-hour TTL + explicit triggers is a reasonable default; confirm before shipping.
- Shared Redis on the table? Unknown. Not blocking phase 1.
- Lambda response size at 5k items? Must be validated in the prototype (§8). Column stripping is the mitigation.
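The column stripping called out above is just a fixed allowlist copy. A sketch of what `stripToGridColumns` might look like — the trimmed `GridItem` shape and field names are assumptions for illustration (the real list belongs in `itemsGridMapper.ts` and runs to ~15-20 fields):

```ts
// Illustrative sketch of stripToGridColumns. Field names are assumptions;
// the point is the mitigation: copy an allowlist, drop everything else.
export interface GridItem {
  id: string;
  name: string;
  status: string;
  cost: number | null;
}

export function stripToGridColumns(payload: Record<string, unknown>): GridItem {
  const classification = payload.classification as { status?: string } | undefined;
  // Only allowlisted fields survive, so the serialized tenant dataset
  // stays under the CloudFront response ceiling.
  return {
    id: String(payload.id ?? ''),
    name: String(payload.name ?? ''),
    status: classification?.status ?? '',
    cost: typeof payload.cost === 'number' ? payload.cost : null,
  };
}
```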
2. Alignment with the Goal document
2.1 Decisions adopted by the Goal document
The Goal document in the Arda-cards/documentation repo adopted these decisions before this spike started. This proposal implements all of them. Quoted phrases are from the Goal doc’s “General Guidance & Adopted Decisions” section.
- BFF-level tenant cache. “The BFF will retrieve the complete set of items for a tenant from the back end and will keep a cache in its memory, keyed by tenant id.” → Next.js `unstable_cache` wrapping a BFF-internal cursor loop, keyed by tenant (§5.3.2).
- AG Grid native sort/filter/pagination with Redux integration. “The SPA and BFF will use the AG Grid native Sort, Filter and Pagination with its integration with Redux.” → AG Grid client-side row model with per-column `filter: true`, Shift-click multi-sort (AG Grid’s default multi-sort gesture), and built-in `pagination`. Filter/sort model persisted in a new per-tab Redux slice (§5.1, §5.4). Note: the project’s `ArdaGrid` wrapper exposes these as `enableFiltering`/`enableMultiSort` props that translate to the underlying AG Grid options — see §5.4.2 for the wrapper-prop clarification.
- 500-row-page backend loop until exhausted. “The BFF will read from the back end in pages of 500 items until all items for a tenant are retrieved.” → `BACKEND_PAGE_SIZE = 500` with an internal cursor loop in the BFF fat-fetch route (§5.3.2).
- ~5k items per tenant assumption. “Current estimates (2026-04-02) are < 5000 items per tenant.” → Drives the client-side row model choice (comfortable at this scale) and the `HARD_MAX_ITEMS = 8_000` safety cap in §5.3.2 — note this cap is bounded by the Amplify CloudFront 1MB response limit (see §6.1), NOT by AG Grid browser performance.
- Cache refresh triggers. “The BFF cache will be refreshed from the back end when: (1) there is an update, creation or deletion of an item, (2) a user for the tenant logs in, (3) an Item List is mounted in a user session, (4) at a minimum once every 2 hours (should be configurable).” → `revalidateTag` on every write (§5.3.3), a refetch-on-mount effect in the custom `useAllItemsQuery` hook (§5.2), explicit login-time cache bust via `invalidateAllItemsCache` (§5.3.3), `ITEMS_ALL_TTL_SECONDS = 7200` background TTL (§5.3.2).
- Cache eviction deferred. “Cache Eviction will need to be designed once we have a better understanding of the memory pressure we have in production.” → Acknowledged. `unstable_cache` on Lambda handles eviction implicitly via Lambda instance lifecycle.
- AG Grid native UX is an acceptable starting point. “It is also O.K. to start with a subset of those capabilities.” → Sprint 1 enables the filter types listed in §5.4.1; floating filters, the advanced filter expression builder, and saved views are deferred.
2.2 Interpretation note: “server side rendering of the page”
The Goal document also contains this phrase:
“BFF will do a server side rendering of the page to be shown to the user against its cached items.”
In context, this means the BFF serves data from its in-memory cache to the SPA — not that the BFF implements AG Grid’s Server-Side Row Model (SSRM) API. This proposal reads “rendering” as “serving,” and implements it accordingly: the BFF returns the full tenant dataset from cache in one response, and AG Grid’s client-side row model handles filter, sort, and pagination in the browser without further round-trips.
Why this interpretation at 5k items:
- Client-side row model is the idiomatic “native” mode the Goal document calls for. Sub-100ms interaction latency, zero network round-trips per filter change.
- SSRM against the BFF cache would require reimplementing AG Grid’s filter model semantics in Node, a dedicated set-filter-values endpoint per column, and a network round trip for every filter/sort event. At 5k items this is complexity with no user-visible benefit.
- Cold starts are cheaper. One BFF cursor loop pays for the whole user session. SSRM mode would potentially pay the loop cost on every Lambda cold start a user hits.
- Future AG Grid features (advanced filter, quickFilter, grouping) work natively in client-side mode. Many need custom server implementations in SSRM.
- The “refresh on list mount” and “2-hour ceiling” policies in the Goal document only make sense if the client holds a snapshot across interactions. SSRM makes every interaction a live query and the refresh policy becomes meaningless.
If HARD_MAX_ITEMS fires in production (tenant counts grow past ~8k, CloudFront-bound), phase 2 migrates to real backend filter/sort. That is where SSRM becomes the right tool.
2.3 What this proposal does NOT change from the Goal document
- Tenant cache lives in the BFF, not the client. ✓
- Cache is refreshed on mutations, login, list mount, and 2-hour ceiling. ✓
- AG Grid native sort/filter/pagination drive the UX. ✓
- Redux persists the filter/sort state across reloads and tabs. ✓
- Cursor pagination to the real backend happens in the BFF, not the SPA. ✓
3. Phased roadmap
| Phase | Scope | Ships when | Backend dep | Infra dep |
|---|---|---|---|---|
| Phase 1 | AG Grid native filter/sort/search/pagination client-side over full tenant dataset; Redux-persisted filter/sort model (Published Items tab); fat-fetch BFF route with unstable_cache + revalidateTag; custom useAllItemsQuery hook with module-scoped cache; HARD_MAX safety metric | Sprint 1 | None | None |
| Phase 2 (escape hatch) | Migrate to AG Grid Server-Side Row Model against the real backend. Requires backend sort + richer filter DSL | Only if HARD_MAX fires in production | Yes — full backend filter/sort | None |
| Phase 3 (escape hatch) | Custom Next.js cacheHandler backed by Redis/ElastiCache or DynamoDB; cluster-wide revalidateTag | Only if per-instance hit rate is too low | None | Redis/ElastiCache |
Phases 2 and 3 are documented paths, not planned work.
4. Data flow
```
User navigates to /items
  │
  ▼
ItemsPage mounts
  │
  ▼
useAllItemsQuery(tenantId, tab)      [custom hook, module-scoped cache, refetch on mount]
  │
  ▼  (cache miss at client layer)
GET /api/arda/items/all?tab=<tab>
  │
  ▼
getCachedAllItems()                  [BFF, unstable_cache, TTL 2h, tag items:{tenantId}]
  │
  ▼  (cache miss at BFF layer)
Loop ArdaQueryItemsRequest with { filter: buildTabFilter(tab), paginate: { size: 500 } }
until nextPage is empty OR HARD_MAX reached
  │
  ▼
Concatenate pages → stripToGridColumns → return full list
  │
  ▼  (cached at BFF and at client)
AG Grid client-side row model receives all rows
  │
  ▼
User applies filter / sort / search → ZERO additional fetches
All interactions happen in memory
  │
  ▼
onFilterChanged / onSortChanged → dispatch to Redux for persistence
```

Mutations (create / update / delete / publish draft):
- Send the write to the backend as today.
- On success, call `revalidateTag('items:{tenantId}')` in the API route — drops the server-side BFF cache for subsequent reads by other Lambda instances / browser tabs.
- On success, the client mutation hook patches the cached item in place via `patchItemInCache`/`addItemToCache`/`removeItemFromCache`. No refetch, no loading state — the grid re-renders only the affected row.
- Only login, list mount, and the 2-hour TTL trigger a full refetch of the whole tenant dataset. Routine edits don’t.
5. Detailed design
5.1 Redux slice — deliberately small
Scope: Published Items tab only. That’s the only active tab today. If Draft or Recently Uploaded tabs are added later, extending the slice to per-tab state is straightforward (wrap in `Record<tab, ...>`), but we don’t build that complexity now.
File: src/store/slices/itemsFilterSortSlice.ts (new)
Stores `filterModel`, `sortModel`, and `schemaVersion`. Persisted via redux-persist. Deliberately excludes:
- Rows. Rows live in the module-scoped cache inside `useAllItemsQuery` (§5.2). Putting 5k items in redux-persist would serialize them to localStorage on every dispatch — a performance disaster.
- Pagination state. AG Grid’s built-in client-side pagination owns it. No cursor tokens anywhere in client state.
- Search string. AG Grid `quickFilter` via local component state.
```ts
import { createSlice, PayloadAction } from '@reduxjs/toolkit';

// AG Grid FilterModel is opaque — it's a per-column object keyed by colId:
// { name: { filterType: 'text', type: 'contains', filter: 'bolt' },
//   'classification.type': { filterType: 'set', values: [...] },
//   cost: { filterType: 'number', type: 'lessThan', filter: 50 } }
export type ItemsFilterModel = Record<string, unknown>;

export type ItemsSortModel = Array<{
  colId: string;
  sort: 'asc' | 'desc';
  sortIndex?: number; // multi-column sort priority
}>;

interface ItemsFilterSortState {
  filterModel: ItemsFilterModel;
  sortModel: ItemsSortModel;
  schemaVersion: number;
}

export const CURRENT_SCHEMA_VERSION = 1;

const initialState: ItemsFilterSortState = {
  filterModel: {},
  sortModel: [],
  schemaVersion: CURRENT_SCHEMA_VERSION,
};

const itemsFilterSortSlice = createSlice({
  name: 'itemsFilterSort',
  initialState,
  reducers: {
    setFilterModel(state, action: PayloadAction<ItemsFilterModel>) {
      state.filterModel = action.payload;
    },
    setSortModel(state, action: PayloadAction<ItemsSortModel>) {
      state.sortModel = action.payload;
    },
    clearFilters(state) {
      state.filterModel = {};
    },
    clearSort(state) {
      state.sortModel = [];
    },
    resetAll() {
      return initialState;
    },
  },
});

export const {
  setFilterModel,
  setSortModel,
  clearFilters,
  clearSort,
  resetAll,
} = itemsFilterSortSlice.actions;
export default itemsFilterSortSlice.reducer;
```

The `schemaVersion` is a hedge against column renames — a version bump in code plus a `migrate` entry in `persistConfig` resets persisted state on the next hydrate, so users don’t restore filter state that references unknown column IDs.
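How that hedge might plug into redux-persist: a minimal sketch of the `migrate` logic as a standalone function (in the app it would sit in the slice's `persistConfig`; redux-persist's real `migrate` is async, and the trimmed types here are illustrative):

```ts
// Standalone sketch of the persistConfig migrate hedge. redux-persist calls
// migrate with the rehydrated state; returning undefined makes it fall back
// to the reducer's initialState.
export const CURRENT_SCHEMA_VERSION = 1;

interface PersistedFilterSortState {
  filterModel: Record<string, unknown>;
  sortModel: Array<{ colId: string; sort: 'asc' | 'desc' }>;
  schemaVersion: number;
}

export function migrateItemsFilterSort(
  state: PersistedFilterSortState | undefined,
): PersistedFilterSortState | undefined {
  // Any version mismatch drops the persisted state, so filters referencing
  // renamed or deleted column IDs never reach the grid.
  if (state && state.schemaVersion !== CURRENT_SCHEMA_VERSION) {
    return undefined;
  }
  return state;
}
```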
5.2 Custom useAllItemsQuery hook — lean, no new library
File: src/hooks/useAllItemsQuery.ts (new)
This feature has exactly one cached endpoint. Adding React Query (or any new state library) for a single use case is overkill, introduces a second state-management mental model alongside Redux, and adds ~12KB to the bundle for features we don’t need (window-focus refetch, devtools, global query invalidation). Instead, we ship a ~90-line custom hook with a module-scoped cache.
Reads and behaves like React Query from the caller’s perspective, but the whole thing fits in one file you can understand in two minutes.
```ts
import { useCallback, useEffect, useState, useSyncExternalStore } from 'react';
import { useAppSelector } from '@/store/hooks';
import { selectTenantId } from '@/store/slices/authSlice';
import { selectActiveTab } from '@/store/slices/itemsSlice';
import { ardaClient } from '@/lib/ardaClient';
import type { GridItem } from '@/lib/mappers/itemsGridMapper';

// Module-scoped cache. Lives for the lifetime of the browser tab.
// Keyed by `${tenantId}:${tab}` so each tenant and each list tab has its own entry.
const cache = new Map<string, GridItem[]>();
// In-flight promises keyed the same way — lets two components mounting at the
// same time share a single network request instead of double-fetching.
const inflight = new Map<string, Promise<GridItem[]>>();
// Subscribers per key so useSyncExternalStore can re-render consumers on updates.
const subscribers = new Map<string, Set<() => void>>();

function cacheKey(tenantId: string, tab: string): string {
  return `${tenantId}:${tab}`;
}

function notify(key: string) {
  subscribers.get(key)?.forEach((cb) => cb());
}

function subscribe(key: string, cb: () => void) {
  let set = subscribers.get(key);
  if (!set) {
    set = new Set();
    subscribers.set(key, set);
  }
  set.add(cb);
  return () => {
    set!.delete(cb);
    if (set!.size === 0) subscribers.delete(key);
  };
}

async function fetchAndCache(key: string, tab: string): Promise<GridItem[]> {
  // Dedupe concurrent fetches for the same key.
  const existing = inflight.get(key);
  if (existing) return existing;

  const promise = ardaClient
    .fetchAllItems({ tab })
    .then((rows) => {
      cache.set(key, rows);
      inflight.delete(key);
      notify(key);
      return rows;
    })
    .catch((err) => {
      inflight.delete(key);
      throw err;
    });

  inflight.set(key, promise);
  return promise;
}

// Exported invalidation API for mutation handlers and login flow.
// Drops every cache entry for the given tenant across all tabs.
export function invalidateAllItemsCache(tenantId: string) {
  for (const key of Array.from(cache.keys())) {
    if (key.startsWith(`${tenantId}:`)) {
      cache.delete(key);
      notify(key);
    }
  }
}

export function useAllItemsQuery() {
  const tenantId = useAppSelector(selectTenantId);
  const tab = useAppSelector(selectActiveTab);
  const key = tenantId ? cacheKey(tenantId, tab) : null;

  // Subscribe to cache changes for this key so the component re-renders
  // when another mutation or another hook instance updates the cache.
  const data = useSyncExternalStore(
    useCallback((cb) => (key ? subscribe(key, cb) : () => {}), [key]),
    useCallback(() => (key ? cache.get(key) : undefined), [key]),
    () => undefined,
  );

  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<Error | null>(null);

  const refetch = useCallback(async () => {
    if (!key || !tenantId) return;
    setIsLoading(true);
    setError(null);
    try {
      await fetchAndCache(key, tab);
    } catch (err) {
      setError(err as Error);
    } finally {
      setIsLoading(false);
    }
  }, [key, tenantId, tab]);

  // Goal doc: refresh on list mount. Every time this hook mounts (new ItemsPage
  // navigation, or tab change causing a re-key), trigger a background fetch.
  // The BFF unstable_cache with a 2h TTL absorbs the cost on the server side —
  // "always refetch" is just a BFF round-trip, not a full backend loop.
  useEffect(() => {
    refetch();
  }, [refetch]);

  return { data, isLoading, error, refetch };
}
```

What this gives us:
- Mount refetch via `useEffect(refetch, [refetch])` — fires on every `ItemsPage` mount and every `tab` change, per the Goal document’s refresh policy.
- Request dedupe via the `inflight` map — two components mounting simultaneously share one network request.
- Cross-hook reactivity via `useSyncExternalStore` — if another component (or a mutation handler) updates the cache, every mounted consumer re-renders with the new data.
- Cached across component mounts within the same browser tab session — opening the details panel and closing it doesn’t re-fetch, because the module-scoped cache survives the `ItemsPage` unmount/remount cycle.
- Per-tenant isolation — each tenant gets its own cache entry.
- Tenant-wide invalidation API via `invalidateAllItemsCache(tenantId)` — called from the login flow (§5.3.3) and from mutation handlers (§5.3.4).
What we give up vs React Query:
- Window-focus refetch (not in the Goal doc’s requirements)
- Devtools (the cache is a `Map` — `console.log` works fine)
- Stale-while-revalidate UX niceties like `placeholderData` — but since the cache entry is kept while a refetch runs, the grid never sees `undefined` data on a subsequent refetch anyway
- Shared machinery across other cached endpoints — irrelevant since this is our only one
If the app later grows three or more cached endpoints that all want this behavior, revisit React Query then. For this feature, the custom hook is the right amount of code.
5.3 BFF fat-fetch endpoint and cache
5.3.1 Route
File: src/app/api/arda/items/all/route.ts (new)
```ts
import { NextRequest, NextResponse } from 'next/server';
import { processJWTForArda } from '@/lib/jwt';
import { extractErrorMessage } from '@/lib/errors';
import { generateRequestId } from '@/lib/api-route-utils';
import { getCachedAllItems } from '@/lib/cache/itemsAllCache';

export async function GET(request: NextRequest) {
  try {
    const jwtResult = await processJWTForArda(request);
    if (!jwtResult.success) {
      return NextResponse.json(
        { ok: false, error: jwtResult.error },
        { status: jwtResult.statusCode },
      );
    }

    const { userContext } = jwtResult;
    const url = new URL(request.url);
    const tab = url.searchParams.get('tab') ?? 'all';
    const requestId = generateRequestId();

    const cachedFetch = getCachedAllItems(userContext.tenantId, userContext);
    const { status, data, metadata } = await cachedFetch(tab, requestId);

    const response = NextResponse.json(data, { status });
    response.headers.set('X-Items-Count', String(metadata.count));
    response.headers.set('X-Items-Hard-Max-Hit', metadata.hardMaxHit ? 'true' : 'false');
    return response;
  } catch (error) {
    console.error('ARDA All Items Request Error:', error);
    return NextResponse.json(
      { ok: false, error: 'Upstream request failed', details: extractErrorMessage(error) },
      { status: 500 },
    );
  }
}
```

5.3.2 Cache wrapper with internal cursor loop
File: src/lib/cache/itemsAllCache.ts (new)
```ts
import { unstable_cache } from 'next/cache';
import { env } from '@/lib/env';
import type {
  ArdaQueryItemsRequest,
  ArdaQueryResponse,
  ArdaItemPayload,
} from '@/types/arda-api';
import type { UserContext } from '@/lib/jwt';
import { stripToGridColumns, type GridItem } from '@/lib/mappers/itemsGridMapper';

// Background TTL ceiling — matches the Goal document's "at minimum once every
// 2 hours" refresh policy. Explicit refresh triggers (write mutations, user login,
// list mount) cover freshness during an active session; this TTL is the safety net
// for sessions that stay open without interaction.
export const ITEMS_ALL_TTL_SECONDS = 2 * 60 * 60; // 7200s = 2h
export const BACKEND_PAGE_SIZE = 500; // backend page size for the internal loop

// HARD_MAX is bound by the CloudFront 1MB response limit on Amplify Hosting,
// not by AG Grid browser performance (which tolerates ~20k). Math: ~0.5KB per
// stripped item × 5x gzip compression ≈ 100 bytes per item on the wire. At 1MB
// total, that's ~10k items theoretical, ~8k with safety margin.
// Prototype (§8) validates the real number against a live Amplify deployment
// before sprint 1 starts — this constant may move up or down based on what
// actual compression ratio we get for realistic item data.
export const HARD_MAX_ITEMS = 8_000; // phase 2 trigger (CloudFront-bound)

export function itemsCacheTag(tenantId: string): string {
  return `items:${tenantId}`;
}

async function fetchPage(
  body: ArdaQueryItemsRequest,
  userContext: UserContext,
  requestId: string,
): Promise<ArdaQueryResponse> {
  const upstream = await fetch(`${env.BASE_URL}/v1/item/item/query`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${env.ARDA_API_KEY}`,
      'X-Request-ID': requestId,
      'X-Author': userContext.author,
      'X-Tenant-Id': userContext.tenantId,
      'X-oidc-subject': userContext.userId,
    },
    body: JSON.stringify(body),
    cache: 'no-store',
  });
  if (!upstream.ok) {
    throw new Error(`Upstream page fetch failed: ${upstream.status}`);
  }
  return upstream.json() as Promise<ArdaQueryResponse>;
}

function buildTabFilter(tab: string): ArdaQueryItemsRequest['filter'] {
  // Tab → filter mapping. Extend this switch as new tabs are added.
  switch (tab) {
    case 'active':
      return { and: [{ or: [{ locator: 'classification.status', eq: 'active' }] }] };
    case 'archived':
      return { and: [{ or: [{ locator: 'classification.status', eq: 'archived' }] }] };
    case 'all':
    default:
      return true;
  }
}

async function fetchAllItemsUncached(
  tab: string,
  userContext: UserContext,
  requestId: string,
): Promise<{
  status: number;
  data: GridItem[];
  metadata: { count: number; hardMaxHit: boolean };
}> {
  const all: ArdaItemPayload[] = [];
  let pageToken: string | undefined;
  let pageIndex = 0;

  while (true) {
    const page = await fetchPage(
      {
        filter: buildTabFilter(tab),
        paginate: { index: pageIndex, size: BACKEND_PAGE_SIZE },
        ...(pageToken ? { pageToken } : {}),
      },
      userContext,
      requestId,
    );

    all.push(...page.results.map((r) => r.payload));

    if (!page.nextPage || all.length >= HARD_MAX_ITEMS) break;
    pageToken = page.nextPage;
    pageIndex += 1;
  }

  const hardMaxHit = all.length >= HARD_MAX_ITEMS;
  if (hardMaxHit) {
    console.warn(
      `[itemsAllCache] HARD_MAX_ITEMS (${HARD_MAX_ITEMS}) reached for tenant ${userContext.tenantId} tab ${tab}. ` +
        'This is the phase 2 trigger — escalate to Server-Side Row Model.',
    );
  }

  const gridItems = all.map(stripToGridColumns);
  return {
    status: 200,
    data: gridItems,
    metadata: { count: gridItems.length, hardMaxHit },
  };
}

// Tenant-scoped. Key is explicit: (tenantId, tab). userContext is closed over but
// NOT part of the key — item data is tenant-scoped, so users from the same tenant
// share cache entries. If this endpoint ever gains user-specific personalization,
// add userId to the key array.
export function getCachedAllItems(tenantId: string, userContext: UserContext) {
  return async (tab: string, requestId: string) => {
    const cached = unstable_cache(
      async () => fetchAllItemsUncached(tab, userContext, requestId),
      ['items-all', tenantId, tab],
      {
        tags: [itemsCacheTag(tenantId)],
        revalidate: ITEMS_ALL_TTL_SECONDS,
      },
    );
    return cached();
  };
}
```

Three critical design choices:
- The loop runs inside `unstable_cache`. The cache stores the result of the entire loop, not individual page responses. On a cache hit the loop is skipped entirely — the cached full list is returned in one shot.
- `stripToGridColumns` is the CloudFront response size mitigation. The grid renders ~15-20 fields out of the full item schema; everything else is omitted. The details panel still loads the full item payload on demand via the existing single-item `GET` endpoint. See §6.1 for the 1MB limit that forces this.
- The response is explicitly compressed. The API route must set `Content-Encoding: gzip` on the response — either via Next.js’s default `compress: true` setting (verify it applies on Amplify’s SSR runtime) or by manually gzipping the body with `zlib.gzipSync()`. Without compression the 1MB CloudFront limit binds at ~2000 items instead of ~8000. See §6.1.
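If the built-in compression turns out not to apply on the Amplify SSR runtime, the manual fallback is plain Node `zlib`. A sketch — the helper name and response-assembly shape are illustrative, not the app's code:

```ts
import { gzipSync } from 'node:zlib';

// Illustrative helper: gzip a JSON payload and produce the headers the route
// would attach. In the route this would feed `new NextResponse(body, { headers })`.
export function gzipJsonBody(payload: unknown): {
  body: Buffer;
  headers: Record<string, string>;
} {
  const json = JSON.stringify(payload);
  const body = gzipSync(Buffer.from(json, 'utf8'));
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'Content-Encoding': 'gzip',
      'Content-Length': String(body.byteLength),
    },
  };
}
```

Repetitive JSON (thousands of rows sharing the same keys) is exactly the payload shape gzip compresses best, which is what the ~5x ratio assumed by `HARD_MAX_ITEMS` relies on.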
Note on `unstable_cache` deprecation: Next.js 16 marks `unstable_cache` as deprecated in favor of the new `use cache` directive (introduced stable in Next.js 16.0 as part of Cache Components). The deprecation is documentation-only — `unstable_cache` still works and will remain functional for the foreseeable future. We’re using it for sprint 1 because:
- Migrating to `use cache` requires enabling the `cacheComponents: true` flag in `next.config.ts`, which affects the entire app’s caching behavior, not just this feature. That’s out of scope for a filter/sort sprint.
- The serverless runtime story is the same either way. Both `unstable_cache` and `use cache` default to per-instance in-memory caching; neither gives cluster-wide persistence on Lambda without a custom `cacheHandlers`/`use cache: remote` configuration. Switching the directive name doesn’t change the Amplify Hosting cache-hit-rate characteristics (see §6.2).
- A follow-up task can migrate `unstable_cache` → `use cache` + `cacheTag` + `cacheLife` once `cacheComponents` is enabled app-wide. The migration is a mechanical rename for our use case.
5.3.3 Four cache refresh triggers
All four Goal document refresh triggers are implemented:
| Trigger | Where | How | Client-side effect |
|---|---|---|---|
| Item mutation (create/update/delete/publish) | BFF write routes + client mutation hook | revalidateTag(itemsCacheTag(tenantId)) on BFF; client mutation hook calls patchItemInCache / addItemToCache / removeItemFromCache | Surgical patch — the single affected row is updated in place; grid re-renders only that row. No refetch, no loading state. See §5.3.4 |
| User login | Sign-in flow | invalidateAllItemsCache(tenantId) on the client + a server action calling revalidateTag | Full drop. Next read triggers a fresh fat fetch |
| List mount | Items page | useAllItemsQuery’s mount effect runs every time ItemsPage mounts or the active tab changes | Background refetch. Existing cached rows stay visible during the fetch so there’s no flash |
| 2-hour ceiling | BFF unstable_cache | ITEMS_ALL_TTL_SECONDS = 7200 background TTL | Next request after expiry re-loops the backend |
Write-path invalidation — one line per route, applied in items/route.ts POST, items/[entityId]/route.ts PUT/DELETE, and items/[entityId]/draft/route.ts on publish:
```ts
import { revalidateTag } from 'next/cache';
import { itemsCacheTag } from '@/lib/cache/itemsAllCache';

if (upstream.ok) {
  revalidateTag(itemsCacheTag(userContext.tenantId));
}
```

Tenant-wide scope. Any write invalidates the full tenant dataset; the next read re-loops once. Simple, correct, cheap.
Login-path invalidation — after authThunks.signIn succeeds:
- Call a new server action `revalidateItemsCache(tenantId)` that wraps `revalidateTag(itemsCacheTag(tenantId))` on the BFF side.
- Call `invalidateAllItemsCache(tenantId)` on the client to drop every cached tab entry for that tenant.
Rationale: a user logging back in after hours expects fresh data regardless of whether a warm Lambda still has a snapshot.
List-mount refresh — useAllItemsQuery’s mount effect fires on every navigation to /items and on every active-tab change (because the grid is keyed on activeTab, forcing a remount per tab — see §5.4.3). If the BFF tag is fresh the BFF returns a cache hit in one shot. If stale, the BFF loops the backend once. This is the important compromise between the Goal doc’s “refresh on list mount” and Lambda per-instance cache reality: we guarantee the BFF is consulted on every mount, and the BFF honors staleness correctly.
5.3.4 Client-side mutation integration — surgical cache updates
Every item mutation (create / update / delete / publish draft) touches exactly one item. Full tenant-wide invalidation on every edit would trigger a 5-15 second refetch of 5,000 items every time a user saves a single field — that's bad UX and wasted backend load. Mutations patch the cached array in place instead.
This is the default behavior, not an optional optimization. The module-scoped cache in §5.2 already has everything needed; we just export three small helpers and call them from mutation handlers.
Exported from useAllItemsQuery.ts:
```ts
/**
 * Surgically update one item in every cached tab entry for a tenant.
 * Called from mutation handlers after the backend confirms a write.
 */
export function patchItemInCache(
  tenantId: string,
  entityId: string,
  patch: Partial<GridItem>,
): void {
  for (const key of Array.from(cache.keys())) {
    if (!key.startsWith(`${tenantId}:`)) continue;
    const rows = cache.get(key);
    if (!rows) continue;
    const idx = rows.findIndex((r) => r.entityId === entityId);
    if (idx === -1) continue;
    const updated = [...rows];
    updated[idx] = { ...updated[idx], ...patch };
    cache.set(key, updated);
    notify(key);
  }
}

/** Surgically remove one item from every cached tab entry. */
export function removeItemFromCache(tenantId: string, entityId: string): void {
  for (const key of Array.from(cache.keys())) {
    if (!key.startsWith(`${tenantId}:`)) continue;
    const rows = cache.get(key);
    if (!rows) continue;
    const filtered = rows.filter((r) => r.entityId !== entityId);
    if (filtered.length !== rows.length) {
      cache.set(key, filtered);
      notify(key);
    }
  }
}

/**
 * Append a newly-created item to a specific tab's cache entry. No-op if
 * that tab hasn't been visited yet — the next mount will fetch it fresh.
 */
export function addItemToCache(
  tenantId: string,
  tab: string,
  item: GridItem,
): void {
  const key = `${tenantId}:${tab}`;
  const rows = cache.get(key);
  if (!rows) return;
  cache.set(key, [...rows, item]);
  notify(key);
}
```

Mutation hooks:
```ts
// src/hooks/useItemMutations.ts (new)
import { useAppSelector } from '@/store/hooks';
import { selectTenantId } from '@/store/slices/authSlice';
import {
  patchItemInCache,
  removeItemFromCache,
  addItemToCache,
} from '@/hooks/useAllItemsQuery';
import { ardaClient } from '@/lib/ardaClient';
import { stripToGridColumns } from '@/lib/mappers/itemsGridMapper';

export function useUpdateItem() {
  const tenantId = useAppSelector(selectTenantId);
  return async (entityId: string, patch: ItemPatch) => {
    const updated = await ardaClient.updateItem(entityId, patch);
    if (tenantId) {
      patchItemInCache(tenantId, entityId, stripToGridColumns(updated));
    }
    return updated;
  };
}

export function useCreateItem() {
  const tenantId = useAppSelector(selectTenantId);
  return async (input: CreateItemInput, tab: string) => {
    const created = await ardaClient.createItem(input);
    if (tenantId) {
      addItemToCache(tenantId, tab, stripToGridColumns(created));
    }
    return created;
  };
}

export function useDeleteItem() {
  const tenantId = useAppSelector(selectTenantId);
  return async (entityId: string) => {
    await ardaClient.deleteItem(entityId);
    if (tenantId) {
      removeItemFromCache(tenantId, entityId);
    }
  };
}
```

The BFF `revalidateTag` call still happens on the server side (see §5.3.3 write-path invalidation). This is defense-in-depth, not redundancy:
- The client-side surgical patch gives instant UX — user sees their edit reflected without a loading state.
- The BFF `revalidateTag` ensures server-side cache consistency for subsequent reads by a different Lambda instance, a fresh browser tab that doesn't have the module-scoped cache populated yet, or another user in the same tenant.
Both are needed. Without the client-side patch, users see 5-15 seconds of loading on every edit. Without the BFF `revalidateTag`, stale BFF cache entries could serve old data to other browser tabs or other users in the same tenant for up to 2 hours.
Error handling: the code above is pessimistic — it only patches the cache after the backend confirms success. If the save fails, the cache stays consistent and the user sees an error with their unsaved change still in the editor. This is the safer default for sprint 1.
A more responsive optimistic variant (patch the cache immediately, roll back on failure) is a sprint 2 candidate — it’s a small addition on top of the infrastructure above but adds rollback complexity that isn’t worth it until we’ve measured whether pessimistic updates feel fast enough.
When full-tenant invalidation IS still the right call: login, list mount (via useAllItemsQuery’s mount effect), and the 2-hour TTL ceiling. Those events want a fresh view of the whole dataset, not surgical patches. Those paths use invalidateAllItemsCache(tenantId) (also exported from useAllItemsQuery.ts) to drop every cache entry for the tenant. See §5.3.3 for the four refresh triggers and which path each one takes.
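The full-invalidation helper itself isn't shown in the code blocks above. A minimal sketch of what `invalidateAllItemsCache` could look like against the same module-scoped cache, assuming the `cache` Map and `notify` subscriber mechanism from §5.3.4 (the simplified `GridItem` shape and the `listeners` map here are illustrative):

```typescript
// Illustrative sketch — assumes the module-scoped cache and notify()
// subscriber mechanism from useAllItemsQuery.ts (§5.3.4).
type GridItem = { entityId: string; name: string }; // simplified for the sketch

const cache = new Map<string, GridItem[]>();          // key: `${tenantId}:${tab}`
const listeners = new Map<string, Set<() => void>>(); // per-key subscribers

function notify(key: string): void {
  for (const fn of listeners.get(key) ?? []) fn();
}

/** Drop every cached tab entry for a tenant. Next read triggers a fresh fat fetch. */
export function invalidateAllItemsCache(tenantId: string): void {
  for (const key of Array.from(cache.keys())) {
    if (!key.startsWith(`${tenantId}:`)) continue;
    cache.delete(key);
    notify(key); // subscribers re-read, see a miss, and refetch
  }
}
```

Seeding two tabs for one tenant and one for another, then invalidating the first, should leave only the second tenant's entry in place.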
5.3.5 TTL reasoning
The numbers come from the Goal document, not arbitrary tuning:
- BFF TTL: 2 hours — Goal doc’s stated minimum refresh ceiling. Explicit triggers (mutation, login, list mount) cover active-session staleness; this TTL is the safety net for idle sessions. Should be made configurable via env var per the Goal doc’s “should be configurable” qualifier — defer env plumbing to a follow-up unless sprint capacity allows.
- Client cache: browser-tab lifetime, no TTL — the module-scoped `Map` in `useAllItemsQuery` persists for as long as the browser tab is open. It's dropped by the explicit refresh triggers (mutation, login, list mount) and rebuilt on next use. No client-side TTL is needed because the BFF TTL is the source of truth for background freshness.
- Brief-navigation optimization — because the cache is module-scoped (not component state), opening the details panel and closing it, or navigating to `/order-queue` and back, returns instantly with zero network activity. The cache entry is only invalidated by a mutation or an explicit `invalidateAllItemsCache` call.
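The deferred env plumbing is small. A sketch, assuming the env var is named after the constant (the `resolveTtlSeconds` helper is hypothetical):

```typescript
// Hypothetical env plumbing for the BFF TTL — the constant stays the default.
const DEFAULT_ITEMS_ALL_TTL_SECONDS = 7200; // the Goal doc's 2-hour ceiling

function resolveTtlSeconds(raw: string | undefined): number {
  const parsed = Number(raw);
  // Fall back to the default on missing, non-numeric, or non-positive values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_ITEMS_ALL_TTL_SECONDS;
}

export const ITEMS_ALL_TTL_SECONDS = resolveTtlSeconds(process.env.ITEMS_ALL_TTL_SECONDS);
```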
5.3.6 Metrics for phase 1
Emit response headers on every fat-fetch request:
- `X-Cache-Source: hit|miss` (from a lightweight wrapper around the cached function call)
- `X-Items-Count: <n>`
- `X-Items-Hard-Max-Hit: true|false`
Log (not header): upstream loop duration, page count when uncached, total response body size (informs the 1MB CloudFront/Amplify limit watch).
Without these, phase 2 and phase 3 cannot be justified or ruled out on evidence.
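One way to derive `X-Cache-Source`: since the cached call doesn't report hit/miss directly, a thin wrapper can infer it from whether the loader actually ran. A runnable sketch with a plain `Map` standing in for the cached layer (the real BFF version would wrap the async `unstable_cache` call; all names here are illustrative):

```typescript
// Illustrative hit/miss probe for the X-Cache-Source header. A Map stands in
// for the cached layer so the mechanics are runnable standalone; in the BFF
// the cached call is unstable_cache and this wrapper would be async.
const fakeCache = new Map<string, string[]>();

function loadWithSource(
  key: string,
  loader: () => string[],
): { rows: string[]; source: 'hit' | 'miss' } {
  const cached = fakeCache.get(key);
  if (cached) return { rows: cached, source: 'hit' };
  const rows = loader(); // the upstream cursor loop in the real route
  fakeCache.set(key, rows);
  return { rows, source: 'miss' };
}

// In the route handler, the result feeds the response headers, e.g.:
//   headers.set('X-Cache-Source', source);
//   headers.set('X-Items-Count', String(rows.length));
```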
5.3.7 Why not a shared Redis cache today
- Requires new infrastructure (VPC, secrets, cost, ops).
- No hit-rate data yet to justify the investment.
- 2-hour TTL plus tenant-wide `revalidateTag` on writes keeps staleness tolerable.
Phase 3 trigger: per-instance hit rate below ~30% after one production week, or product reports of cross-session staleness that TTL tuning cannot fix.
5.4 AG Grid wiring
5.4.1 Column definitions — filter and sort live on EACH column
File: src/components/table/columnPresets.tsx
Every filter and sort control in this feature is attached to an individual column header, not to a global toolbar. AG Grid’s column menu exposes:
- A sort toggle (asc / desc / none) per column, with Shift-click adding secondary columns for multi-column sort.
- A filter icon per column that opens a filter popover. The popover’s contents depend on the column’s filter type (text input, checkbox list, number range, date range, etc.).
Turning this on is literally two lines:
```ts
export const itemsDefaultColDef: ColDef<items.Item> = {
  sortable: true,        // per-column sort toggle in the header
  filter: true,          // was false — per-column filter icon in the header
  floatingFilter: false, // off for sprint 1; can be added later as an always-visible row under headers
  resizable: true,
  suppressMovable: false,
  sortingOrder: ['asc', 'desc', null],
};
```

Then each column chooses its filter type based on the data it displays:
| Column data shape | AG Grid filter | Operators the user gets | Example columns |
|---|---|---|---|
| Free text | agTextColumnFilter | contains, equals, starts with, ends with, not contains, not equal | name, internalSKU, description |
| Enum / fixed set | agSetColumnFilter (Enterprise) | checkbox list of every unique value in the column with “select all” | classification.type, classification.subtype, status |
| Number | agNumberColumnFilter | equals, not equal, greater than, less than, between, blank | quantity, cost, sellPrice |
| Date | agDateColumnFilter | equals, not equal, before, after, between | createdAt, updatedAt |
| Boolean | agSetColumnFilter with true/false values | two-checkbox list | isActive, isArchived |
Select, action, and computed columns keep filter: false, sortable: false — they have nothing meaningful to filter or sort on.
All operators for all filter types work out of the box, because AG Grid runs against the full in-memory dataset. There is no “supported vs unsupported” matrix — every operator the popover exposes is functional. When a user sets “cost < 50” on one column and “supplier is Acme” on another and sorts by “updatedAt desc”, AG Grid combines all three with AND + the multi-sort priority and updates the visible rows instantly. None of that combination logic is code we write — it’s what we get from AG Grid’s filter/sort model.
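Applied to the mapping above, the per-column wiring is plain data. A sketch of a few column definitions under that table (fields follow §5.5's `GridItem`; the loose `ColDefSketch` type and the exact column list are illustrative, not the final `columnPresets.tsx`):

```typescript
// Illustrative colDefs following the data-shape → filter-type table above.
// ColDefSketch is a loose stand-in for AG Grid's ColDef so the sketch is standalone.
type ColDefSketch = { field: string; filter: string | boolean; sortable?: boolean };

export const exampleColumnDefs: ColDefSketch[] = [
  { field: 'name',      filter: 'agTextColumnFilter' },   // free text
  { field: 'type',      filter: 'agSetColumnFilter' },    // enum / fixed set (Enterprise)
  { field: 'cost',      filter: 'agNumberColumnFilter' }, // number range
  { field: 'updatedAt', filter: 'agDateColumnFilter' },   // date range
  { field: 'isActive',  filter: 'agSetColumnFilter' },    // boolean as a two-value set
  { field: 'actions',   filter: false, sortable: false }, // action column: nothing to filter or sort
];
```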
5.4.2 Grid component props
File: src/app/items/ItemTableAGGrid.tsx
```tsx
<ArdaGrid
  ref={gridRef}
  rowData={rows}                        // from useAllItemsQuery() — full published items array
  columnDefs={columnDefs}
  defaultColDef={itemsDefaultColDef}    // filter: true + sortable: true per column (§5.4.1)
  rowModelType="clientSide"             // explicit — the whole architecture depends on it
  enableFiltering                       // turns on the per-column filter menu in the header
  enableMultiSort                       // lets the user hold Shift to add secondary sort columns
  pagination                            // AG Grid built-in client-side pagination
  paginationPageSize={50}
  paginationPageSizeSelector={[25, 50, 100, 200]}
  onFilterChanged={handleFilterChanged} // writes filterModel to Redux
  onSortChanged={handleSortChanged}     // writes sortModel to Redux
  quickFilterText={searchValue}         // global search across all rendered columns
  initialState={initialGridState}       // restores filter + sort from Redux
  loading={isLoading}
  overlayLoadingTemplate={LOADING_TEMPLATE}
  // ... existing props (selection, cell editing, notes handlers, etc.) ...
/>
```

Note: `rows` is the full array of published items from `useAllItemsQuery`, not a page. AG Grid's built-in pagination handles displaying it 50 at a time. Each column in `columnDefs` gets its own filter icon, filter popover, and sort controls — nothing about "per-column" is custom code, it's the default AG Grid behavior once `filter: true` and `sortable: true` are set.
Important clarification on prop names: enableFiltering and enableMultiSort are props on the project’s ArdaGrid wrapper, not on the raw AgGridReact component. AG Grid itself uses different mechanisms:
- Filtering is enabled per-column via `filter: true` in the column definition (§5.4.1) — there is no top-level `enableFiltering` prop in AG Grid.
- Multi-column sort is enabled by default via the `multiSortKey` grid option (defaults to `'shift'`, meaning users hold Shift and click column headers to add secondary sort columns). If we want multi-sort without any modifier key, we'd set `alwaysMultiSort: true`. To disable it entirely, `suppressMultiSort: true`. There is no `enableMultiSort` prop in AG Grid.
Task 8 of the sprint plan (§7.1) includes wiring these wrapper props through to the correct AG Grid options so they actually do what the prop names imply. Don’t grep AG Grid’s docs for enableMultiSort — you won’t find it.
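That wrapper wiring amounts to translating the `ArdaGrid` prop names into real AG Grid options. A hedged sketch of what task 8 could produce (the `toAgGridOptions` helper and its shape are assumptions, not the shipped wrapper):

```typescript
// Illustrative translation of ArdaGrid wrapper props into AG Grid grid options.
interface ArdaGridFilterSortProps {
  enableFiltering?: boolean;
  enableMultiSort?: boolean;
}

function toAgGridOptions(props: ArdaGridFilterSortProps) {
  return {
    // There is no top-level enableFiltering in AG Grid: filtering is per-column,
    // so the wrapper folds the flag into defaultColDef instead.
    defaultColDef: { filter: props.enableFiltering === true },
    // Shift-click multi-sort is AG Grid's default behavior; the wrapper prop
    // only needs to suppress it when the flag is off.
    suppressMultiSort: props.enableMultiSort !== true,
  };
}
```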
5.4.3 Handlers and state restoration
```ts
import {
  setFilterModel,
  setSortModel,
} from '@/store/slices/itemsFilterSortSlice';

const dispatch = useAppDispatch();

const handleFilterChanged = useCallback((event: FilterChangedEvent) => {
  dispatch(setFilterModel(event.api.getFilterModel()));
}, [dispatch]);

const handleSortChanged = useCallback((event: SortChangedEvent) => {
  const sortModel = event.api.getColumnState()
    .filter(c => c.sort != null)
    .map(c => ({
      colId: c.colId,
      sort: c.sort as 'asc' | 'desc',
      sortIndex: c.sortIndex ?? undefined,
    }));
  dispatch(setSortModel(sortModel));
}, [dispatch]);

const persistedFilter = useAppSelector(s => s.itemsFilterSort.filterModel);
const persistedSort = useAppSelector(s => s.itemsFilterSort.sortModel);

const initialGridState: GridState = useMemo(() => ({
  filter: { filterModel: persistedFilter },
  sort: { sortModel: persistedSort },
}), []); // mount-only
```

Neither handler triggers a refetch — the whole point of the architecture. Filter and sort events update Redux for persistence; AG Grid handles everything else in memory. The empty dep array on `initialGridState` is deliberate: after mount AG Grid owns the live state, and Redux is updated via the change handlers.
Note on initialState: AG Grid’s initialState prop is only applied on mount (confirmed by AG Grid v34 docs: “It is only read once when the grid is created”). If other tabs (Draft, Recently Uploaded) become active in the future and need separate filter/sort state, the grid should be re-keyed on the active tab (key={activeTab}) to force a fresh mount per tab. For now with only the Published tab active, this isn’t needed.
5.4.4 Search box migration
The existing search box in `src/app/items/page.tsx` builds `ArdaQueryItemsRequest` filters with regex clauses on every keystroke. With the full dataset in memory, we replace this with AG Grid's `quickFilter`:
```tsx
const [searchValue, setSearchValue] = useState('');
// ...
<SearchInput value={searchValue} onChange={setSearchValue} />
// ItemTableAGGrid receives searchValue as the quickFilterText prop.
```

Instant, covers every rendered column, simpler code. `items-search.spec.ts` becomes a regression test against the new implementation (§7.4).
5.5 GridItem type and column stripper
File: src/lib/mappers/itemsGridMapper.ts (new)
```ts
import type { Item } from '@/types/items';

export interface GridItem {
  entityId: string;
  name: string;
  internalSKU: string;
  status: string;
  type: string;
  subtype: string;
  quantity: number;
  cost: number;
  sellPrice: number;
  supplier: string;
  location: string;
  createdAt: string;
  updatedAt: string;
  isActive: boolean;
  isArchived: boolean;
  // Deliberately narrow. No long descriptions, no extended attributes,
  // no file attachments. Full item payload is loaded on-demand by the
  // details panel via the existing single-item GET endpoint.
}

export function stripToGridColumns(item: Item): GridItem {
  return {
    entityId: item.entityId,
    name: item.name,
    internalSKU: item.internalSKU,
    status: item.classification?.status ?? '',
    type: item.classification?.type ?? '',
    subtype: item.classification?.subtype ?? '',
    quantity: item.quantity ?? 0,
    cost: item.cost ?? 0,
    sellPrice: item.sellPrice ?? 0,
    supplier: item.supplier?.name ?? '',
    location: item.location?.name ?? '',
    createdAt: item.createdAt,
    updatedAt: item.updatedAt,
    isActive: item.isActive ?? true,
    isArchived: item.isArchived ?? false,
  };
}
```

The exact field list will be refined against `columnPresets.tsx` during implementation. The goal is "exactly what the grid renders, nothing more."
6. Amplify Hosting considerations
6.1 CloudFront 1MB response size — the principal risk
Amplify Hosting routes all SSR/API-route responses through CloudFront, which enforces a ~1MB limit on the combined response body and headers. This is confirmed by a real production bug report (aws-amplify/amplify-hosting#3214) where a Next.js SSR app hit: "The Lambda function returned an invalid response, the length of the body and header was 1.5mb bytes which exceeds the cloudfront limit of 1mb bytes."
This is meaningfully tighter than a raw Lambda function limit — it applies to every Amplify Hosting SSR response, including our fat-fetch endpoint. This is the hardest technical constraint on the architecture.
Sizing math for 5k items:
- Full `Item` JSON: ~1-3KB per row → 5k items ≈ 5-15MB raw → blows the limit by 5-15x uncompressed.
- With `stripToGridColumns` (~0.5KB per grid row): 5k × 0.5KB ≈ 2.5MB raw → still over the limit uncompressed.
- With stripping + `Content-Encoding: gzip` on the response: ~500KB gzipped → fits under 1MB with headroom, assuming CloudFront measures post-compression.
The compression piece is load-bearing. The architecture only works if:
- The API route response is compressed (`Content-Encoding: gzip`) before it reaches CloudFront, AND
- CloudFront measures the post-compression size for its 1MB check.
Next.js has compress: true in next.config.ts by default, but the Amplify Hosting Compute runtime’s behavior with that setting is not guaranteed to “just work” — it depends on how the runtime intermediates Lambda responses to CloudFront. This is the single thing the prototype must validate against a real Amplify dev deployment before sprint 1 starts.
Mitigations built into the design:
- `stripToGridColumns` returns only grid-relevant fields. Target: ≤0.5KB per item.
- If default Next.js compression doesn't apply on Amplify, explicitly gzip the response body in the API route using `zlib` and set `Content-Encoding: gzip` headers. Gives ~3-4x compression on JSON. Works regardless of the runtime's default behavior.
- `HARD_MAX_ITEMS = 8_000` cap in the BFF as a primary safety net — 8k is the estimated ceiling under the 1MB CloudFront limit assuming ~0.5KB per stripped item and ~5x gzip compression. The prototype will validate the real number on a live Amplify deployment.
- Prototype (§8) MUST validate actual response size on the dev Amplify environment, not just local. Local math doesn't tell us whether CloudFront will accept the response.
6.1.1 Escalation plan if the 1MB limit is hit in production
In order of increasing complexity:
1. More aggressive stripping — cut `GridItem` to <200 bytes per row (entityId + name + a few IDs). Richer columns move to lazy-load on row hover/expand. Feasible if product accepts reduced grid density. Buys us headroom to ~5k-8k items.
2. Manual compression in the API route — explicitly gzip the response body with `zlib` and set `Content-Encoding: gzip` headers. Works regardless of whether the runtime compresses by default. Should land as part of task 3 in the sprint so we don't depend on implicit behavior.
3. Streaming responses — Amplify Hosting Compute supports Next.js streaming responses (`ReadableStream` / `IterableReadableStream`), which have documented exemptions from the CloudFront body-size check. Moderate implementation cost: `useAllItemsQuery` needs to consume a stream progressively.
4. Migrate to AG Grid Server-Side Row Model against the BFF cache — instead of sending the whole dataset, respond to SSRM datasource requests with one page at a time. This is effectively phase 2 pulled forward without the backend dependency, but it's a significant architectural shift and loses most of the client-side-everything benefits.
Recommendation: land mitigation #2 (manual compression) as part of the initial BFF route implementation in task 3, not as a later fix. Treat it as default architecture, not contingency. Reserve #3 and #4 as escape hatches if the prototype shows #1 + #2 still exceeds the limit.
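Mitigation #2 is small enough to sketch. A hedged version of the manual-compression step using Node's built-in `zlib` (the `gzipJsonBody` helper is illustrative; the real code would live in the fat-fetch route):

```typescript
import { gzipSync } from 'node:zlib';

// Illustrative manual compression for the fat-fetch route body.
// Returns the gzipped bytes plus the headers the response would carry.
function gzipJsonBody(payload: unknown): { body: Buffer; headers: Record<string, string> } {
  const raw = Buffer.from(JSON.stringify(payload), 'utf8');
  const body = gzipSync(raw);
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'Content-Encoding': 'gzip',
      'Content-Length': String(body.byteLength),
    },
  };
}

// In a Next.js route handler this would become something like:
//   const { body, headers } = gzipJsonBody(rows);
//   return new Response(body, { headers });
```

Repetitive JSON (the same keys on every row) is exactly where gzip earns its multi-x ratio, which is why the sizing math in §6.1 leans on it.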
HARD_MAX_ITEMS = 8_000 reflects the CloudFront constraint. The cap is lower than AG Grid’s browser performance ceiling (~20k) because the response-size limit binds first. Updated in the §5.3.2 code block.
6.2 Per-instance cache on Lambda
`unstable_cache` defaults to a filesystem cache handler, which on Lambda is ephemeral `/tmp`. Cache hits only span warm-invocation reuse on the same Lambda instance. For this feature that's acceptable: a 2-hour TTL plus one query per tenant means the hit rate should be reasonable. Measure and escalate to phase 3 only if production metrics demand it.
6.3 Inline build spec gotcha
Three Amplify apps with inline build specs override the committed `amplify.yml`. Any pipeline change must be applied per-app via `aws amplify update-app` or it silently does nothing. Not a feature concern — just a repeat reminder for anyone touching CI as part of this feature.
6.4 Unit tests in the build pipeline
New unit tests must be reliable or they block deployments. All new tests in this design target pure functions or mocked grid APIs — no timing assertions.
7. Sprint scope
7.1 Task breakdown
Estimates: S ≈ 0.5 day, M ≈ 1-2 days, L ≈ 3-5 days. Round up when in doubt — filter/sort work in a mature grid is historically underestimated.
| # | Task | Size | Files |
|---|---|---|---|
| 1 | itemsFilterSortSlice + persist config + schema version migration | S | itemsFilterSortSlice.ts (new), rootReducer.ts |
| 2 | GridItem type + stripToGridColumns mapper | S | itemsGridMapper.ts (new), types/items.ts |
| 3 | Fat-fetch BFF route with cursor loop, HARD_MAX guard, response headers | L | app/api/arda/items/all/route.ts (new), lib/cache/itemsAllCache.ts (new) |
| 4 | revalidateTag calls in write routes | S | items/route.ts, items/[entityId]/route.ts, draft route |
| 5 | fetchAllItems in ardaClient | S | ardaClient.ts |
| 6 | Custom useAllItemsQuery hook (module-scoped cache, dedupe, useSyncExternalStore) + invalidateAllItemsCache + mutation helpers | S | useAllItemsQuery.ts (new), useItemMutations.ts (new) |
| 7 | Enable filter: true + per-column filter types on every applicable column | M | columnPresets.tsx |
| 8 | Wire onFilterChanged / onSortChanged handlers + initialState restoration | M | ItemTableAGGrid.tsx |
| 9 | Items page migration from cursor fetching to useAllItemsQuery; remove cursor pagination UI; search box → quickFilter | L | items/page.tsx, itemsSlice.ts (remove pagination fields) |
| 10 | MSW handler for /api/arda/items/all | M | src/mocks/handlers/, src/mocks/data/ |
| 11 | Unit tests: slice + migration, cursor loop, HARD_MAX guard, column stripper, useAllItemsQuery dedupe + invalidation | M | *.test.ts files |
| 12 | Unit tests: grid handlers dispatch + quickFilter propagation | S | ItemTableAGGrid.test.tsx |
| 13 | E2E: new items-filter.spec.ts covering all filter types + persistence across reloads | L | new file |
| 14 | E2E: extend items-grid-interactions.spec.ts for multi-sort + client-side pagination | S | existing |
| 15 | E2E: update items-search.spec.ts as a regression test against client-side quickFilter | S | existing |
| 16 | Post-implementation update to this proposal | S | this file |
Rough total: 3 L + 4 M + 9 S (16 tasks). Shrunk vs the original plan because task 6 is now a self-contained custom hook instead of a new dependency + provider setup. Realistic for one sprint with one engineer plus buffer, or two engineers parallelizing BFF/API work against grid/UI work.
7.2 Dependencies
- None blocking.
- Soft risk: unknown backend cap on `paginate.size`. The internal cursor loop handles either case — if the backend returns everything in one page, the loop is a harmless one-iteration pass.
- Hard risk over time: tenant item counts growing past ~8k (CloudFront-bound). Mitigated by the HARD_MAX metric and the phase 2 escape hatch.
- DX nicety: add `NEXT_PUBLIC_AG_GRID_LICENSE_KEY` to `.env.example` with a pointer to a shared dev key so local developers don't see a watermark. Optional.
7.3 Risks and mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lambda response size exceeds limit with real data | Medium | Feature cannot ship | stripToGridColumns; prototype validates payload size first |
| Tenant has >8k items | Low-Medium | HARD_MAX triggers, grid shows incomplete data | HARD_MAX metric + documented phase 2 path. Prototype validates the real 8k threshold against live Amplify. |
| Initial fat fetch feels slow to users | Medium | UX regression from perceived latency | Aggressive BFF + client caching, loading skeleton, cached data shown during background refetch |
| Persisted filter model references renamed column IDs | Low | Restored state breaks grid | schemaVersion bump resets state on rehydrate |
| AG Grid quickFilter differs from old regex search | Low | Results shift slightly | Regression spec covers expected matches; document in PR |
| Backend paginate.size cap forces many loop iterations | Low | Slow fat-fetch on first miss | Metrics; tune BACKEND_PAGE_SIZE if allowed |
| Per-instance BFF hit rate is poor in production | Medium | Feature works but backend load higher than hoped | Phase 3 (shared cache) escalation, gated on metrics |
| Amplify build blocks on flaky tests | Low | Deploys blocked | Pure-function + mocked-grid tests only, no timing assertions |
7.4 Test plan
Unit tests:
- `itemsFilterSortSlice.test.ts` — reducer correctness, schema version migration.
- `itemsAllCache.test.ts` — cursor loop terminates on empty `nextPage`, HARD_MAX guard fires at the right count, tag is present, `stripToGridColumns` is applied.
- `itemsGridMapper.test.ts` — stripped item has exactly the grid-needed fields, no PII leakage beyond them.
- `useAllItemsQuery.test.ts` — query key composition by `(tenantId, tab)`.
- `ItemTableAGGrid.test.tsx` additions — filter/sort handlers dispatch; `quickFilter` propagates; `initialState` restoration from Redux.
E2E tests (Playwright, mock mode):
- `items-filter.spec.ts` (new):
  - Apply text filter → rows narrow in place, no network request.
  - Apply set filter with 2 values → verify rows.
  - Combine text + set filter.
  - Apply filter → reload → filter restored.
  - Switch tabs → filter on original tab persists.
  - Clear filters button resets to unfiltered.
- `items-grid-interactions.spec.ts` (extend):
  - Multi-column sort → verify row order.
  - Sort persists across reload.
  - Pagination page size selector works against the in-memory dataset.
- `items-search.spec.ts` (regression):
  - Search filters grid via `quickFilter`.
  - Search clears on navigation away.
Manual smoke test (pre-deploy):
- Local with MSW: load 5k mock items, confirm time-to-interactive, exercise filter/sort/pagination.
- Dev Amplify env: verify fat-fetch response size, cache hit/miss headers, HARD_MAX header absent under normal load, no watermark visible.
7.5 Out of sprint 1 scope
- Server-side filter/sort (phase 2, not available in current API).
- AG Grid Server-Side Row Model migration (phase 2 escape hatch, only if HARD_MAX triggers).
- Shared Redis BFF cache (phase 3, metrics-gated).
- Saved filter views / shareable URLs.
- Floating filters.
- Advanced filter expression builder.
- Optimistic updates for inline edits (optional, sprint 2 candidate).
- Canary `@arda-cards/api-proxy` `EntityDataGrid` migration.
- Env var plumbing for `ITEMS_ALL_TTL_SECONDS` (the constant is fine for sprint 1).
8. Feasibility prototype (in ux-prototype Storybook)
The prototype's job is to validate that 5k rows with filter + sort + pagination is comfortable in AG Grid's client-side row model, and that a stripped-column payload stays under the Lambda response limit.
8.1 Minimum prototype surface
- One Storybook story rendering an AG Grid with 5,000 mock items in `GridItem` shape.
- At least four columns with different filter types: `agTextColumnFilter`, `agSetColumnFilter`, `agNumberColumnFilter`, `agDateColumnFilter`.
- Multi-column sort enabled.
- AG Grid built-in pagination, page size 50.
- A JSON panel showing the live filter model and sort model.
- A diagnostics panel showing: total rows, payload size estimate (`JSON.stringify(rows).length`, optionally gzipped), initial render time, time-to-filter-applied after a filter change.
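The 5k-row mock data and the diagnostics panel's size estimate can share one helper. A sketch (the generator's field values are invented; only the rough `GridItem` shape from §5.5 matters):

```typescript
import { gzipSync } from 'node:zlib';

// Illustrative 5k-row mock generator in roughly GridItem shape (§5.5),
// plus the payload-size estimate the diagnostics panel would display.
function makeMockItems(count: number) {
  return Array.from({ length: count }, (_, i) => ({
    entityId: `item-${i}`,
    name: `Mock item ${i}`,
    internalSKU: `SKU-${String(i).padStart(5, '0')}`,
    status: i % 7 === 0 ? 'draft' : 'published',
    quantity: i % 100,
    cost: Math.round((i % 500) * 1.37 * 100) / 100,
    updatedAt: new Date(Date.UTC(2026, 0, 1 + (i % 90))).toISOString(),
    isActive: i % 11 !== 0,
  }));
}

function payloadEstimate(rows: unknown[]) {
  const json = JSON.stringify(rows);
  return {
    rawBytes: json.length,                                // JSON.stringify(rows).length
    gzippedBytes: gzipSync(Buffer.from(json)).byteLength, // the optional gzipped figure
  };
}
```

In the Storybook story, `makeMockItems(5000)` would feed `rowData` and `payloadEstimate` would feed the diagnostics panel; in Node the gzip figure gives a rough stand-in for the wire size §6.1 worries about.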
8.2 What the prototype validates
- Performance at 5k. Initial render, filter/sort responsiveness, memory footprint.
- Payload size reality check. Does stripped `GridItem` × 5k fit comfortably inside the gzipped Lambda response limit?
- Filter model stability. Confirms `FilterModel` is JSON-serializable and restorable via `initialState`.
- AG Grid pagination UX. Confirms the built-in pagination is an acceptable replacement for the current cursor UI.
8.3 What the prototype does NOT do
- Does not talk to the real backend.
- Does not test BFF cache or cursor loop (Next.js-side concerns).
- Does not validate Redux persistence (unit tested in `arda-frontend-app`).
- Does not migrate to canary `EntityDataGrid` primitives.
8.4 Fail-fast conditions
If the prototype shows any of the following at 5k rows, stop sprint 1 planning and revisit the design:
- Initial render > 2s.
- Filter/sort interaction > 200ms.
- Stripped payload > 1MB gzipped (the CloudFront response-size limit from §6.1 — measured post-compression on the dev Amplify deployment, not locally).
Any of these suggests Server-Side Row Model or partial loading is needed, which means a backend conversation and a different sprint shape.
9. Migration path
PR sequence, each independently shippable and targeting dev:
- PR 1 — Slice + persistence + `GridItem` mapper. Purely additive. No UI changes.
- PR 2 — BFF fat-fetch route + cache wrapper + MSW handler. Additive. New endpoint nothing consumes yet. Unit tested.
- PR 3 — `revalidateTag` calls in write routes. Additive, one line per route.
- PR 4 — `useAllItemsQuery` custom hook + `fetchAllItems` in `ardaClient` + mutation helpers. Additive client plumbing. No new dependencies.
- PR 5 — Enable filter UI + wire handlers + `initialState` restoration. User-visible but transitional: filters work client-side over the still-cursor-paginated dataset. Users can interact with the filter UI, but it only narrows the visible page. Ship behind a feature flag if stakeholders want rollback headroom.
- PR 6 — The flip. Switch the items page from cursor fetching to `useAllItemsQuery`, remove the cursor pagination UI, enable AG Grid built-in pagination, migrate the search box to `quickFilter`. Architecturally significant PR — deserves extra review. E2E tests updated alongside.
- PR 7 — Remove dead code. Cursor fields from `itemsSlice`, old server-side search logic from `items/page.tsx`, any orphaned handlers.
- PR 8 — Docs update to this file capturing anything that changed between design and implementation.
PR 5 is the feature-flag moment if the team wants progressive rollout. Otherwise PRs 5 and 6 can land together.
Each PR targets dev. No force-pushes. Conventional commit footer per project norms.
10. Answering the ticket acceptance criteria
| Criterion | Met by |
|---|---|
| Architecture document covers data flow, state management, column filter type mapping, and migration path | §4 (data flow), §5 (state + grid wiring + column filter types), §9 (migration path) |
| Feasibility prototype demonstrates at least single-column filter + sort with AG Grid Enterprise | §8 — Storybook story with 5k rows and four filter types |
| Sprint proposal has task-level estimates and identifies dependencies | §7.1 (tasks with sizes), §7.2 (no blocking deps) |
| Prototype validates that BFF caching approach works with the current API pagination model | §5.3 — BFF absorbs cursor pagination internally, caches the full list per tenant for 2 hours with four explicit refresh triggers |
| Validates the architectural decisions in the Goal document | §2 — every adopted decision is mapped to a concrete implementation; §2.2 explicitly interprets the “server side rendering of the page against cached items” phrase |
Honest restatement of the cache criterion: the existing cursor pagination is incompatible with client-side filter/sort over any dataset larger than one page, so we abandon cursor pagination as a client-facing concept and let the BFF absorb it. The client asks for “all items for this tenant and tab,” the BFF loops the backend cursor internally until exhausted, strips each item to grid columns, and caches the result as a single entity under a 2-hour TTL with four explicit refresh triggers. This matches the Goal document’s adopted BFF-cache design and satisfies the ticket’s validation requirement.
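The absorption loop described above can be sketched as follows; `fetchPage`, `toGridItem`, and the `Page` shape are illustrative assumptions, while the 500-row page size and 8k cap are the proposal-era constants (the header note explains how the shipped values differ):

```typescript
// Sketch of the BFF cursor-absorption loop: loop the backend cursor until
// exhausted, strip each item to grid columns, stop at the safety cap.
const PAGE_SIZE = 500;
const HARD_MAX_ITEMS = 8_000; // proposal-era cap (see header note)

type Page<T> = { items: T[]; nextCursor: string | null };

async function fetchAllItems<T, G>(
  fetchPage: (cursor: string | null, limit: number) => Promise<Page<T>>,
  toGridItem: (item: T) => G,
): Promise<G[]> {
  const all: G[] = [];
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor, PAGE_SIZE);
    // Strip each item down to grid-relevant columns before accumulating.
    for (const item of page.items) all.push(toGridItem(item));
    cursor = page.nextCursor;
  } while (cursor !== null && all.length < HARD_MAX_ITEMS);
  return all.slice(0, HARD_MAX_ITEMS); // enforce the cap even mid-page
}
```

In the proposal this runs inside the cached BFF route, so the loop executes at most once per tenant per TTL window rather than once per client request.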
11. Decisions recorded
| Decision | Alternative considered | Why chosen |
|---|---|---|
| Client-side everything (fat fetch + AG Grid in-memory) | Hybrid with server-side narrowing via translator | Server-side sort and richer filter operators are not available in the current API; 5k items is within AG Grid’s comfort zone |
| BFF absorbs cursor loop internally | Expose cursor pagination to the client | Client-side filter/sort over a single page is broken UX; cursor mechanics are an implementation detail |
| Stripped `GridItem` payload, not full `Item` | Return full items | Lambda response size ~1MB cap — stripping keeps us safely under it |
| `HARD_MAX_ITEMS = 8_000` | No cap, or 20k (browser perf) | Bounded by the Amplify CloudFront 1MB response limit (~0.5KB per stripped item × 5x gzip compression ≈ 10k theoretical, 8k with safety margin). Prototype validates against real Amplify dev. This is tighter than the browser-performance ceiling of ~20k |
| Custom `useAllItemsQuery` hook, not React Query | Add `@tanstack/react-query` as a new dep | Only one cached endpoint in this feature. A ~90-line module-scoped hook delivers mount-refetch, dedupe, cross-component reactivity, and tenant-wide invalidation with no new library or provider. React Query is worth adopting when the app has three or more cached endpoints sharing the machinery — for one endpoint it’s overhead |
| Published-only scope for sprint 1 | Per-tab `Record<tab, filterModel>` | Only the Published Items tab is active today. Building per-tab state infrastructure for tabs that don’t exist yet is premature complexity. If Draft/Recently Uploaded tabs go live later, extending the slice to `Record<tab, ...>` is a small follow-up |
| No `key={activeTab}` remount needed for sprint 1 | Re-key the grid on tab change | Only one tab is active (Published), so no tab switching to handle. If tabs are added later, `key={activeTab}` forces a clean remount per tab |
| Separate `itemsFilterSortSlice` from `itemsSlice` | Extend `itemsSlice` | Avoids merge friction with in-flight `itemsSlice` work, keeps persistence targeted |
| Schema version in the slice | No version | Cheap hedge against column renames invalidating persisted state |
| AG Grid `quickFilter` for search | Keep server-side regex | With the full dataset in memory, client-side search is instant and covers every column |
| AG Grid built-in client-side pagination | Custom pagination UI | Native to client-side row model; custom would fight the grid |
| `rowModelType: 'clientSide'` explicit | Rely on default | The whole architecture depends on it; make it visible in code |
| `unstable_cache` per-Lambda-instance BFF cache | In-memory LRU | Ergonomic, composes with `revalidateTag`, same per-instance limitation as LRU but with better primitives |
| 2-hour BFF TTL | 30s / 3 min | Matches Goal doc’s stated refresh ceiling; explicit triggers cover active-session staleness |
| Tenant-wide `revalidateTag` on writes | Per-entity tags | Simple, correct, cheap; the TTL bounds staleness anyway |
| Phase 2 reserved for Server-Side Row Model + backend | Ship phase 2 alongside | Server-side sort/filter is a future API capability; phase 2 is scoped for when it becomes available |
| Phase 3 reserved for shared Redis cache | Ship phase 3 in sprint 1 | No metrics to justify infra cost; per-instance cache + short TTL is enough for phase 1 |
| Surgical cache patches on mutation, NOT full invalidation | Full tenant-wide invalidate + refetch on every edit | A 5-15s loading spinner after saving a single field is bad UX and wasted backend load. Mutation hooks patch one row in place via patchItemInCache / addItemToCache / removeItemFromCache. BFF revalidateTag still fires for cross-tab / cross-user consistency. Full invalidation reserved for login, list mount, and 2h TTL |
| Pessimistic surgical updates in sprint 1 | Optimistic (patch immediately, roll back on error) | Simpler default; backend confirms before the cache mutates. Optimistic is a sprint 2 candidate once pessimistic is measured and if UX demands it |
| Feature flag on PR 5, flip on PR 6 | All-at-once | Lets stakeholders preview the filter UI before the cursor-to-full-list flip; rollback headroom |
| Interpret “server side rendering of the page” as “serve data from cache” | Implement AG Grid SSRM against BFF cache | At 5k items, client-side is simpler, faster, and matches the Goal doc’s refresh-policy semantics |
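The surgical-patch decision can be sketched as a module-scoped store. The helper names follow the decision table (`patchItemInCache`, `addItemToCache`, `removeItemFromCache`); the `GridItem` fields are placeholders, and the subscriber notification the real hook needs for cross-component reactivity is omitted:

```typescript
// Minimal sketch of surgical cache mutation instead of full invalidation.
// GridItem here is a placeholder shape; the real hook also notifies
// subscribed components so every mounted grid re-renders with the patch.
type GridItem = { id: string; name: string; status: string };

let cache: GridItem[] | null = null; // module scope = shared across components

function setCache(items: GridItem[]): void {
  cache = items;
}

function patchItemInCache(id: string, patch: Partial<GridItem>): void {
  if (cache === null) return; // nothing cached yet; next mount refetches
  cache = cache.map((it) => (it.id === id ? { ...it, ...patch } : it));
}

function addItemToCache(item: GridItem): void {
  if (cache !== null) cache = [item, ...cache];
}

function removeItemFromCache(id: string): void {
  if (cache !== null) cache = cache.filter((it) => it.id !== id);
}
```

Each mutation touches exactly one row, so saving a single field never triggers a full refetch spinner; the BFF-side `revalidateTag` from the table above still fires for cross-tab and cross-user consistency.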
End of proposal.
Copyright: © Arda Systems 2025-2026, All rights reserved