PDEV-442 — First-pass findings: product slow responses

Linear: PDEV-442. Session: 2026-05-13 (UTC), tenant Arda-live on https://live.app.arda.cards. Tooling: Chrome via claude-in-chrome MCP for in-session capture; Sentry MCP (arda-systems / arda-frontend) for 24 h fleet aggregates.

[!important] Amplify SSR compute is not configurable. The “Amplify Lambda CPU / memory posture” section at the end of this document recommends bumping SSR Lambda memorySize (1,024 MB → 1,769 MB, then 3,072 MB if needed) to cut cold-start and CPU-bound latency. These recommendations are not actionable: AWS Amplify Hosting exposes no IaC, CLI, or console path to configure SSR runtime memory size, vCPU, reserved concurrency, or provisioned concurrency. Verified against the full AWS::Amplify::App CloudFormation property list and Amplify Hosting documentation. AWS::Amplify::App.CacheConfig exists as a configurable property, but its only meaningful alternative value (AMPLIFY_MANAGED_NO_COOKIES) would cross-serve cached SSR responses across authenticated users, a correctness regression in our multi-tenant app. The default AMPLIFY_MANAGED is therefore the only safe choice, and there is no useful tuning available there either.

Treat the compute-side recommendations as informational only; actionable performance work for the front end lives in PDEV-489 (ISR adoption, bundle reduction, N+1 elimination, BFF parallelisation). Migrating SSR off Amplify Hosting onto a custom Lambda + CloudFront stack would unlock these levers but is out of scope for the slow-responses project.

TL;DR

The single largest source of slowness is the /items page. The table data itself loads fast (~440 ms via /api/arda/items/query-ssrm), but for every row the page fans out two requests to the kanban service to fetch the card data. With the test tenant’s 6 items that is 12 extra requests, each taking 1.2–1.7 s on a warm session. Sentry shows the same call pattern catastrophically degraded at fleet scale: the proxy route GET /api/arda/kanban/kanban-card/query-by-item averages 38 s with a p95 of 158 s over 2,921 hits in the last 24 h, and the /items pageload transaction lands at p95 19.6 s (vs. /order-queue p95 2.4 s).

Order Queue, by contrast, looks healthy: it does 3 batched calls (/api/arda/kanban/kanban-card/details/{requested,requesting,in-process}), ~390–465 ms each.

What was exercised

Step	URL	Notes
Initial nav	`/items?justSignedIn=true`	Session already authenticated from prior Chrome run
Reload	`/items`	Baseline timings (cold-ish, after session warm-up)
Click Order Queue	`/order-queue`	Snapshot post-nav timings
Click Items	`/items`	Confirm the per-row fan-out reproduces on every entry
Filter	`/items` (filter combobox + Enter)	Single `/items/query-ssrm` re-fetch
Hard reload	`/items` (Cmd+Shift+R)	Cold-cache timings
4-visit aggregate	`/order-queue` → `/items` × 4	Per-call timings, browser-side, with payload sizes

Sorting (clicking column header buttons) produced no visible network activity; the header button likely opens a sort-options menu rather than sorting directly. Worth a deeper look in a follow-up.

In-session measurements

All durations are wall-clock from the browser’s resource-timing API.

/items reload: 22 API calls, 12 of them from the per-row fan-out

Endpoint	Count	Avg ms	Max ms
`/api/arda/kanban/kanban-card/query-details-by-item`	6	1,706	1,801
`/api/arda/kanban/kanban-card/query-by-item`	6	1,535	1,721
`/api/arda/tenant/agent-for/query`	1	3,144	3,144
`/api/arda/tenant/{tenantId}`	1	3,029	3,029
`/api/arda/kanban/kanban-card/details/requesting`	1	1,120	1,120
`/api/arda/kanban/kanban-card/details/in-process`	1	1,060	1,060

Notes:

The query-details-by-item and query-by-item endpoints each fire one request per row, in parallel, starting ~777 ms after navigation. With 6 rows the page absorbs the cost; with 60 rows the upstream is likely to run out of headroom, which is consistent with the fleet-wide Sentry data below.
/tenant/agent-for/query and /tenant/{tenantId} are session-scoped lookups that block first paint and take ~3 s each, in parallel.
DOMContentLoaded was 248 ms; the user-perceived wait is dominated by the per-row data fetch, not the document or bundle.

Items → Order Queue → Items (re-entry)

Page	Total API	Notable
`/order-queue`	3	All `kanban-card/details/{state}` — 389–465 ms
`/items` (re-entry)	13	1 `query-ssrm` 444 ms; 6× `query-details-by-item` avg 1,518 ms; 6× `query-by-item` avg 1,186 ms

Per-row fan-out reproduces deterministically on every entry to /items — the results are not cached across the navigation. Order Queue does not exhibit this pattern: it consolidates by lane.

Hard cold reload of `/items` (Cmd+Shift+R)

A hard reload looks worse than the warm case because the per-row fan-out runs slower against a cold session:

Metric	Value
TTFB	27 ms
First Contentful Paint	532 ms
DOMContentLoaded	167 ms
`load` event	580 ms
Scripts loaded	34 files, 1.65 MB transferred / 6.06 MB decoded
Stylesheets	13 files, 173 KB / 898 KB decoded
Last API completion (table fully populated)	~4,976 ms

Per-row fan-out timings (cold):

Endpoint	Count	Avg ms	Max ms	Start–End window
`kanban-card/query-details-by-item`	6	2,887	3,985	990 → 4,976
`kanban-card/query-by-item`	6	2,756	3,958	990 → 4,950
`kanban-card/details/{state}` × 3	3	444–779	779	786 → 1,564
`tenant/query`	1	229	229	n/a

FCP is reached at 532 ms but the table’s “card data” columns remain in skeleton state for ~5 seconds, matching the visible “Loading card data…” / “Checking for available cards…” placeholders in the initial DOM snapshot.

Bundle size is large (6 MB decoded JS, 34 script files) but on this network it is not the dominant contributor — the cold wait is the per-row N+1.

Four-visit aggregate to `/items` (instrumented Chrome)

Methodology: navigate /order-queue → clear resource timings → navigate /items → wait 8 s → snapshot every /api/* resource (startTime, duration, transferSize). Repeated 4 times in sequence (visits A, B, C, D). The tenant has 6 items, so per-row endpoints fire 6 times per visit.

Endpoint	Calls/visit	Samples	Min ms	Avg ms	Max ms	Avg payload (bytes)
`/api/arda/kanban/kanban-card/query-details-by-item`	6	24	883	2,032	3,471	3,271
`/api/arda/kanban/kanban-card/query-by-item`	6	24	625	1,942	3,434	1,995
`/api/arda/kanban/kanban-card/details/requesting`	1	4	884	1,852	2,471	2,702
`/api/arda/kanban/kanban-card/details/requested`	1	4	807	1,726	2,329	908
`/api/arda/tenant/query`	1	4	202	1,653	3,145	1,718
`/api/arda/kanban/kanban-card/details/in-process`	1	4	801	1,519	2,136	920
`/api/arda/items/query-ssrm`	1	4	103	204	250	7,920
`/api/arda/tenant/{tenantId}`	1	4	137	204	266	998
`/api/storage/cdn-cookies`	1	4	104	194	302	300
`/api/arda/user-account/query`	1	4	148	174	209	1,739
`/api/arda/tenant/agent-for/query`	1	4	140	156	162	1,545
`/api/pylon/email-hash`	1	4	96	104	116	391
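The capture step in the methodology can be reproduced with the browser's Resource Timing API. The helper below is a minimal sketch, not the script used in this session: `summarizeApiCalls`, `pathOf` and the field names are hypothetical, and grouping by pathname (query string dropped) is an assumption about how the table was built. It accepts anything shaped like `PerformanceResourceTiming`, so in DevTools it could be fed `performance.getEntriesByType("resource")` after `performance.clearResourceTimings()`, a navigation to /items, and the 8 s wait.

```typescript
// Subset of the PerformanceResourceTiming fields used here.
interface ResourceSample {
  name: string; // absolute request URL
  startTime: number; // ms since navigation start
  duration: number; // wall-clock ms
  transferSize: number; // bytes over the wire (0 for cache hits or opaque cross-origin)
}

interface EndpointStats {
  endpoint: string;
  count: number;
  minMs: number;
  avgMs: number;
  maxMs: number;
  avgBytes: number;
}

// "https://host/api/x?y=1" -> "/api/x" (origin, query and fragment removed).
function pathOf(url: string): string {
  return url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]*/i, "").split(/[?#]/)[0];
}

// Groups /api/* samples by pathname and computes per-endpoint count,
// min/avg/max duration and average transfer size.
function summarizeApiCalls(samples: ResourceSample[]): EndpointStats[] {
  const groups = new Map<string, ResourceSample[]>();
  for (const s of samples) {
    const path = pathOf(s.name);
    if (!path.startsWith("/api/")) continue; // skip scripts, styles, images
    const bucket = groups.get(path) ?? [];
    bucket.push(s);
    groups.set(path, bucket);
  }

  const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
  const stats: EndpointStats[] = [];
  groups.forEach((group, endpoint) => {
    const durations = group.map((s) => s.duration);
    stats.push({
      endpoint,
      count: group.length,
      minMs: Math.min(...durations),
      avgMs: sum(durations) / group.length,
      maxMs: Math.max(...durations),
      avgBytes: sum(group.map((s) => s.transferSize)) / group.length,
    });
  });

  // Slowest endpoints first, as in the table.
  return stats.sort((a, b) => b.avgMs - a.avgMs);
}
```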

Per-row fan-out, broken down by visit (avg / max ms across the 6 calls):

Endpoint	Visit A	Visit B	Visit C	Visit D
`kanban-card/query-details-by-item`	1,564 / 3,471	2,263 / 2,520	1,797 / 1,845	2,504 / 2,572
`kanban-card/query-by-item`	2,298 / 3,434	1,856 / 2,278	1,489 / 1,584	2,125 / 2,368

Notes from this aggregate:

The N+1 fan-out reproduces 100% deterministically: every entry to /items issues exactly 6 + 6 per-row kanban requests, regardless of prior navigation. No caching across navigations.
The two per-row endpoints average ~2 s; the slowest single calls (visit A) reached 3.4–3.5 s. With 6 rows the parallel batch finishes in ~3.5 s; with 60 rows the upstream is likely to saturate.
tenant/query shows the expected long-tail behavior at the browser level: avg 1.65 s but min 0.2 s and max 3.15 s — same shape as Sentry’s BFF transaction view (huge avg/p75 gap on tenant lookups).
items/query-ssrm (the actual table data) is consistently fast (avg 204 ms) — the slowness is not in fetching the rows, it’s the per-row card data.
Total payload per visit is ~51 KB across all 22 calls, of which ~36 KB comes from the 15 kanban-card calls. This is not a bandwidth problem.
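The per-visit payload totals can be re-derived from the four-visit table (calls per visit × average payload). This is plain arithmetic over the published numbers; the row labels are shortened from the endpoint paths, and the `totals` helper is illustrative only.

```typescript
interface Row {
  endpoint: string;
  callsPerVisit: number;
  avgBytes: number;
}

// Copied from the four-visit aggregate table.
const rows: Row[] = [
  { endpoint: "kanban-card/query-details-by-item", callsPerVisit: 6, avgBytes: 3271 },
  { endpoint: "kanban-card/query-by-item", callsPerVisit: 6, avgBytes: 1995 },
  { endpoint: "kanban-card/details/requesting", callsPerVisit: 1, avgBytes: 2702 },
  { endpoint: "kanban-card/details/requested", callsPerVisit: 1, avgBytes: 908 },
  { endpoint: "tenant/query", callsPerVisit: 1, avgBytes: 1718 },
  { endpoint: "kanban-card/details/in-process", callsPerVisit: 1, avgBytes: 920 },
  { endpoint: "items/query-ssrm", callsPerVisit: 1, avgBytes: 7920 },
  { endpoint: "tenant/{tenantId}", callsPerVisit: 1, avgBytes: 998 },
  { endpoint: "storage/cdn-cookies", callsPerVisit: 1, avgBytes: 300 },
  { endpoint: "user-account/query", callsPerVisit: 1, avgBytes: 1739 },
  { endpoint: "tenant/agent-for/query", callsPerVisit: 1, avgBytes: 1545 },
  { endpoint: "pylon/email-hash", callsPerVisit: 1, avgBytes: 391 },
];

// Sums calls and bytes per visit over the rows matching `keep`.
function totals(keep: (r: Row) => boolean): { calls: number; bytes: number } {
  const selected = rows.filter(keep);
  return {
    calls: selected.reduce((n, r) => n + r.callsPerVisit, 0),
    bytes: selected.reduce((n, r) => n + r.callsPerVisit * r.avgBytes, 0),
  };
}

const all = totals(() => true); // 22 calls, 50,737 bytes
const kanban = totals((r) => r.endpoint.startsWith("kanban-card/")); // 15 calls, 36,126 bytes
console.log(all, kanban);
```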

Filter on items (typed “card”, pressed Enter)

Endpoint	Count	Avg ms
`/api/arda/items/query-ssrm`	1	438

No kanban-card fan-out re-fired — likely the filter returned a subset already in the row-data cache, or the cards’ react-query keys deduped the refetch. To verify whether the fan-out repeats on every filter that returns new rows, re-run with a search that surfaces rows not yet seen in the session.

Sentry corroboration (last 24 h, `arda-frontend`)

Pageload transactions sorted by p95

Transaction	Count	Avg	p75	p95
`/items`	25	5.2 s	8.5 s	19.6 s
`/signin`	32	4.0 s	4.5 s	10.6 s
`/signup`	10	3.9 s	5.8 s	5.8 s
`/order-queue`	10	2.1 s	2.4 s	2.4 s
`/print-viewer`	150	1.7 s	2.4 s	2.6 s
`/reset-password`	5	1.8 s	1.8 s	1.8 s

/items is the slowest pageload in the product and the gap to /order-queue (2.4 s p95) is roughly 8×.

Slowest backend proxy transactions (Next.js route handlers)

Transaction	Count	Avg	p75	p95
`GET /api/arda/kanban/kanban-card/query-by-item`	2,921	38 s	16.6 s	158.6 s
`GET /api/arda/kanban/kanban-card/[eId]`	40	1.4 s	2.5 s	5.8 s
`GET /api/arda/items/lookup-locations`	50	1.2 s	0.3 s	8.1 s
`GET /api/arda/tenant/[tenantId]`	171	4.1 s	0.15 s	0.40 s

Two distinct signals:

query-by-item is genuinely slow end-to-end (38 s avg, 158 s p95). Its top http.client spans are upstream calls to prod.alpha001.io.arda.cards/v1/kanban/kanban-card/for-item/{itemId}; the slowest per-itemId span groups average 8–17 s (4.0 s avg and 13.5 s p95 across all items, per the upstream tables below). The front end is faithfully reporting upstream pain.
/api/arda/tenant/[tenantId] has a huge avg/p75 gap (4.1 s avg, 0.15 s p75): long-tail pathological cases hide behind a healthy median. In the in-session /items reload, the two tenant lookups (tenant/{tenantId} and tenant/agent-for/query) each took ~3 s.
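The avg/p95 pair for /api/arda/tenant/[tenantId] also bounds how heavy the tail must be. At least the lowest ~95 % of samples are each at most p95, so whatever remains of the total duration must come from the tail. The sketch below treats Sentry's reported avg and p95 as exact, which is an assumption; `minTailMean` is a hypothetical helper. For 171 transactions at 4.1 s avg and 0.40 s p95 it gives a tail of 9 transactions averaging at least ~70 s.

```typescript
// Lower bound on the mean of the slowest samples, from count, mean and p95 alone.
// At least floor(0.95 * (n - 1)) + 1 samples are <= p95 under both the nearest-rank
// and the linearly interpolated percentile definitions; the rest form the tail.
// The bound can be negative (uninformative) when the tail is not heavy.
function minTailMean(count: number, avg: number, p95: number): { tailCount: number; tailMeanAtLeast: number } {
  const bodyCount = Math.floor(0.95 * (count - 1)) + 1;
  const tailCount = count - bodyCount;
  if (tailCount <= 0) return { tailCount: 0, tailMeanAtLeast: Number.NaN };
  const totalTime = count * avg; // sum of all durations
  const bodyTimeAtMost = bodyCount * p95; // each body sample contributes at most p95
  return { tailCount, tailMeanAtLeast: (totalTime - bodyTimeAtMost) / tailCount };
}

// /api/arda/tenant/[tenantId] over 24 h: 171 transactions, 4.1 s avg, 0.40 s p95.
const tenantTail = minTailMean(171, 4.1, 0.4);
console.log(tenantTail); // tailCount 9, tailMeanAtLeast ≈ 70.7 (seconds)
```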

BFF outbound calls (`http.client` to `prod.alpha001.io.arda.cards`)

The same span data, but aggregated by the parent BFF route, isolates how much time is spent in the upstream operations service vs. inside the BFF itself.

BFF route (parent transaction)	Count	Avg	p75	p95
`POST /api/arda/kanban/kanban-card/query-details-by-item`	875	5.5 s	7.9 s	13.7 s
`GET /api/arda/kanban/kanban-card/query-by-item`	2,845	4.0 s	4.8 s	13.5 s
`POST /api/arda/kanban/kanban-card/print-card`	30	3.2 s	3.7 s	4.9 s
`POST /api/arda/items/query-ssrm`	250	0.52 s	0.79 s	1.84 s
`POST /api/arda/item/item/print-breadcrumb`	5	1.22 s	1.22 s	1.22 s
`POST /api/arda/kanban/kanban-card/details/requesting`	1,040	0.55 s	0.55 s	0.97 s
`POST /api/arda/kanban/kanban-card/details/in-process`	1,120	0.47 s	0.46 s	0.89 s
`POST /api/arda/kanban/kanban-card/details/requested`	1,085	0.44 s	0.42 s	0.79 s
`POST /api/arda/items`	25	0.34 s	0.37 s	0.41 s
`POST /api/arda/kanban/kanban-card/details/fulfilled`	5	0.21 s	0.21 s	0.21 s
`POST /api/image-upload`	20	0.17 s	0.19 s	0.19 s
`GET /api/arda/items/lookup-units`	50	0.13 s	0.16 s	0.24 s
`GET /api/arda/items/lookup-suppliers`	45	0.13 s	0.14 s	0.19 s
`GET /api/arda/items/lookup-locations`	50	0.09 s	0.11 s	0.16 s
`POST /api/arda/kanban/kanban-card/[eId]/event/request`	35	0.09 s	0.09 s	0.22 s
`GET /api/arda/tenant/[tenantId]`	175	0.06 s	0.07 s	0.14 s
`POST /api/arda/user-account/query`	160	0.07 s	0.08 s	0.12 s
`POST /api/arda/tenant/agent-for/query`	195	0.06 s	0.08 s	0.12 s
`POST /api/arda/tenant/query`	180	0.06 s	0.06 s	0.10 s
`GET /api/arda/kanban/kanban-card/[eId]`	40	0.08 s	0.07 s	0.31 s
`PUT /api/arda/user-account/[eId]`	10	0.09 s	0.11 s	0.11 s
`GET /api/arda/items/[entityId]`	10	0.10 s	0.11 s	0.11 s
`POST /api/arda/kanban/kanban-card/[eId]/event/accept`	15	0.08 s	0.10 s	0.10 s
`POST /api/arda/kanban/kanban-card/[eId]/event/start-processing`	10	0.08 s	0.09 s	0.09 s

Outbound calls aggregated by upstream URL pattern

Sentry’s auto-grouping does not normalize UUIDs in span.description, so the raw per-span.description view gives one row per itemId. The table below is the same data re-aggregated by upstream URL pattern using wildcard queries, so it directly answers “how slow is each upstream endpoint regardless of which BFF route invoked it”.

Upstream URL pattern	Count	Avg	p75	p95
`GET /v1/kanban/kanban-card/for-item/{itemId}`	2,846	4.0 s	4.8 s	13.5 s
`POST /v1/kanban/kanban-card/details`	875	5.5 s	7.9 s	13.7 s
`POST /v1/kanban/kanban-card/details/in-process`	1,120	0.47 s	0.46 s	0.89 s
`POST /v1/kanban/kanban-card/details/requested`	1,085	0.44 s	0.42 s	0.79 s
`POST /v1/kanban/kanban-card/details/requesting`	1,040	0.55 s	0.55 s	0.97 s
`POST /v1/tenant/tenant/query`	180	0.06 s	0.06 s	0.10 s
`GET /v1/tenant/tenant/{tenantId}` (sum)	~175	0.05–0.14 s	—	≤0.15 s

Two upstream endpoints stand out clearly: for-item/{itemId} and kanban-card/details (the POST variant fed by query-details-by-item). Both run at p95 ~13.5 s and account for essentially all of the genuine upstream-side slowness.
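The same per-pattern aggregation can be reproduced offline from exported span data. A minimal sketch, assuming spans are exported as `{ description, durationMs }` with descriptions of the form `GET https://host/path`; `urlPattern` and `aggregateByPattern` are hypothetical helpers, UUID path segments are collapsed to a generic `{id}`, and percentiles use the nearest-rank method, which may differ slightly from Sentry's own percentile computation.

```typescript
interface Span {
  description: string; // e.g. "GET https://prod.alpha001.io.arda.cards/v1/kanban/kanban-card/for-item/<uuid>"
  durationMs: number;
}

interface PatternStats {
  pattern: string;
  count: number;
  avgMs: number;
  p75Ms: number;
  p95Ms: number;
}

const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

// "GET https://host/v1/x/<uuid>?q=1" -> "GET /v1/x/{id}"
function urlPattern(description: string): string {
  const space = description.indexOf(" ");
  const method = space >= 0 ? description.slice(0, space) : "";
  const url = space >= 0 ? description.slice(space + 1) : description;
  const path = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/?#]*/i, "").split(/[?#]/)[0];
  const normalized = path.replace(UUID_RE, "{id}");
  return method ? `${method} ${normalized}` : normalized;
}

// Nearest-rank percentile over an ascending-sorted array (q in (0, 1]).
function percentile(sorted: number[], q: number): number {
  return sorted[Math.max(1, Math.ceil(q * sorted.length)) - 1];
}

function aggregateByPattern(spans: Span[]): PatternStats[] {
  const groups = new Map<string, number[]>();
  for (const s of spans) {
    const key = urlPattern(s.description);
    const bucket = groups.get(key) ?? [];
    bucket.push(s.durationMs);
    groups.set(key, bucket);
  }
  const out: PatternStats[] = [];
  groups.forEach((durations, pattern) => {
    const sorted = durations.slice().sort((a, b) => a - b);
    out.push({
      pattern,
      count: sorted.length,
      avgMs: sorted.reduce((a, b) => a + b, 0) / sorted.length,
      p75Ms: percentile(sorted, 0.75),
      p95Ms: percentile(sorted, 0.95),
    });
  });
  return out.sort((a, b) => b.p95Ms - a.p95Ms); // slowest tail first
}
```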

BFF transaction vs. upstream `http.client` — where does the time go?

The BFF transaction durations (in “Slowest backend proxy transactions” above) are far higher than the upstream call durations, even though the counts are essentially 1:1 (i.e., the BFF is not looping or retrying):

Route	BFF transactions	Upstream calls	Ratio
`query-by-item`	2,921	2,846	0.97
`tenant/[tenantId]`	171	~175	≈1.0
`lookup-locations`	50	50	1.0

Route	BFF transaction p95	Upstream `http.client` p95	Unexplained gap
`GET /api/arda/kanban/kanban-card/query-by-item`	158.6 s	13.5 s	~145 s inside BFF
`GET /api/arda/kanban/kanban-card/[eId]`	5.8 s	0.31 s	~5.5 s inside BFF
`GET /api/arda/items/lookup-locations`	8.1 s	0.16 s	~7.9 s inside BFF
`GET /api/arda/tenant/[tenantId]`	0.40 s (p95); 4.1 s avg	0.14 s	Most of the avg is inside BFF

The gap is therefore not explained by repeated upstream queries. Given that the app is served by AWS Amplify Hosting with a Serverless Next.js back end, the strongest candidates are:

Lambda cold starts. On a cold container, the BFF transaction in Sentry’s Node SDK may include Lambda init + Next.js server-bundle load + middleware (not yet verified against the SDK’s span boundaries). The CloudWatch REPORT data in the Lambda posture section puts Init Duration at 2.4 s avg and 3.4 s max, so cold starts plausibly explain part, though not all, of the 5–8 s gap on otherwise-cheap routes (kanban-card/[eId], lookup-locations, tenant/[tenantId]) that have no slow upstream call. The 6 MB decoded bundle measured earlier is the client bundle; the server bundle’s size was not measured.
Request queueing / concurrency cap. If requests wait for an available SSR execution environment and that wait lands inside the BFF transaction span, queueing combined with slow upstream calls would stack waits and could explain the 145 s outlier extent on query-by-item. This needs confirming: a Lambda execution environment serves one invocation at a time, so a wait for capacity would normally happen before the handler (and Sentry’s span) starts.
Aborted / errored transactions that never reached upstream. The 2,921 → 2,846 inbound/outbound delta on query-by-item is 75 transactions (~2.6%). Hard timeouts or auth failures before fetch() would all sit in the BFF transaction tail without contributing to the upstream p95.

So slowness is two-source, not “the BFF loops”:

Operations is genuinely slow for for-item/{itemId} and kanban-card/details — upstream p95 ~13.5 s per call.
Amplify SSR cold starts (and possibly queueing) appear to add a 5–8 s baseline on cold routes and stretch the tail on already-slow routes.
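The “Unexplained gap” column subtracts one p95 from another, which only approximates the gap: percentiles do not subtract, because the slowest transactions are not necessarily the ones with the slowest upstream calls. If per-transaction durations and their child `http.client` span durations can be exported from Sentry, the gap can be computed per transaction and its p95 taken directly. A minimal sketch with hypothetical names (`Txn`, `bffGap`, `gapP95`), assuming a transaction’s upstream spans do not overlap, which the ~1:1 call ratio suggests for these routes.

```typescript
interface Txn {
  durationMs: number; // BFF transaction duration
  upstreamMs: number[]; // durations of its child http.client spans
}

// Nearest-rank percentile (q in (0, 1]); sorts a copy of the input.
function percentile(values: number[], q: number): number {
  const sorted = values.slice().sort((a, b) => a - b);
  return sorted[Math.max(1, Math.ceil(q * sorted.length)) - 1];
}

// Time spent inside the BFF for one transaction (non-overlapping upstream spans assumed).
function bffGap(t: Txn): number {
  return t.durationMs - t.upstreamMs.reduce((a, b) => a + b, 0);
}

// Approximation used in the table: p95(transaction) - p95(upstream).
function naiveGapP95(txns: Txn[]): number {
  const upstream = txns.reduce((acc, t) => acc.concat(t.upstreamMs), [] as number[]);
  return percentile(txns.map((t) => t.durationMs), 0.95) - percentile(upstream, 0.95);
}

// p95 of the per-transaction gap.
function gapP95(txns: Txn[]): number {
  return percentile(txns.map(bffGap), 0.95);
}
```

With synthetic data where the slow transactions are the ones with fast upstream calls (for example cold-start-bound requests), the two measures diverge; the per-transaction version is the one that describes time actually spent inside the BFF.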

Hypotheses to triage next

In order of expected impact, highest first (lower-effort items first where impact is comparable):

Eliminate the per-row fan-out on /items. The query-ssrm endpoint should return enough kanban data (counts, status, the few fields shown in the row) for the table’s “card data” columns to render without a second round-trip. If a richer per-row payload is too heavy, add a single batched kanban-card/query-details-for-items?ids=... endpoint so the table can issue exactly one extra request, not N.
Make the tenant/session lookups parallel with — not gating — the table load. /tenant/agent-for/query + /tenant/{tenantId} together account for ~3 s of head-of-line latency on every entry to a logged-in page.
Investigate the upstream kanban-card/for-item/{itemId} operation in the operations component. Sentry’s frontend spans put the call at 4.0 s avg and 13.5 s p95, with the slowest per-itemId span groups averaging 8–17 s. Since operations is not yet instrumented in Sentry, the next step is CloudWatch / kubectl logs for this endpoint to look for slow queries, sequential DB reads, or N+1 inside the service.
Look at /api/arda/items/lookup-locations: the BFF p95 is 8.1 s over 50 calls, but its upstream http.client p95 is only 0.16 s, so the slowness sits inside the BFF (cold start, queueing, or a cold cache) rather than in the upstream call.
Confirm filter/sort behavior in the items table. Sort header click produced no network activity; need to check whether AG Grid is using client-side sort (acceptable for SSRM-resident pages) or whether the sort request is dropped silently.
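For the first hypothesis, a client-side stopgap is to coalesce per-row requests into one batched call. The sketch below is a minimal DataLoader-style batcher, not existing arda-frontend code: `BatchLoader` and `BatchFetch` are hypothetical, and it assumes a batch endpoint along the lines of the proposed `kanban-card/query-details-for-items?ids=...`, which does not exist yet. It also assumes the row renderers request their card data within the same tick; if they don’t, a short `setTimeout` window would be more forgiving than the microtask used here.

```typescript
type ItemId = string;

// Hypothetical batch fetcher: one request for many items (for example a future
// kanban-card/query-details-for-items?ids=... endpoint; not an existing API).
type BatchFetch<T> = (ids: ItemId[]) => Promise<Map<ItemId, T>>;

interface Waiter<T> {
  resolve: (value: T | undefined) => void;
  reject: (reason: unknown) => void;
}

// Coalesces every load(id) made in the same tick into a single batchFetch call.
class BatchLoader<T> {
  private pending = new Map<ItemId, Waiter<T>[]>();
  private scheduled = false;
  private readonly batchFetch: BatchFetch<T>;

  constructor(batchFetch: BatchFetch<T>) {
    this.batchFetch = batchFetch;
  }

  load(id: ItemId): Promise<T | undefined> {
    return new Promise<T | undefined>((resolve, reject) => {
      const waiters = this.pending.get(id) ?? [];
      waiters.push({ resolve, reject });
      this.pending.set(id, waiters); // duplicate ids share one slot in the batch
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush once the current synchronous work (all rows requesting) has finished.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = new Map();
    this.scheduled = false;
    try {
      const results = await this.batchFetch(Array.from(batch.keys()));
      batch.forEach((waiters, id) => waiters.forEach((w) => w.resolve(results.get(id))));
    } catch (err) {
      batch.forEach((waiters) => waiters.forEach((w) => w.reject(err)));
    }
  }
}
```

Each row would call `loader.load(itemId)` instead of fetching directly, so 6 (or 60) rows produce one batched request per endpoint instead of one request per row.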

Open questions for triage

Are the per-row kanban calls a deliberate progressive-render pattern, or an accidental fan-out introduced by a useQuery hook inside the row renderer?
Is the upstream kanban-card/for-item/{itemId} route paged, or does it return every card for an item every time? Tenants with high card counts per item would compound the slowness multiplicatively.
Why is the proxy route’s p95 (158 s) so much higher than the underlying upstream span’s p95 (13.5 s)? Candidates: serial calls in the route handler, request queueing in the Next.js server, or upstream timeouts being retried. The ~1:1 transaction-to-upstream-call ratio shown earlier argues against serial calls and retries.

Suggested next steps

Operations-side dig (CloudWatch + kubectl) on the upstream kanban-card/for-item/{itemId} route, since arda-frontend Sentry already proves it is the dominant contributor and operations isn’t instrumented.
Code dig in arda-frontend-app for the items page row renderer to confirm the source of the per-row fan-out and prototype a batched endpoint.
Second observation pass with a larger tenant to confirm scaling behavior (the 6-row tenant masks the worst case).

Amplify Lambda CPU / memory posture (24 h, 23,292 invocations)

Sampled from REPORT lines in /aws/amplify/duhexavnwh88g via CloudWatch Logs Insights:

Metric	Value
Configured `MemorySize`	1,024 MB
Allocated vCPU (proportional to memory)	~0.58 vCPU
`Max Memory Used` — max seen	253 MB (≈ 25 % of budget)
`Max Memory Used` — p99	214 MB
`Max Memory Used` — avg	193 MB
Cold-start rate	1,891 / 23,292 = 8.1 %
Cold-start `Init Duration` — avg	2,355 ms
Cold-start `Init Duration` — max	3,366 ms
`Duration` — avg	1,826 ms
`Duration` — p95	6,612 ms
`Duration` — p99	10,419 ms
`Duration` — max	19,033 ms
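The metrics above can be recomputed from raw `REPORT` lines. A minimal sketch, assuming the standard tab-separated Lambda REPORT format (`Duration`, `Billed Duration`, `Memory Size`, `Max Memory Used`, plus `Init Duration` on cold starts only); `summarizeReports` is a hypothetical helper, percentiles use nearest rank, and a line counts as a cold start exactly when it carries `Init Duration`.

```typescript
interface ReportSummary {
  invocations: number;
  coldStarts: number;
  coldStartRate: number;
  memorySizeMb: number;
  maxMemoryUsedMb: { max: number; p99: number; avg: number };
  initMs: { avg: number; max: number };
  durationMs: { avg: number; p95: number; p99: number; max: number };
}

// Nearest-rank percentile over an ascending-sorted, non-empty array.
function pct(sorted: number[], q: number): number {
  return sorted[Math.max(1, Math.ceil(q * sorted.length)) - 1];
}

const mean = (xs: number[]) => (xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : 0);

// First capture group of `re` as a number, or undefined if absent.
function num(line: string, re: RegExp): number | undefined {
  const m = re.exec(line);
  return m ? Number(m[1]) : undefined;
}

function summarizeReports(lines: string[]): ReportSummary {
  const durations: number[] = [];
  const memory: number[] = [];
  const inits: number[] = [];
  let memorySizeMb = 0;
  for (const line of lines) {
    if (!line.startsWith("REPORT ")) continue; // skip START/END lines and app logs
    // "Duration" directly follows the RequestId, which avoids matching Billed/Init Duration.
    const duration = num(line, /RequestId: \S+\s+Duration: ([\d.]+) ms/);
    const used = num(line, /Max Memory Used: (\d+) MB/);
    const size = num(line, /Memory Size: (\d+) MB/);
    const init = num(line, /Init Duration: ([\d.]+) ms/); // present only on cold starts
    if (duration !== undefined) durations.push(duration);
    if (used !== undefined) memory.push(used);
    if (size !== undefined) memorySizeMb = size;
    if (init !== undefined) inits.push(init);
  }
  durations.sort((a, b) => a - b);
  memory.sort((a, b) => a - b);
  const n = durations.length;
  return {
    invocations: n,
    coldStarts: inits.length,
    coldStartRate: n ? inits.length / n : 0,
    memorySizeMb,
    maxMemoryUsedMb: {
      max: memory.length ? memory[memory.length - 1] : 0,
      p99: memory.length ? pct(memory, 0.99) : 0,
      avg: mean(memory),
    },
    initMs: { avg: mean(inits), max: inits.length ? Math.max(...inits) : 0 },
    durationMs: {
      avg: mean(durations),
      p95: n ? pct(durations, 0.95) : 0,
      p99: n ? pct(durations, 0.99) : 0,
      max: n ? durations[n - 1] : 0,
    },
  };
}
```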

Memory is not the bottleneck

Even with the new caching, the Lambda peaks at 253 MB / 1,024 MB — ~25 % utilisation. The unstable_cache setup in cachedItems.ts (HARD_MAX_ITEMS = 15_000, stripped + mapped copy per tenant) fits comfortably. The cache could roughly double in size without pushing memory limits.

CPU is very likely a bottleneck

At 1,024 MB, Lambda allocates ~0.58 vCPU. Three independent signals:

Cold-start Init Duration averages 2.35 s. Next.js module load is CPU-bound, so more CPU shortens it, up to the 1-vCPU point beyond which single-threaded JS stops benefiting.
8.1 % cold-start rate: at 2,355 ms per cold start, that averages ~190 ms per invocation (1,891 × 2,355 ms ÷ 23,292), equivalent to ~10 % of the 1,826 ms average Duration.
query-ssrm/route.ts runs applyFilters + applySorting over up to 15,000 in-process items on every scroll request, and on the first block also builds filter-option sets across all rows. This is pure CPU on the request hot path.

Recommendation: bump `memorySize`

Lambda pricing is per GB-second (billed in 1 ms increments), so the cost increase is small if duration drops proportionally (the expected outcome for CPU-bound work).

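Target `memorySize`	Approx vCPU	Expected effect
1,769 MB	1.00 vCPU	Smallest size that gives a full vCPU; cheapest defensible upgrade. Single-threaded JS (module load, `applyFilters` / `applySorting`) can run up to ~1.7× faster than at ~0.58 vCPU
2,048 MB	~1.16 vCPU	Little extra for single-threaded JS over 1,769 MB; the additional fraction mainly helps GC and other background threads
3,072 MB	~1.74 vCPU	Same ~1.7× single-thread ceiling as 1,769 MB; the extra capacity only serves background work

Recommend starting at 1,769 MB as the smallest size that crosses the 1-vCPU line, and re-measuring the same metrics over 24 h. If cold-start duration drops by ≥40 % (close to the ~42 % ceiling for purely CPU-bound work) and the BFF p95 contributions drop proportionally, hold there. Stepping to 3,072 MB is only worth trying if background threads turn out to matter, since work on the main JS thread cannot use more than one vCPU.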
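The vCPU column follows from AWS’s documented scaling (1,769 MB corresponds to one vCPU). Because JavaScript on Node’s main thread can use at most one core, the speed-up for single-threaded work is also bounded. A minimal arithmetic sketch; the helper names are hypothetical, and the single-thread bound ignores background threads (GC, compilation) that can use extra cores.

```typescript
const MB_PER_VCPU = 1769; // AWS Lambda: 1,769 MB of memory ≈ one vCPU

// Approximate vCPU share for a memory setting (CPU scales linearly with memory).
function vcpuFor(memoryMb: number): number {
  return memoryMb / MB_PER_VCPU;
}

// Ceiling on speed-up for work on Node's single JS thread: it can use at most
// one vCPU, so capacity beyond 1.0 does not make that work faster.
function singleThreadSpeedupBound(fromMb: number, toMb: number): number {
  return Math.min(1, vcpuFor(toMb)) / Math.min(1, vcpuFor(fromMb));
}

// Lambda compute is billed in GB-seconds: memory (GB) x duration (s).
function gbSeconds(memoryMb: number, durationMs: number): number {
  return (memoryMb / 1024) * (durationMs / 1000);
}

for (const mb of [1024, 1769, 2048, 3072]) {
  const vcpu = vcpuFor(mb).toFixed(2);
  const bound = singleThreadSpeedupBound(1024, mb).toFixed(2);
  console.log(`${mb} MB: ${vcpu} vCPU, single-thread speed-up at most ${bound}x vs 1,024 MB`);
}
```

For example, `gbSeconds(1024, 1826)` and `gbSeconds(1769, 1826 / 1.7275)` both come out at ~1.83 GB-s: that is the “cost is flat if duration drops proportionally” case from the pricing note.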

Caveats

Max Memory Used is per-invocation, not a steady-state cache size. Persistent caches are counted because they’re resident — 253 MB max already includes the cache.
The cache is per-Lambda-container. Each cold container starts empty and pays full upstream cost on first request. Higher cold-start rate → lower cache hit ratio. Bumping vCPU shortens cold-start time and helps cache amortisation in turn.
