
PDEV-442 — First-pass findings: product slow responses

Linear: PDEV-442
Session: 2026-05-13 (UTC), tenant Arda-live on https://live.app.arda.cards
Tooling: Chrome via claude-in-chrome MCP for in-session capture; Sentry MCP (arda-systems / arda-frontend) for 24h fleet aggregates.

[!important] Amplify SSR compute is not configurable. The “Amplify Lambda CPU / memory posture” section at the end of this document recommends raising the SSR Lambda memorySize from the configured 1,024 MB to 1,769 MB (then 3,072 MB if needed) and notes cold-start mitigation via provisioned concurrency. These recommendations are not actionable: AWS Amplify Hosting exposes no IaC, CLI, or console path to configure SSR runtime memory size, vCPU, reserved concurrency, or provisioned concurrency. Verified against the full AWS::Amplify::App CloudFormation property list and Amplify Hosting documentation. AWS::Amplify::App.CacheConfig exists as a configurable property, but its only meaningful alternative value (AMPLIFY_MANAGED_NO_COOKIES) would cross-serve cached SSR responses across authenticated users, a correctness regression in our multi-tenant app. The default AMPLIFY_MANAGED is therefore the only safe choice, and there is no useful tuning available there either.

Treat the compute-side recommendations as informational only; actionable performance work for the front end lives in PDEV-489 (ISR adoption, bundle reduction, N+1 elimination, BFF parallelisation). Migrating SSR off Amplify Hosting onto a custom Lambda + CloudFront stack would unlock these levers but is out of scope for the slow-responses project.

The single largest source of slowness is the /items page. The table data itself loads fast (~440 ms via /api/arda/items/query-ssrm), but the page then fans out two requests per row to the kanban service to fetch card data. With the test tenant’s 6 items that is 12 extra requests, each taking 1.2–1.7 s on a warm session. Sentry shows the same call pattern degrading badly at fleet scale: the proxy route GET /api/arda/kanban/kanban-card/query-by-item averages 38 s with a p95 of 158.6 s over 2,921 hits in the last 24 h, and the /items pageload transaction lands at p95 19.6 s (vs. 2.4 s for /order-queue).

Order Queue, by contrast, looks healthy: it does 3 batched calls (/api/arda/kanban/kanban-card/details/{requested,requesting,in-process}), ~390–465 ms each.

| Step | URL | Notes |
|---|---|---|
| Initial nav | /items?justSignedIn=true | Session already authenticated from prior Chrome run |
| Reload | /items | Baseline timings (cold-ish, after session warm-up) |
| Click Order Queue | /order-queue | Snapshot post-nav timings |
| Click Items | /items | Confirm the per-row fan-out reproduces on every entry |
| Filter | /items (filter combobox + Enter) | Single /items/query-ssrm re-fetch |
| Hard reload | /items (Cmd+Shift+R) | Cold-cache timings |
| 4-visit aggregate | /order-queue → /items, × 4 | Per-call timings, browser-side, with payload sizes |

Clicking the column-header sort buttons produced no visible network activity; the click likely opens a sort-options menu rather than sorting directly. Worth a deeper look in a follow-up.

All durations are wall-clock from the browser’s resource-timing API.

/items reload: 22 API calls, 12 of them from the per-row fan-out

| Endpoint | Count | Avg ms | Max ms |
|---|---:|---:|---:|
| /api/arda/kanban/kanban-card/query-details-by-item | 6 | 1,706 | 1,801 |
| /api/arda/kanban/kanban-card/query-by-item | 6 | 1,535 | 1,721 |
| /api/arda/tenant/agent-for/query | 1 | 3,144 | 3,144 |
| /api/arda/tenant/{tenantId} | 1 | 3,029 | 3,029 |
| /api/arda/kanban/kanban-card/details/requesting | 1 | 1,120 | 1,120 |
| /api/arda/kanban/kanban-card/details/in-process | 1 | 1,060 | 1,060 |

Notes:

  • The two per-row endpoints, query-details-by-item and query-by-item, each fire one request per row, in parallel, starting ~777 ms after navigation. With 6 rows the page absorbs the cost; with 60 rows we would expect the upstream pool to saturate, which is consistent with the fleet-wide Sentry data below.
  • /tenant/agent-for/query and /tenant/{tenantId} are session-scoped lookups; on this reload they ran in parallel, took ~3 s each, and appear to block first paint.
  • DOMContentLoaded was 248 ms; the user-perceived wait is dominated by the per-row data fetch, not the document or bundle.
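
To see why 6 rows are tolerable and 60 are not, consider a crude model: if the upstream can only work on a limited number of these calls at once, N per-row requests complete in waves. The pool size (`slots = 12`) and the flat 1.5 s per-call latency below are assumptions chosen for illustration, not values measured in this session; `fanOutWallClockMs` is a hypothetical helper.

```typescript
// Back-of-the-envelope model of a per-row fan-out (illustrative only).
// Assumptions, not measurements: every call takes the same latencyMs, and
// the upstream serves at most `slots` calls at a time (hypothetical limit).
function fanOutWallClockMs(
  rows: number,
  requestsPerRow: number,
  slots: number,
  latencyMs: number,
): number {
  const totalRequests = rows * requestsPerRow;
  // Requests beyond the available slots wait for an earlier wave to finish.
  const waves = Math.ceil(totalRequests / slots);
  return waves * latencyMs;
}

// 6-row test tenant: 12 requests fit in one wave of a 12-slot pool.
const sixRows = fanOutWallClockMs(6, 2, 12, 1_500);
// 60 rows against the same hypothetical pool: 120 requests, 10 waves.
const sixtyRows = fanOutWallClockMs(60, 2, 12, 1_500);
console.log(`6 rows: ${sixRows} ms, 60 rows: ${sixtyRows} ms`);
// → 6 rows: 1500 ms, 60 rows: 15000 ms
```

Under these assumptions the 60-row page needs ten waves (~15 s) instead of one, before counting any rise in per-call latency as the upstream gets busier.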

Items → Order Queue → Items (re-entry)

| Page | Total API calls | Notable |
|---|---:|---|
| /order-queue | 3 | All kanban-card/details/{state}, 389–465 ms |
| /items (re-entry) | 13 | 1 query-ssrm at 444 ms; query-details-by-item avg 1,518 ms; query-by-item avg 1,186 ms |

Per-row fan-out reproduces deterministically on every entry to /items — the results are not cached across the navigation. Order Queue does not exhibit this pattern: it consolidates by lane.

Hard reload (cold cache)

A hard reload looks worse than the warm case, mainly because the per-row fan-out runs slower against a cold session:

| Metric | Value |
|---|---|
| TTFB | 27 ms |
| First Contentful Paint | 532 ms |
| DOMContentLoaded | 167 ms |
| load event | 580 ms |
| Scripts loaded | 34 files, 1.65 MB transferred / 6.06 MB decoded |
| Stylesheets | 13 files, 173 KB transferred / 898 KB decoded |
| Last API completion (table fully populated) | ~4,976 ms |

Per-row fan-out timings (cold):

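| Endpoint | Count | Avg ms | Max ms | Start–End window (ms) |
|---|---:|---:|---:|---|
| kanban-card/query-details-by-item | 6 | 2,887 | 3,985 | 990 → 4,976 |
| kanban-card/query-by-item | 6 | 2,756 | 3,958 | 990 → 4,950 |
| kanban-card/details/{state} × 3 | 3 | 444–779 | 779 | 786 → 1,564 |
| tenant/query | 1 | 229 | 229 | n/a |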

FCP is reached at 532 ms but the table’s “card data” columns remain in skeleton state for ~5 seconds, matching the visible “Loading card data…” / “Checking for available cards…” placeholders in the initial DOM snapshot.

Bundle size is large (6 MB decoded JS, 34 script files) but on this network it is not the dominant contributor — the cold wait is the per-row N+1.

Four-visit aggregate to /items (instrumented Chrome)

Methodology: navigate /order-queue → clear resource timings → navigate /items → wait 8 s → snapshot every /api/* resource (startTime, duration, transferSize). Repeated 4 times in sequence (visits A, B, C, D). The tenant has 6 items, so per-row endpoints fire 6 times per visit.
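
The snapshot step can be sketched as a small aggregation over resource-timing samples. This is an illustration, not the script used in the session: `ApiSample`, `apiPath`, and `aggregateByEndpoint` are hypothetical names, and the samples are plain objects so the aggregation runs outside a browser.

```typescript
// Sketch of the per-visit snapshot: group /api/* resource-timing samples
// by path and compute min / avg / max duration and average payload.
// ApiSample mirrors the PerformanceResourceTiming fields used here.
interface ApiSample {
  name: string; // full request URL
  startTime: number; // ms since navigation start
  duration: number; // ms
  transferSize: number; // bytes over the wire
}

interface EndpointStats {
  samples: number;
  minMs: number;
  avgMs: number;
  maxMs: number;
  avgBytes: number;
}

function apiPath(url: string): string {
  // Drop scheme + host and any query string: "https://host/api/x?y" -> "/api/x"
  const withoutOrigin = url.replace(/^https?:\/\/[^/]+/, "");
  return withoutOrigin.split("?")[0] ?? withoutOrigin;
}

function aggregateByEndpoint(samples: ApiSample[]): Map<string, EndpointStats> {
  const groups = new Map<string, ApiSample[]>();
  for (const s of samples) {
    const path = apiPath(s.name);
    if (!path.startsWith("/api/")) continue; // skip scripts, styles, images
    const list = groups.get(path) ?? [];
    list.push(s);
    groups.set(path, list);
  }
  const stats = new Map<string, EndpointStats>();
  groups.forEach((list, path) => {
    const durations = list.map((s) => s.duration);
    const totalMs = durations.reduce((a, b) => a + b, 0);
    const totalBytes = list.reduce((a, s) => a + s.transferSize, 0);
    stats.set(path, {
      samples: list.length,
      minMs: Math.min(...durations),
      avgMs: totalMs / list.length,
      maxMs: Math.max(...durations),
      avgBytes: totalBytes / list.length,
    });
  });
  return stats;
}
```

In the page, `performance.clearResourceTimings()` before navigating and `performance.getEntriesByType("resource")` after the 8 s wait yield entries (cast to `PerformanceResourceTiming[]`) that carry `name`, `startTime`, `duration` and `transferSize`, so they can be passed in directly.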

| Endpoint | Calls/visit | Samples | Min ms | Avg ms | Max ms | Avg payload (bytes) |
|---|---:|---:|---:|---:|---:|---:|
| /api/arda/kanban/kanban-card/query-details-by-item | 6 | 24 | 883 | 2,032 | 3,471 | 3,271 |
| /api/arda/kanban/kanban-card/query-by-item | 6 | 24 | 625 | 1,942 | 3,434 | 1,995 |
| /api/arda/kanban/kanban-card/details/requesting | 1 | 4 | 884 | 1,852 | 2,471 | 2,702 |
| /api/arda/kanban/kanban-card/details/requested | 1 | 4 | 807 | 1,726 | 2,329 | 908 |
| /api/arda/tenant/query | 1 | 4 | 202 | 1,653 | 3,145 | 1,718 |
| /api/arda/kanban/kanban-card/details/in-process | 1 | 4 | 801 | 1,519 | 2,136 | 920 |
| /api/arda/items/query-ssrm | 1 | 4 | 103 | 204 | 250 | 7,920 |
| /api/arda/tenant/{tenantId} | 1 | 4 | 137 | 204 | 266 | 998 |
| /api/storage/cdn-cookies | 1 | 4 | 104 | 194 | 302 | 300 |
| /api/arda/user-account/query | 1 | 4 | 148 | 174 | 209 | 1,739 |
| /api/arda/tenant/agent-for/query | 1 | 4 | 140 | 156 | 162 | 1,545 |
| /api/pylon/email-hash | 1 | 4 | 96 | 104 | 116 | 391 |

Per-row fan-out, broken down by visit (avg / max ms across the 6 calls):

| Endpoint | Visit A | Visit B | Visit C | Visit D |
|---|---|---|---|---|
| kanban-card/query-details-by-item | 1,564 / 3,471 | 2,263 / 2,520 | 1,797 / 1,845 | 2,504 / 2,572 |
| kanban-card/query-by-item | 2,298 / 3,434 | 1,856 / 2,278 | 1,489 / 1,584 | 2,125 / 2,368 |

Notes from this aggregate:

  • The N+1 fan-out reproduced on all four visits: every entry to /items issued exactly 6 + 6 per-row kanban requests, regardless of prior navigation. No caching across navigations.
  • The two per-row endpoints sit at ~2 s avg, with the slowest individual calls reaching 3.4–3.5 s (both in visit A). With 6 rows the slowest call in each batch took at most ~3.5 s; with 60 rows we would expect the upstream pool to saturate.
  • tenant/query shows the expected long-tail behavior at the browser level: avg 1.65 s but min 0.2 s and max 3.15 s — same shape as Sentry’s BFF transaction view (huge avg/p75 gap on tenant lookups).
  • items/query-ssrm (the actual table data) is consistently fast (avg 204 ms) — the slowness is not in fetching the rows, it’s the per-row card data.
  • Total payload across the 22 calls per visit is ~51 KB (calls/visit × avg payload, summed over the table above), so this is not a bandwidth problem.
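
The per-visit payload total can be recomputed from the Calls/visit and Avg payload columns of the four-visit table. The averages are transfer sizes, so this estimates bytes over the wire, not decoded size; the array below simply restates the table.

```typescript
// Per-visit payload from the four-visit table:
// sum over endpoints of (calls per visit × average payload in bytes).
const perVisit: Array<[string, number, number]> = [
  ["kanban-card/query-details-by-item", 6, 3_271],
  ["kanban-card/query-by-item", 6, 1_995],
  ["kanban-card/details/requesting", 1, 2_702],
  ["kanban-card/details/requested", 1, 908],
  ["tenant/query", 1, 1_718],
  ["kanban-card/details/in-process", 1, 920],
  ["items/query-ssrm", 1, 7_920],
  ["tenant/{tenantId}", 1, 998],
  ["storage/cdn-cookies", 1, 300],
  ["user-account/query", 1, 1_739],
  ["tenant/agent-for/query", 1, 1_545],
  ["pylon/email-hash", 1, 391],
];

const totalCalls = perVisit.reduce((sum, [, calls]) => sum + calls, 0);
const totalBytes = perVisit.reduce((sum, [, calls, bytes]) => sum + calls * bytes, 0);
console.log(`${totalCalls} calls, ${totalBytes} bytes (~${(totalBytes / 1000).toFixed(1)} KB)`);
// → 22 calls, 50737 bytes (~50.7 KB)
```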

Filter on items (typed “card”, pressed Enter)

| Endpoint | Count | Avg ms |
|---|---:|---:|
| /api/arda/items/query-ssrm | 1 | 438 |

No kanban-card fan-out re-fired — likely the filter returned a subset already in the row-data cache, or the cards’ react-query keys deduped the refetch. To verify whether the fan-out repeats on every filter that returns new rows, re-run with a search that surfaces rows not yet seen in the session.
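
If the react-query explanation holds, the mechanism is the one sketched below: a lookup whose key was already fetched in this session returns the cached result instead of issuing a request, so a filter that only re-shows known rows triggers no fan-out, while a filter that surfaces unseen rows would. This is a simplified illustration, not the app's code; `KeyedQueryCache` and the key format are hypothetical.

```typescript
// Simplified stand-in for a keyed query cache, the role react-query plays
// when rows share query keys. Illustrative only: no staleness, eviction,
// retries, or error handling, and the key format is hypothetical.
type Fetcher<T> = (itemId: string) => Promise<T>;

class KeyedQueryCache<T> {
  private readonly entries = new Map<string, Promise<T>>();
  private readonly fetcher: Fetcher<T>;

  constructor(fetcher: Fetcher<T>) {
    this.fetcher = fetcher;
  }

  // Same key -> same promise: only the first lookup hits the network.
  get(itemId: string): Promise<T> {
    const key = `kanban-card/query-by-item:${itemId}`;
    const existing = this.entries.get(key);
    if (existing) return existing;
    const request = this.fetcher(itemId);
    this.entries.set(key, request);
    return request;
  }
}

let networkCalls = 0;
const cache = new KeyedQueryCache<string>(async (itemId) => {
  networkCalls += 1;
  return `cards for ${itemId}`;
});

// Rows a, b, c render first; a filter then re-renders a and b only.
const firstRender = ["a", "b", "c"].map((id) => cache.get(id));
const afterFilter = ["a", "b"].map((id) => cache.get(id));
```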

Sentry corroboration (last 24 h, arda-frontend)

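| Transaction | Count | Avg | p75 | p95 |
|---|---:|---:|---:|---:|
| /items | 25 | 5.2 s | 8.5 s | 19.6 s |
| /signin | 32 | 4.0 s | 4.5 s | 10.6 s |
| /signup | 10 | 3.9 s | 5.8 s | 5.8 s |
| /order-queue | 10 | 2.1 s | 2.4 s | 2.4 s |
| /print-viewer | 150 | 1.7 s | 2.4 s | 2.6 s |
| /reset-password | 5 | 1.8 s | 1.8 s | 1.8 s |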

/items is the slowest pageload in the product and the gap to /order-queue (2.4 s p95) is roughly 8×.

Slowest backend proxy transactions (Next.js route handlers)

| Transaction | Count | Avg | p75 | p95 |
|---|---:|---:|---:|---:|
| GET /api/arda/kanban/kanban-card/query-by-item | 2,921 | 38 s | 16.6 s | 158.6 s |
| GET /api/arda/kanban/kanban-card/[eId] | 40 | 1.4 s | 2.5 s | 5.8 s |
| GET /api/arda/items/lookup-locations | 50 | 1.2 s | 0.3 s | 8.1 s |
| GET /api/arda/tenant/[tenantId] | 171 | 4.1 s | 0.15 s | 0.40 s |

Two distinct signals:

  1. query-by-item is genuinely slow end-to-end (38 s avg, 158.6 s p95). The top http.client spans inside it are upstream calls to prod.alpha001.io.arda.cards/v1/kanban/kanban-card/for-item/{itemId}; the slowest per-itemId span groups average 8–17 s, against 4.0 s avg / 13.5 s p95 for the endpoint as a whole. The frontend is faithfully reporting upstream pain.
  2. /api/arda/tenant/[tenantId] has a huge avg/p75 gap (4.1 s avg, 0.15 s p75): long-tail pathological cases hiding behind a healthy median. On the reload measured in this session, the tenant/{tenantId} and tenant/agent-for/query lookups each took ~3 s.

BFF outbound calls (http.client to prod.alpha001.io.arda.cards)

The same span data, but aggregated by the parent BFF route, isolates how much time is spent in the upstream operations service vs. inside the BFF itself.

| BFF route (parent transaction) | Count | Avg | p75 | p95 |
|---|---:|---:|---:|---:|
| POST /api/arda/kanban/kanban-card/query-details-by-item | 875 | 5.5 s | 7.9 s | 13.7 s |
| GET /api/arda/kanban/kanban-card/query-by-item | 2,845 | 4.0 s | 4.8 s | 13.5 s |
| POST /api/arda/kanban/kanban-card/print-card | 30 | 3.2 s | 3.7 s | 4.9 s |
| POST /api/arda/items/query-ssrm | 250 | 0.52 s | 0.79 s | 1.84 s |
| POST /api/arda/item/item/print-breadcrumb | 5 | 1.22 s | 1.22 s | 1.22 s |
| POST /api/arda/kanban/kanban-card/details/requesting | 1,040 | 0.55 s | 0.55 s | 0.97 s |
| POST /api/arda/kanban/kanban-card/details/in-process | 1,120 | 0.47 s | 0.46 s | 0.89 s |
| POST /api/arda/kanban/kanban-card/details/requested | 1,085 | 0.44 s | 0.42 s | 0.79 s |
| POST /api/arda/items | 25 | 0.34 s | 0.37 s | 0.41 s |
| POST /api/arda/kanban/kanban-card/details/fulfilled | 5 | 0.21 s | 0.21 s | 0.21 s |
| POST /api/image-upload | 20 | 0.17 s | 0.19 s | 0.19 s |
| GET /api/arda/items/lookup-units | 50 | 0.13 s | 0.16 s | 0.24 s |
| GET /api/arda/items/lookup-suppliers | 45 | 0.13 s | 0.14 s | 0.19 s |
| GET /api/arda/items/lookup-locations | 50 | 0.09 s | 0.11 s | 0.16 s |
| POST /api/arda/kanban/kanban-card/[eId]/event/request | 35 | 0.09 s | 0.09 s | 0.22 s |
| GET /api/arda/tenant/[tenantId] | 175 | 0.06 s | 0.07 s | 0.14 s |
| POST /api/arda/user-account/query | 160 | 0.07 s | 0.08 s | 0.12 s |
| POST /api/arda/tenant/agent-for/query | 195 | 0.06 s | 0.08 s | 0.12 s |
| POST /api/arda/tenant/query | 180 | 0.06 s | 0.06 s | 0.10 s |
| GET /api/arda/kanban/kanban-card/[eId] | 40 | 0.08 s | 0.07 s | 0.31 s |
| PUT /api/arda/user-account/[eId] | 10 | 0.09 s | 0.11 s | 0.11 s |
| GET /api/arda/items/[entityId] | 10 | 0.10 s | 0.11 s | 0.11 s |
| POST /api/arda/kanban/kanban-card/[eId]/event/accept | 15 | 0.08 s | 0.10 s | 0.10 s |
| POST /api/arda/kanban/kanban-card/[eId]/event/start-processing | 10 | 0.08 s | 0.09 s | 0.09 s |

Outbound calls aggregated by upstream URL pattern

Sentry’s auto-grouping does not normalize UUIDs in span.description, so the raw per-span.description view gives one row per itemId. The table below is the same data re-aggregated by upstream URL pattern using wildcard queries, so it directly answers “how slow is each upstream endpoint regardless of which BFF route invoked it”.
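
The same re-aggregation can be done offline on exported span data by normalizing UUID path segments before grouping. A minimal sketch follows; the `Span` shape is a simplified assumption rather than Sentry's export schema, the helper names are hypothetical, and it uses nearest-rank percentiles, which may differ slightly from Sentry's.

```typescript
// Re-aggregate http.client spans by URL pattern, replacing UUID path
// segments with "{id}" so per-itemId rows collapse into one row.
// The Span shape is a simplified assumption, not Sentry's export schema.
interface Span {
  description: string; // e.g. "GET https://host/v1/kanban/kanban-card/for-item/<uuid>"
  durationMs: number;
}

interface PatternRow {
  pattern: string;
  count: number;
  avgMs: number;
  p75Ms: number;
  p95Ms: number;
}

const UUID_RE = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/gi;

function urlPattern(description: string): string {
  return description
    .replace(/https?:\/\/[^/\s]+/, "") // drop scheme + host
    .replace(/\?\S*$/, "") // drop a trailing query string
    .replace(UUID_RE, "{id}");
}

// Nearest-rank percentile over an ascending array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[Math.min(rank, sorted.length) - 1] ?? NaN;
}

function aggregateByPattern(spans: Span[]): PatternRow[] {
  const groups = new Map<string, number[]>();
  for (const span of spans) {
    const key = urlPattern(span.description);
    const list = groups.get(key) ?? [];
    list.push(span.durationMs);
    groups.set(key, list);
  }
  const rows: PatternRow[] = [];
  groups.forEach((durations, pattern) => {
    durations.sort((a, b) => a - b);
    const total = durations.reduce((a, b) => a + b, 0);
    rows.push({
      pattern,
      count: durations.length,
      avgMs: total / durations.length,
      p75Ms: percentile(durations, 75),
      p95Ms: percentile(durations, 95),
    });
  });
  return rows.sort((a, b) => b.p95Ms - a.p95Ms); // slowest first
}
```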

| Upstream URL pattern | Count | Avg | p75 | p95 |
|---|---:|---:|---:|---:|
| GET /v1/kanban/kanban-card/for-item/{itemId} | 2,846 | 4.0 s | 4.8 s | 13.5 s |
| POST /v1/kanban/kanban-card/details | 875 | 5.5 s | 7.9 s | 13.7 s |
| POST /v1/kanban/kanban-card/details/in-process | 1,120 | 0.47 s | 0.46 s | 0.89 s |
| POST /v1/kanban/kanban-card/details/requested | 1,085 | 0.44 s | 0.42 s | 0.79 s |
| POST /v1/kanban/kanban-card/details/requesting | 1,040 | 0.55 s | 0.55 s | 0.97 s |
| POST /v1/tenant/tenant/query | 180 | 0.06 s | 0.06 s | 0.10 s |
| GET /v1/tenant/tenant/{tenantId} (sum) | ~175 | 0.05–0.14 s | n/a | ≤0.15 s |

Two upstream endpoints stand out clearly: for-item/{itemId} and kanban-card/details (the POST variant fed by query-details-by-item). Both run at p95 ~13.5 s and account for essentially all of the genuine upstream-side slowness.

BFF transaction vs. upstream http.client — where does the time go?

The BFF transaction totals (in “Slowest backend proxy transactions” above) are far higher than the upstream call durations, even though the counts are essentially 1:1 (i.e., the BFF is not looping or retrying):

| Route | BFF transactions | Upstream calls | Ratio |
|---|---:|---:|---:|
| query-by-item | 2,921 | 2,846 | 0.97 |
| tenant/[tenantId] | 171 | ~175 | ≈1.0 |
| lookup-locations | 50 | 50 | 1.0 |

| Route | BFF transaction p95 | Upstream http.client p95 | Unexplained gap |
|---|---|---|---|
| GET /api/arda/kanban/kanban-card/query-by-item | 158.6 s | 13.5 s | ~145 s inside BFF |
| GET /api/arda/kanban/kanban-card/[eId] | 5.8 s | 0.31 s | ~5.5 s inside BFF |
| GET /api/arda/items/lookup-locations | 8.1 s | 0.16 s | ~7.9 s inside BFF |
| GET /api/arda/tenant/[tenantId] | 0.40 s (p95); 4.1 s avg | 0.14 s | Most of the avg is inside BFF |

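The gap is therefore not explained by repeated upstream queries. Subtracting p95s only approximates it, since percentiles of two distributions do not subtract; means over ~1:1 counts do, and on query-by-item they give 38 s per BFF transaction against 4.0 s upstream, so roughly 34 s per request is spent inside the BFF. Given that the app is served by AWS Amplify Hosting with a serverless Next.js back end, the strongest candidates are: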

  1. Lambda cold starts. Sentry’s Node SDK transaction may begin early enough in a cold invocation to include Next.js server-bundle load and middleware, and possibly Lambda init. That would fit the 5–8 s gap on otherwise-cheap routes (kanban-card/[eId], lookup-locations, tenant/[tenantId]) that have no slow upstream call. Two caveats: the 6 MB decoded bundle measured above is the client bundle, not the server bundle, and the CloudWatch REPORT data in the last section puts cold-start Init Duration at 2.4 s avg / 3.4 s max, so cold starts alone likely explain only part of a 5–8 s gap.
  2. Per-container request queueing / concurrency cap. If Amplify SSR Compute caps concurrent requests per Lambda container, new requests would wait for an idle slot, and that wait could land inside the BFF transaction span. Combined with slow upstream calls, the waits would stack and could explain the ~145 s p95 gap on query-by-item.
  3. Aborted / errored transactions that never reached upstream. The 2,921 → 2,846 inbound/outbound delta on query-by-item is 75 transactions (~2.6%). Hard timeouts or auth failures before fetch() would sit in the BFF transaction tail without contributing to the upstream p95.
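
Hypothesis 1 can be tested directly by tagging each BFF transaction as cold or warm and comparing the two latency distributions. A common pattern, sketched below, is a module-scope flag; `markInvocation` is a hypothetical helper, and because Next.js may load route modules lazily, the flag really means "first request to this route in this container".

```typescript
// Module scope runs once per loaded module instance, so a module-level flag
// is true only for the first request this instance serves. In a Lambda
// container that approximates "cold start" for the route that loads it.
let firstInvocation = true;
const moduleLoadedAt = Date.now();

interface InvocationInfo {
  coldStart: boolean;
  msSinceModuleLoad: number;
}

function markInvocation(now: number = Date.now()): InvocationInfo {
  const info: InvocationInfo = {
    coldStart: firstInvocation,
    msSinceModuleLoad: now - moduleLoadedAt,
  };
  firstInvocation = false; // every later call in this container is warm
  return info;
}

// In a route handler: const { coldStart } = markInvocation();
// then attach coldStart to the request's telemetry (e.g. as a tag).
```

With `@sentry/nextjs`, calling `Sentry.setTag("cold_start", String(coldStart))` inside the handler would make the cold/warm split filterable in the transaction view.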

So slowness is two-source, not “the BFF loops”:

  1. Operations is genuinely slow for for-item/{itemId} and kanban-card/details — upstream p95 ~13.5 s per call.
  2. Amplify SSR cold starts and queueing appear to add a 5–8 s baseline on cold routes and stretch the tail on already-slow routes.

Recommended actions, highest expected impact and lowest effort first:

  1. Eliminate the per-row fan-out on /items. The query-ssrm endpoint should return enough kanban data (counts, status, the few fields shown in the row) for the table’s “card data” columns to render without a second round-trip. If a richer per-row payload is too heavy, add a single batched kanban-card/query-details-for-items?ids=... endpoint so the table can issue exactly one extra request, not N.
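  2. Make the tenant/session lookups run in parallel with the table load instead of gating it. On the reload measured above, /tenant/agent-for/query and /tenant/{tenantId} each took ~3 s (in parallel), adding ~3 s of head-of-line latency; the four-visit aggregate shows them faster on average (~0.2 s), but tenant/query still reached 3.15 s.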
  3. Investigate the upstream kanban-card/for-item/{itemId} operation in the operations component. Sentry’s frontend spans put it at 4.0 s avg / 13.5 s p95 overall, with the slowest per-itemId span groups averaging 8–17 s. Since operations is not yet instrumented in Sentry, the next step is CloudWatch / kubectl logs for this endpoint to look for slow queries, sequential DB reads, or an N+1 inside the service.
  4. Look at /api/arda/items/lookup-locations: p95 8.1 s over 50 calls, while its upstream p95 is only 0.16 s, so the tail sits inside the BFF (cold starts or a cold in-process cache) rather than upstream.
  5. Confirm filter/sort behavior in the items table. Sort header click produced no network activity; need to check whether AG Grid is using client-side sort (acceptable for SSRM-resident pages) or whether the sort request is dropped silently.
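
Recommendation 1, seen from the client, amounts to the sketch below: collect the visible rows’ item IDs, make one batched request, and group the results by item so each row still gets its own card list. The card fields and the fetcher signature are hypothetical; the batched endpoint is the one proposed in recommendation 1, not an existing API.

```typescript
// Hypothetical shapes: the real kanban-card payload is not shown here.
interface KanbanCardSummary {
  itemId: string;
  cardId: string;
  status: string;
}

// Stand-in for one call to a batched endpoint such as the proposed
// kanban-card/query-details-for-items?ids=... (not an existing API).
type BatchFetcher = (itemIds: string[]) => Promise<KanbanCardSummary[]>;

// One request for all visible rows instead of one or two per row.
async function loadCardsForRows(
  itemIds: string[],
  fetchBatch: BatchFetcher,
): Promise<Map<string, KanbanCardSummary[]>> {
  const unique = Array.from(new Set(itemIds));
  const byItem = new Map<string, KanbanCardSummary[]>();
  for (const id of unique) byItem.set(id, []); // rows with no cards still render
  if (unique.length === 0) return byItem;
  const cards = await fetchBatch(unique);
  for (const card of cards) byItem.get(card.itemId)?.push(card);
  return byItem;
}
```

The same shape covers recommendation 2: start the tenant lookups and `query-ssrm` together (for example with `Promise.all`) instead of awaiting the lookups first.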

Open questions:

  • Are the per-row kanban calls a deliberate progressive-render pattern, or an accidental fan-out introduced by a useQuery hook inside the row renderer?
  • Is the upstream kanban-card/for-item/{itemId} route paged, or does it return every card for an item every time? Tenants with high card counts per item would compound the slowness multiplicatively.
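  • Why is the proxy route’s p95 (158.6 s) so much higher than the underlying upstream span’s p95 (13.5 s)? The ~1:1 ratio of BFF transactions to upstream calls argues against serial calls in the route handler or retried upstream timeouts; the remaining candidates are cold starts, request queueing in front of or inside the Next.js server, and transactions that abort before fetch().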

Proposed next steps:

  • Operations-side dig (CloudWatch + kubectl) on the upstream kanban-card/for-item/{itemId} route, since arda-frontend Sentry already shows it is one of the two dominant contributors (with kanban-card/details) and operations isn’t instrumented.
  • Code dig in arda-frontend-app for the items page row renderer to confirm the source of the per-row fan-out and prototype a batched endpoint.
  • Second observation pass with a larger tenant to confirm scaling behavior (the 6-row tenant masks the worst case).

Amplify Lambda CPU / memory posture (24 h, 23,292 invocations)

Sampled from REPORT lines in /aws/amplify/duhexavnwh88g via CloudWatch Logs Insights:

| Metric | Value |
|---|---|
| Configured MemorySize | 1,024 MB |
| Allocated vCPU (proportional to memory) | ~0.58 vCPU |
| Max Memory Used (max seen) | 253 MB (≈25% of budget) |
| Max Memory Used (p99) | 214 MB |
| Max Memory Used (avg) | 193 MB |
| Cold-start rate | 1,891 / 23,292 = 8.1% |
| Cold-start Init Duration (avg) | 2,355 ms |
| Cold-start Init Duration (max) | 3,366 ms |
| Duration (avg) | 1,826 ms |
| Duration (p95) | 6,612 ms |
| Duration (p99) | 10,419 ms |
| Duration (max) | 19,033 ms |
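
The figures above came from a Logs Insights query over the REPORT lines. For re-measuring after a change, an equivalent offline computation over exported raw REPORT lines is sketched below. It assumes the standard Lambda REPORT line layout (where `Init Duration` appears only on cold starts), the helper names are hypothetical, and it uses nearest-rank percentiles, which may differ slightly from Logs Insights' `pct()`.

```typescript
// Offline equivalent of the Logs Insights summary: parse Lambda REPORT
// lines and compute the same metrics.
interface Report {
  durationMs: number;
  memorySizeMb: number;
  maxMemoryUsedMb: number;
  initDurationMs: number | undefined; // present only on cold starts
}

function parseReport(line: string): Report | null {
  if (!line.startsWith("REPORT")) return null;
  // Longer labels first so "Billed Duration" / "Init Duration" are not read as "Duration".
  const re = /(Billed Duration|Init Duration|Max Memory Used|Memory Size|Duration): ([\d.]+)/g;
  const fields = new Map<string, number>();
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    fields.set(m[1] ?? "", Number(m[2]));
  }
  const duration = fields.get("Duration");
  const memorySize = fields.get("Memory Size");
  const maxUsed = fields.get("Max Memory Used");
  if (duration === undefined || memorySize === undefined || maxUsed === undefined) return null;
  return {
    durationMs: duration,
    memorySizeMb: memorySize,
    maxMemoryUsedMb: maxUsed,
    initDurationMs: fields.get("Init Duration"),
  };
}

function nearestRank(sorted: number[], p: number): number {
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[Math.min(rank, sorted.length) - 1] ?? NaN;
}

function summarize(lines: string[]) {
  const reports = lines.map(parseReport).filter((r): r is Report => r !== null);
  const cold = reports.filter((r) => r.initDurationMs !== undefined);
  const durations = reports.map((r) => r.durationMs).sort((a, b) => a - b);
  const initTotal = cold.reduce((s, r) => s + (r.initDurationMs ?? 0), 0);
  return {
    invocations: reports.length,
    coldStartRate: reports.length ? cold.length / reports.length : 0,
    avgInitMs: cold.length ? initTotal / cold.length : 0,
    maxMemoryUsedMb: Math.max(0, ...reports.map((r) => r.maxMemoryUsedMb)),
    p95DurationMs: nearestRank(durations, 95),
  };
}
```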

Even with the new caching, the Lambda peaks at 253 MB / 1,024 MB — ~25 % utilisation. The unstable_cache setup in cachedItems.ts (HARD_MAX_ITEMS = 15_000, stripped + mapped copy per tenant) fits comfortably. The cache could roughly double in size without pushing memory limits.

At 1,024 MB, Lambda allocates ~0.58 vCPU. Three independent signals suggest the SSR functions are CPU-constrained:

  1. Cold-start Init Duration averages 2.35 s, and Next.js module load is CPU-bound. More vCPU shortens this roughly proportionally, up to one full vCPU for single-threaded work.
  2. 8.1 % cold-start rate — frequent enough that the cold-start tax alone is a multi-percent contributor to the BFF latency distribution.
  3. query-ssrm/route.ts runs applyFilters + applySorting over up to 15,000 in-process items on every scroll request, and on the first block also builds filter-option sets across all rows. This is pure CPU on the request hot path.
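
The cost structure of that hot path is easiest to see in a stripped-down model: every block request filters and sorts the full in-memory list before slicing out one block. The types and the single text filter below are simplified, hypothetical stand-ins, loosely shaped like AG Grid's `startRow` / `endRow` / `sortModel` request fields; they are not the route's actual applyFilters / applySorting code.

```typescript
// Simplified model of serving one SSRM block from an in-memory item list.
interface Item {
  id: string;
  name: string;
  supplier: string;
}

interface SortSpec {
  colId: keyof Item;
  sort: "asc" | "desc";
}

interface BlockRequest {
  startRow: number;
  endRow: number;
  search: string; // stand-in for the real filter model
  sortModel: SortSpec[];
}

function serveBlock(items: Item[], req: BlockRequest): { rows: Item[]; lastRow: number } {
  // O(n): every block request re-filters the full list...
  const needle = req.search.toLowerCase();
  const filtered = items.filter((it) => it.name.toLowerCase().includes(needle));
  // ...and O(n log n): re-sorts it, even though only one block is returned.
  const sorted = filtered.slice().sort((a, b) => {
    for (const s of req.sortModel) {
      const cmp = a[s.colId].localeCompare(b[s.colId]);
      if (cmp !== 0) return s.sort === "asc" ? cmp : -cmp;
    }
    return 0;
  });
  return { rows: sorted.slice(req.startRow, req.endRow), lastRow: sorted.length };
}
```

With up to 15,000 items, each scroll request pays the full filter and sort before returning a small slice, which is why this work scales with CPU share rather than with the size of the response.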

Lambda bills compute in GB-seconds (metered per millisecond), so the cost increase is small if duration drops proportionally (the expected outcome for CPU-bound work).
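
The pricing argument is simple arithmetic, treating SSR compute like plain Lambda pricing. The sketch brackets it between two assumptions: an invocation that is entirely CPU-bound (duration scales with CPU share) and one that is entirely waiting on upstream I/O (duration unchanged). Real invocations sit in between; `gbSeconds` is a hypothetical helper.

```typescript
// Lambda compute cost is proportional to GB-seconds: memory (GB) × duration (s).
function gbSeconds(memoryMb: number, durationMs: number): number {
  return (memoryMb / 1024) * (durationMs / 1000);
}

// Average invocation today (CloudWatch table above): 1,024 MB × 1,826 ms.
const today = gbSeconds(1024, 1826);

// Best case at 1,769 MB: the whole invocation is CPU-bound, so duration
// shrinks by 1,024 / 1,769 and GB-seconds stay flat.
const cpuBound = gbSeconds(1769, 1826 * (1024 / 1769));

// Worst case: the invocation is all upstream I/O wait, duration is
// unchanged, and GB-seconds grow by the memory ratio (~1.73×).
const ioBound = gbSeconds(1769, 1826);

console.log(today.toFixed(3), cpuBound.toFixed(3), ioBound.toFixed(3));
```

Because the BFF spends much of its time waiting on operations, the real cost change is likely closer to the I/O-bound bound than the flat one.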

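| Target memorySize | Approx vCPU | Expected effect |
|---|---|---|
| 1,769 MB | 1.00 vCPU | Smallest size that gives a full vCPU; cheapest defensible upgrade |
| 2,048 MB | ~1.16 vCPU | ~2× the CPU share of 1,024 MB; meaningful speed-up on query-ssrm filter/sort, though single-threaded work gains little over 1,769 MB |
| 3,072 MB | ~1.74 vCPU | ~3× the CPU share of 1,024 MB; single-threaded JS is already CPU-saturated at 1 vCPU, so extra gains come mainly from background threads |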
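
The vCPU column follows from Lambda allocating CPU in proportion to memory, with 1,769 MB equal to one vCPU. The sketch below reproduces the column and adds one assumption that matters for Node.js: JavaScript on the main thread runs on a single thread, so it can use at most one vCPU however much is allocated. The function names are hypothetical.

```typescript
// Lambda allocates CPU in proportion to memory; 1,769 MB is one full vCPU.
const MB_PER_VCPU = 1769;

function vcpuShare(memoryMb: number): number {
  return memoryMb / MB_PER_VCPU;
}

// Assumption: the CPU-bound work runs on the single Node.js main thread,
// which can use at most one vCPU no matter how many are allocated.
function mainThreadShare(memoryMb: number): number {
  return Math.min(1, vcpuShare(memoryMb));
}

// Main-thread speed-up of a CPU-bound phase relative to a baseline size.
function mainThreadSpeedup(fromMb: number, toMb: number): number {
  return mainThreadShare(toMb) / mainThreadShare(fromMb);
}

for (const mb of [1024, 1769, 2048, 3072]) {
  console.log(mb, vcpuShare(mb).toFixed(2), mainThreadSpeedup(1024, mb).toFixed(2));
}
```

On this model the main-thread speed-up over 1,024 MB tops out at 1,769 / 1,024 ≈ 1.73× (a ~42% shorter CPU-bound phase), reached at 1,769 MB; larger sizes help only work that runs off the main thread, such as garbage collection and background compilation.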

Recommend starting at 1,769 MB as the smallest size that crosses the 1-vCPU line, and re-measuring the same metrics over 24 h. If cold-start duration drops by ≥40 % and the BFF p95 contributions drop proportionally, hold there; otherwise step to 3,072 MB.

Notes:

  • Max Memory Used is per-invocation, not a steady-state cache size. Persistent caches are counted because they’re resident, so the 253 MB max already includes the cache.
  • The cache is per-Lambda-container. Each cold container starts empty and pays full upstream cost on first request. Higher cold-start rate → lower cache hit ratio. Bumping vCPU shortens cold-start time and helps cache amortisation in turn.
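
The per-container behaviour described in the last note follows from where the cache lives. The sketch below uses a plain module-scope map with a TTL as a hypothetical stand-in for the unstable_cache setup in cachedItems.ts (it is not that code, and it does not de-duplicate concurrent misses); the point is only that the map exists once per container, so every cold container starts with an empty cache and pays the upstream load again.

```typescript
// Module-scope state lives as long as the container, so each container
// keeps its own copy and a freshly started container begins empty.
interface Entry {
  value: unknown;
  expiresAt: number;
}

const perContainerCache = new Map<string, Entry>();
let upstreamLoads = 0;

function getTenantItems<T>(
  tenantId: string,
  ttlMs: number,
  load: (tenantId: string) => T,
  now: number = Date.now(),
): T {
  const hit = perContainerCache.get(tenantId);
  if (hit && hit.expiresAt > now) return hit.value as T; // warm container, fresh entry
  upstreamLoads += 1; // cold container or expired entry: pay the upstream cost
  const value = load(tenantId);
  perContainerCache.set(tenantId, { value, expiresAt: now + ttlMs });
  return value;
}
```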

Copyright: (c) Arda Systems 2025-2026, All rights reserved