Amazon Import — Specification Analysis
This project is restricted to the Next.js BFF route surface in
arda-frontend-app. The chosen design splits keyword search from URL/ASIN
lookup into two sibling routes, with shared URL/ASIN input normalisation:
- POST /api/amazon/import: preserved response contract, but its set of accepted URL/ASIN input shapes is broadened by the new shared normalisation layer (see URL/ASIN input normalisation).
- POST /api/amazon/search: new. Flexible search input shape; the BFF client layer (creatorsClient.searchItems) disambiguates and maps onto the Amazon Creators API.
Existing route — POST /api/amazon/import (broadened acceptance)
| Field | Value |
|---|---|
| Method, path | POST /api/amazon/import |
| Request body | { "input": "<URL or ASIN or text containing exactly one ASIN>" } |
| Success | 200 with the { ok: true, data: AmazonImportDto } envelope (single item). The /api/amazon/search route below uses the same wire envelope; only the inner data shape differs ({ items: AmazonImportDto[], totalResultsHint? } vs single AmazonImportDto). |
| Errors | UNRECOGNIZED_AMAZON_URL, UNSUPPORTED_SHORT_LINK, UNSUPPORTED_AMAZON_LOCALE, ITEM_NOT_FOUND, AMAZON_API_ERROR |
The response contract is preserved verbatim — same DTO, same error
codes, same HTTP statuses. What changes is the set of inputs accepted on
the UNRECOGNIZED_AMAZON_URL boundary: the route now delegates extraction
to extractAsinLenient (see below), which accepts schemeless URLs,
path-only inputs, lowercase bare ASINs, additional US sub-hosts, old-form
product paths, and plain text containing exactly one ASIN. Previously
rejected inputs that map cleanly to a single ASIN now succeed; inputs
that genuinely don’t carry an ASIN still return UNRECOGNIZED_AMAZON_URL.
URL/ASIN input normalisation
User-reported feedback: the current extractAsin in
src/lib/shared/amazon/asin.ts is too restrictive — it rejects pasted
inputs that drop the URL scheme, that are path-only, that use lowercase
ASINs, that come from mobile / Smile / Kindle sub-hosts, that use the
old /exec/obidos/ASIN/ form, or that mix a single ASIN with
surrounding prose. This project widens that acceptance set with a
shared normalisation layer used by both routes.
The layer is split into two functions to keep /search’s
ASIN-shortcut dispatcher from over-interpreting a keyword query that
happens to contain an ASIN.
extractAsin(input) — strict
The canonical extractor used by /search’s multi-token ASIN-shortcut
dispatcher. Accepts only inputs that are unambiguously an ASIN or an
Amazon product URL — no plain-text fallback.
Acceptance set:
| Shape | Example | Notes |
|---|---|---|
| Bare ASIN (10 chars [A-Za-z0-9]) | B08N5WRWNW, b08n5wrwnw | New: case-folded to uppercase before pattern test. Outer whitespace trimmed. |
| Canonical product URL (full scheme, US host) | https://www.amazon.com/dp/B08N5WRWNW, etc. | All four existing canonical forms: /dp/, /<slug>/dp/, /gp/product/, /gp/aw/d/. |
| New: Old-form product URL | https://www.amazon.com/exec/obidos/ASIN/B08N5WRWNW | Also matches /o/ASIN/<ASIN>. |
| New: Extended US sub-hosts | https://m.amazon.com/dp/B08N5WRWNW, smile.amazon.com, read.amazon.com | Added to the US allow-list alongside www.amazon.com and amazon.com. |
| New: Schemeless URL | www.amazon.com/dp/B08N5WRWNW, amazon.com/dp/B08N5WRWNW | When new URL(input) throws and the input starts with (www\.)?amazon\.com/ or a known US sub-host, retry with https:// prepended. |
| New: Path-only URL (with leading /) | /dp/B08N5WRWNW, /gp/product/B08N5WRWNW, /<slug>/dp/<ASIN>, /exec/obidos/ASIN/<ASIN>, /o/ASIN/<ASIN> | Treated as https://www.amazon.com<path>. |
| New: Path-only URL (without leading /) | dp/B08N5WRWNW, Some-Product/dp/B08N5WRWNW, gp/product/B08N5WRWNW, gp/aw/d/B08N5WRWNW, exec/obidos/ASIN/B08N5WRWNW, o/ASIN/B08N5WRWNW | Same as above; prepend https://www.amazon.com/. |
Rejection set:
| Shape | Code |
|---|---|
| a.co/..., amzn.to/... | UNSUPPORTED_SHORT_LINK |
| amazon.<non-US-tld>/dp/<ASIN> (incl. co.uk, de, ca, co.jp, …) | UNSUPPORTED_AMAZON_LOCALE |
| Look-alike host (myamazon.com, amazonn.com, …) | UNRECOGNIZED_AMAZON_URL |
| URL parses, host is US Amazon, but path is non-product (/s?k=…, /cart, /) | UNRECOGNIZED_AMAZON_URL |
| Plain text without a parseable URL or bare ASIN | UNRECOGNIZED_AMAZON_URL |
extractAsinLenient(input) — permissive
Used by /api/amazon/import. Runs the same step ladder as strict
extractAsin (bare ASIN → new URL() → scheme prepend → path-only)
and adds one final fallback: plain-text ASIN extraction, fired
only when no earlier step produced a parseable Amazon URL.
In other words: the side effect of “URL parsed but path is non-product
→ stop at step 3” applies in lenient too. A user pasting
https://www.amazon.com/s?k=B08N5WRWNW into /api/amazon/import
still receives UNRECOGNIZED_AMAZON_URL, even though the input
contains an ASIN-shaped token — the URL parsed and we honour its
non-product semantics.
The plain-text fallback fires when every URL-parse attempt
(original input, scheme-prepend retry, path-only normalisation) threw
or failed to land on a US Amazon host. UNSUPPORTED_SHORT_LINK and
UNSUPPORTED_AMAZON_LOCALE from any of those parse attempts
short-circuit and return as-is — we never silently reroute a
recognised-but-rejected URL to plain-text extraction.
Implementation note: extractAsinLenient cannot be a thin wrapper
that inspects extractAsin’s return value, because strict returns
the same UNRECOGNIZED_AMAZON_URL code for both “URL parsed but
non-product” and “no URL parsed at all”. The two functions share
internal step helpers but compose differently: lenient adds the
plain-text step on the no-URL-parsed-at-all branch.
Plain-text extraction pattern (case-insensitive heuristic with digit-presence guard):
```typescript
// The `i` flag makes the leading `B` match in either case, as the
// "case-insensitive" description and the lowercase test case require.
const ASIN_IN_TEXT_RE = /\b(B[A-Za-z0-9]{9}|\d{9}[\dXx])\b/gi;
// Accept only when the matched token contains at least one digit
// (`B[A-Za-z0-9]{9}` branch) OR is an ISBN-10 (`\d{9}[\dXx]` branch).
```

Rule:

- Find all tokens in the trimmed input matching ASIN_IN_TEXT_RE.
- Filter the matches:
  - For the B[A-Za-z0-9]{9} branch, require ≥1 digit in the token. This rejects 10-letter English words like Background and Blueprints that start with B but contain no digits.
  - The \d{9}[\dXx] branch (ISBN-10) is already digit-heavy and needs no guard.
- Upper-case the surviving matches.
- Accept exactly one match. Zero matches → UNRECOGNIZED_AMAZON_URL. Two or more distinct matches → UNRECOGNIZED_AMAZON_URL (the caller hit /import, which is single-item by contract; ambiguity should surface, not be silently resolved).
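The rule above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the names `extractAsinFromText`, `ASIN_TOKEN_RE` and `PlainTextResult` are hypothetical, and the pattern is applied with the `i` flag so that lowercase ASINs are found, in line with the "case-insensitive" description.

```typescript
// Hypothetical result shape for illustration; not the project's real type.
type PlainTextResult =
  | { ok: true; asin: string }
  | { ok: false; code: "UNRECOGNIZED_AMAZON_URL" };

// Same two branches as the documented pattern. The `i` flag is an assumption
// of this sketch so that tokens such as `b08n5wrwnw` are matched.
const ASIN_TOKEN_RE = /\b(B[A-Z0-9]{9}|\d{9}[\dX])\b/gi;

function extractAsinFromText(input: string): PlainTextResult {
  const reject: PlainTextResult = { ok: false, code: "UNRECOGNIZED_AMAZON_URL" };
  const tokens: string[] = input.trim().match(ASIN_TOKEN_RE) ?? [];

  // Digit-presence guard. ISBN-10 tokens always contain digits, so a single
  // check covers both branches of the pattern.
  const distinct = Array.from(
    new Set(tokens.filter((t) => /\d/.test(t)).map((t) => t.toUpperCase())),
  );

  // Exactly one distinct ASIN is accepted; zero or several are rejected.
  const only = distinct[0];
  if (distinct.length !== 1 || only === undefined) return reject;
  return { ok: true, asin: only };
}
```

Calling `extractAsinFromText("I want to import B08N5WRWNW please")` yields the single ASIN, while an input carrying two distinct ASIN-shaped tokens is rejected instead of silently picking one.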
Resolution ordering
extractAsinLenient(input) follows this order. The first step that
returns a non-UNRECOGNIZED_AMAZON_URL result wins, with one exception:
a US Amazon URL with a non-product path stops at step 3.

1. Trim outer whitespace.
2. Bare-ASIN check (case-folded): uppercases the input, then matches ^[A-Z0-9]{10}$.
3. new URL(input): if parseable with an http: or https: protocol:
   - Short-link host → UNSUPPORTED_SHORT_LINK.
   - Non-US Amazon host → UNSUPPORTED_AMAZON_LOCALE.
   - US Amazon host (incl. new sub-hosts) → canonical or old-form path match → accept. Non-matching path → UNRECOGNIZED_AMAZON_URL; do NOT fall through to plain-text extraction (a search/cart URL is not an import).
   - Non-Amazon host → fall through to step 4.
   - URLs with a non-http/https protocol (e.g. product:, mailto:) are not authoritative: they fall through to step 4 even though new URL() succeeded.
4. Scheme prepend: if input starts with (www\.)?amazon\.com/, m\.amazon\.com/, smile\.amazon\.com/, or read\.amazon\.com/, retry step 3 with https:// prepended.
5. Path-only: if input matches one of the canonical or old-form path patterns with or without a leading /, treat as https://www.amazon.com/<normalised-path> and retry step 3.
6. Plain-text ASIN extraction (lenient only; strict extractAsin stops here and returns UNRECOGNIZED_AMAZON_URL).
7. Otherwise → UNRECOGNIZED_AMAZON_URL.
Stopping at step 3 when the URL parses but the path is non-product is intentional. It honours the user’s signal: if they pasted a parseable URL, we trust its semantics over hunting for ASIN-shaped tokens in its query string.
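One way to picture how strict and lenient can share steps while composing differently is an internal tri-state result that separates "authoritatively rejected" from "no URL parsed". The sketch below is illustrative only: `Step`, `tryUrl`, `runLadder`, `extractAsinStrict` and the combined `PATH_RE` are hypothetical names, and treating every unlisted `amazon.*` host as a locale rejection is an assumption of the sketch.

```typescript
type Code =
  | "UNRECOGNIZED_AMAZON_URL"
  | "UNSUPPORTED_SHORT_LINK"
  | "UNSUPPORTED_AMAZON_LOCALE";

// "rejected" is authoritative and stops the ladder; "no-url" means fall through.
type Step =
  | { kind: "asin"; asin: string }
  | { kind: "rejected"; code: Code }
  | { kind: "no-url" };

const US_HOSTS = ["amazon.com", "www.amazon.com", "m.amazon.com", "smile.amazon.com", "read.amazon.com"];
const PATH_RE =
  /^\/?(?:(?:[^/]+\/)?dp|gp\/product|gp\/aw\/d|exec\/obidos\/ASIN|o\/ASIN)\/([A-Z0-9]{10})(?:[/?#]|$)/i;
const UNRECOGNIZED: Step = { kind: "rejected", code: "UNRECOGNIZED_AMAZON_URL" };

// Step 3: classify a single URL candidate.
function tryUrl(candidate: string, original: string): Step {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return { kind: "no-url" };
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return { kind: "no-url" };
  const host = url.hostname.toLowerCase();
  if (host === "a.co" || host === "amzn.to") return { kind: "rejected", code: "UNSUPPORTED_SHORT_LINK" };
  if (US_HOSTS.indexOf(host) === -1) {
    // Assumption: any other amazon.* host counts as a non-US locale.
    return /(^|\.)amazon\.[a-z.]+$/.test(host)
      ? { kind: "rejected", code: "UNSUPPORTED_AMAZON_LOCALE" }
      : { kind: "no-url" };
  }
  const asin = PATH_RE.exec(url.pathname)?.[1];
  if (asin !== undefined) return { kind: "asin", asin: asin.toUpperCase() };
  // Literal-space downgrade: prose appended to a URL falls through instead of stopping.
  return original.includes(" ") ? { kind: "no-url" } : UNRECOGNIZED;
}

// Steps 1 to 5, shared by both extractors.
function runLadder(input: string): Step {
  const s = input.trim();
  const folded = s.toUpperCase();
  if (/^[A-Z0-9]{10}$/.test(folded)) return { kind: "asin", asin: folded };
  const candidates = [s];
  if (/^(?:(?:www|m|smile|read)\.)?amazon\.com\//i.test(s)) candidates.push(`https://${s}`);
  if (PATH_RE.test(s)) candidates.push(`https://www.amazon.com/${s.replace(/^\//, "")}`);
  for (const candidate of candidates) {
    const step = tryUrl(candidate, s);
    if (step.kind !== "no-url") return step;
  }
  return { kind: "no-url" };
}

function extractAsinStrict(input: string): Step {
  const step = runLadder(input);
  return step.kind === "no-url" ? UNRECOGNIZED : step;
}

function extractAsinLenient(input: string): Step {
  const step = runLadder(input);
  if (step.kind !== "no-url") return step;
  // Step 6: plain-text fallback, reached only on the no-URL-parsed branch.
  const tokens: string[] = input.match(/\b(B[A-Z0-9]{9}|\d{9}[\dX])\b/gi) ?? [];
  const distinct = Array.from(new Set(tokens.filter((t) => /\d/.test(t)).map((t) => t.toUpperCase())));
  const only = distinct[0];
  return distinct.length === 1 && only !== undefined ? { kind: "asin", asin: only } : UNRECOGNIZED;
}
```

The sketch follows the documented ordering literally. A schemeless non-US input such as `amazon.co.uk/dp/<ASIN>` would look like a slug path at step 5 here; the ordering above does not say how that case is resolved, so the sketch leaves it unhandled.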
WHATWG parser quirk — literal-space inputs
Node’s WHATWG URL parser (and the browser’s) does parse inputs
that contain literal spaces — it percent-encodes them. That means
new URL('https://www.amazon.com/dp/B08N5WRWNW great price!') does
not throw; it returns a URL whose pathname is
/dp/B08N5WRWNW%20great%20price!, which then fails canonical-path
matching at step 3 and would, by the strict “stop at step 3” rule,
return UNRECOGNIZED_AMAZON_URL.
This would defeat the “URL + trailing prose → plain-text ASIN
extraction” side effect we explicitly accepted. The implementation
patches this with a hasLiteralSpace check at step 3: when the
parsed URL classifies as US-Amazon-non-product AND the original
input contains a literal space character, the step downgrades the
outcome from recognised-rejection (which would stop the lenient
pipeline) to no-parseable-url (which lets lenient fall through to
plain-text ASIN extraction).
In effect: a clean URL on a non-product path stops as designed, but a
URL with prose appended falls through. Strict extractAsin returns
UNRECOGNIZED_AMAZON_URL either way — only lenient’s fall-through
behaviour cares about the distinction.
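The quirk is easy to reproduce in isolation. The snippet below is a small sketch that assumes the host has already been classified as US Amazon; `classifyUsAmazonInput` and `ParseOutcome` are hypothetical names, and only the `/dp/` path form is checked for brevity.

```typescript
type ParseOutcome = "product" | "recognised-rejection" | "no-parseable-url";

// Hypothetical canonical-path check covering only /dp/<ASIN>.
const DP_PATH_RE = /^\/dp\/[A-Z0-9]{10}\/?$/i;

function classifyUsAmazonInput(input: string): ParseOutcome {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return "no-parseable-url";
  }
  if (DP_PATH_RE.test(url.pathname)) return "product";
  // The parser percent-encoded the space instead of throwing, so the
  // original input is inspected to tell "clean URL" from "URL plus prose".
  const hasLiteralSpace = input.includes(" ");
  return hasLiteralSpace ? "no-parseable-url" : "recognised-rejection";
}

// The parser keeps the prose in the path, with spaces encoded as %20.
const encodedPath = new URL("https://www.amazon.com/dp/B08N5WRWNW great price!").pathname;
```

With this shape, a clean search URL classifies as a recognised rejection, while the same host with prose appended classifies as no parseable URL and is free to continue to plain-text extraction.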
Acknowledged side effects
- URL + trailing prose: an input like https://www.amazon.com/dp/B08N5WRWNW great price! parses cleanly with new URL() (WHATWG percent-encodes the literal space; see the WHATWG parser quirk section above), so step 3’s hasLiteralSpace guard downgrades the parse outcome to no-parseable-url and the pipeline falls through to plain-text ASIN extraction. The ASIN is found and imported. Effectively the same outcome as if we’d extracted the URL from surrounding prose; accepted for /import’s user-intent model.
- /import accepting prose: a comment like I want to import B08N5WRWNW please resolves to importing B08N5WRWNW. The user’s intent is unambiguous when only one ASIN appears.
- Multiple ASINs in prose at /import: an input like B08N5WRWNW and also B0EXAMPLE2 returns UNRECOGNIZED_AMAZON_URL. /import is a single-item route by contract; ambiguous input should fail loudly, not pick one ASIN. Multi-ASIN paste belongs to /search’s strict-tokenised dispatcher.
Where each function is called
| Caller | Function | Rationale |
|---|---|---|
| /api/amazon/import route handler | extractAsinLenient | Single-input route; user intent is to import a specific product, even when their paste includes prose. |
| /api/amazon/search: ASIN-shortcut dispatcher (single bare ASIN or single URL whole-input case) | extractAsin (strict) | Same input shape as /import’s strict path; consistent treatment. |
| /api/amazon/search: multi-token tokeniser | extractAsin (strict) | Plain-text extraction here would silently reroute keyword queries to import. The user typed search terms; we honour that. |
Test matrix additions
Atop the existing 30-ish tests in asin.test.ts, the normalisation
work adds roughly:

- 6 bare-ASIN case-folding cases (lowercase, mixed-case, leading/trailing whitespace + lowercase).
- 12 path-only URL acceptance cases (each of the six path forms × with/without leading slash).
- 4 schemeless URL acceptance cases (www.amazon.com/dp/..., amazon.com/gp/product/..., m.amazon.com/dp/..., etc.).
- 4 extended US sub-host acceptance cases (m., smile., read. × at least one path form each).
- 2 old-form path acceptance cases (/exec/obidos/ASIN/<ASIN>, /o/ASIN/<ASIN>).
- 8 plain-text ASIN extraction cases for extractAsinLenient:
  - Single ASIN in prose (accepts).
  - Single lowercase ASIN in prose (accepts via case-fold).
  - Single ASIN in prose with surrounding punctuation (accepts).
  - Two or more distinct ASINs in prose (rejects with UNRECOGNIZED_AMAZON_URL).
  - Zero ASINs in prose (rejects).
  - 10-character English word (Background, Blueprints): must reject (digit-presence guard).
  - 10-digit numeric token that isn’t an ISBN-10: accepts via the all-digit ASIN branch.
  - Search URL with ASIN in query string (/s?k=B08N5WRWNW): rejects (stop-at-step-3 ordering).
- 4 flipped expectations: m.amazon.com, smile.amazon.com, read.amazon.com previously UNSUPPORTED_AMAZON_LOCALE, now accepted.
New route — POST /api/amazon/search
The request shape is deliberately flexible. The BFF route validates input
and delegates to the client layer, which maps the rich BFF shape onto the
Amazon SDK’s narrower SearchItemsRequestContent (see
Amazon Creators API capabilities
below). The route does not support pagination — every call returns
the first page with up to MAX_RESULTS items.
| Field | Value |
|---|---|
| Method, path | POST /api/amazon/search |
| Request body | flexible search input (see below) |
| Success | 200 with { "ok": true, "data": { "items": AmazonImportDto[], "totalResultsHint"?: number } } — items is [] when the search succeeds with zero matches. The { ok, data } wire wrap matches /api/amazon/import’s envelope (the Next.js handler renames the route module’s internal field name to data). |
| Errors | INVALID_REQUEST (400), INVALID_SEARCH_INPUT (400), AUTHENTICATION_REQUIRED (401), UNSUPPORTED_SHORT_LINK (422), UNSUPPORTED_AMAZON_LOCALE (422), AMAZON_API_ERROR (502). Error envelope: { "ok": false, "code": string, "message": string }. INVALID_REQUEST is emitted by the Next.js handler on malformed JSON or wrong-shape body before the route module is invoked (mirrors /api/amazon/import’s structural guard); INVALID_SEARCH_INPUT is emitted by the route module on semantic validation failure (see “Request validation” below). The two 422 codes are emitted on single-input recognised-rejection (when the strict ASIN extractor parses the query as a short link or non-US Amazon URL); they preserve the import route’s user-intent rejection rather than dereferencing the input. |
| Auth | Cognito JWT verification at the Next.js handler level (mirrors /api/amazon/import’s processJWTForArda call in src/app/api/amazon/import/route.ts). The route module itself remains auth-agnostic. tenantId from the JWT result is not currently threaded through the route module — same posture as /import. |
Request body
```jsonc
{
  // Free-text query: primary search term. Up to MAX_QUERY_LENGTH chars.
  // Optional when `keywords[]` is non-empty after filtering; otherwise
  // required. Trimmed; non-empty after trim.
  "query": "string",

  // Optional additional keyword terms that further restrict the search.
  // Each entry is a single keyword/phrase; combined with `query` by the
  // client layer before calling Amazon.
  "keywords": ["string", "..."],

  // Optional category restrictions. Each entry is a free-form category
  // label (e.g. "HomeGarden", "OfficeProducts", or a human-friendly
  // synonym). The client layer disambiguates against Amazon's
  // `SearchIndex` enum and/or `browseNodeId`; unresolvable entries are
  // either dropped with a warning or rejected (see Q1 in Open
  // Questions).
  "categories": ["string", "..."],

  // Optional. When `true`, restricts results to Prime-eligible items
  // (maps to Amazon `deliveryFlags: ["Prime"]`).
  "primeOnly": false,

  // Optional. Result ordering.
  //   "relevance"          : Amazon's default (SortBy=Relevance)
  //   "price-low-to-high"  : SortBy=Price:LowToHigh
  // Default: "relevance".
  "sortBy": "relevance" | "price-low-to-high"
}
```

Silent Amazon-side query relaxation on zero results (described in Search Input Processing) is always on and is not a contract surface; it is a server behaviour. Richer zero-result strategies (LLM-generated suggestions, external web search) are deferred to a v2 follow-on tracked under PDEV-569 and are explicitly out of scope for PDEV-457.
Request validation
| Field | Constraint | On violation |
|---|---|---|
| query | Optional only when keywords[] is non-empty after filtering; otherwise required. Trimmed; length ≤ MAX_QUERY_LENGTH. At least one of query or keywords[] must produce non-empty content after the defensive filter; categories[] and primeOnly alone do not satisfy Amazon’s “at least one search term” requirement. | INVALID_SEARCH_INPUT |
| keywords | Optional; if present, array of strings; length ≤ MAX_KEYWORDS; each entry length ≤ MAX_KEYWORD_LENGTH. | INVALID_SEARCH_INPUT |
| categories | Optional; if present, array of strings; length ≤ MAX_CATEGORIES; each entry length ≤ MAX_CATEGORY_LENGTH. | INVALID_SEARCH_INPUT |
| primeOnly | Optional; boolean. | INVALID_SEARCH_INPUT |
| sortBy | Optional; one of "relevance", "price-low-to-high". | INVALID_SEARCH_INPUT |
All numeric caps are server-side constants — see Constants at the top of “Search Input Processing”.
Response
The full wire response is the { ok: true, data: <below> } envelope (see
the “Success” row of the route-summary table above). The block below
shows the inner data shape:
```jsonc
{
  // Up to MAX_RESULTS entries, same shape as `/api/amazon/import` returns
  // today. Order matches the requested `sortBy` (Amazon's default when
  // not specified). Empty array on zero matches, which is not an error.
  "items": [ /* AmazonImportDto */ ],

  // Optional convenience copy of Amazon's "total results" indicator when
  // present. May be `0` on a zero-match response, or absent entirely.
  "totalResultsHint": 1234
}
```

On zero matches the route still returns HTTP 200 with data.items: []
(and totalResultsHint either 0 or omitted).
MAX_RESULTS is one of the server-side constants — see
Constants.
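For callers, the wire contract can be modelled as a discriminated union on `ok`. The types below are an illustrative sketch: `Envelope`, `SearchData` and `describeSearchResponse` are hypothetical names, and `AmazonImportDto` is reduced to an opaque placeholder because its fields are not listed in this section.

```typescript
// Placeholder only; the real DTO is defined elsewhere in the app.
type AmazonImportDto = Record<string, unknown>;

interface SearchData {
  items: AmazonImportDto[];
  totalResultsHint?: number;
}

// Success and error envelopes as described in the route-summary table.
type Envelope<T> =
  | { ok: true; data: T }
  | { ok: false; code: string; message: string };

function describeSearchResponse(res: Envelope<SearchData>): string {
  if (!res.ok) return `${res.code}: ${res.message}`;
  // Zero matches is a successful response, not an error.
  if (res.data.items.length === 0) return "no matches";
  return `${res.data.items.length} item(s)`;
}
```

Checking `ok` first lets the compiler narrow to the right branch, so a caller cannot read `data` from an error envelope by accident.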
Error codes (envelope { code, message }, HTTP status in parens)
The table below documents the error surface emitted by the route module
itself. AUTHENTICATION_REQUIRED (401) listed in the route-summary
table above is emitted by the Next.js handler (Cognito JWT verification)
before the route module is invoked, so it is not produced by this
module — but it is still observable on the wire and tests targeting the
HTTP endpoint must account for it.
| Code | When |
|---|---|
| INVALID_REQUEST (400) | Emitted upstream by the Next.js handler on malformed JSON or wrong-shape request body (mirrors /api/amazon/import’s structural guard). Not produced by the route module; listed here for wire-contract completeness. |
| INVALID_SEARCH_INPUT (400) | Any request-body semantic validation failure inside the route module (see “Request validation” above). Fires after INVALID_REQUEST’s structural check passes. |
| AUTHENTICATION_REQUIRED (401) | Emitted upstream by the Next.js handler when Cognito JWT verification fails. Not produced by the route module; listed here for wire-contract completeness. |
| UNSUPPORTED_SHORT_LINK (422) | Single-input recognised-rejection: the strict ASIN extractor parsed the query as an a.co / amzn.to short link. Same error code /api/amazon/import emits for the same input class; the search route preserves the import route’s user-intent rejection rather than dereferencing the redirect. |
| UNSUPPORTED_AMAZON_LOCALE (422) | Single-input recognised-rejection: the strict ASIN extractor parsed the query as a non-US Amazon URL (amazon.co.uk, amazon.de, …). Same error code /api/amazon/import emits for the same input class; v1 search is US-marketplace only. |
| AMAZON_API_ERROR (502) | Creators API call failed (network, throttling, 5xx). |
Zero matching items is not an error — the route returns 200 with
items: [] and (optionally) totalResultsHint: 0. Callers decide how to
present “no matches”.
Client-layer disambiguation responsibility
The creatorsClient.searchItems wrapper in
src/server/lib/amazon/creators-client.ts owns the mapping from this
flexible BFF shape to Amazon’s narrower SearchItemsRequestContent:
| BFF field | Maps to Amazon SDK |
|---|---|
| query | Concatenated with keywords[] (space-joined) and sent as SearchItemsRequestContent.keywords. |
| keywords[] | See above. |
| categories[] | First entry only is resolved to a SearchIndex enum or a browseNodeId via a resolveCategory(label) helper (see Q2 in Open questions); remaining entries are appended to keywords as restrictive terms. Single-category restriction is a hard Amazon constraint: searchIndex and browseNodeId each accept a single value per SearchItems call. |
| primeOnly: true | SearchItemsRequestContent.deliveryFlags = ["Prime"]. |
| primeOnly: false or unset | deliveryFlags left unset. |
| sortBy: "relevance" | SearchItemsRequestContent.sortBy = "Relevance" (or omitted; Amazon defaults to Relevance). |
| sortBy: "price-low-to-high" | SearchItemsRequestContent.sortBy = "Price:LowToHigh". |
| (always) | itemCount = MAX_RESULTS, no itemPage (always first page). |
| (always) | resources = [ ... Item.* fields the existing import path already requests ... ] — same selector as today’s GetItems so each result is already fully populated as AmazonImportDto. |
| (always) | partnerTag = AMAZON_ASSOCIATE_TAG (server env), marketplace = US. |
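The mapping table can be read as a single pure function. The sketch below is an approximation under assumptions: `SearchRequest`, `SdkSearchContent`, `ResolvedCategory` and `toSdkRequest` are hypothetical names, the SDK type is reduced to the fields named in the table, `resources` and `partnerTag` are omitted, and an unresolvable first category is simply left unset because that policy is still an open question.

```typescript
const MAX_RESULTS = 10;

interface SearchRequest {
  query?: string;
  keywords?: string[];
  categories?: string[];
  primeOnly?: boolean;
  sortBy?: "relevance" | "price-low-to-high";
}

// Reduced stand-in for SearchItemsRequestContent (assumption: only these fields).
interface SdkSearchContent {
  keywords: string;
  itemCount: number;
  searchIndex?: string;
  browseNodeId?: string;
  deliveryFlags?: string[];
  sortBy?: string;
}

type ResolvedCategory = { searchIndex: string } | { browseNodeId: string } | null;

function toSdkRequest(
  req: SearchRequest,
  resolveCategory: (label: string) => ResolvedCategory,
): SdkSearchContent {
  const [firstCategory, ...restCategories] = req.categories ?? [];
  // query + keywords[] + the categories that cannot be used as a restrictor.
  const terms = [req.query ?? "", ...(req.keywords ?? []), ...restCategories].filter(
    (t) => t.length > 0,
  );
  const out: SdkSearchContent = { keywords: terms.join(" "), itemCount: MAX_RESULTS };

  const resolved = firstCategory === undefined ? null : resolveCategory(firstCategory);
  if (resolved !== null) {
    if ("searchIndex" in resolved) out.searchIndex = resolved.searchIndex;
    else out.browseNodeId = resolved.browseNodeId;
  }
  if (req.primeOnly === true) out.deliveryFlags = ["Prime"];
  // "relevance" is Amazon's default, so it is omitted here.
  if (req.sortBy === "price-low-to-high") out.sortBy = "Price:LowToHigh";
  return out;
}
```

Passing `resolveCategory` in as a parameter keeps the function deterministic and makes the category policy easy to swap once the open question is settled.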
Search Input Processing
Before any request reaches Amazon, the BFF runs the inputs through three deterministic stages, all in the server-side client layer so the route handler itself stays a thin contract:

- A defensive filter that normalises and bounds the raw strings.
- An ASIN extract-and-dispatch stage that bypasses SearchItems entirely when the user has already given us an unambiguous identifier.
- A silent Amazon-side query relaxation stage that, on a zero-result primary response, retries the call with progressively fewer constraints (capped, never mutating the user’s words).
Constants
All numeric thresholds in this section are named server-side constants, declared in one place in the route module so they can be tuned without a contract change. Initial values:
| Constant | Value | Use |
|---|---|---|
| MAX_QUERY_LENGTH | 1024 | Max characters in query after Unicode NFC normalisation. |
| MAX_KEYWORDS | 20 | Max array length of keywords[] after empty-entry drop. |
| MAX_KEYWORD_LENGTH | 64 | Max characters per surviving keywords[] entry. |
| MAX_CATEGORIES | 5 | Max array length of categories[] after empty-entry drop. |
| MAX_CATEGORY_LENGTH | 64 | Max characters per surviving categories[] entry. |
| MAX_RESULTS | 10 | Hard cap on returned items (also the Amazon SearchItems.itemCount cap). |
| RELAXATION_MAX_RETRIES | 2 | Max number of additional SearchItems calls after a zero-result primary. |
| RELAXATION_TIME_BUDGET_MS | 1500 | Total wall-clock budget across all relaxation retries; exceed → stop early. |
| BATCH_ASIN_MAX | 10 | Max ASINs in a single GetItems batch in the ASIN-shortcut path (also the Amazon cap). |
Defensive input filter
Each string field (query, every entry of keywords[], every entry of
categories[]) goes through the same pipeline before being used. Failures
in steps marked reject surface as INVALID_SEARCH_INPUT (400).
Failures in steps marked drop silently remove the offending entry from
an array; if the array becomes empty it is treated as absent.
| # | Step | Behaviour |
|---|---|---|
| 1 | Reject if any string exceeds its max length (MAX_QUERY_LENGTH, MAX_KEYWORD_LENGTH, MAX_CATEGORY_LENGTH). | Bounds resource usage. Query-presence rule is checked in step 8 (after array filtering) so that callers can omit query when keywords[] carries the search terms. |
| 2 | Unicode NFC normalise. | Canonicalises bytes vs. codepoints; defeats trivial homoglyph variants. Runs before the post-normalisation length re-check in step 3. |
| 3 | Re-check length after normalisation; reject on overflow. | Normalisation can change byte length. |
| 4 | Replace control characters (\x00–\x1F, \x7F) with a single space. | Removes pasted newlines, tabs, null bytes — never meaningful for Amazon search. |
| 5 | Replace < and > with a single space. | Defensive against pasted HTML and log-context confusion; Amazon never needs angle brackets in keywords. |
| 6 | Collapse internal whitespace runs to a single space, then trim. | Avoids spending Amazon’s keyword budget on whitespace; keeps quote/apostrophe/accented characters intact since real product titles use them. |
| 7 | For array fields (keywords[], categories[]): drop entries that go empty after filtering, then reject if the resulting array exceeds its cardinality cap (MAX_KEYWORDS, MAX_CATEGORIES). | Empty entries are not malformed input; over-cardinality is. |
| 8 | Reject if the combination yields no search terms — i.e. query is empty/absent and keywords[] (post-step-7) is empty/absent. | Amazon’s SearchItems requires at least one of Keywords, Title, Brand, Author, Actor, Artist; we only populate Keywords from query + keywords[]. categories[] and primeOnly alone do not satisfy Amazon — they restrict, they do not search. |
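The eight steps in the table could be arranged roughly as below. This is a sketch, not the route module's code: `InvalidSearchInputError`, `cleanString`, `cleanArray` and `filterSearchInput` are hypothetical names, and `.length` counts UTF-16 code units, which is a simplification of "characters".

```typescript
const MAX_QUERY_LENGTH = 1024;
const MAX_KEYWORDS = 20;
const MAX_KEYWORD_LENGTH = 64;
const MAX_CATEGORIES = 5;
const MAX_CATEGORY_LENGTH = 64;

// Hypothetical error type; the route would map it to INVALID_SEARCH_INPUT (400).
class InvalidSearchInputError extends Error {
  readonly code = "INVALID_SEARCH_INPUT";
}

function cleanString(raw: string, maxLength: number): string {
  if (raw.length > maxLength) throw new InvalidSearchInputError("too long"); // step 1
  const nfc = raw.normalize("NFC"); // step 2
  if (nfc.length > maxLength) throw new InvalidSearchInputError("too long after NFC"); // step 3
  return nfc
    .replace(/[\x00-\x1F\x7F]/g, " ") // step 4: control characters
    .replace(/[<>]/g, " ") // step 5: angle brackets
    .replace(/\s+/g, " ") // step 6: collapse whitespace runs
    .trim();
}

function cleanArray(raw: string[], maxEntryLength: number, maxEntries: number): string[] {
  // Step 7: drop entries that end up empty, then enforce the cardinality cap.
  const kept = raw.map((e) => cleanString(e, maxEntryLength)).filter((e) => e.length > 0);
  if (kept.length > maxEntries) throw new InvalidSearchInputError("too many entries");
  return kept;
}

function filterSearchInput(body: { query?: string; keywords?: string[]; categories?: string[] }) {
  const query = body.query === undefined ? "" : cleanString(body.query, MAX_QUERY_LENGTH);
  const keywords = cleanArray(body.keywords ?? [], MAX_KEYWORD_LENGTH, MAX_KEYWORDS);
  const categories = cleanArray(body.categories ?? [], MAX_CATEGORY_LENGTH, MAX_CATEGORIES);
  // Step 8: categories alone restrict a search but do not provide search terms.
  if (query === "" && keywords.length === 0) throw new InvalidSearchInputError("no search terms");
  return { query, keywords, categories };
}
```

For example, a query of `"  usb\tcable <b>  "` comes out as `"usb cable b"`, and a request carrying only `categories` is rejected at step 8.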
Things the filter deliberately does not do:

- Does not lowercase. Amazon’s keyword search is case-insensitive but preserving case is friendlier in any future preview/echo response.
- Does not strip apostrophes, quotes, ampersands, currency symbols, or accented characters. Product titles legitimately contain them.
- Does not strip the pipe character | blindly: it has Amazon-specific meaning when bulk-looking-up external identifiers (see “Smart identifier handling” in Amazon Creators API capabilities).
- Does not interpret boolean operators (AND, OR, NOT), quoted phrases, or minus exclusion. Amazon’s documented Keywords semantics do not honour these, so pretending we do would be a lie.
ASIN extract-and-dispatch
After the defensive filter, the client inspects the cleaned query for an
unambiguous identifier before calling SearchItems. When one is found,
the call is redirected to GetItems because that is the cheaper,
deterministic, correct operation for “I already know which product I want”.
The level of effort is bounded to what extractAsin
(src/lib/shared/amazon/asin.ts) already does today — bare ASINs and a
small set of canonical URL paths on amazon.com. No new HTTP work
(no dereferencing short links), no new regexes beyond a thin
“tokenize and try extractAsin per token” pass for the multi-ASIN case.
| query content (after filtering) | Dispatch | Notes |
|---|---|---|
| A single bare ASIN, e.g. B08N5WRWNW | creatorsClient.getItems([asin]) → first/only AmazonImportDto returned in items array | Same DTO shape as SearchItems would have returned, no search noise. |
| A single Amazon product URL (/dp/<asin>, /<slug>/dp/<asin>, /gp/product/<asin>, /gp/aw/d/<asin> on amazon.com) | Extract ASIN → getItems([asin]) | Reuses existing extractAsin. |
| Multiple ASINs and/or URLs separated by whitespace, commas, semicolons, or newlines (whitespace already collapsed by step 6 of the filter) | Tokenise → run extractAsin per token → if every token yields an ASIN, batch into a single getItems([...asins]) (Amazon caps at 10 ASINs/call; excess → INVALID_SEARCH_INPUT) | Enables paste-a-list-of-products imports through the same route. |
| ASIN(s) embedded in surrounding free text (e.g. headphones B08N5WRWNW) | No shortcut: pass through to SearchItems as keywords | Mixed input signals “find me something like this”; honour user intent rather than guessing. |
| A short-link URL (a.co/..., amzn.to/...) | Surface UNSUPPORTED_SHORT_LINK (same error code /api/amazon/import returns) | Consistent with the import route. No dereferencing inside the search route. |
| Non-US-locale Amazon URL (amazon.co.uk, amazon.de, …) | Surface UNSUPPORTED_AMAZON_LOCALE (same error code as import) | Same reason. |
| Anything else | Continue to SearchItems with the BFF→SDK mapping in the table above. | Default path. |
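The dispatch table can be condensed into one planning function that decides which Amazon operation to call. The sketch is hypothetical in its names (`planDispatch`, `Dispatch`, `Extracted`) and takes the strict extractor as a parameter so that it does not assume that function's real return type.

```typescript
const BATCH_ASIN_MAX = 10;

// Assumed shape of a strict-extractor result, for illustration only.
type Extracted = { ok: true; asin: string } | { ok: false; code: string };

type Dispatch =
  | { op: "getItems"; asins: string[] }
  | { op: "searchItems" }
  | { op: "error"; code: string };

function planDispatch(cleanedQuery: string, extract: (token: string) => Extracted): Dispatch {
  const tokens = cleanedQuery.split(/[\s,;]+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return { op: "searchItems" };

  const results = tokens.map((t) => extract(t));
  const first = results[0];

  // Single-input case: recognised rejections surface as errors.
  if (tokens.length === 1 && first !== undefined && !first.ok) {
    const recognised =
      first.code === "UNSUPPORTED_SHORT_LINK" || first.code === "UNSUPPORTED_AMAZON_LOCALE";
    return recognised ? { op: "error", code: first.code } : { op: "searchItems" };
  }

  // Every token must yield an ASIN; mixed input stays a keyword search.
  const asins: string[] = [];
  for (const r of results) {
    if (!r.ok) return { op: "searchItems" };
    asins.push(r.asin);
  }
  if (asins.length > BATCH_ASIN_MAX) return { op: "error", code: "INVALID_SEARCH_INPUT" };
  return { op: "getItems", asins };
}
```

Returning a plan instead of calling Amazon directly keeps the decision testable without network access; the caller then executes `getItems` or `searchItems` accordingly.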
Silent Amazon-side query relaxation
When the first SearchItems call returns zero items and the dispatcher
did not take the GetItems shortcut, the client retries up to
RELAXATION_MAX_RETRIES more times, each time dropping one constraint,
before giving up and returning items: []. This is a deterministic
server-side behaviour — not a contract surface — and applies on every
request. The aim is to absorb common “one constraint too many” cases
without surfacing them as zero results to the caller.
The retry ladder, in order, stops at the first non-empty page:

- Original call: all constraints as derived from the BFF request.
- Drop deliveryFlags (Prime restriction) if it was set.
- Drop the category restrictor (searchIndex / browseNodeId) if one was active.

Hard bounds:

- Maximum 1 + RELAXATION_MAX_RETRIES Amazon calls per /api/amazon/search request (the original plus up to RELAXATION_MAX_RETRIES relaxations).
- Total wall-clock budget ≤ RELAXATION_TIME_BUDGET_MS across all relaxation attempts; exceed → stop early and return what the most-recent call produced.
- Relaxation never mutates the keywords text content, only filter and category constraints. Rewriting the user’s words is out of scope here.
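A compact way to express the ladder and its bounds is to build the relaxed attempts up front and then run them under the retry and time caps. The sketch below makes several assumptions: `SearchConstraints` is a simplified stand-in for the SDK request, relaxations are cumulative (Prime stays dropped when the category is dropped), the budget is checked only between calls, and `buildRelaxations` and `searchWithRelaxation` are hypothetical names.

```typescript
const RELAXATION_MAX_RETRIES = 2;
const RELAXATION_TIME_BUDGET_MS = 1500;

// Simplified stand-in: only the parts of the request that relaxation touches.
interface SearchConstraints {
  keywords: string;
  primeOnly: boolean;
  category: string | null;
}

// Relaxed attempts in ladder order, excluding the original call.
function buildRelaxations(initial: SearchConstraints): SearchConstraints[] {
  const attempts: SearchConstraints[] = [];
  let current = initial;
  if (current.primeOnly) {
    current = { ...current, primeOnly: false };
    attempts.push(current);
  }
  if (current.category !== null) {
    current = { ...current, category: null };
    attempts.push(current);
  }
  return attempts.slice(0, RELAXATION_MAX_RETRIES);
}

async function searchWithRelaxation<T>(
  initial: SearchConstraints,
  search: (constraints: SearchConstraints) => Promise<T[]>,
  now: () => number = Date.now,
): Promise<T[]> {
  let items = await search(initial);
  const startedAt = now();
  for (const attempt of buildRelaxations(initial)) {
    if (items.length > 0) break; // first non-empty page wins
    if (now() - startedAt >= RELAXATION_TIME_BUDGET_MS) break; // budget spent
    items = await search(attempt);
  }
  return items;
}
```

Because `keywords` is copied unchanged into every attempt, the sketch cannot rewrite the user's words; only the Prime flag and the category are ever relaxed.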
When the dispatcher takes the GetItems shortcut, the route’s response
shape is unchanged — it still returns { items: AmazonImportDto[], totalResultsHint? },
just sourced from GetItems instead of SearchItems. Callers do not need
to know which path was taken.
Richer zero-result strategies — out of scope for v1
Strategies that go beyond the constraint-dropping relaxation above, namely LLM-generated alternative-query suggestions, external web-search fallbacks (Brave, Tavily, Perplexity, SerpAPI, Firecrawl, …), synonym expansion, and spelling correction, are explicitly out of scope for PDEV-457. The decision for v1 is to ship only the silent Amazon-side relaxation above; neither LLM augmentation nor external web search is implemented or exposed in the route contract.
The deferred LLM-suggestion feature is tracked under
PDEV-569
(Backlog, blocked by PDEV-457). The full survey of options —
Amazon-side / external / LLM — that informed the decision is in
query-relaxation-exploration.md.
Cross-cutting concerns
| Concern | Note |
|---|---|
| OAuth2 / credentials | Reuses the existing creatorsClient and AMAZON_CREATORS_* env vars. No new server secrets. |
| Marketplace | US only, via the existing marketplace header on the Creators client. Non-US is out of scope. |
| Affiliate tag | Each AmazonImportDto carries an affiliate-tagged URL via the existing helper. No helper changes. |
| Rate-limit posture | Each /api/amazon/search call is at most 1 + RELAXATION_MAX_RETRIES Amazon S2S calls: the original SearchItems, plus up to RELAXATION_MAX_RETRIES silent relaxation retries when the result is empty. Resolving a category label to a browseNodeId may add at most one GetBrowseNodes call (cacheable in-memory). The GetItems shortcut path is at most 1 call. |
| Pagination | None. First page only, capped at MAX_RESULTS. Refactoring to add pagination is non-breaking (extend request/response with optional fields). |
| Sort / refinements | v1 exposes relevance and price-low-to-high only. SearchRefinements from Amazon’s response is dropped at the BFF boundary; surfacing it later is non-breaking. |
| Caching | None. Same posture as today’s /api/amazon/import. |
| MSW | src/mocks/handlers/amazon.ts gets handlers for /api/amazon/search covering: success (multi-item), zero results (200 + empty items), invalid input, Creators failure. |
| Tests | Route-level tests for each branch (success, zero-results, invalid input, Creators failure); creatorsClient.searchItems wrapper tests covering header, resource selector, error mapping, and the BFF→SDK field mapping above. |
Amazon Creators API capabilities
What the underlying Amazon Creators API offers, summarized so the route
contract can be read against its actual ceiling. This is a snapshot of the
SDK shape vendored as amazon-creators-api@1.2.2; consult the SDK’s own
types for the canonical reference.
Operations
| Operation | Purpose | Used in this project? |
|---|---|---|
| GetItems(marketplace, { ASINs[], resources[] }) | Look up one or more items by ASIN. | Yes, by the existing /api/amazon/import route. |
| SearchItems(marketplace, { keywords, ... }) | Keyword search with filters and sort; returns up to itemCount items per page plus searchRefinements. | Yes, as the primary operation for the new /api/amazon/search route. |
| GetBrowseNodes(marketplace, { browseNodeIds[], resources[] }) | Resolve category browse-node ids to their full metadata (name, ancestors, children, sales rank). | Possibly, to back the categories[] → browseNodeId mapping (small in-memory cache). |
| GetVariations(marketplace, { ASIN, resources[] }) | Walk parent ASIN to child variation ASINs (e.g. size/color). | No. |
| GetFeed / ListFeeds / GetReport / ListReports | Bulk/feed-style report operations. | No. |
SearchItems request fields (most relevant subset)
| Field | Type | Use here |
|---|---|---|
| keywords | string | Receives query plus joined keywords[] from the BFF. |
| title, brand, author, actor, artist | string | Typed restrictors. Not surfaced in v1; could replace generic keyword joining later. |
| searchIndex | string enum | Single-category restrictor (e.g. All, HomeGarden, OfficeProducts). One of two ways to honor categories[]. |
| browseNodeId | string | Numeric browse-node id. The narrower category restrictor; one node per request. |
| condition | enum: Any, New (SDK type is sparse; Amazon also exposes Used, Refurbished, Collectible) | Not used in v1. |
| availability | enum: Available, IncludeOutOfStock | Not used in v1 (defaults to Available). |
| minPrice, maxPrice | number | Not used in v1. |
| minReviewsRating, minSavingPercent | number | Not used in v1. |
| deliveryFlags | array of AmazonGlobal \| FreeShipping \| FulfilledByAmazon \| Prime | Receives ["Prime"] when primeOnly: true. |
| sortBy | enum: Featured, Relevance, AvgCustomerReviews, NewestArrivals, Price:LowToHigh, Price:HighToLow | relevance → Relevance; price-low-to-high → Price:LowToHigh. |
| itemCount | number | MAX_RESULTS. |
| itemPage | number | Not used in v1 (first page only). |
| currencyOfPreference, languagesOfPreference | locale prefs | Not used in v1. |
| partnerTag | string | AMAZON_ASSOCIATE_TAG. |
| resources | array of SearchItemsResource | Selector for which Item.* fields the response includes (images.primary.large, itemInfo.title, offersV2.listings.price, etc.). The client requests the same selector the existing GetItems path uses so each search hit is already a full AmazonImportDto. |
| properties | Record<string,string> | Advanced/experimental; not used. |
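
The v1 rows of the table above can be read as one small mapping function. The sketch below is illustrative only: the SearchInput and SearchItemsRequestSketch types, the toSearchItemsRequest name, the placeholder MAX_RESULTS value, and the default of relevance when no sort is given are all assumptions made for the example, not the vendored SDK types or the shipped client.

```typescript
// Hypothetical BFF-side input; the field names follow the table above,
// but this exact type is an assumption made for the sketch.
interface SearchInput {
  query: string;
  keywords?: string[];
  primeOnly?: boolean;
  sort?: "relevance" | "price-low-to-high";
}

// Subset of the SearchItems request fields that v1 populates.
interface SearchItemsRequestSketch {
  keywords: string;
  sortBy: "Relevance" | "Price:LowToHigh";
  deliveryFlags?: string[];
  itemCount: number;
  partnerTag: string;
}

// Placeholder value; the real MAX_RESULTS is a server-side constant.
const MAX_RESULTS = 10;

// BFF sort names mapped onto the SDK sortBy enum values from the table.
const SORT_MAP = {
  "relevance": "Relevance",
  "price-low-to-high": "Price:LowToHigh",
} as const;

function toSearchItemsRequest(
  input: SearchInput,
  partnerTag: string,
): SearchItemsRequestSketch {
  // Join query and keywords[] with spaces, skipping blank tokens.
  const keywords = [input.query]
    .concat(input.keywords ?? [])
    .map((token) => token.trim())
    .filter((token) => token.length > 0)
    .join(" ");

  const request: SearchItemsRequestSketch = {
    keywords,
    // Defaulting to relevance when sort is absent is an assumption.
    sortBy: SORT_MAP[input.sort ?? "relevance"],
    itemCount: MAX_RESULTS,
    partnerTag,
  };
  if (input.primeOnly) {
    request.deliveryFlags = ["Prime"];
  }
  return request;
}
```

Keeping the mapping in one pure function makes the "BFF→SDK field mapping" test row straightforward to cover without touching the network.
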
SearchItems response (highlights)
| Field | Notes |
|---|---|
| searchResult.items[] | Each item is a full Item resource shaped by the request’s resources selector. |
| searchResult.totalResultCount | “Total results” hint surfaced to the BFF as totalResultsHint. |
| searchResult.searchRefinements | Facet bins (browseNode, searchIndex, otherRefinements[]). Not surfaced in v1. |
| errors[] | Structured Amazon error envelope; mapped to the BFF’s AMAZON_API_ERROR. |
Keywords syntax and “smart query” support
Amazon’s PA-API / Creators API documentation is intentionally sparse about
what the Keywords parameter understands. The verified behaviours,
based on Amazon’s own use-case docs and confirmed in vendor commentary:
| Form | Behaviour | How we use it |
|---|---|---|
| Plain space-separated tokens | AND-style narrowing (proprietary, not formally documented). | Default join for query and keywords[]. |
| Pipe (\|) separator between identifier-shaped tokens, with SearchIndex=All and ItemInfo.ExternalIds in resources | Bulk lookup by external identifier (UPC, EAN, ISBN). Amazon may return adjacent products; the caller must filter the response by exact match against ItemInfo.ExternalIds.{EANs,UPCs,ISBNs}.DisplayValues. | See “Smart identifier handling” below. |
| Boolean operators (AND, OR, NOT), quoted phrases, minus exclusion | Not officially supported. Amazon’s site search has its own syntax but Keywords on the API does not promise to honour any of it. | We do not advertise, document, or interpret these. Whatever the user types is passed through; behaviour is whatever Amazon chooses to do. |
| Typed restrictors (Title, Brand, Author, Actor, Artist as separate SearchItemsRequestContent fields) | Stronger and more reliable than AND-stuffing into Keywords. | Not exposed in v1. A natural v2 lever once the BFF surface stabilises. |
Smart identifier handling
When the cleaned query (after the filter pipeline and ASIN dispatcher)
is a list of tokens all of which match one of the patterns below, the
client switches to identifier mode: joins the tokens with |, sets
SearchIndex=All, and adds ItemInfo.ExternalIds to the resource
selector. The response is filtered to keep only items whose
ItemInfo.ExternalIds echoes one of the input identifiers.
| Pattern | Format |
|---|---|
| UPC-A | ^\d{12}$ |
| EAN-13 (incl. ISBN-13 97[89]\d{10}) | ^\d{13}$ |
| EAN-8 | ^\d{8}$ |
| ISBN-10 | ^\d{9}[\dX]$ |
This buys cheap “scan-a-list-of-barcodes” import through the same route without a separate endpoint. Mixed input (some identifier-shaped, some not) falls through to plain keyword search.
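
A minimal sketch of the mode switch described above, assuming the query has already been cleaned and split into tokens. The planSearch name and the SearchPlan type are hypothetical; only the four patterns, the pipe join, and SearchIndex=All come from this section.

```typescript
// The four identifier shapes from the table above.
const IDENTIFIER_PATTERNS: ReadonlyArray<RegExp> = [
  /^\d{12}$/, // UPC-A
  /^\d{13}$/, // EAN-13 (incl. ISBN-13)
  /^\d{8}$/, // EAN-8
  /^\d{9}[\dX]$/, // ISBN-10
];

// Hypothetical result type for this sketch.
type SearchPlan =
  | { mode: "identifier"; keywords: string; searchIndex: "All" }
  | { mode: "default"; keywords: string };

function planSearch(tokens: ReadonlyArray<string>): SearchPlan {
  // Identifier mode requires a non-empty list in which every token
  // matches at least one identifier pattern.
  const allIdentifiers =
    tokens.length > 0 &&
    tokens.every((token) =>
      IDENTIFIER_PATTERNS.some((pattern) => pattern.test(token)),
    );

  if (allIdentifiers) {
    return { mode: "identifier", keywords: tokens.join("|"), searchIndex: "All" };
  }
  // Mixed or plain input falls through to ordinary keyword search.
  return { mode: "default", keywords: tokens.join(" ") };
}
```

For example, planSearch(["012345678905", "stapler"]) stays in default mode because one token is not identifier-shaped, while a single ISBN-10 with a trailing X yields a one-element pipe list.
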
Sub-bits worth pinning before implementation
Conditional resource selector. The original spec assumed
identifier mode would request additional fine-grained
ItemInfo.ExternalIds.{EANs,UPCs,ISBNs}.DisplayValues paths that the
default mode would omit, to keep payload small for plain keyword
searches. Implementation found this is not possible at the SDK
level: the vendored amazon-creators-api@1.2.2 SearchItemsResource
/ GetItemsResource enums only expose 'itemInfo.externalIds' as a
single key that returns all external-id sub-fields together. There is
no way to ask for EANs but not UPCs, or to omit external IDs entirely
once any sub-resource is requested.
Furthermore, the existing V1_RESOURCES constant used by getItems
already contains 'itemInfo.externalIds' (because the current
AmazonImportDto carries the first UPC). So:
- The “default” mode resource selector and the “identifier” mode resource selector are functionally identical in v1.
- The resourcesForMode(mode: SearchMode) abstraction is still kept (in src/server/lib/amazon/creators-client.ts) so a future SDK revision that exposes finer-grained resource paths can be adopted by changing only the helper. SearchMode = "default" | "identifier". The function de-duplicates internally.
- There is no payload-size win from mode-discrimination in v1. The spec’s earlier reasoning about “avoid response bloat” does not apply.
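
One way the helper could look, as a sketch rather than the code in creators-client.ts. The resource list shown is an illustrative subset (the real V1_RESOURCES contents are not reproduced in this section), and IDENTIFIER_RESOURCES is a hypothetical name for what identifier mode adds.

```typescript
type SearchMode = "default" | "identifier";

// Illustrative subset only; the real V1_RESOURCES constant is shared with getItems.
const V1_RESOURCES: ReadonlyArray<string> = [
  "images.primary.large",
  "itemInfo.title",
  "itemInfo.externalIds",
  "offersV2.listings.price",
];

// What identifier mode needs on top of the default selector (assumed name).
const IDENTIFIER_RESOURCES: ReadonlyArray<string> = ["itemInfo.externalIds"];

function resourcesForMode(mode: SearchMode): string[] {
  const wanted =
    mode === "identifier"
      ? V1_RESOURCES.concat(IDENTIFIER_RESOURCES)
      : V1_RESOURCES.slice();
  // De-duplicate while keeping first-seen order.
  return wanted.filter((resource, index) => wanted.indexOf(resource) === index);
}
```

Because 'itemInfo.externalIds' is already in the default list, both modes return the same selector here, which matches the "functionally identical in v1" observation.
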
Response filtering. Amazon’s docs explicitly say bulk-identifier search “may return adjacent products”; callers are expected to filter the response against the input identifiers. The filter is a pure function:

```typescript
filterByExternalIds(
  items: SearchItem[],
  inputIds: ReadonlyArray<string>,
): SearchItem[]
```

Match an input id against an item by checking, in order:
ItemInfo.ExternalIds.EANs.DisplayValues,
ItemInfo.ExternalIds.UPCs.DisplayValues,
ItemInfo.ExternalIds.ISBNs.DisplayValues. If any input id appears in
any of those DisplayValues arrays for that item, the item is kept;
otherwise it is dropped. Comparison is exact-string (Amazon returns
identifiers in canonical form; no normalisation needed beyond the
defensive filter that ran upstream).
Deduplication. A single product can carry several external identifiers (e.g. a book with both ISBN-10 and ISBN-13; a product with multiple UPC barcodes across regional variants). Filtering by exact match against multiple input ids can yield the same item more than once if both ids happen to point to the same ASIN. Dedup the filtered list by ASIN, preserving the first occurrence.
Output ordering. Items are returned in Amazon’s response order with non-matching items removed and ASIN-duplicates collapsed. The route does not attempt to re-order results to match the input id ordering; doing so adds complexity without a real consumer in v1. Document the order in the route contract so callers do not accidentally rely on input-order alignment.
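
The filtering, deduplication, and ordering rules above could be sketched as two small pure functions. The SearchItem shape below is a minimal stand-in, and its camelCase field names (asin, itemInfo.externalIds.eans.displayValues, and so on) are an assumption for the example; the SDK's own item type is the canonical reference. dedupeByAsin is a hypothetical helper name.

```typescript
// Minimal stand-in for the SDK item type; field casing is assumed.
interface DisplayValues {
  displayValues?: string[];
}
interface SearchItem {
  asin: string;
  itemInfo?: {
    externalIds?: {
      eans?: DisplayValues;
      upcs?: DisplayValues;
      isbns?: DisplayValues;
    };
  };
}

// Collect EANs, then UPCs, then ISBNs for one item.
function externalIdsOf(item: SearchItem): string[] {
  const ids = item.itemInfo?.externalIds;
  return ([] as string[]).concat(
    ids?.eans?.displayValues ?? [],
    ids?.upcs?.displayValues ?? [],
    ids?.isbns?.displayValues ?? [],
  );
}

// Keep only items that echo at least one input id (exact-string match).
function filterByExternalIds(
  items: SearchItem[],
  inputIds: ReadonlyArray<string>,
): SearchItem[] {
  return items.filter((item) =>
    externalIdsOf(item).some((id) => inputIds.indexOf(id) !== -1),
  );
}

// Collapse repeated ASINs, preserving the first occurrence and
// therefore Amazon's response order.
function dedupeByAsin(items: SearchItem[]): SearchItem[] {
  const seen: Record<string, boolean> = Object.create(null);
  return items.filter((item) => {
    if (seen[item.asin]) return false;
    seen[item.asin] = true;
    return true;
  });
}
```

Composing them as dedupeByAsin(filterByExternalIds(items, inputIds)) drops adjacent products first and then collapses duplicates, without re-sorting to match the input id order.
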
Edge cases for tests.
- Pure-identifier list of 3 UPCs, Amazon returns exactly 3 items: pass-through.
- Pure-identifier list, Amazon returns 5 items where 2 are “adjacent products”: filter drops the 2 noise items.
- Pure-identifier list, none of Amazon’s response items echo any input id: the response carries items: [] (still a 200, not an error).
- Mixed token list (one UPC + one keyword): falls through to plain keyword search, no identifier mode triggered.
- One ASIN + one UPC: neither shortcut fires; falls through to plain keyword search.
- Single ISBN-10 with trailing X: pattern matches, identifier mode triggers with a single-element pipe-list.
Operational characteristics
| Aspect | Note |
|---|---|
| Auth | OAuth2 client-credentials. The creatorsClient already obtains and rotates tokens transparently. |
| Marketplace | Selected via the X-Marketplace header on every call. US only for this project. |
| Rate limits | TPS and TPD quotas shared across GetItems, SearchItems, GetBrowseNodes, GetVariations. Throttled responses surface as ThrottleExceptionResponseContent; the BFF maps to AMAZON_API_ERROR. |
| Failure modes | AccessDeniedException, UnauthorizedException, ValidationException, InternalServerException, ResourceNotFoundException, ThrottleException (SDK type names). All collapse to AMAZON_API_ERROR at the BFF for v1; finer-grained mapping is a future extension. |
| PA-API sunset | The legacy PA-API 5.0 product surface is deprecated on 2026-05-15; the Creators API (used here via amazon-creators-api@1.2.2) is the supported successor. Confirmed in Amazon’s own docs and external write-ups. The Keywords semantics described above carry over unchanged. |
Open questions
| ID | Question | Status |
|---|---|---|
| Q1 | Multi-category requests — Amazon allows only one searchIndex or one browseNodeId per SearchItems call. When categories[] has more than one entry, how should the client behave? | Resolved — option (a). The client uses the first entry as the Amazon category restrictor (searchIndex or browseNodeId) and appends the remaining entries to the keywords string as additional restrictive terms. |
| Q2 | categories[] resolution source: static enum-matching table only, or also dynamic GetBrowseNodes lookup with an in-memory cache? | Resolved: static table for v1, hidden behind a resolveCategory(label: string): { searchIndex?: string; browseNodeId?: string } \| null function so the implementation can be swapped for a dynamic GetBrowseNodes-backed resolver later without a contract or call-site change. |
| Q3 | Should empty categories[] / keywords[] arrays be treated as “field absent”, or rejected as malformed? | Resolved — entries that go empty after the defensive filter are dropped silently; if the array becomes empty it is treated as absent. Caller-side empty arrays are equivalent to omitting the field. See “Search Input Processing — Defensive input filter”. |
| Q4 | Concrete values for the server-side constants. | Resolved — values fixed (see Constants at the top of “Search Input Processing”). The constants remain server-side names so they can be tuned without a contract change. |
| Q5 | Zero-result augmentation policy beyond silent Amazon-side relaxation. | Resolved — out of scope for v1. Neither LLM-driven suggestions nor external web-search fallbacks ship in PDEV-457. The deferred LLM-suggestion feature is tracked under PDEV-569. External web-search providers are not currently planned. |
| Q6 | LLM provider for suggestion generation. | Resolved — out of scope for v1. Decision tracked in PDEV-569 if and when that issue is picked up. |
| Q7 | Should the server silently re-run SearchItems against generated suggestions before responding? | Resolved — out of scope for v1. No suggestion generation in v1, so no re-search path exists. The constraint (no server-side re-search) is preserved in PDEV-569 for the deferred feature. |
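
The Q1 and Q2 resolutions fit together roughly as sketched below. The resolveCategory signature is the one given in Q2; everything else is assumed for illustration: the CATEGORY_TABLE entries and label spellings, the applyCategories name, and in particular the choice to keep an unresolvable first entry as a keyword, which the resolutions above do not address.

```typescript
type CategoryRestrictor = { searchIndex?: string; browseNodeId?: string };

// Hypothetical static table; the real label set is not listed in this section.
const CATEGORY_TABLE: Record<string, CategoryRestrictor> = {
  "home & garden": { searchIndex: "HomeGarden" },
  "office products": { searchIndex: "OfficeProducts" },
};

// Q2: static lookup behind a stable signature, swappable later.
function resolveCategory(label: string): CategoryRestrictor | null {
  const key = label.trim().toLowerCase();
  const hit = Object.prototype.hasOwnProperty.call(CATEGORY_TABLE, key)
    ? CATEGORY_TABLE[key]
    : undefined;
  return hit ?? null;
}

// Q1 option (a): first entry becomes the restrictor, the rest become keywords.
function applyCategories(
  keywords: string,
  categories: ReadonlyArray<string>,
): CategoryRestrictor & { keywords: string } {
  const first = categories[0];
  const rest = categories.slice(1);
  const restrictor = first === undefined ? null : resolveCategory(first);
  // Assumption: a first entry that does not resolve is kept as a keyword
  // rather than dropped; the spec text does not cover that case.
  const extraTerms =
    restrictor === null && first !== undefined ? [first].concat(rest) : rest;
  const base: CategoryRestrictor = restrictor ?? {};
  return { ...base, keywords: [keywords].concat(extraTerms).join(" ") };
}
```

With these assumed table entries, applyCategories("stapler", ["Office Products", "heavy duty"]) restricts by searchIndex OfficeProducts and searches for "stapler heavy duty".
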
Copyright: (c) Arda Systems 2025-2026, All rights reserved