
Amazon Import — Specification Analysis

This project is restricted to the Next.js BFF route surface in arda-frontend-app. The chosen design splits keyword search from URL/ASIN lookup into two sibling routes, with shared URL/ASIN input normalisation:

  • POST /api/amazon/import — preserved response contract, but its set of accepted URL/ASIN input shapes is broadened by the new shared normalisation layer (see URL/ASIN input normalisation).
  • POST /api/amazon/search — new. Flexible search input shape; the BFF client layer (creatorsClient.searchItems) disambiguates and maps onto the Amazon Creators API.

Existing route — POST /api/amazon/import (broadened acceptance)

| Field | Value |
| --- | --- |
| Method, path | POST /api/amazon/import |
| Request body | { "input": "<URL or ASIN or text containing exactly one ASIN>" } |
| Success | 200 with the { ok: true, data: AmazonImportDto } envelope (single item). The /api/amazon/search route below uses the same wire envelope; only the inner data shape differs ({ items: AmazonImportDto[], totalResultsHint? } instead of a single AmazonImportDto). |
| Errors | UNRECOGNIZED_AMAZON_URL, UNSUPPORTED_SHORT_LINK, UNSUPPORTED_AMAZON_LOCALE, ITEM_NOT_FOUND, AMAZON_API_ERROR |

The response contract is preserved verbatim — same DTO, same error codes, same HTTP statuses. What changes is the set of inputs accepted on the UNRECOGNIZED_AMAZON_URL boundary: the route now delegates extraction to extractAsinLenient (see below), which accepts schemeless URLs, path-only inputs, lowercase bare ASINs, additional US sub-hosts, old-form product paths, and plain text containing exactly one ASIN. Previously rejected inputs that map cleanly to a single ASIN now succeed; inputs that genuinely don’t carry an ASIN still return UNRECOGNIZED_AMAZON_URL.

User-reported feedback: the current extractAsin in src/lib/shared/amazon/asin.ts is too restrictive — it rejects pasted inputs that drop the URL scheme, that are path-only, that use lowercase ASINs, that come from mobile / Smile / Kindle sub-hosts, that use the old /exec/obidos/ASIN/ form, or that mix a single ASIN with surrounding prose. This project widens that acceptance set with a shared normalisation layer used by both routes.

The layer is split into two functions to keep /search’s ASIN-shortcut dispatcher from over-interpreting a keyword query that happens to contain an ASIN.

extractAsin (strict) is the canonical extractor used by /search’s ASIN-shortcut dispatcher, in both the whole-input and the multi-token case. It accepts only inputs that are unambiguously an ASIN or an Amazon product URL, with no plain-text fallback.

Acceptance set:

| Shape | Example | Notes |
| --- | --- | --- |
| Bare ASIN (10 chars [A-Za-z0-9]) | B08N5WRWNW, b08n5wrwnw | New: case-folded to uppercase before the pattern test. Outer whitespace trimmed. |
| Canonical product URL (full scheme, US host) | https://www.amazon.com/dp/B08N5WRWNW, etc. | All four existing canonical forms: /dp/, /<slug>/dp/, /gp/product/, /gp/aw/d/. |
| New: Old-form product URL | https://www.amazon.com/exec/obidos/ASIN/B08N5WRWNW | Also matches /o/ASIN/<ASIN>. |
| New: Extended US sub-hosts | https://m.amazon.com/dp/B08N5WRWNW, smile.amazon.com, read.amazon.com | Added to the US allow-list alongside www.amazon.com and amazon.com. |
| New: Schemeless URL | www.amazon.com/dp/B08N5WRWNW, amazon.com/dp/B08N5WRWNW | When new URL(input) throws and the input starts with (www\.)?amazon\.com/ or a known US sub-host, retry with https:// prepended. |
| New: Path-only URL (with leading /) | /dp/B08N5WRWNW, /gp/product/B08N5WRWNW, /gp/aw/d/B08N5WRWNW, /<slug>/dp/<ASIN>, /exec/obidos/ASIN/<ASIN>, /o/ASIN/<ASIN> | Treated as https://www.amazon.com<path>. |
| New: Path-only URL (without leading /) | dp/B08N5WRWNW, Some-Product/dp/B08N5WRWNW, gp/product/B08N5WRWNW, gp/aw/d/B08N5WRWNW, exec/obidos/ASIN/B08N5WRWNW, o/ASIN/B08N5WRWNW | Same as above; prepend https://www.amazon.com/. |

Rejection set:

| Shape | Code |
| --- | --- |
| a.co/..., amzn.to/... | UNSUPPORTED_SHORT_LINK |
| amazon.<non-US-tld>/dp/<ASIN> (incl. co.uk, de, ca, co.jp, …) | UNSUPPORTED_AMAZON_LOCALE |
| Look-alike host (myamazon.com, amazonn.com, …) | UNRECOGNIZED_AMAZON_URL |
| URL parses, host is US Amazon, but path is non-product (/s?k=…, /cart, /) | UNRECOGNIZED_AMAZON_URL |
| Plain text without a parseable URL or bare ASIN | UNRECOGNIZED_AMAZON_URL |
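The acceptance and rejection sets above can be sketched in TypeScript as a strict extractor. This is a minimal illustration under stated assumptions, not the contents of src/lib/shared/amazon/asin.ts: the host allow-lists, the regexes and the ExtractResult shape are hypothetical choices made to mirror the tables, and the steps follow the same ladder that extractAsinLenient uses below, minus its plain-text fallback.

```typescript
// Hypothetical sketch of strict extractAsin (ladder steps 1-5, no plain-text step).
type RejectCode =
  | "UNRECOGNIZED_AMAZON_URL"
  | "UNSUPPORTED_SHORT_LINK"
  | "UNSUPPORTED_AMAZON_LOCALE";
type ExtractResult = { ok: true; asin: string } | { ok: false; code: RejectCode };

// Assumed allow-lists; the real module may spell these differently.
const US_HOSTS = new Set([
  "amazon.com",
  "www.amazon.com",
  "m.amazon.com",
  "smile.amazon.com",
  "read.amazon.com",
]);
const SHORT_LINK_HOSTS = new Set(["a.co", "amzn.to"]);
// amazon.<tld> hosts other than .com, e.g. amazon.co.uk, www.amazon.de, amazon.com.au.
const NON_US_AMAZON = /^(?:[a-z]+\.)?amazon\.(?!com$)[a-z]{2,3}(?:\.[a-z]{2})?$/;

// Canonical and old-form product paths; group 1 captures the ASIN.
const PRODUCT_PATHS = [
  /^\/(?:[^/]+\/)?dp\/([A-Za-z0-9]{10})(?:\/|$)/, // /dp/ and /<slug>/dp/
  /^\/gp\/product\/([A-Za-z0-9]{10})(?:\/|$)/,
  /^\/gp\/aw\/d\/([A-Za-z0-9]{10})(?:\/|$)/,
  /^\/exec\/obidos\/ASIN\/([A-Za-z0-9]{10})(?:\/|$)/,
  /^\/o\/ASIN\/([A-Za-z0-9]{10})(?:\/|$)/,
];
const SCHEMELESS = /^(?:www\.|m\.|smile\.|read\.)?amazon\.com\//i;
const PATH_ONLY =
  /^\/?(?:[^/\s]+\/)?(?:dp|gp\/product|gp\/aw\/d|exec\/obidos\/ASIN|o\/ASIN)\/[A-Za-z0-9]{10}(?:[/?#]|$)/;

// Step 3: classify a parsed URL. null means "not authoritative, keep going".
function classifyUrl(url: URL): ExtractResult | null {
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  const host = url.hostname;
  if (SHORT_LINK_HOSTS.has(host)) return { ok: false, code: "UNSUPPORTED_SHORT_LINK" };
  if (NON_US_AMAZON.test(host)) return { ok: false, code: "UNSUPPORTED_AMAZON_LOCALE" };
  if (!US_HOSTS.has(host)) return null; // look-alike or unrelated host
  for (const re of PRODUCT_PATHS) {
    const m = re.exec(url.pathname);
    if (m) return { ok: true, asin: m[1].toUpperCase() };
  }
  // US Amazon host but non-product path (/s, /cart, /): stop here.
  return { ok: false, code: "UNRECOGNIZED_AMAZON_URL" };
}

function parse(s: string): URL | null {
  try {
    return new URL(s);
  } catch {
    return null;
  }
}

function extractAsin(raw: string): ExtractResult {
  const input = raw.trim(); // step 1
  const upper = input.toUpperCase();
  if (/^[A-Z0-9]{10}$/.test(upper)) return { ok: true, asin: upper }; // step 2
  const candidates = [input]; // step 3: the input as-is
  if (SCHEMELESS.test(input)) candidates.push(`https://${input}`); // step 4
  if (PATH_ONLY.test(input)) {
    candidates.push(`https://www.amazon.com/${input.replace(/^\//, "")}`); // step 5
  }
  for (const candidate of candidates) {
    const url = parse(candidate);
    const result = url && classifyUrl(url);
    if (result) return result;
  }
  return { ok: false, code: "UNRECOGNIZED_AMAZON_URL" };
}
```

Checking the short-link and non-US hosts before the US allow-list is what lets a recognised-but-rejected URL short-circuit with its specific code instead of degrading to UNRECOGNIZED_AMAZON_URL.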

extractAsinLenient is used by /api/amazon/import. It runs the same step ladder as strict extractAsin (bare ASIN → new URL() → scheme prepend → path-only) and adds one final fallback: plain-text ASIN extraction, fired only when no earlier step produced a parseable Amazon URL.

The “URL parsed but path is non-product → stop at step 3” rule therefore applies in lenient too. A user pasting https://www.amazon.com/s?k=B08N5WRWNW into /api/amazon/import still receives UNRECOGNIZED_AMAZON_URL, even though the input contains an ASIN-shaped token: the URL parsed, and we honour its non-product semantics.

The plain-text fallback fires when every URL-parse attempt (original input, scheme-prepend retry, path-only normalisation) threw or failed to land on a US Amazon host. UNSUPPORTED_SHORT_LINK and UNSUPPORTED_AMAZON_LOCALE from any of those parse attempts short-circuit and return as-is — we never silently reroute a recognised-but-rejected URL to plain-text extraction.

Implementation note: extractAsinLenient cannot be a thin wrapper that inspects extractAsin’s return value, because strict returns the same UNRECOGNIZED_AMAZON_URL code for both “URL parsed but non-product” and “no URL parsed at all”. The two functions share internal step helpers but compose differently: lenient adds the plain-text step on the no-URL-parsed-at-all branch.

Plain-text extraction pattern (case-insensitive heuristic with digit-presence guard):

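const ASIN_IN_TEXT_RE =
  /\b(B[A-Za-z0-9]{9}|\d{9}[\dXx])\b/gi;
// The `i` flag lets the leading B match a lowercase b, so lowercase ASINs
// in prose are found; surviving matches are upper-cased afterwards.
// Accept only when the matched token contains at least one digit
// (`B[A-Za-z0-9]{9}` branch) OR is an ISBN-10 (`\d{9}[\dXx]` branch).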

Rule:

  • Find all tokens in the trimmed input matching ASIN_IN_TEXT_RE.
  • Filter the matches:
    • For the B[A-Za-z0-9]{9} branch, require ≥1 digit in the token. This rejects 10-letter English words like Background, Basketball, Blueprints that start with B but contain no digits.
    • The \d{9}[\dXx] branch (ISBN-10) is already digit-heavy and needs no guard.
  • Upper-case the surviving matches.
  • Accept exactly one match. Zero matches → UNRECOGNIZED_AMAZON_URL. Two or more distinct matches → UNRECOGNIZED_AMAZON_URL (the caller hit /import, which is single-item by contract; ambiguity should surface, not be silently resolved).
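The rule can be sketched as a small function. It assumes the `i` flag on ASIN_IN_TEXT_RE (required for the lowercase-in-prose case); extractAsinFromText and TextResult are hypothetical names, not existing exports.

```typescript
// Hypothetical sketch of the lenient-only plain-text extraction step.
const ASIN_IN_TEXT_RE = /\b(B[A-Za-z0-9]{9}|\d{9}[\dXx])\b/gi;

type TextResult =
  | { ok: true; asin: string }
  | { ok: false; code: "UNRECOGNIZED_AMAZON_URL" };

function extractAsinFromText(text: string): TextResult {
  const distinct = new Set<string>();
  for (const match of text.trim().matchAll(ASIN_IN_TEXT_RE)) {
    const token = match[1];
    // Digit-presence guard: drops all-letter words such as "Background".
    // ISBN-10 tokens always contain digits, so they pass unchanged.
    if (!/\d/.test(token)) continue;
    distinct.add(token.toUpperCase());
  }
  // /import is single-item: zero or several distinct ASINs both fail.
  if (distinct.size !== 1) return { ok: false, code: "UNRECOGNIZED_AMAZON_URL" };
  return { ok: true, asin: [...distinct][0] };
}
```

Collecting matches into a Set means the same ASIN pasted twice (in any case) still counts as one, which matches the “two or more distinct matches” wording.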

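extractAsinLenient(input) follows this order. The first step that reaches a definitive outcome wins: an accepted ASIN, or a recognised rejection (including the US-Amazon non-product-path case in step 3). A step that finds nothing it can act on falls through to the next.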

  1. Trim outer whitespace.
  2. Bare-ASIN check (case-folded) — uppercases the input then matches ^[A-Z0-9]{10}$.
  3. new URL(input) — if parseable with an http: or https: protocol:
    • Short-link host → UNSUPPORTED_SHORT_LINK.
    • Non-US Amazon host → UNSUPPORTED_AMAZON_LOCALE.
    • US Amazon host (incl. new sub-hosts) → canonical or old-form path match → accept. Non-matching path → UNRECOGNIZED_AMAZON_URL; do NOT fall through to plain-text extraction (a search/cart URL is not an import).
    • Non-Amazon host → fall through to step 4.
    • URLs with a non-http/https protocol (e.g. product:, mailto:) are not authoritative: they fall through to step 4 even though new URL() succeeded.
  4. Scheme prepend — if input starts with (www\.)?amazon\.com/, m\.amazon\.com/, smile\.amazon\.com/, or read\.amazon\.com/, retry step 3 with https:// prepended.
  5. Path-only — if input matches one of the canonical or old-form path patterns with or without a leading /, treat as https://www.amazon.com/<normalised-path> and retry step 3.
  6. Plain-text ASIN extraction (lenient only — extractAsin strict stops here and returns UNRECOGNIZED_AMAZON_URL).
  7. Otherwise → UNRECOGNIZED_AMAZON_URL.
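The implementation note above (shared step helpers, different composition) can be sketched by having the shared URL steps return a three-way internal outcome. The helper signatures below (UrlSteps, TextStep, makeExtractors) are hypothetical; the point is that lenient branches on the internal outcome rather than on strict’s public result.

```typescript
// Hypothetical composition of strict and lenient from shared step helpers.
type RejectCode =
  | "UNRECOGNIZED_AMAZON_URL"
  | "UNSUPPORTED_SHORT_LINK"
  | "UNSUPPORTED_AMAZON_LOCALE";
type Result = { ok: true; asin: string } | { ok: false; code: RejectCode };

// Internal outcome of steps 2-5 (bare ASIN, URL parse, scheme prepend, path-only).
type UrlStepsOutcome =
  | { kind: "asin"; asin: string }
  | { kind: "rejected"; code: RejectCode } // recognised: stop (incl. US non-product path)
  | { kind: "no-parseable-url" }; // nothing authoritative parsed

type UrlSteps = (input: string) => UrlStepsOutcome;
type TextStep = (input: string) => Result;

function makeExtractors(runUrlSteps: UrlSteps, fromText: TextStep) {
  const fromOutcome = (out: UrlStepsOutcome): Result | null => {
    if (out.kind === "asin") return { ok: true, asin: out.asin };
    if (out.kind === "rejected") return { ok: false, code: out.code };
    return null;
  };
  // Strict: "rejected(UNRECOGNIZED)" and "no-parseable-url" collapse to the
  // same public code, which is why lenient cannot wrap strict's result.
  const strict = (raw: string): Result =>
    fromOutcome(runUrlSteps(raw.trim())) ?? { ok: false, code: "UNRECOGNIZED_AMAZON_URL" };
  // Lenient: plain-text extraction (step 6) only on the no-parseable-url branch.
  const lenient = (raw: string): Result => {
    const input = raw.trim();
    return fromOutcome(runUrlSteps(input)) ?? fromText(input);
  };
  return { strict, lenient };
}
```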

Stopping at step 3 when the URL parses but the path is non-product is intentional. It honours the user’s signal: if they pasted a parseable URL, we trust its semantics over hunting for ASIN-shaped tokens in its query string.

WHATWG parser quirk — literal-space inputs


Node’s WHATWG URL parser (and the browser’s) does parse inputs that contain literal spaces — it percent-encodes them. That means new URL('https://www.amazon.com/dp/B08N5WRWNW great price!') does not throw; it returns a URL whose pathname is /dp/B08N5WRWNW%20great%20price!, which then fails canonical-path matching at step 3 and would, by the strict “stop at step 3” rule, return UNRECOGNIZED_AMAZON_URL.

This would defeat the “URL + trailing prose → plain-text ASIN extraction” side effect we explicitly accepted. The implementation patches this with a hasLiteralSpace check at step 3: when the parsed URL classifies as US-Amazon-non-product AND the original input contains a literal space character, the step downgrades the outcome from recognised-rejection (which would stop the lenient pipeline) to no-parseable-url (which lets lenient fall through to plain-text ASIN extraction).

In effect: a clean URL on a non-product path stops as designed, but a URL with prose appended falls through. Strict extractAsin returns UNRECOGNIZED_AMAZON_URL either way — only lenient’s fall-through behaviour cares about the distinction.
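The downgrade can be seen in a trimmed-down step 3. This sketch is hypothetical and deliberately narrow: host handling is reduced to amazon.com / www.amazon.com and path matching to /dp/, so that only the hasLiteralSpace decision is on display.

```typescript
// Hypothetical, trimmed-down step 3 showing only the hasLiteralSpace downgrade.
type Step3Outcome =
  | { kind: "asin"; asin: string }
  | { kind: "rejected"; code: "UNRECOGNIZED_AMAZON_URL" } // stops lenient
  | { kind: "no-parseable-url" }; // lenient may fall through to plain text

const US_HOSTS = new Set(["amazon.com", "www.amazon.com"]);
const DP_PATH = /^\/(?:[^/]+\/)?dp\/([A-Za-z0-9]{10})(?:\/|$)/;

function step3(input: string): Step3Outcome {
  let url: URL;
  try {
    url = new URL(input); // WHATWG: literal spaces become %20 instead of throwing
  } catch {
    return { kind: "no-parseable-url" };
  }
  const isHttp = url.protocol === "http:" || url.protocol === "https:";
  if (!isHttp || !US_HOSTS.has(url.hostname)) return { kind: "no-parseable-url" };
  const m = DP_PATH.exec(url.pathname);
  if (m) return { kind: "asin", asin: m[1].toUpperCase() };
  // US Amazon, non-product path. A literal space in the raw input suggests
  // "URL + prose", so downgrade and let lenient continue to plain text.
  const hasLiteralSpace = input.includes(" ");
  return hasLiteralSpace
    ? { kind: "no-parseable-url" }
    : { kind: "rejected", code: "UNRECOGNIZED_AMAZON_URL" };
}
```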

  • URL + trailing prose: an input like https://www.amazon.com/dp/B08N5WRWNW great price! parses cleanly with new URL() (WHATWG percent-encodes the literal space — see WHATWG parser quirk — literal-space inputs above), so step 3’s hasLiteralSpace guard downgrades the parse outcome to no-parseable-url and the pipeline falls through to plain-text ASIN extraction. The ASIN is found and imported. Effectively the same outcome as if we’d extracted the URL from surrounding prose — accepted for /import’s user-intent model.
  • /import accepting prose: a comment like I want to import B08N5WRWNW please resolves to importing B08N5WRWNW. The user’s intent is unambiguous when only one ASIN appears.
  • Multiple ASINs in prose at /import: an input like B08N5WRWNW and also B0EXAMPLE2 returns UNRECOGNIZED_AMAZON_URL. /import is a single-item route by contract; ambiguous input should fail loudly, not pick one ASIN. Multi-ASIN paste belongs to /search’s strict-tokenised dispatcher.
| Caller | Function | Rationale |
| --- | --- | --- |
| /api/amazon/import route handler | extractAsinLenient | Single-input route; user intent is to import a specific product, even when their paste includes prose. |
| /api/amazon/search — ASIN-shortcut dispatcher (single bare ASIN or single URL whole-input case) | extractAsin (strict) | Same URL/ASIN acceptance as /import, minus the plain-text fallback; identifiers are treated consistently across both routes. |
| /api/amazon/search — multi-token tokeniser | extractAsin (strict) | Plain-text extraction here would silently reroute keyword queries to import. The user typed search terms; we honour that. |

Atop the existing 30-ish tests in asin.test.ts, the normalisation work adds roughly:

  • 6 bare-ASIN case-folding cases (lowercase, mixed-case, leading/trailing whitespace + lowercase).
  • 12 path-only URL acceptance cases (each of the six path forms × with/without leading slash).
  • 4 schemeless URL acceptance cases (www.amazon.com/dp/..., amazon.com/gp/product/..., m.amazon.com/dp/..., etc.).
  • 4 extended US sub-host acceptance cases (m., smile., read. × at least one path form each).
  • 2 old-form path acceptance cases (/exec/obidos/ASIN/<ASIN>, /o/ASIN/<ASIN>).
  • 8 plain-text ASIN extraction cases for extractAsinLenient:
    • Single ASIN in prose (accepts).
    • Single lowercase ASIN in prose (accepts via case-fold).
    • Single ASIN in prose with surrounding punctuation (accepts).
    • Two or more distinct ASINs in prose (rejects with UNRECOGNIZED_AMAZON_URL).
    • Zero ASINs in prose (rejects).
    • 10-character English word (Background, Basketball) — must reject (digit-presence guard).
    • 10-digit numeric token that isn’t a valid ISBN-10 — accepts via the \d{9}[\dXx] branch, which does not validate the ISBN-10 checksum.
    • Search URL with ASIN in query string (/s?k=B08N5WRWNW) — rejects (stop-at-step-3 ordering).
  • 4 flipped expectations: m.amazon.com, smile.amazon.com, read.amazon.com previously UNSUPPORTED_AMAZON_LOCALE, now accepted.

For POST /api/amazon/search, the request shape is deliberately flexible. The BFF route validates input and delegates to the client layer, which maps the rich BFF shape onto the Amazon SDK’s narrower SearchItemsRequestContent (see Amazon Creators API capabilities below). The route does not support pagination: every call returns the first page with up to MAX_RESULTS items.

| Field | Value |
| --- | --- |
| Method, path | POST /api/amazon/search |
| Request body | flexible search input (see below) |
| Success | 200 with { "ok": true, "data": { "items": AmazonImportDto[], "totalResultsHint"?: number } }. items is [] when the search succeeds with zero matches. The { ok, data } wire wrap matches /api/amazon/import’s envelope (the Next.js handler renames the route module’s internal field name to data). |
| Errors | INVALID_REQUEST (400), INVALID_SEARCH_INPUT (400), AUTHENTICATION_REQUIRED (401), UNSUPPORTED_SHORT_LINK (422), UNSUPPORTED_AMAZON_LOCALE (422), AMAZON_API_ERROR (502). Error envelope: { "ok": false, "code": string, "message": string }. INVALID_REQUEST is emitted by the Next.js handler on malformed JSON or wrong-shape body before the route module is invoked (mirrors /api/amazon/import’s structural guard); INVALID_SEARCH_INPUT is emitted by the route module on semantic validation failure (see “Request validation” below). The two 422 codes are emitted on single-input recognised-rejection (when the strict ASIN extractor parses the query as a short link or non-US Amazon URL); they preserve the import route’s user-intent rejection rather than dereferencing the input. |
| Auth | Cognito JWT verification at the Next.js handler level (mirrors /api/amazon/import’s processJWTForArda call in src/app/api/amazon/import/route.ts). The route module itself remains auth-agnostic. tenantId from the JWT result is not currently threaded through the route module — same posture as /import. |
{
  // Free-text query — primary search term. Up to MAX_QUERY_LENGTH chars.
  // Optional when `keywords[]` is non-empty after filtering; otherwise
  // required. Trimmed; non-empty after trim.
  "query": "string",
  // Optional additional keyword terms that further restrict the search.
  // Each entry is a single keyword/phrase; combined with `query` by the
  // client layer before calling Amazon.
  "keywords": ["string", "..."],
  // Optional category restrictions. Each entry is a free-form category
  // label (e.g. "HomeGarden", "OfficeProducts", or a human-friendly
  // synonym). The client layer disambiguates against Amazon's
  // `SearchIndex` enum and/or `browseNodeId`; unresolvable entries are
  // either dropped with a warning or rejected — see Q1 in Open
  // Questions.
  "categories": ["string", "..."],
  // Optional. When `true`, restricts results to Prime-eligible items
  // (maps to Amazon `deliveryFlags: ["Prime"]`).
  "primeOnly": false,
  // Optional. Result ordering.
  //   "relevance"         — Amazon's default (SortBy=Relevance)
  //   "price-low-to-high" — SortBy=Price:LowToHigh
  // Default: "relevance".
  "sortBy": "relevance" | "price-low-to-high"
}

Silent Amazon-side query relaxation on zero results (described in Search Input Processing) is always on and is not a contract surface — it is a server behaviour. Richer zero-result strategies (LLM-generated suggestions, external web search) are deferred to a v2 follow-on tracked under PDEV-569 and are explicitly out of scope for PDEV-457.

| Field | Constraint | On violation |
| --- | --- | --- |
| query | Optional only when keywords[] is non-empty after filtering; otherwise required. Trimmed; length ≤ MAX_QUERY_LENGTH. At least one of query or keywords[] must produce non-empty content after the defensive filter — categories[] and primeOnly alone do not satisfy Amazon’s “at least one search term” requirement. | INVALID_SEARCH_INPUT |
| keywords | Optional; if present, array of strings; length ≤ MAX_KEYWORDS; each entry length ≤ MAX_KEYWORD_LENGTH. | INVALID_SEARCH_INPUT |
| categories | Optional; if present, array of strings; length ≤ MAX_CATEGORIES; each entry length ≤ MAX_CATEGORY_LENGTH. | INVALID_SEARCH_INPUT |
| primeOnly | Optional; boolean. | INVALID_SEARCH_INPUT |
| sortBy | Optional; one of "relevance", "price-low-to-high". | INVALID_SEARCH_INPUT |
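A type-and-enum check for the request body might look like the sketch below. It covers only the type and enum constraints in the table; length, cardinality and search-term rules belong to the defensive filter. validateSearchBody and its return shape are hypothetical, and the body is assumed to have already passed the Next.js handler’s INVALID_REQUEST structural guard (i.e. it is a plain JSON object).

```typescript
// Hypothetical shape check for the /api/amazon/search request body.
type SortBy = "relevance" | "price-low-to-high";

interface SearchRequest {
  query?: string;
  keywords?: string[];
  categories?: string[];
  primeOnly?: boolean;
  sortBy?: SortBy;
}

type Validation =
  | { ok: true; value: SearchRequest }
  | { ok: false; code: "INVALID_SEARCH_INPUT"; message: string };

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

function validateSearchBody(body: Record<string, unknown>): Validation {
  const fail = (message: string): Validation => ({ ok: false, code: "INVALID_SEARCH_INPUT", message });
  const { query, keywords, categories, primeOnly, sortBy } = body;
  if (query !== undefined && typeof query !== "string") return fail("query must be a string");
  if (keywords !== undefined && !isStringArray(keywords)) return fail("keywords must be an array of strings");
  if (categories !== undefined && !isStringArray(categories)) return fail("categories must be an array of strings");
  if (primeOnly !== undefined && typeof primeOnly !== "boolean") return fail("primeOnly must be a boolean");
  if (sortBy !== undefined && sortBy !== "relevance" && sortBy !== "price-low-to-high") {
    return fail('sortBy must be "relevance" or "price-low-to-high"');
  }
  // Fields are checked above; only the known keys are carried forward.
  return { ok: true, value: { query, keywords, categories, primeOnly, sortBy } as SearchRequest };
}
```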

All numeric caps are server-side constants — see Constants at the top of “Search Input Processing”.

The full wire response is the { ok: true, data: <below> } envelope (see the “Success” row of the route-summary table above). The block below shows the inner data shape:

{
  // Up to MAX_RESULTS entries, same shape as `/api/amazon/import` returns
  // today. Order matches the requested `sortBy` (Amazon's default when
  // not specified). Empty array on zero matches — not an error.
  "items": [ /* AmazonImportDto */ ],
  // Optional convenience copy of Amazon's "total results" indicator when
  // present. May be `0` on a zero-match response, or absent entirely.
  "totalResultsHint": 1234
}

On zero matches the route still returns HTTP 200 with data.items: [] (and totalResultsHint either 0 or omitted).

MAX_RESULTS is one of the server-side constants — see Constants.

Error codes (envelope { ok: false, code, message }, HTTP status in parens)


The table below documents the error surface emitted by the route module itself. AUTHENTICATION_REQUIRED (401) listed in the route-summary table above is emitted by the Next.js handler (Cognito JWT verification) before the route module is invoked, so it is not produced by this module — but it is still observable on the wire and tests targeting the HTTP endpoint must account for it.

| Code | When |
| --- | --- |
| INVALID_REQUEST (400) | Emitted upstream by the Next.js handler on malformed JSON or wrong-shape request body (mirrors /api/amazon/import’s structural guard). Not produced by the route module; listed here for wire-contract completeness. |
| INVALID_SEARCH_INPUT (400) | Any request-body semantic validation failure inside the route module (see “Request validation” above). Fires after INVALID_REQUEST’s structural check passes. |
| AUTHENTICATION_REQUIRED (401) | Emitted upstream by the Next.js handler when Cognito JWT verification fails. Not produced by the route module; listed here for wire-contract completeness. |
| UNSUPPORTED_SHORT_LINK (422) | Single-input recognised-rejection: the strict ASIN extractor parsed the query as an a.co / amzn.to short link. Same error code /api/amazon/import emits for the same input class; the search route preserves the import route’s user-intent rejection rather than dereferencing the redirect. |
| UNSUPPORTED_AMAZON_LOCALE (422) | Single-input recognised-rejection: the strict ASIN extractor parsed the query as a non-US Amazon URL (amazon.co.uk, amazon.de, …). Same error code /api/amazon/import emits for the same input class; v1 search is US-marketplace only. |
| AMAZON_API_ERROR (502) | Creators API call failed (network, throttling, 5xx). |
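The code-to-status mapping and the two envelopes can be captured in a small helper. This is a sketch using the codes and statuses listed above; ERROR_STATUS, errorResponse and okResponse are illustrative names, not existing helpers in the codebase.

```typescript
// Hypothetical wire-envelope helpers for /api/amazon/search.
const ERROR_STATUS = {
  INVALID_REQUEST: 400,
  INVALID_SEARCH_INPUT: 400,
  AUTHENTICATION_REQUIRED: 401,
  UNSUPPORTED_SHORT_LINK: 422,
  UNSUPPORTED_AMAZON_LOCALE: 422,
  AMAZON_API_ERROR: 502,
} as const;

type SearchErrorCode = keyof typeof ERROR_STATUS;

interface ErrorEnvelope {
  ok: false;
  code: SearchErrorCode;
  message: string;
}

interface SearchData<T> {
  items: T[];
  totalResultsHint?: number;
}

function errorResponse(code: SearchErrorCode, message: string): { status: number; body: ErrorEnvelope } {
  return { status: ERROR_STATUS[code], body: { ok: false, code, message } };
}

// Zero matches is a success: status 200 with items: [].
function okResponse<T>(data: SearchData<T>): { status: 200; body: { ok: true; data: SearchData<T> } } {
  return { status: 200, body: { ok: true, data } };
}
```

In a Next.js route handler the resulting pair could be passed to NextResponse.json(body, { status }).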

Zero matching items is not an error — the route returns 200 with items: [] and (optionally) totalResultsHint: 0. Callers decide how to present “no matches”.

Client-layer disambiguation responsibility


The creatorsClient.searchItems wrapper in src/server/lib/amazon/creators-client.ts owns the mapping from this flexible BFF shape to Amazon’s narrower SearchItemsRequestContent:

| BFF field | Maps to Amazon SDK |
| --- | --- |
| query | Concatenated with keywords[] (space-joined) and sent as SearchItemsRequestContent.keywords. |
| keywords[] | See above. |
| categories[] | First entry only is resolved to a SearchIndex enum or a browseNodeId via a resolveCategory(label) helper (see Q2 in Open questions); remaining entries are appended to keywords as restrictive terms. Single-category restriction is a hard Amazon constraint — searchIndex and browseNodeId each accept a single value per SearchItems call. |
| primeOnly: true | SearchItemsRequestContent.deliveryFlags = ["Prime"]. |
| primeOnly: false or unset | deliveryFlags left unset. |
| sortBy: "relevance" | SearchItemsRequestContent.sortBy = "Relevance" (or omitted — Amazon defaults to Relevance). |
| sortBy: "price-low-to-high" | SearchItemsRequestContent.sortBy = "Price:LowToHigh". |
| (always) | itemCount = MAX_RESULTS, no itemPage (always first page). |
| (always) | resources = [ ... Item.* fields the existing import path already requests ... ] — same selector as today’s GetItems so each result is already fully populated as AmazonImportDto. |
| (always) | partnerTag = AMAZON_ASSOCIATE_TAG (server env), marketplace = US. |
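The mapping table can be sketched as a pure function. SdkSearchRequest is a local stand-in for SearchItemsRequestContent using only the field names in the table, and resolveCategory is passed in as a hypothetical callback; check the vendored SDK types before relying on any of this. An unresolved first category is simply left out here, since the real behaviour is still an open question in this spec.

```typescript
// Hypothetical sketch of the BFF -> SDK SearchItems mapping.
type SortBy = "relevance" | "price-low-to-high";

interface BffSearch {
  query?: string;
  keywords?: string[];
  categories?: string[];
  primeOnly?: boolean;
  sortBy?: SortBy;
}

// Local stand-in, not the SDK's actual type.
interface SdkSearchRequest {
  keywords: string;
  searchIndex?: string;
  browseNodeId?: string;
  deliveryFlags?: string[];
  sortBy: "Relevance" | "Price:LowToHigh";
  itemCount: number;
  resources: string[];
  partnerTag: string;
}

// Result of the resolveCategory helper; null = unresolved.
type CategoryTarget = { searchIndex: string } | { browseNodeId: string } | null;

const MAX_RESULTS = 10;

function toSdkRequest(
  req: BffSearch,
  resolveCategory: (label: string) => CategoryTarget,
  resources: string[],
  partnerTag: string,
): SdkSearchRequest {
  const [first, ...rest] = req.categories ?? [];
  const target = first !== undefined ? resolveCategory(first) : null;
  // query + keywords[] + leftover categories, space-joined.
  const terms = [req.query ?? "", ...(req.keywords ?? []), ...rest].filter((t) => t.length > 0);
  const out: SdkSearchRequest = {
    keywords: terms.join(" "),
    sortBy: req.sortBy === "price-low-to-high" ? "Price:LowToHigh" : "Relevance",
    itemCount: MAX_RESULTS, // no itemPage: always the first page
    resources,
    partnerTag,
  };
  if (target && "searchIndex" in target) out.searchIndex = target.searchIndex;
  if (target && "browseNodeId" in target) out.browseNodeId = target.browseNodeId;
  if (req.primeOnly === true) out.deliveryFlags = ["Prime"];
  return out;
}
```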

Before any request reaches Amazon, the BFF runs the inputs through three deterministic stages, all in the server-side client layer so the route handler itself stays a thin contract:

  1. A defensive filter that normalises and bounds the raw strings.
  2. An ASIN extract-and-dispatch stage that bypasses SearchItems entirely when the user has already given us an unambiguous identifier.
  3. A silent Amazon-side query relaxation stage that, on a zero-result primary response, retries the call with progressively fewer constraints (capped, never mutating the user’s words).

All numeric thresholds in this section are named server-side constants, declared in one place in the route module so they can be tuned without a contract change. Initial values:

| Constant | Value | Use |
| --- | --- | --- |
| MAX_QUERY_LENGTH | 1024 | Max characters in query after Unicode NFC normalisation. |
| MAX_KEYWORDS | 20 | Max array length of keywords[] after empty-entry drop. |
| MAX_KEYWORD_LENGTH | 64 | Max characters per surviving keywords[] entry. |
| MAX_CATEGORIES | 5 | Max array length of categories[] after empty-entry drop. |
| MAX_CATEGORY_LENGTH | 64 | Max characters per surviving categories[] entry. |
| MAX_RESULTS | 10 | Hard cap on returned items (also the Amazon SearchItems.itemCount cap). |
| RELAXATION_MAX_RETRIES | 2 | Max number of additional SearchItems calls after a zero-result primary. |
| RELAXATION_TIME_BUDGET_MS | 1500 | Total wall-clock budget across all relaxation retries; exceed → stop early. |
| BATCH_ASIN_MAX | 10 | Max ASINs in a single GetItems batch in the ASIN-shortcut path (also the Amazon cap). |

Each string field (query, every entry of keywords[], every entry of categories[]) goes through the same pipeline before being used. Failures in steps marked reject surface as INVALID_SEARCH_INPUT (400). Failures in steps marked drop silently remove the offending entry from an array; if the array becomes empty it is treated as absent.

| # | Step | Behaviour |
| --- | --- | --- |
| 1 | Reject if any string exceeds its max length (MAX_QUERY_LENGTH, MAX_KEYWORD_LENGTH, MAX_CATEGORY_LENGTH). | Bounds resource usage. Query-presence rule is checked in step 8 (after array filtering) so that callers can omit query when keywords[] carries the search terms. |
| 2 | Unicode NFC normalise. | Canonicalises equivalent code-point sequences (e.g. a precomposed é vs. e followed by a combining acute accent) so length checks and Amazon see a single form. NFC does not fold look-alike characters from other scripts. Runs before the post-normalisation length re-check in step 3. |
| 3 | Re-check length after normalisation; reject on overflow. | Normalisation can change string length (usually shortening it, occasionally lengthening it). |
| 4 | Replace control characters (\x00–\x1F, \x7F) with a single space. | Removes pasted newlines, tabs, null bytes — never meaningful for Amazon search. |
| 5 | Replace < and > with a single space. | Defensive against pasted HTML and log-context confusion; Amazon never needs angle brackets in keywords. |
| 6 | Collapse internal whitespace runs to a single space, then trim. | Avoids spending Amazon’s keyword budget on whitespace; keeps quote/apostrophe/accented characters intact since real product titles use them. |
| 7 | For array fields (keywords[], categories[]): drop entries that go empty after filtering, then reject if the resulting array exceeds its cardinality cap (MAX_KEYWORDS, MAX_CATEGORIES). | Empty entries are not malformed input; over-cardinality is. |
| 8 | Reject if the combination yields no search terms — i.e. query is empty/absent and keywords[] (post-step-7) is empty/absent. | Amazon’s SearchItems requires at least one of Keywords, Title, Brand, Author, Actor, Artist; we only populate Keywords from query + keywords[]. categories[] and primeOnly alone do not satisfy Amazon — they restrict, they do not search. |
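The eight steps can be sketched as follows, using the initial constant values above. Function and class names are hypothetical, and errors are modelled as a thrown InvalidSearchInput for brevity. One caveat is flagged in the code: String.prototype.length counts UTF-16 code units, which may differ from how the real filter counts “characters”.

```typescript
// Hypothetical sketch of the defensive filter (steps 1-8).
// Note: String.length counts UTF-16 code units, not user-perceived characters.
const MAX_QUERY_LENGTH = 1024;
const MAX_KEYWORDS = 20;
const MAX_KEYWORD_LENGTH = 64;
const MAX_CATEGORIES = 5;
const MAX_CATEGORY_LENGTH = 64;

class InvalidSearchInput extends Error {
  readonly code = "INVALID_SEARCH_INPUT";
}

// Steps 1-6 for one string field.
function cleanField(raw: string, maxLen: number, name: string): string {
  if (raw.length > maxLen) throw new InvalidSearchInput(`${name} too long`); // 1
  const nfc = raw.normalize("NFC"); // 2
  if (nfc.length > maxLen) throw new InvalidSearchInput(`${name} too long after NFC`); // 3
  return nfc
    .replace(/[\x00-\x1F\x7F]/g, " ") // 4: control characters
    .replace(/[<>]/g, " ") // 5: angle brackets
    .replace(/\s+/g, " ") // 6: collapse whitespace runs...
    .trim(); // ...then trim
}

// Step 7 for array fields: drop empties, then enforce the cardinality cap.
function cleanArray(
  raw: string[] | undefined,
  maxItems: number,
  maxLen: number,
  name: string,
): string[] | undefined {
  if (raw === undefined) return undefined;
  const kept = raw.map((s) => cleanField(s, maxLen, name)).filter((s) => s.length > 0);
  if (kept.length > maxItems) throw new InvalidSearchInput(`too many ${name}`);
  return kept.length > 0 ? kept : undefined; // empty array is treated as absent
}

function filterSearchInput(input: { query?: string; keywords?: string[]; categories?: string[] }) {
  const query = input.query === undefined ? "" : cleanField(input.query, MAX_QUERY_LENGTH, "query");
  const keywords = cleanArray(input.keywords, MAX_KEYWORDS, MAX_KEYWORD_LENGTH, "keywords");
  const categories = cleanArray(input.categories, MAX_CATEGORIES, MAX_CATEGORY_LENGTH, "categories");
  // Step 8: categories[] (and primeOnly) restrict but do not search.
  if (query.length === 0 && keywords === undefined) throw new InvalidSearchInput("no search terms");
  return { query: query.length > 0 ? query : undefined, keywords, categories };
}
```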

Things the filter deliberately does not do:

  • Does not lowercase. Amazon’s keyword search is case-insensitive but preserving case is friendlier in any future preview/echo response.
  • Does not strip apostrophes, quotes, ampersands, currency symbols, or accented characters. Product titles legitimately contain them.
  • Does not strip the pipe character | blindly — it has Amazon-specific meaning when bulk-looking-up external identifiers (see “Smart identifier handling” in Amazon Creators API capabilities).
  • Does not interpret boolean operators (AND, OR, NOT), quoted phrases, or minus exclusion. Amazon’s documented Keywords semantics do not honour these, so pretending we do would be a lie.

After the defensive filter, the client inspects the cleaned query for an unambiguous identifier before calling SearchItems. When one is found, the call is redirected to GetItems because that is the cheaper, deterministic, correct operation for “I already know which product I want”.

The level of effort is bounded to what extractAsin (src/lib/shared/amazon/asin.ts) already does today — bare ASINs and a small set of canonical URL paths on amazon.com. No new HTTP work (no dereferencing short links), no new regexes beyond a thin “tokenise and try extractAsin per token” pass for the multi-ASIN case.

| query content (after filtering) | Dispatch | Notes |
| --- | --- | --- |
| A single bare ASIN, e.g. B08N5WRWNW | creatorsClient.getItems([asin]) → first/only AmazonImportDto returned in items array | Same DTO shape as SearchItems would have returned, no search noise. |
| A single Amazon product URL (/dp/<asin>, /<slug>/dp/<asin>, /gp/product/<asin>, /gp/aw/d/<asin> on amazon.com) | Extract ASIN → getItems([asin]) | Reuses existing extractAsin. |
| Multiple ASINs and/or URLs separated by whitespace, commas, semicolons, or newlines (whitespace already collapsed by step 6 of the filter) | Tokenise → run extractAsin per token → if every token yields an ASIN, batch into a single getItems([...asins]) (Amazon caps at 10 ASINs/call; excess → INVALID_SEARCH_INPUT) | Enables paste-a-list-of-products imports through the same route. |
| ASIN(s) embedded in surrounding free text (e.g. headphones B08N5WRWNW) | No shortcut — pass through to SearchItems as keywords | Mixed input signals “find me something like this”; honour user intent rather than guessing. |
| A short-link URL (a.co/..., amzn.to/...) | Surface UNSUPPORTED_SHORT_LINK (same error code /api/amazon/import returns) | Consistent with the import route. No dereferencing inside the search route. |
| Non-US-locale Amazon URL (amazon.co.uk, amazon.de, …) | Surface UNSUPPORTED_AMAZON_LOCALE (same error code as import) | Same reason. |
| Anything else | Continue to SearchItems with the BFF→SDK mapping in the table above. | Default path. |
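The dispatch table can be sketched as a planning function that decides between GetItems, SearchItems and an error before any network call. Here `extract` stands in for strict extractAsin and is injected so the sketch stays self-contained; its result shape, the Dispatch type and planDispatch itself are assumptions for illustration.

```typescript
// Hypothetical sketch of the ASIN-shortcut dispatcher.
type RejectCode = "UNRECOGNIZED_AMAZON_URL" | "UNSUPPORTED_SHORT_LINK" | "UNSUPPORTED_AMAZON_LOCALE";
type Extracted = { ok: true; asin: string } | { ok: false; code: RejectCode };

type Dispatch =
  | { kind: "getItems"; asins: string[] }
  | { kind: "searchItems" }
  | { kind: "error"; code: "INVALID_SEARCH_INPUT" | "UNSUPPORTED_SHORT_LINK" | "UNSUPPORTED_AMAZON_LOCALE" };

const BATCH_ASIN_MAX = 10;

function planDispatch(query: string, extract: (s: string) => Extracted): Dispatch {
  // Whole input first: one bare ASIN / product URL, or a recognised rejection.
  const whole = extract(query);
  if (whole.ok) return { kind: "getItems", asins: [whole.asin] };
  if (whole.code !== "UNRECOGNIZED_AMAZON_URL") return { kind: "error", code: whole.code };

  // Multi-token paste: every token must be an identifier, otherwise search.
  const tokens = query.split(/[\s,;]+/).filter((t) => t.length > 0);
  if (tokens.length === 0) return { kind: "searchItems" };
  const asins: string[] = [];
  for (const token of tokens) {
    const r = extract(token);
    if (!r.ok) return { kind: "searchItems" }; // free text mixed with identifiers
    asins.push(r.asin);
  }
  if (asins.length > BATCH_ASIN_MAX) return { kind: "error", code: "INVALID_SEARCH_INPUT" };
  return { kind: "getItems", asins };
}
```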

When the first SearchItems call returns zero items and the dispatcher did not take the GetItems shortcut, the client retries up to RELAXATION_MAX_RETRIES more times, each time dropping one constraint, before giving up and returning items: []. This is a deterministic server-side behaviour, not a contract surface, and it applies to every request that reaches SearchItems (it is not opt-in). The aim is to absorb common “one constraint too many” cases without surfacing them as zero results to the caller.

The retry ladder, in order, stops at the first non-empty page:

  1. Original call — all constraints as derived from the BFF request.
  2. Drop deliveryFlags (Prime restriction) if it was set.
  3. Drop the category restrictor (searchIndex / browseNodeId) if one was active, keeping the Prime restriction dropped if step 2 removed it.

Hard bounds:

  • Maximum 1 + RELAXATION_MAX_RETRIES SearchItems calls per /api/amazon/search request (the original plus up to RELAXATION_MAX_RETRIES relaxations).
  • Total wall-clock budget ≤ RELAXATION_TIME_BUDGET_MS across all relaxation attempts; exceed → stop early and return what the most-recent call produced.
  • Relaxation never mutates the keywords text content — only filter and category constraints. Rewriting the user’s words is out of scope here.
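The ladder and its bounds can be sketched as an async loop around an injected `search` callback that wraps SearchItems. SearchCall mirrors only the SDK field names used above, and searchWithRelaxation is a hypothetical name. The budget is checked between calls; this sketch does not cancel an in-flight request.

```typescript
// Hypothetical sketch of the silent relaxation ladder.
interface SearchCall {
  keywords: string;
  deliveryFlags?: string[];
  searchIndex?: string;
  browseNodeId?: string;
}

const RELAXATION_MAX_RETRIES = 2;
const RELAXATION_TIME_BUDGET_MS = 1500;

async function searchWithRelaxation<T>(
  primary: SearchCall,
  search: (req: SearchCall) => Promise<T[]>,
  now: () => number = Date.now,
): Promise<T[]> {
  let items = await search(primary); // 1: original call
  if (items.length > 0) return items;

  // Build the relaxed rungs; each rung keeps the previous rung's relaxations.
  const rungs: SearchCall[] = [];
  let current = primary;
  if (current.deliveryFlags) {
    current = { ...current, deliveryFlags: undefined }; // 2: drop Prime
    rungs.push(current);
  }
  if (current.searchIndex || current.browseNodeId) {
    current = { ...current, searchIndex: undefined, browseNodeId: undefined }; // 3: drop category
    rungs.push(current);
  }

  const started = now();
  for (const req of rungs.slice(0, RELAXATION_MAX_RETRIES)) {
    if (now() - started > RELAXATION_TIME_BUDGET_MS) break; // budget spent: stop early
    items = await search(req); // keywords are passed through unchanged
    if (items.length > 0) break;
  }
  return items;
}
```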

When the dispatcher takes the GetItems shortcut, the route’s response shape is unchanged — it still returns { items: AmazonImportDto[], totalResultsHint? }, just sourced from GetItems instead of SearchItems. Callers do not need to know which path was taken.

Richer zero-result strategies — out of scope for v1


Strategies that go beyond the constraint-dropping relaxation above — LLM-generated alternative-query suggestions, external web-search fallbacks (Brave, Tavily, Perplexity, SerpAPI, Firecrawl, …), synonym expansion, spelling correction — are explicitly out of scope for PDEV-457. The decision for v1 is to ship only the silent Amazon-side relaxation above; neither LLM augmentation nor external web search is implemented or exposed in the route contract.

The deferred LLM-suggestion feature is tracked under PDEV-569 (Backlog, blocked by PDEV-457). The full survey of options — Amazon-side / external / LLM — that informed the decision is in query-relaxation-exploration.md.

| Concern | Note |
| --- | --- |
| OAuth2 / credentials | Reuses the existing creatorsClient and AMAZON_CREATORS_* env vars. No new server secrets. |
| Marketplace | US only, via the existing marketplace header on the Creators client. Non-US is out of scope. |
| Affiliate tag | Each AmazonImportDto carries an affiliate-tagged URL via the existing helper. No helper changes. |
| Rate-limit posture | Each /api/amazon/search call makes at most 1 + RELAXATION_MAX_RETRIES SearchItems calls to Amazon (S2S): the original SearchItems, plus up to RELAXATION_MAX_RETRIES silent relaxation retries when the result is empty. Resolving a category label to a browseNodeId may add at most one GetBrowseNodes call on top of that (cacheable in-memory). The GetItems shortcut path is at most 1 call. |
| Pagination | None. First page only, capped at MAX_RESULTS. Refactoring to add pagination is non-breaking (extend request/response with optional fields). |
| Sort / refinements | v1 exposes relevance and price-low-to-high only. SearchRefinements from Amazon’s response is dropped at the BFF boundary; surfacing it later is non-breaking. |
| Caching | None. Same posture as today’s /api/amazon/import. |
| MSW | src/mocks/handlers/amazon.ts gets handlers for /api/amazon/search covering: success (multi-item), zero results (200 + empty items), invalid input, Creators failure. |
| Tests | Route-level tests for each branch (success, zero-results, invalid input, Creators failure); creatorsClient.searchItems wrapper tests covering header, resource selector, error mapping, and the BFF→SDK field mapping above. |

This section summarises what the underlying Amazon Creators API offers, so the route contract can be read against its actual ceiling. It is a snapshot of the SDK shape vendored as amazon-creators-api@1.2.2; consult the SDK’s own types for the canonical reference.

| Operation | Purpose | Used in this project? |
| --- | --- | --- |
| GetItems(marketplace, { ASINs[], resources[] }) | Look up one or more items by ASIN. | Yes — by the existing /api/amazon/import route. |
| SearchItems(marketplace, { keywords, ... }) | Keyword search with filters and sort; returns up to itemCount items per page plus searchRefinements. | Yes — primary operation for the new /api/amazon/search route. |
| GetBrowseNodes(marketplace, { browseNodeIds[], resources[] }) | Resolve category browse-node ids to their full metadata (name, ancestors, children, sales rank). | Possibly, to back the categories[] → browseNodeId mapping (small in-memory cache). |
| GetVariations(marketplace, { ASIN, resources[] }) | Walk parent ASIN to child variation ASINs (e.g. size/color). | No. |
| GetFeed / ListFeeds / GetReport / ListReports | Bulk/feed-style report operations. | No. |

SearchItems request fields (most relevant subset)

| Field | Type | Use here |
| --- | --- | --- |
| keywords | string | Receives query plus joined keywords[] from the BFF. |
| title, brand, author, actor, artist | string | Typed restrictors. Not surfaced in v1; could replace generic keyword joining later. |
| searchIndex | string enum | Single-category restrictor (e.g. All, HomeGarden, OfficeProducts). One of two ways to honor categories[]. |
| browseNodeId | string | Numeric browse-node id. The narrower category restrictor; one node per request. |
| condition | enum: Any, New (SDK type is sparse; Amazon also exposes Used, Refurbished, Collectible) | Not used in v1. |
| availability | enum: Available, IncludeOutOfStock | Not used in v1 (defaults to Available). |
| minPrice, maxPrice | number | Not used in v1. |
| minReviewsRating, minSavingPercent | number | Not used in v1. |
| deliveryFlags | array of AmazonGlobal \| FreeShipping \| FulfilledByAmazon \| Prime | Receives ["Prime"] when primeOnly: true. |
| sortBy | enum: Featured, Relevance, AvgCustomerReviews, NewestArrivals, Price:LowToHigh, Price:HighToLow | relevance → Relevance; price-low-to-high → Price:LowToHigh. |
| itemCount | number | MAX_RESULTS. |
| itemPage | number | Not used in v1 (first page only). |
| currencyOfPreference, languagesOfPreference | locale prefs | Not used in v1. |
| partnerTag | string | AMAZON_ASSOCIATE_TAG. |
| resources | array of SearchItemsResource | Selector for which Item.* fields the response includes (images.primary.large, itemInfo.title, offersV2.listings.price, etc.). The client requests the same selector the existing GetItems path uses, so each search hit is already a full AmazonImportDto. |
| properties | Record<string,string> | Advanced/experimental; not used. |

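
The BFF→SDK mapping in the “Use here” column can be sketched as a pure function. This is a minimal sketch under stated assumptions: the BffSearchInput shape, the SearchItemsRequestSketch type, and the MAX_RESULTS / AMAZON_ASSOCIATE_TAG values are placeholders, not the vendored SDK's request type or the real server-side constants. Category restrictors and the resources selector are left out so the field mapping stays visible.

```typescript
// Hypothetical BFF-side input; field names follow the "Use here" column.
interface BffSearchInput {
  query?: string;
  keywords?: string[];
  primeOnly?: boolean;
  sort?: "relevance" | "price-low-to-high";
}

// Plain object shaped after the SearchItems fields in the table; NOT the SDK's own request type.
interface SearchItemsRequestSketch {
  keywords: string;
  itemCount: number;
  partnerTag: string;
  deliveryFlags?: string[];
  sortBy?: "Relevance" | "Price:LowToHigh";
}

// Placeholder values only; the real constants are defined server-side elsewhere.
const MAX_RESULTS = 10;
const AMAZON_ASSOCIATE_TAG = "example-tag-20";

const SORT_BY: Record<NonNullable<BffSearchInput["sort"]>, "Relevance" | "Price:LowToHigh"> = {
  relevance: "Relevance",
  "price-low-to-high": "Price:LowToHigh",
};

function toSearchItemsRequest(input: BffSearchInput): SearchItemsRequestSketch {
  // query first, then each keywords[] entry, space-joined (AND-style narrowing); blank terms are dropped.
  const keywords = [input.query ?? "", ...(input.keywords ?? [])]
    .map((term) => term.trim())
    .filter((term) => term.length > 0)
    .join(" ");

  const request: SearchItemsRequestSketch = {
    keywords,
    itemCount: MAX_RESULTS,
    partnerTag: AMAZON_ASSOCIATE_TAG,
  };
  // primeOnly: true maps to deliveryFlags ["Prime"]; otherwise the field is omitted.
  if (input.primeOnly === true) request.deliveryFlags = ["Prime"];
  // BFF sort names map one-to-one onto SDK sortBy values.
  if (input.sort !== undefined) request.sortBy = SORT_BY[input.sort];
  return request;
}
```

Keeping the mapping pure lets the wrapper tests assert on the request object directly, without mocking the network.
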

SearchItems response fields

| Field | Notes |
| --- | --- |
| searchResult.items[] | Each item is a full Item resource shaped by the request’s resources selector. |
| searchResult.totalResultCount | “Total results” hint surfaced to the BFF as totalResultsHint. |
| searchResult.searchRefinements | Facet bins (browseNode, searchIndex, otherRefinements[]). Not surfaced in v1. |
| errors[] | Structured Amazon error envelope; mapped to the BFF’s AMAZON_API_ERROR. |

Keywords syntax and “smart query” support

Amazon’s PA-API / Creators API documentation is intentionally sparse about what the Keywords parameter understands. The verified behaviours, based on Amazon’s own use-case docs and confirmed in vendor commentary:

| Form | Behaviour | How we use it |
| --- | --- | --- |
| Plain space-separated tokens | AND-style narrowing (proprietary, not formally documented). | Default join for query and keywords[]. |
| Pipe (\|) separator between identifier-shaped tokens, with SearchIndex=All and ItemInfo.ExternalIds in resources | Bulk lookup by external identifier (UPC, EAN, ISBN). Amazon may return adjacent products; the caller must filter the response by exact match against ItemInfo.ExternalIds.{EANs,UPCs,ISBNs}.DisplayValues. | See “Smart identifier handling” below. |
| Boolean operators (AND, OR, NOT), quoted phrases, minus exclusion | Not officially supported. Amazon’s site search has its own syntax, but Keywords on the API does not promise to honour any of it. | We do not advertise, document, or interpret these. Whatever the user types is passed through; the behaviour is whatever Amazon chooses to do. |
| Typed restrictors (Title, Brand, Author, Actor, Artist as separate SearchItemsRequestContent fields) | Stronger and more reliable than AND-stuffing into Keywords. | Not exposed in v1. A natural v2 lever once the BFF surface stabilises. |

Smart identifier handling

When the cleaned query (after the filter pipeline and ASIN dispatcher) is a list of tokens all of which match one of the patterns below, the client switches to identifier mode: joins the tokens with |, sets SearchIndex=All, and adds ItemInfo.ExternalIds to the resource selector. The response is filtered to keep only items whose ItemInfo.ExternalIds echoes one of the input identifiers.

| Pattern | Format |
| --- | --- |
| UPC-A | ^\d{12}$ |
| EAN-13 (incl. ISBN-13, 97[89]\d{10}) | ^\d{13}$ |
| EAN-8 | ^\d{8}$ |
| ISBN-10 | ^\d{9}[\dX]$ |

This buys cheap “scan-a-list-of-barcodes” import through the same route without a separate endpoint. Mixed input (some identifier-shaped, some not) falls through to plain keyword search.
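
A minimal sketch of the identifier-mode switch, using the four patterns from the table above. IdentifierQuery and toIdentifierQuery are hypothetical names; the sketch only builds the pipe-joined keywords and SearchIndex=All, and leaves the ItemInfo.ExternalIds resource and the response filtering to the rest of the client. A null result means "fall through to plain keyword search".

```typescript
// Patterns from the table: UPC-A, EAN-13 (incl. ISBN-13), EAN-8, ISBN-10.
const IDENTIFIER_PATTERNS: ReadonlyArray<RegExp> = [
  /^\d{12}$/,
  /^\d{13}$/,
  /^\d{8}$/,
  /^\d{9}[\dX]$/,
];

function isIdentifierToken(token: string): boolean {
  return IDENTIFIER_PATTERNS.some((pattern) => pattern.test(token));
}

// Hypothetical shape for the identifier-mode part of the request.
interface IdentifierQuery {
  keywords: string; // tokens joined with "|"
  searchIndex: "All";
}

// Returns an identifier-mode query only when every token is identifier-shaped;
// null means mixed or empty input, which falls through to plain keyword search.
function toIdentifierQuery(tokens: ReadonlyArray<string>): IdentifierQuery | null {
  if (tokens.length === 0 || !tokens.every(isIdentifierToken)) {
    return null;
  }
  return { keywords: tokens.join("|"), searchIndex: "All" };
}
```

An ASIN such as B08N5WRWNW matches none of the patterns, so "one ASIN + one UPC" returns null, as the edge-case list requires.
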

Sub-bits worth pinning before implementation

Conditional resource selector. The original spec assumed identifier mode would request additional fine-grained ItemInfo.ExternalIds.{EANs,UPCs,ISBNs}.DisplayValues paths that the default mode would omit, to keep the payload small for plain keyword searches. Implementation showed this is not possible at the SDK level: the vendored amazon-creators-api@1.2.2 SearchItemsResource / GetItemsResource enums expose 'itemInfo.externalIds' only as a single key that returns all external-id sub-fields together. Requesting any external-id data therefore means requesting all of it; there is no way to ask for EANs without also getting UPCs and ISBNs.

Furthermore, the existing V1_RESOURCES constant used by getItems already contains 'itemInfo.externalIds' (because the current AmazonImportDto carries the first UPC). So:

  • The “default” mode resource selector and the “identifier” mode resource selector are functionally identical in v1.
  • The resourcesForMode(mode: SearchMode) abstraction is still kept (in src/server/lib/amazon/creators-client.ts) so a future SDK revision that exposes finer-grained resource paths can be adopted by changing only the helper. SearchMode = "default" | "identifier". The function de-duplicates internally.
  • There is no payload-size win from mode-discrimination in v1. The spec’s earlier reasoning about “avoid response bloat” does not apply.
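
A sketch of what resourcesForMode could look like under these constraints. The V1_RESOURCES contents shown are an assumed subset (only 'itemInfo.externalIds' is confirmed above; the other paths are the examples from the request-fields table); the real constant lives in src/server/lib/amazon/creators-client.ts. Because the identifier-mode extra is already in the base list, de-duplication makes both modes return the same selector.

```typescript
type SearchMode = "default" | "identifier";

// Assumed subset of V1_RESOURCES; 'itemInfo.externalIds' is already present (it backs the first UPC).
const V1_RESOURCES: ReadonlyArray<string> = [
  "images.primary.large",
  "itemInfo.title",
  "itemInfo.externalIds",
  "offersV2.listings.price",
];

// The SDK only offers external ids as one key, so identifier mode cannot ask for anything finer.
const IDENTIFIER_EXTRA_RESOURCES: ReadonlyArray<string> = ["itemInfo.externalIds"];

function resourcesForMode(mode: SearchMode): string[] {
  const extra = mode === "identifier" ? IDENTIFIER_EXTRA_RESOURCES : [];
  // Set keeps first-insertion order and drops the repeated 'itemInfo.externalIds'.
  return Array.from(new Set([...V1_RESOURCES, ...extra]));
}
```

If a later SDK revision exposes finer-grained paths, only IDENTIFIER_EXTRA_RESOURCES (and the base list) would need to change.
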

Response filtering. Amazon’s docs explicitly say bulk-identifier search “may return adjacent products”; callers are expected to filter the response against the input identifiers. The filter is a pure function:

    filterByExternalIds(
      items: SearchItem[],
      inputIds: ReadonlyArray<string>,
    ): SearchItem[]

Match an input id against an item by checking, in order: ItemInfo.ExternalIds.EANs.DisplayValues, ItemInfo.ExternalIds.UPCs.DisplayValues, ItemInfo.ExternalIds.ISBNs.DisplayValues. If any input id appears in any of those DisplayValues arrays for that item, the item is kept; otherwise it is dropped. Comparison is exact-string (Amazon returns identifiers in canonical form; no normalisation needed beyond the defensive filter that ran upstream).
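
A minimal sketch of filterByExternalIds following the matching rule above. The SearchItem interface here is a hypothetical stand-in that copies the PA-API-style field paths used in this spec; the vendored SDK's real item type may differ in casing and optionality.

```typescript
// Hypothetical minimal item shape; mirrors the field paths named in this spec.
interface DisplayValueList {
  DisplayValues?: string[];
}

interface SearchItem {
  ASIN: string;
  ItemInfo?: {
    ExternalIds?: {
      EANs?: DisplayValueList;
      UPCs?: DisplayValueList;
      ISBNs?: DisplayValueList;
    };
  };
}

function filterByExternalIds(
  items: SearchItem[],
  inputIds: ReadonlyArray<string>,
): SearchItem[] {
  const wanted = new Set(inputIds);
  return items.filter((item) => {
    const ids = item.ItemInfo?.ExternalIds;
    // Check EANs, then UPCs, then ISBNs; exact-string comparison, no normalisation.
    const candidates = [
      ...(ids?.EANs?.DisplayValues ?? []),
      ...(ids?.UPCs?.DisplayValues ?? []),
      ...(ids?.ISBNs?.DisplayValues ?? []),
    ];
    // Keep the item if any of its identifiers echoes an input id; drop it otherwise.
    return candidates.some((value) => wanted.has(value));
  });
}
```

Items with no ExternalIds at all are treated as non-matching and dropped, which is how "adjacent products" fall out of the result.
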

Deduplication. A single product can carry several external identifiers (e.g. a book with both an ISBN-10 and an ISBN-13, or a product with multiple UPC barcodes across regional variants). When two input ids point to the same ASIN, Amazon’s response, and therefore the filtered list, can contain that item more than once. Dedup the filtered list by ASIN, preserving the first occurrence.
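
The ASIN dedup step can be a small generic helper. dedupeByAsin is a hypothetical name; it keeps the first occurrence of each ASIN and otherwise preserves the input order, which is consistent with the output-ordering rule for this route.

```typescript
interface HasAsin {
  ASIN: string;
}

// Keep the first item seen for each ASIN; later duplicates are dropped and order is preserved.
function dedupeByAsin<T extends HasAsin>(items: ReadonlyArray<T>): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const item of items) {
    if (seen.has(item.ASIN)) continue;
    seen.add(item.ASIN);
    out.push(item);
  }
  return out;
}
```
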

Output ordering. Items are returned in Amazon’s response order with non-matching items removed and ASIN-duplicates collapsed. The route does not attempt to re-order results to match the input id ordering; doing so adds complexity without a real consumer in v1. Document the order in the route contract so callers do not accidentally rely on input-order alignment.

Edge cases for tests.

  • Pure-identifier list of 3 UPCs, Amazon returns exactly 3 items — pass-through.
  • Pure-identifier list, Amazon returns 5 items where 2 are “adjacent products” — filter drops the 2 noise items.
  • Pure-identifier list, none of Amazon’s response items echo any input id — items: [] (still a 200, not an error).
  • Mixed token list (one UPC + one keyword) — falls through to plain keyword search, no identifier mode triggered.
  • One ASIN + one UPC — neither shortcut fires; falls through to plain keyword search.
  • Single ISBN-10 with trailing X — pattern matches, identifier mode triggers with a single-element pipe-list.

| Aspect | Note |
| --- | --- |
| Auth | OAuth2 client-credentials. The creatorsClient already obtains and rotates tokens transparently. |
| Marketplace | Selected via the X-Marketplace header on every call. US only for this project. |
| Rate limits | TPS (transactions per second) and TPD (transactions per day) quotas shared across GetItems, SearchItems, GetBrowseNodes, GetVariations. Throttled responses surface as ThrottleExceptionResponseContent; the BFF maps them to AMAZON_API_ERROR. |
| Failure modes | AccessDeniedException, UnauthorizedException, ValidationException, InternalServerException, ResourceNotFoundException, ThrottleException (SDK type names). All collapse to AMAZON_API_ERROR at the BFF for v1; finer-grained mapping is a future extension. |
| PA-API sunset | The legacy PA-API 5.0 product surface is deprecated as of 2026-05-15; the Creators API (used here via amazon-creators-api@1.2.2) is the supported successor. Confirmed in Amazon’s own docs and external write-ups. The Keywords semantics described above carry over unchanged. |


| ID | Question | Status |
| --- | --- | --- |
| Q1 | Multi-category requests: Amazon allows only one searchIndex or one browseNodeId per SearchItems call. When categories[] has more than one entry, how should the client behave? | Resolved: option (a). The client uses the first entry as the Amazon category restrictor (searchIndex or browseNodeId) and appends the remaining entries to the keywords string as additional restrictive terms. |
| Q2 | categories[] resolution source: static enum-matching table only, or also dynamic GetBrowseNodes lookup with an in-memory cache? | Resolved: static table for v1, hidden behind a resolveCategory(label: string): { searchIndex?: string; browseNodeId?: string } \| null function so the implementation can be swapped for a dynamic GetBrowseNodes-backed resolver later without a contract or call-site change. |
| Q3 | Should empty categories[] / keywords[] arrays be treated as “field absent”, or rejected as malformed? | Resolved: entries that go empty after the defensive filter are dropped silently; if the array becomes empty it is treated as absent. Caller-side empty arrays are equivalent to omitting the field. See “Search Input Processing — Defensive input filter”. |
| Q4 | Concrete values for the server-side constants. | Resolved: values fixed (see Constants at the top of “Search Input Processing”). The constants remain server-side names so they can be tuned without a contract change. |
| Q5 | Zero-result augmentation policy beyond silent Amazon-side relaxation. | Resolved: out of scope for v1. Neither LLM-driven suggestions nor external web-search fallbacks ship in PDEV-457. The deferred LLM-suggestion feature is tracked under PDEV-569. External web-search providers are not currently planned. |
| Q6 | LLM provider for suggestion generation. | Resolved: out of scope for v1. Decision tracked in PDEV-569 if and when that issue is picked up. |
| Q7 | Should the server silently re-run SearchItems against generated suggestions before responding? | Resolved: out of scope for v1. There is no suggestion generation in v1, so no re-search path exists. The constraint (no server-side re-search) is preserved in PDEV-569 for the deferred feature. |
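
The Q1, Q2 and Q3 resolutions can be sketched together. resolveCategory follows the signature given in Q2; the STATIC_CATEGORIES entries, the planCategories helper, and the label normalisation (trim plus lowercase) are assumptions for illustration only. The spec does not say what happens when the first label fails to resolve; this sketch simply returns a null restrictor in that case.

```typescript
interface CategoryRestrictor {
  searchIndex?: string;
  browseNodeId?: string;
}

// Hypothetical static table: keys are normalised labels, values are Amazon restrictors.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const STATIC_CATEGORIES = new Map<string, CategoryRestrictor>([
  ["home & garden", { searchIndex: "HomeGarden" }],
  ["office products", { searchIndex: "OfficeProducts" }],
]);

// Same signature as Q2; a GetBrowseNodes-backed resolver could replace the body later.
function resolveCategory(label: string): CategoryRestrictor | null {
  return STATIC_CATEGORIES.get(label.trim().toLowerCase()) ?? null;
}

interface CategoryPlan {
  restrictor: CategoryRestrictor | null; // becomes searchIndex or browseNodeId on the request
  keywords: string;
}

// Q1 option (a): the first entry is the restrictor; the remaining entries join the keywords string.
function planCategories(baseKeywords: string, categories: ReadonlyArray<string>): CategoryPlan {
  const [first, ...rest] = categories;
  // Q3: an empty categories[] is treated as absent.
  if (first === undefined) return { restrictor: null, keywords: baseKeywords };
  const keywords = [baseKeywords, ...rest].filter((term) => term.length > 0).join(" ");
  // An unresolved first label yields a null restrictor (behaviour not pinned down by the spec).
  return { restrictor: resolveCategory(first), keywords };
}
```
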

Copyright: (c) Arda Systems 2025-2026, All rights reserved