Amazon Import — Specification Analysis

This project is restricted to the Next.js BFF route surface in arda-frontend-app. The chosen design splits keyword search from URL/ASIN lookup into two sibling routes, with shared URL/ASIN input normalisation:

POST /api/amazon/import — preserved response contract, but its set of accepted URL/ASIN input shapes is broadened by the new shared normalisation layer (see URL/ASIN input normalisation).
POST /api/amazon/search — new. Flexible search input shape; the BFF client layer (creatorsClient.searchItems) disambiguates and maps onto the Amazon Creators API.

Existing route — `POST /api/amazon/import` (broadened acceptance)

Field	Value
Method, path	`POST /api/amazon/import`
Request body	`{ "input": "<URL or ASIN or text containing exactly one ASIN>" }`
Success	`200` with the `{ ok: true, data: AmazonImportDto }` envelope (single item). The `/api/amazon/search` route below uses the same wire envelope; only the inner `data` shape differs (`AmazonImportDto[]` vs single `AmazonImportDto`).
Errors	`UNRECOGNIZED_AMAZON_URL`, `UNSUPPORTED_SHORT_LINK`, `UNSUPPORTED_AMAZON_LOCALE`, `ITEM_NOT_FOUND`, `AMAZON_API_ERROR`

The response contract is preserved verbatim — same DTO, same error codes, same HTTP statuses. What changes is the set of inputs accepted on the UNRECOGNIZED_AMAZON_URL boundary: the route now delegates extraction to extractAsinLenient (see below), which accepts schemeless URLs, path-only inputs, lowercase bare ASINs, additional US sub-hosts, old-form product paths, and plain text containing exactly one ASIN. Previously rejected inputs that map cleanly to a single ASIN now succeed; inputs that genuinely don’t carry an ASIN still return UNRECOGNIZED_AMAZON_URL.

URL/ASIN input normalisation

User-reported feedback: the current extractAsin in src/lib/shared/amazon/asin.ts is too restrictive — it rejects pasted inputs that drop the URL scheme, that are path-only, that use lowercase ASINs, that come from mobile / Smile / Kindle sub-hosts, that use the old /exec/obidos/ASIN/ form, or that mix a single ASIN with surrounding prose. This project widens that acceptance set with a shared normalisation layer used by both routes.

The layer is split into two functions to keep /search’s ASIN-shortcut dispatcher from over-interpreting a keyword query that happens to contain an ASIN.

`extractAsin(input)` — strict

The canonical extractor used by /search’s ASIN-shortcut dispatcher, in both its single-input and multi-token cases. Accepts only inputs that are unambiguously an ASIN or an Amazon product URL; there is no plain-text fallback.

Acceptance set:

Shape	Example	Notes
Bare ASIN (10 chars `[A-Za-z0-9]`)	`B08N5WRWNW`, `b08n5wrwnw`	New: case-folded to uppercase before pattern test. Outer whitespace trimmed.
Canonical product URL (full scheme, US host)	`https://www.amazon.com/dp/B08N5WRWNW`, etc.	All four existing canonical forms: `/dp/`, `/<slug>/dp/`, `/gp/product/`, `/gp/aw/d/`.
New: Old-form product URL	`https://www.amazon.com/exec/obidos/ASIN/B08N5WRWNW`	Also matches `/o/ASIN/<ASIN>`.
New: Extended US sub-hosts	`https://m.amazon.com/dp/B08N5WRWNW`, `smile.amazon.com`, `read.amazon.com`	Added to the US allow-list alongside `www.amazon.com` and `amazon.com`.
New: Schemeless URL	`www.amazon.com/dp/B08N5WRWNW`, `amazon.com/dp/B08N5WRWNW`	When `new URL(input)` throws and the input starts with `(www\.)?amazon\.com/` or a known US sub-host, retry with `https://` prepended.
New: Path-only URL (with leading `/`)	`/dp/B08N5WRWNW`, `/gp/product/B08N5WRWNW`, `/<slug>/dp/<ASIN>`, `/exec/obidos/ASIN/<ASIN>`, `/o/ASIN/<ASIN>`	Treated as `https://www.amazon.com<path>`.
New: Path-only URL (without leading `/`)	`dp/B08N5WRWNW`, `Some-Product/dp/B08N5WRWNW`, `gp/product/B08N5WRWNW`, `gp/aw/d/B08N5WRWNW`, `exec/obidos/ASIN/B08N5WRWNW`, `o/ASIN/B08N5WRWNW`	Same as above; prepend `https://www.amazon.com/`.

Rejection set:

Shape	Code
`a.co/...`, `amzn.to/...`	`UNSUPPORTED_SHORT_LINK`
`amazon.<non-US-tld>/dp/<ASIN>` (incl. `co.uk`, `de`, `ca`, `co.jp`, …)	`UNSUPPORTED_AMAZON_LOCALE`
Look-alike host (`myamazon.com`, `amazonn.com`, …)	`UNRECOGNIZED_AMAZON_URL`
URL parses, host is US Amazon, but path is non-product (`/s?k=…`, `/cart`, `/`)	`UNRECOGNIZED_AMAZON_URL`
Plain text without a parseable URL or bare ASIN	`UNRECOGNIZED_AMAZON_URL`

`extractAsinLenient(input)` — permissive

Used by /api/amazon/import. Runs the same step ladder as strict extractAsin (bare ASIN → new URL() → scheme prepend → path-only) and adds one final fallback: plain-text ASIN extraction, fired only when no earlier step produced a parseable Amazon URL.

In other words: the side effect of “URL parsed but path is non-product → stop at step 3” applies in lenient too. A user pasting https://www.amazon.com/s?k=B08N5WRWNW into /api/amazon/import still receives UNRECOGNIZED_AMAZON_URL, even though the input contains an ASIN-shaped token — the URL parsed and we honour its non-product semantics.

The plain-text fallback fires when every URL-parse attempt (original input, scheme-prepend retry, path-only normalisation) threw or failed to land on a US Amazon host. UNSUPPORTED_SHORT_LINK and UNSUPPORTED_AMAZON_LOCALE from any of those parse attempts short-circuit and return as-is — we never silently reroute a recognised-but-rejected URL to plain-text extraction.

Implementation note: extractAsinLenient cannot be a thin wrapper that inspects extractAsin’s return value, because strict returns the same UNRECOGNIZED_AMAZON_URL code for both “URL parsed but non-product” and “no URL parsed at all”. The two functions share internal step helpers but compose differently: lenient adds the plain-text step on the no-URL-parsed-at-all branch.

Plain-text extraction pattern (case-insensitive heuristic with digit-presence guard):

const ASIN_IN_TEXT_RE =
  /\b(B[A-Za-z0-9]{9}|\d{9}[\dXx])\b/gi;
// The `i` flag lets a lowercase leading `b` match; tokens are upper-cased afterwards.
// Accept only when the matched token contains at least one digit
// (`B[A-Za-z0-9]{9}` branch) OR is an ISBN-10 (`\d{9}[\dXx]` branch).

Rule:

Find all tokens in the trimmed input matching ASIN_IN_TEXT_RE.
Filter the matches:
- For the B[A-Za-z0-9]{9} branch, require ≥1 digit in the token. This rejects 10-letter English words such as Background or Blueprints, which start with B but contain no digits.
- The \d{9}[\dXx] branch (ISBN-10) is already digit-heavy and needs no guard.
Upper-case the surviving matches.
Accept exactly one match. Zero matches → UNRECOGNIZED_AMAZON_URL. Two or more distinct matches → UNRECOGNIZED_AMAZON_URL (the caller hit /import, which is single-item by contract; ambiguity should surface, not be silently resolved).
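
The rule above can be sketched as a small function. This is an illustration under assumptions rather than the project's implementation: `extractAsinFromText`, `ASIN_TOKEN_RE`, and the `TextExtraction` result type are hypothetical names, and the pattern carries the `i` flag so that lowercase tokens reach the upper-casing step.

```typescript
// Hypothetical result type for this sketch.
type TextExtraction =
  | { ok: true; asin: string }
  | { ok: false; code: "UNRECOGNIZED_AMAZON_URL" };

// Same two branches as ASIN_IN_TEXT_RE. The `i` flag is an assumption that
// lets lowercase tokens match before they are upper-cased.
const ASIN_TOKEN_RE = /\b(?:B[A-Za-z0-9]{9}|\d{9}[\dXx])\b/gi;

function extractAsinFromText(input: string): TextExtraction {
  const tokens: string[] = input.trim().match(ASIN_TOKEN_RE) ?? [];
  // Digit-presence guard: ISBN-10 tokens always pass, letter-only words drop out.
  const asins = tokens.map((t) => t.toUpperCase()).filter((t) => /\d/.test(t));
  // Keep distinct values only, so a repeated ASIN still counts as one match.
  const distinct = asins.filter((t, i) => asins.indexOf(t) === i);
  const only = distinct[0];
  if (distinct.length === 1 && only !== undefined) {
    return { ok: true, asin: only };
  }
  // Zero matches and two-or-more distinct matches are both rejected.
  return { ok: false, code: "UNRECOGNIZED_AMAZON_URL" };
}
```

De-duplicating before counting follows the "two or more distinct matches" wording: the same ASIN pasted twice is treated as a single unambiguous match.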

Resolution ordering

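extractAsinLenient(input) follows this order. The first step that accepts an ASIN or returns a recognised rejection wins. A recognised rejection is UNSUPPORTED_SHORT_LINK, UNSUPPORTED_AMAZON_LOCALE, or step 3’s UNRECOGNIZED_AMAZON_URL for a US Amazon URL on a non-product path; every other miss falls through to the next step.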

1. Trim outer whitespace.
2. Bare-ASIN check (case-folded): uppercases the input, then matches ^[A-Z0-9]{10}$.
3. new URL(input), if parseable with an http: or https: protocol:
   - Short-link host → UNSUPPORTED_SHORT_LINK.
   - Non-US Amazon host → UNSUPPORTED_AMAZON_LOCALE.
   - US Amazon host (incl. new sub-hosts) → canonical or old-form path match → accept. Non-matching path → UNRECOGNIZED_AMAZON_URL; do NOT fall through to plain-text extraction (a search/cart URL is not an import).
   - Non-Amazon host → fall through to step 4.
   - URLs with a non-http/https protocol (e.g. product:, mailto:) are not authoritative: they fall through to step 4 even though new URL() succeeded.
4. Scheme prepend: if input starts with (www\.)?amazon\.com/, m\.amazon\.com/, smile\.amazon\.com/, or read\.amazon\.com/, retry step 3 with https:// prepended.
5. Path-only: if input matches one of the canonical or old-form path patterns with or without a leading /, treat as https://www.amazon.com/<normalised-path> and retry step 3.
6. Plain-text ASIN extraction (lenient only; strict extractAsin stops here and returns UNRECOGNIZED_AMAZON_URL).
7. Otherwise → UNRECOGNIZED_AMAZON_URL.

Stopping at step 3 when the URL parses but the path is non-product is intentional. It honours the user’s signal: if they pasted a parseable URL, we trust its semantics over hunting for ASIN-shaped tokens in its query string.
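
A minimal sketch of the ladder, assuming one shared function with a `lenient` switch. Every name here is hypothetical except the three error codes; the host lists and path patterns are read off the acceptance and rejection tables above, and the treatment of unlisted `amazon.*` hosts as locale rejections is an assumption. The sketch also folds in the literal-space downgrade covered in the WHATWG parser quirk section.

```typescript
type ErrorCode =
  | "UNRECOGNIZED_AMAZON_URL"
  | "UNSUPPORTED_SHORT_LINK"
  | "UNSUPPORTED_AMAZON_LOCALE";
type Result = { ok: true; asin: string } | { ok: false; code: ErrorCode };
// `null` means "no parseable Amazon URL here, try the next step".
type StepResult = Result | null;

const US_HOSTS = ["www.amazon.com", "amazon.com", "m.amazon.com", "smile.amazon.com", "read.amazon.com"];
const SHORT_HOSTS = ["a.co", "amzn.to"];
const PATHS = "(?:(?:[^/\\s]+/)?dp|gp/product|gp/aw/d|exec/obidos/ASIN|o/ASIN)";
const PRODUCT_PATH_RE = new RegExp(`^/${PATHS}/([A-Za-z0-9]{10})(?:/|$)`);
const PATH_ONLY_RE = new RegExp(`^/?${PATHS}/[A-Za-z0-9]{10}(?:[/?#]|$)`);
const SCHEMELESS_RE = /^(?:(?:www|m|smile|read)\.)?amazon\.com\//i;

// Step 3: classify one URL candidate.
function classifyUrl(candidate: string): StepResult {
  let url: URL;
  try {
    url = new URL(candidate);
  } catch {
    return null;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  const host = url.hostname; // lower-cased by the parser for http(s) URLs
  if (SHORT_HOSTS.includes(host)) return { ok: false, code: "UNSUPPORTED_SHORT_LINK" };
  if (US_HOSTS.includes(host)) {
    const asin = PRODUCT_PATH_RE.exec(url.pathname)?.[1];
    if (asin) return { ok: true, asin: asin.toUpperCase() };
    // A non-product path is terminal, unless prose was appended to the URL.
    return candidate.includes(" ") ? null : { ok: false, code: "UNRECOGNIZED_AMAZON_URL" };
  }
  if (/(?:^|\.)amazon\.[a-z.]+$/.test(host)) return { ok: false, code: "UNSUPPORTED_AMAZON_LOCALE" };
  return null; // non-Amazon host
}

function extractAsinSketch(raw: string, lenient: boolean): Result {
  const input = raw.trim(); // step 1
  if (/^[A-Za-z0-9]{10}$/.test(input)) {
    return { ok: true, asin: input.toUpperCase() }; // step 2
  }
  const attempts = [input]; // step 3 on the input as given
  if (SCHEMELESS_RE.test(input)) attempts.push(`https://${input}`); // step 4
  if (PATH_ONLY_RE.test(input)) {
    attempts.push(`https://www.amazon.com/${input.replace(/^\//, "")}`); // step 5
  }
  for (const attempt of attempts) {
    const result = classifyUrl(attempt);
    if (result) return result; // accepted or recognised rejection: stop here
  }
  if (lenient) {
    // Step 6: plain-text extraction, accepted only for exactly one distinct match.
    const tokens: string[] = input.match(/\b(?:B[A-Za-z0-9]{9}|\d{9}[\dXx])\b/gi) ?? [];
    const asins = tokens.map((t) => t.toUpperCase()).filter((t) => /\d/.test(t));
    const distinct = asins.filter((t, i) => asins.indexOf(t) === i);
    const only = distinct[0];
    if (distinct.length === 1 && only !== undefined) return { ok: true, asin: only };
  }
  return { ok: false, code: "UNRECOGNIZED_AMAZON_URL" }; // step 7
}
```

Representing "no parseable URL" as `null`, separate from a recognised rejection, is what lets the lenient variant tell the two `UNRECOGNIZED_AMAZON_URL` situations apart.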

WHATWG parser quirk — literal-space inputs

Node’s WHATWG URL parser (and the browser’s) does parse inputs that contain literal spaces — it percent-encodes them. That means new URL('https://www.amazon.com/dp/B08N5WRWNW great price!') does not throw; it returns a URL whose pathname is /dp/B08N5WRWNW%20great%20price!, which then fails canonical-path matching at step 3 and would, by the strict “stop at step 3” rule, return UNRECOGNIZED_AMAZON_URL.

This would defeat the “URL + trailing prose → plain-text ASIN extraction” side effect we explicitly accepted. The implementation patches this with a hasLiteralSpace check at step 3: when the parsed URL classifies as US-Amazon-non-product AND the original input contains a literal space character, the step downgrades the outcome from recognised-rejection (which would stop the lenient pipeline) to no-parseable-url (which lets lenient fall through to plain-text ASIN extraction).

In effect: a clean URL on a non-product path stops as designed, but a URL with prose appended falls through. Strict extractAsin returns UNRECOGNIZED_AMAZON_URL either way — only lenient’s fall-through behaviour cares about the distinction.
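
The quirk and the guard can be demonstrated in a few lines. This is a sketch under assumptions: `classifyUsAmazonInput`, `ParseOutcome`, and `DP_PATH_RE` are hypothetical names, the function presumes the input has already been identified as a US Amazon URL, and the path test covers only the `/dp/<ASIN>` form for brevity.

```typescript
// Outcome labels follow the wording used above.
type ParseOutcome = "accepted" | "recognised-rejection" | "no-parseable-url";

// Simplified product-path test; a full matcher would cover every
// canonical and old-form path.
const DP_PATH_RE = /^\/dp\/[A-Za-z0-9]{10}\/?$/;

function classifyUsAmazonInput(input: string): ParseOutcome {
  const url = new URL(input); // interior spaces do not make this throw
  if (DP_PATH_RE.test(url.pathname)) return "accepted";
  // Prose appended to a URL shows up as a literal space in the raw input.
  const hasLiteralSpace = input.includes(" ");
  return hasLiteralSpace ? "no-parseable-url" : "recognised-rejection";
}

const withProse = "https://www.amazon.com/dp/B08N5WRWNW great price!";
console.log(new URL(withProse).pathname); // "/dp/B08N5WRWNW%20great%20price!"
console.log(classifyUsAmazonInput(withProse)); // "no-parseable-url"
console.log(classifyUsAmazonInput("https://www.amazon.com/s?k=B08N5WRWNW")); // "recognised-rejection"
```

The check runs against the raw input, not the parsed pathname, because the parser has already replaced the spaces with `%20` by the time the URL object exists.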

Acknowledged side effects

URL + trailing prose: an input like https://www.amazon.com/dp/B08N5WRWNW great price! parses cleanly with new URL() (WHATWG percent-encodes the literal space — see WHATWG parser quirk — literal-space inputs above), so step 3’s hasLiteralSpace guard downgrades the parse outcome to no-parseable-url and the pipeline falls through to plain-text ASIN extraction. The ASIN is found and imported. Effectively the same outcome as if we’d extracted the URL from surrounding prose — accepted for /import’s user-intent model.
/import accepting prose: a comment like I want to import B08N5WRWNW please resolves to importing B08N5WRWNW. The user’s intent is unambiguous when only one ASIN appears.
Multiple ASINs in prose at /import: an input like B08N5WRWNW and also B0EXAMPLE2 returns UNRECOGNIZED_AMAZON_URL. /import is a single-item route by contract; ambiguous input should fail loudly, not pick one ASIN. Multi-ASIN paste belongs to /search’s strict-tokenised dispatcher.

Where each function is called

Caller	Function	Rationale
`/api/amazon/import` route handler	`extractAsinLenient`	Single-input route; user intent is to import a specific product, even when their paste includes prose.
`/api/amazon/search` — ASIN-shortcut dispatcher (single bare ASIN or single URL whole-input case)	`extractAsin` (strict)	Same input shape as `/import`’s strict path; consistent treatment.
`/api/amazon/search` — multi-token tokeniser	`extractAsin` (strict)	Plain-text extraction here would silently reroute keyword queries to import. The user typed search terms; we honour that.

Test matrix additions

Atop the existing 30-ish tests in asin.test.ts, the normalisation work adds roughly:

6 bare-ASIN case-folding cases (lowercase, mixed-case, leading/trailing whitespace + lowercase).
12 path-only URL acceptance cases (each of the six path forms × with/without leading slash).
4 schemeless URL acceptance cases (www.amazon.com/dp/..., amazon.com/gp/product/..., m.amazon.com/dp/..., etc.).
4 extended US sub-host acceptance cases (m., smile., read. × at least one path form each).
2 old-form path acceptance cases (/exec/obidos/ASIN/<ASIN>, /o/ASIN/<ASIN>).
8 plain-text ASIN extraction cases for extractAsinLenient:
- Single ASIN in prose (accepts).
- Single lowercase ASIN in prose (accepts via case-fold).
- Single ASIN in prose with surrounding punctuation (accepts).
- Two or more distinct ASINs in prose (rejects with UNRECOGNIZED_AMAZON_URL).
- Zero ASINs in prose (rejects).
- 10-letter English word starting with B (Background, Blueprints): must reject (digit-presence guard).
- 10-digit numeric token that isn’t a valid ISBN-10: accepts, because the \d{9}[\dXx] branch matches any such token without checksum validation.
- Search URL with ASIN in query string (/s?k=B08N5WRWNW) — rejects (stop-at-step-3 ordering).
4 flipped expectations: m.amazon.com, smile.amazon.com, read.amazon.com previously UNSUPPORTED_AMAZON_LOCALE, now accepted.

New route — `POST /api/amazon/search`

The request shape is deliberately flexible. The BFF route validates input and delegates to the client layer, which maps the rich BFF shape onto the Amazon SDK’s narrower SearchItemsRequestContent (see Amazon Creators API capabilities below). The route does not support pagination — every call returns the first page with up to MAX_RESULTS items.

Field	Value
Method, path	`POST /api/amazon/search`
Request body	flexible search input (see below)
Success	`200` with `{ "ok": true, "data": { "items": AmazonImportDto[], "totalResultsHint"?: number } }` — `items` is `[]` when the search succeeds with zero matches. The `{ ok, data }` wire wrap matches `/api/amazon/import`’s envelope (the Next.js handler renames the route module’s internal field name to `data`).
Errors	`INVALID_REQUEST` (`400`), `INVALID_SEARCH_INPUT` (`400`), `AUTHENTICATION_REQUIRED` (`401`), `UNSUPPORTED_SHORT_LINK` (`422`), `UNSUPPORTED_AMAZON_LOCALE` (`422`), `AMAZON_API_ERROR` (`502`). Error envelope: `{ "ok": false, "code": string, "message": string }`. `INVALID_REQUEST` is emitted by the Next.js handler on malformed JSON or wrong-shape body before the route module is invoked (mirrors `/api/amazon/import`’s structural guard); `INVALID_SEARCH_INPUT` is emitted by the route module on semantic validation failure (see “Request validation” below). The two 422 codes are emitted on single-input recognised-rejection (when the strict ASIN extractor parses the query as a short link or non-US Amazon URL); they preserve the import route’s user-intent rejection rather than dereferencing the input.
Auth	Cognito JWT verification at the Next.js handler level (mirrors `/api/amazon/import`’s `processJWTForArda` call in `src/app/api/amazon/import/route.ts`). The route module itself remains auth-agnostic. `tenantId` from the JWT result is not currently threaded through the route module — same posture as `/import`.

Request body

{
  // Free-text query — primary search term. Up to MAX_QUERY_LENGTH chars.
  // Optional when `keywords[]` is non-empty after filtering; otherwise
  // required. Trimmed; non-empty after trim.
  "query": "string",

  // Optional additional keyword terms that further restrict the search.
  // Each entry is a single keyword/phrase; combined with `query` by the
  // client layer before calling Amazon.
  "keywords": ["string", "..."],

  // Optional category restrictions. Each entry is a free-form category
  // label (e.g. "HomeGarden", "OfficeProducts", or a human-friendly
  // synonym). The client layer disambiguates against Amazon's
  // `SearchIndex` enum and/or `browseNodeId`; unresolvable entries are
  // either dropped with a warning or rejected — see Q1 in Open
  // Questions.
  "categories": ["string", "..."],

  // Optional. When `true`, restricts results to Prime-eligible items
  // (maps to Amazon `deliveryFlags: ["Prime"]`).
  "primeOnly": false,

  // Optional. Result ordering.
  //   "relevance"          — Amazon's default (SortBy=Relevance)
  //   "price-low-to-high"  — SortBy=Price:LowToHigh
  // Default: "relevance".
  "sortBy": "relevance" | "price-low-to-high"
}

Silent Amazon-side query relaxation on zero results (described in Search Input Processing) is always on and is not a contract surface — it is a server behaviour. Richer zero-result strategies (LLM-generated suggestions, external web search) are deferred to a v2 follow-on tracked under PDEV-569 and are explicitly out of scope for PDEV-457.

Request validation

Field	Constraint	On violation
`query`	Optional only when `keywords[]` is non-empty after filtering; otherwise required. Trimmed; `length ≤ MAX_QUERY_LENGTH`. At least one of `query` or `keywords[]` must produce non-empty content after the defensive filter — `categories[]` and `primeOnly` alone do not satisfy Amazon’s “at least one search term” requirement.	`INVALID_SEARCH_INPUT`
`keywords`	Optional; if present, array of strings; `length ≤ MAX_KEYWORDS`; each entry `length ≤ MAX_KEYWORD_LENGTH`.	`INVALID_SEARCH_INPUT`
`categories`	Optional; if present, array of strings; `length ≤ MAX_CATEGORIES`; each entry `length ≤ MAX_CATEGORY_LENGTH`.	`INVALID_SEARCH_INPUT`
`primeOnly`	Optional; boolean.	`INVALID_SEARCH_INPUT`
`sortBy`	Optional; one of `"relevance"`, `"price-low-to-high"`.	`INVALID_SEARCH_INPUT`

All numeric caps are server-side constants — see Constants at the top of “Search Input Processing”.
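
The validation table can be read as a single function. The sketch below is an assumption-laden illustration: the field names, the caps, and `INVALID_SEARCH_INPUT` come from this document, while `validateSearchRequest`, the helper names, the result type, and the messages are hypothetical. It trims strings only; the full defensive filter is described under Search Input Processing.

```typescript
type SortBy = "relevance" | "price-low-to-high";
interface SearchRequest {
  query?: string;
  keywords?: string[];
  categories?: string[];
  primeOnly?: boolean;
  sortBy?: SortBy;
}
type Validation =
  | { ok: true; value: SearchRequest }
  | { ok: false; code: "INVALID_SEARCH_INPUT"; message: string };

const MAX_QUERY_LENGTH = 1024;
const MAX_KEYWORDS = 20;
const MAX_KEYWORD_LENGTH = 64;
const MAX_CATEGORIES = 5;
const MAX_CATEGORY_LENGTH = 64;

const isString = (v: unknown): v is string => typeof v === "string";
const isStringArray = (v: unknown): v is string[] => Array.isArray(v) && v.every(isString);
const isBoolean = (v: unknown): v is boolean => typeof v === "boolean";
const isSortBy = (v: unknown): v is SortBy => v === "relevance" || v === "price-low-to-high";
// Lifts a guard so that `undefined` (field absent) is also accepted.
function optional<T>(guard: (v: unknown) => v is T) {
  return (v: unknown): v is T | undefined => v === undefined || guard(v);
}

function invalid(message: string): Validation {
  return { ok: false, code: "INVALID_SEARCH_INPUT", message };
}

// Enforce the per-entry cap, drop entries that are empty after trimming,
// then enforce the cardinality cap on what survives.
function boundedList(list: string[], maxLen: number, maxCount: number): string[] | null {
  if (list.some((e) => e.length > maxLen)) return null;
  const kept = list.map((e) => e.trim()).filter((e) => e.length > 0);
  return kept.length > maxCount ? null : kept;
}

function validateSearchRequest(body: Record<string, unknown>): Validation {
  const { query, keywords, categories, primeOnly, sortBy } = body;
  if (!optional(isString)(query)) return invalid("query must be a string");
  if (!optional(isStringArray)(keywords)) return invalid("keywords must be an array of strings");
  if (!optional(isStringArray)(categories)) return invalid("categories must be an array of strings");
  if (!optional(isBoolean)(primeOnly)) return invalid("primeOnly must be a boolean");
  if (!optional(isSortBy)(sortBy)) return invalid("sortBy is not a supported value");

  const q = (query ?? "").trim();
  if (q.length > MAX_QUERY_LENGTH) return invalid("query is too long");
  const kw = boundedList(keywords ?? [], MAX_KEYWORD_LENGTH, MAX_KEYWORDS);
  const cats = boundedList(categories ?? [], MAX_CATEGORY_LENGTH, MAX_CATEGORIES);
  if (kw === null || cats === null) return invalid("keywords or categories exceed their caps");
  // categories and primeOnly restrict a search; they do not supply search terms.
  if (q === "" && kw.length === 0) return invalid("provide query or at least one keyword");

  const value: SearchRequest = { keywords: kw, categories: cats };
  if (q !== "") value.query = q;
  if (primeOnly !== undefined) value.primeOnly = primeOnly;
  if (sortBy !== undefined) value.sortBy = sortBy;
  return { ok: true, value };
}
```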

Response

The full wire response is the { ok: true, data: <below> } envelope (see the “Success” row of the route-summary table above). The block below shows the inner data shape:

{
  // Up to MAX_RESULTS entries, same shape as `/api/amazon/import` returns
  // today. Order matches the requested `sortBy` (Amazon's default when
  // not specified). Empty array on zero matches — not an error.
  "items": [ /* AmazonImportDto */ ],

  // Optional convenience copy of Amazon's "total results" indicator when
  // present. May be `0` on a zero-match response, or absent entirely.
  "totalResultsHint": 1234
}

On zero matches the route still returns HTTP 200 with data.items: [] (and totalResultsHint either 0 or omitted).

MAX_RESULTS is one of the server-side constants — see Constants.

Error codes (envelope `{ ok: false, code, message }`, HTTP status in parens)

The table below documents the full wire-level error surface of the endpoint. INVALID_REQUEST (400) and AUTHENTICATION_REQUIRED (401) are emitted by the Next.js handler (structural body check and Cognito JWT verification, respectively) before the route module is invoked, so the route module itself never produces them. They are still observable on the wire, and tests targeting the HTTP endpoint must account for them.

Code	When
`INVALID_REQUEST` (`400`)	Emitted upstream by the Next.js handler on malformed JSON or wrong-shape request body (mirrors `/api/amazon/import`’s structural guard). Not produced by the route module; listed here for wire-contract completeness.
`INVALID_SEARCH_INPUT` (`400`)	Any request-body semantic validation failure inside the route module (see “Request validation” above). Fires after `INVALID_REQUEST`’s structural check passes.
`AUTHENTICATION_REQUIRED` (`401`)	Emitted upstream by the Next.js handler when Cognito JWT verification fails. Not produced by the route module; listed here for wire-contract completeness.
`UNSUPPORTED_SHORT_LINK` (`422`)	Single-input recognised-rejection: the strict ASIN extractor parsed the query as an `a.co` / `amzn.to` short link. Same error code `/api/amazon/import` emits for the same input class; the search route preserves the import route’s user-intent rejection rather than dereferencing the redirect.
`UNSUPPORTED_AMAZON_LOCALE` (`422`)	Single-input recognised-rejection: the strict ASIN extractor parsed the query as a non-US Amazon URL (`amazon.co.uk`, `amazon.de`, …). Same error code `/api/amazon/import` emits for the same input class; v1 search is US-marketplace only.
`AMAZON_API_ERROR` (`502`)	Creators API call failed (network, throttling, 5xx).

Zero matching items is not an error — the route returns 200 with items: [] and (optionally) totalResultsHint: 0. Callers decide how to present “no matches”.
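
For reference, the success and error envelopes can be written down as types with a status lookup. This is a sketch: the codes, statuses, and envelope fields are taken from the tables above, while `HTTP_STATUS`, `failure`, `success`, and the `AmazonImportDto` alias are hypothetical stand-ins.

```typescript
// `AmazonImportDto` is defined elsewhere in the project; this alias is a placeholder.
type AmazonImportDto = Record<string, unknown>;

type SearchErrorCode =
  | "INVALID_REQUEST"
  | "INVALID_SEARCH_INPUT"
  | "AUTHENTICATION_REQUIRED"
  | "UNSUPPORTED_SHORT_LINK"
  | "UNSUPPORTED_AMAZON_LOCALE"
  | "AMAZON_API_ERROR";

const HTTP_STATUS: Record<SearchErrorCode, number> = {
  INVALID_REQUEST: 400,
  INVALID_SEARCH_INPUT: 400,
  AUTHENTICATION_REQUIRED: 401,
  UNSUPPORTED_SHORT_LINK: 422,
  UNSUPPORTED_AMAZON_LOCALE: 422,
  AMAZON_API_ERROR: 502,
};

interface SearchSuccess {
  ok: true;
  data: { items: AmazonImportDto[]; totalResultsHint?: number };
}
interface SearchFailure {
  ok: false;
  code: SearchErrorCode;
  message: string;
}

function failure(code: SearchErrorCode, message: string): { status: number; body: SearchFailure } {
  return { status: HTTP_STATUS[code], body: { ok: false, code, message } };
}

// Zero matches is still a success: HTTP 200 with an empty `items` array.
function success(items: AmazonImportDto[], totalResultsHint?: number): { status: number; body: SearchSuccess } {
  const data: SearchSuccess["data"] = { items };
  if (totalResultsHint !== undefined) data.totalResultsHint = totalResultsHint;
  return { status: 200, body: { ok: true, data } };
}
```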

Client-layer disambiguation responsibility

The creatorsClient.searchItems wrapper in src/server/lib/amazon/creators-client.ts owns the mapping from this flexible BFF shape to Amazon’s narrower SearchItemsRequestContent:

BFF field	Maps to Amazon SDK
`query`	Concatenated with `keywords[]` (space-joined) and sent as `SearchItemsRequestContent.keywords`.
`keywords[]`	See above.
`categories[]`	First entry only is resolved to a `SearchIndex` enum or a `browseNodeId` via a `resolveCategory(label)` helper (see Q2 in Open questions); remaining entries are appended to `keywords` as restrictive terms. Single-category restriction is a hard Amazon constraint — `searchIndex` and `browseNodeId` each accept a single value per `SearchItems` call.
`primeOnly: true`	`SearchItemsRequestContent.deliveryFlags = ["Prime"]`.
`primeOnly: false` or unset	`deliveryFlags` left unset.
`sortBy: "relevance"`	`SearchItemsRequestContent.sortBy = "Relevance"` (or omitted — Amazon defaults to Relevance).
`sortBy: "price-low-to-high"`	`SearchItemsRequestContent.sortBy = "Price:LowToHigh"`.
(always)	`itemCount = MAX_RESULTS`, no `itemPage` (always first page).
(always)	`resources = [ ... Item.* fields the existing import path already requests ... ]` — same selector as today’s `GetItems` so each result is already fully populated as `AmazonImportDto`.
(always)	`partnerTag = AMAZON_ASSOCIATE_TAG` (server env), `marketplace = US`.
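
The mapping table can be sketched as one pure function. Several things here are assumptions: `SearchItemsSketch` is a narrow stand-in for the SDK's `SearchItemsRequestContent` (it omits `resources` and the marketplace), the return shape of `resolveCategory` is invented for illustration, and an unresolvable first category is simply dropped, which is one of the two options the request-body comments leave open.

```typescript
interface SearchRequest {
  query?: string;
  keywords?: string[];
  categories?: string[];
  primeOnly?: boolean;
  sortBy?: "relevance" | "price-low-to-high";
}

// Stand-in for the SDK request type; not the SDK's own definition.
interface SearchItemsSketch {
  keywords: string;
  searchIndex?: string;
  browseNodeId?: string;
  deliveryFlags?: string[];
  sortBy?: "Relevance" | "Price:LowToHigh";
  itemCount: number;
  partnerTag: string;
}

// Assumed resolver result: a SearchIndex value, a browse-node id, or nothing.
type ResolvedCategory = { searchIndex: string } | { browseNodeId: string } | null;

const MAX_RESULTS = 10;

function toSearchItemsRequest(
  req: SearchRequest,
  resolveCategory: (label: string) => ResolvedCategory,
  partnerTag: string,
): SearchItemsSketch {
  const [firstCategory, ...otherCategories] = req.categories ?? [];
  // query, keywords[], and the non-first categories are space-joined into keywords.
  const terms = [req.query ?? "", ...(req.keywords ?? []), ...otherCategories].filter(
    (t) => t.length > 0,
  );
  const out: SearchItemsSketch = { keywords: terms.join(" "), itemCount: MAX_RESULTS, partnerTag };

  const resolved = firstCategory === undefined ? null : resolveCategory(firstCategory);
  if (resolved !== null) {
    if ("searchIndex" in resolved) out.searchIndex = resolved.searchIndex;
    else out.browseNodeId = resolved.browseNodeId;
  }
  if (req.primeOnly === true) out.deliveryFlags = ["Prime"];
  // "relevance" is Amazon's default, so sortBy is left unset for it.
  if (req.sortBy === "price-low-to-high") out.sortBy = "Price:LowToHigh";
  return out;
}
```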

Search Input Processing

The BFF processes search input in three deterministic stages, all in the server-side client layer so that the route handler itself stays a thin contract. The first two run before any request reaches Amazon; the third runs only after a zero-result primary response:

A defensive filter that normalises and bounds the raw strings.
An ASIN extract-and-dispatch stage that bypasses SearchItems entirely when the user has already given us an unambiguous identifier.
A silent Amazon-side query relaxation stage that, on a zero-result primary response, retries the call with progressively fewer constraints (capped, never mutating the user’s words).

Constants

All numeric thresholds in this section are named server-side constants, declared in one place in the route module so they can be tuned without a contract change. Initial values:

Constant	Value	Use
`MAX_QUERY_LENGTH`	`1024`	Max characters in `query` after Unicode NFC normalisation.
`MAX_KEYWORDS`	`20`	Max array length of `keywords[]` after empty-entry drop.
`MAX_KEYWORD_LENGTH`	`64`	Max characters per surviving `keywords[]` entry.
`MAX_CATEGORIES`	`5`	Max array length of `categories[]` after empty-entry drop.
`MAX_CATEGORY_LENGTH`	`64`	Max characters per surviving `categories[]` entry.
`MAX_RESULTS`	`10`	Hard cap on returned `items` (also the Amazon `SearchItems.itemCount` cap).
`RELAXATION_MAX_RETRIES`	`2`	Max number of additional `SearchItems` calls after a zero-result primary.
`RELAXATION_TIME_BUDGET_MS`	`1500`	Total wall-clock budget across all relaxation retries; exceed → stop early.
`BATCH_ASIN_MAX`	`10`	Max ASINs in a single `GetItems` batch in the ASIN-shortcut path (also the Amazon cap).

Defensive input filter

Each string field (query, every entry of keywords[], every entry of categories[]) goes through the same pipeline before being used. Failures in steps marked reject surface as INVALID_SEARCH_INPUT (400). Failures in steps marked drop silently remove the offending entry from an array; if the array becomes empty it is treated as absent.

#	Step	Behaviour
1	Reject if any string exceeds its max length (`MAX_QUERY_LENGTH`, `MAX_KEYWORD_LENGTH`, `MAX_CATEGORY_LENGTH`).	Bounds resource usage. Query-presence rule is checked in step 8 (after array filtering) so that callers can omit `query` when `keywords[]` carries the search terms.
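2	Unicode NFC normalise.	Canonicalises composed vs. decomposed forms of the same character (e.g. a precomposed `é` vs. `e` plus a combining accent). NFC does not fold look-alike (homoglyph) or compatibility characters. Runs before the post-normalisation length re-check in step 3.
3	Re-check length after normalisation; reject on overflow.	Normalisation can change the string’s length.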
4	Replace control characters (`\x00–\x1F`, `\x7F`) with a single space.	Removes pasted newlines, tabs, null bytes — never meaningful for Amazon search.
5	Replace `<` and `>` with a single space.	Defensive against pasted HTML and log-context confusion; Amazon never needs angle brackets in keywords.
6	Collapse internal whitespace runs to a single space, then trim.	Avoids spending Amazon’s keyword budget on whitespace; keeps quote/apostrophe/accented characters intact since real product titles use them.
7	For array fields (`keywords[]`, `categories[]`): drop entries that go empty after filtering, then reject if the resulting array exceeds its cardinality cap (`MAX_KEYWORDS`, `MAX_CATEGORIES`).	Empty entries are not malformed input; over-cardinality is.
8	Reject if the combination yields no search terms — i.e. `query` is empty/absent and `keywords[]` (post-step-7) is empty/absent.	Amazon’s `SearchItems` requires at least one of `Keywords`, `Title`, `Brand`, `Author`, `Actor`, `Artist`; we only populate `Keywords` from `query` + `keywords[]`. `categories[]` and `primeOnly` alone do not satisfy Amazon — they restrict, they do not search.
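
Steps 1 to 7 of the table can be sketched as two small functions. The names `filterString`, `filterArray`, and `Filtered` are hypothetical, and length is measured with `String.prototype.length` (UTF-16 code units), which is an assumption about how "characters" are counted. Step 8, the query-presence rule, is left to the caller.

```typescript
type Filtered<T> = { ok: true; value: T } | { ok: false; code: "INVALID_SEARCH_INPUT" };
const REJECT = { ok: false, code: "INVALID_SEARCH_INPUT" } as const;

function filterString(raw: string, maxLength: number): Filtered<string> {
  if (raw.length > maxLength) return REJECT; // step 1
  const nfc = raw.normalize("NFC"); // step 2
  if (nfc.length > maxLength) return REJECT; // step 3
  const cleaned = nfc
    .replace(/[\x00-\x1F\x7F]/g, " ") // step 4: control characters
    .replace(/[<>]/g, " ") // step 5: angle brackets
    .replace(/\s+/g, " ") // step 6: collapse whitespace runs
    .trim();
  return { ok: true, value: cleaned };
}

function filterArray(
  entries: string[],
  maxEntryLength: number,
  maxEntries: number,
): Filtered<string[]> {
  const kept: string[] = [];
  for (const entry of entries) {
    const result = filterString(entry, maxEntryLength);
    if (!result.ok) return REJECT;
    if (result.value !== "") kept.push(result.value); // step 7: drop empty entries
  }
  // Step 7, continued: the cardinality cap applies after the empty-entry drop.
  return kept.length > maxEntries ? REJECT : { ok: true, value: kept };
}

console.log(filterString("  desk\tlamp <b>LED</b>\n", 1024));
// { ok: true, value: "desk lamp b LED /b" }
```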

Things the filter deliberately does not do:

Does not lowercase. Amazon’s keyword search is case-insensitive but preserving case is friendlier in any future preview/echo response.
Does not strip apostrophes, quotes, ampersands, currency symbols, or accented characters. Product titles legitimately contain them.
Does not strip the pipe character | blindly — it has Amazon-specific meaning when bulk-looking-up external identifiers (see “Smart identifier handling” in Amazon Creators API capabilities).
Does not interpret boolean operators (AND, OR, NOT), quoted phrases, or minus exclusion. Amazon’s documented Keywords semantics do not honour these, so pretending we do would be a lie.

ASIN extract-and-dispatch

After the defensive filter, the client inspects the cleaned query for an unambiguous identifier before calling SearchItems. When one is found, the call is redirected to GetItems because that is the cheaper, deterministic, correct operation for “I already know which product I want”.

The level of effort is bounded to what the strict extractAsin (src/lib/shared/amazon/asin.ts) does: bare ASINs and the product-URL shapes in its acceptance set (see URL/ASIN input normalisation). No new HTTP work (no dereferencing short links), and no new regexes beyond a thin “tokenise and try extractAsin per token” pass for the multi-ASIN case.

`query` content (after filtering)	Dispatch	Notes
A single bare ASIN, e.g. `B08N5WRWNW`	`creatorsClient.getItems([asin])` → first/only `AmazonImportDto` returned in `items` array	Same DTO shape as `SearchItems` would have returned, no search noise.
A single Amazon product URL (`/dp/<asin>`, `/<slug>/dp/<asin>`, `/gp/product/<asin>`, `/gp/aw/d/<asin>` on `amazon.com`)	Extract ASIN → `getItems([asin])`	Reuses existing `extractAsin`.
Multiple ASINs and/or URLs separated by whitespace, commas, semicolons, or newlines (whitespace already collapsed by step 6 of the filter)	Tokenise → run `extractAsin` per token → if every token yields an ASIN, batch into a single `getItems([...asins])` (Amazon caps at 10 ASINs/call; excess → `INVALID_SEARCH_INPUT`)	Enables paste-a-list-of-products imports through the same route.
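ASIN(s) embedded in surrounding free text (e.g. `earbuds B08N5WRWNW`)	No shortcut; pass through to `SearchItems` as keywords	Mixed input signals “find me something like this”; honour user intent rather than guessing. Caveat: the strict bare-ASIN pattern accepts any 10-character alphanumeric token, so a 10-letter keyword such as `headphones` would itself pass the per-token check.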
A short-link URL (`a.co/...`, `amzn.to/...`)	Surface `UNSUPPORTED_SHORT_LINK` (same error code `/api/amazon/import` returns)	Consistent with the import route. No dereferencing inside the search route.
Non-US-locale Amazon URL (`amazon.co.uk`, `amazon.de`, …)	Surface `UNSUPPORTED_AMAZON_LOCALE` (same error code as import)	Same reason.
Anything else	Continue to `SearchItems` with the BFF→SDK mapping in the table above.	Default path.
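
The dispatch table can be sketched as a planning function that decides which operation to call without calling it. This is illustrative only: `planDispatch` and the `Dispatch` type are hypothetical, the strict extractor is passed in as a parameter so the sketch stays self-contained, and trying the whole input before tokenising is an assumption about ordering.

```typescript
type ExtractResult =
  | { ok: true; asin: string }
  | {
      ok: false;
      code: "UNRECOGNIZED_AMAZON_URL" | "UNSUPPORTED_SHORT_LINK" | "UNSUPPORTED_AMAZON_LOCALE";
    };

type Dispatch =
  | { op: "getItems"; asins: string[] }
  | { op: "searchItems"; keywords: string }
  | {
      op: "error";
      code: "UNSUPPORTED_SHORT_LINK" | "UNSUPPORTED_AMAZON_LOCALE" | "INVALID_SEARCH_INPUT";
    };

const BATCH_ASIN_MAX = 10;

function planDispatch(query: string, extractAsin: (input: string) => ExtractResult): Dispatch {
  // Single-input case: the whole query is one ASIN or one URL.
  const whole = extractAsin(query);
  if (whole.ok) return { op: "getItems", asins: [whole.asin] };
  if (whole.code !== "UNRECOGNIZED_AMAZON_URL") return { op: "error", code: whole.code };

  // Multi-token case: every token must yield an ASIN, otherwise it is a keyword search.
  const tokens = query.split(/[\s,;]+/).filter((t) => t.length > 0);
  if (tokens.length > 1) {
    const asins: string[] = [];
    for (const token of tokens) {
      const result = extractAsin(token);
      if (!result.ok) return { op: "searchItems", keywords: query };
      asins.push(result.asin);
    }
    if (asins.length > BATCH_ASIN_MAX) return { op: "error", code: "INVALID_SEARCH_INPUT" };
    return { op: "getItems", asins };
  }
  return { op: "searchItems", keywords: query };
}
```

Returning a plan instead of performing the call keeps the decision logic testable without an Amazon client.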

Silent Amazon-side query relaxation

When the first SearchItems call returns zero items and the dispatcher did not take the GetItems shortcut, the client retries up to RELAXATION_MAX_RETRIES more times, each time dropping one constraint, before giving up and returning items: []. This is a deterministic server-side behaviour — not a contract surface — and applies on every request. The aim is to absorb common “one constraint too many” cases without surfacing them as zero results to the caller.

The retry ladder, in order, stops at the first non-empty page:

Original call — all constraints as derived from the BFF request.
Drop deliveryFlags (Prime restriction) if it was set.
Drop the category restrictor (searchIndex / browseNodeId) if one was active.

Hard bounds:

Maximum 1 + RELAXATION_MAX_RETRIES SearchItems calls per /api/amazon/search request (the original plus up to RELAXATION_MAX_RETRIES relaxations).
Total wall-clock budget ≤ RELAXATION_TIME_BUDGET_MS across all relaxation attempts; exceed → stop early and return what the most-recent call produced.
Relaxation never mutates the keywords text content — only filter and category constraints. Rewriting the user’s words is out of scope here.
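
The ladder and its bounds can be sketched as a pure function that builds the request variants, plus a runner that walks them. The names `relaxationLadder`, `searchWithRelaxation`, and `SearchReq` are hypothetical. Two further assumptions: relaxations are cumulative (the third call drops both the Prime flag and the category), and the time budget is checked between calls, so an in-flight call is never aborted.

```typescript
interface SearchReq {
  keywords: string;
  deliveryFlags?: string[];
  searchIndex?: string;
  browseNodeId?: string;
}

const RELAXATION_MAX_RETRIES = 2;
const RELAXATION_TIME_BUDGET_MS = 1500;

// Builds the original request followed by each relaxed variant, in order.
// The keywords text is copied through untouched at every rung.
function relaxationLadder(original: SearchReq): SearchReq[] {
  const ladder: SearchReq[] = [original];
  let current = original;
  if (current.deliveryFlags !== undefined) {
    current = { ...current };
    delete current.deliveryFlags;
    ladder.push(current);
  }
  if (current.searchIndex !== undefined || current.browseNodeId !== undefined) {
    current = { ...current };
    delete current.searchIndex;
    delete current.browseNodeId;
    ladder.push(current);
  }
  return ladder.slice(0, 1 + RELAXATION_MAX_RETRIES);
}

// Walks the ladder until a call returns items, the ladder ends, or the
// relaxation budget is spent. `search` stands in for the SearchItems call.
async function searchWithRelaxation<T>(
  original: SearchReq,
  search: (req: SearchReq) => Promise<T[]>,
  now: () => number = Date.now,
): Promise<T[]> {
  let items = await search(original);
  const relaxationStartedAt = now();
  for (const relaxed of relaxationLadder(original).slice(1)) {
    if (items.length > 0) break;
    if (now() - relaxationStartedAt >= RELAXATION_TIME_BUDGET_MS) break;
    items = await search(relaxed);
  }
  return items;
}
```

A request with neither a Prime flag nor a category has a one-rung ladder, so a zero-result response is returned as-is with no retries.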

When the dispatcher takes the GetItems shortcut, the route’s response shape is unchanged — it still returns { items: AmazonImportDto[], totalResultsHint? }, just sourced from GetItems instead of SearchItems. Callers do not need to know which path was taken.

Richer zero-result strategies — out of scope for v1

Strategies that go beyond the constraint-dropping relaxation above — LLM-generated alternative-query suggestions, external web-search fallbacks (Brave, Tavily, Perplexity, SerpAPI, Firecrawl, …), synonym expansion, spelling correction — are explicitly out of scope for PDEV-457. The decision for v1 is to ship only the silent Amazon-side relaxation above; neither LLM augmentation nor external web search is implemented or exposed in the route contract.

The deferred LLM-suggestion feature is tracked under PDEV-569 (Backlog, blocked by PDEV-457). The full survey of options — Amazon-side / external / LLM — that informed the decision is in query-relaxation-exploration.md.

Cross-cutting concerns

Concern	Note
OAuth2 / credentials	Reuses the existing `creatorsClient` and `AMAZON_CREATORS_*` env vars. No new server secrets.
Marketplace	US only, via the existing marketplace header on the Creators client. Non-US is out of scope.
Affiliate tag	Each `AmazonImportDto` carries an affiliate-tagged URL via the existing helper. No helper changes.
Rate-limit posture	Each `/api/amazon/search` call makes at most `1 + RELAXATION_MAX_RETRIES` `SearchItems` calls: the original, plus up to `RELAXATION_MAX_RETRIES` silent relaxation retries when the result is empty. Resolving a category label to a `browseNodeId` may add at most one `GetBrowseNodes` call (cacheable in-memory), for a worst case of `2 + RELAXATION_MAX_RETRIES` Amazon S2S calls. The `GetItems` shortcut path is at most 1 call.
Pagination	None. First page only, capped at `MAX_RESULTS`. Refactoring to add pagination is non-breaking (extend request/response with optional fields).
Sort / refinements	v1 exposes `relevance` and `price-low-to-high` only. `SearchRefinements` from Amazon’s response is dropped at the BFF boundary; surfacing it later is non-breaking.
Caching	None. Same posture as today’s `/api/amazon/import`.
MSW	`src/mocks/handlers/amazon.ts` gets handlers for `/api/amazon/search` covering: success (multi-item), zero results (`200` + empty `items`), invalid input, Creators failure.
Tests	Route-level tests for each branch (success, zero-results, invalid input, Creators failure); `creatorsClient.searchItems` wrapper tests covering header, resource selector, error mapping, and the BFF→SDK field mapping above.

Amazon Creators API capabilities

What the underlying Amazon Creators API offers, summarized so the route contract can be read against its actual ceiling. This is a snapshot of the SDK shape vendored as amazon-creators-api@1.2.2; consult the SDK’s own types for the canonical reference.

Operations

Operation	Purpose	Used in this project?
`GetItems(marketplace, { ASINs[], resources[] })`	Look up one or more items by ASIN.	Yes: by the existing `/api/amazon/import` route and by the ASIN-shortcut path of `/api/amazon/search`.
`SearchItems(marketplace, { keywords, ... })`	Keyword search with filters and sort; returns up to `itemCount` items per page plus `searchRefinements`.	Yes — primary operation for the new `/api/amazon/search` route.
`GetBrowseNodes(marketplace, { browseNodeIds[], resources[] })`	Resolve category browse-node ids to their full metadata (name, ancestors, children, sales rank).	Possibly, to back the `categories[]` → `browseNodeId` mapping (small in-memory cache).
`GetVariations(marketplace, { ASIN, resources[] })`	Walk parent ASIN to child variation ASINs (e.g. size/color).	No.
`GetFeed` / `ListFeeds` / `GetReport` / `ListReports`	Bulk/feed-style report operations.	No.

`SearchItems` request fields (most relevant subset)

Field	Type	Use here
`keywords`	`string`	Receives `query` plus joined `keywords[]` from the BFF.
`title`, `brand`, `author`, `actor`, `artist`	`string`	Typed restrictors. Not surfaced in v1; could replace generic keyword joining later.
`searchIndex`	`string` enum	Single-category restrictor (e.g. `All`, `HomeGarden`, `OfficeProducts`). One of two ways to honor `categories[]`.
`browseNodeId`	`string`	Numeric browse-node id. The narrower category restrictor; one node per request.
`condition`	enum: `Any`, `New` (SDK type is sparse; Amazon also exposes `Used`, `Refurbished`, `Collectible`)	Not used in v1.
`availability`	enum: `Available`, `IncludeOutOfStock`	Not used in v1 (defaults to `Available`).
`minPrice`, `maxPrice`	`number`	Not used in v1.
`minReviewsRating`, `minSavingPercent`	`number`	Not used in v1.
`deliveryFlags`	array of `AmazonGlobal` | `FreeShipping` | `FulfilledByAmazon` | `Prime`	Receives `["Prime"]` when `primeOnly: true`.
`sortBy`	enum: `Featured`, `Relevance`, `AvgCustomerReviews`, `NewestArrivals`, `Price:LowToHigh`, `Price:HighToLow`	`relevance` → `Relevance`; `price-low-to-high` → `Price:LowToHigh`.
`itemCount`	`number`	`MAX_RESULTS`.
`itemPage`	`number`	Not used in v1 (first page only).
`currencyOfPreference`, `languagesOfPreference`	locale prefs	Not used in v1.
`partnerTag`	`string`	`AMAZON_ASSOCIATE_TAG`.
`resources`	array of `SearchItemsResource`	Selector for which `Item.*` fields the response includes — `images.primary.large`, `itemInfo.title`, `offersV2.listings.price`, etc. The client requests the same selector the existing `GetItems` path uses so each search hit is already a full `AmazonImportDto`.
`properties`	`Record<string,string>`	Advanced/experimental; not used.
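
The "Use here" column above can be read as a single mapping function. The sketch below is illustrative only: `BffSearchInput`, `SearchItemsRequestSketch`, the `sort` field name, and the `MAX_RESULTS` value are assumptions made for the example, and the request type is a hand-written subset rather than the SDK's own type.

```typescript
// Hypothetical BFF input shape; only the fields discussed in the table.
interface BffSearchInput {
  query?: string;
  keywords?: string[];
  primeOnly?: boolean;
  sort?: "relevance" | "price-low-to-high"; // field name is an assumption
}

// Hand-written subset of the SearchItems request fields, not the SDK type.
interface SearchItemsRequestSketch {
  keywords: string;
  deliveryFlags?: string[];
  sortBy?: "Relevance" | "Price:LowToHigh";
  itemCount: number;
  partnerTag: string;
  resources: string[];
}

const MAX_RESULTS = 10; // hypothetical value; the real constant is server-side

const SORT_MAP = {
  "relevance": "Relevance",
  "price-low-to-high": "Price:LowToHigh",
} as const;

function toSearchItemsRequest(
  input: BffSearchInput,
  partnerTag: string,
  resources: string[],
): SearchItemsRequestSketch {
  // `query` plus `keywords[]`, joined with spaces; blank entries are skipped.
  const keywords = [input.query ?? "", ...(input.keywords ?? [])]
    .map((term) => term.trim())
    .filter((term) => term.length > 0)
    .join(" ");
  const request: SearchItemsRequestSketch = {
    keywords,
    itemCount: MAX_RESULTS,
    partnerTag,
    resources,
  };
  if (input.primeOnly === true) request.deliveryFlags = ["Prime"];
  if (input.sort !== undefined) request.sortBy = SORT_MAP[input.sort];
  return request;
}
```

Fields marked "Not used in v1" are simply never set, so Amazon's defaults apply.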

`SearchItems` response (highlights)

Field	Notes
`searchResult.items[]`	Each item is a full `Item` resource shaped by the request’s `resources` selector.
`searchResult.totalResultCount`	“Total results” hint surfaced to the BFF as `totalResultsHint`.
`searchResult.searchRefinements`	Facet bins (`browseNode`, `searchIndex`, `otherRefinements[]`). Not surfaced in v1.
`errors[]`	Structured Amazon error envelope; mapped to the BFF’s `AMAZON_API_ERROR`.

`Keywords` syntax and “smart query” support

Amazon’s PA-API / Creators API documentation is intentionally sparse about what the Keywords parameter understands. The behaviours below are the ones verified against Amazon’s own use-case docs and confirmed in vendor commentary:

Form	Behaviour	How we use it
Plain space-separated tokens	AND-style narrowing (proprietary, not formally documented).	Default join for `query` and `keywords[]`.
Pipe `|` separator between identifier-shaped tokens, with `SearchIndex=All` and `ItemInfo.ExternalIds` in `resources`	Bulk lookup by external identifier (UPC, EAN, ISBN). Amazon may return adjacent products; the caller must filter the response by exact match against `ItemInfo.ExternalIds.{EANs,UPCs,ISBNs}.DisplayValues`.	See “Smart identifier handling” below.
Boolean operators (`AND`, `OR`, `NOT`), quoted phrases, minus exclusion	Not officially supported. Amazon’s site search has its own syntax but `Keywords` on the API does not promise to honour any of it.	We do not advertise, document, or interpret these. Whatever the user types is passed through; behaviour is whatever Amazon chooses to do.
Typed restrictors (`Title`, `Brand`, `Author`, `Actor`, `Artist` as separate `SearchItemsRequestContent` fields)	Stronger and more reliable than AND-stuffing into `Keywords`.	Not exposed in v1. A natural v2 lever once the BFF surface stabilises.

Smart identifier handling

When the cleaned query (after the filter pipeline and ASIN dispatcher) is a list of tokens all of which match one of the patterns below, the client switches to identifier mode: joins the tokens with |, sets SearchIndex=All, and adds ItemInfo.ExternalIds to the resource selector. The response is filtered to keep only items whose ItemInfo.ExternalIds echoes one of the input identifiers.

Pattern	Format
UPC-A	`^\d{12}$`
EAN-13 (incl. ISBN-13 `97[89]\d{10}`)	`^\d{13}$`
EAN-8	`^\d{8}$`
ISBN-10	`^\d{9}[\dX]$`

This buys cheap “scan-a-list-of-barcodes” import through the same route without a separate endpoint. Mixed input (some identifier-shaped, some not) falls through to plain keyword search.
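
The mode switch described above can be sketched as follows. The four patterns are copied from the table; the names `IDENTIFIER_PATTERNS`, `isIdentifierToken`, and `planKeywords` are hypothetical, and the sketch assumes the tokens have already passed the filter pipeline and the ASIN dispatcher.

```typescript
type SearchMode = "default" | "identifier";

const IDENTIFIER_PATTERNS: ReadonlyArray<RegExp> = [
  /^\d{12}$/, // UPC-A
  /^\d{13}$/, // EAN-13, including ISBN-13
  /^\d{8}$/, // EAN-8
  /^\d{9}[\dX]$/, // ISBN-10
];

function isIdentifierToken(token: string): boolean {
  return IDENTIFIER_PATTERNS.some((pattern) => pattern.test(token));
}

// Identifier mode only when every token is identifier-shaped; any other
// input, including mixed lists, falls through to plain keyword search.
function planKeywords(
  tokens: ReadonlyArray<string>,
): { mode: SearchMode; keywords: string } {
  const allIdentifiers = tokens.length > 0 && tokens.every(isIdentifierToken);
  return allIdentifiers
    ? { mode: "identifier", keywords: tokens.join("|") }
    : { mode: "default", keywords: tokens.join(" ") };
}
```

For example, `planKeywords(["012345678905", "9780306406157"])` selects identifier mode with the two tokens pipe-joined, while `planKeywords(["012345678905", "stapler"])` stays in default mode with a space-joined keyword string.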

Sub-bits worth pinning before implementation

Conditional resource selector. The original spec assumed identifier mode would request additional fine-grained ItemInfo.ExternalIds.{EANs,UPCs,ISBNs}.DisplayValues paths that the default mode would omit, to keep payload small for plain keyword searches. Implementation found this is not possible at the SDK level: the vendored amazon-creators-api@1.2.2 SearchItemsResource / GetItemsResource enums only expose 'itemInfo.externalIds' as a single key that returns all external-id sub-fields together. There is no way to ask for EANs but not UPCs, or to omit external IDs entirely once any sub-resource is requested.

Furthermore, the existing V1_RESOURCES constant used by getItems already contains 'itemInfo.externalIds' (because the current AmazonImportDto carries the first UPC). So:

The “default” mode resource selector and the “identifier” mode resource selector are functionally identical in v1.
The resourcesForMode(mode: SearchMode) abstraction is still kept (in src/server/lib/amazon/creators-client.ts) so a future SDK revision that exposes finer-grained resource paths can be adopted by changing only the helper. SearchMode = "default" | "identifier". The function de-duplicates internally.
There is no payload-size win from mode-discrimination in v1. The spec’s earlier reasoning about “avoid response bloat” does not apply.
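
A minimal sketch of what such a helper could look like, assuming the selector is a plain list of resource-key strings. The `V1_RESOURCES` contents here are an illustrative subset built from keys named in this document, not the real constant, and the body is a guess at the shape rather than the code in `creators-client.ts`.

```typescript
type SearchMode = "default" | "identifier";

// Illustrative subset only; the real V1_RESOURCES list is defined next to getItems.
const V1_RESOURCES: ReadonlyArray<string> = [
  "images.primary.large",
  "itemInfo.title",
  "itemInfo.externalIds",
  "offersV2.listings.price",
];

function resourcesForMode(mode: SearchMode): string[] {
  // Identifier mode asks for external ids explicitly; in v1 the base list
  // already contains that key, so de-duplication makes both modes identical.
  const extra = mode === "identifier" ? ["itemInfo.externalIds"] : [];
  return Array.from(new Set([...V1_RESOURCES, ...extra]));
}
```

If a later SDK revision exposes finer-grained keys, only the `extra` list would need to change.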

Response filtering. Amazon’s docs explicitly say bulk-identifier search “may return adjacent products”; callers are expected to filter the response against the input identifiers. The filter is a pure function:

filterByExternalIds(
  items: SearchItem[],
  inputIds: ReadonlyArray<string>,
): SearchItem[]

Match an input id against an item by checking, in order: ItemInfo.ExternalIds.EANs.DisplayValues, ItemInfo.ExternalIds.UPCs.DisplayValues, ItemInfo.ExternalIds.ISBNs.DisplayValues. If any input id appears in any of those DisplayValues arrays for that item, the item is kept; otherwise it is dropped. Comparison is exact-string (Amazon returns identifiers in canonical form; no normalisation needed beyond the defensive filter that ran upstream).
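
One way the matching rule above could be written is shown below. The `SearchItem` type is a minimal structural stand-in, and the camel-cased property names (`itemInfo`, `externalIds`, `eans`, `upcs`, `isbns`, `displayValues`) are assumptions about the SDK's item shape that should be checked against its types.

```typescript
// Minimal structural types for the fields the filter reads; not the SDK's Item type.
interface DisplayValues {
  displayValues?: string[];
}
interface SearchItem {
  asin: string;
  itemInfo?: {
    externalIds?: {
      eans?: DisplayValues;
      upcs?: DisplayValues;
      isbns?: DisplayValues;
    };
  };
}

function filterByExternalIds(
  items: SearchItem[],
  inputIds: ReadonlyArray<string>,
): SearchItem[] {
  const wanted = new Set(inputIds);
  return items.filter((item) => {
    const ids = item.itemInfo?.externalIds;
    // EANs, then UPCs, then ISBNs; a missing field contributes nothing.
    const candidates = [
      ...(ids?.eans?.displayValues ?? []),
      ...(ids?.upcs?.displayValues ?? []),
      ...(ids?.isbns?.displayValues ?? []),
    ];
    // Exact-string comparison, as specified above.
    return candidates.some((value) => wanted.has(value));
  });
}
```

Because `Array.prototype.filter` preserves order, the surviving items stay in Amazon's response order.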

Deduplication. A single product can carry several external identifiers (e.g. a book with both ISBN-10 and ISBN-13; a product with multiple UPC barcodes across regional variants). Filtering by exact match against multiple input ids can yield the same item more than once if both ids happen to point to the same ASIN. Dedup the filtered list by ASIN, preserving the first occurrence.
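
A first-occurrence dedup by ASIN is short enough to sketch in full. The helper name `dedupeByAsin` and the `asin` property name are assumptions for illustration.

```typescript
function dedupeByAsin<T extends { asin: string }>(items: ReadonlyArray<T>): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const item of items) {
    if (seen.has(item.asin)) continue; // later duplicates are dropped
    seen.add(item.asin);
    unique.push(item); // first occurrence keeps its position
  }
  return unique;
}
```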

Output ordering. Items are returned in Amazon’s response order with non-matching items removed and ASIN-duplicates collapsed. The route does not attempt to re-order results to match the input id ordering; doing so adds complexity without a real consumer in v1. Document the order in the route contract so callers do not accidentally rely on input-order alignment.

Edge cases for tests.

Pure-identifier list of 3 UPCs, Amazon returns exactly 3 items — pass-through.
Pure-identifier list, Amazon returns 5 items where 2 are “adjacent products” — filter drops the 2 noise items.
Pure-identifier list, none of Amazon’s response items echo any input id — items: [] (still a 200, not an error).
Mixed token list (one UPC + one keyword) — falls through to plain keyword search, no identifier mode triggered.
One ASIN + one UPC — neither shortcut fires; falls through to plain keyword search.
Single ISBN-10 with trailing X — pattern matches, identifier mode triggers with a single-element pipe-list.

Operational characteristics

Aspect	Note
Auth	OAuth2 client-credentials. The `creatorsClient` already obtains and rotates tokens transparently.
Marketplace	Selected via the `X-Marketplace` header on every call. US only for this project.
Rate limits	TPS and TPD quotas shared across `GetItems`, `SearchItems`, `GetBrowseNodes`, `GetVariations`. Throttled responses surface as `ThrottleExceptionResponseContent`; the BFF maps to `AMAZON_API_ERROR`.
Failure modes	`AccessDeniedException`, `UnauthorizedException`, `ValidationException`, `InternalServerException`, `ResourceNotFoundException`, `ThrottleException` (SDK type names). All collapse to `AMAZON_API_ERROR` at the BFF for v1; finer-grained mapping is a future extension.
PA-API sunset	The legacy PA-API 5.0 product surface is deprecated on 2026-05-15; the Creators API (used here via `amazon-creators-api@1.2.2`) is the supported successor. Confirmed in Amazon’s own docs and external write-ups. The `Keywords` semantics described above carry over unchanged.
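
The v1 error collapse from the "Failure modes" row can be expressed as a lookup table, which keeps the future finer-grained mapping a data change rather than a code change. This is only a sketch: the keys are the SDK type names listed above, treated as plain strings, and how the failure name is obtained from a thrown SDK error is left open.

```typescript
type BffErrorCode = "AMAZON_API_ERROR"; // the only Amazon-side code used for these failures in v1

// SDK failure type names as listed in the table; every one maps to the same code in v1.
const FAILURE_TO_BFF_CODE: ReadonlyMap<string, BffErrorCode> = new Map<string, BffErrorCode>([
  ["AccessDeniedException", "AMAZON_API_ERROR"],
  ["UnauthorizedException", "AMAZON_API_ERROR"],
  ["ValidationException", "AMAZON_API_ERROR"],
  ["InternalServerException", "AMAZON_API_ERROR"],
  ["ResourceNotFoundException", "AMAZON_API_ERROR"],
  ["ThrottleException", "AMAZON_API_ERROR"],
]);

function toBffErrorCode(sdkFailureName: string): BffErrorCode {
  // Unknown names also collapse, so a new SDK failure type cannot leak through.
  return FAILURE_TO_BFF_CODE.get(sdkFailureName) ?? "AMAZON_API_ERROR";
}
```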

Open questions

ID	Question	Status
Q1	Multi-category requests — Amazon allows only one `searchIndex` or one `browseNodeId` per `SearchItems` call. When `categories[]` has more than one entry, how should the client behave?	Resolved — option (a). The client uses the first entry as the Amazon category restrictor (`searchIndex` or `browseNodeId`) and appends the remaining entries to the `keywords` string as additional restrictive terms.
Q2	`categories[]` resolution source — static enum-matching table only, or also dynamic `GetBrowseNodes` lookup with an in-memory cache?	Resolved — static table for v1, hidden behind a `resolveCategory(label: string): { searchIndex?: string; browseNodeId?: string } | null` function so the implementation can be swapped for a dynamic `GetBrowseNodes`-backed resolver later without a contract or call-site change.
Q3	Should empty `categories[]` / `keywords[]` arrays be treated as “field absent”, or rejected as malformed?	Resolved — entries that go empty after the defensive filter are dropped silently; if the array becomes empty it is treated as absent. Caller-side empty arrays are equivalent to omitting the field. See “Search Input Processing — Defensive input filter”.
Q4	Concrete values for the server-side constants.	Resolved — values fixed (see Constants at the top of “Search Input Processing”). The constants remain server-side names so they can be tuned without a contract change.
Q5	Zero-result augmentation policy beyond silent Amazon-side relaxation.	Resolved — out of scope for v1. Neither LLM-driven suggestions nor external web-search fallbacks ship in PDEV-457. The deferred LLM-suggestion feature is tracked under PDEV-569. External web-search providers are not currently planned.
Q6	LLM provider for suggestion generation.	Resolved — out of scope for v1. Decision tracked in PDEV-569 if and when that issue is picked up.
Q7	Should the server silently re-run `SearchItems` against generated suggestions before responding?	Resolved — out of scope for v1. No suggestion generation in v1, so no re-search path exists. The constraint (no server-side re-search) is preserved in PDEV-569 for the deferred feature.
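
The Q1 and Q2 resolutions combine naturally into one small step, sketched below. The `resolveCategory` signature is the one given in Q2; the table contents, the label normalisation, the `applyCategories` helper, and the handling of an unresolvable first entry are all assumptions for illustration, since the spec does not pin them down here.

```typescript
type CategoryRestrictor = { searchIndex?: string; browseNodeId?: string };

// Hypothetical static table; labels are placeholders, values reuse the
// searchIndex examples mentioned earlier in this document.
const CATEGORY_TABLE = new Map<string, CategoryRestrictor>([
  ["office products", { searchIndex: "OfficeProducts" }],
  ["home & garden", { searchIndex: "HomeGarden" }],
]);

function resolveCategory(label: string): CategoryRestrictor | null {
  // Case-insensitive lookup is an assumption, not part of the spec.
  return CATEGORY_TABLE.get(label.trim().toLowerCase()) ?? null;
}

function applyCategories(
  categories: ReadonlyArray<string>,
  keywords: string,
): { restrictor: CategoryRestrictor | null; keywords: string } {
  const [first, ...rest] = categories;
  if (first === undefined) return { restrictor: null, keywords };
  const restrictor = resolveCategory(first);
  // Assumption: an unresolvable first entry is kept as a keyword rather than dropped.
  const extraTerms = restrictor === null ? [first, ...rest] : rest;
  const joined = [keywords, ...extraTerms].filter((term) => term.length > 0).join(" ");
  return { restrictor, keywords: joined };
}
```

Swapping the table for a `GetBrowseNodes`-backed resolver later would only touch `resolveCategory`; `applyCategories` and its callers would stay as they are.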