Design Session 01: Upload Product Images and Managed File Assets
Purpose
Goal: Resolve large design questions before deciding on phasing, initial scope, and detailed design.
Design exploration and decision dialog for the file/asset upload capability, covering S3 bucket architecture, access control, CDN integration, API design, and the first use case (product images in the Item module).
Decisions from this session are summarized in decision-log.md.
This session focuses on the system-level storage architecture and security model. These are foundational decisions that constrain all subsequent design work.
Context: Two Input Documents
This round synthesizes two perspectives:
- project-description.md (Design Exploration section) — establishes system-level principles for bulk storage: objects are always owned by business entities, immutable, and never used as shared global state. It classifies storage along three axes: Http Accessible vs. Internal Use, Externally Uploaded vs. Internally Sourced, and Durable vs. Ephemeral.
- dna-work-in-progress.md (Denis’s FileStore design) — proposes a Lambda-based FileStore service behind API Gateway + Cognito, with per-tenant namespace partitioning, presigned URLs for both reads and writes, and OAC for CloudFront. The storage backend (the number and organization of buckets) is transparent to API consumers.
Context: Existing Infrastructure
- A partition-level upload bucket already exists with a 1-day TTL, used for CSV uploads. It is created by the BulkStoresStack in CDK.
- An API CloudFront distribution (the ApiCloudFront construct) already exists for API Gateway, but it is API-specific (no caching, passes all methods through). There is no existing S3-origin CloudFront construct.
- The UploadBucket CDK construct is parameterized (name, expirationDays) and reusable for creating additional buckets.
DQ-001: S3 Bucket Organization
Context
The system needs bulk storage for multiple purposes with different lifecycle and access characteristics. The current infrastructure has a single ephemeral upload bucket per partition. The design exploration in project-description.md identifies three classification axes that create distinct storage profiles.
Storage Profiles Identified
| Profile | Access | Lifecycle | Example |
|---|---|---|---|
| Http Assets | CDN/public | Durable | Product images, logos |
| Internal Durable | Backend IAM | Durable | Generated reports, archives |
| Ephemeral Upload | Backend IAM | Short TTL | CSV uploads for processing |
| Ephemeral Download | CDN/public | Short TTL | CSV downloads, export files |
| Option | Description | Trade-offs |
|---|---|---|
| A. Single bucket, prefix-partitioned | One bucket per partition. Prefixes like http-assets/, ephemeral/, internal/ differentiate profiles. Prefix-based lifecycle rules handle TTL differences. | Pro: Simplest to manage, single IAM boundary, single presigning role. S3 supports up to 1,000 lifecycle rules per bucket with prefix scoping. Con: CloudFront OAC applies to the entire origin (bucket), so all content in the bucket is accessible via CloudFront — must rely on CloudFront behaviors or signed URLs/cookies for access differentiation. Mixes durable and ephemeral content operationally. |
| B. Two buckets by lifecycle | http-assets bucket (durable, CDN-fronted, versioned) and ephemeral bucket (current, short TTL, no CDN). Add more buckets only when a genuinely new profile emerges. | Pro: Clean separation of CDN-fronted vs internal content. OAC on http-assets bucket means CloudFront can only reach durable assets. Ephemeral bucket retains current behavior unchanged. Each bucket has its own IAM and lifecycle boundary. Con: Two presigning roles, two sets of IAM policies, two sets of CloudFormation exports. Modest operational overhead. |
| C. Three+ buckets by profile | One bucket per storage profile from the table above. | Pro: Maximum isolation. Each bucket has exactly one lifecycle policy, one access pattern. Cleanest CloudFront OAC scoping. Con: More infrastructure to manage. The current system has one bucket; jumping to 3-4 adds CDK complexity, more CloudFormation exports, more IAM roles. Some profiles (Internal Durable) may not be needed yet. |
| D. Per-component buckets | Each microservice (operations, accounts, …) gets its own bucket(s). | Pro: Maximum blast-radius isolation between services. Con: Violates the principle that bucket organization should be transparent to API consumers (per Denis’s design). Significant infrastructure proliferation. Components share the same partition and same CDN — per-component buckets create CDN routing complexity. |
Recommendation
Option B — two buckets by lifecycle. It gives the clean separation needed for CDN (OAC only on the http-assets bucket), preserves the existing ephemeral bucket unchanged, and aligns with the principle that bucket internals are transparent to API consumers. A third bucket (internal durable) can be added later if a concrete use case demands it. Option A is viable but conflates CDN-accessible and internal-only content behind the same OAC boundary. Options C and D over-engineer for current needs.
Discussion
Q: With Option B, internal files would be in an HTTP-accessible bucket; is it possible to restrict access to certain prefixes or objects?
A: With Option B, the question is moot for the current set of use cases
because “Internal Durable” files do not exist yet — the two buckets are
http-assets (durable, CDN-fronted) and ephemeral (current bucket, short
TTL, backend-only). Internal durable files have no bucket to land in, which is
by design: a third bucket is added only when that profile materializes.
If in the future an internal-only durable file needs to coexist in the
http-assets bucket (to avoid a third bucket), access restriction is possible
at two levels:
- CloudFront bucket policy with prefix conditions: The OAC bucket policy can restrict s3:GetObject to specific prefixes. For example, grant CloudFront access only to objects under {tenant-id}/product-images/* and {tenant-id}/user-profiles/*, while denying access to {tenant-id}/internal/*. CloudFront requests to internal prefixes would get a 403 from S3 directly.

  ```json
  {
    "Effect": "Allow",
    "Principal": { "Service": "cloudfront.amazonaws.com" },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::bucket-name/*/product-images/*",
    "Condition": {
      "StringEquals": {
        "AWS:SourceArn": "arn:aws:cloudfront::ACCOUNT:distribution/DIST-ID"
      }
    }
  }
  ```

  Objects under prefixes not covered by the policy are invisible to CloudFront even though they are in the same bucket.
- CloudFront cache behaviors: A behavior matching /internal/* could return a static 403 response (using a CloudFront Function) or simply not be configured, causing the default behavior to apply. Since the default behavior would be the S3 origin with OAC, the bucket policy restriction in (1) is the more reliable guard.
Bottom line: Option B does not force internal files into the CDN-accessible bucket. If it ever becomes necessary, prefix-scoped bucket policies provide effective access restriction within a single bucket. But the cleaner path is to add a third bucket when the “Internal Durable” profile has a concrete use case.
Decision
- Option B, with a partition bucket per profile (two initially: http-assets and ephemeral-uploads)
- Bucket installed at Partition Configuration time.
- Object Key Format, chosen to enable policy-based access control: ${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension}
- This project needs to modify the infrastructure repository to support the new bucket as part of the partition configuration.
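The decided key format can be captured in a small helper. A sketch in TypeScript (the production helper would live in the Kotlin backend; the function and segment names here are illustrative):

```typescript
import { randomUUID } from "node:crypto";

// Builds an object key in the decided format:
//   ${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension}
// Hypothetical helper, not existing code.
function buildAssetKey(
  tenantId: string,
  owningModule: string,
  entityType: string,
  propertyName: string,
  extension: string,
  assetUuid: string = randomUUID(),
): string {
  const segments = [tenantId, owningModule, entityType, propertyName];
  // Guard each segment so a caller cannot inject path separators or traversal.
  for (const s of segments) {
    if (!/^[a-zA-Z0-9-]+$/.test(s)) throw new Error(`invalid key segment: ${s}`);
  }
  return `${segments.join("/")}/${assetUuid}.${extension}`;
}
```

Because the tenant id is always the first segment, a signed cookie scoped to /{tenantId}/* (per DQ-002) covers every key this produces, regardless of the deeper segments.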
Analysis
Key format supersedes DQ-004. The key structure decided here
(${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension})
is more granular than the DQ-004 recommendation ({tenant-id}/{feature}/{uuid}.{ext}),
replacing the single {feature} segment with three domain-aware segments. DQ-004
should be marked as decided by this decision.
Domain model coupling. The segments {owning-module}/{entity-type}/{property-name}
couple the S3 key structure to the domain model. If an entity type is renamed or
a property moves, the key path changes. However, under the immutability principle
(DQ-008), this coupling is low-risk: old objects with old key paths remain valid
and accessible — only new uploads use the new path. The entity reference (e.g.,
Item.imageUrl) stores the full key, so renames do not break existing references.
Signed cookie scoping alignment. The tenant-first key structure aligns with
the DQ-002 signed cookie approach: cookies scoped to /{tenant-id}/* cover all
objects under the tenant prefix regardless of the deeper segments. The additional
granularity is invisible to the cookie validation layer.
Applied To
DQ-002: Read-Path Access Control Model
Context
Http-accessible assets must be served directly to HTTP clients (browsers) without backend proxying. Access needs to be partitioned by tenant. The question is what level of security is appropriate and how it interacts with CDN caching.
Key AWS constraint: CloudFront Functions cannot verify RS256/ES256 JWT signatures (the crypto module only supports HMAC). Lambda@Edge can verify JWTs but adds latency (~5ms cold, sub-ms warm) and cost per invocation.
Denis’s FileStore design uses API Gateway + Cognito + Lambda to issue presigned GET URLs (302 redirect). This provides full tenant auth but means every image load requires a Lambda invocation and produces a unique presigned URL that defeats CDN caching.
| Option | Description | Trade-offs |
|---|---|---|
| A. Unguessable keys, no auth | Object keys include UUID components making them unguessable. CloudFront serves content without authentication. Anyone with the URL can access the asset. | Pro: Simplest. Full CDN caching. No Lambda or edge functions needed. Zero latency overhead. Works identically in local dev and production. Con: No tenant isolation at access time — URL sharing or logging exposes content. Acceptable only if assets are not sensitive. |
| B. CloudFront signed cookies (tenant-scoped) | Backend issues signed cookies scoped to /{tenant-id}/* path prefix on login. CloudFront validates cookies at the edge natively (no Lambda). Cookies refreshed periodically. | Pro: Tenant-level isolation. Full CDN caching (cookie is not part of cache key). Sub-millisecond edge verification. One cookie set per tenant session covers all assets. Con: Requires RSA/ECDSA key pair management (trusted key groups). Cookie scoping requires tenant-id in the URL path. Browser must send cookies with image requests (same-site/cross-origin considerations). Adds complexity to the auth flow. |
| C. Lambda@Edge JWT validation | Lambda@Edge on viewer-request validates the JWT from the Authorization header or a cookie, extracts tenant-id, and validates against the request path prefix. | Pro: Full tenant isolation with standard JWT tokens. Reuses existing Cognito token infrastructure. Can enforce fine-grained policies (e.g., per-module access). Con: Lambda@Edge invoked on every cache miss (viewer-request trigger). Cold start latency (~5s for first invocation). Must cache JWKS keys. Adds per-request cost. More complex to deploy and test (must be in us-east-1). Cannot be tested with LocalStack. |
| D. FileStore Lambda (Denis’s design) | API Gateway route /assets/{key} -> Cognito auth -> Lambda -> 302 redirect to S3 presigned GET URL. Full tenant validation in Lambda. | Pro: Full tenant isolation. Reuses existing API Gateway + Cognito infrastructure. Lambda has full AWS SDK access for authorization logic. Con: Every image load is: browser -> CloudFront -> API Gateway -> Lambda -> S3 presign -> 302 -> browser -> S3 GET. No CDN caching of the redirect (each presigned URL is unique). Higher latency for image loads. Lambda cost per image request. Effectively makes the CDN useless for read-path caching of assets. |
Recommendation
Option B (signed cookies) as the target architecture, with Option A as an acceptable Phase 1 shortcut if product images are not considered sensitive. Signed cookies provide tenant isolation with full CDN caching — the best balance of security and performance. Option D is architecturally sound but sacrifices the primary benefit of a CDN (caching). Option C adds too much edge complexity for this use case.
Note: Options B, C, and D are not mutually exclusive with the FileStore concept. The FileStore can be the write-path authority (presigned PUT URLs) while the read path uses signed cookies directly from CloudFront.
Discussion
Q: If the tenant id is in the header, does Option B still work?
A: No — not directly. CloudFront signed cookies validate against the
URL path, not request headers. The signed cookie’s custom policy contains a
Resource field like https://assets.example.com/{tenant-id}/*, and
CloudFront matches it against the request URL. If the tenant-id is only in an
X-Tenant-ID header, CloudFront has no way to enforce it.
This means the tenant-id must be part of the URL path for Option B to work.
This is already reflected in the DQ-004 recommendation (tenant-first key
structure: {tenant-id}/{feature}/{uuid}.{ext}). When CloudFront serves
assets, the URL would be
https://assets.arda.cards/{tenant-id}/product-images/{uuid}.png, and the
signed cookie scopes to https://assets.arda.cards/{tenant-id}/*.
The X-Tenant-ID header remains relevant for the write path (API calls to
the backend for presigned PUT URLs), but the read path (browser loading
images from CloudFront) relies on the tenant-id embedded in the URL path. This
is a natural consequence of the key structure — the S3 object key becomes the
CloudFront URL path.
Q: How does it work if the user has logged in as multiple tenants in one browser session?
A: CloudFront signed cookies use three fixed cookie names:
CloudFront-Policy, CloudFront-Signature, CloudFront-Key-Pair-Id. The
browser can hold multiple cookies with the same name if they have different
Path attributes. The approach for multi-tenant sessions:
- Set cookies with tenant-scoped Path: When the user accesses tenant A, the backend issues signed cookies with Path=/{tenant-a-id}/. When the user switches to tenant B, it issues a second set with Path=/{tenant-b-id}/.
- Browser sends the right cookies per request: For a request to /{tenant-a-id}/product-images/img.png, the browser sends only the cookies whose Path matches /{tenant-a-id}/. The tenant-B cookies are not sent because their Path does not match.
- CloudFront sees exactly one set: Since only the matching tenant’s cookies are sent, CloudFront validates them normally against the policy’s Resource field.
Constraints and caveats:
- This relies on the browser’s standard cookie path-matching behavior (RFC 6265). All modern browsers implement this correctly.
- If a user has access to N tenants, they accumulate N sets of 3 cookies each. This is not a scalability concern for reasonable tenant counts.
- Cookie expiration should be short (1-8 hours) with refresh on tenant switch, to limit the window of exposure.
- The Domain attribute on the cookies must match the asset distribution domain (e.g., assets.arda.cards). If the SPA is on a different domain (app.arda.cards), the cookies must be set with Domain=.arda.cards or the asset domain, and SameSite=None; Secure to allow cross-origin sending.
Alternative: If multi-tenant cookie management proves too complex for Phase 1, Option A (unguessable keys, no auth) avoids the issue entirely. The multi-tenant cookie behavior can be deferred until signed cookies are actually implemented.
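The browser behavior this scheme relies on can be modeled directly. A sketch of RFC 6265 path-matching (the tenant ids and cookie data below are illustrative):

```typescript
// RFC 6265 §5.1.4 path-matching: decides whether a browser attaches a cookie
// with `cookiePath` to a request for `requestPath`.
function pathMatches(requestPath: string, cookiePath: string): boolean {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  // Match if the cookie path ends with "/" or the next request-path char is "/".
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

// Two tenants' signed-cookie sets, each scoped by Path (illustrative ids).
const cookieSets = [
  { tenant: "tenant-a", path: "/tenant-a/" },
  { tenant: "tenant-b", path: "/tenant-b/" },
];

// For an image request under tenant A, only tenant A's set is sent.
const sent = cookieSets.filter((c) =>
  pathMatches("/tenant-a/product-images/img.png", c.path),
);
```

This is why CloudFront only ever sees one tenant's CloudFront-Policy/Signature/Key-Pair-Id triple per request, even though the browser holds several sets under the same cookie names.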
Q: In the Option B scenario, who sets the cookies? The SPA, or the browser based on HTTP responses?
A: The backend sets the cookies via HTTP response headers — the SPA does not construct or manage them directly. The flow:
- The SPA calls a backend endpoint (e.g., POST /auth/asset-cookies, or as part of the login/tenant-switch flow) over the normal API path (API Gateway + Cognito JWT).
- The backend validates the JWT, extracts the tenant-id, and generates the three CloudFront signed cookies using the CloudFront private key:
  - CloudFront-Policy (base64-encoded JSON policy with Resource=https://assets.arda.cards/{tenant-id}/* and an expiration time)
  - CloudFront-Signature (RSA signature of the policy)
  - CloudFront-Key-Pair-Id (identifies which public key CloudFront should use)
- The backend returns these as Set-Cookie headers with:
  - Path=/{tenant-id}/ (scopes the cookie to that tenant’s URL prefix)
  - Domain=.arda.cards (or the asset distribution domain)
  - Secure; HttpOnly; SameSite=None (cross-origin, HTTPS-only, not accessible to JavaScript)
- The browser stores the cookies automatically and sends them on subsequent requests to assets.arda.cards/{tenant-id}/* — no SPA code is needed for cookie management.
The SPA’s only responsibility is to trigger the cookie issuance (e.g., on login
or tenant switch) and to use the correct asset URLs that match the cookie path.
The HttpOnly flag means the SPA cannot read or modify the cookies, which is a
security benefit.
Refresh: The backend sets a reasonable expiration (e.g., 4-8 hours). The SPA
can proactively refresh by calling the cookie endpoint before expiration, or the
backend can issue fresh cookies piggy-backed on any authenticated API response
via Set-Cookie headers.
Q: Does the Back-End in this context mean the BFF that controls the User Interaction?
A: Yes — in the Arda architecture, the “backend” that sets cookies is the
BFF (Backend-for-Frontend), specifically the Next.js proxy layer
(proxy.ts in arda-frontend-app). This is the natural place because:
- The BFF already manages the user session. It holds the Cognito JWT and the tenant context (X-Tenant-ID). It is the component that knows which tenant the user is currently operating as.
- The BFF can set cookies on the correct domain. Since the BFF serves the SPA (e.g., app.arda.cards), it can set Set-Cookie headers with Domain=.arda.cards that the browser will send to assets.arda.cards. A backend microservice behind API Gateway cannot easily set cookies on the asset domain because the response goes through API Gateway, which does not relay Set-Cookie headers transparently.
- The cookie issuance is a UI concern, not a business logic concern. The microservices (operations, etc.) should not know about CloudFront cookie mechanics. The BFF is the boundary between the user’s browser session and the backend services.
Implementation path: The BFF would have a route (e.g., /api/asset-cookies) that:
- Validates the user’s session (Cognito JWT).
- Calls an internal endpoint or uses the AWS SDK directly to generate the three CloudFront signed cookies for the current tenant.
- Returns them as Set-Cookie headers.
Alternatively, the BFF could issue cookies as a side-effect of the tenant-switch or login flow, avoiding a separate round-trip.
Key implication for DQ-003: This reinforces Option C (shared library in common-module). The CloudFront cookie-signing logic would live in the BFF, not in the operations microservice. The operations microservice only handles the write path (presigned POST for uploads). The read path is entirely BFF + CloudFront — no microservice involvement.
Q: The BFF in Arda’s system is a Next.js application in the arda-frontend-app
repository that runs as an Amplify App in AWS. What are the implications of this
for Option B?
A: Running the BFF as an AWS Amplify Hosting application has several concrete implications for signed cookie issuance:
What works well:
- AWS SDK access in server-side routes. Next.js API routes and server actions in Amplify run as Lambda functions with a full Node.js runtime. The @aws-sdk/cloudfront-signer package provides getSignedCookies() — the BFF can call this directly to generate the three signed cookies. No external service call is required; signing is a local cryptographic operation using the private key.
- Full response header control. Next.js API routes can set arbitrary Set-Cookie headers. Amplify’s managed CloudFront distribution preserves response headers from the origin (the Lambda function), so signed cookies flow through to the browser intact.
- IAM permissions are not needed for signing. CloudFront cookie signing uses an RSA private key directly (not an AWS API call). The Amplify execution role does not need any CloudFront-related IAM permissions for this operation.
Constraints to address:
- Private key storage. The CloudFront RSA private key must be accessible to the BFF at runtime. Options:
- Amplify environment variables (encrypted at rest) — simplest, but environment variables have a 4 KB limit. RSA 2048-bit PEM keys are ~1.7 KB, so this fits.
- AWS Secrets Manager — more secure, supports rotation, but adds a runtime API call on cold start (~50ms). The BFF would cache the key in memory after first fetch.
- SSM Parameter Store (SecureString) — similar to Secrets Manager, lower cost.
Recommendation: Amplify environment variable for MVP simplicity. Migrate to Secrets Manager if key rotation becomes a requirement.
- Cookie domain alignment. For cross-subdomain cookies to work:
  - Amplify app: app.arda.cards (or similar)
  - Asset distribution: assets.arda.cards
  - Cookie Domain=.arda.cards — both subdomains share the parent domain

  This works if both subdomains are under the same registrable domain. The Amplify app’s route sets Set-Cookie with Domain=.arda.cards, and the browser sends those cookies on requests to assets.arda.cards. There are no cross-origin issues because the cookie domain covers both.
- Two separate CloudFront distributions. Amplify creates a managed CloudFront distribution for the app. The asset bucket needs its own separate distribution (per DQ-006 Option B). This is fine — they are independent and have different subdomains. The Amplify-managed distribution cannot be extended with custom S3 origins (Amplify controls its configuration), which is another reason DQ-006 Option B (separate distribution) is the right choice.
- proxy.ts and the cookie issuance path. The current proxy.ts (Node.js runtime, replaces middleware.ts) handles request proxying to API Gateway. Cookie issuance could be:
  - A dedicated API route (e.g., app/api/asset-cookies/route.ts) — cleanest separation.
  - Integrated into proxy.ts as a side-effect on responses that include tenant context — avoids a separate round-trip but adds complexity to the proxy.
  - Issued during the authentication/tenant-switch flow — piggy-backed on an existing interaction.
Net assessment: Amplify Hosting is fully compatible with Option B. The BFF has the runtime capabilities needed (AWS SDK, response headers, secret access). The main new artifacts are the CloudFront key pair (public key registered with CloudFront, private key stored in an Amplify environment variable) and a server-side route for cookie issuance. This is a small addition to the BFF, not a structural change.
Q: How would Option B work with a non-browser client?
A: It depends on the client type and its HTTP cookie support:
Clients with cookie jar support (e.g., mobile apps using URLSession/OkHttp,
curl --cookie-jar, Python requests.Session): These work identically to
browsers. The client calls the cookie endpoint, receives Set-Cookie headers,
stores them in its cookie jar, and automatically attaches them to subsequent
requests matching the Path and Domain.
Clients without cookie support (e.g., simple HTTP clients, wget, IoT
devices, server-to-server): Option B does not work natively. Alternatives:
- CloudFront signed URLs (per-request, instead of cookies): The backend generates a signed URL for each specific asset. The client uses the signed URL directly. This is Option B’s fallback — CloudFront supports both signed cookies and signed URLs with the same key pair. The backend can offer both: cookies for browser clients, signed URLs for API clients.
- Presigned S3 GET URLs (bypass CloudFront): For server-to-server access, the backend generates presigned S3 GET URLs directly. The client fetches from S3 without going through CloudFront. This is Denis’s Option D approach, appropriate for trusted backend clients that don’t need CDN caching.
- Header-based auth via Lambda@Edge: Option C from the options table — the client sends a JWT in the Authorization header. This works for any HTTP client but adds edge compute cost.
Practical impact: For the MVP (product images displayed in the SPA), all clients are browsers. Non-browser clients (e.g., mobile apps, API integrations) are a future concern, and the signed URL fallback handles them without architectural changes. The backend can expose both cookie and signed-URL endpoints behind the same key pair infrastructure.
Decision
- Initial Implementation: Option A, as the asset path has two hard-to-guess components (tenantId, assetId)
- Enhanced Security Implementation: Option B with signed cookies (disable Option A at this time)
- CDN Usage:
- Separate CloudFront configuration for Option A and Option B.
- Next.js/Amplify BFF
- Secrets Manager to store the CloudFront key pair (a pattern for other parts of the system)
- Scope Extension: arda-frontend-app needs to be modified as part of this project.
Analysis
Two distributions, not one reconfigured. The decision calls for separate CloudFront distributions for Option A and Option B (not a single distribution reconfigured in-place). This means:
- Phase 1 (Option A): a distribution with no viewer access restriction. OAC handles S3 origin access, but any request with the correct URL reaches the asset. DNS points assets.arda.cards to this distribution.
- Phase 2 (Option B): a new distribution requiring signed cookies (Restrict Viewer Access + Trusted Key Groups). DNS swaps assets.arda.cards to the new distribution. The old distribution can be decommissioned.
“Disable Option A at this time” clarification. “At this time” means at the time Option B (signed cookies) is introduced — not immediately. During Phase 1, Option A is the active and only access model. When signed cookies are deployed, Option A’s distribution is replaced by Option B’s distribution, and unguarded access ceases.
Secrets Manager establishes a pattern. Choosing Secrets Manager over Amplify environment variables for the CloudFront key pair is slightly more complex but sets a precedent for other secrets the system may need. This is a Phase 2 artifact — Phase 1 (Option A) does not need the key pair at all.
Scope extension phasing. arda-frontend-app modifications are Phase 2 only
(BFF cookie issuance route via @aws-sdk/cloudfront-signer). Phase 1 requires
no BFF changes — the SPA renders <img src="https://assets.arda.cards/...">
directly. The project plan should reflect which repository changes belong to
which phase to avoid pulling arda-frontend-app into Phase 1 scope.
Applied To
DQ-003: FileStore as Lambda vs. Component Service
Context
Denis’s design proposes the FileStore as an AWS Lambda behind API Gateway. The existing system has backend components running on EKS. The FileStore’s responsibilities are: (1) authorize the request, (2) map the key to the tenant namespace, (3) generate presigned URLs. These are lightweight stateless operations.
| Option | Description | Trade-offs |
|---|---|---|
| A. Lambda behind API Gateway | Dedicated Lambda function handles /assets/* routes. Cognito authorizer on API Gateway validates JWT before Lambda invocation. Lambda maps tenant-id + key to S3 object, generates presigned URLs. | Pro: Aligns with Denis’s design. Scales to zero when not in use. No EKS pod overhead. Natural fit for the stateless presign operation. Clean separation from business logic. Con: New deployment artifact (Lambda). Different CI/CD pipeline than Kotlin services. Cannot share common-module Kotlin code (unless compiled to native/JVM Lambda). Cold start latency for JVM Lambdas (~1-3s). |
| B. EKS component service (Kotlin/Ktor) | FileStore is a regular Ktor module registered in a component (possibly operations initially, or a new filestore component). Uses the same common-module abstractions and deployment pipeline. | Pro: Reuses existing deployment infrastructure (Helm, EKS). Can share common-module S3 abstractions directly. Same language and testing patterns. No cold starts. Trusted components can call it directly (in-cluster). Con: Always-running pod even when idle. Tighter coupling to component lifecycle. Adds an HTTP hop if other components need to call it. |
| C. Shared library in common-module only | No separate service. Each component that needs file operations embeds the presigned URL logic directly (via common-module abstraction). The write path lives in the component’s own endpoints. | Pro: Simplest deployment — no new service. Each component controls its own upload/download routes. Aligns with how CSV upload already works (embedded in operations). Con: No centralized access control for the storage backend. Each component must independently implement tenant namespace mapping and authorization. Harder to enforce consistent key structure across components. |
Recommendation
Option C for Phase 1 (product images) combined with Option
A as a future target. For the immediate use case, product image upload is an
operations-module concern. Embedding the presigned URL logic in common-module
(as a generalized S3 access abstraction) and wiring it through the Item module
endpoints is the simplest path. This does not preclude extracting a FileStore
Lambda later when multiple components need file operations — the
common-module abstraction becomes the shared logic that the Lambda wraps.
Denis’s Lambda design (Option A) is the right long-term architecture, but building it now would require standing up a new Lambda deployment pipeline before we have a second use case to justify it.
Decision
- Option C; re-evaluate Option A at a later time, once the upload use cases are better understood.
Design Note: The key structure can be enforced by implementing a strongly typed Storage Access class parameterized with the module name, entity name, and property name. The class is instantiated at bootstrap time from configuration parameters. Runtime object access then needs only the object key (asset UUID) and extension (or the extension can be derived from property type metadata, if available). The tenantId is available at runtime in the ApplicationContext, so there is no need to pass it as a parameter, keeping the Storage Access interface simple and clean.
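The shape of that class could look like the following. A TypeScript sketch of the idea (the production version would be a Kotlin class in common-module; all names are illustrative):

```typescript
// Ambient request context; stands in for the ApplicationContext mentioned above.
interface RequestContext {
  tenantId: string;
}

// Strongly typed storage access: the module/entity/property segments are fixed
// at construction (bootstrap time, from configuration), so call sites cannot
// produce a malformed key. Illustrative sketch, not existing code.
class StorageAccess {
  constructor(
    private readonly owningModule: string,
    private readonly entityType: string,
    private readonly propertyName: string,
    private readonly context: () => RequestContext, // ApplicationContext lookup
  ) {}

  // Runtime callers supply only the asset UUID and extension; tenantId comes
  // from the ambient context, keeping the interface minimal.
  keyFor(assetUuid: string, extension: string): string {
    const { tenantId } = this.context();
    return `${tenantId}/${this.owningModule}/${this.entityType}/${this.propertyName}/${assetUuid}.${extension}`;
  }
}
```

A component would instantiate one StorageAccess per configured asset property at bootstrap, e.g. one for the Item module’s product image property, and hand it to the upload endpoint.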
Applied To
DQ-004: Object Key Structure
Context
Object keys must support tenant isolation, feature namespacing, CloudFront path-pattern routing, and collision avoidance. The key structure also determines CloudFront cache behavior configuration and signed cookie scoping.
Per the design exploration principles: objects are owned by business entities, accessed through the owning entity, and considered immutable.
| Option | Description | Trade-offs |
|---|---|---|
| A. Tenant-first | {tenant-id}/{feature}/{uuid}.{ext} e.g., a1b2c3d4/.../product-images/f47ac10b.png | Pro: Natural for IAM prefix-based policies scoping access to a tenant. Signed cookies can scope to /{tenant-id}/*. Con: CloudFront cache behaviors cannot route by tenant (too many tenants for static behavior rules). Feature-based CloudFront routing requires wildcards. |
| B. Feature-first | {feature}/{tenant-id}/{uuid}.{ext} e.g., product-images/a1b2c3d4/.../f47ac10b.png | Pro: CloudFront path patterns can route by feature (e.g., /product-images/*). Lifecycle rules can target features if needed. Con: Signed cookies scoped to /{feature}/{tenant-id}/* are more complex. IAM prefix policies for tenant isolation require conditions with wildcards. |
| C. Flat with UUID | {uuid}.{ext} All context in metadata only. | Pro: Simplest keys. Maximum collision avoidance. Con: No structural tenant isolation. No prefix-based lifecycle or IAM. Requires metadata lookup for any context. Debugging is harder. |
Recommendation
Option A — tenant-first. Tenant isolation is the primary
organizational concern. Signed cookies scope naturally to
/{tenant-id}/*. IAM policies can restrict to a tenant prefix.
CloudFront path-pattern routing is less important for http assets (they all use
the same cache behavior: serve from S3 with OAC). The {feature} segment
provides logical grouping for debugging and future lifecycle differentiation
without affecting routing.
Full key template: {tenant-id}/{feature}/{uuid}.{ext}
Where:
- {tenant-id} is the tenant UUID (from the X-Tenant-ID header).
- {feature} is a static string identifying the use case (e.g., product-images, user-profiles).
- {uuid} is a server-generated UUID for collision avoidance.
- {ext} is the file extension derived from the declared content type.
Decision
Resolved by DQ-001. The DQ-001 decision
specifies the key format as
${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension},
which is a more granular variant of Option A (tenant-first). See the
DQ-001 Analysis for trade-off notes on domain model coupling and
signed cookie scoping alignment.
Applied To
DQ-005: Upload Workflow (Write Path)
Context
Section titled “Context”The upload workflow must: (1) authenticate the user, (2) authorize the upload for the tenant, (3) generate a presigned PUT URL with appropriate constraints, (4) allow the client to upload directly to S3, (5) link the uploaded object to the business entity. Denis’s design describes steps 1-4 clearly. Step 5 is done by the client calling the Arda Component to persist the key.
The design exploration principle states that objects are immutable — “change” means uploading a new object and updating the entity reference.
| Option | Description | Trade-offs |
|---|---|---|
| A. Decoupled upload + entity update | (1) POST /item/upload-url -> presigned PUT URL + object key. (2) Client uploads to S3. (3) Client calls PUT /item/{id} with imageUrl set to the asset URL. Upload and entity update are independent operations. | Pro: Simplest. Reuses existing Item update endpoint. No new server-side state between upload and link. Aligns with Denis’s write flow. Client controls when to link. Con: Possible orphaned objects if step 3 never happens. The client must know the final asset URL format. |
| B. Upload-and-confirm | (1) POST /item/{id}/image -> presigned PUT URL + upload token. Server records pending upload. (2) Client uploads to S3. (3) POST /item/{id}/image/confirm with upload token. Server validates S3 object exists and sets imageUrl. | Pro: Server controls the full lifecycle. Can validate the upload (check S3 HeadObject). Prevents orphaned images (pending uploads can be cleaned up). Con: More complex. Requires server-side state for pending uploads. Two new endpoints instead of zero. |
| C. Upload via existing update | Client generates a UUID, uploads to a well-known key pattern, then includes the constructed URL in a normal Item create/update payload. No dedicated upload endpoint. | Pro: Zero new endpoints. Upload is just “put a file at a URL, then reference it.” Con: Client must know the S3 key convention. No server-side validation of the upload. Presigned URL generation must happen somewhere (can’t skip it with BLOCK_ALL). |
Recommendation
Option A — decoupled upload + entity update. It is the
simplest workflow, aligns with Denis’s write-path design, and the existing Item
update endpoint already accepts imageUrl. Orphaned objects are mitigated by
the immutability principle (objects are cheap to store and can be cleaned up
periodically if needed). The presigned URL endpoint returns the complete asset
URL that the client will later set as imageUrl.
Discussion
Presigned POST vs. Presigned PUT (from content validation research):
Presigned PUT URLs cannot enforce Content-Type or Content-Length server-side. S3 does not reject uploads that differ in size or type from what was signed. This is a significant gap for an upload workflow that needs to constrain file types to images and enforce a maximum file size.
Presigned POST (form-based upload) solves this via a POST policy document that S3 validates server-side:
- content-length-range — e.g., ["content-length-range", 1, 10485760] enforces 1 byte to 10 MB.
- Content-Type starts-with — e.g., ["starts-with", "$Content-Type", "image/"] restricts to image MIME types.
- Key prefix — e.g., ["starts-with", "$key", "staging/"] forces uploads into the staging prefix for lifecycle-based orphan cleanup.
Impact on workflow: The endpoint should return presigned POST fields (URL + form fields + policy signature) instead of a presigned PUT URL. The client submits a multipart form POST to S3 instead of a raw PUT. This is a standard pattern (used by AWS documentation examples, Stripe, and others).
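A minimal sketch of the POST policy document carrying the constraints discussed above (bucket name and expiration are placeholder values; real implementations sign this policy with SigV4, e.g. via @aws-sdk/s3-presigned-post's createPresignedPost, which is not shown here):

```typescript
// Constructs the S3 POST policy document and base64-encodes it, as S3
// expects it in the "policy" form field. Signing is omitted.
function postPolicy(bucket: string, expiresIso: string): string {
  const policy = {
    expiration: expiresIso, // e.g. "2025-01-01T00:00:00Z"
    conditions: [
      { bucket },
      ["content-length-range", 1, 10485760],      // 1 byte .. 10 MB
      ["starts-with", "$Content-Type", "image/"], // images only
      ["starts-with", "$key", "staging/"],        // force the staging prefix
    ],
  };
  return Buffer.from(JSON.stringify(policy)).toString("base64");
}
```

S3 validates every form field of the multipart POST against these conditions server-side, which is exactly the enforcement presigned PUT cannot provide.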
Orphan cleanup via staging prefix: Uploads land at
staging/{tenant-id}/{feature}/{uuid}.{ext}. On entity update, the backend
either: (a) copies the object to the final prefix and deletes the staging copy,
or (b) simply records the staging key as the final key (if the staging prefix is
CDN-accessible). An S3 lifecycle rule expires unclaimed staging/ objects after
7 days. A separate rule aborts incomplete multipart uploads after 1 day.
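For workflow (a) above, the final key is simply the staging key with the prefix stripped; the move itself would be an S3 CopyObject plus DeleteObject pair (not shown). A sketch of the key computation:

```typescript
// Computes the final key when promoting an uploaded object out of the
// staging prefix (workflow (a) above). The S3 copy/delete is not shown.
const STAGING_PREFIX = "staging/";

function finalKey(stagingKey: string): string {
  if (!stagingKey.startsWith(STAGING_PREFIX)) {
    throw new Error(`not a staging key: ${stagingKey}`);
  }
  // staging/{tenant-id}/{feature}/{uuid}.{ext} -> {tenant-id}/{feature}/{uuid}.{ext}
  return stagingKey.slice(STAGING_PREFIX.length);
}
```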
Partial upload handling: For single-part POST, S3 does not create the object if the client disconnects mid-upload (requires full body). For multipart uploads, the abort lifecycle rule handles fragments.
Decision
More analysis required.
Applied To
DQ-006: CDN Integration Approach
Context
Section titled “Context”The existing ApiCloudFront construct creates a distribution
for API Gateway (no caching, all methods, no S3 origin). A new CloudFront
configuration is needed for S3 asset serving. The question is whether to
extend the existing distribution or create a new one.
| Option | Description | Trade-offs |
|---|---|---|
| A. Add S3 behavior to existing API distribution | Add an additional cache behavior (e.g., /assets/*) to the existing ApiCloudFront distribution that routes to the S3 http-assets bucket with OAC. API routes remain the default behavior. | Pro: Single distribution, single domain. No additional CloudFront costs. Assets and API share the same origin domain. Con: Couples API and asset serving lifecycle. The existing construct would need modification. Cache invalidation for assets could affect API behavior configuration. |
| B. Separate CloudFront distribution for assets | New CDK construct AssetCloudFront with an S3 origin, OAC, and its own domain (e.g., assets.<purpose>.arda.cards). | Pro: Clean separation. Asset distribution can have its own caching, security, and lifecycle policies. Can configure signed cookies/URLs independently. No risk of affecting API distribution. Con: Additional CloudFront distribution cost (~$0). Separate domain requires additional DNS and certificate configuration. Cross-origin considerations for the SPA loading images from a different domain. |
| C. No CloudFront initially | Serve assets via S3 presigned GET URLs directly (no CDN). Add CloudFront later. | Pro: Simplest infrastructure. No CDN configuration needed. Presigned URLs provide access control inherently. Con: No edge caching. Higher S3 request costs. Higher latency for geographically distributed users. Every image load requires a presigned URL (backend involvement). Harder to add CDN later (URL format changes). |
Recommendation
Option B — separate distribution. Asset serving has fundamentally different caching semantics than API calls (long TTL, immutable content, OAC required). A separate distribution keeps concerns cleanly separated and allows independent configuration of signed cookies/URLs without affecting the API distribution. The cross-origin concern is manageable with proper CORS headers.
Decision
Option B — separate distribution with changes:
- The URL format should follow the pattern of other sub-services: ${partition}.${infrastructure}.assets.arda.cards
- For the Production System, in addition to the environment domain: live.assets.arda.cards
- Implementation and deployment will be in three phases:
  - Phase 1: Direct access to the S3 bucket for download, no CDN
  - Phase 2: Add a CloudFront distribution without access control/cookies
  - Phase 3: Add tenant-based access control & browser cookies
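The decided naming pattern can be captured in a small helper (a sketch; the partition and infrastructure values shown are illustrative, and "live" is the production alias from the decision above):

```typescript
// Assembles the asset CDN domain per the decided sub-service pattern.
// Example inputs are hypothetical partition/infrastructure identifiers.
function assetDomain(partition: string, infrastructure: string): string {
  return `${partition}.${infrastructure}.assets.arda.cards`;
}

// Production alias, in addition to the environment domain.
const productionAlias = "live.assets.arda.cards";
```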
Applied To
DQ-007: S3 Abstraction Placement
Context
Section titled “Context”The existing CsvS3BucketDirectAccess in common-module is
CSV-specific. A general-purpose S3 access abstraction is needed for presigned
URL generation, key construction, and metadata management.
| Option | Description | Trade-offs |
|---|---|---|
| A. New abstraction in common-module alongside CSV | Create a new S3BucketAccess (or FileStoreAccess) interface and implementation in common-module/lib/infra/storage/, parallel to CsvS3BucketDirectAccess. Handles presigned PUT/GET URL generation, key construction, metadata. | Pro: Reusable by any component. Follows existing pattern placement. Can share the S3AsyncClient and S3Presigner infrastructure. Con: Adds to common-module surface area. Must be general enough for multiple use cases but concrete enough to be useful. |
| B. In operations, extract later | Build the abstraction in operations/common/lib/ first. Extract to common-module when a second component needs it. | Pro: Faster to ship. No common-module release cycle dependency. Can iterate on the API with a single consumer. Con: May need refactoring when extracting. Other components cannot reuse it until extraction. |
| C. Extend CsvS3BucketDirectAccess | Generalize the existing class to handle non-CSV files. | Pro: Single abstraction. No new classes. Con: The existing class has CSV-specific semantics deeply embedded (row streaming, batch processing, header aliases). Generalizing it would break its focused purpose. |
Recommendation
Option A — new abstraction in common-module. The
presigned URL and key-construction logic is inherently cross-component (any
service that needs file upload/download will need it). Placing it in
common-module from the start avoids the extract-and-refactor cycle. It
complements rather than replaces CsvS3BucketDirectAccess.
Decision
Agreed, Option A.
Additional (nice to have, not required): extract the common elements between the existing CsvS3BucketDirectAccess and the new S3BucketAccess into a base class or utility helpers.
Applied To
DQ-008: Object Immutability and Versioning
Context
Section titled “Context”The design exploration establishes that bulk objects are considered immutable — change is handled by updating references in business entities. Denis’s design enables S3 versioning. These two approaches have different implications.
| Option | Description | Trade-offs |
|---|---|---|
| A. Immutable objects, no versioning | Each upload creates a new object with a unique key (UUID). “Replacing” an image means uploading a new object and updating the entity reference. Old objects are orphaned and cleaned up periodically. S3 versioning disabled. | Pro: Aligns with design exploration principle. Simplest S3 configuration. CDN cache invalidation is unnecessary (new key = new URL = automatic cache bypass). No versioning storage cost. Con: Orphaned objects accumulate and need cleanup. Cannot recover “previous version” without application-level history (bitemporal entity history provides this). |
| B. Versioned objects, same key | Uploads overwrite the same key (derived from entity-id + feature). S3 versioning preserves previous versions. Entity always points to “latest.” | Pro: No orphaned objects. Built-in version history in S3. Can recover previous versions via S3 version-id. Con: CDN cache invalidation required on every update (CloudFront invalidation costs $0.005 per path after the first 1,000/month). Entity URL doesn’t change, so caches serve stale content until invalidated. Versioning increases storage cost (all versions retained). Contradicts the stated immutability principle. |
| C. Hybrid — immutable with lifecycle cleanup | Same as Option A, but with a lifecycle rule that transitions orphaned objects to cheaper storage (e.g., Glacier IR after 90 days) and eventually deletes them. “Orphan” detection via periodic scan comparing S3 keys against entity references. | Pro: Combines immutability with cost management. No cache invalidation needed. Storage cost controlled over time. Con: Requires an orphan cleanup mechanism (could be simple: lifecycle rule on a staging prefix, or complex: cross-reference scan). |
Recommendation
Option A (immutable, no versioning) with a note to implement orphan cleanup if storage growth becomes a concern. The immutability principle is clean, avoids CDN cache invalidation entirely, and aligns with the bitemporal entity model (the entity’s history already tracks which image was active at any point in time). The orphan cost is negligible for product images (small files, low volume per tenant).
Decision
Pending human team review with Denis.
- Initial: Option A
- Future: Add Orphan Cleanup per Option C, or when the referring entity is marked as retired in the bitemporal model.
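The periodic orphan scan mentioned in Option C reduces to a set difference: any stored key not referenced by an entity is an orphan candidate. A sketch (key values are illustrative):

```typescript
// Returns stored keys that no entity references — candidates for cleanup.
// In practice storedKeys would come from an S3 ListObjectsV2 scan and
// referencedKeys from the entity store; both are plain inputs here.
function orphanKeys(storedKeys: string[], referencedKeys: Set<string>): string[] {
  return storedKeys.filter((key) => !referencedKeys.has(key));
}
```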
Applied To
Copyright: © Arda Systems 2025-2026, All rights reserved