Design Session 01: Upload Product Images and Managed File Assets
Purpose
Goal: Resolve large design questions before deciding on phasing, initial scope, and detailed design.
Design exploration and decision dialog for the file/asset upload capability, covering S3 bucket architecture, access control, CDN integration, API design, and the first use case (product images in the Item module).
Decisions from this session are summarized in decision-log.md.
This session focuses on the system-level storage architecture and security model. These are foundational decisions that constrain all subsequent design work.
Context: Two Input Documents
This round synthesizes two perspectives:
- project-description.md (Design Exploration section) — establishes system-level principles for bulk storage: objects are always owned by business entities, immutable, and never used as shared global state. It classifies storage along three axes: Http Accessible vs. Internal Use, Externally Uploaded vs. Internally Sourced, and Durable vs. Ephemeral.
- dna-work-in-progress.md (Denis’s FileStore design) — proposes a Lambda-based FileStore service behind API Gateway + Cognito, with per-tenant namespace partitioning, presigned URLs for both reads and writes, and OAC for CloudFront. The storage backend (the number and organization of buckets) is transparent to API consumers.
Context: Existing Infrastructure
- A partition-level upload bucket already exists with a 1-day TTL, used for CSV uploads. It is created by the BulkStoresStack in CDK.
- An API CloudFront distribution (the ApiCloudFront construct) already exists for API Gateway, but it is API-specific (no caching, passes all methods through). There is no existing S3-origin CloudFront construct.
- The UploadBucket CDK construct is parameterized (name, expirationDays) and reusable for creating additional buckets.
DQ-001: S3 Bucket Organization
Context
The system needs bulk storage for multiple purposes with different lifecycle and access characteristics. The current infrastructure has a single ephemeral upload bucket per partition. The design exploration in project-description.md identifies three classification axes that create distinct storage profiles.
Storage Profiles Identified
| Profile | Access | Lifecycle | Example |
|---|---|---|---|
| Http Assets | CDN/public | Durable | Product images, logos |
| Internal Durable | Backend IAM | Durable | Generated reports, archives |
| Ephemeral Upload | Backend IAM | Short TTL | CSV uploads for processing |
| Ephemeral Download | CDN/public | Short TTL | CSV downloads, export files |
| Option | Description | Trade-offs |
|---|---|---|
| A. Single bucket, prefix-partitioned | One bucket per partition. Prefixes like http-assets/, ephemeral/, internal/ differentiate profiles. Prefix-based lifecycle rules handle TTL differences. | Pro: Simplest to manage, single IAM boundary, single presigning role. S3 supports up to 1,000 lifecycle rules per bucket with prefix scoping. Con: CloudFront OAC applies to the entire origin (bucket), so all content in the bucket is accessible via CloudFront — must rely on CloudFront behaviors or signed URLs/cookies for access differentiation. Mixes durable and ephemeral content operationally. |
| B. Two buckets by lifecycle | http-assets bucket (durable, CDN-fronted, versioned) and ephemeral bucket (current, short TTL, no CDN). Add more buckets only when a genuinely new profile emerges. | Pro: Clean separation of CDN-fronted vs internal content. OAC on http-assets bucket means CloudFront can only reach durable assets. Ephemeral bucket retains current behavior unchanged. Each bucket has its own IAM and lifecycle boundary. Con: Two presigning roles, two sets of IAM policies, two sets of CloudFormation exports. Modest operational overhead. |
| C. Three+ buckets by profile | One bucket per storage profile from the table above. | Pro: Maximum isolation. Each bucket has exactly one lifecycle policy, one access pattern. Cleanest CloudFront OAC scoping. Con: More infrastructure to manage. The current system has one bucket; jumping to 3-4 adds CDK complexity, more CloudFormation exports, more IAM roles. Some profiles (Internal Durable) may not be needed yet. |
| D. Per-component buckets | Each microservice (operations, accounts, …) gets its own bucket(s). | Pro: Maximum blast-radius isolation between services. Con: Violates the principle that bucket organization should be transparent to API consumers (per Denis’s design). Significant infrastructure proliferation. Components share the same partition and same CDN — per-component buckets create CDN routing complexity. |
Recommendation
Option B — two buckets by lifecycle. It gives the clean separation needed for CDN (OAC only on the http-assets bucket), preserves the existing ephemeral bucket unchanged, and aligns with the principle that bucket internals are transparent to API consumers. A third bucket (internal durable) can be added later if a concrete use case demands it. Option A is viable but conflates CDN-accessible and internal-only content behind the same OAC boundary. Options C and D over-engineer for current needs.
Discussion
Q: With Option B, internal files would be in an HTTP-accessible bucket; is it possible to restrict access to certain prefixes or objects?
A: With Option B, the question is moot for the current set of use cases
because “Internal Durable” files do not exist yet — the two buckets are
http-assets (durable, CDN-fronted) and ephemeral (current bucket, short
TTL, backend-only). Internal durable files have no bucket to land in, which is
by design: a third bucket is added only when that profile materializes.
If in the future an internal-only durable file needs to coexist in the
http-assets bucket (to avoid a third bucket), access restriction is possible
at two levels:
- CloudFront bucket policy with prefix conditions: The OAC bucket policy can restrict s3:GetObject to specific prefixes. For example, grant CloudFront access only to objects under {tenant-id}/product-images/* and {tenant-id}/user-profiles/*, while denying access to {tenant-id}/internal/*. CloudFront requests to internal prefixes would get a 403 from S3 directly.

  ```json
  {
    "Effect": "Allow",
    "Principal": { "Service": "cloudfront.amazonaws.com" },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::bucket-name/*/product-images/*",
    "Condition": {
      "StringEquals": {
        "AWS:SourceArn": "arn:aws:cloudfront::ACCOUNT:distribution/DIST-ID"
      }
    }
  }
  ```

  Objects under prefixes not covered by the policy are invisible to CloudFront even though they are in the same bucket.
- CloudFront cache behaviors: A behavior matching /internal/* could return a static 403 response (using a CloudFront Function) or simply not be configured, causing the default behavior to apply. Since the default behavior would be the S3 origin with OAC, the bucket policy restriction in (1) is the more reliable guard.
Bottom line: Option B does not force internal files into the CDN-accessible bucket. If it ever becomes necessary, prefix-scoped bucket policies provide effective access restriction within a single bucket. But the cleaner path is to add a third bucket when the “Internal Durable” profile has a concrete use case.
Decision
- Option B, with a partition bucket per profile (two initially: http-assets and ephemeral-uploads)
- Bucket installed at Partition Configuration time.
- Object Key Format, chosen to enable policy-based access control: ${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension}
- This project needs to modify the infrastructure repository to support the new bucket as part of the partition configuration.
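The decided key format can be captured in a small helper. A sketch in TypeScript (the production helper would live in the Kotlin backend; the function and segment names here are illustrative):

```typescript
import { randomUUID } from "node:crypto";

// Builds an object key in the decided format:
//   ${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension}
// Hypothetical helper, not existing code.
function buildAssetKey(
  tenantId: string,
  owningModule: string,
  entityType: string,
  propertyName: string,
  extension: string,
  assetUuid: string = randomUUID(),
): string {
  const segments = [tenantId, owningModule, entityType, propertyName];
  // Guard each segment so a caller cannot inject path separators or traversal.
  for (const s of segments) {
    if (!/^[a-zA-Z0-9-]+$/.test(s)) throw new Error(`invalid key segment: ${s}`);
  }
  return `${segments.join("/")}/${assetUuid}.${extension}`;
}
```

Because the tenant id is always the first segment, a signed cookie scoped to /{tenantId}/* (per DQ-002) covers every key this produces, regardless of the deeper segments.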
Analysis
Key format supersedes DQ-004. The key structure decided here
(${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension})
is more granular than the DQ-004 recommendation ({tenant-id}/{feature}/{uuid}.{ext}),
replacing the single {feature} segment with three domain-aware segments. DQ-004
should be marked as decided by this decision.
Domain model coupling. The segments {owning-module}/{entity-type}/{property-name}
couple the S3 key structure to the domain model. If an entity type is renamed or
a property moves, the key path changes. However, under the immutability principle
(DQ-008), this coupling is low-risk: old objects with old key paths remain valid
and accessible — only new uploads use the new path. The entity reference (e.g.,
Item.imageUrl) stores the full key, so renames do not break existing references.
Signed cookie scoping alignment. The tenant-first key structure aligns with
the DQ-002 signed cookie approach: cookies scoped to /{tenant-id}/* cover all
objects under the tenant prefix regardless of the deeper segments. The additional
granularity is invisible to the cookie validation layer.
Applied To
DQ-002: Read-Path Access Control Model
Context
Http-accessible assets must be served directly to HTTP clients (browsers) without backend proxying. Access needs to be partitioned by tenant. The question is what level of security is appropriate and how it interacts with CDN caching.
Key AWS constraint: CloudFront Functions cannot verify RS256/ES256 JWT signatures (the crypto module only supports HMAC). Lambda@Edge can verify JWTs but adds latency (~5ms cold, sub-ms warm) and cost per invocation.
Denis’s FileStore design uses API Gateway + Cognito + Lambda to issue presigned GET URLs (302 redirect). This provides full tenant auth but means every image load requires a Lambda invocation and produces a unique presigned URL that defeats CDN caching.
| Option | Description | Trade-offs |
|---|---|---|
| A. Unguessable keys, no auth | Object keys include UUID components making them unguessable. CloudFront serves content without authentication. Anyone with the URL can access the asset. | Pro: Simplest. Full CDN caching. No Lambda or edge functions needed. Zero latency overhead. Works identically in local dev and production. Con: No tenant isolation at access time — URL sharing or logging exposes content. Acceptable only if assets are not sensitive. |
| B. CloudFront signed cookies (tenant-scoped) | Backend issues signed cookies scoped to /{tenant-id}/* path prefix on login. CloudFront validates cookies at the edge natively (no Lambda). Cookies refreshed periodically. | Pro: Tenant-level isolation. Full CDN caching (cookie is not part of cache key). Sub-millisecond edge verification. One cookie set per tenant session covers all assets. Con: Requires RSA/ECDSA key pair management (trusted key groups). Cookie scoping requires tenant-id in the URL path. Browser must send cookies with image requests (same-site/cross-origin considerations). Adds complexity to the auth flow. |
| C. Lambda@Edge JWT validation | Lambda@Edge on viewer-request validates the JWT from the Authorization header or a cookie, extracts tenant-id, and validates against the request path prefix. | Pro: Full tenant isolation with standard JWT tokens. Reuses existing Cognito token infrastructure. Can enforce fine-grained policies (e.g., per-module access). Con: Lambda@Edge invoked on every cache miss (viewer-request trigger). Cold start latency (~5s for first invocation). Must cache JWKS keys. Adds per-request cost. More complex to deploy and test (must be in us-east-1). Cannot be tested with LocalStack. |
| D. FileStore Lambda (Denis’s design) | API Gateway route /assets/{key} -> Cognito auth -> Lambda -> 302 redirect to S3 presigned GET URL. Full tenant validation in Lambda. | Pro: Full tenant isolation. Reuses existing API Gateway + Cognito infrastructure. Lambda has full AWS SDK access for authorization logic. Con: Every image load is: browser -> CloudFront -> API Gateway -> Lambda -> S3 presign -> 302 -> browser -> S3 GET. No CDN caching of the redirect (each presigned URL is unique). Higher latency for image loads. Lambda cost per image request. Effectively makes the CDN useless for read-path caching of assets. |
Recommendation
Option B (signed cookies) as the target architecture, with Option A as an acceptable Phase 1 shortcut if product images are not considered sensitive. Signed cookies provide tenant isolation with full CDN caching — the best balance of security and performance. Option D is architecturally sound but sacrifices the primary benefit of a CDN (caching). Option C adds too much edge complexity for this use case.
Note: Options B, C, and D are not mutually exclusive with the FileStore concept. The FileStore can be the write-path authority (presigned PUT URLs) while the read path uses signed cookies directly from CloudFront.
Discussion
Q: If the tenant id is in the header, does Option B still work?
A: No — not directly. CloudFront signed cookies validate against the
URL path, not request headers. The signed cookie’s custom policy contains a
Resource field like https://assets.example.com/{tenant-id}/*, and
CloudFront matches it against the request URL. If the tenant-id is only in an
X-Tenant-ID header, CloudFront has no way to enforce it.
This means the tenant-id must be part of the URL path for Option B to work.
This is already reflected in the DQ-004 recommendation (tenant-first key
structure: {tenant-id}/{feature}/{uuid}.{ext}). When CloudFront serves
assets, the URL would be
https://assets.arda.cards/{tenant-id}/product-images/{uuid}.png, and the
signed cookie scopes to https://assets.arda.cards/{tenant-id}/*.
The X-Tenant-ID header remains relevant for the write path (API calls to
the backend for presigned PUT URLs), but the read path (browser loading
images from CloudFront) relies on the tenant-id embedded in the URL path. This
is a natural consequence of the key structure — the S3 object key becomes the
CloudFront URL path.
Q: How does it work if the user has logged in as multiple tenants in one browser session?
A: CloudFront signed cookies use three fixed cookie names:
CloudFront-Policy, CloudFront-Signature, CloudFront-Key-Pair-Id. The
browser can hold multiple cookies with the same name if they have different
Path attributes. The approach for multi-tenant sessions:
- Set cookies with tenant-scoped Path: When the user accesses tenant A, the backend issues signed cookies with Path=/{tenant-a-id}/. When the user switches to tenant B, it issues a second set with Path=/{tenant-b-id}/.
- Browser sends the right cookies per request: For a request to /{tenant-a-id}/product-images/img.png, the browser sends only the cookies whose Path matches /{tenant-a-id}/. The tenant-B cookies are not sent because their Path does not match.
- CloudFront sees exactly one set: Since only the matching tenant’s cookies are sent, CloudFront validates them normally against the policy’s Resource field.
Constraints and caveats:
- This relies on the browser’s standard cookie path-matching behavior (RFC 6265). All modern browsers implement this correctly.
- If a user has access to N tenants, they accumulate N sets of 3 cookies each. This is not a scalability concern for reasonable tenant counts.
- Cookie expiration should be short (1-8 hours) with refresh on tenant switch, to limit the window of exposure.
- The Domain attribute on the cookies must match the asset distribution domain (e.g., assets.arda.cards). If the SPA is on a different domain (app.arda.cards), the cookies must be set with Domain=.arda.cards or the asset domain, and SameSite=None; Secure to allow cross-origin sending.
Alternative: If multi-tenant cookie management proves too complex for Phase 1, Option A (unguessable keys, no auth) avoids the issue entirely. The multi-tenant cookie behavior can be deferred until signed cookies are actually implemented.
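The browser behavior this scheme relies on can be modeled directly. A sketch of RFC 6265 path-matching (the tenant ids and cookie data below are illustrative):

```typescript
// RFC 6265 §5.1.4 path-matching: decides whether a browser attaches a cookie
// with `cookiePath` to a request for `requestPath`.
function pathMatches(requestPath: string, cookiePath: string): boolean {
  if (requestPath === cookiePath) return true;
  if (!requestPath.startsWith(cookiePath)) return false;
  // Match if the cookie path ends with "/" or the next request-path char is "/".
  return cookiePath.endsWith("/") || requestPath[cookiePath.length] === "/";
}

// Two tenants' signed-cookie sets, each scoped by Path (illustrative ids).
const cookieSets = [
  { tenant: "tenant-a", path: "/tenant-a/" },
  { tenant: "tenant-b", path: "/tenant-b/" },
];

// For an image request under tenant A, only tenant A's set is sent.
const sent = cookieSets.filter((c) =>
  pathMatches("/tenant-a/product-images/img.png", c.path),
);
```

This is why CloudFront only ever sees one tenant's CloudFront-Policy/Signature/Key-Pair-Id triple per request, even though the browser holds several sets under the same cookie names.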
Q: In the Option B scenario, who sets the cookies? The SPA, or the browser based on HTTP responses?
A: The backend sets the cookies via HTTP response headers — the SPA does not construct or manage them directly. The flow:
- The SPA calls a backend endpoint (e.g., POST /auth/asset-cookies, or as part of the login/tenant-switch flow) over the normal API path (API Gateway + Cognito JWT).
- The backend validates the JWT, extracts the tenant-id, and generates the three CloudFront signed cookies using the CloudFront private key:
  - CloudFront-Policy (base64-encoded JSON policy with Resource=https://assets.arda.cards/{tenant-id}/* and an expiration time)
  - CloudFront-Signature (RSA signature of the policy)
  - CloudFront-Key-Pair-Id (identifies which public key CloudFront should use)
- The backend returns these as Set-Cookie headers with:
  - Path=/{tenant-id}/ (scopes the cookie to that tenant’s URL prefix)
  - Domain=.arda.cards (or the asset distribution domain)
  - Secure; HttpOnly; SameSite=None (cross-origin, HTTPS-only, not accessible to JavaScript)
- The browser stores the cookies automatically and sends them on subsequent requests to assets.arda.cards/{tenant-id}/* — no SPA code is needed for cookie management.
The SPA’s only responsibility is to trigger the cookie issuance (e.g., on login
or tenant switch) and to use the correct asset URLs that match the cookie path.
The HttpOnly flag means the SPA cannot read or modify the cookies, which is a
security benefit.
Refresh: The backend sets a reasonable expiration (e.g., 4-8 hours). The SPA
can proactively refresh by calling the cookie endpoint before expiration, or the
backend can issue fresh cookies piggy-backed on any authenticated API response
via Set-Cookie headers.
Q: Does the Back-End in this context mean the BFF that controls the User Interaction?
A: Yes — in the Arda architecture, the “backend” that sets cookies is the
BFF (Backend-for-Frontend), specifically the Next.js proxy layer
(proxy.ts in arda-frontend-app). This is the natural place because:
- The BFF already manages the user session. It holds the Cognito JWT and the tenant context (X-Tenant-ID). It is the component that knows which tenant the user is currently operating as.
- The BFF can set cookies on the correct domain. Since the BFF serves the SPA (e.g., app.arda.cards), it can set Set-Cookie headers with Domain=.arda.cards that the browser will send to assets.arda.cards. A backend microservice behind API Gateway cannot easily set cookies on the asset domain because the response goes through API Gateway, which does not relay Set-Cookie headers transparently.
- The cookie issuance is a UI concern, not a business logic concern. The microservices (operations, etc.) should not know about CloudFront cookie mechanics. The BFF is the boundary between the user’s browser session and the backend services.
Implementation path: The BFF would have a route (e.g., /api/asset-cookies) that:
- Validates the user’s session (Cognito JWT).
- Calls an internal endpoint or uses the AWS SDK directly to generate the three CloudFront signed cookies for the current tenant.
- Returns them as Set-Cookie headers.
Alternatively, the BFF could issue cookies as a side-effect of the tenant-switch or login flow, avoiding a separate round-trip.
Key implication for DQ-003: This reinforces Option C (shared library in common-module). The CloudFront cookie-signing logic would live in the BFF, not in the operations microservice. The operations microservice only handles the write path (presigned POST for uploads). The read path is entirely BFF + CloudFront — no microservice involvement.
Q: The BFF in Arda’s system is a Next.js application in the arda-frontend-app
repository that runs as an Amplify App in AWS. What are the implications of this
for Option B?
A: Running the BFF as an AWS Amplify Hosting application has several concrete implications for signed cookie issuance:
What works well:
- AWS SDK access in server-side routes. Next.js API routes and server actions in Amplify run as Lambda functions with a full Node.js runtime. The @aws-sdk/cloudfront-signer package provides getSignedCookies() — the BFF can call this directly to generate the three signed cookies. No external service call is required; signing is a local cryptographic operation using the private key.
- Full response header control. Next.js API routes can set arbitrary Set-Cookie headers. Amplify’s managed CloudFront distribution preserves response headers from the origin (the Lambda function), so signed cookies flow through to the browser intact.
- IAM permissions are not needed for signing. CloudFront cookie signing uses an RSA private key directly (not an AWS API call). The Amplify execution role does not need any CloudFront-related IAM permissions for this operation.
Constraints to address:
- Private key storage. The CloudFront RSA private key must be accessible to the BFF at runtime. Options:
- Amplify environment variables (encrypted at rest) — simplest, but environment variables have a 4 KB limit. RSA 2048-bit PEM keys are ~1.7 KB, so this fits.
- AWS Secrets Manager — more secure, supports rotation, but adds a runtime API call on cold start (~50ms). The BFF would cache the key in memory after first fetch.
- SSM Parameter Store (SecureString) — similar to Secrets Manager, lower cost.
Recommendation: Amplify environment variable for MVP simplicity. Migrate to Secrets Manager if key rotation becomes a requirement.
- Cookie domain alignment. For cross-subdomain cookies to work:
  - Amplify app: app.arda.cards (or similar)
  - Asset distribution: assets.arda.cards
  - Cookie Domain=.arda.cards — both subdomains share the parent domain

  This works if both subdomains are under the same registrable domain. The Amplify app’s route sets Set-Cookie with Domain=.arda.cards, and the browser sends those cookies on requests to assets.arda.cards. There are no cross-origin issues because the cookie domain covers both.
- Two separate CloudFront distributions. Amplify creates a managed CloudFront distribution for the app. The asset bucket needs its own separate distribution (per DQ-006 Option B). This is fine — they are independent and have different subdomains. The Amplify-managed distribution cannot be extended with custom S3 origins (Amplify controls its configuration), which is another reason DQ-006 Option B (separate distribution) is the right choice.
- proxy.ts and the cookie issuance path. The current proxy.ts (Node.js runtime, replaces middleware.ts) handles request proxying to API Gateway. Cookie issuance could be:
  - A dedicated API route (e.g., app/api/asset-cookies/route.ts) — cleanest separation.
  - Integrated into proxy.ts as a side-effect on responses that include tenant context — avoids a separate round-trip but adds complexity to the proxy.
  - Issued during the authentication/tenant-switch flow — piggy-backed on an existing interaction.
Net assessment: Amplify Hosting is fully compatible with Option B. The BFF has the runtime capabilities needed (AWS SDK, response headers, secret access). The main new artifacts are the CloudFront key pair (public key registered with CloudFront, private key stored in an Amplify environment variable) and a server-side route for cookie issuance. This is a small addition to the BFF, not a structural change.
Q: How would Option B work with a non-browser client?
A: It depends on the client type and its HTTP cookie support:
Clients with cookie jar support (e.g., mobile apps using URLSession/OkHttp,
curl --cookie-jar, Python requests.Session): These work identically to
browsers. The client calls the cookie endpoint, receives Set-Cookie headers,
stores them in its cookie jar, and automatically attaches them to subsequent
requests matching the Path and Domain.
Clients without cookie support (e.g., simple HTTP clients, wget, IoT
devices, server-to-server): Option B does not work natively. Alternatives:
- CloudFront signed URLs (per-request, instead of cookies): The backend generates a signed URL for each specific asset. The client uses the signed URL directly. This is Option B’s fallback — CloudFront supports both signed cookies and signed URLs with the same key pair. The backend can offer both: cookies for browser clients, signed URLs for API clients.
- Presigned S3 GET URLs (bypass CloudFront): For server-to-server access, the backend generates presigned S3 GET URLs directly. The client fetches from S3 without going through CloudFront. This is Denis’s Option D approach, appropriate for trusted backend clients that don’t need CDN caching.
- Header-based auth via Lambda@Edge: Option C from the options table — the client sends a JWT in the Authorization header. This works for any HTTP client but adds edge compute cost.
Practical impact: For the MVP (product images displayed in the SPA), all clients are browsers. Non-browser clients (e.g., mobile apps, API integrations) are a future concern, and the signed URL fallback handles them without architectural changes. The backend can expose both cookie and signed-URL endpoints behind the same key pair infrastructure.
Decision
- Initial Implementation: Option A, as the asset path has two hard-to-guess components (tenantId, assetId)
- Enhanced Security Implementation: Option B with signed cookies (disable Option A at this time)
- CDN Usage:
- Separate CloudFront configuration for Option A and Option B.
- Next.js/Amplify BFF
- Secrets Manager to store the CloudFront key pair (a pattern for other parts of the system)
- Scope Extension: arda-frontend-app needs to be modified as part of this project.
Analysis
Two distributions, not one reconfigured. The decision calls for separate CloudFront distributions for Option A and Option B (not a single distribution reconfigured in-place). This means:
- Phase 1 (Option A): a distribution with no viewer access restriction. OAC handles S3 origin access, but any request with the correct URL reaches the asset. DNS points assets.arda.cards to this distribution.
- Phase 2 (Option B): a new distribution requiring signed cookies (Restrict Viewer Access + Trusted Key Groups). DNS swaps assets.arda.cards to the new distribution. The old distribution can be decommissioned.
“Disable Option A at this time” clarification. “At this time” means at the time Option B (signed cookies) is introduced — not immediately. During Phase 1, Option A is the active and only access model. When signed cookies are deployed, Option A’s distribution is replaced by Option B’s distribution, and unguarded access ceases.
Secrets Manager establishes a pattern. Choosing Secrets Manager over Amplify environment variables for the CloudFront key pair is slightly more complex but sets a precedent for other secrets the system may need. This is a Phase 2 artifact — Phase 1 (Option A) does not need the key pair at all.
Scope extension phasing. arda-frontend-app modifications are Phase 2 only
(BFF cookie issuance route via @aws-sdk/cloudfront-signer). Phase 1 requires
no BFF changes — the SPA renders <img src="https://assets.arda.cards/...">
directly. The project plan should reflect which repository changes belong to
which phase to avoid pulling arda-frontend-app into Phase 1 scope.
Applied To
DQ-003: FileStore as Lambda vs. Component Service
Context
Denis’s design proposes the FileStore as an AWS Lambda behind API Gateway. The existing system has backend components running on EKS. The FileStore’s responsibilities are: (1) authorize the request, (2) map the key to the tenant namespace, (3) generate presigned URLs. These are lightweight stateless operations.
| Option | Description | Trade-offs |
|---|---|---|
| A. Lambda behind API Gateway | Dedicated Lambda function handles /assets/* routes. Cognito authorizer on API Gateway validates JWT before Lambda invocation. Lambda maps tenant-id + key to S3 object, generates presigned URLs. | Pro: Aligns with Denis’s design. Scales to zero when not in use. No EKS pod overhead. Natural fit for the stateless presign operation. Clean separation from business logic. Con: New deployment artifact (Lambda). Different CI/CD pipeline than Kotlin services. Cannot share common-module Kotlin code (unless compiled to native/JVM Lambda). Cold start latency for JVM Lambdas (~1-3s). |
| B. EKS component service (Kotlin/Ktor) | FileStore is a regular Ktor module registered in a component (possibly operations initially, or a new filestore component). Uses the same common-module abstractions and deployment pipeline. | Pro: Reuses existing deployment infrastructure (Helm, EKS). Can share common-module S3 abstractions directly. Same language and testing patterns. No cold starts. Trusted components can call it directly (in-cluster). Con: Always-running pod even when idle. Tighter coupling to component lifecycle. Adds an HTTP hop if other components need to call it. |
| C. Shared library in common-module only | No separate service. Each component that needs file operations embeds the presigned URL logic directly (via common-module abstraction). The write path lives in the component’s own endpoints. | Pro: Simplest deployment — no new service. Each component controls its own upload/download routes. Aligns with how CSV upload already works (embedded in operations). Con: No centralized access control for the storage backend. Each component must independently implement tenant namespace mapping and authorization. Harder to enforce consistent key structure across components. |
Recommendation
Option C for Phase 1 (product images) combined with Option
A as a future target. For the immediate use case, product image upload is an
operations-module concern. Embedding the presigned URL logic in common-module
(as a generalized S3 access abstraction) and wiring it through the Item module
endpoints is the simplest path. This does not preclude extracting a FileStore
Lambda later when multiple components need file operations — the
common-module abstraction becomes the shared logic that the Lambda wraps.
Denis’s Lambda design (Option A) is the right long-term architecture, but building it now would require standing up a new Lambda deployment pipeline before we have a second use case to justify it.
Decision
- Option C; re-evaluate Option A at a later time, once the upload use cases are better understood.
Design Note: The key structure can be enforced by implementing a strongly typed Storage Access class parameterized with the module name, entity name, and property name. The class is instantiated at bootstrap time from configuration parameters. Runtime object access then needs only the object key (asset UUID) and extension (or the extension can be derived from property type metadata, if available). The tenantId is available at runtime in the ApplicationContext, so there is no need to pass it as a parameter, keeping the Storage Access interface simple and clean.
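The shape of that class could look like the following. A TypeScript sketch of the idea (the production version would be a Kotlin class in common-module; all names are illustrative):

```typescript
// Ambient request context; stands in for the ApplicationContext mentioned above.
interface RequestContext {
  tenantId: string;
}

// Strongly typed storage access: the module/entity/property segments are fixed
// at construction (bootstrap time, from configuration), so call sites cannot
// produce a malformed key. Illustrative sketch, not existing code.
class StorageAccess {
  constructor(
    private readonly owningModule: string,
    private readonly entityType: string,
    private readonly propertyName: string,
    private readonly context: () => RequestContext, // ApplicationContext lookup
  ) {}

  // Runtime callers supply only the asset UUID and extension; tenantId comes
  // from the ambient context, keeping the interface minimal.
  keyFor(assetUuid: string, extension: string): string {
    const { tenantId } = this.context();
    return `${tenantId}/${this.owningModule}/${this.entityType}/${this.propertyName}/${assetUuid}.${extension}`;
  }
}
```

A component would instantiate one StorageAccess per configured asset property at bootstrap, e.g. one for the Item module’s product image property, and hand it to the upload endpoint.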
Applied To
DQ-004: Object Key Structure
Context
Object keys must support tenant isolation, feature namespacing, CloudFront path-pattern routing, and collision avoidance. The key structure also determines CloudFront cache behavior configuration and signed cookie scoping.
Per the design exploration principles: objects are owned by business entities, accessed through the owning entity, and considered immutable.
| Option | Description | Trade-offs |
|---|---|---|
| A. Tenant-first | {tenant-id}/{feature}/{uuid}.{ext} e.g., a1b2c3d4/.../product-images/f47ac10b.png | Pro: Natural for IAM prefix-based policies scoping access to a tenant. Signed cookies can scope to /{tenant-id}/*. Con: CloudFront cache behaviors cannot route by tenant (too many tenants for static behavior rules). Feature-based CloudFront routing requires wildcards. |
| B. Feature-first | {feature}/{tenant-id}/{uuid}.{ext} e.g., product-images/a1b2c3d4/.../f47ac10b.png | Pro: CloudFront path patterns can route by feature (e.g., /product-images/*). Lifecycle rules can target features if needed. Con: Signed cookies scoped to /{feature}/{tenant-id}/* are more complex. IAM prefix policies for tenant isolation require conditions with wildcards. |
| C. Flat with UUID | {uuid}.{ext} All context in metadata only. | Pro: Simplest keys. Maximum collision avoidance. Con: No structural tenant isolation. No prefix-based lifecycle or IAM. Requires metadata lookup for any context. Debugging is harder. |
Recommendation
Option A — tenant-first. Tenant isolation is the primary
organizational concern. Signed cookies scope naturally to
/{tenant-id}/*. IAM policies can restrict to a tenant prefix.
CloudFront path-pattern routing is less important for http assets (they all use
the same cache behavior: serve from S3 with OAC). The {feature} segment
provides logical grouping for debugging and future lifecycle differentiation
without affecting routing.
Full key template: {tenant-id}/{feature}/{uuid}.{ext}
Where:
- {tenant-id} is the tenant UUID (from the X-Tenant-ID header).
- {feature} is a static string identifying the use case (e.g., product-images, user-profiles).
- {uuid} is a server-generated UUID for collision avoidance.
- {ext} is the file extension derived from the declared content type.
Decision
Resolved by DQ-001. The DQ-001 decision
specifies the key format as
${tenantId}/${owning-module}/${entity-type}/${property-name}/${asset-uuid}.${extension},
which is a more granular variant of Option A (tenant-first). See the
DQ-001 Analysis for trade-off notes on domain model coupling and
signed cookie scoping alignment.
Applied To
DQ-005: Upload Workflow (Write Path)
Context
Section titled “Context”The upload workflow must: (1) authenticate the user, (2) authorize the upload for the tenant, (3) generate a presigned PUT URL with appropriate constraints, (4) allow the client to upload directly to S3, (5) link the uploaded object to the business entity. Denis’s design describes steps 1-4 clearly. Step 5 is done by the client calling the Arda Component to persist the key.
The design exploration principle states that objects are immutable — “change” means uploading a new object and updating the entity reference.
| Option | Description | Trade-offs |
|---|---|---|
| A. Decoupled upload + entity update | (1) POST /item/upload-url -> presigned PUT URL + object key. (2) Client uploads to S3. (3) Client calls PUT /item/{id} with imageUrl set to the asset URL. Upload and entity update are independent operations. | Pro: Simplest. Reuses existing Item update endpoint. No new server-side state between upload and link. Aligns with Denis’s write flow. Client controls when to link. Con: Possible orphaned objects if step 3 never happens. The client must know the final asset URL format. |
| B. Upload-and-confirm | (1) POST /item/{id}/image -> presigned PUT URL + upload token. Server records pending upload. (2) Client uploads to S3. (3) POST /item/{id}/image/confirm with upload token. Server validates S3 object exists and sets imageUrl. | Pro: Server controls the full lifecycle. Can validate the upload (check S3 HeadObject). Prevents orphaned images (pending uploads can be cleaned up). Con: More complex. Requires server-side state for pending uploads. Two new endpoints instead of zero. |
| C. Upload via existing update | Client generates a UUID, uploads to a well-known key pattern, then includes the constructed URL in a normal Item create/update payload. No dedicated upload endpoint. | Pro: Zero new endpoints. Upload is just “put a file at a URL, then reference it.” Con: Client must know the S3 key convention. No server-side validation of the upload. Presigned URL generation must happen somewhere (can’t skip it with BLOCK_ALL). |
Recommendation
Option A — decoupled upload + entity update. It is the
simplest workflow, aligns with Denis’s write-path design, and the existing Item
update endpoint already accepts imageUrl. Orphaned objects are mitigated by
the immutability principle (objects are cheap to store and can be cleaned up
periodically if needed). The presigned URL endpoint returns the complete asset
URL that the client will later set as imageUrl.
Discussion
Presigned POST vs. Presigned PUT (from content validation research):
Presigned PUT URLs cannot enforce Content-Type or Content-Length server-side. S3 does not reject uploads that differ in size or type from what was signed. This is a significant gap for an upload workflow that needs to constrain file types to images and enforce a maximum file size.
Presigned POST (form-based upload) solves this via a POST policy document that S3 validates server-side:
- content-length-range — e.g., ["content-length-range", 1, 10485760] enforces 1 byte to 10 MB.
- Content-Type starts-with — e.g., ["starts-with", "$Content-Type", "image/"] restricts to image MIME types.
- Key prefix — e.g., ["starts-with", "$key", "staging/"] forces uploads into the staging prefix for lifecycle-based orphan cleanup.
Impact on workflow: The endpoint should return presigned POST fields (URL + form fields + policy signature) instead of a presigned PUT URL. The client submits a multipart form POST to S3 instead of a raw PUT. This is a standard pattern (used by AWS documentation examples, Stripe, and others).
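A minimal sketch of the POST policy document carrying the constraints discussed above (bucket name and expiration are placeholder values; real implementations sign this policy with SigV4, e.g. via @aws-sdk/s3-presigned-post's createPresignedPost, which is not shown here):

```typescript
// Constructs the S3 POST policy document and base64-encodes it, as S3
// expects it in the "policy" form field. Signing is omitted.
function postPolicy(bucket: string, expiresIso: string): string {
  const policy = {
    expiration: expiresIso, // e.g. "2025-01-01T00:00:00Z"
    conditions: [
      { bucket },
      ["content-length-range", 1, 10485760],      // 1 byte .. 10 MB
      ["starts-with", "$Content-Type", "image/"], // images only
      ["starts-with", "$key", "staging/"],        // force the staging prefix
    ],
  };
  return Buffer.from(JSON.stringify(policy)).toString("base64");
}
```

S3 validates every form field of the multipart POST against these conditions server-side, which is exactly the enforcement presigned PUT cannot provide.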
Orphan cleanup via staging prefix: Uploads land at
staging/{tenant-id}/{feature}/{uuid}.{ext}. On entity update, the backend
either: (a) copies the object to the final prefix and deletes the staging copy,
or (b) simply records the staging key as the final key (if the staging prefix is
CDN-accessible). An S3 lifecycle rule expires unclaimed staging/ objects after
7 days. A separate rule aborts incomplete multipart uploads after 1 day.
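For workflow (a) above, the final key is simply the staging key with the prefix stripped; the move itself would be an S3 CopyObject plus DeleteObject pair (not shown). A sketch of the key computation:

```typescript
// Computes the final key when promoting an uploaded object out of the
// staging prefix (workflow (a) above). The S3 copy/delete is not shown.
const STAGING_PREFIX = "staging/";

function finalKey(stagingKey: string): string {
  if (!stagingKey.startsWith(STAGING_PREFIX)) {
    throw new Error(`not a staging key: ${stagingKey}`);
  }
  // staging/{tenant-id}/{feature}/{uuid}.{ext} -> {tenant-id}/{feature}/{uuid}.{ext}
  return stagingKey.slice(STAGING_PREFIX.length);
}
```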
Partial upload handling: For single-part POST, S3 does not create the object if the client disconnects mid-upload (requires full body). For multipart uploads, the abort lifecycle rule handles fragments.
Decision
More analysis required.
Applied To
DQ-006: CDN Integration Approach
Context
Section titled “Context”The existing ApiCloudFront construct creates a distribution
for API Gateway (no caching, all methods, no S3 origin). A new CloudFront
configuration is needed for S3 asset serving. The question is whether to
extend the existing distribution or create a new one.
| Option | Description | Trade-offs |
|---|---|---|
| A. Add S3 behavior to existing API distribution | Add an additional cache behavior (e.g., /assets/*) to the existing ApiCloudFront distribution that routes to the S3 http-assets bucket with OAC. API routes remain the default behavior. | Pro: Single distribution, single domain. No additional CloudFront costs. Assets and API share the same origin domain. Con: Couples API and asset serving lifecycle. The existing construct would need modification. Cache invalidation for assets could affect API behavior configuration. |
| B. Separate CloudFront distribution for assets | New CDK construct AssetCloudFront with an S3 origin, OAC, and its own domain (e.g., assets.<purpose>.arda.cards). | Pro: Clean separation. Asset distribution can have its own caching, security, and lifecycle policies. Can configure signed cookies/URLs independently. No risk of affecting API distribution. Con: Additional CloudFront distribution cost (~$0). Separate domain requires additional DNS and certificate configuration. Cross-origin considerations for the SPA loading images from a different domain. |
| C. No CloudFront initially | Serve assets via S3 presigned GET URLs directly (no CDN). Add CloudFront later. | Pro: Simplest infrastructure. No CDN configuration needed. Presigned URLs provide access control inherently. Con: No edge caching. Higher S3 request costs. Higher latency for geographically distributed users. Every image load requires a presigned URL (backend involvement). Harder to add CDN later (URL format changes). |
Recommendation
Option B — separate distribution. Asset serving has fundamentally different caching semantics than API calls (long TTL, immutable content, OAC required). A separate distribution keeps concerns cleanly separated and allows independent configuration of signed cookies/URLs without affecting the API distribution. The cross-origin concern is manageable with proper CORS headers.
Decision
Option B — separate distribution with changes:
- The URL format should follow the pattern of other sub-services: ${partition}.${infrastructure}.assets.arda.cards
- For the Production System, in addition to the environment domain: live.assets.arda.cards
- Implementation and deployment will be in three phases:
  - Phase 1: Direct access to the S3 bucket for download, no CDN
  - Phase 2: Add a CloudFront distribution without access control/cookies
  - Phase 3: Add tenant-based access control & browser cookies
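The decided naming pattern can be captured in a small helper (a sketch; the partition and infrastructure values shown are illustrative, and "live" is the production alias from the decision above):

```typescript
// Assembles the asset CDN domain per the decided sub-service pattern.
// Example inputs are hypothetical partition/infrastructure identifiers.
function assetDomain(partition: string, infrastructure: string): string {
  return `${partition}.${infrastructure}.assets.arda.cards`;
}

// Production alias, in addition to the environment domain.
const productionAlias = "live.assets.arda.cards";
```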
Applied To
DQ-007: S3 Abstraction Placement
Context
Section titled “Context”The existing CsvS3BucketDirectAccess in common-module is
CSV-specific. A general-purpose S3 access abstraction is needed for presigned
URL generation, key construction, and metadata management.
| Option | Description | Trade-offs |
|---|---|---|
| A. New abstraction in common-module alongside CSV | Create a new S3BucketAccess (or FileStoreAccess) interface and implementation in common-module/lib/infra/storage/, parallel to CsvS3BucketDirectAccess. Handles presigned PUT/GET URL generation, key construction, metadata. | Pro: Reusable by any component. Follows existing pattern placement. Can share the S3AsyncClient and S3Presigner infrastructure. Con: Adds to common-module surface area. Must be general enough for multiple use cases but concrete enough to be useful. |
| B. In operations, extract later | Build the abstraction in operations/common/lib/ first. Extract to common-module when a second component needs it. | Pro: Faster to ship. No common-module release cycle dependency. Can iterate on the API with a single consumer. Con: May need refactoring when extracting. Other components cannot reuse it until extraction. |
| C. Extend CsvS3BucketDirectAccess | Generalize the existing class to handle non-CSV files. | Pro: Single abstraction. No new classes. Con: The existing class has CSV-specific semantics deeply embedded (row streaming, batch processing, header aliases). Generalizing it would break its focused purpose. |
Recommendation
Option A — new abstraction in common-module. The
presigned URL and key-construction logic is inherently cross-component (any
service that needs file upload/download will need it). Placing it in
common-module from the start avoids the extract-and-refactor cycle. It
complements rather than replaces CsvS3BucketDirectAccess.
Decision
Agreed, Option A.
Additional (nice to have, not required): extract the common elements between the existing CsvS3BucketDirectAccess and the new S3BucketAccess into a base class or utility helpers.
Applied To
DQ-008: Object Immutability and Versioning
Context
Section titled “Context”The design exploration establishes that bulk objects are considered immutable — change is handled by updating references in business entities. Denis’s design enables S3 versioning. These two approaches have different implications.
| Option | Description | Trade-offs |
|---|---|---|
| A. Immutable objects, no versioning | Each upload creates a new object with a unique key (UUID). “Replacing” an image means uploading a new object and updating the entity reference. Old objects are orphaned and cleaned up periodically. S3 versioning disabled. | Pro: Aligns with design exploration principle. Simplest S3 configuration. CDN cache invalidation is unnecessary (new key = new URL = automatic cache bypass). No versioning storage cost. Con: Orphaned objects accumulate and need cleanup. Cannot recover “previous version” without application-level history (bitemporal entity history provides this). |
| B. Versioned objects, same key | Uploads overwrite the same key (derived from entity-id + feature). S3 versioning preserves previous versions. Entity always points to “latest.” | Pro: No orphaned objects. Built-in version history in S3. Can recover previous versions via S3 version-id. Con: CDN cache invalidation required on every update (CloudFront invalidation costs $0.005 per path after the first 1,000/month). Entity URL doesn’t change, so caches serve stale content until invalidated. Versioning increases storage cost (all versions retained). Contradicts the stated immutability principle. |
| C. Hybrid — immutable with lifecycle cleanup | Same as Option A, but with a lifecycle rule that transitions orphaned objects to cheaper storage (e.g., Glacier IR after 90 days) and eventually deletes them. “Orphan” detection via periodic scan comparing S3 keys against entity references. | Pro: Combines immutability with cost management. No cache invalidation needed. Storage cost controlled over time. Con: Requires an orphan cleanup mechanism (could be simple: lifecycle rule on a staging prefix, or complex: cross-reference scan). |
Recommendation
Option A (immutable, no versioning) with a note to implement orphan cleanup if storage growth becomes a concern. The immutability principle is clean, avoids CDN cache invalidation entirely, and aligns with the bitemporal entity model (the entity’s history already tracks which image was active at any point in time). The orphan cost is negligible for product images (small files, low volume per tenant).
Decision
Pending human team review with Denis.
- Initial: Option A
- Future: Add Orphan Cleanup per Option C, or when the referring entity is marked as retired in the bitemporal model.
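The periodic orphan scan mentioned in Option C reduces to a set difference: any stored key not referenced by an entity is an orphan candidate. A sketch (key values are illustrative):

```typescript
// Returns stored keys that no entity references — candidates for cleanup.
// In practice storedKeys would come from an S3 ListObjectsV2 scan and
// referencedKeys from the entity store; both are plain inputs here.
function orphanKeys(storedKeys: string[], referencedKeys: Set<string>): string[] {
  return storedKeys.filter((key) => !referencedKeys.has(key));
}
```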
Applied To
Copyright: © Arda Systems 2025-2026, All rights reserved