Upload Product Images and Managed File Assets

Session

Restore Command

You are a Principal Engineer (see workspace/instructions/claude/agents/principal-engineer.md).
We are running a complex project definition and planning session using the
/complex-project-definition-and-planning skill.

Project directory: workspace/projects/mvp2/12-upload-product-images/
Read project-description.md for full context.

We are in Phase 1: Context Gathering. The project description has been written
and confirmed at a high level, but we are continuing to explore non-functional
constraints and infrastructure decisions before moving to Phase 2 (Design with
Alternatives).

Repositories involved: common-module, operations, infrastructure, api-test.
Reference the existing code patterns documented in the project description.

Key open areas still being explored:
- Security model for asset URLs (DQ-1)
- Bucket strategy and lifecycle (DQ-2)
- CDN architecture (DQ-6)
- Non-functional requirements (performance, cost, operational concerns)

Session Status

Phase	Status
Phase 1: Context Gathering	In progress — project description written, continuing NFR exploration
Phase 2: Design with Alternatives	Not started
Phase 3: Three-Document Creation	Not started
Phase 4: Decision Rounds	Not started
Phase 5: Release Planning	Not started
Phase 6: Plan Finalization	Not started

Context

The Arda platform needs a general-purpose file and asset upload mechanism that allows the UI to upload files to a managed S3 bucket. The uploaded assets can then be referenced by entity fields (e.g., Item.imageUrl) and served directly to HTTP clients without the backend server acting as a proxy.

The first use case is uploading product images (PNG, SVG, JPEG) to be set as the imageUrl of Item entities in the Item module of the operations component.

End Goal

Implement the ability to create and update Items that use product image URLs pointing to a managed S3 bucket. The use case includes:

A workflow for uploading image files via presigned S3 URLs.
Storing the resulting stable URL as Item.imageUrl.
Serving images directly to HTTP clients, ideally via an AWS CDN (CloudFront) to optimize delivery of static assets.

Use Case Specifications

The user-facing behavioral contracts for this project are defined in the product use case documentation. See the Use Cases Analysis for a summary of requirements, decisions, and rationale.

Use Case	Description	Link
`GEN::MEDIA::0001`	Set Entity Image — unified input surface covering file upload, drag-and-drop, clipboard paste, and URL entry	entity-media.md
`GEN::MEDIA::0002`	Remove Entity Image — clear image and revert to placeholder	entity-media.md
`REF::ITM::0003::0010`	Set Item Image During Creation	items.md
`REF::ITM::0004::0006`	Change or Remove Item Image	items.md
`REF::ITM::0006::0005`	Image Column in Bulk Import/Export	items.md

Existing Architecture

Infrastructure Layer (CDK)

A single UploadBucket is created per partition (e.g., alpha001-prod-partition-upload-bucket) via the BulkStoresStack CDK construct in /infrastructure/src/main/cdk/stacks/purpose/partition-bulk-stores.ts.

Current bucket characteristics:

S3-managed encryption (AES256), BLOCK_ALL public access, no versioning.
1-day TTL lifecycle with automatic expiration — designed for ephemeral upload processing (CSV files).
Conditional CORS for PUT/POST from whitelisted app URLs.
A presigning IAM role (UploadPreSigningRole) that the backend pod assumes to generate presigned URLs.
Cross-stack exports: UploadBucketArn, UploadBucketName, UploadPresignRoleArn keyed under ${Infrastructure}-${Purpose}-API-*.

The UploadBucket construct (/infrastructure/src/main/cdk/constructs/storage/public-upload-bucket.ts) is parameterized with name and expirationDays, making it reusable for creating additional buckets with different lifecycle policies.

Operations Component (CloudFormation)

/operations/src/main/cloudformation/pre-install.cfn.yml imports the bucket ARN and presign role ARN from the infrastructure exports. The pod’s service account role gets s3:GetObject, s3:PutObject, s3:ListBucket and the ability to assume the presigning role.

common-module — S3 Abstraction

CsvS3BucketDirectAccess in /common-module/lib/src/main/kotlin/cards/arda/common/lib/infra/storage/CsvS3ObjectDirectService.kt is the only S3 abstraction. It handles:

Presigned PUT URL generation with metadata headers (x-amz-meta-tenant-id, x-amz-meta-author).
Streaming reads via Flow<RawLine> with batch processing.
Compression support (GZIP, BZIP2).

This abstraction is CSV-specific — not a general file/asset service.

operations — CSV Upload Workflow (existing pattern)

The existing CSV upload flow provides the architectural pattern:

CsvUploadService (/operations/src/main/kotlin/cards/arda/operations/common/lib/service/csvUpload/CsvUploadService.kt) orchestrates: generate presigned URL, return job ID, client uploads CSV, server processes rows asynchronously.
JobService / JobTracker (/operations/src/main/kotlin/cards/arda/operations/system/batch/service/JobService.kt) provides async job tracking with status state machine (PENDING, RUNNING, COMPLETED, FAILED).
ItemCsvUploadService maps CSV rows to domain entities including imageUrl validation as URI.

Item Entity — `imageUrl` Already Exists

The Item entity already has imageUrl: URL? fully implemented at every layer:

Layer	Type	Location
Business Entity	`URL?`	`/operations/.../item/business/Item.kt`
API Input Model	`String?`	`/operations/.../item/api/Model.kt`
Persistence	`url("image_url").nullable()`	`/operations/.../item/persistence/ItemPersistence.kt`
CSV Proto	`string image_url` (URI validated)	`/operations/.../item/csv/v1beta1/item_row.proto`

The field is fully wired — it just lacks a mechanism to populate it from an uploaded file.

Project Scope

In Scope

General S3 file access abstraction in common-module — a reusable capability (not CSV-specific) for presigned URL generation, metadata management, and object key structuring. Should support multiple use cases without over-engineering.
S3 object key structure that provides tenant isolation and feature/module namespacing to minimize collisions and enable per-prefix lifecycle policies.
AWS resource creation/configuration — either a new persistent-asset bucket (no TTL expiration) or reconfiguration of the existing bucket with prefix-based lifecycle rules.
CDN integration — CloudFront distribution for serving uploaded assets directly to HTTP clients without backend proxying.
API endpoints for the image upload workflow:
- Request a presigned upload URL for a product image.
- Confirm the upload and set Item.imageUrl to the resulting asset URL.
Item module integration — wire the upload workflow into the existing Item create/update flow.

Out of Scope

Image processing pipelines (resizing, format conversion, thumbnailing).
Bulk image upload (batch processing of multiple images in one operation).
UI implementation (frontend upload component — separate project).
Migration of existing imageUrl values.

Future Use Cases (inform design, do not implement)

The implementation should be structured to support these future scenarios without requiring architectural changes:

User profile images.
Order document scans.
CSV file uploads for bulk processing (already exists, could be unified).
Other static assets referenced by entity fields.

Design Questions

DQ-1: Security of Asset URLs

Question: How should access to uploaded assets be secured? Is it possible to restrict content access based on tenant-id?

Considerations:

Public URLs via CloudFront: Simple, fast, cacheable. No tenant isolation at the URL level. Object keys would include tenant-id as a path prefix but anyone with the URL could access the content. Acceptable if image content is not sensitive.
Presigned GET URLs: Time-limited access, generated by the backend on each request. Provides per-request authorization but defeats CDN caching and requires backend involvement for every image load.
CloudFront signed URLs or signed cookies: Tenant-scoped access via CloudFront key pairs. More complex setup but enables CDN caching with access control. Could scope cookies to tenant-specific path prefixes.
CloudFront + Origin Access Control (OAC) + Lambda@Edge: Full tenant isolation by validating tenant tokens at the edge. Most secure but most complex.

Security risks to evaluate:

URL guessability if object keys contain predictable patterns.
Cross-tenant data leakage if URLs are shared or logged.
Whether product images are considered sensitive data requiring access control.

Use case cross-reference: The HTTPS-only scheme constraint and data: URI rejection are specified in GEN::MEDIA::0001::0004.FS. See Use Cases Analysis.
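The HTTPS-only and `data:`-rejection rules referenced above are easy to enforce before a value reaches `Item.imageUrl`. This is a minimal TypeScript sketch, not the specified behavior of GEN::MEDIA::0001::0004.FS. The function name `checkImageUrl` and its result type are hypothetical.

```typescript
import { URL } from "node:url";

// Hypothetical result type: either a parsed URL or a rejection reason.
type UrlCheck = { ok: true; url: URL } | { ok: false; reason: string };

// Accepts only absolute https: URLs; data: URIs and other schemes are rejected.
function checkImageUrl(raw: string): UrlCheck {
  let url: URL;
  try {
    url = new URL(raw.trim()); // throws on relative or malformed input
  } catch {
    return { ok: false, reason: "not an absolute URL" };
  }
  if (url.protocol === "data:") {
    return { ok: false, reason: "data: URIs are not accepted" };
  }
  if (url.protocol !== "https:") {
    return { ok: false, reason: `scheme ${url.protocol} is not allowed; HTTPS only` };
  }
  return { ok: true, url };
}
```

For example, `checkImageUrl("http://example.com/a.png").ok` is `false`, while an `https://` CDN URL passes.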

DQ-2: Bucket Strategy

Question: How many S3 buckets should exist and how should they be organized?

Options:

One bucket per partition (current state) — all content types share one bucket, differentiated by object key prefix. Lifecycle rules applied per prefix. Simple to manage, but mixes ephemeral and persistent content.
Multiple buckets per partition by lifecycle/purpose — separate buckets for different content lifecycles:
- ephemeral-upload (current bucket, 1-day TTL for CSV processing).
- http-assets (persistent, no TTL, CloudFront-fronted, for images and static assets).
- Future: internal-bulk-storage (longer TTL, no public access, for internal processing).
One bucket per component — each microservice gets its own bucket. Maximum isolation but more infrastructure to manage.
Hybrid — one ephemeral bucket (current) plus one persistent assets bucket per partition, shared across components.

Factors:

CloudFront can only have one S3 origin per behavior (path pattern), so bucket organization affects CDN routing.
IAM policies and presigning roles are per-bucket.
The existing UploadBucket construct is parameterized and reusable — adding a second bucket to BulkStoresStack is straightforward.
Lifecycle rules can be prefix-based within a single bucket, but separate buckets provide cleaner operational boundaries.

DQ-3: Object Key Structure

Question: What hierarchy should S3 object keys use?

Candidates:

{tenant-id}/{feature}/{entity-id}/{filename} — tenant-first for IAM policy scoping and prefix-based access control.
{feature}/{tenant-id}/{uuid}.{ext} — feature-first for CloudFront path pattern routing and lifecycle rules.
{feature}/{tenant-id}/{entity-id}/{uuid}.{ext} — hybrid with entity context for debugging/audit.

Constraints:

Must support prefix-based IAM policies for tenant isolation (if required).
Must support CloudFront path pattern routing.
Must minimize collision risk (UUID component required).
Should be predictable enough for the backend to construct without a lookup table.
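To illustrate the constraints above, here is a hypothetical key builder for the tenant-first variant `{tenant-id}/{feature}/{uuid}.{ext}`, the layout assumed later in the bulk-upload assessment. The MIME-to-extension map, the segment regex, and the name `buildAssetKey` are assumptions. The backend can construct the key without a lookup table, and the random UUID supplies both collision resistance and unguessability.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical mapping of accepted MIME types (PNG, JPEG, SVG) to extensions.
const EXTENSIONS: Readonly<Record<string, string>> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/svg+xml": "svg",
};

// Conservative segment syntax: no "/", "..", or uppercase can enter a key.
const SEGMENT = /^[a-z0-9][a-z0-9-]{0,62}$/;

function buildAssetKey(
  tenantId: string,
  feature: string,
  contentType: string,
  uuid: string = randomUUID(), // injectable so tests can pin the value
): string {
  if (!SEGMENT.test(tenantId)) throw new Error(`invalid tenant id: ${tenantId}`);
  if (!SEGMENT.test(feature)) throw new Error(`invalid feature: ${feature}`);
  const ext = EXTENSIONS[contentType];
  if (ext === undefined) throw new Error(`unsupported content type: ${contentType}`);
  // Tenant first: supports prefix-scoped IAM policies and /{tenant-id}/* paths.
  return `${tenantId}/${feature}/${uuid}.${ext}`;
}
```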

DQ-4: S3 Abstraction in common-module

Question: What should the new general-purpose S3 abstraction look like?

Considerations:

Should generalize presigned URL generation (PUT and GET) beyond CSV files.
Should encapsulate the object key structure convention.
Should handle metadata (tenant-id, author, content-type, feature context).
Should be usable from any module in any component.
Should not over-abstract — start with what the image upload use case needs.
Relationship to existing CsvS3BucketDirectAccess: complement it, do not replace it (CSV-specific streaming logic remains valuable).
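Here is a minimal sketch of what the abstraction's surface could look like. It is written in TypeScript for brevity, although the real code would be a Kotlin interface in common-module. Every name in it is hypothetical: `AssetStore`, `UploadRequest`, `InMemoryAssetStore`, the `staging/` key prefix, and the 15-minute expiry. The sketch only shows the intent of keeping the contract small: presign an upload, then inspect what arrived. The in-memory fake suggests how dependent modules could be unit-tested without LocalStack.

```typescript
import { randomUUID } from "node:crypto";

interface UploadRequest {
  tenantId: string;
  author: string;
  feature: string;     // namespace, e.g. "item-images"
  contentType: string; // e.g. "image/png"
  maxBytes: number;
}

interface PresignedUpload {
  key: string;                    // object key the client uploads to
  url: string;                    // POST target (bucket endpoint)
  fields: Record<string, string>; // form fields the client must send
  expiresAt: Date;
}

interface StoredObject {
  sizeBytes: number;
  contentType: string;
  metadata: Record<string, string>; // x-amz-meta-* values, prefix stripped
}

interface AssetStore {
  presignUpload(req: UploadRequest): Promise<PresignedUpload>;
  head(key: string): Promise<StoredObject | undefined>; // undefined if absent
}

// In-memory fake for unit tests of modules that depend on AssetStore.
class InMemoryAssetStore implements AssetStore {
  readonly objects = new Map<string, StoredObject>();

  async presignUpload(req: UploadRequest): Promise<PresignedUpload> {
    const key = `staging/${req.tenantId}/${randomUUID()}`;
    return {
      key,
      url: "https://example-bucket.s3.amazonaws.com/",
      fields: {
        key,
        "Content-Type": req.contentType,
        "x-amz-meta-tenant-id": req.tenantId,
        "x-amz-meta-author": req.author,
        "x-amz-meta-feature": req.feature,
      },
      expiresAt: new Date(Date.now() + 15 * 60 * 1000),
    };
  }

  async head(key: string): Promise<StoredObject | undefined> {
    return this.objects.get(key);
  }

  // Simulates the client's direct upload to S3 using the presigned fields.
  simulateUpload(p: PresignedUpload, sizeBytes: number): void {
    const metadata: Record<string, string> = {};
    for (const [k, v] of Object.entries(p.fields)) {
      if (k.startsWith("x-amz-meta-")) metadata[k.slice("x-amz-meta-".length)] = v;
    }
    const contentType = p.fields["Content-Type"] ?? "application/octet-stream";
    this.objects.set(p.key, { sizeBytes, contentType, metadata });
  }
}
```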

DQ-5: Upload Workflow and API Design

Question: What is the upload-then-link workflow?

Candidates:

Two-step (two API calls plus the direct upload): (1) POST to get presigned URL, (2) client uploads to S3, (3) PUT to Item to set imageUrl. Simple, but the Item update is a separate call and the image may be orphaned if step 3 never happens.
Upload-and-link: (1) POST to get presigned URL with Item context, (2) client uploads to S3, (3) POST to confirm upload, which validates the S3 object exists and atomically sets Item.imageUrl. Prevents orphaned images.
S3 event-driven: Upload triggers S3 event notification, Lambda or SQS consumer validates and links. Most decoupled but most infrastructure.

Use case cross-reference: The presigned upload workflow is the internal implementation for the managed upload path in GEN::MEDIA::0001::0006.FS (Confirm and Persist). The user-facing input detection is defined in GEN::MEDIA::0001::0002.FS. See Use Cases Analysis.
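The confirm step of the upload-and-link option can be sketched as follows. The sketch assumes uploads land under `staging/{tenant-id}/{uuid}` and are promoted to a durable key on confirmation. `ObjectStore`, `ItemRepository`, `confirmItemImage`, the 10 MiB limit, and the CDN base URL are all hypothetical. Copy-then-link is not atomic. If the Item update fails, the promoted copy becomes an orphan.

```typescript
// Minimal hypothetical dependencies for the confirm endpoint.
interface ObjectHead {
  sizeBytes: number;
  contentType: string;
  tenantId?: string; // from x-amz-meta-tenant-id, set by the presigned form
}

interface ObjectStore {
  head(key: string): Promise<ObjectHead | undefined>; // undefined if absent
  copy(fromKey: string, toKey: string): Promise<void>;
}

interface ItemRepository {
  setImageUrl(tenantId: string, itemId: string, url: string): Promise<void>;
}

const EXT_BY_TYPE: Readonly<Record<string, string>> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/svg+xml": "svg",
};
const MAX_BYTES = 10 * 1024 * 1024;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

async function confirmItemImage(
  deps: { store: ObjectStore; items: ItemRepository; cdnBaseUrl: string },
  tenantId: string,
  itemId: string,
  stagingKey: string,
): Promise<string> {
  // 1. Only keys in the caller's own staging prefix may be linked.
  const prefix = `staging/${tenantId}/`;
  const uuid = stagingKey.startsWith(prefix) ? stagingKey.slice(prefix.length) : "";
  if (!UUID.test(uuid)) throw new Error("key is not in the tenant's staging prefix");

  // 2. The object must exist (the client may never have finished uploading).
  const head = await deps.store.head(stagingKey);
  if (head === undefined) throw new Error("upload not found");
  if (head.tenantId !== tenantId) throw new Error("tenant metadata mismatch");
  const ext = EXT_BY_TYPE[head.contentType];
  if (ext === undefined) throw new Error(`unsupported content type ${head.contentType}`);
  if (head.sizeBytes > MAX_BYTES) throw new Error("object too large");

  // 3. Promote to the durable layout; the staging copy expires via lifecycle rule.
  const finalKey = `${tenantId}/item-images/${uuid}.${ext}`;
  await deps.store.copy(stagingKey, finalKey);

  // 4. Link. Not atomic with step 3: if this fails, finalKey is an orphan
  //    that only a periodic scan would find.
  const url = `${deps.cdnBaseUrl}/${finalKey}`;
  await deps.items.setImageUrl(tenantId, itemId, url);
  return url;
}
```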

DQ-6: CDN Configuration

Question: How should CloudFront be configured for serving assets?

Considerations:

Origin Access Control (OAC) vs. Origin Access Identity (OAI) — OAC is the modern recommended approach.
Cache behavior routing by path prefix (e.g., /assets/* routes to the assets bucket).
Cache invalidation strategy when an image is replaced.
Custom domain and SSL certificate requirements.
Whether the CDN is created in the infrastructure CDK or managed separately.
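The sketch below models how CloudFront picks a cache behavior. Path patterns are evaluated in the order listed. `*` matches zero or more characters and `?` matches exactly one. Matching is case-sensitive, and anything unmatched falls through to the default behavior. This is a simplified model with hypothetical origin IDs, not CloudFront's implementation.

```typescript
interface Behavior {
  pathPattern: string; // e.g. "/assets/*"
  originId: string;    // hypothetical origin identifier
}

// Translates a CloudFront-style path pattern into an anchored RegExp.
function patternToRegExp(pattern: string): RegExp {
  let src = "";
  for (const ch of pattern) {
    if (ch === "*") src += ".*";
    else if (ch === "?") src += ".";
    else src += ch.replace(/[.+^${}()|[\]\\\/]/g, "\\$&"); // escape regex metacharacters
  }
  return new RegExp(`^${src}$`); // case-sensitive: no "i" flag
}

// First matching behavior wins; otherwise the default behavior's origin is used.
function selectOrigin(path: string, behaviors: Behavior[], defaultOriginId: string): string {
  for (const b of behaviors) {
    if (patternToRegExp(b.pathPattern).test(path)) return b.originId;
  }
  return defaultOriginId;
}
```

This routing choice constrains DQ-3. CloudFront forwards the request path to the origin as-is, with a configured origin path prepended if there is one. A `/assets/*` behavior in front of an S3 origin therefore implies object keys that start with `assets/`, unless the path is rewritten at the edge.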

Repositories Involved

Repository	Role	Changes Expected
`common-module`	General S3 file access abstraction	New classes in `lib/infra/storage/`
`operations`	Item module integration, API endpoints	New upload routes, module wiring
`infrastructure`	S3 bucket creation, CloudFront, IAM	New/updated CDK constructs and stacks
`api-test`	API verification	New Bruno test collections

References

Existing upload construct: /infrastructure/src/main/cdk/constructs/storage/public-upload-bucket.ts
Bulk stores stack: /infrastructure/src/main/cdk/stacks/purpose/partition-bulk-stores.ts
CSV upload service: /operations/src/main/kotlin/cards/arda/operations/common/lib/service/csvUpload/CsvUploadService.kt
S3 access abstraction: /common-module/lib/src/main/kotlin/cards/arda/common/lib/infra/storage/CsvS3ObjectDirectService.kt
Item entity: /operations/src/main/kotlin/cards/arda/operations/reference/item/business/Item.kt
Item endpoint: /operations/src/main/kotlin/cards/arda/operations/reference/item/api/rest/ItemEndpoint.kt
Pre-install CloudFormation: /operations/src/main/cloudformation/pre-install.cfn.yml
Download items spec: /technical-documentation/contents/1_specifications/demo202509/use-cases/download-items.md
Module design docs: /technical-documentation/contents/2_design/2_functional/general/module-design/index.md

Design Exploration

S3 Bucket Architecture at the system level.

The system is expected to need bulk storage for different purposes.

Objects stored will always be referenced by business entities in the system, and their lifecycle and identity will be tied to the business entities that reference or use them.
In all cases, the relationship between business entities and the objects they reference will be one-to-many in terms of referential integrity. When other business entities need access to a bulk object, they will access it (referentially) through the business entity that owns it, regardless of the actual access path to the contents. For example, to retrieve an item’s image, the client entity will request the image from the item entity and will not denormalize the reference except in rare cases.
In general, bulk objects will be considered immutable, since a change can be handled by updating the references in the business entities that point to them.
Under no circumstances will bulk storage be used by different modules to communicate or exchange data as shared global state.

The different characteristics of the files to be stored:

Http Accessible/Internal Use only:
- Http Accessible assets need to be served over http as-is so that clients can display them or use them in other ways. A typical example is an item’s image, a user picture, a company logo, etc.
  - Http Accessible assets are expected to be served over a CDN and may be large. They can be considered immutable, since a change can be handled by updating the references in the business entities that point to them.
  - Access to these assets needs to be partitioned by tenant. The security guarantees and the design to accomplish this are yet to be defined. The design needs to balance security against leveraging AWS native capabilities (including the CDN) without requiring involvement of the backend microservices.
- Internal Use Assets are those that will be accessed only by internal backend services that are trusted and have appropriate AWS IAM permissions.
Internally Sourced/Externally Uploaded: Http Assets can be uploaded by external clients (users through the UI) or could be generated by internal processes in the system and directly placed in the S3 bucket using AWS SDKs from backend services that have the appropriate AWS IAM permissions.
Durable/Ephemeral: Some assets will be long-lived (durable), with a lifecycle explicitly tied to the business entities that reference them. Others will be ephemeral, possibly “single use” (subject to retries). Examples are an uploaded CSV file that is no longer needed once processed and can be purged, or a file provided to a user for download that has an expiration date or a “single-download” policy.

Key Questions:

Access control for Http Accessible Assets
- Integrate with API Gateway & Cognito via Lambda Functions?
- Separate access control?
- How does it impact CloudFront integration?
- How does it impact signed URLs and upload/download performance?
- Is it possible to partition and secure based on tenant keys?
S3 Operational Configuration
- How many buckets to configure?
- By Partition, by Component or by Module?
- How many based on usage characteristics?

Exploration Todo

Items to research and resolve before moving to Phase 2 (Design with Alternatives). Mark items [x] as they are addressed and summarize findings inline or in linked sections above.

Security and Access Control

CloudFront signed URLs vs. signed cookies — Researched. Signed cookies can be scoped to /{tenant-id}/* via custom policy. Sub-ms edge verification. Not part of cache key, so full CDN caching preserved. Requires RSA/ECDSA trusted key groups. See DQ-002 in decision-log.md.
Origin Access Control (OAC) — Researched. OAC is the modern replacement for OAI. Uses SigV4 to sign requests to S3. Bucket policy grants s3:GetObject to cloudfront.amazonaws.com with SourceArn condition. OAC applies to the entire origin (no per-path scoping). See DQ-001, DQ-006.
Tenant isolation at the URL level — Options documented in DQ-002. Awaiting decision on whether product images are sensitive enough to require per-tenant access control. (Informs DQ-002)
Lambda@Edge / CloudFront Functions — Researched. CloudFront Functions cannot verify RS256/ES256 JWTs (only HMAC). Lambda@Edge can but adds latency and us-east-1 deployment constraint. See DQ-002 Option C.
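Node's built-in crypto is enough to generate the signed cookies with a custom policy that were researched above. This sketch assumes an RSA key registered in a CloudFront trusted key group, signed with RSA and SHA-1, the algorithm CloudFront documents for RSA keys. The distribution domain, key-pair ID, and function names are placeholders. LocalStack does not emulate CloudFront, so the `verifyTenantCookies` helper lets unit tests check signatures against a throwaway key pair.

```typescript
import { createSign, createVerify } from "node:crypto";

// Cookie names CloudFront reads for signed-cookie access with a custom policy.
interface SignedCookies {
  "CloudFront-Policy": string;
  "CloudFront-Signature": string;
  "CloudFront-Key-Pair-Id": string;
}

// CloudFront's URL-safe base64 variant: + -> -, = -> _, / -> ~
function toCloudFrontSafe(b64: string): string {
  return b64.replace(/\+/g, "-").replace(/=/g, "_").replace(/\//g, "~");
}

function fromCloudFrontSafe(s: string): string {
  return s.replace(/-/g, "+").replace(/_/g, "=").replace(/~/g, "/");
}

// Issues cookies that grant read access to one tenant's prefix until expiry.
function tenantSignedCookies(
  distributionDomain: string, // placeholder, e.g. "d111111abcdef8.cloudfront.net"
  tenantId: string,
  keyPairId: string,          // placeholder public key ID from the trusted key group
  privateKeyPem: string,
  expiresAtEpochSeconds: number,
): SignedCookies {
  // JSON.stringify emits compact JSON with no whitespace.
  const policy = JSON.stringify({
    Statement: [
      {
        Resource: `https://${distributionDomain}/${tenantId}/*`,
        Condition: { DateLessThan: { "AWS:EpochTime": expiresAtEpochSeconds } },
      },
    ],
  });
  const signer = createSign("SHA1"); // RSA private key + SHA-1 digest = RSA-SHA1
  signer.update(policy);
  return {
    "CloudFront-Policy": toCloudFrontSafe(Buffer.from(policy, "utf8").toString("base64")),
    "CloudFront-Signature": toCloudFrontSafe(signer.sign(privateKeyPem, "base64")),
    "CloudFront-Key-Pair-Id": keyPairId,
  };
}

// Test helper: verifies the policy signature with the public key.
function verifyTenantCookies(cookies: SignedCookies, publicKeyPem: string): boolean {
  const policy = Buffer.from(fromCloudFrontSafe(cookies["CloudFront-Policy"]), "base64").toString("utf8");
  const verifier = createVerify("SHA1");
  verifier.update(policy);
  return verifier.verify(publicKeyPem, fromCloudFrontSafe(cookies["CloudFront-Signature"]), "base64");
}
```

The three values would be set as `Secure; HttpOnly` cookies on a domain the distribution serves. The wildcard in `Resource` is what confines them to one tenant's prefix.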

Infrastructure and Cost

CloudFront pricing model — Researched. For 10K images x 500KB x 50 serves/month (US/EU): ~$0.50 requests + ~$20.23 data transfer = ~$20.73/month. However, CloudFront Always Free tier (1 TB out + 10M requests/month) likely covers this workload entirely ($0.00). Cache invalidation is irrelevant with immutable keys (DQ-008). See decision-log.md DQ-006.
S3 storage cost projection — Researched. For 100 tenants x 1,000 images x 500KB = ~47.7 GB: ~$1.10/month storage + $0.52 requests = ~$1.62/month. Orphan waste (10%) adds ~$0.11/month. Negligible cost. See DQ-001.
Multi-bucket vs. prefix-based lifecycle — Researched. S3 supports up to 1,000 prefix-scoped lifecycle rules per bucket. Different prefixes can have different expiration/transition rules. Separate buckets provide cleaner IAM and OAC boundaries. See DQ-001.
Existing CloudFront constructs — Found. ApiCloudFront construct exists at /infrastructure/src/main/cdk/constructs/xgress/api-cloudfront.ts but is API-specific (no caching, all methods, HTTP origin). No S3-origin CloudFront construct exists — net-new for assets. See DQ-006.
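A small cost model reproduces the estimates above. The unit prices are assumptions: approximate US list prices of $0.085/GB for CloudFront transfer, $0.01 per 10,000 HTTPS requests, and $0.023/GB-month for S3 Standard. Sizes are treated as KiB/GiB, and the 1 TB free tier is read as 1,024 GiB. Re-check all prices against current AWS pricing before relying on the numbers.

```typescript
const KIB_PER_GIB = 1024 * 1024;

// Assumed unit prices (USD); not authoritative.
const PRICES = {
  cloudFrontPerGiBOut: 0.085,
  cloudFrontPer10kHttpsRequests: 0.01,
  s3StandardPerGiBMonth: 0.023,
};

// Assumed CloudFront always-free allowance per month.
const FREE_TIER = { cloudFrontGiBOut: 1024, cloudFrontRequests: 10_000_000 };

function cdnMonthlyCost(images: number, avgKiB: number, servesPerImage: number) {
  const requests = images * servesPerImage;
  const gibOut = (requests * avgKiB) / KIB_PER_GIB;
  const requestCost = (requests / 10_000) * PRICES.cloudFrontPer10kHttpsRequests;
  const transferCost = gibOut * PRICES.cloudFrontPerGiBOut;
  const withinFreeTier =
    gibOut <= FREE_TIER.cloudFrontGiBOut && requests <= FREE_TIER.cloudFrontRequests;
  return { requests, gibOut, requestCost, transferCost, withinFreeTier };
}

function s3MonthlyStorageCost(tenants: number, imagesPerTenant: number, avgKiB: number, orphanRatio = 0) {
  const gib = (tenants * imagesPerTenant * avgKiB) / KIB_PER_GIB;
  const storageCost = gib * PRICES.s3StandardPerGiBMonth;
  // Orphaned objects add storage proportional to the orphan ratio.
  return { gib, storageCost, orphanCost: storageCost * orphanRatio };
}
```

With the inputs used above, `cdnMonthlyCost(10_000, 500, 50)` gives $0.50 in requests and about $20.27 in transfer. The ~$20.23 figure above rounds the volume down to 238 GB. `s3MonthlyStorageCost(100, 1_000, 500, 0.1)` gives about 47.7 GiB, $1.10 of storage, and $0.11 of orphan waste.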

Operational Concerns

Orphaned object cleanup — Researched. Simplest approach: staging prefix + lifecycle rule. Uploads land in staging/{tenant}/{uuid}, backend copies to images/{tenant}/... on confirm, lifecycle rule expires staging/ objects after 7 days. Zero Lambda/SQS infrastructure. Also add a lifecycle rule to abort incomplete multipart uploads after 1 day. See DQ-005, DQ-008.
Upload failure handling — Researched. For single-part PUT/POST: if the client disconnects mid-upload, S3 does not create the object, because the received body must match Content-Length. For multipart uploads: incomplete parts persist as invisible fragments that incur storage charges; a lifecycle rule that aborts them after 1 day handles this. Presigned URL expiration: the client gets a 403 and must request a new URL. See DQ-005.
Monitoring and alerting — Deferred to design phase. S3 metrics, CloudFront cache hit ratios, upload error rates. Existing ApigwDashboard construct provides a pattern for CloudWatch dashboards. (NFR — defer to design)
Storage growth and retention — Deferred to design phase. Per-tenant quotas and retention policy for replaced images should be specified in the design document. At projected volumes (~50 GB total), not a blocking concern. (NFR — defer to design)
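The staging-prefix cleanup described above maps onto two lifecycle rules. They are expressed here in the JSON shape used by S3's `PutBucketLifecycleConfiguration` API (`Rules`, `Filter.Prefix`, `Expiration.Days`, `AbortIncompleteMultipartUpload.DaysAfterInitiation`). The rule IDs, the `staging/` prefix, and the `rulesForKey` helper are illustrative assumptions.

```typescript
interface LifecycleRule {
  ID: string;
  Filter: { Prefix: string };
  Status: "Enabled" | "Disabled";
  Expiration?: { Days: number };
  AbortIncompleteMultipartUpload?: { DaysAfterInitiation: number };
}

function assetBucketLifecycle(stagingPrefix = "staging/"): { Rules: LifecycleRule[] } {
  return {
    Rules: [
      {
        // Unconfirmed uploads never leave staging/ and are deleted after 7 days.
        ID: "expire-unconfirmed-staging-uploads",
        Filter: { Prefix: stagingPrefix },
        Status: "Enabled",
        Expiration: { Days: 7 },
      },
      {
        // Empty prefix = whole bucket: discard abandoned multipart fragments.
        ID: "abort-incomplete-multipart-uploads",
        Filter: { Prefix: "" },
        Status: "Enabled",
        AbortIncompleteMultipartUpload: { DaysAfterInitiation: 1 },
      },
    ],
  };
}

// Lists the enabled rules whose prefix filter matches a key.
function rulesForKey(key: string, config: { Rules: LifecycleRule[] }): string[] {
  return config.Rules
    .filter((r) => r.Status === "Enabled" && key.startsWith(r.Filter.Prefix))
    .map((r) => r.ID);
}
```

Durable keys outside `staging/` match only the multipart-abort rule, so confirmed images never expire.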

Content Validation

File type and size enforcement — Researched. Critical finding: presigned PUT URLs cannot enforce a Content-Type family or a size range server-side. At most, a header included in the signature (such as an exact Content-Type) must be sent verbatim. Presigned POST (form-based upload) can enforce both via policy conditions: content-length-range (e.g., 1 byte to 10 MB) and a Content-Type starts-with condition (e.g., image/). S3 validates these server-side. This means DQ-005 should use presigned POST, not presigned PUT. Post-upload Lambda validation (magic byte inspection) provides defense-in-depth against spoofed MIME types.
Malware scanning — Researched. GuardDuty Malware Protection for S3: ~$25.79/month for 100K images (post Feb 2025 price reduction). Likely disproportionate for MVP with authenticated B2B users uploading product images. Presigned POST constraints + post-upload magic byte validation covers the realistic threat model. Defer GuardDuty to a later phase; staging prefix pattern supports drop-in enablement later. (NFR — defer)
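The sketch below makes the presigned POST recommendation concrete. It builds a SigV4 POST policy with `content-length-range` and a `starts-with` condition on `$Content-Type`, then signs it with the standard SigV4 key derivation. In TypeScript, `createPresignedPost` from `@aws-sdk/s3-presigned-post` does this for you. Spelling out the steps mainly helps when porting to the Kotlin abstraction and extending the presigned-PUT test harness. The bucket, region, key, and the name `presignImagePost` are assumptions. The optional session token is included because the backend signs with credentials from an assumed role.

```typescript
import { createHmac } from "node:crypto";

interface Credentials { accessKeyId: string; secretAccessKey: string; sessionToken?: string }
interface PresignedPost { url: string; fields: Record<string, string> }

function hmac(key: Buffer | string, data: string): Buffer {
  return createHmac("sha256", key).update(data, "utf8").digest();
}

function presignImagePost(args: {
  bucket: string; region: string; key: string; tenantId: string;
  maxBytes: number; creds: Credentials; now: Date; expiresInSeconds: number;
}): PresignedPost {
  const { bucket, region, key, tenantId, maxBytes, creds, now, expiresInSeconds } = args;
  const amzDate = now.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, ""); // yyyymmddThhmmssZ
  const day = amzDate.slice(0, 8);

  const fields: Record<string, string> = {
    key,
    "x-amz-meta-tenant-id": tenantId,
    "x-amz-algorithm": "AWS4-HMAC-SHA256",
    "x-amz-credential": `${creds.accessKeyId}/${day}/${region}/s3/aws4_request`,
    "x-amz-date": amzDate,
  };
  if (creds.sessionToken) fields["x-amz-security-token"] = creds.sessionToken;

  // Every fixed field is an exact-match condition; S3 enforces the rest server-side.
  const conditions: unknown[] = [
    { bucket },
    ...Object.entries(fields).map(([k, v]) => ({ [k]: v })),
    ["content-length-range", 1, maxBytes],
    ["starts-with", "$Content-Type", "image/"],
  ];
  const expiration = new Date(now.getTime() + expiresInSeconds * 1000).toISOString();
  const policy = Buffer.from(JSON.stringify({ expiration, conditions }), "utf8").toString("base64");

  // SigV4 signing key: HMAC chain over date, region, service, terminator.
  const signingKey = hmac(hmac(hmac(hmac(`AWS4${creds.secretAccessKey}`, day), region), "s3"), "aws4_request");
  fields.policy = policy;
  fields["x-amz-signature"] = hmac(signingKey, policy).toString("hex");
  // The client posts these fields plus its own Content-Type field, then the file last.
  return { url: `https://${bucket}.s3.${region}.amazonaws.com/`, fields };
}
```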

Local Development and Testing

LocalStack/MockAWS compatibility — Researched. MockAWS in common-module uses LocalStack with S3 only (LocalStackContainer.Service.S3). Supports: bucket creation, PutObject, GetObject, HeadObject, presigned URLs with SigV4 signing and custom metadata. CloudFront is not mocked by LocalStack — signed cookie/URL testing requires WireMock or real AWS integration tests. The S3 presigning abstraction is fully testable; CloudFront distribution logic is CDK-level (synthesized, not runtime-tested). See DQ-007.
Presigned URL testing — Researched. CsvS3DirectAccessTest in common-module tests presigned PUT URL generation including SigV4 signature validation, custom metadata headers (x-amz-meta-tenant-id, x-amz-meta-author), and URL structure assertions. The S3Presigner pattern is reusable for the new abstraction. Note: presigned POST (recommended for content validation) uses a different signing mechanism than presigned PUT — the test harness will need extension. See DQ-007.

Workflow and API

Two-phase upload pattern in the industry — Denis’s FileStore design follows the standard pattern (presigned URL -> direct upload -> persist key in entity). Same pattern used by GitHub (release assets), Slack (file uploads), Stripe (file uploads). See DQ-005.
Image replacement semantics — Addressed by DQ-008. Recommendation: immutable objects with new UUID keys. Old objects orphaned, cleaned up by lifecycle or periodic scan. Entity bitemporal history tracks which image was active when.
Bulk image upload (future) — Assessed. The recommended key structure ({tenant-id}/{feature}/{uuid}.{ext}) and presigned POST workflow accommodate batch uploads without redesign: the client requests N presigned POST forms in parallel, uploads N files to S3, then updates N entities. The abstraction in common-module would expose a batch variant that returns multiple presigned forms. The staging prefix + lifecycle cleanup pattern handles partial batch failures (some images uploaded, some not) naturally. No architectural changes needed. (Informs DQ-004)
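The batch variant described above needs little new machinery. In this hypothetical sketch, `presignBatch` requests N forms in parallel within a size cap. `confirmBatch` links each uploaded file independently, so a failure is reported per item instead of aborting the whole batch. Both function names, the cap, and the callback signatures are assumptions.

```typescript
interface UploadSlot { key: string; fields: Record<string, string> }
interface LinkOutcome { itemId: string; ok: boolean; error?: string }

// Requests `count` presigned forms in parallel; rejects oversized batches up front.
async function presignBatch(
  count: number,
  maxBatch: number,
  presignOne: (index: number) => Promise<UploadSlot>,
): Promise<UploadSlot[]> {
  if (!Number.isInteger(count) || count < 1 || count > maxBatch) {
    throw new Error(`batch size must be between 1 and ${maxBatch}`);
  }
  return Promise.all(Array.from({ length: count }, (_, i) => presignOne(i)));
}

// Confirms each (item, key) pair independently; one failure does not abort the rest.
async function confirmBatch(
  pairs: ReadonlyArray<{ itemId: string; key: string }>,
  confirmOne: (itemId: string, key: string) => Promise<void>,
): Promise<LinkOutcome[]> {
  return Promise.all(
    pairs.map((p) =>
      confirmOne(p.itemId, p.key).then(
        (): LinkOutcome => ({ itemId: p.itemId, ok: true }),
        (err: unknown): LinkOutcome => ({ itemId: p.itemId, ok: false, error: String(err) }),
      ),
    ),
  );
}
```

Slots whose files never arrive simply remain in the staging prefix until the lifecycle rule removes them.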
