Static Asset Repository

Static Asset Repository¶

This document describes the Static Asset Repository used by Arda Cloud to store and serve customers’ assets such as
images, videos, and documents.

The Static Asset Repository provides per-tenant storage and access control, ensuring that each customer’s assets are
securely stored and accessible only to authorized users.
It is designed to be scalable, reliable, and performant, leveraging cloud storage technologies to meet the needs of
Arda Cloud’s customers.

The Static Asset Repository provides only secure reading and writing of files.

Architecture¶

The Static Asset Repository consists of the following components:

Access Control: The Static Asset Repository implements access control mechanisms to ensure that only authorized
users can access the stored assets. This includes authentication and authorization processes to verify user identities
and permissions.
API Layer: The Static Asset Repository exposes a set of APIs that allow customers to interact with their stored
assets. These APIs support operations such as uploading and downloading files.
Storage Backend: The Static Asset Repository uses a cloud storage service as the underlying storage backend to
store customers’ assets.

The Static Asset Repository is a typical microservice that encapsulates a single bounded context, and it is designed to
be loosely coupled with other components of the system, allowing for independent development, deployment, and scaling.

Components¶

Our API Gateway, along with Cognito for authentication and authorization, serves as the access control layer for the
Static Asset Repository. It ensures that only authenticated and authorized users can access the stored assets.
It relies on the request’s JWT token to identify the user and the tenant. It enforces access control policies and
redirects unauthenticated access to the authentication service.

The API layer is a simple AWS Lambda function that combines the request URL with the tenant ID to form a partitioned
namespace in the storage backend.
It also handles file upload and download requests, ensuring that they are routed to the correct storage location based
on the tenant’s namespace.
It redirects read requests to pre-signed URLs and replies to write requests with pre-signed URLs that allow direct
upload to the storage backend, improving performance and reducing latency.

The storage backend consists of one or more S3 buckets that are accessible only to the Lambda. The number of buckets
and their configuration are entirely transparent to the users of the Static Asset Repository, who interact with it
solely through the API layer.

Key Concepts¶

Arda, System and Storage Keys¶

The system distinguishes between the Arda-Key, which describes where the asset lies in the global Arda Cloud domain,
the System-Key, which is the unique identifier by which the asset is known to the Static Asset Repository,
and the Storage-Key, which is the actual key of the object in the storage backend.

The Arda-Key is specified by the user at asset creation and is used for logging, debugging and customer support. The
Arda-Key is not required to be unique, but it should be descriptive enough to help identify the asset in case of
issues.
The key should follow the format {owning-module}/{entity-type}/{property-name}/{asset-name}.{ext}.
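
As an illustration of the format above, the following minimal sketch splits an Arda-Key into its four segments. The `ArdaKey` class and `parse_arda_key` function are hypothetical names, and rejecting malformed keys outright is an assumption: the document only says the key should follow the format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArdaKey:
    owning_module: str
    entity_type: str
    property_name: str
    asset_file: str  # the "{asset-name}.{ext}" segment


def parse_arda_key(arda_key: str) -> ArdaKey:
    """Split an Arda-Key into its four segments (hypothetical helper)."""
    parts = arda_key.split("/")
    if len(parts) != 4 or any(not part for part in parts):
        raise ValueError(f"expected 4 non-empty segments, got: {arda_key!r}")
    # The last segment must have a non-empty name and a non-empty extension.
    name, dot, ext = parts[3].rpartition(".")
    if not (name and dot and ext):
        raise ValueError(f"asset segment must look like name.ext: {parts[3]!r}")
    return ArdaKey(*parts)
```

For example, `parse_arda_key("account/user/headshot/jane.png")` yields an `ArdaKey` whose `owning_module` is `"account"` and whose `asset_file` is `"jane.png"`.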

The System-Key is a unique identifier for the asset that is generated by the Static Asset Repository at asset
creation. It is used to form the Storage-Key and must prevent name clashes.
The key is set to {uuid}/{basename from the Arda-Key} to ensure uniqueness while keeping some human readability.

The Storage-Key is generated by the Static Asset Repository at asset creation and is used to store and retrieve the
asset in the storage backend.
The key is the actual S3 object key, set to {tenantId}/{System-Key}.

The Static Asset Repository stores the Arda-Key as S3 object metadata, which can be queried from the S3 Metadata
Tables.
The Static Asset Repository maps dynamically between System-Key and Storage-Key.
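
The relationship between the three keys can be sketched as follows. The function names are hypothetical, and the use of a random UUID (version 4) is an assumption, since the document only specifies {uuid}.

```python
import posixpath
import uuid


def new_system_key(arda_key: str) -> str:
    """Build "{uuid}/{basename from the Arda-Key}" for a new asset."""
    basename = posixpath.basename(arda_key)
    return f"{uuid.uuid4()}/{basename}"  # assumption: UUID version 4


def to_storage_key(tenant_id: str, system_key: str) -> str:
    """Build the S3 object key "{tenantId}/{System-Key}"."""
    return f"{tenant_id}/{system_key}"


def to_system_key(tenant_id: str, storage_key: str) -> str:
    """Map a Storage-Key back to its System-Key by stripping the tenant prefix."""
    prefix = f"{tenant_id}/"
    if not storage_key.startswith(prefix):
        raise ValueError("storage key does not belong to this tenant")
    return storage_key[len(prefix):]
```

Because the mapping is a plain prefix operation, it can be computed on each request without a lookup table, which is one way to read "maps dynamically" above.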

Pre-Signed vs Persistent URLs¶

The underlying storage is designed to be opaque and not directly accessible by users, so the Static Asset Repository
API provides pre-signed URLs for both reading and writing assets. These pre-signed URLs are time-limited and provide
temporary read or write access to the asset in the storage backend. The pre-signed URLs are generated by the API layer
and combine the bucket host and the storage key.

The Persistent URL for an asset is the URL of the Static Asset Repository API endpoint that can be used to access the
asset. These persistent URLs are stable and do not change over time. The Persistent URLs are generated by the API layer
and combine the API Gateway host and the system key.

The Static Asset Repository API handles the generation of pre-signed URLs and the redirection of requests to the storage
backend, ensuring that users can access their assets securely and efficiently.
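
A persistent URL could be assembled as in the sketch below. The `persistent_url` helper and the example host are hypothetical, and the /assets/ path is taken from the API section of this document. Since a System-Key contains a slash, the route has to match more than one path segment; how that is configured is not specified here.

```python
from urllib.parse import quote


def persistent_url(api_host: str, system_key: str) -> str:
    """Combine the API Gateway host and the System-Key into a stable URL."""
    # Percent-encode unsafe characters but keep "/" so that the
    # {uuid}/{basename} structure of the System-Key stays visible.
    return f"https://{api_host}/assets/{quote(system_key, safe='/')}"
```

Unlike a pre-signed URL, the result carries no signature or expiry, so it can be stored by other components and resolved again on every read.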

Interactions¶

These interaction diagrams introduce a generic “Arda Component” as a stand-in for the specific components that keep
track of document assets, such as a user headshot (account) or a product image (operations).

These interaction diagrams focus on the interactions between the SPA, the API Gateway, the Static Asset Repository
Lambda, and the S3 storage backend. Other components of Arda Cloud use the same Static Asset Repository API to manage
their assets, but, as trusted components of the system, they might bypass the API Gateway and interact directly with
the Static Asset Repository Lambda. Opening up this path would be easier if the Static Asset Repository Lambda were
instead deployed to the cluster as a regular component, which might be considered in the future.

Write¶

Read¶

Details¶

API¶

The Static Asset Repository API provides the following endpoints.

Note that the API Gateway is in charge of the actual authentication of the requests and will redirect unauthenticated
requests to the login page, while the API layer is in charge of authorization and will reject unauthorized requests with
a 403. This means that the API layer can assume that all requests it receives are authenticated and include a valid JWT
token.
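
Since the API layer can rely on the gateway having verified the token, it only needs to read the claims. The sketch below decodes the JWT payload without checking the signature, which is only acceptable behind that verification. The `sub` claim is the one named in this document; the tenant claim name `custom:tenantId` is purely an assumption. API Gateway authorizers can also pass verified claims in the request context, which would make the decoding unnecessary.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("not a three-segment JWT") from None
    # JWT segments are base64url-encoded with the padding stripped.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def caller_identity(token: str, tenant_claim: str = "custom:tenantId") -> tuple[str, str]:
    """Return (user, tenant); the tenant claim name is hypothetical."""
    claims = jwt_claims(token)
    return claims["sub"], claims[tenant_claim]
```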

POST /assets¶

The call requires a JWT token for authentication and authorization.
The payload specifies an ardaKey, the contentType of the asset to be uploaded, and, optionally, its contentLength
and a checksum for integrity verification.

Return

a 200 with a persistent URL for future access and a pre-signed upload URL for the
asset in the storage backend.
a 401 Unauthorized if the JWT token is missing, expired, or otherwise invalid.
a 403 if the user is not authorized to create the asset.
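
A minimal sketch of the success path of this endpoint is shown below. The response field names `persistentUrl` and `uploadUrl`, the `presign_put` callable, and the 400 response for an incomplete payload are all assumptions that do not appear in the specification above; the returned dictionary follows the shape of a Lambda proxy response.

```python
import json
import posixpath
import uuid
from typing import Callable


def create_asset(body: dict, tenant_id: str, api_host: str,
                 presign_put: Callable[[str, str], str]) -> dict:
    """Handle POST /assets once the caller is authenticated and authorized."""
    arda_key = body.get("ardaKey")
    content_type = body.get("contentType")
    if not arda_key or not content_type:
        # Assumption: the documented responses do not cover invalid payloads.
        return {"statusCode": 400,
                "body": json.dumps({"error": "ardaKey and contentType are required"})}
    system_key = f"{uuid.uuid4()}/{posixpath.basename(arda_key)}"
    # presign_put(storage_key, content_type) stands in for the S3 signing call.
    upload_url = presign_put(f"{tenant_id}/{system_key}", content_type)
    return {"statusCode": 200, "body": json.dumps({
        "persistentUrl": f"https://{api_host}/assets/{system_key}",
        "uploadUrl": upload_url,
    })}
```

The client would then upload the file directly to `uploadUrl` and keep `persistentUrl` as the long-lived reference to the asset.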

GET /assets/{systemKey}¶

The call requires a JWT token for authentication and authorization, and the systemKey path parameter to specify the
asset to be retrieved.

As in the S3 API, the call accepts an optional versionId query parameter that selects a particular version of the
asset. If it is not specified, the latest version of the asset is retrieved.

Return

a 302 redirect to a pre-signed URL for the asset in the storage backend.
a 401 Unauthorized if the JWT token is missing, expired, or otherwise invalid.
a 403 if the user is not authorized to access the asset.
a 404 if the asset does not exist.
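
The redirect behaviour can be sketched as follows. The `presign_get` callable is a hypothetical stand-in that returns a pre-signed URL, or None when the object does not exist; the returned dictionary follows the shape of a Lambda proxy response.

```python
from typing import Callable, Optional


def read_asset(system_key: str, tenant_id: str, version_id: Optional[str],
               presign_get: Callable[[str, Optional[str]], Optional[str]]) -> dict:
    """Handle GET /assets/{systemKey} for an authenticated, authorized caller."""
    # The tenant ID comes from the JWT, so every lookup stays inside the
    # caller's own namespace in the bucket.
    storage_key = f"{tenant_id}/{system_key}"
    url = presign_get(storage_key, version_id)
    if url is None:
        return {"statusCode": 404, "body": ""}
    return {"statusCode": 302, "headers": {"Location": url}, "body": ""}
```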

HEAD /assets/{systemKey}¶

The call requires a JWT token for authentication and authorization, and the systemKey path parameter to specify the
asset whose headers are requested.

As in the S3 API, the call accepts an optional versionId query parameter that selects a particular version of the
asset. If it is not specified, the headers of the latest version are returned.

Return

a 200 with the headers of the asset in the storage backend, including metadata such as content type, content length,
checksum, etc.
a 302 redirect to the login page if the user is not authenticated.
a 401 Unauthorized if the JWT token is missing, expired, or otherwise invalid.
a 404 if the asset does not exist.

GET /versions/{systemKey}¶

The call requires a JWT token for authentication and authorization, and the systemKey path parameter to specify the
asset to be retrieved.

The call retrieves the list of all versions of the asset rather than a specific version. The payload format is still
to be defined.
The call follows the pagination mechanism of the S3 API.

Return

a 200 with the list of versions of the asset in the storage backend, including metadata such as content type, content
length, checksum, etc.
a 302 redirect to the login page if the user is not authenticated. This is actually handled by the API Gateway.
a 401 Unauthorized if the JWT token is missing, expired, or otherwise invalid.
a 404 if the asset does not exist.
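
Since the payload is not yet defined, the following is only a sketch of how the versions of one object could be collected with the S3 pagination markers. It assumes a boto3-style S3 client passed in as `s3_client`; the output field names `versionId` and `isLatest` are hypothetical.

```python
def list_versions(s3_client, bucket: str, storage_key: str):
    """Yield the versions of exactly one object, following S3's markers."""
    kwargs = {"Bucket": bucket, "Prefix": storage_key}
    while True:
        page = s3_client.list_object_versions(**kwargs)
        for version in page.get("Versions", []):
            # Prefix matching also returns longer keys, so compare exactly.
            if version["Key"] == storage_key:
                yield {"versionId": version["VersionId"],
                       "isLatest": version["IsLatest"]}
        if not page.get("IsTruncated"):
            return
        # Continue from where the previous page stopped.
        kwargs["KeyMarker"] = page["NextKeyMarker"]
        kwargs["VersionIdMarker"] = page["NextVersionIdMarker"]
```

The S3 call filters by prefix rather than by exact key, which is why the sketch checks each returned key before yielding it.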

Bucket organization¶

The Static Asset Repository initially uses a single S3 bucket to store all tenants’ assets, with a partitioned namespace
based on tenant IDs.

This approach simplifies management and reduces costs while providing sufficient isolation between tenants.

The Static Asset Repository might in the future introduce tenant-specific buckets if required, or shard the storage
across multiple buckets. This all remains transparent to the users of the Static Asset Repository, who interact with it
solely through the API layer.

The systemKey is set to {uuid}/{basename from the ardaKey} to ensure uniqueness while keeping some human readability.

The object key follows the format {tenantId}/{systemKey}.

Objects in the bucket are versioned to protect against accidental overwrites and deletions,
and to allow for recovery of previous versions if needed.

The Static Asset Repository maintains the following custom object metadata:

arda-user records the user that created or last modified the asset; it is set to the JWT’s sub claim.
arda-key records the Arda Key of the asset, which will be used for logging.

Security¶

S3 Bucket¶

The bucket is configured with the following security settings to ensure that the stored assets are protected against
unauthorized access and data breaches:

Keep the default S3 Object Ownership, which is to have objects owned by the bucket owner and ACLs disabled, to
prevent unintended access to objects through ACLs.
Keep the default encryption at rest setting, which applies server-side encryption with Amazon S3 managed keys (SSE-S3).
Do not use S3 Object Lock.
Apply the default Tags (Infrastructure, Partition, Component) to the bucket for cost allocation and management
purposes.
Keep the default Public Access settings, which block all public access to the bucket and its objects, to prevent
unauthorized access to the stored assets.
Enable Bucket Versioning to protect against accidental overwrites and deletions, and to allow for recovery of
previous versions if needed.
Set the Removal Policy to RETAIN to prevent accidental deletion of the bucket and its contents. This ensures that
the bucket and its assets are preserved even if the CDK stack is deleted.
Set autoDeleteObjects to false so that CDK does not empty the bucket when the bucket is removed from the stack or the
stack is deleted. Because S3 refuses to delete a bucket that still contains objects, the assets cannot be lost as a
side effect of a stack operation.

IAM Policy¶

The Static Asset Repository is the only component in the system with direct access to the S3 bucket. Its IAM
policy follows the principle of least privilege, granting only the permissions that the Static Asset Repository
needs to perform its operations. The initial set of permissions is:

s3:PutObject
s3:GetObject
s3:GetObjectVersion
s3:ListBucketVersions

No other policies should grant access to the bucket, and the bucket policy should explicitly deny any access that does
not come from the Static Asset Repository.
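
As a sketch, the four permissions could be expressed in an identity policy document like the one built below. The `repository_policy` function, the statement IDs, and the bucket name are hypothetical. The object-level actions are scoped to the objects in the bucket, while s3:ListBucketVersions is a bucket-level action and is scoped to the bucket itself.

```python
import json


def repository_policy(bucket: str) -> dict:
    """Build a least-privilege policy document for the given bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ObjectReadWrite",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"{bucket_arn}/*",  # objects in the bucket
            },
            {
                "Sid": "ListVersions",
                "Effect": "Allow",
                "Action": "s3:ListBucketVersions",
                "Resource": bucket_arn,  # the bucket itself
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(repository_policy("example-asset-bucket"), indent=2))
```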

Expiration¶

The signed URLs have a short expiration time (initially set to 5 minutes) to minimize the risk of unauthorized access if
the URL is leaked or intercepted.
The underlying S3 object does not have an application-level expiration; its cacheability is controlled by the HTTP
response headers set on the object (for example, Cache-Control and ETag). Because every pre-signed URL includes a
unique query string, browsers and intermediaries may treat each pre-signed URL as a separate cache entry; only the
object response, not the pre-signed URL itself, is expected to be cached.
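
The 5-minute lifetime could be applied when the URLs are signed, as in the sketch below. It assumes a boto3-style S3 client is passed in as `s3_client`; the function names and the `PRESIGN_TTL_SECONDS` constant are hypothetical.

```python
from typing import Optional

PRESIGN_TTL_SECONDS = 5 * 60  # initial value from this document: 5 minutes


def presign_get(s3_client, bucket: str, storage_key: str,
                version_id: Optional[str] = None) -> str:
    """Sign a short-lived read URL, optionally for one specific version."""
    params = {"Bucket": bucket, "Key": storage_key}
    if version_id:
        params["VersionId"] = version_id
    return s3_client.generate_presigned_url(
        "get_object", Params=params, ExpiresIn=PRESIGN_TTL_SECONDS)


def presign_put(s3_client, bucket: str, storage_key: str, content_type: str) -> str:
    """Sign a short-lived upload URL bound to the declared content type."""
    params = {"Bucket": bucket, "Key": storage_key, "ContentType": content_type}
    return s3_client.generate_presigned_url(
        "put_object", Params=params, ExpiresIn=PRESIGN_TTL_SECONDS)
```

Keeping the lifetime in a single constant makes it straightforward to tune the initial value later without touching the signing calls.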

Appendix¶

Lifecycle¶

A bucket configured with RemovalPolicy.RETAIN and autoDeleteObjects: false is treated as a stateful asset whose
lifecycle extends beyond the CDK stack.

Lifecycle behavior:

| Event | Expected behavior | Required operator action |
| --- | --- | --- |
| Create | CloudFormation creates and configures the bucket. | Verify required tags and baseline controls are applied. |
| Update (no replacement) | CloudFormation updates bucket configuration in place; data remains. | Validate no policy regression and no unexpected permission broadening. |
| Update (replacement required) | New bucket is created; old bucket is retained because of `RETAIN`. | Plan and execute explicit data migration before cutover; register old bucket as retained resource. |
| Stack deletion | Stack is deleted, but the bucket and all versions remain. | Record retained bucket in runtime inventory with owner and retention intent. |
| Final decommission | No automatic delete occurs for retained bucket/data. | Follow the decommission runbook (approval, archive/export if needed, object purge, bucket delete). |

Operational runbooks:

Stack deletion with retained bucket

1. Confirm business owner approval and ticket/reference for deletion.
2. Capture bucket inventory (including versions and delete markers) for traceability.
3. Verify retention/compliance constraints (for example legal hold requirements) before deleting the stack.
4. Delete the CDK stack.
5. Register the retained bucket in the retained-resources inventory with tags: Owner, System, Environment,
   DataClass, RetentionPolicy, DecommissionTicket.

Final decommission of retained bucket

1. Obtain explicit legal/compliance approval to remove retained data.
2. Archive/export required data to long-term storage if mandated by policy.
3. Remove all objects, versions, and delete markers.
4. Delete the bucket.
5. Close the decommission ticket and remove the bucket from retained-resources inventory.
