Deployment Orchestration (amm.sh)

The amm.sh script (“Arda Money Making”) is the top-level deployment orchestrator for the Arda platform. It provisions an entire Infrastructure and one or more Partitions in a single run, coordinating CDK, CloudFormation, Helm, and kubectl commands in sequence.

```sh
./amm.sh [--profile <aws_profile>] [--region <aws_region>] <infrastructure> <partition...>
```

| Argument | Required | Description |
| --- | --- | --- |
| --profile <profile> | No | Sets AWS_PROFILE. Defaults to Admin-<infrastructure> via AWS_DEFAULT_PROFILE when running locally. |
| --region <region> | No | Sets AWS_REGION. When omitted, the region is inferred from the AWS profile. |
| <infrastructure> | Yes | The Infrastructure name: Alpha001, Alpha002, or SandboxKyle002. |
| <partition...> | Yes | One or more Partition names, or the keyword all. |

The all keyword expands to a predefined list per Infrastructure:

| Infrastructure | all expands to |
| --- | --- |
| Alpha001 | demo, prod |
| Alpha002 | dev, stage |
| SandboxKyle002 | kyle |
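
A minimal sketch of how this expansion might be implemented in amm.sh (variable names are assumptions, not taken from the script):

```sh
# Hypothetical expansion of the "all" keyword; variable names are illustrative.
if [[ "${1}" == "all" ]]; then
  case "${infrastructure}" in
    Alpha001)       partitions=(demo prod) ;;
    Alpha002)       partitions=(dev stage) ;;
    SandboxKyle002) partitions=(kyle) ;;
  esac
fi
```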

Examples:

```sh
# Deploy Alpha002 infrastructure + dev partition (local, with SSO)
./amm.sh Alpha002 dev

# Deploy Alpha001 with both partitions
./amm.sh Alpha001 all

# Explicit profile and region
./amm.sh --profile Admin-Alpha002 --region us-east-1 Alpha002 dev stage
```

The amm.yml workflow provides a workflow_dispatch trigger with a dropdown of Infrastructure/Partition combinations:

  • Alpha001/demo, Alpha001/prod
  • Alpha002/dev (default), Alpha002/stage
  • SandboxKyle002/kyle

The workflow:

  1. Splits the environment input into infrastructure and partition.
  2. Fetches AWS account ID and region from the purpose-configuration-action using a locator URL.
  3. Assumes the IAM role <Infrastructure>-I-GitHubActionInfrastructure via OIDC (id-token: write).
  4. Runs npm install, then invokes ./amm.sh <infrastructure> <partition>.

Secrets are passed as environment variables — the script detects GITHUB_ACTIONS=true and skips the local 1Password / SSO login paths.

The script uses 1Password CLI (op read) to resolve secrets at runtime. The operator must be signed into 1Password and have access to the following vaults:

| Vault | Secrets | Used for |
| --- | --- | --- |
| Arda-SystemsOAM | Amplify_GitHub_AccessToken, GPR-Read token | Amplify GitHub integration, GitHub Packages auth |
| Arda-ProdOAM | ARDA-SIGNUP-KEY | HubSpot signup authentication |
| Arda-StageOAM | HubSpot/client_secret, HubSpot/private_access_token, Pylon/widget_secret | Third-party integrations |
| Per-partition vault | ARDA-API-KEY | Partition API key |

The per-partition vault is resolved via the PARTITION_VAULT_MAP:

| Partition | 1Password Vault |
| --- | --- |
| dev | Arda-DevOAM |
| stage | Arda-StageOAM |
| demo | Arda-DemoOAM |
| prod | Arda-SystemsOAM |
| kyle | Arda-SandboxKyle |
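
In bash this is presumably an associative array along these lines (a sketch; the actual declaration in amm.sh may differ):

```sh
# Sketch of the partition-to-vault map as a bash associative array.
declare -A PARTITION_VAULT_MAP=(
  [dev]="Arda-DevOAM"
  [stage]="Arda-StageOAM"
  [demo]="Arda-DemoOAM"
  [prod]="Arda-SystemsOAM"
  [kyle]="Arda-SandboxKyle"
)
```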

AWS authentication uses SSO — the script calls aws sso login before the Infrastructure step and again before each Partition step.

All secrets are stored as GitHub Actions repository secrets:

| Secret | Value |
| --- | --- |
| AMPLIFY_GITHUB_ACCESSTOKEN | GitHub PAT for Amplify source access |
| ARDA_API_KEY_<partition> | Per-partition API key (e.g., ARDA_API_KEY_dev) |
| ARDA_SIGNUP_KEY_KYLE | HubSpot signup key |
| HUBSPOT_CLIENT_KEY_STAGE | HubSpot client secret |
| HUBSPOT_PAT_STAGE | HubSpot private access token |
| PYLON_WIDGET_KEY_STAGE | Pylon widget secret |
| GPR_READ_KEY | GitHub Packages read token |

The IAM role is assumed via OIDC federation (role-to-assume), not long-lived credentials.

The script does not have a --dry-run flag. Each tool it orchestrates has its own preview mechanism that must be invoked individually.

synth generates CloudFormation templates without deploying. It does not require AWS credentials and runs as part of CI on every push and PR (the synth-each-cdk-app matrix job in ci.yaml).

```sh
# Synth a specific Infrastructure or Partition target
npm run synth:named -- Alpha002/infra
npm run synth:named -- Alpha002/dev
```

diff compares the synthesized templates against the currently deployed stacks. This requires valid AWS credentials.

```sh
npx cdk diff \
  --app 'npx ts-node -r tsconfig-paths/register --prefer-ts-exts src/main/cdk/instances/Alpha002/infra.ts'
```

For the raw CloudFormation templates (src/main/cfn/*.cfn.yaml), use --no-execute-changeset to create and inspect a changeset without applying it:

```sh
aws cloudformation deploy \
  --stack-name Alpha002-dev-Secrets \
  --template-file src/main/cfn/partitionSecrets.cfn.yaml \
  --no-execute-changeset \
  --parameter-overrides Infrastructure=Alpha002 Partition=dev \
    ArdaApiKey=... ArdaSignupKey=... HubspotClientKey=... HubspotPAT=... PylonWidgetKey=...
```

The changeset appears in the CloudFormation console for review. Delete it after inspection to avoid stale changesets blocking future deploys.
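
The cleanup can also be done from the CLI (stack name repeated from the example above; the changeset name is whatever the deploy command printed):

```sh
# List changesets for the stack, then delete the one you inspected.
aws cloudformation list-change-sets --stack-name Alpha002-dev-Secrets
aws cloudformation delete-change-set \
  --stack-name Alpha002-dev-Secrets \
  --change-set-name <changeset-name>
```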

For Helm releases, --dry-run previews the rendered release:

```sh
helm upgrade --install --dry-run \
  --version 4.13.0 \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace dev-ingress-nginx \
  --set "controller.ingressClass=dev-nginx" \
  ingress-nginx ingress-nginx
```

This renders the manifests and validates them against the cluster API without applying changes.

For raw Kubernetes manifests:

```sh
kubectl apply --dry-run=client -f <manifest>
```

Use --dry-run=server for server-side validation (requires cluster connectivity).

The ci.yaml workflow synthesizes every Infrastructure/Partition combination in a matrix:

Alpha001/infra, Alpha001/demo, Alpha001/prod,
Alpha002/infra, Alpha002/dev, Alpha002/stage,
SandboxKyle002/infra, SandboxKyle002/kyle

This catches CDK compilation errors, construct misconfiguration, and missing exports before any deployment. The all-synth-results job gates the pipeline — all targets must synth successfully for the build to pass.

Infrastructure Phase (runs once per invocation)

  • Green-field: The AWS account must exist. The script runs cdk bootstrap automatically, but a pre-existing CDKToolkit stack from a different bootstrap version may require manual cleanup.
  • Upgrade: All prior CloudFormation stacks from the Infrastructure layer must be in a stable state (CREATE_COMPLETE, UPDATE_COMPLETE). ROLLBACK_COMPLETE stacks must be deleted manually before re-running.
| Step | Tool | Resources |
| --- | --- | --- |
| CloudWatch logging | CloudFormation (cloudWatch.cfn.yaml) | Log group /arda/oam/deployments with 14-day retention; log stream for the current date |
| CDK bootstrap | cdk bootstrap | CDKToolkit stack (S3 staging bucket, IAM roles) |
| Infrastructure CDK | cdk deploy (all stacks via instances/<Infra>/infra.ts) | VPC, EKS cluster, IAM roles, Route53 hosted zones, NLBs, security groups — everything in the Infrastructure layer |
| EKS kubeconfig | aws eks update-kubeconfig | Local ~/.kube/config entry for the cluster |
| Fluent Bit logging | kubectl apply | aws-observability namespace, aws-logging ConfigMap (Fluent Bit → CloudWatch /<infra>/eks-logs) |
| AWS Load Balancer Controller | Helm (aws-load-balancer-controller v1.13.4) | Namespace aws-load-balancer-controller, LBC deployment, ServiceAccount with IAM role annotation |
| External Secrets Operator | Helm (external-secrets v0.19.1) | Namespace external-secrets, ESO deployment (cluster-scoped CRDs disabled) |

Partition Phase (repeats for each partition)

  • Green-field: The Infrastructure phase must have completed successfully. CloudFormation exports from the Infrastructure layer (e.g., <Infra>-I-EksClusterName, NLB target group ARNs) must exist.
  • Upgrade: Partition CloudFormation stacks must be in a stable state. For Amplify targets, the <Infra>-<Part>-Amplify stack must exist before the branch/domain stack can be deployed.
| Step | Tool | Resources |
| --- | --- | --- |
| Partition CDK | cdk deploy (via instances/<Infra>/<partition>.ts) | Cognito user pools, API Gateway, DynamoDB tables, S3 buckets, Lambda functions, CloudFront distributions — everything in the Partition layer |
| nginx Ingress | Helm (ingress-nginx v4.13.0) | Namespace <partition>-ingress-nginx, nginx controller (2 replicas, ClusterIP), IngressClass <partition>-nginx |
| Target Group Bindings | kubectl apply | TargetGroupBinding CRs linking nginx to the NLB target groups (HTTP port 80, HTTPS port 443); stale bindings are deleted (see the sketch below) |
| Partition secrets | CloudFormation (partitionSecrets.cfn.yaml) | 5 Secrets Manager secrets: ArdaApiKey, ArdaSignupSecretKey, HubspotClientSecret, HubspotPrivateAccessToken, PylonWidgetSecret |
| Amplify (full targets) | CloudFormation | See Amplify deployment |
| Amplify (manual targets) | CloudFormation + AWS CLI | See Amplify deployment |
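
A TargetGroupBinding CR of the kind the bindings step applies might look like this (a sketch only: the name, namespace, service, and ARN are illustrative, not taken from the repo):

```sh
# Hypothetical TargetGroupBinding for the HTTP listener; all names and the
# target group ARN below are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: dev-nginx-http
  namespace: dev-ingress-nginx
spec:
  serviceRef:
    name: ingress-nginx-controller
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/dev-http/0123456789abcdef
EOF
```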

The script handles two Amplify paths depending on whether the Infrastructure:Partition pair is in the AMPLIFY_DEPLOY_TARGETS list.

Full Amplify targets (SandboxKyle002:kyle, Alpha001:demo):

  1. amplify.cfn.yaml — Creates the Amplify app, IAM service role, compute role, and wires environment variables from CloudFormation exports and Secrets Manager references.
  2. amplifyBranch.cfn.yaml — Creates the branch resource, domain association, and optionally a PR preview branch (enabled only for dev).
  3. Compute role workaround — Works around aws-cdk#34992 by calling aws amplify update-app if the compute role ARN drifts.
  4. Initial deployment — Triggers an Amplify RELEASE job (see the sketch below).

Auto-build is disabled for demo; PR preview is enabled only for dev.
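
The RELEASE job in step 4 presumably corresponds to a CLI call along these lines (the app ID and branch are illustrative):

```sh
# Trigger an Amplify RELEASE job for the deployed branch.
aws amplify start-job \
  --app-id "${APP_ID}" \
  --branch-name main \
  --job-type RELEASE
```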

Manual Amplify targets (all other partitions: Alpha001:prod, Alpha002:dev, Alpha002:stage):

  1. amplifyComputeRole.cfn.yaml — Creates only the IAM compute role (SecretsManager, Cognito, Logging).
  2. Attaches the role to the existing Amplify app via aws amplify update-app.
  3. Merges INFRASTRUCTURE, PARTITION, NEXT_PUBLIC_INFRASTRUCTURE, NEXT_PUBLIC_PARTITION, and (if available) CLOUDFRONT_KEY_PAIR_ID into the app’s existing environment variables.
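
One plausible shape for that merge, assuming jq is available (the exact mechanism in amm.sh may differ):

```sh
# Read the app's current environment variables, merge in the new keys,
# and write the result back. APP_ID, infrastructure, and partition are
# assumed to be set earlier in the script.
CURRENT_VARS="$(aws amplify get-app --app-id "${APP_ID}" \
  --query 'app.environmentVariables' --output json)"
MERGED_VARS="$(jq --arg infra "${infrastructure}" --arg part "${partition}" \
  '. + {INFRASTRUCTURE: $infra, PARTITION: $part,
        NEXT_PUBLIC_INFRASTRUCTURE: $infra, NEXT_PUBLIC_PARTITION: $part}' \
  <<<"${CURRENT_VARS}")"
aws amplify update-app --app-id "${APP_ID}" --environment-variables "${MERGED_VARS}"
```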

[PlantUML diagram: amm.sh deployment flow]

The flow diagram above contains several branching points. This section documents the exact logic behind each decision.

“Running locally?” (credential resolution)


Evaluated by checking the GITHUB_ACTIONS environment variable and AWS_DEFAULT_PROFILE:

```sh
if [[ "${GITHUB_ACTIONS:-}" != "true" && (! -v AWS_DEFAULT_PROFILE || -z "${AWS_DEFAULT_PROFILE}") ]]; then
  # Local path: resolve secrets from 1Password, set AWS_DEFAULT_PROFILE
fi
```

When GITHUB_ACTIONS=true, the script assumes all secrets are already present in the environment (injected by the workflow’s env block) and skips 1Password resolution entirely. The aws sso login calls throughout the script are also gated on this variable — they are no-ops in CI.
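
That gating presumably looks something like this (the function name is illustrative):

```sh
# Skip interactive SSO login when running in GitHub Actions.
maybe_sso_login() {
  if [[ "${GITHUB_ACTIONS:-}" != "true" ]]; then
    aws sso login --profile "${AWS_PROFILE}"
  fi
}
```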

When AWS_DEFAULT_PROFILE is already set (even outside CI), the script also skips credential resolution, allowing operators to pre-configure their environment.

“Full Amplify target?” (Amplify deployment path)


The script maintains a hardcoded list of Infrastructure:Partition pairs that receive full Amplify deployment (app creation, branch, domain, initial job):

```sh
AMPLIFY_DEPLOY_TARGETS=("SandboxKyle002:kyle" "Alpha001:demo")
```

The check is a substring match against this array:

```sh
amplify_target="${infrastructure}:${partition}"
if [[ " ${AMPLIFY_DEPLOY_TARGETS[*]} " == *" ${amplify_target} "* ]]; then
  # Full path: deploy amplify.cfn.yaml + amplifyBranch.cfn.yaml + workaround + initial job
else
  # Manual path: deploy amplifyComputeRole.cfn.yaml + attach role + merge env vars
fi
```

All other partitions (Alpha001:prod, Alpha002:dev, Alpha002:stage) follow the “manual” path — they have Amplify apps created outside this script (e.g., via the AWS console or a separate process), and amm.sh only manages the compute role and environment variables.

Within the full Amplify path, two boolean flags are derived from the partition name:

| Flag | Default | Exception | Rationale |
| --- | --- | --- | --- |
| enable_auto_build | true | false for demo | Demo deployments are triggered manually to control when changes go live |
| enable_pr_preview | false | true for dev | Only the dev partition creates a secondary main branch resource to enable Amplify PR preview builds |

The script uses two associative arrays to map each Infrastructure:Partition pair to its GitHub repository and branch:

```sh
declare -A AMPLIFY_APP_REPOS=(
  [SandboxKyle002:kyle]="Arda-cards/kyle-frontend-app"
  [Alpha001:demo]="Arda-cards/arda-frontend-app"
  [Alpha002:dev]="Arda-cards/arda-frontend-app"
  [Alpha002:stage]="Arda-cards/arda-frontend-app"
  [Alpha001:prod]="Arda-cards/arda-frontend-app"
)

declare -A AMPLIFY_BRANCH_NAMES=(
  [dev]="main" [stage]="main" [demo]="main" [prod]="main" [kyle]="main"
)
```

Currently all partitions deploy the main branch. The AMPLIFY_APP_REPOS map allows different partitions to point at different frontend repositories (e.g., kyle uses a separate fork).

After deploying the Amplify app and branch stacks, the script checks whether the computeRoleArn on the live Amplify app matches the CloudFormation export. This works around aws-cdk#34992 where CloudFormation silently fails to set the property:

```sh
COMPUTE_ROLE_ARN_VALUE="$(aws amplify get-app --app-id "${APP_ID}" --query "app.computeRoleArn" --output text)"
if [[ "${COMPUTE_ROLE_ARN_VALUE}" != "${COMPUTE_ROLE_ARN}" ]]; then
  aws amplify update-app --app-id "${APP_ID}" --compute-role-arn "${COMPUTE_ROLE_ARN}"
fi
```

This is a conditional fix — it only calls update-app when there is actual drift.

ARDA_API_KEY resolution (per-partition, local only)


Inside the partition loop, the API key is resolved only when running locally and the environment variable is not already set:

```sh
if [[ "${GITHUB_ACTIONS:-}" != "true" && -z "${ARDA_API_KEY:-}" ]]; then
  ARDA_API_KEY="$(resolve_arda_api_key "${partition}")"
fi
```

The resolve_arda_api_key function looks up the partition name in PARTITION_VAULT_MAP and calls op read against the corresponding 1Password vault. In CI, ARDA_API_KEY is injected per-partition by the workflow using the secrets[format('ARDA_API_KEY_{0}', partition)] pattern.
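
A sketch of what resolve_arda_api_key plausibly does (the 1Password item and field names are assumptions):

```sh
# Look up the partition's vault and read the API key from 1Password.
resolve_arda_api_key() {
  local partition="$1"
  local vault="${PARTITION_VAULT_MAP[${partition}]}"
  op read "op://${vault}/ARDA-API-KEY/credential"
}
```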

CloudFront key pair ID (manual Amplify path)


When merging environment variables for manually-created Amplify apps, the script conditionally includes CLOUDFRONT_KEY_PAIR_ID only if the CloudFormation export exists:

```sh
KEY_PAIR_ID="$(aws cloudformation list-exports --output text \
  --query "Exports[?Name=='${infrastructure}-${partition}-API-ImageCdnSigningKeyId'].Value")"
if [[ -n "${KEY_PAIR_ID}" && "${KEY_PAIR_ID}" != "None" ]]; then
  # Include CLOUDFRONT_KEY_PAIR_ID in the merged env vars
fi
```

This handles partitions that do not have the ImageStorageStack deployed (e.g., early-stage environments without image CDN support).

The script records structured JSON to CloudWatch throughout the run:

[PlantUML diagram: CloudWatch logging flow]

Troubleshooting

CDK Bootstrap Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| CDKToolkit stack in ROLLBACK_COMPLETE | Previous bootstrap failed mid-way | Delete the CDKToolkit stack manually, then re-run |
| already exists error during bootstrap | Stale CDKToolkit from a different bootstrap version | Delete and re-bootstrap, or run cdk bootstrap --force |

CloudFormation Stack Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Stack in ROLLBACK_COMPLETE | A previous create failed | Delete the stack in the CloudFormation console, then re-run |
| UPDATE_ROLLBACK_COMPLETE | A previous update failed and rolled back | The stack is usable; a re-run will attempt another update |
| Resource already exists | A RETAIN-policy resource survived a rollback | Manually delete the resource (follow the RETAIN cleanup order in the infrastructure repo's knowledge-base/cdk-construct-patterns.md), then re-run |
| Cross-stack export in use | Trying to remove an export consumed by another stack | Deploy the consuming stack first to remove the dependency |

Helm Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| helm upgrade --install times out | Pods not reaching Ready state | Check kubectl get pods -n <namespace>; inspect events and logs |
| --atomic rollback | Helm auto-rolled back a failed release | Inspect helm history <release> -n <namespace> for error details |
| ServiceAccount annotation mismatch | IAM role ARN changed but Helm didn't update it | Delete the SA manually (kubectl delete sa -n <namespace> <sa-name>), then re-run |

Kubernetes / Target Group Binding Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| TargetGroupBinding stuck in Progressing | LBC not running or target group ARN invalid | Verify LBC pods are healthy; check that the ARN matches the NLB export |
| Stale bindings not deleted | The script only deletes bindings with non-matching ARNs | If the ARN matches but the binding is broken, delete it manually with kubectl delete tgb |

Amplify Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Vendor response doesn't contain <attribute> | CloudFormation export not yet available | Ensure the Partition CDK stacks completed; re-run |
| Compute role not attached | aws-cdk#34992 — CloudFormation does not set computeRoleArn | The script works around this; if it persists, run aws amplify update-app manually |
| Initial job fails | Build error in the frontend app | Check the Amplify console build logs; this is a frontend issue, not an infrastructure issue |

Authentication Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| op read fails | 1Password CLI not authenticated | Run eval $(op signin) |
| aws sso login hangs | Browser-based SSO flow not completing | Complete the SSO flow in the browser; check ~/.aws/config for the profile |
| ExpiredTokenException | SSO session expired mid-run | The script calls aws sso login before each phase; if it still expires, the run took too long — re-run |
| OIDC role assumption fails (CI) | IAM trust policy doesn't include the GitHub repo/branch | Update the trust policy on the <Infra>-I-GitHubActionInfrastructure role |

Every run logs a structured JSON entry to CloudWatch (/arda/oam/deployments). The entry includes:

  • status: succeeded, failed, or interrupted
  • exit_code: the process exit code
  • git.branch, git.commit, git.worktree_dirty: the exact code version deployed
  • aws_profile, aws_region: the AWS identity used
  • infrastructure, partitions: what was targeted
  • version: the git tag matching the deployed commit (if any)
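
A representative entry might look like this (all field values are illustrative):

```json
{
  "status": "succeeded",
  "exit_code": 0,
  "git": { "branch": "main", "commit": "abc1234", "worktree_dirty": false },
  "aws_profile": "Admin-Alpha002",
  "aws_region": "us-east-1",
  "infrastructure": "Alpha002",
  "partitions": ["dev", "stage"],
  "version": "v1.4.2"
}
```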

Query recent deployments:

```sh
aws logs filter-log-events \
  --log-group-name /arda/oam/deployments \
  --start-time $(date -u -d '24 hours ago' +%s000) \
  --filter-pattern '{ $.status = "failed" }'
```

(The -d flag assumes GNU date; on BSD/macOS, use date -u -v-24H +%s000 instead.)
Adding a New Infrastructure

  1. Create CDK instance files in src/main/cdk/instances/<NewInfra>/infra.ts and one file per partition.
  2. Add the Infrastructure to RUNTIME_ACCOUNTS in src/main/cdk/platform/aws-configuration.ts.
  3. Add the partition expansion to the all case in amm.sh.
  4. Add the Infrastructure:Partition entries to AMPLIFY_DEPLOY_TARGETS, AMPLIFY_BRANCH_NAMES, and AMPLIFY_APP_REPOS if Amplify is needed.
  5. Add the partition → 1Password vault mapping to PARTITION_VAULT_MAP.
  6. Add the new environment to the amm.yml workflow’s options list.
  7. Create the IAM role <NewInfra>-I-GitHubActionInfrastructure with OIDC trust for GitHub Actions.

Adding a New Partition to an Existing Infrastructure

  1. Create src/main/cdk/instances/<Infra>/<partition>.ts.
  2. Update the all expansion in amm.sh.
  3. Add the PARTITION_VAULT_MAP entry.
  4. Add AMPLIFY_BRANCH_NAMES and AMPLIFY_APP_REPOS entries.
  5. If the partition should auto-deploy via Amplify, add it to AMPLIFY_DEPLOY_TARGETS.
  6. Add <Infra>/<partition> to the amm.yml workflow options and the ci.yaml synth matrix.
  7. Create the ARDA_API_KEY_<partition> GitHub Actions secret.
Adding a New Partition Secret

  1. Add a parameter to partitionSecrets.cfn.yaml with NoEcho: true.
  2. Add the corresponding AWS::SecretsManager::Secret resource and output/export.
  3. In amm.sh:
    • Add the op read call in the local-credentials block.
    • Pass it to the aws cloudformation deploy --parameter-overrides for the secrets stack.
  4. In amm.yml: add the GitHub Actions secret reference to the env block.
  5. If the secret is consumed by Amplify, add it to the amplify.cfn.yaml EnvironmentVariables.
Best Practices

  • Test with cdk synth first. The CI pipeline runs synth for every Infrastructure/Partition combination. Run npm run synth:named -- <Infra>/<target> locally before modifying amm.sh.
  • Keep Helm chart versions pinned. All helm upgrade --install calls specify --version. Bump versions deliberately and test in a sandbox first.
  • Use --atomic for Helm. All Helm installs use --atomic, which auto-rolls back on failure. Do not remove this flag.
  • Respect the deployment order. Infrastructure must complete before any Partition. Secrets must be deployed before Amplify (Amplify references secret ARNs via CloudFormation exports).
  • Do not skip aws sso login. The script calls it before each phase to handle session expiry on long runs. Removing these calls will cause failures on multi-partition deployments.
  • Preserve the EXIT trap. The log_run_completion trap ensures every run is logged to CloudWatch, even on failure. If you restructure the script, ensure the trap remains installed early and covers all exit paths.
  • Idempotency. CloudFormation deploy and Helm upgrade --install are idempotent — they report “no changes” for already-up-to-date resources. However, the script re-executes every step from the beginning on each run; there is no --from-step resume capability. Re-runs are safe but not instant — expect infrastructure and Helm steps to repeat. See Failure Mode Analysis for known side effects on re-run (e.g., unconditional amplify start-job).