Deployment Orchestration (amm.sh)

The amm.sh script (“Arda Money Making”) is the top-level deployment orchestrator for the Arda platform. It provisions an entire Infrastructure and one or more Partitions in a single run, coordinating CDK, CloudFormation, Helm, and kubectl commands in sequence.

```sh
./amm.sh [--profile <aws_profile>] [--region <aws_region>] <infrastructure> <partition...>
```

| Argument | Required | Description |
| --- | --- | --- |
| --profile <profile> | No | Sets AWS_PROFILE. Defaults to Admin-<infrastructure> via AWS_DEFAULT_PROFILE when running locally. |
| --region <region> | No | Sets AWS_REGION. When omitted, the region is inferred from the AWS profile. |
| <infrastructure> | Yes | The Infrastructure name: Alpha001, Alpha002, or SandboxKyle002. |
| <partition...> | Yes | One or more Partition names, or the keyword all. |

The all keyword expands to a predefined list per Infrastructure:

| Infrastructure | all expands to |
| --- | --- |
| Alpha001 | demo, prod |
| Alpha002 | dev, stage |
| SandboxKyle002 | kyle |
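
A minimal sketch of how this expansion might be implemented in amm.sh (variable names are assumptions, not taken from the script):

```sh
# Hypothetical expansion of the "all" keyword; variable names are illustrative.
if [[ "${1}" == "all" ]]; then
  case "${infrastructure}" in
    Alpha001)       partitions=(demo prod) ;;
    Alpha002)       partitions=(dev stage) ;;
    SandboxKyle002) partitions=(kyle) ;;
  esac
fi
```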

Examples:

```sh
# Deploy Alpha002 infrastructure + dev partition (local, with SSO)
./amm.sh Alpha002 dev

# Deploy Alpha001 with both partitions
./amm.sh Alpha001 all

# Explicit profile and region
./amm.sh --profile Admin-Alpha002 --region us-east-1 Alpha002 dev stage
```

The amm.yml workflow provides a workflow_dispatch trigger with a dropdown of Infrastructure/Partition combinations:

  • Alpha001/demo, Alpha001/prod
  • Alpha002/dev (default), Alpha002/stage
  • SandboxKyle002/kyle

The workflow:

  1. Splits the environment input into infrastructure and partition.
  2. Fetches AWS account ID and region from the purpose-configuration-action using a locator URL.
  3. Assumes the IAM role <Infrastructure>-I-GitHubActionInfrastructure via OIDC (id-token: write).
  4. Runs npm install, then invokes ./amm.sh <infrastructure> <partition>.

Secrets are passed as environment variables — the script detects GITHUB_ACTIONS=true and skips the local 1Password / SSO login paths.

The script uses 1Password CLI (op read) to resolve secrets at runtime. The operator must be signed into 1Password and have access to the following vaults:

| Vault | Secrets | Used for |
| --- | --- | --- |
| Arda-SystemsOAM | Amplify_GitHub_AccessToken, GPR-Read token | Amplify GitHub integration, GitHub Packages auth |
| Arda-ProdOAM | ARDA-SIGNUP-KEY | HubSpot signup authentication |
| Arda-StageOAM | HubSpot/client_secret, HubSpot/private_access_token, Pylon/widget_secret | Third-party integrations |
| Per-partition vault | ARDA-API-KEY | Partition API key |

The per-partition vault is resolved via the PARTITION_VAULT_MAP:

| Partition | 1Password Vault |
| --- | --- |
| dev | Arda-DevOAM |
| stage | Arda-StageOAM |
| demo | Arda-DemoOAM |
| prod | Arda-SystemsOAM |
| kyle | Arda-SandboxKyle |
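
In bash this is presumably an associative array along these lines (a sketch; the actual declaration in amm.sh may differ):

```sh
# Sketch of the partition-to-vault map as a bash associative array.
declare -A PARTITION_VAULT_MAP=(
  [dev]="Arda-DevOAM"
  [stage]="Arda-StageOAM"
  [demo]="Arda-DemoOAM"
  [prod]="Arda-SystemsOAM"
  [kyle]="Arda-SandboxKyle"
)
```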

AWS authentication uses SSO — the script calls aws sso login before the Infrastructure step and again before each Partition step.

All secrets are stored as GitHub Actions repository secrets:

| Secret | Value |
| --- | --- |
| AMPLIFY_GITHUB_ACCESSTOKEN | GitHub PAT for Amplify source access |
| ARDA_API_KEY_<partition> | Per-partition API key (e.g., ARDA_API_KEY_dev) |
| ARDA_SIGNUP_KEY_KYLE | HubSpot signup key |
| HUBSPOT_CLIENT_KEY_STAGE | HubSpot client secret |
| HUBSPOT_PAT_STAGE | HubSpot private access token |
| PYLON_WIDGET_KEY_STAGE | Pylon widget secret |
| GPR_READ_KEY | GitHub Packages read token |

The IAM role is assumed via OIDC federation (role-to-assume), not long-lived credentials.

The script does not have a --dry-run flag. Each tool it orchestrates has its own preview mechanism that must be invoked individually.

synth generates CloudFormation templates without deploying. It does not require AWS credentials and runs as part of CI on every push and PR (the synth-each-cdk-app matrix job in ci.yaml).

```sh
# Synth a specific Infrastructure or Partition target
npm run synth:named -- Alpha002/infra
npm run synth:named -- Alpha002/dev
```

diff compares the synthesized templates against the currently deployed stacks. This requires valid AWS credentials.

```sh
npx cdk diff \
  --app 'npx ts-node -r tsconfig-paths/register --prefer-ts-exts src/main/cdk/instances/Alpha002/infra.ts'
```

For the raw CloudFormation templates (src/main/cfn/*.cfn.yaml), use --no-execute-changeset to create and inspect a changeset without applying it:

```sh
aws cloudformation deploy \
  --stack-name Alpha002-dev-Secrets \
  --template-file src/main/cfn/partitionSecrets.cfn.yaml \
  --no-execute-changeset \
  --parameter-overrides Infrastructure=Alpha002 Partition=dev \
    ArdaApiKey=... ArdaSignupKey=... HubspotClientKey=... HubspotPAT=... PylonWidgetKey=...
```

The changeset appears in the CloudFormation console for review. Delete it after inspection to avoid stale changesets blocking future deploys.
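
The cleanup can also be done from the CLI (stack name repeated from the example above; the changeset name is whatever the deploy command printed):

```sh
# List changesets for the stack, then delete the one you inspected.
aws cloudformation list-change-sets --stack-name Alpha002-dev-Secrets
aws cloudformation delete-change-set \
  --stack-name Alpha002-dev-Secrets \
  --change-set-name <changeset-name>
```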

For Helm releases, --dry-run previews the rendered release:

```sh
helm upgrade --install --dry-run \
  --version 4.13.0 \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace dev-ingress-nginx \
  --set "controller.ingressClass=dev-nginx" \
  ingress-nginx ingress-nginx
```

This renders the manifests and validates them against the cluster API without applying changes.

For raw Kubernetes manifests:

```sh
kubectl apply --dry-run=client -f <manifest>
```

Use --dry-run=server for server-side validation (requires cluster connectivity).

The ci.yaml workflow synthesizes every Infrastructure/Partition combination in a matrix:

Alpha001/infra, Alpha001/demo, Alpha001/prod,
Alpha002/infra, Alpha002/dev, Alpha002/stage,
SandboxKyle002/infra, SandboxKyle002/kyle

This catches CDK compilation errors, construct misconfiguration, and missing exports before any deployment. The all-synth-results job gates the pipeline — all targets must synth successfully for the build to pass.

Infrastructure Phase (runs once per invocation)

  • Green-field: The AWS account must exist. The script runs cdk bootstrap automatically, but a pre-existing CDKToolkit stack from a different bootstrap version may require manual cleanup.
  • Upgrade: All prior CloudFormation stacks from the Infrastructure layer must be in a stable state (CREATE_COMPLETE, UPDATE_COMPLETE). ROLLBACK_COMPLETE stacks must be deleted manually before re-running.
| Step | Tool | Resources |
| --- | --- | --- |
| CloudWatch logging | CloudFormation (cloudWatch.cfn.yaml) | Log group /arda/oam/deployments with 14-day retention; log stream for the current date |
| CDK bootstrap | cdk bootstrap | CDKToolkit stack (S3 staging bucket, IAM roles) |
| Infrastructure CDK | cdk deploy (all stacks via instances/<Infra>/infra.ts) | VPC, EKS cluster, IAM roles, Route53 hosted zones, NLBs, security groups — everything in the Infrastructure layer |
| EKS kubeconfig | aws eks update-kubeconfig | Local ~/.kube/config entry for the cluster |
| Fluent Bit logging | kubectl apply | aws-observability namespace, aws-logging ConfigMap (Fluent Bit → CloudWatch /<infra>/eks-logs) |
| AWS Load Balancer Controller | Helm (aws-load-balancer-controller v1.13.4) | Namespace aws-load-balancer-controller, LBC deployment, ServiceAccount with IAM role annotation |
| External Secrets Operator | Helm (external-secrets v0.19.1) | Namespace external-secrets, ESO deployment (cluster-scoped CRDs disabled) |

Partition Phase (repeats for each partition)

  • Green-field: The Infrastructure phase must have completed successfully. CloudFormation exports from the Infrastructure layer (e.g., <Infra>-I-EksClusterName, NLB target group ARNs) must exist.
  • Upgrade: Partition CloudFormation stacks must be in a stable state. For Amplify targets, the <Infra>-<Part>-Amplify stack must exist before the branch/domain stack can be deployed.
| Step | Tool | Resources |
| --- | --- | --- |
| Partition CDK | cdk deploy (via instances/<Infra>/<partition>.ts) | Cognito user pools, API Gateway, DynamoDB tables, S3 buckets, Lambda functions, CloudFront distributions — everything in the Partition layer |
| nginx Ingress | Helm (ingress-nginx v4.13.0) | Namespace <partition>-ingress-nginx, nginx controller (2 replicas, ClusterIP), IngressClass <partition>-nginx |
| Target Group Bindings | kubectl apply | TargetGroupBinding CRs linking nginx to the NLB target groups (HTTP port 80, HTTPS port 443); stale bindings are deleted (see the sketch below) |
| Partition secrets | CloudFormation (partitionSecrets.cfn.yaml) | 5 Secrets Manager secrets: ArdaApiKey, ArdaSignupSecretKey, HubspotClientSecret, HubspotPrivateAccessToken, PylonWidgetSecret |
| Amplify (full targets) | CloudFormation | See Amplify deployment |
| Amplify (manual targets) | CloudFormation + AWS CLI | See Amplify deployment |
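
A TargetGroupBinding CR of the kind the bindings step applies might look like this (a sketch only: the name, namespace, service, and ARN are illustrative, not taken from the repo):

```sh
# Hypothetical TargetGroupBinding for the HTTP listener; all names and the
# target group ARN below are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: elbv2.k8s.aws/v1beta1
kind: TargetGroupBinding
metadata:
  name: dev-nginx-http
  namespace: dev-ingress-nginx
spec:
  serviceRef:
    name: ingress-nginx-controller
    port: 80
  targetGroupARN: arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/dev-http/0123456789abcdef
EOF
```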

The script handles two Amplify paths depending on whether the Infrastructure:Partition pair is in the AMPLIFY_DEPLOY_TARGETS list.

Full Amplify targets (SandboxKyle002:kyle, Alpha001:demo):

  1. amplify.cfn.yaml — Creates the Amplify app, IAM service role, compute role, and wires environment variables from CloudFormation exports and Secrets Manager references.
  2. amplifyBranch.cfn.yaml — Creates the branch resource, domain association, and optionally a PR preview branch (enabled only for dev).
  3. Compute role workaround — Works around aws-cdk#34992 by calling aws amplify update-app if the compute role ARN drifts.
  4. Initial deployment — Triggers an Amplify RELEASE job (see the sketch below).

Auto-build is disabled for demo; PR preview is enabled only for dev.
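
The RELEASE job in step 4 presumably corresponds to a CLI call along these lines (the app ID and branch are illustrative):

```sh
# Trigger an Amplify RELEASE job for the deployed branch.
aws amplify start-job \
  --app-id "${APP_ID}" \
  --branch-name main \
  --job-type RELEASE
```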

Manual Amplify targets (all other partitions: Alpha001:prod, Alpha002:dev, Alpha002:stage):

  1. amplifyComputeRole.cfn.yaml — Creates only the IAM compute role (SecretsManager, Cognito, Logging).
  2. Attaches the role to the existing Amplify app via aws amplify update-app.
  3. Merges INFRASTRUCTURE, PARTITION, NEXT_PUBLIC_INFRASTRUCTURE, NEXT_PUBLIC_PARTITION, and (if available) CLOUDFRONT_KEY_PAIR_ID into the app’s existing environment variables.
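
One plausible shape for that merge, assuming jq is available (the exact mechanism in amm.sh may differ):

```sh
# Read the app's current environment variables, merge in the new keys,
# and write the result back. APP_ID, infrastructure, and partition are
# assumed to be set earlier in the script.
CURRENT_VARS="$(aws amplify get-app --app-id "${APP_ID}" \
  --query 'app.environmentVariables' --output json)"
MERGED_VARS="$(jq --arg infra "${infrastructure}" --arg part "${partition}" \
  '. + {INFRASTRUCTURE: $infra, PARTITION: $part,
        NEXT_PUBLIC_INFRASTRUCTURE: $infra, NEXT_PUBLIC_PARTITION: $part}' \
  <<<"${CURRENT_VARS}")"
aws amplify update-app --app-id "${APP_ID}" --environment-variables "${MERGED_VARS}"
```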

[PlantUML diagram: amm.sh deployment flow]

The flow diagram above contains several branching points. This section documents the exact logic behind each decision.

“Running locally?” (credential resolution)


Evaluated by checking the GITHUB_ACTIONS environment variable and AWS_DEFAULT_PROFILE:

```sh
if [[ "${GITHUB_ACTIONS:-}" != "true" && (! -v AWS_DEFAULT_PROFILE || -z "${AWS_DEFAULT_PROFILE}") ]]; then
  # Local path: resolve secrets from 1Password, set AWS_DEFAULT_PROFILE
fi
```

When GITHUB_ACTIONS=true, the script assumes all secrets are already present in the environment (injected by the workflow’s env block) and skips 1Password resolution entirely. The aws sso login calls throughout the script are also gated on this variable — they are no-ops in CI.
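
That gating presumably looks something like this (the function name is illustrative):

```sh
# Skip interactive SSO login when running in GitHub Actions.
maybe_sso_login() {
  if [[ "${GITHUB_ACTIONS:-}" != "true" ]]; then
    aws sso login --profile "${AWS_PROFILE}"
  fi
}
```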

When AWS_DEFAULT_PROFILE is already set (even outside CI), the script also skips credential resolution, allowing operators to pre-configure their environment.

“Full Amplify target?” (Amplify deployment path)


The script maintains a hardcoded list of Infrastructure:Partition pairs that receive full Amplify deployment (app creation, branch, domain, initial job):

```sh
AMPLIFY_DEPLOY_TARGETS=("SandboxKyle002:kyle" "Alpha001:demo")
```

The check is a substring match against this array:

```sh
amplify_target="${infrastructure}:${partition}"
if [[ " ${AMPLIFY_DEPLOY_TARGETS[*]} " == *" ${amplify_target} "* ]]; then
  # Full path: deploy amplify.cfn.yaml + amplifyBranch.cfn.yaml + workaround + initial job
else
  # Manual path: deploy amplifyComputeRole.cfn.yaml + attach role + merge env vars
fi
```

All other partitions (Alpha001:prod, Alpha002:dev, Alpha002:stage) follow the “manual” path — they have Amplify apps created outside this script (e.g., via the AWS console or a separate process), and amm.sh only manages the compute role and environment variables.

Within the full Amplify path, two boolean flags are derived from the partition name:

| Flag | Default | Exception | Rationale |
| --- | --- | --- | --- |
| enable_auto_build | true | false for demo | Demo deployments are triggered manually to control when changes go live |
| enable_pr_preview | false | true for dev | Only the dev partition creates a secondary main branch resource to enable Amplify PR preview builds |

The script uses two associative arrays to map each Infrastructure:Partition pair to its GitHub repository and branch:

```sh
declare -A AMPLIFY_APP_REPOS=(
  [SandboxKyle002:kyle]="Arda-cards/kyle-frontend-app"
  [Alpha001:demo]="Arda-cards/arda-frontend-app"
  [Alpha002:dev]="Arda-cards/arda-frontend-app"
  [Alpha002:stage]="Arda-cards/arda-frontend-app"
  [Alpha001:prod]="Arda-cards/arda-frontend-app"
)

declare -A AMPLIFY_BRANCH_NAMES=(
  [dev]="main" [stage]="main" [demo]="main" [prod]="main" [kyle]="main"
)
```

Currently all partitions deploy the main branch. The AMPLIFY_APP_REPOS map allows different partitions to point at different frontend repositories (e.g., kyle uses a separate fork).

After deploying the Amplify app and branch stacks, the script checks whether the computeRoleArn on the live Amplify app matches the CloudFormation export. This works around aws-cdk#34992 where CloudFormation silently fails to set the property:

```sh
COMPUTE_ROLE_ARN_VALUE="$(aws amplify get-app --app-id "${APP_ID}" --query "app.computeRoleArn" --output text)"
if [[ "${COMPUTE_ROLE_ARN_VALUE}" != "${COMPUTE_ROLE_ARN}" ]]; then
  aws amplify update-app --app-id "${APP_ID}" --compute-role-arn "${COMPUTE_ROLE_ARN}"
fi
```

This is a conditional fix — it only calls update-app when there is actual drift.

ARDA_API_KEY resolution (per-partition, local only)


Inside the partition loop, the API key is resolved only when running locally and the environment variable is not already set:

```sh
if [[ "${GITHUB_ACTIONS:-}" != "true" && -z "${ARDA_API_KEY:-}" ]]; then
  ARDA_API_KEY="$(resolve_arda_api_key "${partition}")"
fi
```

The resolve_arda_api_key function looks up the partition name in PARTITION_VAULT_MAP and calls op read against the corresponding 1Password vault. In CI, ARDA_API_KEY is injected per-partition by the workflow using the secrets[format('ARDA_API_KEY_{0}', partition)] pattern.
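
A sketch of what resolve_arda_api_key plausibly does (the 1Password item and field names are assumptions):

```sh
# Look up the partition's vault and read the API key from 1Password.
resolve_arda_api_key() {
  local partition="$1"
  local vault="${PARTITION_VAULT_MAP[${partition}]}"
  op read "op://${vault}/ARDA-API-KEY/credential"
}
```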

CloudFront key pair ID (manual Amplify path)


When merging environment variables for manually-created Amplify apps, the script conditionally includes CLOUDFRONT_KEY_PAIR_ID only if the CloudFormation export exists:

```sh
KEY_PAIR_ID="$(aws cloudformation list-exports --output text \
  --query "Exports[?Name=='${infrastructure}-${partition}-API-ImageCdnSigningKeyId'].Value")"
if [[ -n "${KEY_PAIR_ID}" && "${KEY_PAIR_ID}" != "None" ]]; then
  # Include CLOUDFRONT_KEY_PAIR_ID in the merged env vars
fi
```

This handles partitions that do not have the ImageStorageStack deployed (e.g., early-stage environments without image CDN support).

The script records structured JSON to CloudWatch throughout the run:

[PlantUML diagram: CloudWatch logging flow]

Troubleshooting

CDK Bootstrap Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| CDKToolkit stack in ROLLBACK_COMPLETE | Previous bootstrap failed mid-way | Delete the CDKToolkit stack manually, then re-run |
| already exists error during bootstrap | Stale CDKToolkit from a different bootstrap version | Delete and re-bootstrap, or run cdk bootstrap --force |

CloudFormation Stack Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Stack in ROLLBACK_COMPLETE | A previous create failed | Delete the stack in the CloudFormation console, then re-run |
| UPDATE_ROLLBACK_COMPLETE | A previous update failed and rolled back | The stack is usable; a re-run will attempt another update |
| Resource already exists | A RETAIN-policy resource survived a rollback | Manually delete the resource (follow the RETAIN cleanup order in the infrastructure repo's knowledge-base/cdk-construct-patterns.md), then re-run |
| Cross-stack export in use | Trying to remove an export consumed by another stack | Deploy the consuming stack first to remove the dependency |

Helm Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| helm upgrade --install times out | Pods not reaching Ready state | Check kubectl get pods -n <namespace>; inspect events and logs |
| --atomic rollback | Helm auto-rolled back a failed release | Inspect helm history <release> -n <namespace> for error details |
| ServiceAccount annotation mismatch | IAM role ARN changed but Helm didn't update it | Delete the SA manually (kubectl delete sa -n <namespace> <sa-name>), then re-run |

Kubernetes / Target Group Binding Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| TargetGroupBinding stuck in Progressing | LBC not running or target group ARN invalid | Verify LBC pods are healthy; check that the ARN matches the NLB export |
| Stale bindings not deleted | The script only deletes bindings with non-matching ARNs | If the ARN matches but the binding is broken, delete it manually with kubectl delete tgb |

Amplify Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Vendor response doesn't contain <attribute> | CloudFormation export not yet available | Ensure the Partition CDK stacks completed; re-run |
| Compute role not attached | aws-cdk#34992 — CloudFormation does not set computeRoleArn | The script works around this; if it persists, run aws amplify update-app manually |
| Initial job fails | Build error in the frontend app | Check the Amplify console build logs; this is a frontend issue, not an infrastructure issue |

Authentication Failures

| Symptom | Cause | Resolution |
| --- | --- | --- |
| op read fails | 1Password CLI not authenticated | Run eval $(op signin) |
| aws sso login hangs | Browser-based SSO flow not completing | Complete the SSO flow in the browser; check ~/.aws/config for the profile |
| ExpiredTokenException | SSO session expired mid-run | The script calls aws sso login before each phase; if it still expires, the run took too long — re-run |
| OIDC role assumption fails (CI) | IAM trust policy doesn't include the GitHub repo/branch | Update the trust policy on the <Infra>-I-GitHubActionInfrastructure role |

Every run logs a structured JSON entry to CloudWatch (/arda/oam/deployments). The entry includes:

  • status: succeeded, failed, or interrupted
  • exit_code: the process exit code
  • git.branch, git.commit, git.worktree_dirty: the exact code version deployed
  • aws_profile, aws_region: the AWS identity used
  • infrastructure, partitions: what was targeted
  • version: the git tag matching the deployed commit (if any)
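
A representative entry might look like this (all field values are illustrative):

```json
{
  "status": "succeeded",
  "exit_code": 0,
  "git": { "branch": "main", "commit": "abc1234", "worktree_dirty": false },
  "aws_profile": "Admin-Alpha002",
  "aws_region": "us-east-1",
  "infrastructure": "Alpha002",
  "partitions": ["dev", "stage"],
  "version": "v1.4.2"
}
```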

Query recent deployments:

```sh
aws logs filter-log-events \
  --log-group-name /arda/oam/deployments \
  --start-time $(date -u -d '24 hours ago' +%s000) \
  --filter-pattern '{ $.status = "failed" }'
```

(The -d flag assumes GNU date; on BSD/macOS, use date -u -v-24H +%s000 instead.)
Adding a New Infrastructure

  1. Create CDK instance files in src/main/cdk/instances/<NewInfra>/infra.ts and one file per partition.
  2. Add the Infrastructure to RUNTIME_ACCOUNTS in src/main/cdk/platform/aws-configuration.ts.
  3. Add the partition expansion to the all case in amm.sh.
  4. Add the Infrastructure:Partition entries to AMPLIFY_DEPLOY_TARGETS, AMPLIFY_BRANCH_NAMES, and AMPLIFY_APP_REPOS if Amplify is needed.
  5. Add the partition → 1Password vault mapping to PARTITION_VAULT_MAP.
  6. Add the new environment to the amm.yml workflow’s options list.
  7. Create the IAM role <NewInfra>-I-GitHubActionInfrastructure with OIDC trust for GitHub Actions.

Adding a New Partition to an Existing Infrastructure

  1. Create src/main/cdk/instances/<Infra>/<partition>.ts.
  2. Update the all expansion in amm.sh.
  3. Add the PARTITION_VAULT_MAP entry.
  4. Add AMPLIFY_BRANCH_NAMES and AMPLIFY_APP_REPOS entries.
  5. If the partition should auto-deploy via Amplify, add it to AMPLIFY_DEPLOY_TARGETS.
  6. Add <Infra>/<partition> to the amm.yml workflow options and the ci.yaml synth matrix.
  7. Create the ARDA_API_KEY_<partition> GitHub Actions secret.
Adding a New Partition Secret

  1. Add a parameter to partitionSecrets.cfn.yaml with NoEcho: true.
  2. Add the corresponding AWS::SecretsManager::Secret resource and output/export.
  3. In amm.sh:
    • Add the op read call in the local-credentials block.
    • Pass it to the aws cloudformation deploy --parameter-overrides for the secrets stack.
  4. In amm.yml: add the GitHub Actions secret reference to the env block.
  5. If the secret is consumed by Amplify, add it to the amplify.cfn.yaml EnvironmentVariables.
Best Practices

  • Test with cdk synth first. The CI pipeline runs synth for every Infrastructure/Partition combination. Run npm run synth:named -- <Infra>/<target> locally before modifying amm.sh.
  • Keep Helm chart versions pinned. All helm upgrade --install calls specify --version. Bump versions deliberately and test in a sandbox first.
  • Use --atomic for Helm. All Helm installs use --atomic, which auto-rolls back on failure. Do not remove this flag.
  • Respect the deployment order. Infrastructure must complete before any Partition. Secrets must be deployed before Amplify (Amplify references secret ARNs via CloudFormation exports).
  • Do not skip aws sso login. The script calls it before each phase to handle session expiry on long runs. Removing these calls will cause failures on multi-partition deployments.
  • Preserve the EXIT trap. The log_run_completion trap ensures every run is logged to CloudWatch, even on failure. If you restructure the script, ensure the trap remains installed early and covers all exit paths.
  • Idempotency. CloudFormation deploy and Helm upgrade --install are idempotent — they report “no changes” for already-up-to-date resources. However, the script re-executes every step from the beginning on each run; there is no --from-step resume capability. Re-runs are safe but not instant — expect infrastructure and Helm steps to repeat. See Failure Mode Analysis for known side effects on re-run (e.g., unconditional amplify start-job).