Inspecting Cluster Logs

Arda services run on EKS clusters. Application logs are available from two sources:

  1. Kubernetes pod logs — live logs from running pods via kubectl logs. These rotate when pods restart and have limited retention.
  2. AWS CloudWatch Logs — persisted logs collected by fluent-bit. These survive pod restarts and are retained according to the log group policy.

When pod logs have rotated off, CloudWatch is the authoritative source.

Cluster                AWS profile   Log group           Region
Alpha001 (Production)  Admin-Alpha1  /Alpha001/eks-logs  us-east-2
Alpha002 (Dev/Stage)   Admin-Alpha2  /Alpha002/eks-logs  us-east-2

Authenticate with the AWS profile for your target cluster:

Terminal window
aws sso login --profile <aws-profile>
Terminal window
export AWS_PROFILE=<aws-profile>
kubectl config use-context <cluster-context>
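
As a worked example for the production cluster, using the profile from the table above (the kubectl context name is an assumption; list yours with kubectl config get-contexts):

Terminal window
aws sso login --profile Admin-Alpha1
export AWS_PROFILE=Admin-Alpha1
# Context name is hypothetical; confirm with: kubectl config get-contexts
kubectl config use-context Alpha001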

Verify connectivity:

Terminal window
kubectl get namespaces
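
Optionally confirm the AWS session as well; this standard STS call prints the account and assumed role:

Terminal window
aws sts get-caller-identity --profile <aws-profile>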

Services are deployed in namespaces following the pattern <env>-<component>. Common namespaces:

Component                          Prod namespace            Dev namespace
Operations (item, kanban, orders)  prod-operations           dev-operations
Item Data Authority                prod-item-data-authority  dev-item-data-authority
Accounts                           prod-accounts             dev-accounts
Ingress                            prod-ingress-nginx        dev-ingress-nginx
Bastion (DB access)                prod-bastion              dev-bastion
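
To sanity-check that the expected namespaces exist on the cluster you are pointed at:

Terminal window
kubectl get namespaces | grep -E '^(prod|dev)-'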

Method 1: Kubernetes Pod Logs

Pod logs are the fastest way to inspect a running service, but they are not persisted across pod restarts.

Terminal window
kubectl get pods -n <namespace>
Terminal window
kubectl logs -n <namespace> <pod-name> --tail=200 -f
Terminal window
kubectl logs -n <namespace> <pod-name> --tail=5000 | grep "<pattern>"
Terminal window
for pod in $(kubectl get pods -n <namespace> -o name); do
  echo "=== $pod ==="
  kubectl logs -n <namespace> "$pod" --tail=5000 2>/dev/null | grep "<pattern>"
done
  • Pod logs are lost when a pod restarts or is replaced; for a container that restarted in place, see the --previous sketch after this list.
  • Log buffer size varies; older entries may have rotated off.
  • For historical logs, use CloudWatch (Method 2 below).
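
One partial exception: if a container restarted in place (rather than the whole pod being replaced), kubectl can still retrieve the previous container's logs:

Terminal window
# Logs from the previous container instance of a restarted pod
kubectl logs -n <namespace> <pod-name> --previous --tail=200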

Method 2: CloudWatch Logs

CloudWatch retains logs collected by fluent-bit from all EKS pods. Each pod’s logs appear as a log stream within the cluster’s log group. Log stream names follow the pattern:

<namespace>.<pod-name>_<namespace>_<container-name>-<container-id>
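
For instance, a stream for a hypothetical orders-api pod in prod-operations would look something like:

prod-operations.orders-api-6f7c9d5b8-x2k4p_prod-operations_orders-api-3fa1b2c4d5e6
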
Terminal window
aws logs describe-log-streams \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-name-prefix "<namespace>" \
  --order-by LastEventTime \
  --descending \
  --limit 10 \
  --output json | jq '.logStreams[] | {name: .logStreamName, lastEvent: (.lastEventTimestamp/1000 | todate)}'

Replace <cluster> with the cluster name (e.g., Alpha001) and <namespace> with the Kubernetes namespace prefix (e.g., prod-operations).
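
Substituting the Alpha001 production values from the tables above gives a runnable example:

Terminal window
aws logs describe-log-streams \
  --profile Admin-Alpha1 \
  --log-group-name "/Alpha001/eks-logs" \
  --log-stream-name-prefix "prod-operations" \
  --order-by LastEventTime \
  --descending \
  --limit 10 \
  --output json | jq -r '.logStreams[].logStreamName'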

Terminal window
aws logs filter-log-events \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-names "<stream-name>" \
  --start-time <epoch-ms> \
  --end-time <epoch-ms> \
  --filter-pattern "<pattern>" \
  --output json > scratch/cw-output.json

The --filter-pattern argument uses CloudWatch filter syntax:

Pattern form          Example                Meaning
Quoted literal        "399a4ea4"             Substring match
Multiple terms (AND)  "PUT" "item"           Both terms present
JSON field match      { $.level = "ERROR" }  Structured log field

Always redirect output to scratch/ — raw CloudWatch output can be very large.

CloudWatch stores fluent-bit structured JSON. Each event’s message field contains a JSON envelope with a log field holding the actual application log line. Use this snippet to extract readable log lines from saved output:

Terminal window
python3 -c "
import json
data = json.load(open('scratch/cw-output.json'))
for event in data.get('events', []):
    msg = event.get('message', '')
    try:
        parsed = json.loads(msg)
        print(parsed.get('log', msg).rstrip())
    except json.JSONDecodeError:
        print(msg.rstrip())
" > scratch/cw-parsed.txt

CloudWatch timestamps are epoch milliseconds. These helpers convert between ISO-8601 timestamps and epoch milliseconds:

Terminal window
# ISO-8601 date to epoch milliseconds
date -d "2026-03-03T17:18:50Z" +%s000                                 # Linux
TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "2026-03-03T17:18:50Z" +%s000  # macOS; TZ=UTC so the literal Z is honored
# Epoch milliseconds to ISO-8601
date -r $((1772558330000/1000)) -u "+%Y-%m-%dT%H:%M:%S UTC"           # macOS
date -d @$((1772558330000/1000)) -u "+%Y-%m-%dT%H:%M:%S UTC"          # Linux

To find the API call that triggered a specific operation, search the ingress controller logs. These include source IP, HTTP method, path, status code, and user-agent — useful for tracing which client initiated a request.

Terminal window
aws logs filter-log-events \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-name-prefix "<env>-ingress-nginx" \
  --start-time <epoch-ms> \
  --end-time <epoch-ms> \
  --filter-pattern "<entity-id-or-path>" \
  --output json > scratch/cw-ingress.json
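
The extraction snippet from earlier applies unchanged to this output; the grep target below is a hypothetical path, shown only to illustrate the final filtering step:

Terminal window
python3 -c "
import json
data = json.load(open('scratch/cw-ingress.json'))
for event in data.get('events', []):
    msg = event.get('message', '')
    try:
        print(json.loads(msg).get('log', msg).rstrip())
    except json.JSONDecodeError:
        print(msg.rstrip())
" > scratch/cw-ingress-parsed.txt
# Hypothetical example: PUT requests against an items endpoint
grep 'PUT /api/items' scratch/cw-ingress-parsed.txt
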
  • Always use --output json with aws logs commands for reliable downstream parsing.
  • Prefer filter-log-events with --filter-pattern over downloading entire log streams — it reduces data transfer and processing time.
  • For time-sensitive investigations, narrow the --start-time/--end-time window as tightly as possible before expanding; see the window helper after this list.
  • When pod logs are unavailable (rotated, pod restarted), fall back to CloudWatch immediately rather than waiting for the pod to restart.
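
For the common case of "the last N minutes", here is a sketch (POSIX shell, same placeholders as above) that computes the window inline:

Terminal window
# Epoch-millisecond window covering the last 30 minutes
END=$(($(date +%s) * 1000))
START=$((END - 30 * 60 * 1000))
aws logs filter-log-events \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-name-prefix "<namespace>" \
  --start-time "$START" \
  --end-time "$END" \
  --filter-pattern "<pattern>" \
  --output json > scratch/cw-output.json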
