Inspecting Cluster Logs

Arda services run on EKS clusters. Application logs are available from two sources:

  1. Kubernetes pod logs — live logs from running pods via kubectl logs. These rotate when pods restart and have limited retention.
  2. AWS CloudWatch Logs — persisted logs collected by fluent-bit. These survive pod restarts and are retained according to the log group policy.

When pod logs have rotated off, CloudWatch is the authoritative source.

Cluster                AWS profile   Log group           Region
Alpha001 (Production)  Admin-Alpha1  /Alpha001/eks-logs  us-east-2
Alpha002 (Dev/Stage)   Admin-Alpha2  /Alpha002/eks-logs  us-east-2

Authenticate with the AWS profile for your target cluster:

Terminal window
aws sso login --profile <aws-profile>
Terminal window
export AWS_PROFILE=<aws-profile>
kubectl config use-context <cluster-context>
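
As a worked example for the production cluster, using the profile from the table above (the kubectl context name is an assumption; list yours with kubectl config get-contexts):

Terminal window
aws sso login --profile Admin-Alpha1
export AWS_PROFILE=Admin-Alpha1
# Context name is hypothetical; confirm with: kubectl config get-contexts
kubectl config use-context Alpha001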

Verify connectivity:

Terminal window
kubectl get namespaces
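
Optionally confirm the AWS session as well; this standard STS call prints the account and assumed role:

Terminal window
aws sts get-caller-identity --profile <aws-profile>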

Services are deployed in namespaces following the pattern <env>-<component>. Common namespaces:

Component                          Prod namespace            Dev namespace
Operations (item, kanban, orders)  prod-operations           dev-operations
Item Data Authority                prod-item-data-authority  dev-item-data-authority
Accounts                           prod-accounts             dev-accounts
Ingress                            prod-ingress-nginx        dev-ingress-nginx
Bastion (DB access)                prod-bastion              dev-bastion
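
To sanity-check that the expected namespaces exist on the cluster you are pointed at:

Terminal window
kubectl get namespaces | grep -E '^(prod|dev)-'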

Method 1: Kubernetes Pod Logs

Pod logs are the fastest way to inspect a running service, but they are not persisted across pod restarts.

Terminal window
kubectl get pods -n <namespace>
Terminal window
kubectl logs -n <namespace> <pod-name> --tail=200 -f
Terminal window
kubectl logs -n <namespace> <pod-name> --tail=5000 | grep "<pattern>"
Terminal window
for pod in $(kubectl get pods -n <namespace> -o name); do
  echo "=== $pod ==="
  kubectl logs -n <namespace> "$pod" --tail=5000 2>/dev/null | grep "<pattern>"
done
  • Pod logs are lost when a pod restarts or is replaced; for a container that restarted in place, see the --previous sketch after this list.
  • Log buffer size varies; older entries may have rotated off.
  • For historical logs, use CloudWatch (Method 2 below).
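
One partial exception: if a container restarted in place (rather than the whole pod being replaced), kubectl can still retrieve the previous container's logs:

Terminal window
# Logs from the previous container instance of a restarted pod
kubectl logs -n <namespace> <pod-name> --previous --tail=200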

Method 2: CloudWatch Logs

CloudWatch retains logs collected by fluent-bit from all EKS pods. Each pod’s logs appear as a log stream within the cluster’s log group. Log stream names follow the pattern:

<namespace>.<pod-name>_<namespace>_<container-name>-<container-id>
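
For instance, a stream for a hypothetical orders-api pod in prod-operations would look something like:

prod-operations.orders-api-6f7c9d5b8-x2k4p_prod-operations_orders-api-3fa1b2c4d5e6
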
Terminal window
aws logs describe-log-streams \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-name-prefix "<namespace>" \
  --order-by LastEventTime \
  --descending \
  --limit 10 \
  --output json | jq '.logStreams[] | {name: .logStreamName, lastEvent: (.lastEventTimestamp/1000 | todate)}'

Replace <cluster> with the cluster name (e.g., Alpha001) and <namespace> with the Kubernetes namespace prefix (e.g., prod-operations).
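
Substituting the Alpha001 production values from the tables above gives a runnable example:

Terminal window
aws logs describe-log-streams \
  --profile Admin-Alpha1 \
  --log-group-name "/Alpha001/eks-logs" \
  --log-stream-name-prefix "prod-operations" \
  --order-by LastEventTime \
  --descending \
  --limit 10 \
  --output json | jq -r '.logStreams[].logStreamName'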

Terminal window
aws logs filter-log-events \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-names "<stream-name>" \
  --start-time <epoch-ms> \
  --end-time <epoch-ms> \
  --filter-pattern "<pattern>" \
  --output json > scratch/cw-output.json

The --filter-pattern argument uses CloudWatch filter syntax:

Pattern form          Example                Meaning
Quoted literal        "399a4ea4"             Substring match
Multiple terms (AND)  "PUT" "item"           Both terms present
JSON field match      { $.level = "ERROR" }  Structured log field

Always redirect output to scratch/ — raw CloudWatch output can be very large.

CloudWatch stores fluent-bit structured JSON. Each event’s message field contains a JSON envelope with a log field holding the actual application log line. Use this snippet to extract readable log lines from saved output:

Terminal window
python3 -c "
import json
data = json.load(open('scratch/cw-output.json'))
for event in data.get('events', []):
    msg = event.get('message', '')
    try:
        parsed = json.loads(msg)
        print(parsed.get('log', msg).rstrip())
    except json.JSONDecodeError:
        print(msg.rstrip())
" > scratch/cw-parsed.txt

CloudWatch timestamps are epoch milliseconds. These helpers convert between ISO-8601 timestamps and epoch milliseconds:

Terminal window
# ISO-8601 date to epoch milliseconds
date -d "2026-03-03T17:18:50Z" +%s000                                 # Linux
TZ=UTC date -j -f "%Y-%m-%dT%H:%M:%SZ" "2026-03-03T17:18:50Z" +%s000  # macOS; TZ=UTC so the literal Z is honored
# Epoch milliseconds to ISO-8601
date -r $((1772558330000/1000)) -u "+%Y-%m-%dT%H:%M:%S UTC"           # macOS
date -d @$((1772558330000/1000)) -u "+%Y-%m-%dT%H:%M:%S UTC"          # Linux

To find the API call that triggered a specific operation, search the ingress controller logs. These include source IP, HTTP method, path, status code, and user-agent — useful for tracing which client initiated a request.

Terminal window
aws logs filter-log-events \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-name-prefix "<env>-ingress-nginx" \
  --start-time <epoch-ms> \
  --end-time <epoch-ms> \
  --filter-pattern "<entity-id-or-path>" \
  --output json > scratch/cw-ingress.json
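
The extraction snippet from earlier applies unchanged to this output; the grep target below is a hypothetical path, shown only to illustrate the final filtering step:

Terminal window
python3 -c "
import json
data = json.load(open('scratch/cw-ingress.json'))
for event in data.get('events', []):
    msg = event.get('message', '')
    try:
        print(json.loads(msg).get('log', msg).rstrip())
    except json.JSONDecodeError:
        print(msg.rstrip())
" > scratch/cw-ingress-parsed.txt
# Hypothetical example: PUT requests against an items endpoint
grep 'PUT /api/items' scratch/cw-ingress-parsed.txt
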
  • Always use --output json with aws logs commands for reliable downstream parsing.
  • Prefer filter-log-events with --filter-pattern over downloading entire log streams — it reduces data transfer and processing time.
  • For time-sensitive investigations, narrow the --start-time/--end-time window as tightly as possible before expanding; see the window helper after this list.
  • When pod logs are unavailable (rotated, pod restarted), fall back to CloudWatch immediately rather than waiting for the pod to restart.
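
For the common case of "the last N minutes", here is a sketch (POSIX shell, same placeholders as above) that computes the window inline:

Terminal window
# Epoch-millisecond window covering the last 30 minutes
END=$(($(date +%s) * 1000))
START=$((END - 30 * 60 * 1000))
aws logs filter-log-events \
  --profile <aws-profile> \
  --log-group-name "/<cluster>/eks-logs" \
  --log-stream-name-prefix "<namespace>" \
  --start-time "$START" \
  --end-time "$END" \
  --filter-pattern "<pattern>" \
  --output json > scratch/cw-output.json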
