Overview

Add Sentry-based error capture and OpenTelemetry performance tracing to Arda’s backend components (operations, accounts) through the shared common-module library. This replaces the homegrown performance monitoring and MDC propagation plugins with standards-based OTel instrumentation, adds health endpoints, and provisions secrets through the existing CloudFormation/external-secrets pipeline with no new infrastructure to deploy.

Arda currently has no application-level observability beyond CloudWatch logs. Diagnosing errors requires grepping CloudWatch. There is no distributed tracing, no structured error capture, no release health tracking, and no PII-safe error forwarding. As the platform grows (gRPC inter-service calls, additional components), the lack of instrumentation becomes increasingly costly.

| Concern | Technology | Rationale |
| --- | --- | --- |
| Error capture | Sentry Kotlin SDK (sentry-kotlin-extensions) | Native Kotlin coroutine support, beforeSend hooks for scrubbing |
| Performance tracing | OpenTelemetry Java SDK + Ktor instrumentation | Vendor-neutral, first-class gRPC/HTTP/JDBC support, future-proof |
| Trace export to Sentry | OTel OTLP exporter to Sentry’s OTLP endpoint | Sentry natively ingests OTLP spans; no collector needed |
| Trace-log correlation | OTel MDC injection (trace_id, span_id in SLF4J MDC) | CloudWatch log lines become searchable by Sentry trace ID |
| Request ID correlation | X-Request-ID recorded as OTel span attribute arda.request_id | Preserves external API contract; bridges client-facing ID to internal trace |
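
The request ID correlation row can be illustrated without any SDK dependency. The sketch below assumes request headers arrive as a plain map; `requestIdAttributes` and the two constants are hypothetical names, not part of common-module. In a real integration the resulting pair would presumably be set on the active span, for example through the OpenTelemetry API's `Span.current().setAttribute(key, value)`.

```kotlin
// Hypothetical helper: names are illustrative, not the common-module API.
val REQUEST_ID_HEADER = "X-Request-ID"
val REQUEST_ID_ATTRIBUTE = "arda.request_id"

/**
 * Returns the span attributes to record for a request.
 * The client-supplied ID is copied as an attribute and never replaces the trace ID.
 */
fun requestIdAttributes(headers: Map<String, String>): Map<String, String> {
    // HTTP header names are case-insensitive, so compare ignoring case.
    val requestId = headers.entries
        .firstOrNull { it.key.equals(REQUEST_ID_HEADER, ignoreCase = true) }
        ?.value
        ?.trim()
        .orEmpty()
    // A missing or blank header yields no attribute rather than a made-up ID.
    return if (requestId.isEmpty()) emptyMap() else mapOf(REQUEST_ID_ATTRIBUTE to requestId)
}
```

Keeping the ID as an attribute, rather than deriving the trace ID from it, matches the stated goal of leaving the external API contract untouched while still letting a trace be found from a client-facing ID.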

One Sentry project per component, environment tags for partition separation:

| Sentry Project | Component | DSN Secret (AWS SM) |
| --- | --- | --- |
| arda-operations | operations | {infrastructure}-SentryDsn-Operations |
| arda-accounts | accounts-component | {infrastructure}-SentryDsn-Accounts |

Environment tag values: alpha001-prod, alpha001-demo, alpha002-dev, alpha002-stage.
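
As a rough illustration of the naming scheme, the sketch below builds an environment tag and a DSN secret name from their parts. It assumes each tag is an infrastructure name joined to a partition name, which the listed values suggest but the text does not state. Both function names are hypothetical.

```kotlin
// Hypothetical helpers for illustration only; not part of common-module.

/** Builds a tag such as "alpha001-prod" from its two assumed parts. */
fun environmentTag(infrastructure: String, partition: String): String =
    "${infrastructure.trim().lowercase()}-${partition.trim().lowercase()}"

/** Fills the {infrastructure} placeholder of the DSN secret name. */
fun dsnSecretName(infrastructure: String, componentSuffix: String): String =
    "$infrastructure-SentryDsn-$componentSuffix"
```

Because every partition of a component reports to the same Sentry project, the tag is the only thing separating, say, production events from demo events, so it needs to be derived consistently in every deployment.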

Phase 1: Common Module — SDK Integration

Core library changes in common-module under cards.arda.common.lib.oam.observability:

  • Sentry initialization — reads DSN from configuration, graceful degradation when DSN is missing/invalid (application never fails to start due to observability)
  • OTel initialization — OTLP HTTP exporter to Sentry when DSN is configured, logging exporter fallback for local development
  • Replace homegrown monitoring plugins — remove ServerPerfMonitoringPlugin, ClientPerfMonitoringPlugin, ServerMDCPropagationPlugin, ClientMDCPropagationPlugin; replace with OTel Ktor server/client instrumentation and MDC injection
  • X-Request-ID bridge — record as OTel span attribute arda.request_id
  • Sentry integration in StatusPages — capture exceptions with request context; classify 5XX as errors, 4XX as breadcrumbs
  • Data scrubbing (beforeSend) — filter by error type, scrub PII (tenant IDs, emails, auth headers, DB connection strings), configurable via ComponentBuilder
  • Health/readiness endpoints — GET /{component}/oam/health/live and /{component}/oam/health/ready, unauthenticated, excluded from tracing
  • OAM configuration enhancement — include release version, Sentry status, OTel status
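
The StatusPages classification and the scrubbing step from the list above can be sketched in plain Kotlin. This is a minimal illustration under stated assumptions, not the common-module implementation. All names are hypothetical, the regular expressions are deliberately simple, and tenant IDs are left out because their format is not given here. In practice the scrubbing functions would presumably be called from Sentry's `beforeSend` callback.

```kotlin
// Hypothetical sketch; none of these names come from common-module.
enum class CaptureAction { ERROR, BREADCRUMB, IGNORE }

/** 5XX responses become Sentry errors, 4XX become breadcrumbs, the rest are ignored. */
fun classifyStatus(status: Int): CaptureAction = when (status) {
    in 500..599 -> CaptureAction.ERROR
    in 400..499 -> CaptureAction.BREADCRUMB
    else -> CaptureAction.IGNORE
}

// Assumed patterns: a JDBC URL up to the next whitespace, and a simple e-mail shape.
val jdbcUrlPattern = Regex("""jdbc:[a-z0-9]+:\S+""", RegexOption.IGNORE_CASE)
val emailPattern = Regex("""[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}""")

// Assumed set of credential-bearing header names, stored in lower case.
val sensitiveHeaders = setOf("authorization", "cookie", "proxy-authorization")

/** Masks connection strings first, then any remaining e-mail addresses. */
fun scrubText(text: String): String =
    emailPattern.replace(jdbcUrlPattern.replace(text, "[db-url]"), "[email]")

/** Replaces the values of sensitive headers, matching names case-insensitively. */
fun scrubHeaders(headers: Map<String, String>): Map<String, String> =
    headers.mapValues { (name, value) ->
        if (name.lowercase() in sensitiveHeaders) "[redacted]" else value
    }
```

Connection strings are masked before e-mail addresses so that credentials embedded in a URL (`user:password@host`) are removed as part of the whole URL instead of being partially rewritten.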

Phase 2: Helm Chart and Secret Provisioning

  • Sentry project setup GitHub Action — idempotent composite action (sentry-project-setup-action) that creates the Sentry project and sets the SENTRY_DSN repo secret automatically
  • CloudFormation/CI pipeline — add SentryDsn parameter following the DocumintApiKey pattern through reusable_deployment.yaml → pre.json → pre-install.cfn.yml → AWS Secrets Manager
  • Helm chart changes (both components) — ExternalSecret data, secrets.properties mapping, configmap entries, liveness/readiness probes, per-environment sample rates
  • Organization-level secrets — SENTRY_AUTH_TOKEN (Sentry API) and SENTRY_SETUP_TOKEN (GitHub PAT for gh secret set)
  • Initialize SentryInitializer and OTelInitializer in both operations and accounts Main.kt
  • Wrap DataSource with OTel JDBC instrumentation
  • Update logback configuration to include trace_id and span_id in log pattern
  • Unit tests for Sentry initialization, beforeSend scrubber, request ID bridge, health endpoints, OAM configuration
  • Local integration test (empty DSN, logging exporter, health endpoints)
  • Dev environment smoke test (error in Sentry, trace in Sentry, trace_id in CloudWatch, arda.request_id attribute)
  • CI recommendation for Sentry release notification
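
The startup behaviour these items rely on (Sentry disabled and a logging exporter when the DSN is empty, OTLP export otherwise) can be sketched as a pure function over the loaded properties. This is an illustration only. The property keys `sentry.dsn` and `sentry.tracesSampleRate`, the type names, and the default sample rate of 0.0 are assumptions, not the actual secrets.properties mapping.

```kotlin
import java.net.URI
import java.util.Properties

// Hypothetical property keys and type names; the real mapping lives in the Helm chart.
enum class TraceExporter { OTLP_HTTP, LOGGING }

data class ObservabilitySettings(
    val sentryEnabled: Boolean,
    val exporter: TraceExporter,
    val tracesSampleRate: Double
)

/** A DSN is treated as usable only if it parses as an http(s) URI with a key and a host. */
fun isUsableDsn(dsn: String?): Boolean {
    if (dsn.isNullOrBlank()) return false
    // Malformed input must never throw: observability may not block startup.
    val uri = runCatching { URI(dsn.trim()) }.getOrNull() ?: return false
    val scheme = uri.scheme ?: return false
    return scheme.lowercase() in setOf("http", "https") &&
        !uri.userInfo.isNullOrEmpty() &&
        !uri.host.isNullOrEmpty()
}

/** Resolves what to initialize; falls back to the logging exporter without a usable DSN. */
fun resolveSettings(props: Properties): ObservabilitySettings {
    val enabled = isUsableDsn(props.getProperty("sentry.dsn"))
    // Assumed default of 0.0 when the rate is absent or not a number; clamped to [0, 1].
    val rate = props.getProperty("sentry.tracesSampleRate")
        ?.toDoubleOrNull()
        ?.coerceIn(0.0, 1.0)
        ?: 0.0
    return ObservabilitySettings(
        sentryEnabled = enabled,
        exporter = if (enabled) TraceExporter.OTLP_HTTP else TraceExporter.LOGGING,
        tracesSampleRate = rate
    )
}
```

Keeping the decision in a function with no side effects makes the empty-DSN case easy to cover in a unit test before any SDK is initialized.
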
| # | Question | Decision |
| --- | --- | --- |
| DQ-001 | Sentry project topology | One project per component, environment tags for partition separation |
| DQ-002 | Tracing technology | OTel for tracing + Sentry for errors; OTLP export to Sentry (no vendor lock-in) |
| DQ-003 | X-Request-ID correlation | Record as span attribute arda.request_id (decoupled from trace ID) |
| DQ-004 | CloudWatch interaction | Unchanged; OTel MDC adds trace_id/span_id to log lines; homegrown perf logs removed |
| DQ-005 | DSN provisioning | CloudFormation (DocumintApiKey pattern) → AWS SM → external-secrets → secrets.properties |
| DQ-006 | Health endpoints | Added to common-module; unauthenticated, excluded from tracing |
| DQ-007 | Data scrubbing | beforeSend filters 4XX, scrubs PII from 5XX; configurable via ComponentBuilder |
| DQ-008 | Missing DSN behavior | Sentry disabled gracefully; OTel falls back to logging exporter |
| DQ-009 | DSN secret source in CI | Repository-level GitHub Actions secret, one per component repo |
| DQ-010 | OAM package restructuring | oam.observability + oam.configuration packages |
| DQ-011 | Sentry project automation | Idempotent GitHub Action using Sentry API + gh secret set |
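
For DQ-006, excluding the health endpoints from tracing comes down to a path predicate that the tracing setup can consult. The sketch below is one possible shape for it. `isHealthProbe` and `shouldTrace` are hypothetical names, and the component segment is matched loosely because only the template /{component}/oam/health/live and /ready is given.

```kotlin
// Hypothetical predicate; not the common-module API.
// Matches exactly one leading path segment followed by the health suffix.
val healthPathPattern = Regex("""^/[^/]+/oam/health/(live|ready)/?$""")

/** True for liveness/readiness probes; any query string is ignored. */
fun isHealthProbe(path: String): Boolean =
    healthPathPattern.matches(path.substringBefore('?'))

/** Decides whether a request should produce spans at all. */
fun shouldTrace(path: String): Boolean = !isHealthProbe(path)
```

Probes are hit every few seconds by the orchestrator, so leaving them untraced keeps them from dominating the exported span volume.
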

In scope

  • Sentry SDK and OTel SDK integration in common-module
  • Removal of homegrown monitoring plugins
  • X-Request-ID bridge, health endpoints, data scrubbing, OAM enhancement
  • Sentry project setup GitHub Action (sentry-project-setup-action)
  • CloudFormation, CI pipeline, and Helm chart changes for both components
  • Logback trace context injection
  • Unit tests and integration verification

Out of scope

  • SQS consumer instrumentation
  • gRPC interceptor (deferred until gRPC calls are introduced)
  • Sentry alert rule implementation (recommendations provided)
  • Source map / ProGuard deobfuscation
  • CloudWatch Metrics, X-Ray, or AWS-side observability resources
  • New infrastructure deployment (no OTel Collector)

Affected repositories

  • common-module — SDK integration, shared observability code
  • operations — initialization, Helm chart, CloudFormation, CI pipeline
  • accounts-component — initialization, Helm chart, CloudFormation, CI pipeline
  • infrastructure — CloudFormation template changes
  • sentry-project-setup-action (new) — idempotent Sentry project creation action

References

  • Problem statement: /workspace/projects/ad-hoc/common-module/sentry/description.md
  • Full specification: /workspace/projects/ad-hoc/common-module/sentry/specification.md
  • Decision log: /workspace/projects/ad-hoc/common-module/sentry/decision-log.md