PDEV-442 — Sentry organisation configuration for Arda services
Recommendation for how to evolve the arda-systems Sentry organisation
as we instrument the JVM components (operations, future
accounts-component). The design lands at two projects total:
arda-frontend (existing) and platform-be (new — covers all
back-end services). Component-level differentiation inside
platform-be uses the OpenTelemetry service.name attribute, set
by the Helm chart from the existing application.name helper.
Pairs with pod_capacity.md § Recommendation #5, which covers the
operations-side Helm wiring.
System context
Arda’s deployed runtime today has the following layers; only the front-end layers (SPA and BFF) and the EKS pods need Sentry instrumentation:
| Layer | Tech | Sentry needed? | Notes |
|---|---|---|---|
| Front-end SPA | Next.js (React) in browser | Yes — already instrumented | Project arda-frontend (existing) |
| Front-end BFF | Next.js on Amplify SSR Compute (Lambda) | Yes — same project as SPA | Shares codebase + deploy unit |
| Cognito | AWS managed | No | CloudTrail covers auditing |
| API Gateway | AWS managed | No | CloudWatch metrics |
| EKS pods — operations | Kotlin/Ktor on Fargate JVM | Yes — new | Subject of this work |
| EKS pods — accounts | Kotlin/Ktor on Fargate JVM | Yes — new | Same pattern as operations |
| Aurora Postgres | RDS Performance Insights | No | DB-level signals via PI / pg_stat_statements |
Two partitions, four environments:
| Partition | AWS account | Environments |
|---|---|---|
| Alpha001 | production-grade | prod, demo |
| Alpha002 | non-production | stage, dev |
Repository / runtime mapping:
| Repository | Runtime presence | Sentry project? |
|---|---|---|
| arda-frontend-app | Amplify (SPA + BFF) | arda-frontend (existing) |
| operations | EKS Fargate JVM | platform-be (new, shared with accounts-component) |
| accounts-component | EKS Fargate JVM | platform-be (same project as operations; differentiated via service.name tag) |
| common-module | Library, no runtime | None (init lives in host app) |
| infrastructure | IaC, no runtime | None |
| ux-prototype | Storybook, no production runtime | None |
One project: arda-frontend, covering both the SPA and the BFF. They
share a codebase and a deploy unit, so this is correct. Environment
values flow in via the SDK’s environment setting in the existing
instrumentation.
Proposed structure — 2 projects in the arda-systems org
| Project | Purpose | SDK | Runtime |
|---|---|---|---|
| arda-frontend (existing, keep) | SPA + BFF | @sentry/nextjs | Browser + Amplify Lambda |
| platform-be (new) | All back-end JVM services (operations, future accounts-component, …) | OTel Java agent (pod_capacity.md §#5) | EKS Fargate JVM |
Why one project for the whole back end
Earlier drafts of this doc proposed one project per component
(platform-operations, platform-accounts). We reverted to a
single platform-be project after weighing the trade-offs against
Arda’s current scale.
The single-project choice trades quota isolation between back-end components (a runaway operations logger could in principle drown out accounts visibility) for operational simplicity: one set of alert rules, one ownership file, one source-map upload pipeline, one Discover scope for cross-component queries. With a single back-end team today, the duplication cost of per-component projects is real and recurring while the quota-isolation risk is hypothetical — operations dominates event volume and the org quota is the binding constraint, not per-project caps.
Component-level differentiation inside the shared project is done
via the OpenTelemetry service.name attribute (see § Component
differentiation via service.name below). Sentry surfaces it as a
tag in Issues, Performance, and Discover — the same dimension every
alert rule and dashboard widget would scope by.
The frontend stays in its own project because its SDK, deploy unit, and ownership are genuinely separate; it would not benefit from sharing back-end’s alert rules or quotas.
This is not a one-way door. If/when team ownership diverges
(e.g. accounts gets a dedicated owner) or event volume grows past
shared-quota tolerance, splitting platform-be into per-component
projects is straightforward: stand up the new project, point the
new component’s DSN at it, leave existing issues in platform-be
searchable. The current single-project decision should be revisited
at either of those trigger points.
Component differentiation via service.name
Set in the Helm chart’s templates/deployment.yaml, outside the
Sentry-enabled gate (so it’s available for any future
OTel-aware integration, not just Sentry):

    - name: OTEL_SERVICE_NAME
      value: {{ include "application.name" . | quote }}

The application.name helper already returns the component name
(operations, future accounts), the same value used for
SENTRY_RELEASE and K8s labels. Every chart instance therefore
self-tags correctly with zero configuration.
Two conventions are load-bearing for the single-project design to remain operable as it grows:
- All alert rules must scope by service.name: the rule author has to include service.name:operations (or whichever component) in the rule conditions. Codify this in the alert-rule README inside Sentry’s project settings.
- The OTEL_SERVICE_NAME env var must be set by every back-end component’s Helm chart: chart templates should include the snippet above as part of the standard deployment pattern. Make this a chart-review checkpoint.
What I’d explicitly not do
- Don’t split arda-frontend into SPA + BFF projects. They share a codebase, deploy together, and stitch better in Sentry as one project. Revisit only if SPA event volume drowns out BFF triage, which it shouldn’t given the relative call rates.
- Don’t fragment platform-be by component prematurely. See the trade-off discussion above. Wait for an operationally motivated split, not a theoretical one.
- Don’t create projects per environment. Environment is a tag in Sentry; projects shouldn’t fragment by env.
- Don’t create projects per partition. Same argument: partition is part of the environment tag.
- Don’t add a Sentry project for common-module. It’s a library; Sentry instrumentation belongs in the host application.
- Don’t add Sentry projects for infrastructure or ux-prototype. No runtime presence in deployed envs.
Environment tag — {partition}-{env}
| Tag value | Where |
|---|---|
| alpha001-prod | Alpha001 prod |
| alpha001-demo | Alpha001 demo |
| alpha002-stage | Alpha002 stage |
| alpha002-dev | Alpha002 dev |
Why this shape:
- One Sentry project, four environment values per project: the canonical Sentry pattern.
- Matches the partition naming used everywhere else in Arda (1Password vaults Arda-{Env}OAM, kubectl contexts Alpha001/Alpha002, CDK app names).
- Lets queries scope to “everything prod-like” (environment:*-prod) or “everything on alpha001” (environment:alpha001-*) without needing a separate partition tag.
- Sentry treats environment as a free-form tag, so future expansion (e.g., alpha003-prod) is zero-cost.
Set per pod via the SENTRY_ENVIRONMENT env var that the Helm chart
already templates. For the JVM components this resolves from
{{ .Values.global.infrastructure }}-{{ .Values.global.purpose }}
(the existing Helm value names match). For the frontend, set the same
shape via the Next.js Sentry config.
Suggestion: align the existing arda-frontend env values to the same
convention as part of this rollout if they aren’t already — Sentry
handles environment renames gracefully.
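A minimal sketch of how the chart could render this value, assuming `.Values.global.infrastructure` holds the partition name and `.Values.global.purpose` holds the environment name. The `lower` call is an assumption: it only matters if the values are stored capitalised (for example `Alpha001`), and it can be dropped if they are already lowercase.

```yaml
# templates/deployment.yaml (fragment of the container's env list)
- name: SENTRY_ENVIRONMENT
  # Renders e.g. "alpha001-prod" from infrastructure=Alpha001, purpose=prod.
  value: {{ printf "%s-%s" .Values.global.infrastructure .Values.global.purpose | lower | quote }}
```

Running `helm template` against each environment's values file is a quick way to confirm the rendered string matches one of the four tag values in the table.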
DSNs — 2 DSNs, one per project
A Sentry DSN identifies the project, not the environment. Each project has exactly one DSN; that DSN is used by every pod in every environment for any component routed to that project.
| DSN | Used by |
|---|---|
| arda-frontend DSN | SPA + Amplify BFF, already in place |
| platform-be DSN | operations pod and, when it ships, accounts-component pod, in all four envs |
DSNs are not authentication credentials. They identify the project; anyone with the DSN can send events to it but not read them. No rotation required. Storing them in 1Password is purely consistency with the existing operational pattern.
Storage convention
Sentry DSNs are different from most credentials we store in 1Password.
A DSN identifies the Sentry project, and the project’s
environment-disambiguation is done by the environment tag, not by
the DSN. So the same DSN value is used by every pod of a given
component across all four deployment environments — it is common
by design, not by coincidence or by resource-sharing.
This distinguishes Sentry DSNs from credentials like ArdaApiKey or
the operations DB passwords: those happen to be the same value today
but could legitimately diverge per environment (e.g., a key rotation
in prod that takes a week to propagate to demo). For those, the
“same value in all four Arda-{Env}OAM vaults” pattern is the right
default. For Sentry DSNs, by contrast, divergence would mean
“different Sentry project,” which would defeat the whole point of
the design.
Sentry DSNs therefore live in the workspace-wide Arda-SystemsOAM
vault (not duplicated across the four partition vaults). The
single project covering all back-end services means a single
multi-field item:
| Vault | Item | Fields |
|---|---|---|
| Arda-SystemsOAM | be-sentry-dsn | dsn, project-slug (platform-be), sentry-org (arda-systems) |
The pipeline from 1P to K8s is the standard 1P → amm.sh → AWS
Secrets Manager → ESO → K8s pattern (see pod_capacity.md
§ Provisioning pipeline). The AWS SM secret is
infrastructure-scoped at {Infrastructure}-SentryDsn; the K8s
secret materialised by ESO inside each component’s pod namespace
is be-sentry-dsn (key dsn). Both operations and a future
accounts-component chart read from the same AWS SM secret —
the namespace boundary makes the K8s secrets unique without
needing a component qualifier in the resource name.
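The last hop of that pipeline (AWS SM → ESO → K8s) could be sketched as the ExternalSecret below. This is an illustration under assumptions, not the chart's actual template: the `apiVersion` depends on the ESO release installed in the cluster, the store name `aws-secrets-manager` is hypothetical, and `.Values.global.infrastructure` is assumed to supply the `{Infrastructure}` prefix. The `oam.performance.sentry.enabled` gate follows the provisioning steps in this document.

```yaml
{{- if .Values.oam.performance.sentry.enabled }}
apiVersion: external-secrets.io/v1beta1   # adjust to the ESO version in the cluster
kind: ExternalSecret
metadata:
  name: be-sentry-dsn
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager             # hypothetical store name
  target:
    name: be-sentry-dsn                   # K8s secret created in the release namespace
  data:
    - secretKey: dsn                      # key inside the K8s secret
      remoteRef:
        # Assumes the AWS SM secret stores the DSN as a plain string.
        key: {{ printf "%s-SentryDsn" .Values.global.infrastructure | quote }}
{{- end }}
```

Because the resource carries no namespace of its own, each chart release creates its copy in its own namespace, which is what keeps the name `be-sentry-dsn` unambiguous across components.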
When would a Sentry value belong in a partition vault?
If we ever introduce per-partition Sentry configuration (for example, Sentry monitor URLs, deploy-hook tokens, or alert-rule webhook secrets that are environment-scoped), those would follow the standard four-vault pattern. The DSN is the explicit exception because it identifies a project that, by design, spans all environments.
Release tag — {component}@{version}
| Project | Release value |
|---|---|
| arda-frontend | arda-frontend@{package.json version} (or arda-frontend@{git short SHA} if the existing setup uses commits) |
| platform-be | {component}@{Chart.AppVersion} per chart instance, e.g. operations@1.2.3, future accounts@0.5.1 (rendered by Helm; see pod_capacity.md § Rec #5) |
Sentry uses this to compute first-seen / regression info and to
anchor releases to deploys. Set via SENTRY_RELEASE env var. With
the single-project design, the {component}@… prefix in the
release value is what disambiguates operations releases from
accounts releases inside the shared project’s release timeline —
the same way service.name disambiguates events. The chart helper
application.name provides the prefix at zero configuration cost.
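As a sketch of what that rendering might look like in the deployment template, assuming the `application.name` helper returns the bare component name (for example `operations`) and the chart's `appVersion` carries the service version:

```yaml
# templates/deployment.yaml (fragment of the container's env list)
- name: SENTRY_RELEASE
  # Renders e.g. "operations@1.2.3" for the operations chart at appVersion 1.2.3.
  value: {{ printf "%s@%s" (include "application.name" .) .Chart.AppVersion | quote }}
```

Since the prefix comes from the same helper as `OTEL_SERVICE_NAME`, the release prefix and the `service.name` tag cannot drift apart within a chart instance.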
Sampling defaults
| Project | Env | tracesSampleRate |
|---|---|---|
| arda-frontend | dev / stage | 1.0 |
| arda-frontend | demo / prod | existing (likely 0.1–0.3) |
| platform-be | dev / stage | 1.0 (full sampling for debugging) |
| platform-be | demo | 0.1 (production-facing demo env) |
| platform-be | prod | 0.1 (CPU-conscious cap; agent overhead at this rate is ~1–3 % CPU on the 2 vCPU prod pod) |
Sampling decisions are made at the head (where the trace starts —
usually the SPA, occasionally the BFF for server-initiated work).
Downstream services honor the parent decision when the sentry-trace
header is present. For example, setting the BFF to 5 % in prod
effectively samples the JVM tier at 5 % too for SPA-initiated traffic;
the JVM’s own tracesSampleRate only matters for traces that start
inside the back end (scheduled jobs, gRPC, intra-cluster calls).
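The platform-be rows of the table could be expressed as per-environment values overrides along these lines. The file names are hypothetical, and placing `tracesSampleRate` next to `enabled` under `oam.performance.sentry` is an assumption about the chart's values layout; only the `enabled` path is confirmed elsewhere in this document.

```yaml
# values-alpha002-dev.yaml (hypothetical file name; stage would use the same shape)
oam:
  performance:
    sentry:
      enabled: true
      tracesSampleRate: 1.0   # full sampling for debugging
---
# values-alpha001-prod.yaml (hypothetical file name; demo would use the same shape)
oam:
  performance:
    sentry:
      enabled: true
      tracesSampleRate: 0.1   # CPU-conscious cap; applies to back-end-initiated traces
```

Keeping the rate in values rather than in the template means dialing an environment back stays a one-line change.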
Teams (optional, low-cost)
Two Sentry teams worth creating up front:
- frontend: owns arda-frontend
- platform: owns platform-be
Today these are the same set of humans so this creates no friction. As the team grows, alert-rule routing and issue assignment become trivial — Sentry routes issues to the team that owns the project. If accounts-component eventually gets a separate owner, the single-project design’s first-revisit trigger fires (see § Why one project for the whole back end above).
Provisioning order
1. (done) Create the platform-be project in the arda-systems org. DSN captured.
2. Pre-create the 4 environment values in platform-be via Sentry’s Environments page (alpha001-prod, alpha001-demo, alpha002-stage, alpha002-dev) so they appear correctly the first time events arrive. (Sentry auto-creates them on first event too; pre-creating is purely cosmetic so the dropdown is ordered correctly.)
3. Migrate arda-frontend’s existing environment values to the {partition}-{env} convention if not already aligned. Sentry handles renames gracefully.
4. (done) Add the DSN to the workspace-wide Arda-SystemsOAM 1Password vault as be-sentry-dsn (fields: dsn, project-slug=platform-be, sentry-org=arda-systems). Single entry total: the Sentry DSN is common across environments by design.
5. Infrastructure CDK / amm.sh wiring (separate ticket per infrastructure-improvements.md §4): provision the {Infrastructure}-SentryDsn AWS Secrets Manager secret per infrastructure, value sourced from op://Arda-SystemsOAM/be-sentry-dsn/dsn via amm.sh and passed to CDK via cdk.CfnParameter.
6. ESO sources the {Infrastructure}-SentryDsn AWS SM secret and materialises a K8s secret named be-sentry-dsn (key dsn) in each operations namespace. The ExternalSecret is declared in the operations chart and gated by oam.performance.sentry.enabled.
7. Ship the Helm chart wiring (PDEV-488 #5) with oam.performance.sentry.enabled: true in all four envs day-one (blanket telemetry policy; see pod_capacity.md § Rec #5 per-env enablement), and OTEL_SERVICE_NAME set outside the Sentry gate (component differentiation works regardless of whether Sentry is on).
8. No per-env flip-on step. Step 7 ships with enabled: true in all four envs; activation is uniform day-one. If a specific env later needs to be dialed back (quota, noise from a misbehaving integration), flip tracesSampleRate or enabled with a one-line values change.
If steps 5-7 land after a deploy (the upstream AWS SM secret isn’t
yet in place when the chart rolls out), the pod still starts; Sentry
runs in disabled mode until the secret materialises. The chart
uses secretKeyRef: optional: true so the pod is fail-soft against
the missing K8s secret. No outage, no rollback. See
pod_capacity.md § Recommendation #5 Failure Modes for full
behaviour.
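The fail-soft behaviour described above rests on the `optional` flag of `secretKeyRef`. A minimal sketch of the env entry, assuming the DSN is handed to the SDK through the standard `SENTRY_DSN` environment variable (the variable name is not stated in this document):

```yaml
# templates/deployment.yaml (fragment of the container's env list)
{{- if .Values.oam.performance.sentry.enabled }}
- name: SENTRY_DSN
  valueFrom:
    secretKeyRef:
      name: be-sentry-dsn
      key: dsn
      optional: true   # pod starts even if the secret does not exist yet
{{- end }}
```

Kubernetes resolves secret-backed environment variables when the container starts, so a pod that came up before the secret existed picks up the DSN on its next restart or rollout, not while it is running.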
What this enables
Once the agent is loaded (oam.performance.sentry.enabled: true
in any env), the team gets:
- Distributed traces that span SPA → BFF → API Gateway → operations / accounts. The hand-correlation we did during the PDEV-442 investigation becomes a single trace view in Sentry, with each back-end span carrying its service.name tag for component attribution.
- Stack traces on 5xx without an operations code change: the 37 % silent-500 path on kanban-card/details (PDEV-490 OP3) becomes visible immediately; OP3’s contract fix can land on its own merits but is no longer the only way to get observability on those errors.
- JVM runtime metrics (heap, GC, thread counts) per pod, shipped by the OTel agent, complementing the JFR work in PDEV-488 #4. The metrics are tagged by service.name so the same platform-be project surfaces per-component charts.
- Release-anchored issue tracking: first-seen / regressed-in data tied to deploys via the {component}@{Chart.AppVersion} release tag (SENTRY_RELEASE), so we can see whether a particular Helm bump introduced a new error and which component it landed in.
Dashboarding strategy
Sentry is the primary observability surface for application-layer signals; CloudWatch / Performance Insights remain the home for infrastructure-layer signals. Three categories, decided once so we don’t relitigate per ticket:
Lands in Sentry (free with the OTel agent in PDEV-488 #5)
- Operations request latency p50/p95/p99: server spans (Ktor / Netty).
- DB query latency + per-request slow-query attribution: JDBC spans with statement text and duration. Different from pg_stat_statements (which is aggregate / server-side); for “which endpoint is slow because of which query?”, the JDBC-span view is more useful.
- Uncaught exceptions and 5xx attribution.
- Distributed traces SPA → BFF → API Gateway → operations → DB.
- JVM runtime metrics: heap, GC pause duration, GC frequency, thread counts, code cache, classloader. Emitted by the OTel agent’s JVM metrics module (otel.instrumentation.runtime-telemetry.enabled=true, default in recent agent versions).
The expected setup is one cross-project Sentry Performance
Dashboard, with widgets sourced from arda-frontend and
platform-be and back-end widgets scoped by service.name. Build it
after PDEV-488 lands in dev and the data starts flowing; no
separate ticket needed.
Possible in Sentry with caveats
- JVM GC pauses. Use the OTel-emitted jvm.gc.duration histogram, not the raw -Xlog:gc* text log. PDEV-488 #3 still ships the GC log to CloudWatch for forensic depth, but the dashboard source is the OTel metric.
- Continuous profiling. Sentry Continuous Profiling for JVM uses async-profiler, which does not work on EKS Fargate (same constraint that made us pick JFR for PDEV-488 #4). Profiling centralization is blocked until we leave Fargate (recommendation #6 in pod_capacity.md, deferred). Until then, JFR remains the profiling mechanism, and analysis stays in JMC / IntelliJ.
Stays outside Sentry
- pg_stat_statements server-side aggregates → Aurora / RDS Performance Insights.
- Aurora slow-query log lines → CloudWatch Logs.
- Pod-level CPU / memory from metrics-server → CloudWatch Container Insights (or kubectl top).
- HPA scaling events / K8s events → CloudWatch / kubectl get events.
These signals become relevant when the Sentry view points “down the stack” (DB or pod-resource bottlenecks); at our current scale they don’t warrant duplicating into Sentry via custom shipping.
Out of scope for this analysis
- Sentry billing / quota negotiation. The new platform-be project will add to the org’s event quota. A one-line confirmation with the Sentry billing owner is worth doing before flipping prod, but no detailed cost modeling is in scope here.
- Sentry Crons / Heartbeats for scheduled JVM tasks. Worth revisiting if operations introduces periodic jobs that warrant uptime monitoring; not needed today.
- Sentry Continuous Profiling integration with JFR. The OTel agent in PDEV-488 #5 emits standard JVM metrics; Sentry’s profiling product can ingest JFR but requires additional configuration. Separate follow-up.
- Source-map / debug-symbol upload for SPA / Kotlin. Frontend already uploads source maps as part of the Amplify build; Kotlin debug symbol upload via sentry-cli is a small follow-on to PDEV-488 #5 if symbolic stack traces are needed.
Copyright: © Arda Systems 2025-2026, All rights reserved