PDEV-498 — postgres-database-initializer changes for pg_stat_statements
Implementation proposal for the changes
PDEV-498
requires in the postgres-database-initializer image so that
pg_stat_statements becomes queryable from every application
database the initializer creates on each Aurora cluster.
Context recap
Section titled “Context recap”PDEV-498 (sub-issue of
PDEV-479)
needs CREATE EXTENSION IF NOT EXISTS pg_stat_statements; to run on
each Aurora cluster as part of pod startup. Loading the library at
cluster level is PDEV-479’s responsibility
(shared_preload_libraries in a custom DB-cluster parameter group;
instance restart required); this ticket lights up the queryable view
so
PDEV-490
can read aggregate query stats to drive the SQL optimisations
(composite bitemporal indexes, dropping unused COUNTs, etc.).
Where the extension should live
Section titled “Where the extension should live”pg_stat_statements’ underlying tracking is cluster-wide: the
shared-memory hash collects queries from every database on the
cluster regardless of where CREATE EXTENSION has been run.
CREATE EXTENSION controls where the view exists (which DB a
connected user can SELECT * FROM pg_stat_statements from), not
what is tracked.
Practical implication: operators connect to the app DB they’re
debugging (shop_access, etc.), not to the postgres admin DB.
Creating the extension only in postgres would mean an extra
disconnect/reconnect every time someone wants to see slow-query data
— with no offsetting benefit, since the underlying data is the
same. Creating it in every app DB the initializer creates is the
right convention.
Earlier drafts of PDEV-498 prescribed admin-DB-only creation; that was overly conservative and is superseded by this proposal.
How the image works today
Section titled “How the image works today”entrypoint.shrunspsqlwith-f values.sql -f setup.sql -f create.sql(ordestroy.sqlondown).create.sqlcreates the app DB + owner role + role for permissions, then\c :"database_name"switches to that DB and creates any per-DB extensions (pg_trgm,btree_ginalways; plus anything invalues.propertiesextensions=…).- The extension list is normalised in
entrypoint.sh(lines 37–52): always-included floor + user-supplied additions, deduplicated and sorted. setup.sqlsetsON_ERROR_STOP onfor the whole session — any individual SQL command failure aborts the script.
Proposed design
Section titled “Proposed design”Add pg_stat_statements to the existing always-included extensions
floor. No new property, no new code path. The extension joins
pg_trgm and btree_gin in the default list that gets created in
every app DB the initializer creates.
echo "pg_trgm,btree_gin,$(sed -n -e 's/^extensions=//p' "${values}")" |echo "pg_trgm,btree_gin,pg_stat_statements,$(sed -n -e 's/^extensions=//p' "${values}")" |That’s the single substantive change. Everything else in the image
already works: the extension renders into :'extensions', gets
created via the existing \gexec loop in create.sql lines 61–64,
and is now visible to anyone connecting to the app DB with read
privileges on pg_stat_statements (typically rds_superuser /
pg_read_all_stats).
Deliberately fail-loud
Section titled “Deliberately fail-loud”pg_stat_statements is the first member of the default extensions
floor with a cluster-level prerequisite —
shared_preload_libraries must include it (set via PDEV-479’s
DB-cluster parameter group + instance restart). If the prerequisite
isn’t satisfied, CREATE EXTENSION errors with ERROR: pg_stat_statements must be loaded via shared_preload_libraries.
pg_trgm and btree_gin are stock-Postgres extensions with no
preload dependency, so this is a new failure mode for the image.
The deliberate choice is to let this fail loud: the existing
ON_ERROR_STOP on semantics will abort the init container, K8s will
retry, the pod stays in Init:CrashLoopBackOff until the cluster
parameter group is applied.
Rationale:
- The failure happens at most once per cluster, ever — once
shared_preload_librariesis set on a cluster, every subsequent pod-start succeeds. - The failure surfaces in dev first (the lowest-impact env), which immediately prompts a review of cluster configuration across the rest of the platform — exactly the right escalation timing.
- The alternative (fail-soft, warn-and-continue) would risk silently
rolling out without the observability data ever materialising;
PDEV-490’s work would then quietly proceed without
pg_stat_statementsdata and produce sub-optimal SQL fixes. - Low practical blast radius: this is a one-time bootstrap coordination, not an ongoing reliability concern.
The image’s existing semantics (ON_ERROR_STOP on, init container
crash-loops on error) are correct here; no special-casing needed.
Files to change
Section titled “Files to change”1. src/main/docker/entrypoint.sh
Section titled “1. src/main/docker/entrypoint.sh”One line. Add pg_stat_statements to the default extensions floor on
line 39 (between btree_gin and the user-supplied $(sed …)
expansion).
for extension in $( echo "pg_trgm,btree_gin,$(sed -n -e 's/^extensions=//p' "${values}")" | echo "pg_trgm,btree_gin,pg_stat_statements,$(sed -n -e 's/^extensions=//p' "${values}")" | awk -F',' '{ for (i = 1; i <= NF; i++) { gsub(/^[[:space:]]+|[[:space:]]+$/, "", $i); if ($i != "") print $i } }' | sort -u \ ); do2. src/main/docker/create.sql
Section titled “2. src/main/docker/create.sql”The existing \gexec loop on lines 61–64 already iterates the
comma-separated :'extensions' value and emits CREATE EXTENSION IF NOT EXISTS for each entry. With pg_stat_statements now in the
floor, it’s just another item the loop handles.
Empirical correction added during implementation: On modern
Postgres (16+), CREATE EXTENSION pg_stat_statements succeeds
without shared_preload_libraries — the schema objects install
cleanly, but the underlying shared-memory hash is never allocated.
Queries against pg_stat_statements then error with
pg_stat_statements must be loaded via shared_preload_libraries at
query time, not at create time. Verified empirically against
postgres:16-alpine3.20 during the implementation of this ticket.
To achieve the fail-loud-at-init-time design intent (so the failure
surfaces before downstream consumers query and find silent emptiness),
add an explicit verify-query immediately after the extension-creation
\gexec block:
-- Verify pg_stat_statements is functional, not just installed.DO $$BEGIN IF EXISTS (SELECT 1 FROM pg_extension WHERE extname = 'pg_stat_statements') THEN PERFORM count(*) FROM pg_stat_statements; END IF;END;$$;Without preload, the PERFORM raises the “must be loaded” error;
ON_ERROR_STOP on (from setup.sql) aborts the init script; pod
enters Init:CrashLoopBackOff — fail-loud as designed.
3. src/main/docker/destroy.sql
Section titled “3. src/main/docker/destroy.sql”No change. Existing behaviour is correct — the script drops the app DB and roles, not the extensions inside it (they go with the DB).
4. README.md
Section titled “4. README.md”Update the extensions row of the values.properties table to
mention the new floor member, and add a short paragraph about the
cluster prerequisite:
| Property | Required | Description |
|---|---|---|
extensions | no | Comma-separated names of Postgres extensions to create; pg_trgm, btree_gin, and pg_stat_statements are always included. |
New paragraph (placement: right after the values.properties table):
Cluster prerequisite for
pg_stat_statements. The image always attempts to create thepg_stat_statementsextension. This requires the cluster to havepg_stat_statementslisted in itsshared_preload_libraries(typically set via an Aurora DB-cluster parameter group + instance restart). If the prerequisite is not satisfied, the init container fails with an error and the pod entersInit:CrashLoopBackOffuntil the cluster configuration is corrected. This is a one-time bootstrap concern, not an ongoing reliability concern — once configured, every subsequent initialization succeeds.
5. src/test/docker/
Section titled “5. src/test/docker/”Two complementary test fixtures:
(a) Happy-path fixture — configure the test Postgres with
shared_preload_libraries=pg_stat_statements via a command:
override in compose.yaml (e.g.,
command: ['postgres', '-c', 'shared_preload_libraries=pg_stat_statements']).
Assert that after initializer runs, pg_stat_statements is present
in pg_extension inside the app DB (joined with pg_namespace if
necessary to confirm the schema). Use the existing
values_all.properties / values_minimal.properties fixtures
unchanged.
(b) Prerequisite-missing fixture — run against a stock
postgres:16 (no preload override) and assert the initializer exits
non-zero with pg_stat_statements must be loaded in stderr. Confirms
the fail-loud contract.
Concrete shape depends on compose.yaml and tests.sh
conventions (read these before finalising the test layout).
6. CHANGELOG.md
Section titled “6. CHANGELOG.md”The repo uses the direct-edit CHANGELOG model (visible from
.github/clq/changemap.json). Suggested entry:
Added
pg_stat_statementsto the default extensions floor — the extension is now created in every database the initializer provisions. Cluster prerequisite:shared_preload_librariesmust includepg_stat_statements(deliberately fail-loud if not set). Required by Arda PDEV-498 to expose cluster-wide query observability from any application database.
Edge cases / failure modes
Section titled “Edge cases / failure modes”| Scenario | Behaviour |
|---|---|
Cluster has shared_preload_libraries=pg_stat_statements (PDEV-479 deployed) | Extension created in the app DB; visible from SELECT * FROM pg_extension WHERE extname = 'pg_stat_statements'. |
| Cluster does not have the parameter group applied yet | CREATE EXTENSION errors; ON_ERROR_STOP on aborts the init script; pod stays in Init:CrashLoopBackOff. Surfaces clearly in kubectl describe pod and pod logs. Deploy-order escalation prompts cluster-config review. |
Master user lacks rds_superuser / sufficient privilege | Same as above — CREATE EXTENSION errors, init crash-loops. |
extensions=foo,bar user-supplied additions | Joined with the floor (now including pg_stat_statements); deduplicated; sorted; each created in turn. No change from existing behaviour. |
| Extension name with invalid characters | Validation in entrypoint.sh (existing regex) rejects before reaching SQL. No change. |
Deploy-order coordination
Section titled “Deploy-order coordination”This image change ships independently and produces a new image tag. The operations chart (and any other chart using this initializer) picks up the new image via its own bump-the-tag flow — not part of this ticket. Coordination requirement:
- PDEV-479 lands the DB-cluster parameter group on each Aurora cluster (with the instance restart) before any chart that references the new initializer image deploys to that cluster.
- If a chart redeploy lands first by accident, the operations pod
crash-loops on
Init. Fix is to apply PDEV-479’s parameter group and let K8s retry.
PDEV-498 already links blocks: PDEV-490 to express the downstream
consumer relationship. The PDEV-479 ↔ PDEV-498 deploy ordering is a
soft sequencing concern (recoverable, not a hard data-loss risk).
Why this is simpler than the original proposal
Section titled “Why this is simpler than the original proposal”The earlier draft of this doc proposed a separate cluster_extensions
property and an admin-DB-only CREATE block before the \c switch.
That added a new concept to the image, a new SQL code path, and a
fail-soft DO/EXCEPTION wrapper — all in service of an outcome
that’s worse (the view only visible from postgres, not from app
DBs). This revised design drops all of that complexity and reduces
the implementation to a one-line entrypoint.sh change plus tests +
docs.
Out of scope
Section titled “Out of scope”- Cluster-level parameter group — PDEV-479’s responsibility.
CREATE EXTENSIONin thepostgresadmin DB — not pursued. Every app DB gets the view; operators read from there. If a future need surfaces for admin-DB visibility (e.g., a tooling integration that connects only topostgres), revisit then.pg_stat_statements.track,pg_stat_statements.maxtuning — covered by parent PDEV-479’s parameter-group change.- Operations chart init-container image-tag bump — handled by the normal release pipeline, not by this ticket or change.
Copyright: © Arda Systems 2025-2026, All rights reserved