Benchmark methodology

Exact methodology behind the PostgreSQL-on-EC2 numbers on this site: versions, configuration deltas, instance and storage specs, dataset sizes, the measurement procedure, and the limitations that bound it.

The benchmarking tool that provisions the hosts, runs the workload, and produces the data on this site is open source: github.com/anivaniuk/sanebench.

Benchmark runs: 30 May – 17 June 2026 (us-east-1). 414 measured repetitions across 23 instance types.

What we measure

Self-managed PostgreSQL 17.8 on EC2 (not RDS), one workload, one access pattern: 90% primary-key reads, 10% single-row inserts – the shape behind a typical OLTP service. Every figure is a real measurement against a real instance.
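As an illustration of the 90/10 access pattern, here is a minimal Python sketch of how a load generator might pick the next operation. It is not the harness's code: the real client is written in Go, and `choose_op` and `sample_mix` are hypothetical names introduced here. The only value taken from the text is the 90% read fraction.

```python
import random
from collections import Counter

READ_FRACTION = 0.90  # from the text: 90% primary-key reads, 10% single-row inserts


def choose_op(rng: random.Random) -> str:
    """Return 'select' with probability 0.90, otherwise 'insert'."""
    return "select" if rng.random() < READ_FRACTION else "insert"


def sample_mix(n: int, seed: int = 0) -> Counter:
    """Draw n operations and count how many of each kind were chosen."""
    rng = random.Random(seed)
    return Counter(choose_op(rng) for _ in range(n))
```

Over a long run the observed share of reads converges on 90%, though any short window will deviate slightly because each operation is drawn independently.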

Environment

PostgreSQL	17.8, from the Amazon Linux 2023 `postgresql17` package. The bootstrap aborts if the running server is not major 17, so no run can silently record the wrong version.
OS / AMI	Amazon Linux 2023, official AWS AMIs (x86-64 and arm64).
Region / AZ	us-east-1, single AZ us-east-1a (DB and client co-located).
Filesystem	XFS on a dedicated EBS volume mounted at `/pgdata`; PostgreSQL is the only writer.
Connection path	Client → pgbouncer 1.24.1 (transaction pooling) → PostgreSQL on localhost:5432. Pool size is `vCPUs × 4`.
Client / driver	A separate `c7g.large` (Graviton) load generator in the same AZ, identical for every run, using the Go `pgx` driver (extended protocol, prepared statements). Keeping the client off the DB host and constant across runs keeps comparisons fair.

Workload

Pattern	`mixed_90_10` – 90% random primary-key SELECTs, 10% single-row INSERTs.
Schema	One logged, primary-keyed table (`sanebench_kv`), 512 bytes per row. Inserts append past the preloaded max via a dedicated sequence, hitting the same B-tree the reads contend on.
Concurrency	32 concurrent connections.
Dataset sizes	1 GB (1,953,125 rows), 10 GB (19,531,250), 50 GB (97,656,250) – run on each instance to show behaviour as the working set crosses the memory boundary. The table is fully preloaded before any measurement.
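The row counts above follow from dividing a decimal gigabyte by the 512-byte row size. The sketch below reproduces that arithmetic; `rows_for` and `exceeds_ram` are hypothetical helper names. The memory check is deliberately rough: it compares raw row bytes with total RAM and ignores tuple headers, the primary-key index, and everything else the host keeps in memory.

```python
ROW_BYTES = 512      # row size from the Schema entry
GB = 1_000_000_000   # decimal gigabyte; this is what makes the published row counts exact
GIB = 1024 ** 3      # binary gibibyte, the unit instance memory is usually quoted in


def rows_for(dataset_gb: int) -> int:
    """Number of 512-byte rows needed to reach the nominal dataset size."""
    return dataset_gb * GB // ROW_BYTES


def exceeds_ram(dataset_gb: int, ram_gib: float) -> bool:
    """Rough check: are the raw row bytes larger than the host's total memory?"""
    return dataset_gb * GB > ram_gib * GIB


for size_gb in (1, 10, 50):
    print(size_gb, rows_for(size_gb))
```

For example, on a host with 8 GiB of memory the 1 GB dataset fits and the 10 GB dataset does not, which is the boundary the three sizes are meant to expose.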

Storage

PGDATA lives on a 200 GB gp3 EBS volume. Each instance is benchmarked against two provisioned profiles so you can separate compute limits from I/O limits:

Profile	Size	IOPS	Throughput
Baseline	200 GB	3,000	125 MB/s
High	200 GB	16,000	1,000 MB/s

PostgreSQL configuration

Identical policy on every host. Everything not listed below is left at the PostgreSQL default; the worker dumps the full effective `pg_settings` into each run’s output, so any value can be verified after the fact.

Setting	Value	Note
`shared_buffers`	25% of RAM	Sized per instance from the host’s memory.
`effective_cache_size`	70% of RAM	Sized per instance.
`huge_pages`	`try` (effectively off)	Left at the PostgreSQL default. No huge pages are reserved at the OS level (`vm.nr_hugepages` is 0), so `try` silently falls back and `shared_buffers` runs on standard 4 KB pages.
`max_connections`	100	Pooled access stays well under this.
`wal_compression`	on (pglz)	Default is off.
`checkpoint_timeout`	15min	Default is 5min.
table autovacuum	disabled	Set on `sanebench_kv` after load – see limitations.
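To make the per-instance sizing concrete, here is a sketch that derives the values in the table from a host's memory and vCPU count, together with the `vCPUs × 4` pool-size rule from the Environment section. The function name `derive_settings`, the rounding down to whole megabytes, and the `pool_size` key are assumptions; the text does not say how the harness reads RAM, how it rounds, or which pgbouncer parameter carries the pool size.

```python
MIB = 1024 * 1024  # PostgreSQL's "MB" unit is 1024 * 1024 bytes


def derive_settings(ram_bytes: int, vcpus: int) -> dict:
    """Apply the documented sizing policy to one host (rounding is an assumption)."""
    return {
        "shared_buffers": f"{ram_bytes * 25 // 100 // MIB}MB",        # 25% of RAM
        "effective_cache_size": f"{ram_bytes * 70 // 100 // MIB}MB",  # 70% of RAM
        "huge_pages": "try",            # PostgreSQL default, effectively off here
        "max_connections": 100,
        "wal_compression": "on",        # resolves to pglz
        "checkpoint_timeout": "15min",
        "pool_size": vcpus * 4,         # pgbouncer pool size rule from the Environment table
    }


print(derive_settings(16 * 1024 ** 3, 4))
```

For a hypothetical host with 16 GiB of RAM and 4 vCPUs this gives `shared_buffers` of 4096MB, `effective_cache_size` of 11468MB, and a pool of 16 connections.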

Procedure

For each instance type × disk profile, an automated harness (over AWS SSM, no SSH) provisions a fresh, isolated pair of hosts and runs:

Load the dataset in parallel, add the primary key, ANALYZE, then disable table-level autovacuum.
Warm up 2 minutes so caches settle and hint bits are set – warmup traffic is never counted.
Measure 10 minutes, recording throughput and latency.
Repeat until there are 3 measured windows in total, all on the same warm server.
Tear down all infrastructure; the next combination starts clean.
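The warm-up and measurement steps above can be pictured as slicing one continuous stream of samples. The sketch below assumes a single 2-minute warm-up followed by three contiguous 10-minute windows with no gap between them; the text says the windows run back-to-back but does not give the exact boundaries, and `split_windows` is a hypothetical name.

```python
from typing import List, Tuple

WARMUP_S = 120    # 2-minute warm-up, never counted
WINDOW_S = 600    # 10-minute measured window
REPETITIONS = 3   # measured windows per combination


def split_windows(samples: List[Tuple[float, float]]) -> List[List[float]]:
    """Group latencies by measured window.

    samples holds (seconds since traffic started, latency in ms) pairs.
    Warm-up samples and anything after the last window are dropped.
    """
    windows: List[List[float]] = [[] for _ in range(REPETITIONS)]
    for t, latency_ms in samples:
        offset = t - WARMUP_S
        if offset < 0:
            continue  # warm-up traffic is discarded
        index = int(offset // WINDOW_S)
        if index < REPETITIONS:
            windows[index].append(latency_ms)
    return windows
```

Under these assumptions one combination carries 32 minutes of traffic, of which 30 minutes are measured.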

Each repetition is stored as its own record (with full metadata and a methodology fingerprint), so every published number traces back to an individual measurement.
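The record count can be cross-checked by enumerating the run matrix. The sketch below assumes every instance type was run against both disk profiles and all three dataset sizes, with 3 repetitions each; that assumption reproduces the 414 repetitions quoted at the top of the page. The instance names are placeholders, not the instance types actually tested.

```python
from itertools import product

DISK_PROFILES = ("baseline", "high")
DATASET_GB = (1, 10, 50)
REPETITIONS = 3
WINDOW_MINUTES = 10


def run_matrix(instance_types):
    """One tuple per stored record: (instance, profile, dataset size, repetition)."""
    return list(product(instance_types, DISK_PROFILES, DATASET_GB,
                        range(1, REPETITIONS + 1)))


# Placeholder names standing in for the 23 instance types.
instances = [f"instance-{n:02d}" for n in range(1, 24)]
records = run_matrix(instances)

print(len(records))                        # 23 * 2 * 3 * 3 = 414
print(len(records) * WINDOW_MINUTES / 60)  # 69.0 hours of measured traffic
```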

What the numbers mean

Each figure is the mean of the 3 repetitions; min, max, and standard deviation are kept in the dataset. Throughput is requests per second. Latency is reported as average plus the 95th and 99th percentiles, because tail latency is what users feel. Cost-efficiency divides throughput by the us-east-1 on-demand price of the instance plus its EBS volume.
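A minimal sketch of the aggregation described above, using Python's `statistics` module. Three things are assumptions: the sample (n − 1) standard deviation, the use of hourly prices, and the function names. The text does not say which standard deviation the dataset stores or which time unit the prices use, and the inputs below are made-up numbers, not measured results or real AWS prices.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize(throughputs: Sequence[float]) -> Dict[str, float]:
    """Mean, min, max, and sample standard deviation across repetitions."""
    return {
        "mean": mean(throughputs),
        "min": min(throughputs),
        "max": max(throughputs),
        "stdev": stdev(throughputs),  # sample stdev; an assumption
    }


def cost_efficiency(mean_rps: float,
                    instance_usd_per_hour: float,
                    ebs_usd_per_hour: float) -> float:
    """Throughput divided by the combined hourly price of instance and volume."""
    return mean_rps / (instance_usd_per_hour + ebs_usd_per_hour)


# Illustrative inputs only.
summary = summarize([1000.0, 1100.0, 1200.0])
print(summary)
print(cost_efficiency(summary["mean"], 0.40, 0.10))
```

With only 3 repetitions the standard deviation is a coarse indicator of spread, which is one reason the minimum and maximum are kept alongside it.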

Limitations

Repeats are not fully independent samples. The 3 windows run back-to-back on one warm, evolving session, and the insert workload grows the table across them. The effect on the results should be small, and because every configuration is measured the same way, it does not bias comparisons between instances.
Autovacuum is disabled on the table during measurement. This prevents an autovacuum worker – triggered by the bulk load – from eating the gp3 IOPS budget mid-run. This is a known limitation of the current model, and I plan to rework it in later iterations.
pgbouncer is in the path. Numbers reflect pooled, transaction-mode access. Direct libpq connections, session pooling, or a different pool size would shift results.
Single point in the design space. Concurrency is fixed at 32; datasets top out at 50 GB; one region, one AZ, gp3 only. Very large working sets, higher concurrency, io2, and instance storage are currently not covered.
T-family results use Standard credit mode – sustained included-price behaviour. Bursting lasts only while CPU credits remain; a credit-starved t3/t4g will perform worse than shown.
The client can bottleneck at high load. It is held constant (c7g.large) so comparisons stay valid, but an instance that outruns the client would read as flat. The plan is to use a more powerful client for the faster DB instances in a later iteration.
Point-in-time. Measured May–June 2026. AWS hardware, AMIs, and pricing change; re-run before betting on absolute figures.

Found a problem, or want a workload or instance type covered? The blog is the place to reach me.