What we measure

We run PostgreSQL 17 directly on EC2 instances (self-managed, not RDS), with its data on a dedicated gp3 volume. Against each instance we run a simple, realistic workload: a single indexed key-value table, hit by 32 concurrent connections issuing 90% primary-key reads and 10% single-row inserts. This is the access pattern behind a typical OLTP application – look rows up by ID, write new ones.
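To make the 90/10 mix concrete, here is a minimal sketch of how a single connection could choose its next operation. It is an illustration only, not the harness's actual code: the table name `kv`, its columns, and the function names are hypothetical, and no database connection is opened.

```python
import random

# From the text: 90% primary-key reads, 10% single-row inserts.
READ_FRACTION = 0.9

# Hypothetical statements for an assumed table kv(id, payload);
# the real schema is not given in this article.
STATEMENTS = {
    "read": "SELECT payload FROM kv WHERE id = %s",
    "insert": "INSERT INTO kv (id, payload) VALUES (%s, %s)",
}


def next_operation(rng: random.Random) -> str:
    """Pick the next operation for one connection according to the mix."""
    return "read" if rng.random() < READ_FRACTION else "insert"


def sample_mix(n: int, seed: int = 0) -> dict:
    """Count how many of n simulated operations are reads vs inserts."""
    rng = random.Random(seed)
    counts = {"read": 0, "insert": 0}
    for _ in range(n):
        counts[next_operation(rng)] += 1
    return counts


if __name__ == "__main__":
    # Roughly nine reads for every insert.
    print(sample_mix(100_000))
```

In a real run, each of the 32 connections would loop over a choice like this and execute the matching statement against the database.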

The same workload is repeated across three dataset sizes – 1 GB, 10 GB, and 50 GB – so you can see how an instance behaves when the data fits comfortably in memory and when it stops fitting. Each row is 512 bytes; the table is preloaded before any measurement starts.
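For a sense of scale, the sizes above translate into row counts roughly as follows. This is back-of-the-envelope arithmetic under two assumptions that the text does not state: that "GB" means 2^30 bytes, and that the dataset size counts only the 512-byte rows, ignoring PostgreSQL's per-row overhead and the index.

```python
ROW_BYTES = 512                  # from the text
DATASET_SIZES_GB = (1, 10, 50)   # from the text
BYTES_PER_GB = 1024 ** 3         # assumption: binary gigabytes


def approx_rows(size_gb: int, row_bytes: int = ROW_BYTES) -> int:
    """Approximate number of rows needed to reach size_gb of row data."""
    return size_gb * BYTES_PER_GB // row_bytes


if __name__ == "__main__":
    for size in DATASET_SIZES_GB:
        print(f"{size:>2} GB -> about {approx_rows(size):,} rows")
```

Under those assumptions the three datasets hold about 2 million, 21 million, and 105 million rows.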

How a benchmark runs

For each instance type and disk configuration, an automated harness provisions a fresh pair of EC2 hosts in the same availability zone of the us-east-1 region: one running PostgreSQL, and a separate load-generator host that issues the queries. Keeping the client on its own machine means the database never competes with the benchmark tool for CPU, and using the same load-generator instance type (c7g.large) for every run keeps the client side constant across comparisons.

Each measurement follows the same script:

When benchmarking of a configuration is finished, its infrastructure is destroyed and the next configuration starts from scratch. Results are stored per run, so the published numbers can always be traced back to individual measurements.

What the numbers mean

Every figure shown is the mean of the three repeated runs; the spread (min, max, standard deviation) is kept in the dataset. Throughput is requests per second. Latency is reported as the average plus the 95th and 99th percentiles, because tail latency is usually what your users feel. Cost-efficiency figures divide throughput by the us-east-1 on-demand price of the instance plus its disk – no reservations, no spot pricing.
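The aggregation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the published pipeline: the function names are hypothetical, prices are assumed to be hourly, the standard deviation is assumed to be the sample (n-1) form, and the numbers in the usage section are made up.

```python
from statistics import mean, stdev


def summarize(runs):
    """Mean and spread of one metric across repeated runs."""
    return {
        "mean": mean(runs),
        "min": min(runs),
        "max": max(runs),
        "stdev": stdev(runs),  # assumption: sample standard deviation
    }


def cost_efficiency(throughput_rps, instance_usd_per_hour, disk_usd_per_hour):
    """Throughput divided by the combined hourly price of instance and disk."""
    return throughput_rps / (instance_usd_per_hour + disk_usd_per_hour)


if __name__ == "__main__":
    # Made-up throughput values for three runs, in requests per second.
    summary = summarize([9000.0, 10000.0, 11000.0])
    print(summary)
    # Made-up hourly prices in USD, for illustration only.
    print(cost_efficiency(summary["mean"], 0.40, 0.10))
```

With this definition the cost-efficiency unit is requests per second per dollar-hour; multiplying by 3600 would give requests per dollar instead.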

If anything here seems off, or you want a workload or instance type covered, the blog is the place to reach me.