Postgres partitioning performance: Hash vs. List

Published: 5/1/2024
Categories: postgres, performance, hash, index
Author: msdousti

In our design, we came up with a PostgreSQL table that is expected to grow large: it stores only the IDs of incoming (Kafka) events, for the purpose of de-duplication. The IDs are of type UUID.

create table t(id uuid primary key);
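For context, the de-duplication check itself can be a plain insert that ignores conflicts. This is a minimal sketch of the intended usage (the literal UUID is just an example), not code from our actual consumer:

insert into t (id)
values ('11111111-2222-3333-4444-555555555555')
on conflict do nothing;
-- "INSERT 0 0" means the event ID was already seen; "INSERT 0 1" means it was new.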

After some consideration, we decided to partition this table into 16 partitions.

Before we continue, let's make it clear that this article is about a very specific case and workload. Still, it should give you enough insight to adapt the approach to your own needs.

Hash partitioning

The initial idea was to use hash partitioning.

drop table if exists t;

create table t(id uuid primary key)
    partition by hash(id);

create table t_00
    partition of t 
    for values with (modulus 16, remainder 0);

create table t_01
    partition of t 
    for values with (modulus 16, remainder 1);

create table t_02
    partition of t 
    for values with (modulus 16, remainder 2);

create table t_03
    partition of t 
    for values with (modulus 16, remainder 3);

create table t_04
    partition of t 
    for values with (modulus 16, remainder 4);

create table t_05
    partition of t 
    for values with (modulus 16, remainder 5);

create table t_06
    partition of t 
    for values with (modulus 16, remainder 6);

create table t_07
    partition of t 
    for values with (modulus 16, remainder 7);

create table t_08
    partition of t 
    for values with (modulus 16, remainder 8);

create table t_09
    partition of t 
    for values with (modulus 16, remainder 9);

create table t_10
    partition of t 
    for values with (modulus 16, remainder 10);

create table t_11
    partition of t 
    for values with (modulus 16, remainder 11);

create table t_12
    partition of t 
    for values with (modulus 16, remainder 12);

create table t_13
    partition of t 
    for values with (modulus 16, remainder 13);

create table t_14
    partition of t 
    for values with (modulus 16, remainder 14);

create table t_15
    partition of t 
    for values with (modulus 16, remainder 15);
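As an aside, the 16 nearly identical statements above don't have to be written by hand. Here is a minimal sketch that generates them with psql's \gexec (shown only as a convenience; it's not how we actually created them):

select format(
    'create table t_%s partition of t for values with (modulus 16, remainder %s)',
    lpad(i::text, 2, '0'), i)
from generate_series(0, 15) as i
\gexec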

A competing idea was to ditch hashing and use the first character of the id's text representation as the partitioning key. I'll discuss that idea in the next section, but let's first benchmark the hash partitioning approach:

time \
pgbench -c10 -t 900 -j30 -n -f - << EOF
insert into t
select gen_random_uuid()
from generate_series(1, 1000);
EOF

I chose to run 10 client connections, each sending 900 transactions to the DB, and requested 30 worker threads; as the output below shows, pgbench caps the thread count at the number of clients, so 10 threads were actually used. Each transaction inserts 1,000 UUIDs into the table.

Why these numbers? Just for fun. To be indicative of anything, they would have to conform to the real traffic. But let's just see how this turns out on my personal laptop (old, 2017 model!):

pgbench (16.2 (Ubuntu 16.2-1ubuntu4))
transaction type: -
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 900
number of transactions actually processed: 9000/9000
number of failed transactions: 0 (0.000%)
latency average = 13.776 ms
initial connection time = 6.491 ms
tps = 725.901931 (without initial connection time)

real    0m12.438s
user    0m0.162s
sys     0m0.396s

It took 12.4 seconds to insert 9,000,000 rows. The average rate was 725.9 TPS (transactions per second).
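As a quick sanity check (not part of the original benchmark output), the row count should equal 10 clients Γ— 900 transactions Γ— 1,000 rows per transaction:

select count(*) from t;
-- expected result: 9000000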

Using psql metacommands, we can see the table/index sizes:

  • Using \dt+ to see table sizes (some columns are removed for brevity):
β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Name β”‚       Type        β”‚  Size   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€€
β”‚ t    β”‚ partitioned table β”‚ 0 bytes β”‚
β”‚ t_00 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_01 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_02 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_03 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_04 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_05 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_06 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_07 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_08 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_09 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_10 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_11 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_12 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_13 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_14 β”‚ table             β”‚ 24 MB   β”‚
β”‚ t_15 β”‚ table             β”‚ 24 MB   β”‚
β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  • Using \di+ to see index sizes (some columns are removed for brevity):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Name    β”‚       Type        β”‚  Size   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€€
β”‚ t_pkey    β”‚ partitioned index β”‚ 0 bytes β”‚
β”‚ t_00_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_01_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_02_pkey β”‚ index             β”‚ 22 MB   β”‚
β”‚ t_03_pkey β”‚ index             β”‚ 20 MB   β”‚
β”‚ t_04_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_05_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_06_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_07_pkey β”‚ index             β”‚ 20 MB   β”‚
β”‚ t_08_pkey β”‚ index             β”‚ 20 MB   β”‚
β”‚ t_09_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_10_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_11_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_12_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_13_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_14_pkey β”‚ index             β”‚ 21 MB   β”‚
β”‚ t_15_pkey β”‚ index             β”‚ 21 MB   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Notice that the indexes are almost as large as the tables themselves. Also, while the data is distributed evenly among the partitions (24 MB each), the index sizes range from 20 to 22 MB. The total index size is 334 MB.
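Instead of adding up the \di+ output by hand, the total can also be computed; a hypothetical helper query (not from the original post) using pg_partition_tree:

select pg_size_pretty(sum(pg_indexes_size(relid))) as total_index_size
from pg_partition_tree('t');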

List partitioning

If we want to use the first character of the id's text representation as the partitioning key, the primary key cannot be declared on the parent table, because the partition key is an expression:

create table t(id uuid primary key)
partition by list (lower(left(id::text, 1)));

This results in an error:

ERROR:  unsupported PRIMARY KEY constraint with partition key definition
DETAIL:  PRIMARY KEY constraints cannot be used when partition keys include expressions.

So, we decided to add the primary key to each individual partition instead. Since identical IDs always map to the same partition, this still effectively enforces uniqueness across all the data:

drop table if exists t;

create table t(id uuid not null)
partition by list (lower(left(id::text, 1)));

create table t_00
partition of t
(id primary key)
for values in ('0');

create table t_01
partition of t
(id primary key) 
for values in ('1');

create table t_02
partition of t
(id primary key) 
for values in ('2');

create table t_03
partition of t
(id primary key) 
for values in ('3');

create table t_04
partition of t
(id primary key) 
for values in ('4');

create table t_05
partition of t
(id primary key) 
for values in ('5');

create table t_06
partition of t
(id primary key) 
for values in ('6');

create table t_07
partition of t
(id primary key) 
for values in ('7');

create table t_08
partition of t
(id primary key) 
for values in ('8');

create table t_09
partition of t
(id primary key) 
for values in ('9');

create table t_10
partition of t
(id primary key) 
for values in ('a');

create table t_11
partition of t
(id primary key) 
for values in ('b');

create table t_12
partition of t
(id primary key) 
for values in ('c');

create table t_13
partition of t
(id primary key) 
for values in ('d');

create table t_14
partition of t
(id primary key) 
for values in ('e');

create table t_15
partition of t
(id primary key) 
for values in ('f');
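As with the hash partitions, these 16 statements can be generated rather than typed out; here is a sketch using a DO block this time (again just a convenience alternative, not how we actually created them):

do $$
begin
    for i in 0..15 loop
        execute format(
            'create table t_%s partition of t (id primary key) for values in (%L)',
            lpad(i::text, 2, '0'),
            substr('0123456789abcdef', i + 1, 1));
    end loop;
end $$;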

Now let's benchmark again:

time \
pgbench -c10 -t 900 -j30 -n -f - << EOF
insert into t
select gen_random_uuid()
from generate_series(1, 1000);
EOF

result:

pgbench (16.2 (Ubuntu 16.2-1ubuntu4))
transaction type: -
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 900
number of transactions actually processed: 9000/9000
number of failed transactions: 0 (0.000%)
latency average = 15.123 ms
initial connection time = 8.810 ms
tps = 661.264382 (without initial connection time)

real    0m13.654s
user    0m0.150s
sys     0m0.409s

This is slower than hash partitioning:

  • Duration: 13.654s instead of 12.438s
  • TPS: 661.264382 instead of 725.901931

So, we lose the parent-level primary key, and it's even slower! Hash partitioning is a clear winner here.

Running \dt+ and \di+ gives almost the same results as before, so I won't repeat them here.

Using hash indexes instead of b-tree indexes

Another suggested approach was to enforce uniqueness using hash indexes instead of b-tree indexes. The benefit is that hash indexes are often smaller, and can be faster, when equality checking is the only operation that matters.

Postgres primary keys do not yet support hash indexes, but we can use a hack: an exclusion constraint backed by a hash index, with equality as the operator:

drop table if exists t;

create table t(id uuid not null)
partition by list (left(id::text, 1));

create table t_00
partition of t
(exclude using hash(id with =))
for values in ('0');

create table t_01
partition of t
(exclude using hash(id with =)) 
for values in ('1');

create table t_02
partition of t
(exclude using hash(id with =)) 
for values in ('2');

create table t_03
partition of t
(exclude using hash(id with =)) 
for values in ('3');

create table t_04
partition of t
(exclude using hash(id with =)) 
for values in ('4');

create table t_05
partition of t
(exclude using hash(id with =)) 
for values in ('5');

create table t_06
partition of t
(exclude using hash(id with =)) 
for values in ('6');

create table t_07
partition of t
(exclude using hash(id with =)) 
for values in ('7');

create table t_08
partition of t
(exclude using hash(id with =)) 
for values in ('8');

create table t_09
partition of t
(exclude using hash(id with =)) 
for values in ('9');

create table t_10
partition of t
(exclude using hash(id with =)) 
for values in ('a');

create table t_11
partition of t
(exclude using hash(id with =)) 
for values in ('b');

create table t_12
partition of t
(exclude using hash(id with =)) 
for values in ('c');

create table t_13
partition of t
(exclude using hash(id with =)) 
for values in ('d');

create table t_14
partition of t
(exclude using hash(id with =)) 
for values in ('e');

create table t_15
partition of t
(exclude using hash(id with =)) 
for values in ('f');
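Even without a primary key, duplicates are still rejected; a quick hypothetical check (not part of the original benchmark):

insert into t values ('00000000-0000-0000-0000-000000000000');
insert into t values ('00000000-0000-0000-0000-000000000000');
-- ERROR:  conflicting key value violates exclusion constraint "t_00_id_excl"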

Let's benchmark this as well:

time \
pgbench -c10 -t 900 -j30 -n -f - << EOF
insert into t
select gen_random_uuid()
from generate_series(1, 1000);
EOF

result:

pgbench (16.2 (Ubuntu 16.2-1ubuntu4))
transaction type: -
scaling factor: 1
query mode: simple
number of clients: 10
number of threads: 10
maximum number of tries: 1
number of transactions per client: 900
number of transactions actually processed: 9000/9000
number of failed transactions: 0 (0.000%)
latency average = 16.686 ms
initial connection time = 7.089 ms
tps = 599.314265 (without initial connection time)

real    0m15.067s
user    0m0.127s
sys     0m0.468s

Well, I didn't expect that. It's even slower now. Looking at the table sizes (\dt+), they are the same as before (24 MB each).

However, index sizes (\di+) are a tiny bit smaller:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”
β”‚     Name     β”‚ Size  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€€
β”‚ t_00_id_excl β”‚ 20 MB β”‚
β”‚ t_01_id_excl β”‚ 20 MB β”‚
β”‚ t_02_id_excl β”‚ 20 MB β”‚
β”‚ t_03_id_excl β”‚ 20 MB β”‚
β”‚ t_04_id_excl β”‚ 20 MB β”‚
β”‚ t_05_id_excl β”‚ 20 MB β”‚
β”‚ t_06_id_excl β”‚ 20 MB β”‚
β”‚ t_07_id_excl β”‚ 20 MB β”‚
β”‚ t_08_id_excl β”‚ 20 MB β”‚
β”‚ t_09_id_excl β”‚ 20 MB β”‚
β”‚ t_10_id_excl β”‚ 20 MB β”‚
β”‚ t_11_id_excl β”‚ 20 MB β”‚
β”‚ t_12_id_excl β”‚ 20 MB β”‚
β”‚ t_13_id_excl β”‚ 20 MB β”‚
β”‚ t_14_id_excl β”‚ 20 MB β”‚
β”‚ t_15_id_excl β”‚ 20 MB β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜

So, in total, the index size was reduced from 334 MB to 320 MB.

Summary

  • Hash partitioning outperforms list partitioning in the above example (pay attention: not always)
  • Hash partitioning has the added benefit that all tables have primary keys (again, specific to the above example). This is important when using logical replication. For instance, to use AWS RDS blue/green deployment:

    Make sure that all tables in the DB instance have a primary key. PostgreSQL logical replication doesn't allow UPDATE or DELETE operations on tables that don't have a primary key.

  • Using hash indexes instead of b-tree was not a performance boost, but reduced index size by less than 5%.

Edits

Edit (2024-05-03)

A colleague of mine, who asked to remain anonymous, explained why list partitioning was slower:

In your example, to compute the partition key of the list-based approach, you use a cast (cast UUID to text), then two functions are applied (LEFT and LOWER). The functions should be pretty quick, but the cast is slow. That’s why the combined effect is slower than the hash function, which is implemented in C and is quite fast.
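One way to sanity-check this explanation is to time the two computations in isolation. A rough sketch, with the caveat that both queries also pay for gen_random_uuid() itself, and that uuid_hash is only the 32-bit sibling of the support function hash partitioning actually uses:

\timing on

-- the list-partitioning key: a cast plus two function calls
select count(*) from generate_series(1, 1000000)
where lower(left(gen_random_uuid()::text, 1)) is not null;

-- the built-in hash function for UUIDs, implemented in C
select count(*) from generate_series(1, 1000000)
where uuid_hash(gen_random_uuid()) is not null;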

Another colleague, Tim, gave a nice summary:

So, if I got it right, in essence it says β€œDon’t try to be fancy, just do it in a boring way and PostgreSQL will deal with it in an optimized way.”

Which reminds me of a story: we once implemented a variant of the strlen() function and observed it was slower than the glibc library function by a factor of 300! I should write a post on that too πŸ™‚

Further reading

For more info on hash indexes, take a look at this Twitter thread and the links therein.
