Logo

dev-resources.site

for different kinds of informations.

No Gap Ordered Numbering in SQL: A Unique Index to Serialize In Read Committed

Published at
11/25/2024
Categories
yugabytedb
distributed
postgres
database
Author
franckpachot
Author
12 person written this
franckpachot
open
No Gap Ordered Numbering in SQL: A Unique Index to Serialize In Read Committed

If you require a no-gap numbering system, using a sequence is not an option (see reasons here). Instead, you can utilize a table with fixed rows that you manually update. However, let's consider a scenario where you cannot have fixed rows and need to read the last value from the table in order to calculate the next number.

Which isolation level do you think you need? You must be sure that the max() you have read stays the same until you commit the max()+1 insert. This usually requires a serializable isolation level to order the transactions one after the other. You don't want a phantom read to appear once you have calculated the next value.

I have previously discussed how an index can enhance the performance under a serializable isolation level. Here, I will create an index to achieve serialization without using the serializable isolation level.

Here is a simple table:

create table demo (
 primary key(id)
 , id int generated always as identity
 , type varchar(10)
 , num int
 , unique(type, num)
);
Enter fullscreen mode Exit fullscreen mode

My business rule states that each "type" is assigned a number, starting from one for the first row with this type and increasing in the order in which they were committed without allowing any gaps.

The following query can do that:

explain
insert into demo(type,num)
  select type, nextnum from (
   select 'x' as type , coalesce(max(num),0)+1 as nextnum
   from demo
   where type='x'
) next
;
Enter fullscreen mode Exit fullscreen mode

In a Serializable isolation level, the SELECT acquires a lock to declare the read intent so that a write can detect a conflict between the read and write state. In PostgreSQL, it is a predicate lock, and in YugabyteDB, a lock on a key range. As I exposed in the previous posts of this series, an index can improve this to avoid lock escalation to the whole table. As, here, I have defined a unique constraint, it has created an index. The same index is also used to get the last value quickly, as it is always at the end of the index.

Without a unique index, an isolation level lower than Serializable would not be able to detect the conflict when two transactions read the same last value simultaneously and then attempt to insert that same value. This situation could lead to duplicate entries. However, at the serializable isolation level, such anomalies would be detected due to the read-write conflict, preventing duplicates from occurring.

As the index is unique, the write-write conflict when two sessions try to insert the same value avoids such an anomaly. For this reason, our logic can work in the Reading Committed isolation level.

In terms of performance, it is clear that it only scans one index entry, and there is only one Flush Request required to obtain the next value and insert the new row, which ensures that it does not add latency:

yugabyte=# delete from demo;
DELETE 10002

yugabyte=# -- add many types
yugabyte=# insert into demo(type, num) select format('#%s',type),num from generate_series(1,100) as type, generate_series(1,100) as num;
INSERT 0 10000

yugabyte=# -- prepared statement to call two times
yugabyte=# prepare insert_new(text) as
 insert into demo(type,num)
  select type, nextnum from (
   select $1 as type , coalesce(max(num),0)+1 as nextnum
   from demo where type=$1
) next returning *
;

yugabyte=# -- call a new type
yugabyte=# explain (analyze, dist) execute insert_new('#0');
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on demo  (cost=4.12..4.15 rows=1 width=46) (actual time=0.483..0.484 rows=1 loops=1)
   Storage Table Write Requests: 1
   Storage Index Write Requests: 1
   ->  Subquery Scan on next  (cost=4.12..4.15 rows=1 width=46) (actual time=0.437..0.438 rows=1 loops=1)
         ->  Result  (cost=4.12..4.13 rows=1 width=36) (actual time=0.432..0.432 rows=1 loops=1)
               InitPlan 1 (returns $0)
                 ->  Limit  (cost=0.00..4.12 rows=1 width=4) (actual time=0.429..0.429 rows=0 loops=1)
                       ->  Index Only Scan Backward using demo_type_num_key on demo demo_1  (cost=0.00..4.12 rows=1 width=4) (actual time=0.427..0.427 rows=0 loops=1)
                             Index Cond: ((type = '#0'::text) AND (num IS NOT NULL))
                             Heap Fetches: 0
                             Storage Index Read Requests: 1
                             Storage Index Read Execution Time: 0.327 ms
 Planning Time: 0.217 ms
 Execution Time: 6.016 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 0.327 ms
 Storage Rows Scanned: 0
 Storage Write Requests: 2
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 1
 Storage Flush Execution Time: 5.419 ms
 Storage Execution Time: 5.746 ms
 Peak Memory Usage: 56 kB
(24 rows)

yugabyte=# -- call an existing type
yugabyte=# explain (analyze, dist) execute insert_new('#1');
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Insert on demo  (cost=4.12..4.15 rows=1 width=46) (actual time=0.568..0.569 rows=1 loops=1)
   Storage Table Write Requests: 1
   Storage Index Write Requests: 1
   ->  Subquery Scan on next  (cost=4.12..4.15 rows=1 width=46) (actual time=0.516..0.517 rows=1 loops=1)
         ->  Result  (cost=4.12..4.13 rows=1 width=36) (actual time=0.511..0.512 rows=1 loops=1)
               InitPlan 1 (returns $0)
                 ->  Limit  (cost=0.00..4.12 rows=1 width=4) (actual time=0.507..0.508 rows=1 loops=1)
                       ->  Index Only Scan Backward using demo_type_num_key on demo demo_1  (cost=0.00..4.12 rows=1 width=4) (actual time=0.506..0.506 rows=1 loops=1)
                             Index Cond: ((type = '#1'::text) AND (num IS NOT NULL))
                             Heap Fetches: 0
                             Storage Index Read Requests: 1
                             Storage Index Read Execution Time: 0.387 ms
                             Storage Index Rows Scanned: 1
 Planning Time: 0.198 ms
 Execution Time: 5.136 ms
 Storage Read Requests: 1
 Storage Read Execution Time: 0.387 ms
 Storage Rows Scanned: 1
 Storage Write Requests: 2
 Catalog Read Requests: 0
 Catalog Write Requests: 0
 Storage Flush Requests: 1
 Storage Flush Execution Time: 3.766 ms
 Storage Execution Time: 4.153 ms
 Peak Memory Usage: 56 kB
(25 rows)

Enter fullscreen mode Exit fullscreen mode

In both cases, there's one Read Request to read the maximum value, two Write Requests to insert in the table and index, and only one Flush Request to wait on Raft consensus, which is the latency to the follower.
I utilized a secondary index to evaluate the worst-case scenario. However, the primary key will likely be (type, num). In YugabyteDB, unlike in PostgreSQL, the table is stored based on the primary key, which means that an insert is a single write operation. This design has the advantage of colocating rows with the same type, which are likely to be queried together.

Remember that SQL isolation levels were established without considering Multi-Version Concurrency Control (MVCC) reads, explicit locks, or indexes, as these elements are not included in the SQL standard. To achieve efficient and consistent concurrency control, it is essential to understand how your database works and how it acquires locks. This is also why YugabyteDB is runtime-compatible with PostgreSQL, ensuring similar behavior. Wire protocol and dialect compatibility are insufficient to successfully migrate applications between PostgreSQL-compatible databases.

distributed Article's
30 articles in total
Favicon
PostgreSQL plan_cache_mode
Favicon
Index Filtering in PostgreSQL and YugabyteDB (Index Scan instead of Index Only Scan)
Favicon
Book Review: Designing Data-Intensive Applications
Favicon
More details in pg_locks for YugabyteDB
Favicon
Large IntentsDB MemTable with Many Small SST Files
Favicon
MapReduce - A Simplified Approach to Big Data Processing
Favicon
Challenges of Asynchronous Messaging in Software Design
Favicon
Aurora DSQL: How it Compares to YugabyteDB
Favicon
Document data modeling to avoid write skew anomalies
Favicon
When to replace IN() with EXISTS() - correlated and uncorrelated subqueries
Favicon
2024.2: Faster with Shared Memory Between PostgreSQL and TServer Layers
Favicon
DynamoDB-style Limits for Predictable SQL Performance?
Favicon
Aurora DSQL: Create a Serverless Cluster and Connect with PostgreSQL Client
Favicon
Amazon Aurora DSQL: Which PostgreSQL Service Should I Use on AWS ?
Favicon
YugabyteDB MVCC and Updates: columns vs. JSON
Favicon
Aurora Limitless - Creation
Favicon
No Gap Ordered Numbering in SQL: A Unique Index to Serialize In Read Committed
Favicon
What's behind the Call Home option?
Favicon
Reverse Proxy and Load Balancing: Do we need both?
Favicon
AWS re:Invent 2024 - Which sessions I'll try to attend.
Favicon
pgSphere and Q3C on Distributed SQL
Favicon
IN() Index Scan in PostgreSQL 17 and YugabyteDB LSM Tree
Favicon
Frequent Re-Connections improved by Connection Manager
Favicon
Maintaining Throughput With Less Physical Connections
Favicon
YugabyteDB Connection Manager: a Database Resident Connection Pool with Shared Processes
Favicon
Parallel JavaScript Machine
Favicon
Asynch replication for Disaster Recovery, Read Replicas, and Change Data Capture
Favicon
RocksDB, Key-Value Storage, and Packed Rows: the backbone of YugabyteDB's distributed tablets flexibility
Favicon
SQL as fast as NoSQL, Bulk Loads, Covering and Partial Indexes
Favicon
Fault Tolerance with Raft and no Single Point of Failure

Featured ones: