dev-resources.site

for different kinds of informations.

Referential integrity In The Absence Of Foreign Key

Published at

12/19/2024

Categories

4 categories in total

Author

12 person written this

franckpachot

open

Referential integrity In The Absence Of Foreign Key

Like some distributed databases (*), Aurora DSQL does not support foreign keys. How can you maintain the referential integrity of your data?

It's often stated that the application can handle this rather than the database, but the application lacks one critical capability: the ability to see what ongoing concurrent transactions are doing.
To prevent database corruption during a race condition, verifying that the parent key exists and has not been concurrently deleted is essential. Similarly, when you remove a key and confirm there are no child rows, it’s vital to ensure that no one is currently adding one.

While the database can observe the activities of other transactions, the application cannot due to transaction isolation. How can you address this? There are four solutions that I described in Declarative vs. Application side Foreign Key referential integrity:

Declare the Foreign Key in the database. ⛔ Aurora DSQL doesn't support Foreign Keys.
Use a serializable isolation level to get an error when conflicting concurrent operations occur. ⛔ Aurora DSQL doesn't support Serializable.
Lock the parent or child table when inserting or deleting from the other. ⛔ Aurora DSQL doesn't support LOCK TABLE.
Lock the parent key when inserting a row in the child table ✅.

Another solution employs advisory locks instead of rows or tables to lock a value. In this approach, inserts into the child table require a shared lock on the foreign key value, while deletes on the parent key necessitate an exclusive lock. ⛔ Aurora DSQL doesn't support advisory locks.

Let's explore the possibility of parent row locking.

SELECT FOR SHARE is sufficient to prevent a concurrent deletion. ⛔ Aurora DSQL doesn't support LOCK FOR SHARE.
One solution remains: SELECT FOR UPDATE on the parent row when inserting into the child table ✅.

So, one possibility remains. I'll use the "orders"/"orderlines" schema that I created in a previous post:

dsql=> \d orders
                              Table "public.orders"
   Column   |           Type           | Collation | Nullable |      Default
------------+--------------------------+-----------+----------+-------------------
 order_id   | uuid                     |           | not null | gen_random_uuid()
 country    | text                     |           |          |
 created_at | timestamp with time zone |           |          | now()
Indexes:
    "orders_pkey" PRIMARY KEY, btree_index (order_id) INCLUDE (country, created_at)
    "orders_country_created_at_idx" btree_index (country, created_at)

dsql=> \d orderlines
                              Table "public.orderlines"
         Column          |           Type           | Collation | Nullable | Default
-------------------------+--------------------------+-----------+----------+---------
 order_id                | uuid                     |           | not null |
 line_id                 | integer                  |           | not null |
 product                 | text                     |           |          |
 denormalized_country    | text                     |           |          |
 denormalized_created_at | timestamp with time zone |           |          |
Indexes:
    "orderlines_pkey" PRIMARY KEY, btree_index (order_id, line_id) INCLUDE (product, denormalized_country, denormalized_created_at)
    "orderlines_denormalized_country_product_denormalized_create_idx" btree_index (denormalized_country, product, denormalized_created_at)

dsql=>

I wasn't able to declare the foreign key, but "order_id" in "orderlines" should reference an "order_id" in "orders".

I insert an order that I'll use in my test and keep its value in the :order_id variable:

dsql=> insert into orders (order_id)
           values ('ffffffff-ffff-ffff-ffff-ffffffffffff')
           returning order_id
dsql-> \gset
INSERT 0 1

Here is a transaction that deletes a row from the "orders" table along with its child rows in the "orderlines" table:


delete from orderlines where order_id=:'deleted_order_id';
delete from orders     where order_id=:'deleted_order_id';

If a concurrent transaction inserts a row in "orderlines", the delete will not see it, but it may appear after the transaction is committed. The application must prevent this situation to avoid creating an orphean row.

Here is a safe transaction that inserts a row in "orderlines" only if the parent exists in "orders" (I'm using psql variables and conditional blocks to make it easy to reproduce):

begin;
\set exists false
select true as exists 
 from orders 
 where order_id =  :'new_order' 
 for update
\gset
\if :exists
insert into orderlines (order_id, line_id) 
 values ( :'new_order' , 1 );
\endif
commit;

The SELECT FOR UPDATE declares an exclusive lock intent that conflicts with a concurrent transaction that would delete it. If the application strictly uses these procedures to manipulate orders and order lines, it guarantees referential integrity without a foreign key.

Let's test it. In the first session, I check that the parent exists with a FOR UPDATE and insert the child row:

dsql=> begin;
BEGIN
dsql=*> \set exists false
dsql=*> select true as exists
             from orders
             where order_id =  'ffffffff-ffff-ffff-ffff-ffffffffffff'
             for update
dsql-*> \gset
dsql=*> \if :exists
dsql=*> insert into orderlines (order_id, line_id)
             values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 1 )
;
INSERT 0 1
dsql=*> \endif
dsql=*>

While the transaction is ongoing, as I haven't committed yet, another transaction deletes the parent row:

dsql=*> \! psql -c "delete from orders where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'" -c "delete from orderlines where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'" & sleep 1
DELETE 1
DELETE 0

This is successful because Aurora DSQL uses optimistic concurrency control, and the SELECT FOR UPDATE intent was not synchronized to be visible to the other transactions.

When I try to commit my inserting transaction, the conflict is detected, and it fails:

dsql=*> commit;
ERROR:  change conflicts with another transaction, please retry: (OC000)
dsql=>

By adding additional data logic to our business logic, we can enforce referential integrity and avoid race conditions that corrupt our database. But how does it scale?

In PostgreSQL-compatible databases, we may prefer using SELECT FOR SHARE rather than SELECT FOR UPDATE because multiple inserts for the same parent should be able to be processed concurrently. Let's say the order is a big one and shared by multiple users who add their order lines.

I insert the parent again:

dsql=> insert into orders (order_id)
           values ('ffffffff-ffff-ffff-ffff-ffffffffffff')
           returning order_id
dsql-> \gset
INSERT 0 1

A first session inserts a line 1:

dsql=> begin;
BEGIN
dsql=*> select true as exists
             from orders
             where order_id =  'ffffffff-ffff-ffff-ffff-ffffffffffff'
             for update
;
 exists
--------
 t
(1 row)

dsql=*> insert into orderlines (order_id, line_id)
             values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 1 )
;
INSERT 0 1

It hasn't been committed yet, while another session is inserting line 2:

dsql=> begin;
BEGIN

dsql=*> select true as exists
             from orders
             where order_id =  'ffffffff-ffff-ffff-ffff-ffffffffffff'
             for update
;
 exists
--------
 t
(1 row)

dsql=*> insert into orderlines (order_id, line_id)
             values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 2 )
;
INSERT 0 1

dsql=*> commit;
COMMIT

It was committed, but when the first session was committed, it fails:

postgres=*> commit;
ERROR:  change conflicts with another transaction, please retry: (OC000)

The application must attempt a retry without any business justification, as inserting two rows with the same parent should be permitted. Consider using the same method to reference the product catalog: only one product can be ordered simultaneously in the system.

Ironically, Optimistic Concurrency Control, designed to enhance scalability, restricts throughput. SELECT FOR SHARE could solve this, allowing concurrent transactions. However, at least in the preview version, it is unavailable in Aurora DSQL.

Write skew is usually prevented by using a serializable isolation level. For databases that do not support it, it is possible to modify the data model and use SELECT FOR UPDATE to achieve the same result, transforming the write skew problem into a repeatable read one. Here is another example:

The Doctor's On-Call Shift example and a Normalized Relational Schema to Avoid Write Skew

Franck Pachot for YugabyteDB ・ Oct 18 '24

#yugabytedb #postgres #oracle #database

Note

(*) Many distributed databases lack support for foreign keys due to implementation challenges. This situation arises with database sharding, as not all SQL features function with cross-shard transactions. MySQL-compatible databases have often neglected referential integrity, similar to how MySQL has traditionally ignored foreign key declarations. Distributed SQL databases offering PostgreSQL-compatible concurrency control, such as YugabyteDB, support foreign keys.

aurora Article's

30 articles in total

How to generate a Aurora Postgresql cluster with all auto explain enabled