Logo

dev-resources.site

for different kinds of informations.

Referential integrity In The Absence Of Foreign Key

Published at
12/19/2024
Categories
aws
aurora
dsql
database
Author
franckpachot
Categories
4 categories in total
aws
open
aurora
open
dsql
open
database
open
Author
12 person written this
franckpachot
open
Referential integrity In The Absence Of Foreign Key

Like some distributed databases (*), Aurora DSQL does not support foreign keys. How can you maintain the referential integrity of your data?

It's often stated that the application can handle this rather than the database, but the application lacks one critical capability: the ability to see what ongoing concurrent transactions are doing.
To prevent database corruption during a race condition, verifying that the parent key exists and has not been concurrently deleted is essential. Similarly, when you remove a key and confirm there are no child rows, it’s vital to ensure that no one is currently adding one.

While the database can observe the activities of other transactions, the application cannot due to transaction isolation. How can you address this? There are four solutions that I described in Declarative vs. Application side Foreign Key referential integrity:

  • Declare the Foreign Key in the database. â›” Aurora DSQL doesn't support Foreign Keys.
  • Use a serializable isolation level to get an error when conflicting concurrent operations occur. â›” Aurora DSQL doesn't support Serializable.
  • Lock the parent or child table when inserting or deleting from the other. â›” Aurora DSQL doesn't support LOCK TABLE.
  • Lock the parent key when inserting a row in the child table ✅.

Another solution employs advisory locks instead of rows or tables to lock a value. In this approach, inserts into the child table require a shared lock on the foreign key value, while deletes on the parent key necessitate an exclusive lock. â›” Aurora DSQL doesn't support advisory locks.

Let's explore the possibility of parent row locking.

  • SELECT FOR SHARE is sufficient to prevent a concurrent deletion. â›” Aurora DSQL doesn't support LOCK FOR SHARE.
  • One solution remains: SELECT FOR UPDATE on the parent row when inserting into the child table ✅.

So, one possibility remains. I'll use the "orders"/"orderlines" schema that I created in a previous post:

dsql=> \d orders
                              Table "public.orders"
   Column   |           Type           | Collation | Nullable |      Default
------------+--------------------------+-----------+----------+-------------------
 order_id   | uuid                     |           | not null | gen_random_uuid()
 country    | text                     |           |          |
 created_at | timestamp with time zone |           |          | now()
Indexes:
    "orders_pkey" PRIMARY KEY, btree_index (order_id) INCLUDE (country, created_at)
    "orders_country_created_at_idx" btree_index (country, created_at)

dsql=> \d orderlines
                              Table "public.orderlines"
         Column          |           Type           | Collation | Nullable | Default
-------------------------+--------------------------+-----------+----------+---------
 order_id                | uuid                     |           | not null |
 line_id                 | integer                  |           | not null |
 product                 | text                     |           |          |
 denormalized_country    | text                     |           |          |
 denormalized_created_at | timestamp with time zone |           |          |
Indexes:
    "orderlines_pkey" PRIMARY KEY, btree_index (order_id, line_id) INCLUDE (product, denormalized_country, denormalized_created_at)
    "orderlines_denormalized_country_product_denormalized_create_idx" btree_index (denormalized_country, product, denormalized_created_at)

dsql=>

Enter fullscreen mode Exit fullscreen mode

I wasn't able to declare the foreign key, but "order_id" in "orderlines" should reference an "order_id" in "orders".

I insert an order that I'll use in my test and keep its value in the :order_id variable:

dsql=> insert into orders (order_id)
           values ('ffffffff-ffff-ffff-ffff-ffffffffffff')
           returning order_id
dsql-> \gset
INSERT 0 1
Enter fullscreen mode Exit fullscreen mode

Here is a transaction that deletes a row from the "orders" table along with its child rows in the "orderlines" table:


delete from orderlines where order_id=:'deleted_order_id';
delete from orders     where order_id=:'deleted_order_id';
Enter fullscreen mode Exit fullscreen mode

If a concurrent transaction inserts a row in "orderlines", the delete will not see it, but it may appear after the transaction is committed. The application must prevent this situation to avoid creating an orphean row.

Here is a safe transaction that inserts a row in "orderlines" only if the parent exists in "orders" (I'm using psql variables and conditional blocks to make it easy to reproduce):

begin;
\set exists false
select true as exists 
 from orders 
 where order_id =  :'new_order' 
 for update
\gset
\if :exists
insert into orderlines (order_id, line_id) 
 values ( :'new_order' , 1 );
\endif
commit;
Enter fullscreen mode Exit fullscreen mode

The SELECT FOR UPDATE declares an exclusive lock intent that conflicts with a concurrent transaction that would delete it. If the application strictly uses these procedures to manipulate orders and order lines, it guarantees referential integrity without a foreign key.

Let's test it. In the first session, I check that the parent exists with a FOR UPDATE and insert the child row:

dsql=> begin;
BEGIN
dsql=*> \set exists false
dsql=*> select true as exists
             from orders
             where order_id =  'ffffffff-ffff-ffff-ffff-ffffffffffff'
             for update
dsql-*> \gset
dsql=*> \if :exists
dsql=*> insert into orderlines (order_id, line_id)
             values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 1 )
;
INSERT 0 1
dsql=*> \endif
dsql=*>
Enter fullscreen mode Exit fullscreen mode

While the transaction is ongoing, as I haven't committed yet, another transaction deletes the parent row:

dsql=*> \! psql -c "delete from orders where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'" -c "delete from orderlines where order_id = 'ffffffff-ffff-ffff-ffff-ffffffffffff'" & sleep 1
DELETE 1
DELETE 0
Enter fullscreen mode Exit fullscreen mode

This is successful because Aurora DSQL uses optimistic concurrency control, and the SELECT FOR UPDATE intent was not synchronized to be visible to the other transactions.

When I try to commit my inserting transaction, the conflict is detected, and it fails:

dsql=*> commit;
ERROR:  change conflicts with another transaction, please retry: (OC000)
dsql=>
Enter fullscreen mode Exit fullscreen mode

By adding additional data logic to our business logic, we can enforce referential integrity and avoid race conditions that corrupt our database. But how does it scale?

In PostgreSQL-compatible databases, we may prefer using SELECT FOR SHARE rather than SELECT FOR UPDATE because multiple inserts for the same parent should be able to be processed concurrently. Let's say the order is a big one and shared by multiple users who add their order lines.

I insert the parent again:

dsql=> insert into orders (order_id)
           values ('ffffffff-ffff-ffff-ffff-ffffffffffff')
           returning order_id
dsql-> \gset
INSERT 0 1
Enter fullscreen mode Exit fullscreen mode

A first session inserts a line 1:

dsql=> begin;
BEGIN
dsql=*> select true as exists
             from orders
             where order_id =  'ffffffff-ffff-ffff-ffff-ffffffffffff'
             for update
;
 exists
--------
 t
(1 row)

dsql=*> insert into orderlines (order_id, line_id)
             values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 1 )
;
INSERT 0 1

Enter fullscreen mode Exit fullscreen mode

It hasn't been committed yet, while another session is inserting line 2:

dsql=> begin;
BEGIN

dsql=*> select true as exists
             from orders
             where order_id =  'ffffffff-ffff-ffff-ffff-ffffffffffff'
             for update
;
 exists
--------
 t
(1 row)

dsql=*> insert into orderlines (order_id, line_id)
             values ( 'ffffffff-ffff-ffff-ffff-ffffffffffff' , 2 )
;
INSERT 0 1

dsql=*> commit;
COMMIT
Enter fullscreen mode Exit fullscreen mode

It was committed, but when the first session was committed, it fails:

postgres=*> commit;
ERROR:  change conflicts with another transaction, please retry: (OC000)
Enter fullscreen mode Exit fullscreen mode

The application must attempt a retry without any business justification, as inserting two rows with the same parent should be permitted. Consider using the same method to reference the product catalog: only one product can be ordered simultaneously in the system.

Ironically, Optimistic Concurrency Control, designed to enhance scalability, restricts throughput. SELECT FOR SHARE could solve this, allowing concurrent transactions. However, at least in the preview version, it is unavailable in Aurora DSQL.

Write skew is usually prevented by using a serializable isolation level. For databases that do not support it, it is possible to modify the data model and use SELECT FOR UPDATE to achieve the same result, transforming the write skew problem into a repeatable read one. Here is another example:


Note

(*) Many distributed databases lack support for foreign keys due to implementation challenges. This situation arises with database sharding, as not all SQL features function with cross-shard transactions. MySQL-compatible databases have often neglected referential integrity, similar to how MySQL has traditionally ignored foreign key declarations. Distributed SQL databases offering PostgreSQL-compatible concurrency control, such as YugabyteDB, support foreign keys.

aurora Article's
30 articles in total
Favicon
How to generate a Aurora Postgresql cluster with all auto explain enabled
Favicon
Amazon Aurora DSQL: The New Era of Distributed SQL
Favicon
Real-Time Data Integration Techniques Using PostgreSQL Foreign Data Wrapper (FDW)
Favicon
Comparing AWS RDS and Amazon Aurora: Which Managed Database Service is Right for You?
Favicon
Referential integrity In The Absence Of Foreign Key
Favicon
Multi-Region Distributed SQL Transaction Latency
Favicon
Scaling to Zero with Amazon Aurora Serverless v2
Favicon
Joins, Scale, and Denormalization
Favicon
Aurora Serverless v2 scales to zero.. but how fast?
Favicon
Aurora DSQL - Simple Inserts Workload from an AWS CloudShell
Favicon
Data inconsistency in AWS Amazon Aurora Postgres solved with Local Write Forwarding?
Favicon
Unlocking Aurora DSQL with AWS Lambda: A Seamless Solution for Serverless, Scalable, and Event-Driven Architectures
Favicon
Aurora Limitless - Creation
Favicon
Aurora Limitless - Connection
Favicon
Cách thay đổi bảng MySQL lớn bằng Percona Toolkit trên EC2 và AWS Aurora !!
Favicon
Migrating from SQLServer to Aurora PostgreSQL
Favicon
Migrating from AWS RDS to Aurora
Favicon
Best Places to See the Northern Lights: A Guide to Unforgettable Auroras
Favicon
Relational Databases on AWS: Comparing RDS and Aurora
Favicon
Automated Database Horizontal Scaling with Amazon Aurora Limitless Database
Favicon
Test your mySQL / Aurora database with Lambda
Favicon
Level Up Your Data Management Game: Unleash Performance in Aurora Limitless
Favicon
Level Up Your Data Management Game: Unleash Performance in Aurora Limitless
Favicon
Aurora vs. RDS: How to Choose the Right AWS Database for 2024
Favicon
Embracing the Future of Database Management: A Deep Dive into Amazon Aurora Limitless Database
Favicon
Version up test against Aurora MySQL 2 EOL with SQL test tool
Favicon
Amazon Aurora PostgreSQL shared buffers and cache
Favicon
Amazon Aurora
Favicon
Databases in AWS: RDS
Favicon
Spring Boot with AWS Aurora read replica

Featured ones: