Logo

dev-resources.site

for different kinds of informations.

Secrets Behind Deleting or Updating Records in PostgreSQL

Published at
1/4/2025
Categories
codeproject
devops
postgresqldatabasesq
Author
anh_trntun_4732cf3d299
Author
22 person written this
anh_trntun_4732cf3d299
open
Secrets Behind Deleting or Updating Records in PostgreSQL

1. Understanding PostgreSQL's MVCC (Multi-Version Concurrency Control)

Image

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to handle data consistency and isolation. Unlike databases that use locks for concurrency, MVCC allows multiple transactions to occur simultaneously without conflicts. Here's how it works:

1.1 What Happens Internally During an Update?

Image

When you update a record in PostgreSQL, the database doesn't immediately modify the existing data. Instead, it creates a new version of the record with the updated data while keeping the old version intact. This approach ensures that other transactions can still access the previous data version until they are committed or rolled back.

UPDATE employees SET salary = 60000 WHERE id = 5;
Enter fullscreen mode Exit fullscreen mode

This query creates a new version of the employee record with id = 5 having a salary of 60000. The old version remains in the database for any transactions that started before the update.

1.2 Benefits of MVCC in Updates

  • Concurrent Access : Transactions can run concurrently without blocking each other, providing better performance in multi-user environments.
  • Consistency : Ensures data consistency for all ongoing transactions, as they only see the data state when they started.

1.3 Drawbacks of MVCC in Updates

  • Increased Storage : Each update creates a new row version, which can lead to increased storage requirements.
  • Vacuuming Overhead : Old versions of rows remain in the database, requiring periodic cleanup using the VACUUM operation to free up space and prevent table bloat.

2. The Mechanics of Deleting a Record

Deleting a record in PostgreSQL follows a similar approach to updating. When a record is deleted, it isn't immediately removed from the disk. Instead, it is marked as "dead" and becomes invisible to future transactions.

2.1 What Happens Internally During a Delete?

When a DELETE command is executed, PostgreSQL marks the tuple (row) as "dead," and it is left for a future vacuuming process to physically remove it from the disk.

DELETE FROM employees WHERE id = 5;
Enter fullscreen mode Exit fullscreen mode

The row with id = 5 is marked as dead. Until a vacuum is performed, this space is not reclaimed.

2.2 Benefits of This Approach

  • Transaction Safety : The delete operation is transaction-safe, meaning that it will only be visible to transactions that occur after the commit.
  • Quick Deletes : Since no immediate disk operation occurs, deletes are generally faster.

2.3 Drawbacks of This Approach

  • Table Bloat : Like updates, deletes can lead to table bloat, where dead rows accumulate, reducing query performance over time.
  • Maintenance Overhead : Requires regular VACUUM and ANALYZE operations to manage space and keep the database optimized.

3. Performance

3.1 Without Indexes

  • When there are no indexes on the columns involved in DELETE or UPDATE operations, PostgreSQL must perform a sequential scan. This means it scans every row in the table to find the rows to delete or update. As a result, these operations can be very slow, especially for large tables.
  • Moreover, both DELETE and UPDATE generate dead tuples in PostgreSQL. After a DELETE operation, the rows are not physically removed but marked as dead. Similarly, an UPDATE operation is treated as a combination of DELETE and INSERT , which leaves behind dead tuples. Without regular VACUUM processes, these dead tuples accumulate, leading to increased table size and reduced performance.

3.2 With Indexes:

  • When indexes are present on the columns involved in DELETE or UPDATE operations, PostgreSQL uses the index to quickly locate the relevant rows, significantly reducing the time required. However, updating or deleting indexed rows requires additional work to maintain the indexes, which can increase the cost of these operations.
  • For DELETE , every index associated with the table must be updated to remove the index entries corresponding to the deleted rows.
  • For UPDATE , if the indexed columns are updated, the index itself must be modified, which adds overhead. If the updated data does not fit in the same page, PostgreSQL may need to allocate new pages, leading to potential page splits and additional I/O.

3.3 Why Can Indexed Queries Be Slower

Cost of Maintaining Indexes:

  • Every time a row is inserted, deleted, or updated, PostgreSQL must also update the relevant indexes. For DELETE and UPDATE operations, this can mean significant overhead if there are multiple indexes on the table.
  • For large batch operations (bulk DELETE or UPDATE ), the cost of maintaining indexes can outweigh the benefits provided by them, causing a slowdown.

Index Bloat:

  • Over time, as INSERT , UPDATE , and DELETE operations are performed, indexes can become bloated with dead entries. This bloat makes the index less efficient, leading to slower search times and higher maintenance costs.
  • Regular maintenance through VACUUM and REINDEX commands is necessary to prevent index bloat and ensure optimal performance.

Query Planner and Cost Estimation

  • PostgreSQL’s query planner decides whether to use an index or not based on a cost estimation model. In some cases, the planner may choose not to use an index even when one is available, if it estimates that a sequential scan would be more efficient (e.g., when a query is expected to return a large percentage of the rows).
  • If the statistics are outdated or not accurate, the query planner may make suboptimal decisions, resulting in slower queries.

4. Optimizing PostgreSQL Performance When Deleting or Updating

Understanding the implications of deletes and updates can help you optimize your PostgreSQL database for better performance and efficiency.

4.1 Strategies for Handling Frequent Updates

Use VACUUM Regularly: Regularly schedule VACUUM operations to clean up dead rows and prevent table bloat.

VACUUM employees;
Enter fullscreen mode Exit fullscreen mode

Employ VACUUM FULL When Necessary: In cases of severe bloat, use VACUUM FULL to reclaim space, although it locks the table for its duration.

4.2 Techniques for Managing Deletes

Partition Large Tables : Partitioning can help manage large datasets by splitting them into more manageable chunks, reducing the impact of deletes.

Use RETURNING Clause : When deleting records, use the RETURNING clause to get feedback about the rows being deleted, which can be helpful for auditing purposes.

DELETE FROM employees WHERE salary < 20000 RETURNING *;
Enter fullscreen mode Exit fullscreen mode

5. Conclusion

While deleting or updating records in PostgreSQL may seem like simple operations, understanding the underlying processes is crucial for database optimization. The benefits of MVCC, such as concurrent access and consistency, come with trade-offs like storage overhead and maintenance complexity. By employing the right strategies and being aware of PostgreSQL’s mechanics, you can maximize your database’s performance and maintain data integrity.

Feel free to comment below if you have any questions or need further clarification on how these processes work in PostgreSQL.

Read posts more at : Secrets Behind Deleting or Updating Records in PostgreSQL

codeproject Article's
30 articles in total
Favicon
Default Methods in Java
Favicon
Secrets of Java's String Pool
Favicon
Reasons Why Docker Runs Natively on Linux but Needs Virtualization on Windows
Favicon
Secrets of URIs, URLs, and URNs: Understanding Their Differences and Uses
Favicon
When Should You Use Threads in Java?
Favicon
Secrets of Distributed Locks
Favicon
Understanding the High Water Mark
Favicon
Ways to Speed Up Spring Boot Application Startup Time
Favicon
Mastering Java Generics: A Comprehensive Guide with Code Examples
Favicon
Understanding and Managing Tablespace in MySQL
Favicon
Handling the "Forwarded" Header in Spring Boot Applications
Favicon
Secrets Behind Deleting or Updating Records in PostgreSQL
Favicon
Strategies to Optimize PostgreSQL Performance Using VACUUM
Favicon
Techniques for Storing JSON Documents in PostgreSQL
Favicon
Methods for Storing Boolean Values in MySQL
Favicon
Relational Database Partition with Example
Favicon
Techniques for Leveraging ReBAC: A Comprehensive Guide
Favicon
How Does CAPTCHA Work?
Favicon
Secure User Passwords in a Database
Favicon
Tips to Avoid NullPointerException in Java
Favicon
Methods for Efficient Large File Processing in Spring Boot
Favicon
Master Local Variable Type Inference in Java: A Guide to Using var Effectively
Favicon
Understanding JWE: Structure, Operations, Advantages, Disadvantages, and How to Create One
Favicon
Reasons Why Rainbow Table Attacks Are Dangerous and How Salting Passwords Protects Against Them
Favicon
OSI Model Overview
Favicon
Understanding Runnable and Callable in Java: Examples and Code Demos
Favicon
Methods to Optimize Database Queries Using Query Hints in Spring Data JPA
Favicon
Techniques for Managing Session, Cookie, JWT, and SSO
Favicon
Using CountDownLatch in Java: A Deep Dive with Code Examples and Demos
Favicon
Tips for Saving Redis Memory: Essential Techniques for Efficient Resource Management

Featured ones: