dev-resources.site
for different kinds of informations.
Amazon Aurora DSQL: Which PostgreSQL Service Should I Use on AWS ?
The Amazon Aurora innovations announced at re:Invent 2024 may leave you feeling overwhelmed when choosing the right database. If you're looking for a PostgreSQL-compatible option, you're making a sound choice since PostgreSQL is one of the most popular databases. Building your application with PostgreSQL provides access to a large pool of developers. It allows you to choose among on-premises, self-managed, or cloud-managed services from various vendors.
You might be wondering if there's a one-size-fits-all solution for your application or if you need to explore the various options available. These include RDS PostgreSQL, RDS Aurora (PostgreSQL-compatible), Aurora Serverless v2, Aurora Limitless, and the new Aurora DSQL. Additionally, you can run PostgreSQL as a self-managed option on EC2 or opt for a PostgreSQL fork available in the marketplace, such as YugabyteDB.
The capabilities of these services vary, and each aspect plays a crucial role:
- The degree of PostgreSQL compatibility determines whether the system suits your existing applications or is only appropriate for new ones. It also impacts how easily your application can run on other platforms if you switch providers.
- The scope of scalability will determine how well the system can evolve to support multi-AZ or multi-region configurations without sacrificing features, consistency, and performance.
- The class of resilience will influence your application's uptime throughout the infrastructure's lifecycle: failures, maintenance operations, or upgrades.
- The level of serverless deployment will impact the overall cost based on your expected workloads and desired performance predictability.
PostgreSQL on EC2
If you want full compatibility with all features, you must self-manage the open-source PostgreSQL from the community repository. This allows you to do almost anything, as you have superuser access. You can also compile it with different options and install various extensions. However, it is not designed for cloud environments. High availability relies on disaster recovery, and failover typically results in complete downtime, often lasting several minutes. No incremental backups are available, and point-in-time recovery is not straightforward. Additionally, several operations, such as database upgrades, can lead to downtime.
For a managed service, the cloud vendor must restrict certain features to ensure clear segregation of responsibilities, often involving a customized version of PostgreSQL.
RDS PostgreSQL
The managed service that closely resembles the community version of PostgreSQL is RDS PostgreSQL. It is a fork with minimal changes, allowing you to replicate the same behavior using the same version of community PostgreSQL. However, this means there are no additional features within the database engine itself, only supplementary features around it. For instance, RDS PostgreSQL supports encryption and multi-AZ replication at the storage level.
RDS PostgreSQL competes with various managed PostgreSQL offerings from providers like Aiven, Google Cloud SQL, Heroku, DigitalOcean, EDB, Crunchy, and StackGres.
RDS Aurora PostgreSQL-Compatible
This service uses a fork of PostgreSQL running on a single-writer instance with disaggregated storage. The shared buffers in the compute instance are ephemeral. The WAL is sent to storage servers distributed in the region's availability zones, and those servers can serve the pages that are not in the compute instance shared buffers. This storage is resilient to failures and provides additional features like snapshots and point-in-time recovery. Additional compute instances are available for failover, reducing downtime during planned maintenance or failures. These instances can also serve stale reads, receiving the WAL from the primary to invalidate their cache and reading pages for the distributed storage.
RDS Aurora with PostgreSQL compatibility is used for applications that need a high PostgreSQL compatibility but with more operational efficiency than the community PostgreSQL.
The competition for RDS Aurora is the similar architecture, Google AlloyDB or Neon.
RDS Aurora PostgreSQL-Compatible Serverless
Serverless v2 is similar to RDS Aurora but runs the compute instances with dynamic resource allocation. You define a minimum and maximum ACU (Aurora Capacity Unit). The hypervisor dynamically scales CPU and RAM based on demand. This is cost-efficient for databases idle for a long time or with short peaks of high activity, and you favor cost reduction over performance predictability.
The competition for Aurora Serverless is mostly Neon.
The services mentioned above use the same monolithic architecture as PostgreSQL for the compute instance, which provides full PostgreSQL compatibility from the application's point of view but is limited to a single instance. All consistent reads and writes must go through a single instance, with one shared buffer memory segment protected by a single stream of Write-Ahead Logging (WAL). The consistency of those services, with all ACID properties, is guaranteed by running on a single server so that all sessions share the same memory structures and system clock.
Data must be distributed to multiple nodes to scale horizontally. Two techniques are used: database sharding and distributed SQL. A bounded clock from Amazon Time Sync ensures the reliability of these services to get consistent reads and writes with all ACID properties. This clock provides precise time with minimal and guaranteed clock skew. What differentiates the following services is their PostgreSQL compatibility for global transactions (transactions reading or writing to more than one shard)
RDS Aurora PostgreSQL-Compatible Limitless
When an application is designed to operate on multiple databases using a sharding key, Aurora Limitless simplifies the management of sharding to facilitate operations. It is deployed across multiple compute instances that utilize Aurora Serverless hypervisor. You connect to routers that offer a global view of the database, and queries are distributed to the relevant shards containing data based on your defined sharding key. For single-shard queries, PostgreSQL compatibility is equivalent to that of Aurora. This service is ideal for applications that are already designed for sharding.
The competition for RDS Aurora Limitless is Citus managed services, or Nile.
RDS Aurora DSQL
AWS has announced a new Aurora service to offer SQL developers a truly serverless experience. This service aims to have all the advantages of DynamoDB (no downtime, no upgrades, on-demand pricing) but with SQL features. It uses the PostgreSQL dialect, which closely adheres to the SQL standard. However, runtime behavior differs, relying on optimistic concurrency control and IAM authentication instead of traditional user-password mechanisms.
DSQL is the recommended service for new applications designed for serverless and running on AWS only: no connection pool, optimistic concurrency control with retry logic.
The competition for DSQL is distributed SQL with a PostgreSQL dialect, like Google Spanner, CockroachDB, or the NoSQL services Microsoft Cosmos and Amazon DynamoDB. It is a trade-off between both approaches.
I do not consider YugabyteDB a competitor because, while it is also a Distributed SQL-managed service, YugabyteDB is a PostgreSQL fork with PostgreSQL compatibility, including concurrency control behavior, is open source and multi-cloud, so those are often different use cases.
YugabyteDB in the marketplace
YugabyteDB is a fork of PostgreSQL. It is runtime-compatible with PostgreSQL (same behavior) and license-compatible as an open-source database. It is available as a managed service in the AWS marketplace. Compared with the other services, it provides higher PostgreSQL compatibility than the horizontally scalable services (Limitless and DSQL) while distributed to availability zones or regions like Aurora DSQL. It is a good choice for a drop-in replacement of PostgreSQL without changing the application. YugabyteDB is optimized for OLTP systems requiring predictable performance, which requires keeping servers up with a warm cache, so it doesn't compete with Aurora Serverless.
Three distributed databases use the AWS precision time clock bound to provide high performance with strong consistency: Aurora Limitless, Aurora DSQL, and YugabyteDB, and Amazon exposes this to provide the most efficient distributed databases, managed my Amazon or third-parties:
Here is another great comparison by Rehan Van Der Merwe
Featured ones: