Partitions in Azure Cosmos DB: A Common Discussion with Customers
Partitions in Azure Cosmos DB are a topic I discuss frequently with my customers—at least twice a month! It's a fundamental concept that often raises important questions, such as:
- What are physical partitions, logical partitions, and documents?
- When is a physical partition created?
- How can we determine how many physical partitions are in use?
In this post, I’ll address these questions specifically for the Cosmos DB SQL API, helping you better understand and manage partitions in your applications.
What are physical partitions, logical partitions, and documents?
Physical Partitions
Physical partitions are the underlying storage units that enable Azure Cosmos DB to scale horizontally by adding more storage and throughput capacity. These partitions are fully managed by Azure Cosmos DB, so you don’t have to worry about their internal implementation or maintenance.
Logical Partitions
A logical partition is a grouping of documents that share the same partition key value. Logical partitions are mapped to physical partitions and play a key role in distributing data evenly and improving query performance.
Documents
Documents are the individual data units stored in Cosmos DB, represented in JSON format. They contain both your application data and associated metadata.
Maximum Size Limits (as of September 2022)
The following are the maximum size limits in Cosmos DB (subject to change in the future):
- Physical Partition: Maximum size of 50 GB.
- Logical Partition: Maximum size of 20 GB.
- Document: Maximum size of 2 MB.
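As a rough client-side guard against the 2 MB document limit listed above, the size of a document can be estimated before it is written. This is only a sketch: the function names are hypothetical, and it measures the compact UTF-8 JSON form, which is assumed to be close to (but not necessarily identical to) what the service counts.

```python
import json

# 2 MB document limit quoted in this post (subject to change).
MAX_DOCUMENT_BYTES = 2 * 1024 * 1024


def document_size_bytes(document: dict) -> int:
    """Approximate size of a document as compact UTF-8 encoded JSON."""
    encoded = json.dumps(document, ensure_ascii=False, separators=(",", ":"))
    return len(encoded.encode("utf-8"))


def fits_in_document_limit(document: dict) -> bool:
    """True if the estimated size does not exceed the 2 MB limit."""
    return document_size_bytes(document) <= MAX_DOCUMENT_BYTES


if __name__ == "__main__":
    order = {"id": "1", "customerId": "c-42", "total": 19.99}
    print(document_size_bytes(order), fits_in_document_limit(order))
```

A check like this is useful mainly for documents that grow over time, such as ones that embed an ever-longer array.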
When is a Physical Partition Created?
A new physical partition is created when one of the following thresholds is reached:
- Size: When the total storage in a physical partition exceeds 50 GB.
- Throughput (RUs): When the provisioned throughput for a single physical partition exceeds 10,000 request units per second (RUs).
These theoretical maximums do not apply in isolation; there are other factors that need to be considered.
How many RUs are provisioned for our container or database? (For this explanation, I will focus exclusively on containers, not databases.)
You can check this on the container in the Azure portal.
In the Scale section, we can see the number of RUs assigned to this container.
In this example's configuration, the maximum throughput is 40,000 RUs, which is higher than the 10,000 RU theoretical maximum for a single physical partition.
40,000 RUs > 10,000 RUs - What Does It Mean?
If your container is provisioned with 40,000 RUs, and each physical partition supports a maximum of 10,000 RUs, Cosmos DB will distribute the provisioned throughput across multiple physical partitions.
How Does This Work in Practice?
For example, with 40,000 RUs, Cosmos DB will automatically create 4 physical partitions, each capable of supporting up to 10,000 RUs. This partitioning ensures that the system can handle the throughput demand efficiently.
In this case the distribution may be something like this:
Each partition is smaller than 50 GB (the theoretical maximum size), but there are 4 partitions because of the provisioned RUs.
In scenarios involving heavy read operations, such as running queries frequently or at high volume, Cosmos DB will create additional physical partitions to distribute the workload. This partitioning is based on the provisioned RUs defined in your container.
What Happens in a Heavy Writing Scenario?
In heavy writing scenarios, two critical factors must be considered:
- Provisioned RUs: The throughput allocated to the container.
- Partition Size: The amount of data stored, as physical partitions have a maximum size of 50 GB.
Example: Writing Large Data Volumes
Let’s consider a scenario where your container holds 360 GB of data:
Partition Requirement: To support this volume, you would need:
Number of Physical Partitions = Total Data Size / Physical Partition Max Size
Substituting the values:
Number of Physical Partitions = 360 GB / 50 GB = 7.2
Since partitions must be whole numbers, Cosmos DB will round up to 8 physical partitions.
Provisioned RU Distribution: If you provision 40,000 RUs, these RUs will be distributed equally across the 8 physical partitions:
RUs per Physical Partition = Total RUs / Number of Partitions
Substituting the values:
RUs per Physical Partition = 40,000 RUs / 8 = 5,000 RUs per Partition
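The arithmetic above can be put into a small helper. This is a minimal sketch that assumes the 50 GB and 10,000 RU limits quoted in this post and takes whichever of the two requirements is larger. The names are hypothetical, and the real service may split partitions differently (for example, before a partition is completely full).

```python
import math
from dataclasses import dataclass

# Limits quoted in this post (subject to change).
MAX_PARTITION_STORAGE_GB = 50
MAX_PARTITION_RUS = 10_000


@dataclass(frozen=True)
class PartitionEstimate:
    physical_partitions: int
    rus_per_partition: float


def estimate_partitions(total_storage_gb: float, provisioned_rus: int) -> PartitionEstimate:
    """Estimate the minimum partition count and the RUs each partition receives."""
    by_storage = math.ceil(total_storage_gb / MAX_PARTITION_STORAGE_GB)
    by_throughput = math.ceil(provisioned_rus / MAX_PARTITION_RUS)
    # The larger of the two requirements decides; there is always at least one partition.
    count = max(1, by_storage, by_throughput)
    # Provisioned throughput is split evenly across physical partitions.
    return PartitionEstimate(count, provisioned_rus / count)


if __name__ == "__main__":
    # 360 GB with 40,000 RUs: storage needs 8 partitions, throughput only 4.
    print(estimate_partitions(360, 40_000))
    # 40 GB with 40,000 RUs: throughput needs 4 partitions, storage only 1.
    print(estimate_partitions(40, 40_000))
```

Note how the second call reproduces the earlier read-heavy case: the data fits in one partition, yet the provisioned throughput still requires four.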
Implications for Heavy Writing Workloads
Even Data Distribution: Data written to the container will be distributed across the 8 physical partitions, based on the partition key. Uneven distribution can lead to “hot” partitions, where a few partitions receive a disproportionate number of writes, causing performance bottlenecks.
Partition Key Importance: Choosing an appropriate partition key is essential in heavy write scenarios. A well-designed key ensures that data and write operations are evenly distributed across all partitions.
Scaling: If either the partition size exceeds 50 GB or the throughput for a single partition exceeds 10,000 RUs, additional physical partitions will be created dynamically.
In the above example, the provisioned 40,000 RUs are evenly distributed across the 8 physical partitions, resulting in 5,000 RUs per partition. This ensures that each partition has sufficient throughput to handle the workload.
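To get a feel for whether a candidate partition key would produce "hot" partitions, one option is to measure how writes are spread across key values in a sample of your workload. The sketch below is an assumption-laden illustration: the function names and the 50% threshold are hypothetical, and it looks only at logical partition keys, not at how Cosmos DB actually hashes them onto physical partitions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def write_share_by_key(partition_keys: Iterable[str]) -> Dict[str, float]:
    """Fraction of sampled writes that landed on each partition key value."""
    counts = Counter(partition_keys)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def hot_keys(partition_keys: Iterable[str], threshold: float = 0.5) -> List[str]:
    """Key values whose share of writes exceeds the (hypothetical) threshold."""
    shares = write_share_by_key(partition_keys)
    return sorted(key for key, share in shares.items() if share > threshold)


if __name__ == "__main__":
    # Sample of partition key values taken from ten write operations.
    sample = ["tenant-a"] * 8 + ["tenant-b", "tenant-c"]
    print(write_share_by_key(sample))
    print(hot_keys(sample))
```

If one key value dominates the sample, the physical partition that hosts it can only use its own share of the provisioned RUs (5,000 in the example above), regardless of how idle the other partitions are.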
Recent Features Enhancing Partition Management
Azure Cosmos DB has introduced features to further optimize partitioning:
- Burst Capacity: This feature allows containers and databases to handle unexpected traffic spikes by utilizing unused throughput, enhancing performance during peak times.
- Materialized Views: Materialized views enable precomputed views of your data, improving query performance by reducing the need for on-the-fly computations.
Best Practices for Partition Management
Monitor Partition Sizes: Regularly check the size of your logical partitions to ensure they remain within the 20 GB limit. Azure Monitor can be configured to alert you when a partition approaches this threshold.
Data Modeling Strategies: Design your data model to align with your partitioning strategy. For example, in a multi-tenant application, using tenant IDs as partition keys can help segregate and manage data efficiently.
Avoid Frequent Updates to Partition Key Values: Since partition keys are immutable, design your data model to minimize the need for updates to these values. If a change is necessary, you'll need to create a new item with the desired partition key and delete the old item.
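The "create a new item, then delete the old one" step can be sketched as follows. The helper name is hypothetical. It assumes `container` exposes `create_item(body=...)` and `delete_item(item=..., partition_key=...)` in the style of the `azure-cosmos` Python SDK's container client, and that the partition key is a top-level property. The two calls are not atomic, so a production version would need its own retry or cleanup logic.

```python
import copy

# System-generated properties that should not be copied into the new item.
SYSTEM_PROPERTIES = ("_rid", "_self", "_etag", "_attachments", "_ts")


def move_item_to_new_partition(container, item, partition_key_field, new_value):
    """Re-create `item` under a new partition key value, then delete the original."""
    old_value = item[partition_key_field]
    if new_value == old_value:
        return item  # Nothing to move.

    new_item = copy.deepcopy(item)
    for name in SYSTEM_PROPERTIES:
        new_item.pop(name, None)
    new_item[partition_key_field] = new_value

    # Create first, delete second: if the create fails, the original is untouched.
    container.create_item(body=new_item)
    container.delete_item(item=item["id"], partition_key=old_value)
    return new_item
```

Creating before deleting means a failure between the two calls leaves a duplicate rather than losing data, which is usually the safer failure mode.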
How Do We Know How Many Physical Partitions Are Being Created?
This is one of the most common questions I receive, and it's essential to understand how to monitor your Cosmos DB partitioning.
Azure Cosmos DB provides built-in dashboards to help you visualize this information. You can find it under:
Monitoring → Insights
From there, you can view details about the number of physical partitions created, along with other metrics like throughput usage, partition key distribution, and RU consumption. These insights allow you to track partitioning behavior and make informed decisions about scaling and optimization.
Under the Throughput tab, there is a chart called Normalized RU Consumption, where you can see the number of physical partitions. In this example, there are 2 partitions.
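If you export the data behind that chart, the same count can be derived programmatically. The sketch below assumes the data points have already been exported as `(range_id, percent)` pairs, one series per physical partition; that input shape and the function name are hypothetical, not an Azure API.

```python
from typing import Dict, Iterable, Tuple


def summarize_normalized_ru(samples: Iterable[Tuple[str, float]]) -> Tuple[int, Dict[str, float]]:
    """Return the number of distinct partitions seen and each one's peak utilization (%)."""
    peak: Dict[str, float] = {}
    for range_id, percent in samples:
        peak[range_id] = max(percent, peak.get(range_id, 0.0))
    return len(peak), peak


if __name__ == "__main__":
    # Hypothetical exported data points: two partitions, three samples each.
    exported = [
        ("0", 12.0), ("1", 35.0),
        ("0", 18.0), ("1", 90.0),
        ("0", 15.0), ("1", 40.0),
    ]
    count, peaks = summarize_normalized_ru(exported)
    print(count, peaks)
```

A large gap between the peaks of different partitions is a hint that the workload is not evenly spread across partition keys.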
Conclusion
Effective partitioning in Azure Cosmos DB is essential for building scalable and high-performance applications. By selecting appropriate partition keys, leveraging new features like burst capacity and materialized views, and adhering to best practices, you can optimize your database's performance and scalability.
What's Next?
In the next post of this series, I will dive deeper into Logical Partitions and explore how specific scenarios can impact the behavior and creation of physical partitions. Stay tuned for actionable insights and best practices!