dev-resources.site

Partitions in Azure Cosmos DB: A Common Discussion with Customers

Published at
12/20/2024
Categories
azure
database
nosql
architecture
Author
martin_pi
Partitions in Azure Cosmos DB are a topic I discuss frequently with my customers—at least twice a month! It's a fundamental concept that often raises important questions, such as:

  • What are physical partitions, logical partitions, and documents?
  • When is a physical partition created?
  • How can we determine how many physical partitions are in use?

In this post, I’ll address these questions specifically for the Cosmos DB SQL API, helping you better understand and manage partitions in your applications.

What are physical partitions, logical partitions, and documents?

Physical Partitions

Physical partitions are the underlying storage units that enable Azure Cosmos DB to scale horizontally by adding more storage and throughput capacity. These partitions are fully managed by Azure Cosmos DB, so you don’t have to worry about their internal implementation or maintenance.

Logical Partitions

A logical partition is the set of documents that share the same partition key value. Logical partitions are mapped to physical partitions and play a key role in distributing data evenly and improving query performance.

Documents

Documents are the individual data units stored in Cosmos DB, represented in JSON format. They contain both your application data and associated metadata.

Maximum Size Limits (as of September 2022)

The following are the maximum size limits in Cosmos DB (subject to change in the future):

  • Physical Partition: Maximum size of 50 GB.
  • Logical Partition: Maximum size of 20 GB.
  • Document: Maximum size of 2 MB.
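
As a rough illustration of the document limit above, here is a minimal sketch that estimates a document's size before writing it. The helper names are hypothetical, and it assumes the 2 MB limit means 2 * 1024 * 1024 bytes of UTF-8 encoded JSON; the service measures its own stored representation, so treat the result as an approximation only.

```python
import json

# Assumption: "2 MB" is interpreted as 2 * 1024 * 1024 bytes.
MAX_DOCUMENT_BYTES = 2 * 1024 * 1024


def document_size_bytes(document: dict) -> int:
    """Approximate size of a document as UTF-8 encoded JSON."""
    return len(json.dumps(document, ensure_ascii=False).encode("utf-8"))


def fits_document_limit(document: dict) -> bool:
    """True if the approximate size is within the 2 MB document limit."""
    return document_size_bytes(document) <= MAX_DOCUMENT_BYTES


small = {"id": "1", "tenantId": "contoso", "note": "hello"}
large = {"id": "2", "tenantId": "contoso", "payload": "x" * (3 * 1024 * 1024)}

print(fits_document_limit(small))
print(fits_document_limit(large))
```

A check like this is mainly useful as an early warning in application code; a document that is close to the limit is usually a sign that it should be split.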

When is a Physical Partition Created?

A new physical partition is created when one of the following thresholds is reached:

  • Size: When the total storage in a physical partition exceeds 50 GB.
  • Throughput (RUs): When the provisioned throughput for a single partition exceeds 10,000 request units per second (RU/s).

These theoretical maximums do not apply in isolation; there are other factors that need to be considered.

What is the number of RUs defined in our Container or Database? (For this explanation, I will focus exclusively on containers, not databases).

That information can be checked in the container's settings, as shown below:
Image description

Then, in the Scale section, we can see the number of RUs assigned to this container:

Image description

Based on the configuration shown in the image, the maximum throughput is 40,000 RUs, which is higher than the 10,000 RUs that a single physical partition can serve (the theoretical maximum).

40,000 RUs > 10,000 RUs - What Does It Mean?

If your container is provisioned with 40,000 RUs, and each physical partition supports a maximum of 10,000 RUs, Cosmos DB will distribute the provisioned throughput across multiple physical partitions.

How Does This Work in Practice?

For example, with 40,000 RUs, Cosmos DB will automatically create 4 physical partitions, each capable of supporting up to 10,000 RUs. This partitioning ensures that the system can handle the throughput demand efficiently.

In this case the distribution may be something like this:

Image description
Each partition is smaller than 50 GB (the maximum theoretical size), but there are 4 partitions because of the provisioned RUs.
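
The throughput side of this example can be sketched in a few lines. This is only an estimate that follows the arithmetic in this post, not a Cosmos DB API; the function name is hypothetical, and the service decides the real partition layout.

```python
import math

# Per-partition throughput ceiling quoted in this post.
MAX_RUS_PER_PHYSICAL_PARTITION = 10_000


def partitions_for_throughput(provisioned_rus: int) -> int:
    """Minimum number of physical partitions needed to serve the provisioned RUs."""
    return max(1, math.ceil(provisioned_rus / MAX_RUS_PER_PHYSICAL_PARTITION))


print(partitions_for_throughput(40_000))  # 4
```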

In scenarios involving heavy read operations, such as running queries frequently or at high volume, Cosmos DB will create additional physical partitions to distribute the workload. This partitioning is based on the provisioned RUs defined in your container.

What Happens in a Heavy Writing Scenario?

In heavy writing scenarios, two critical factors must be considered:

  • Provisioned RUs: The throughput allocated to the container.
  • Partition Size: The amount of data stored, as physical partitions have a maximum size of 50 GB.

Example: Writing Large Data Volumes

Let’s consider a scenario where your container holds 360 GB of data:

Partition Requirement: To support this volume, you would need:

Number of Physical Partitions = Total Data Size / Physical Partition Max Size

Substituting the values:

Number of Physical Partitions = 360 GB / 50 GB = 7.2

Since the number of partitions must be a whole number, this rounds up to 8 physical partitions.

Provisioned RU Distribution: If you provision 40,000 RUs, these RUs will be distributed equally across the 8 physical partitions:

RUs per Physical Partition = Total RUs / Number of Partitions

Substituting the values:

RUs per Physical Partition = 40,000 RUs / 8 = 5,000 RUs per Partition
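
The two calculations above can be combined into a small sketch that takes the larger of the storage-driven and throughput-driven counts. The function names are hypothetical, and the result is a lower-bound estimate based on the limits quoted in this post; the actual number of partitions is decided by the service.

```python
import math

# Limits quoted in this post.
MAX_PARTITION_STORAGE_GB = 50
MAX_PARTITION_RUS = 10_000


def estimate_physical_partitions(storage_gb: float, provisioned_rus: int) -> int:
    """Estimate the partition count from both the storage and the throughput limit."""
    by_storage = math.ceil(storage_gb / MAX_PARTITION_STORAGE_GB)
    by_throughput = math.ceil(provisioned_rus / MAX_PARTITION_RUS)
    return max(1, by_storage, by_throughput)


def rus_per_partition(provisioned_rus: int, partitions: int) -> float:
    """Provisioned RUs are split equally across physical partitions."""
    return provisioned_rus / partitions


partitions = estimate_physical_partitions(storage_gb=360, provisioned_rus=40_000)
print(partitions, rus_per_partition(40_000, partitions))  # 8 5000.0
```

In this example storage is the deciding factor: throughput alone would need only 4 partitions, but 360 GB of data needs 8.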

Implications for Heavy Writing Workloads

Even Data Distribution: Data written to the container will be distributed across the 8 physical partitions, based on the partition key. Uneven distribution can lead to “hot” partitions, where a few partitions receive a disproportionate number of writes, causing performance bottlenecks.

Partition Key Importance: Choosing an appropriate partition key is essential in heavy write scenarios. A well-designed key ensures that data and write operations are evenly distributed across all partitions.

Scaling: If either the partition size exceeds 50 GB or the throughput for a single partition exceeds 10,000 RUs, additional physical partitions will be created dynamically.
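
To make the "hot" partition point concrete, here is a small simulation. It is a sketch under stated assumptions: it uses an MD5 hash with a modulo as a stand-in for the service's internal hashing (which is not the same), and the key names and the 70% skew are invented for illustration.

```python
import hashlib
from collections import Counter


def physical_partition_for(key: str, partition_count: int) -> int:
    """Stand-in for hash-based placement; Cosmos DB uses its own hash function."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def write_share_per_partition(keys: list[str], partition_count: int) -> list[float]:
    """Fraction of all writes that lands on each physical partition."""
    counts = Counter(physical_partition_for(k, partition_count) for k in keys)
    total = sum(counts.values())
    return [counts.get(p, 0) / total for p in range(partition_count)]


# 10,000 writes spread over many distinct keys.
even_keys = [f"device-{i}" for i in range(10_000)]
# 10,000 writes where a single key receives 70% of the traffic.
skewed_keys = ["tenant-big"] * 7_000 + [f"tenant-{i}" for i in range(3_000)]

even = write_share_per_partition(even_keys, 8)
skewed = write_share_per_partition(skewed_keys, 8)

print(f"busiest partition, even keys:   {max(even):.0%}")
print(f"busiest partition, skewed keys: {max(skewed):.0%}")
```

With 8 partitions and 5,000 RUs each, a partition that receives most of the writes can exhaust its share of throughput while the others sit idle, which is the bottleneck described above.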

In the above example, the provisioned 40,000 RUs are evenly distributed across the 8 physical partitions, resulting in 5,000 RUs per partition. This ensures that each partition has sufficient throughput to handle the workload.

Recent Features Enhancing Partition Management

Azure Cosmos DB has introduced features to further optimize partitioning:

Burst Capacity: This feature allows containers and databases to handle unexpected traffic spikes by utilizing unused throughput, enhancing performance during peak times.

Materialized Views: Materialized views enable precomputed views of your data, improving query performance by reducing the need for on-the-fly computations.

Best Practices for Partition Management

Monitor Partition Sizes: Regularly check the size of your logical partitions to ensure they remain within the 20 GB limit. Azure Monitor can be configured to alert you when a partition approaches this threshold.

Data Modeling Strategies: Design your data model to align with your partitioning strategy. For example, in a multi-tenant application, using tenant IDs as partition keys can help segregate and manage data efficiently.

Avoid Frequent Updates to Partition Key Values: Since partition keys are immutable, design your data model to minimize the need for updates to these values. If a change is necessary, you'll need to create a new item with the desired partition key and delete the old item.
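
The "create a new item and delete the old one" step can be sketched as follows. The function is written against the `read_item`, `create_item`, and `delete_item` methods of a container client in the azure-cosmos Python SDK; the helper name, the field names, and the `InMemoryContainer` class are hypothetical, and the in-memory class exists only so the sketch runs without an Azure account.

```python
# System properties that the service adds to stored items.
SYSTEM_FIELDS = {"_rid", "_self", "_etag", "_attachments", "_ts"}


def move_item_to_partition(container, item_id, old_pk, new_pk, pk_field):
    """Copy an item under a new partition key value, then delete the original.

    The two writes hit different logical partitions, so they are not atomic.
    """
    item = container.read_item(item=item_id, partition_key=old_pk)
    moved = {k: v for k, v in item.items() if k not in SYSTEM_FIELDS}
    moved[pk_field] = new_pk
    container.create_item(body=moved)
    container.delete_item(item=item_id, partition_key=old_pk)
    return moved


class InMemoryContainer:
    """Hypothetical stand-in that mimics the three container methods used above."""

    def __init__(self, pk_field):
        self.pk_field = pk_field
        self.items = {}

    def create_item(self, body):
        key = (body[self.pk_field], body["id"])
        if key in self.items:
            raise ValueError(f"item already exists: {key}")
        self.items[key] = dict(body)
        return dict(body)

    def read_item(self, item, partition_key):
        return dict(self.items[(partition_key, item)])

    def delete_item(self, item, partition_key):
        del self.items[(partition_key, item)]


container = InMemoryContainer(pk_field="tenantId")
container.create_item({"id": "order-1", "tenantId": "contoso", "total": 42})
move_item_to_partition(container, "order-1", "contoso", "fabrikam", "tenantId")
print(sorted(container.items))  # [('fabrikam', 'order-1')]
```

Creating the new item before deleting the old one means a failure in between leaves a duplicate rather than losing data; a real implementation would need to handle that cleanup.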

How Do We Know How Many Physical Partitions Are Being Created?

This is one of the most common questions I receive, and it's essential to understand how to monitor your Cosmos DB partitioning.

Azure Cosmos DB provides built-in dashboards to help you visualize this information. You can find it under:

Monitoring → Insights

Image description

From there, you can view details about the number of physical partitions created, along with other metrics like throughput usage, partition key distribution, and RU consumption. These insights allow you to track partitioning behavior and make informed decisions about scaling and optimization.

Select the Throughput option (image below):
Image description

There is a chart called Normalized RU Consumption where you can see the number of physical partitions. In this case, there are 2 partitions.

Image description

Conclusion

Effective partitioning in Azure Cosmos DB is essential for building scalable and high-performance applications. By selecting appropriate partition keys, leveraging new features like burst capacity and materialized views, and adhering to best practices, you can optimize your database's performance and scalability.

What's Next?

In the next post of this series, I will dive deeper into Logical Partitions and explore how specific scenarios can impact the behavior and creation of physical partitions. Stay tuned for actionable insights and best practices!
