dev-resources.site

Do you think schema flexibility justifies using NoSQL? Think twice.

Published at
12/27/2024
Categories
nosql
database
mongodb
Author
ernestomar

Introduction

In the world of software development, there is a common belief that implementing a NoSQL database is justified solely by its schema flexibility. However, this perspective can be misleading if we overlook fundamental aspects such as the CAP theorem and the differences between Schema on Write and Schema on Read. As Martin Kleppmann explains in his book Designing Data-Intensive Applications, the choice of a database should be made with a deep understanding of the requirements for consistency, availability, and partition tolerance, while keeping in mind the schema with which the data will be managed.

In this article, you will learn:

  • What the CAP theorem entails and how it influences your application's architecture.
  • Why eventual consistency is not always desirable, especially when handling critical information like financial transactions.
  • The role partition tolerance plays and the trade-offs in consistency within large-scale distributed systems.
  • The real difference between document-oriented and relational databases, and how Schema on Read and Schema on Write are not as distinct as they seem, even at the code level.

1. CAP Theorem

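The CAP theorem states that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition Tolerance. Because network partitions cannot be ruled out in practice, the real choice appears when a partition occurs: the system must then give up either consistency or availability. The three properties are: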

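  • Consistency: Every read returns the most recent successful write, so all nodes appear to hold the same data at the same time.
  • Availability: Every request sent to a node that is still running receives a response, even if other nodes fail.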
  • Partition Tolerance: The system continues to function despite communication failures between nodes.

When designing a large-scale application, you must decide which of these elements are critical and which can be sacrificed. This balance guides the choice between relational databases (emphasizing strong consistency) and many NoSQL databases (leaning toward availability and partition tolerance, but with eventual consistency).
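
The trade-off can be made concrete with a toy model. The sketch below is not a real database: the `Cluster` class, its `mode` flag, and the `partitioned` switch are hypothetical names used only to show how a consistency-first ("CP") and an availability-first ("AP") design might react to the same partition.

```javascript
// Toy model: two replicas of one value, plus a flag that simulates a network partition.
// Cluster, mode, and partitioned are illustrative names, not a real database API.
class Cluster {
  constructor(mode) {
    this.mode = mode; // "CP" or "AP"
    this.partitioned = false;
    this.replicas = [{ value: 0 }, { value: 0 }];
  }

  write(replicaIndex, value) {
    if (this.partitioned) {
      if (this.mode === "CP") {
        // Consistency first: refuse the write rather than let replicas diverge.
        throw new Error("unavailable during partition");
      }
      // Availability first: accept the write locally; the other replica is now stale.
      this.replicas[replicaIndex].value = value;
      return;
    }
    // No partition: the write reaches every replica.
    for (const replica of this.replicas) replica.value = value;
  }

  read(replicaIndex) {
    return this.replicas[replicaIndex].value;
  }
}

const cp = new Cluster("CP");
cp.partitioned = true;
let cpRejected = false;
try {
  cp.write(0, 42);
} catch {
  cpRejected = true;
}

const ap = new Cluster("AP");
ap.partitioned = true;
ap.write(0, 42);

console.log(cpRejected);             // true: the CP cluster gave up availability
console.log(ap.read(0), ap.read(1)); // 42 0: the AP cluster gave up consistency
```

Real systems are more nuanced than this (many let you tune the behavior per operation), but the two outcomes are the ones the theorem forces you to choose between during a partition.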

2. Eventual Consistency and When It’s Not Enough

Eventual consistency means that, once updates stop arriving, all nodes in a distributed system converge to the same state after some time. It works well for social networks or applications where slight delays in data updates do not compromise business integrity.

However, when handling money or banking operations, eventual consistency becomes a risk. In such cases, strong consistency is required, ensuring that every transaction is immediately reflected across the system without the possibility of temporary discrepancies.

Case Study

Imagine a financial institution running MongoDB in a way that allows eventually consistent reads, for example by reading from secondaries that may lag behind the primary. A user withdraws all the funds in their account, leaving it at zero, but that update has not yet propagated to every node. If a credit card payment is processed at the same moment against a stale copy of the balance, the payment is approved and the customer’s balance goes negative.

For this reason, systems managing money typically require strong consistency as an essential prerequisite.
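
The scenario above can be reproduced with a minimal simulation. This is a sketch with plain objects, not a MongoDB client: `primary`, `replica`, `replicate`, and `debit` are hypothetical names, and the starting balance of 100 is an assumed figure.

```javascript
// Toy model of a primary and one lagging replica; not a real MongoDB client.
const primary = { balance: 100 };
const replica = { balance: 100 };

// Hypothetical helper: copies the primary's state to the replica "later".
function replicate() {
  replica.balance = primary.balance;
}

// Approves a debit based on whatever balance the chosen node reports.
function debit(readFrom, amount) {
  if (readFrom.balance < amount) return false; // insufficient funds
  primary.balance -= amount; // writes always go to the primary
  return true;
}

// 1. The user withdraws everything; the check reads the primary.
const withdrawalOk = debit(primary, 100);

// 2. Before replication runs, a card payment is checked against the stale replica.
const paymentOk = debit(replica, 50);

replicate();
console.log(withdrawalOk, paymentOk, primary.balance); // true true -50
```

Had the second check read from the primary, it would have seen a balance of zero and rejected the payment. That is the guarantee strong consistency buys.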

3. Large Data Volumes and Trade-offs for Partition Tolerance

When dealing with large volumes of data, many distributed architectures choose to sacrifice a certain degree of consistency so that they stay available when partitions occur. If a node fails to respond or there are network issues, the application can still function with the remaining nodes.

This trade-off is essential for services with millions of simultaneous users or global systems spanning multiple geographic regions. But it’s not always the right choice: if your application demands absolute precision and cannot tolerate outdated data, adopting a model that sacrifices consistency can be detrimental.

4. Document-Oriented vs Relational: Schema on Read vs Schema on Write

In a document-oriented model (typically NoSQL), the term Schema on Read is used: the structure of documents is not rigidly defined when data is written, and the schema is applied implicitly by the code that reads or processes the data. In a relational model, on the other hand, Schema on Write is used: a rigid schema is defined before any record is entered into the database, and the database enforces it on every write.

While document-oriented systems offer more flexibility, the reality is that there is always a schema, in one form or another. The code processing the data must know what fields exist and how to interpret them. For example, if your application expects a "price" field to calculate a total, it cannot “guess” where that field is if it wasn’t predefined. Thus, the supposed freedom of schema doesn’t eliminate the need for a coherent design and careful evolution.
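
The "price" example can be sketched as follows. The document shapes, the optional `quantity` field with its default of 1, and the function names `parseLine` and `orderTotal` are all assumptions made for illustration. The point is only that the reading code ends up encoding a schema of its own.

```javascript
// Hypothetical reader for "order line" documents; field names are illustrative.
function parseLine(doc) {
  if (typeof doc.price !== "number" || Number.isNaN(doc.price)) {
    throw new TypeError(`document ${doc._id}: "price" must be a number`);
  }
  // quantity is treated as optional here and defaults to 1 (an assumption).
  const quantity = doc.quantity ?? 1;
  return { price: doc.price, quantity };
}

function orderTotal(docs) {
  return docs
    .map(parseLine)
    .reduce((sum, line) => sum + line.price * line.quantity, 0);
}

const total = orderTotal([
  { _id: 1, price: 10, quantity: 2 },
  { _id: 2, price: 5 },
]);
console.log(total); // 25

let error = null;
try {
  orderTotal([{ _id: 3, cost: 7 }]); // "cost" instead of "price"
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true
```

The database accepted the third document without complaint, and the mismatch only surfaced when the application tried to read it. The schema still exists, but it is enforced later and in a different place.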

Schema Modification Example

In a Relational Database

If we need to add a new column to store the last update date, we must alter the table:

ALTER TABLE products
ADD COLUMN last_update_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP;

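Because the column has a default, existing inserts and queries that name their columns keep working. Any process that needs the last_update_date information, however, must be updated to read it, and code that relies on SELECT * or on column positions must account for the extra column.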

In a Document-Oriented Database

Let’s imagine a products collection in MongoDB, where the last update date wasn’t previously stored. From a specific date, we decide to add this field. The new “schema” is handled in the code:

// Starting from 2024-01-01, we add new logic:
const REFERENCE_DATE = new Date('2024-01-01');

function processProduct(product) {
  // Older documents were stored without the field, so add it when it is missing.
  if (new Date() >= REFERENCE_DATE && !product.last_update_date) {
    product.last_update_date = new Date();
  }

  // Process the rest of the product
  // ...
}

In this scenario, no formal structure has to be altered. But the code must handle the new field and, when it is missing, fall back to a default behavior (e.g., creating it). In other words, with Schema on Read the schema is managed by the application, not by the database.

Conclusion

Choosing the right database model requires considering the CAP theorem and your application’s specific needs. If you need scalability and partition tolerance to handle large data volumes with eventual consistency, a document-oriented or NoSQL database may be the best option. On the other hand, if data accuracy and integrity are critical, relational databases with strong consistency often excel.

Don’t forget that other models, such as graph databases, are ideal for handling complex (many-to-many) relationships and exploring deep connections between entities. The type of relationship (one-to-many, many-to-many, etc.) and the required level of consistency should guide your decision. With this approach, you’ll build a reliable, scalable, and coherent system without falling into the misconception that schema flexibility is the only reason to use NoSQL.

Image from: https://www.commitstrip.com/en/2012/04/10/what-do-you-mean-its-oversized/?
