When NOT to use Atlas Search
Design reviews are one-on-one meetings where MongoDB experts deliver advice on data modeling best practices and application design challenges. In this series, we are going to explore common real-life scenarios where design reviews helped developers achieve meaningful success with MongoDB.
- How to Align Your Data Model With Your Application Needs When Migrating From RDBMS to MongoDB | by Néstor Daza
Note: I’m taking the opportunity to link liberally, sometimes loony-ily. I love the serendipity of following interesting links. I had fun researching and reminding myself of oldies but goodies. Here’s to at least some of the shiny paths followed being entertaining and educational to you too.
We’ve got to start with a couple of assumptions for this article to best fit:
- You’ve got documents in MongoDB Atlas.
- The documents need to be findable.
If your documents aren’t in Atlas, then Atlas Search doesn’t (yet) apply, and thus the rest of this write-up is moot.
If anything, I’m pragmatic and agile. Duct tape, twine, or a card catalog—use what works for the job. Whether you use MongoDB or not, the document model is a good way to think about data challenges and worth having handy when the time is right. Do consider Atlas for your future data needs, as it’s a platform that provides a lot of necessary and powerful capabilities. Just sayin’.
Findability is one such necessary database capability. If you can’t find your content, it may as well not exist.
Atlas Search enables powerful, scalable, and relevant search features. Its strength primarily stems from one little, old Java library. There’s a potent elixir in that .jar! And it has been The Solution to All The Challenges for the bulk of my career. One rather fun aspect of my life at MongoDB is tackling Design Reviews that involve some aspect of search. I excel at, and enjoy, solving concrete search problems. These reviews typically are with folks using Atlas Search and wanting to dig in deeper to get a bit more nuance to relevancy tuning, or folks using MongoDB $match and $regex and exploring if and how to leverage Atlas Search instead. Here’s a story about a recent Design Review with a customer already well versed in Atlas Search and using it effectively… to a point.
Atlas Search matching as expected
Here’s the use case presented to me by a customer during a design review session:
We have a service built on MongoDB Atlas that needs to rapidly match identity requests using only a few fields of exact (though case-insensitive) values, such as an ID, e-mail address, and phone number.
Case-insensitive matching over a few fields? Definitely a problem that Atlas Search can solve handily! Take a few compound.should clauses and call us in the morning.
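For the curious, those few compound.should clauses might look roughly like the pipeline below, built as plain Python dictionaries. Treat it as a sketch under assumptions rather than the customer's actual query: the field names (customerId, email, phone), the index name "default", and an index mapping that treats each whole value as a single lowercased term are all hypothetical.

```python
def build_identity_search(customer_id=None, email=None, phone=None, index="default"):
    """Build a $search pipeline that matches on any supplied identity field.

    Assumes (hypothetically) that each field is indexed so a whole value is
    one lowercased term, e.g. a keyword tokenizer plus a lowercase filter.
    """
    candidates = {"customerId": customer_id, "email": email, "phone": phone}
    # One `text` clause per field that was actually supplied.
    should = [
        {"text": {"query": value, "path": path}}
        for path, value in candidates.items()
        if value
    ]
    if not should:
        raise ValueError("at least one identity field is required")
    return [
        {
            "$search": {
                "index": index,
                "compound": {"should": should, "minimumShouldMatch": 1},
            }
        },
        {"$limit": 5},
    ]


pipeline = build_identity_search(email="Pat@Example.com", phone="+1-555-0100")
# With a driver such as PyMongo, this would be run as collection.aggregate(pipeline).
```

Setting minimumShouldMatch to 1 means a document qualifies when any one of the supplied fields matches; whether the real service wanted "any" or "all" is not something the review notes spell out.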
And unsurprisingly, the customer reports that:
Atlas Search matching works as expected...
But not so fast
Literally, and unfortunately,
… However, the time to “eventual consistency” in order to match recently updated documents is too long for the required SLA.
And, to work around that,
A [third-party key-value] caching mechanism was implemented for a first pass lookup.
Both of these topics warrant a bit of a deeper dive, so that we can understand how best to help this customer.
- Eventual consistency
- Key/value lookup using indexes
Eventual consistency
Yes, Atlas Search is awesome! It can slice, dice, and do all sorts of groovy things, yet it obediently stays within the laws of physics. Data, being what it is, will always be added to, updated in, and deleted from the database and its replica set. The Atlas Search process (mongot) handles the database change stream and updates the underlying Lucene index. This process, by default, runs co-located with the database processes themselves on the same hardware, though ideally it should run on its own hardware nearby.
[Figures: Coupled Architecture; Dedicated Search Nodes]
Atlas Search is eventually consistent. These machinations involve shards, replicas, CPU, disk, memory, network, and a bit of time. Changes to the database will, eventually, be reflected in associated search indexes. But it isn’t instantaneous, and there are many variables that affect the lag between a database change and search requests finding documents by the modified criteria: rate of data changes, complexity of index mapping configuration, deployment architecture/capabilities, resource contention, size of the index, query load, and maybe even solar flares.
Depending on the nature of the application, the eventually consistent lag time may be irrelevant or a critical consideration. An update to a book record in a library can get reindexed overnight without affecting operations. However, an identity request that fails to match the latest value of a record that was just updated in the database is unacceptable.
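To make that lag concrete, here is a toy, in-memory model of a database whose search index catches up asynchronously. It is purely illustrative and not how mongot works internally: the ToyDatabase class, its queue, and the email field are all made up for this sketch.

```python
from collections import deque


class ToyDatabase:
    """In-memory stand-in: writes land immediately, the search index catches up later."""

    def __init__(self):
        self.documents = {}       # authoritative data, keyed by _id
        self.pending = deque()    # changes not yet applied to the search index
        self.search_index = {}    # email -> _id, as the search side currently sees it

    def upsert(self, doc_id, email):
        self.documents[doc_id] = {"_id": doc_id, "email": email}
        self.pending.append((doc_id, email))  # picked up asynchronously

    def sync_search_index(self):
        """Apply queued changes, as a separate search process eventually would."""
        while self.pending:
            doc_id, email = self.pending.popleft()
            # Drop any stale entry for this document before adding the new one.
            self.search_index = {
                e: i for e, i in self.search_index.items() if i != doc_id
            }
            self.search_index[email] = doc_id

    def search_by_email(self, email):
        return self.search_index.get(email)


db = ToyDatabase()
db.upsert(1, "old@example.com")
db.sync_search_index()
db.upsert(1, "new@example.com")

stale_hit = db.search_by_email("old@example.com")  # still matches the old value
missed = db.search_by_email("new@example.com")     # new value is not visible yet
db.sync_search_index()
caught_up = db.search_by_email("new@example.com")  # visible once the index catches up
```

The window between the second upsert and the next sync is exactly where an identity lookup would miss a record that the database already holds.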
The upside of a search index being eventually consistent is that it does not delay, or interfere with, database-level updates and transactions. A search index update has many variables involved, and its complexity can change over time; a change to the index configuration could cause vastly more terms or documents to be indexed. An Atlas Search index is an index configuration and its corresponding Lucene index. The word “index” is a great one, but a Lucene index is really a collection of special-purpose data structures, one for each field (and multi) defined. Each field “type” has its own optimized index data structure. Lexicographically ordered inverted indexes with posting lists, complete with term, document, and corpus-level statistics, power string mapped fields and queries. This is the heart of relevancy computations.
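If "inverted index with posting lists" sounds abstract, a deliberately tiny sketch may help. This is nowhere near how Lucene is actually implemented; the whitespace tokenization, the sample documents, and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to a sorted posting list of document ids."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Terms are kept in lexicographic order, each with its posting list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    1: "Atlas Search uses Lucene",
    2: "Lucene builds inverted indexes",
    3: "Atlas stores documents",
}
index = build_inverted_index(docs)

# A simple corpus-level statistic: how many documents contain each term.
document_frequency = {term: len(ids) for term, ids in index.items()}
```

Looking up a term is a dictionary hit followed by a walk of its posting list, and statistics such as document frequency are the raw material for relevancy scoring.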
Key value lookup using indexes
Finding by _id (every document's unique key) is a given (so we throw that one in for free!). What about finding your data by other exact match types of criteria, such as all products in a specific category? Or all documents modified by a particular username? No doubt this type of findability is crucial too. MongoDB is really good at looking up documents by a value, provided the value is indexed.
This particular application needs exact, case-insensitive value lookup over a few fields. Let’s push the case-insensitivity issue to the application, and simply have it lowercase the field value any time it is being written or queried, so now it’s fully an exact match situation on the MongoDB side of things. Following indexing best practices such as the ESR rule, a few single-field B-Tree index definitions are all the customer needs to satisfy their performance SLA. These indexes don’t come for free, either, but are managed in the database process quickly and handled synchronously with every document update, so consistency is guaranteed.
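Here is roughly what that could look like on the application side, again as plain Python data structures so it runs without a database. The field names and helper functions are hypothetical, and whether lookups should match any field or all of them is an assumption on my part.

```python
def normalize(value):
    """Lowercase a value so stored and queried forms compare exactly."""
    return value.lower()


# Hypothetical field names; one single-field ascending index per lookup field.
LOOKUP_FIELDS = ("customerId", "email", "phone")
INDEX_SPECS = [[(field, 1)] for field in LOOKUP_FIELDS]


def prepare_for_write(doc):
    """Return a copy of the document with its lookup fields lowercased."""
    return {
        key: normalize(value)
        if key in LOOKUP_FIELDS and isinstance(value, str)
        else value
        for key, value in doc.items()
    }


def build_lookup_filter(**criteria):
    """Build an exact-match filter over whichever lookup fields were supplied."""
    clauses = [
        {field: normalize(value)}
        for field, value in criteria.items()
        if field in LOOKUP_FIELDS and value
    ]
    if not clauses:
        raise ValueError("at least one lookup field is required")
    return clauses[0] if len(clauses) == 1 else {"$or": clauses}


stored = prepare_for_write(
    {"customerId": "AB-123", "email": "Pat@Example.COM", "name": "Pat"}
)
query = build_lookup_filter(email="PAT@example.com")
# With PyMongo: collection.create_index(spec) for each spec in INDEX_SPECS,
# then collection.find_one(query).
```

Because both the write path and the query path go through the same normalize function, the database only ever sees exact matches, which is what a plain single-field index handles well.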
And to be sure, key/value lookup in Lucene (via Atlas Search) is very fast. It’s the eventual consistency lag that drives the design recommendation here. If the use case had been querying across dozens of fields in any combination for exact values and eventual consistency was an acceptable trade-off, Atlas Search would be the better approach here. With a lot of fields to intersect, B-Tree index configuration would be arduous and resource-intensive, whereas an Atlas Search index configured for multiple-field intersection would be quite efficient and performant.
Design recommendation: B-Tree pragmatism
For this case of a few fields of exact value matching, with no full-text fuzzy search needed, the clear winner is leveraging the in-process, consistent, and quick B-Tree index capabilities.
When queries are exact field matches and the eventual consistency time lag is a critical blocker, consider using classic MongoDB B-Tree indexes rather than Atlas Search. Atlas Search indexes are updated in a separate process, maybe even on separate hardware via a network hop, whereas B-Tree index updates happen within the scope of database update transactions and are immediately usable after an update completes. Note that _id is implicitly indexed in this fashion and can be used for domain values if appropriate. With B-Tree-based lookups, a front-end cache is not needed as this is already a fast key/value lookup from a RAM-based index.
Be sure to learn about data modeling and schema design for Atlas Search so that you’re ready for the problems for which it shines!