Logo

dev-resources.site

for different kinds of informations.

Syncing Data in Near Real-Time: Integrate DynamoDB and OpenSearch

Published at
11/16/2024
Categories
dynamodb
opensearch
aws
lambda
Author
malikidrees
Categories
4 categories in total
dynamodb
open
opensearch
open
aws
open
lambda
open
Author
11 person written this
malikidrees
open
Syncing Data in Near Real-Time: Integrate DynamoDB and OpenSearch

Syncing Data in Near Real-Time: Integrate DynamoDB and OpenSearch

Synchronizing data between DynamoDB and OpenSearch can yield incredible performance for fast searches and complex queries on structured, nested data. In this article, we’ll cover an architecture that facilitates near real-time data synchronization from DynamoDB to OpenSearch using a single-table design. We’ll explore how this setup enhances search functionality, optimizes costs, and supports complex, nested data updates.

This architecture is built on AWS DynamoDB, using its single-table design for efficient data storage, with OpenSearch integration to handle fast, scalable searches across complex data structures.

Why Use Single Table Design and OpenSearch Together?

DynamoDB and OpenSearch serve distinct purposes. While DynamoDB provides cost-efficient, fast storage and retrieval of data, OpenSearch is optimized for search operations. By using a single-table design in DynamoDB, we organize related entities under a single partition, improving efficiency and scalability. This design also simplifies the syncing process, as DynamoDB and OpenSearch are synchronized based on DynamoDB Streams.

The primary use cases for this integration include:

  • Near Real-Time Syncing: Reflecting changes from DynamoDB to OpenSearch with minimal latency.

  • Efficient Handling of Array and Nested Structures: Structuring data in OpenSearch to accommodate nested and array-based data stored in DynamoDB, without unnecessary overwrites.

  • Optimized Search Performance: Using OpenSearch for fast searches on complex, nested, and array-based data with minimal impact on DynamoDB performance.

System Design Overview: Key Components

Our system design leverages several AWS services to ensure data is consistently and accurately synchronized. Below, we outline each component’s role, along with key aspects of the single-table DynamoDB design and how data is transformed for OpenSearch compatibility.

Data Model (Single Table) Overview

In a single-table design, related entities share a common partition, with each row differentiated by sort keys. Here’s an example setup for a logistics management system:

Single Table Design

This table allows for efficient querying by PK, retrieving all related entities, such as order details and items, under a single partition.

Architecture Flow

Here’s a high-level view of how the system processes data updates:

  1. DynamoDB Streams: Captures changes to DynamoDB records and triggers an AWS Lambda function.

  2. Lambda Event Processor:
    The Lambda function processes each stream event through the BaseEntityProcessor class, which maps DynamoDB’s single-table structure to OpenSearch’s flat document structure.
    This transformation ensures each Order or Shipment is stored as a single document in OpenSearch, accommodating array-based fields such as items under a unified document.

  3. OpenSearch Service:
    OpenSearch stores order items, shipments, and customers in a flat document structure for faster query performance, enabling complex queries on nested data, such as orderItems.

The diagram below illustrates the flow of data from DynamoDB to OpenSearch via AWS Lambda:

Generated with Napkin AI

Detailed Component Breakdown

  1. DynamoDB Stream and Lambda Event Processing **DynamoDB Streams notify the system whenever there’s a data change (insert, update, or delete). The Lambda function processes each event individually, with the **BaseEntityProcessor mapping DynamoDB records to OpenSearch fields:
  • Array Transformation: In DynamoDB, arrays like items are stored as separate rows. In OpenSearch, they are transformed into a single document structure where all items fall under orderItems.

  • Data Syncing: When an item is added or removed from items in DynamoDB, the entire array is updated in OpenSearch, ensuring it reflects the latest state.

  1. OpenSearch Bulk and Partial Updates

To efficiently sync large data sets, we use OpenSearch’s bulk API for batch updates. Here’s how the OpenSearch service processes updates:

  • Flat Document Structure: Each Order **document in OpenSearch contains a complete array of **orderItems. When any item in DynamoDB’s items row is modified, OpenSearch is updated to reflect the full set of items under orderItems.

  • Handling Nested Updates: Information like shipment is stored within specific sections (e.g., shipmentInfo in OpenSearch). Updates to these nested fields are applied directly to their respective sections, preserving document structure and avoiding overwrites of unrelated data.

Implementation and Code Structure

This project’s file structure is organized for efficient management of different entities, event processing, and OpenSearch operations. Here’s an outline of the key directories and files:

📦stream
 ┣ 📂handlers
 ┃ ┣ 📜StreamBatchHandler.ts
 ┣ 📂processors
 ┃ ┣ 📂entity-processors
 ┃ ┃ ┣ 📜OrderEntityProcessor.ts
 ┃ ┃ ┗ 📜UserEntityProcessor.ts
 ┃ ┣ 📂order-processors
 ┃ ┃ ┗ 📜ShipmentProcessor.ts
 ┃ ┣ BaseEntityProcessor.ts
 ┃ ┣ BatchProcessor.ts
 ┃ ┣ EntityProcessorFactory.ts
 ┃ ┣ EventProcessor.ts
 ┣ 📂services
 ┃ ┣ 📜OpenSearchBulkService.ts
 ┣ 📂utils
 ┃ ┗ 📜Logger.ts
 ┣ 📜README.md
 ┣ 📜event.json
 ┣ 📜index.ts
Enter fullscreen mode Exit fullscreen mode

Each directory plays a specific role:

  • handlers: Contains Lambda functions to process DynamoDB streams.

  • processors: Contains BaseEntityProcessor, defining generic methods, abstract functions, and shared variables for various entities. Key processors in entity-processors extend BaseEntityProcessor:
    OrderEntityProcessor.ts processes core order details, including nested attributes and array-based structures like items.
    UserEntityProcessor.ts handles user-related data processing.

The order-processors subfolder provides specialized processors, such as ShipmentProcessor.ts, that handle shipment details, which are called by OrderEntityProcessor for deeper, order-specific transformations for OpenSearch.

Optimizing DynamoDB and OpenSearch Sync for Real-World Applications

Some optimization tips to keep in mind when implementing this system in production:

  1. Use Cached Index Checks: Caching index checks improves system efficiency, as checking for an index only happens once per entity, reducing load on OpenSearch.

  2. Consider Retry Mechanisms: Implement retry logic in OpenSearch operations to handle intermittent network failures or throttling.

  3. Optimized Partial Updates: With targeted updates on nested structures like orderItems by unique IDs, the system avoids full overwrites in OpenSearch. This method keeps updates focused and minimizes data transfer, making syncing more efficient.

Conclusion

This architecture demonstrates how integrating DynamoDB (single-table design) with OpenSearch enables scalable, near real-time data synchronization. Using a single-table structure in DynamoDB optimizes data organization and retrieval, minimizing duplicated data while centralizing related entities within a single partition. OpenSearch’s document model complements this by supporting flexible, fast searches on large, complex datasets.

With a reliable syncing mechanism, this system handles nested and array-based data, ensuring that all changes in DynamoDB are immediately accessible for search and analytics via OpenSearch. This solution is broadly applicable across e-commerce, logistics, and other high-demand industries.
GitHub - idrsdev/dynamo-opensearch-sync

dynamodb Article's
30 articles in total
Favicon
Efficient Batch Writing to DynamoDB with Python: A Step-by-Step Guide
Favicon
Advanced Single Table Design Patterns With DynamoDB
Favicon
DynamoDB GUI in local
Favicon
Stream Dynamo para RDS Mysql
Favicon
Alarme Dynamo Throttle Events - Discord
Favicon
What I Learned from the 'Amazon DynamoDB for Serverless Architectures' Course on AWS Skill Builder
Favicon
A Novel Pattern for Documenting DynamoDB Access Patterns
Favicon
Making dynamodb queries just a little bit easier.
Favicon
Replicate data from DynamoDB to Apache Iceberg tables using Glue Zero-ETL integration
Favicon
Deploying a Globally Accessible Web Application with Disaster Recovery
Favicon
Practical DynamoDB - Locking Reads
Favicon
Databases - Query Data with DynamoDB
Favicon
Database Management in the cloud (RDS, DynamoDB)
Favicon
Securing Customer Support with AWS Services and Slack Integration
Favicon
Help! I Think I Broke DynamoDB – A Tale of Three Wishes 🧞‍♂️
Favicon
Mastering DynamoDB in 2025: Building Scalable and Cost-Effective Applications
Favicon
Syncing Data in Near Real-Time: Integrate DynamoDB and OpenSearch
Favicon
DynamoDB and the Art of Knowing Your Limits 💥When Database Bites Back 🧛‍♂️
Favicon
DynamoDB Transactions with AWS Step Functions
Favicon
Create an API to get data from your DynamoDB Database using CDK
Favicon
AWS Database - Part 2: DynamoDB
Favicon
Building event-driven workflows with DynamoDB Streams
Favicon
Setting up a REST API in Python for DynamoDB
Favicon
How to Create a (Nearly) Free Serverless Rate Limiter on AWS
Favicon
Query DynamoDB with SQL using Athena - Leveraging EventBridge Pipes and Firehose (2/2)
Favicon
Schema Validation in Amazon DynamoDB
Favicon
Use Cases for DynamoDB in Provisioned Mode vs. Auto Scaling: Advantages and Disadvantages
Favicon
Mastering DynamoDB: Batch Operations Explained
Favicon
Not only dynamoDb migration or seed scripting
Favicon
Query DynamoDB with SQL using Athena - Leveraging DynamoDB Exports to S3 (1/2)

Featured ones: