dev-resources.site


Syncing Data in Near Real-Time: Integrate DynamoDB and OpenSearch

Published at

11/16/2024

Synchronizing data from DynamoDB to OpenSearch enables fast searches and complex queries on structured, nested data. In this article, we’ll cover an architecture that syncs data from DynamoDB to OpenSearch in near real time using a single-table design. We’ll explore how this setup improves search functionality, keeps costs under control, and supports updates to complex, nested data.

This architecture is built on AWS DynamoDB, using its single-table design for efficient data storage, with OpenSearch integration to handle fast, scalable searches across complex data structures.

Why Use Single Table Design and OpenSearch Together?

DynamoDB and OpenSearch serve distinct purposes: DynamoDB provides cost-efficient, fast storage and key-based retrieval, while OpenSearch is optimized for search. With a single-table design in DynamoDB, related entities share a partition key, so they can be stored and retrieved together efficiently. This design also simplifies syncing, because a single DynamoDB Stream on that table captures changes to every entity type and drives the updates to OpenSearch.

The primary use cases for this integration include:

Near Real-Time Syncing: Reflecting changes from DynamoDB to OpenSearch with minimal latency.
Efficient Handling of Array and Nested Structures: Structuring data in OpenSearch to accommodate nested and array-based data stored in DynamoDB, without unnecessary overwrites.
Optimized Search Performance: Using OpenSearch for fast searches on complex, nested, and array-based data with minimal impact on DynamoDB performance.

System Design Overview: Key Components

Our system design leverages several AWS services to ensure data is consistently and accurately synchronized. Below, we outline each component’s role, along with key aspects of the single-table DynamoDB design and how data is transformed for OpenSearch compatibility.

Data Model (Single Table) Overview

In a single-table design, related entities share a common partition, with each row differentiated by sort keys. Here’s an example setup for a logistics management system:

This table allows for efficient querying by PK, retrieving all related entities, such as order details and items, under a single partition.
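
As a rough illustration of such a layout, the sketch below models a few rows for one order in memory. The key formats (`ORDER#1001`, `METADATA`, `ITEM#...`, `SHIPMENT#...`) and the attribute names are assumptions made for this example, not the article's actual schema, and `queryPartition` is a hypothetical stand-in for a DynamoDB Query on PK.

```typescript
// Hypothetical single-table rows. Key formats and attribute names are
// assumptions for illustration only.
interface TableRow {
  PK: string; // partition key shared by all rows of one order
  SK: string; // sort key that distinguishes the row type
  [attribute: string]: string | number;
}

const rows: TableRow[] = [
  { PK: "ORDER#1001", SK: "METADATA", status: "PLACED", customerId: "CUST#42" },
  { PK: "ORDER#1001", SK: "ITEM#A1", sku: "PALLET-S", quantity: 2 },
  { PK: "ORDER#1001", SK: "ITEM#B7", sku: "CRATE-L", quantity: 1 },
  { PK: "ORDER#1001", SK: "SHIPMENT#S9", carrier: "ACME", status: "IN_TRANSIT" },
  { PK: "ORDER#2002", SK: "METADATA", status: "DELIVERED", customerId: "CUST#7" },
];

// In-memory stand-in for a DynamoDB Query on PK with an optional
// begins_with(SK, prefix) key condition.
function queryPartition(table: TableRow[], pk: string, skPrefix = ""): TableRow[] {
  return table.filter((row) => row.PK === pk && row.SK.startsWith(skPrefix));
}

// All rows of one order (metadata, items, shipment) come from one partition.
const wholeOrder = queryPartition(rows, "ORDER#1001");
// Narrowing by sort-key prefix returns only the item rows.
const onlyItems = queryPartition(rows, "ORDER#1001", "ITEM#");
console.log(wholeOrder.length, onlyItems.length);
```

Because every row of an order shares the same PK, one query returns the whole order, and a sort-key prefix narrows the result to a single row type.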

Architecture Flow

Here’s a high-level view of how the system processes data updates:

DynamoDB Streams: Captures changes to DynamoDB records and triggers an AWS Lambda function.
Lambda Event Processor:
The Lambda function processes each stream event through the BaseEntityProcessor class, which maps DynamoDB’s single-table structure to OpenSearch’s flat document structure.
This transformation ensures each Order or Shipment is stored as a single document in OpenSearch, accommodating array-based fields such as items under a unified document.
OpenSearch Service:
OpenSearch stores order items, shipments, and customers in a flat document structure for faster query performance, enabling complex queries on nested data, such as orderItems.
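
The first two steps of this flow could look roughly like the sketch below. It declares minimal local types that mirror the shape of a DynamoDB Streams record (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`) so that it runs without any AWS packages. The `SyncAction` type and `toSyncAction` function are hypothetical names, and the code assumes the table's keys are called PK and SK.

```typescript
// Minimal local types mirroring a DynamoDB Streams record. A real Lambda
// would normally import equivalent types instead of declaring them.
type AttributeValue =
  | { S: string }
  | { N: string }
  | { BOOL: boolean }
  | { NULL: true }
  | { L: AttributeValue[] }
  | { M: { [key: string]: AttributeValue } };

type Image = { [key: string]: AttributeValue };

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  // NewImage is only present when the stream view type includes new images.
  dynamodb: { Keys: Image; NewImage?: Image; OldImage?: Image };
}

// Convert one DynamoDB-typed attribute into a plain JavaScript value.
function unmarshallValue(value: AttributeValue): unknown {
  if ("S" in value) return value.S;
  if ("N" in value) return Number(value.N);
  if ("BOOL" in value) return value.BOOL;
  if ("NULL" in value) return null;
  if ("L" in value) return value.L.map((entry) => unmarshallValue(entry));
  return unmarshallImage(value.M);
}

function unmarshallImage(image: Image): { [key: string]: unknown } {
  const plain: { [key: string]: unknown } = {};
  for (const key of Object.keys(image)) {
    plain[key] = unmarshallValue(image[key]);
  }
  return plain;
}

// Hypothetical action handed on to the OpenSearch layer.
type SyncAction =
  | { kind: "upsert"; pk: string; sk: string; row: { [key: string]: unknown } }
  | { kind: "delete"; pk: string; sk: string };

// Map one stream record to an upsert or delete, keyed by PK and SK.
function toSyncAction(record: StreamRecord): SyncAction {
  const keys = unmarshallImage(record.dynamodb.Keys);
  const pk = String(keys.PK);
  const sk = String(keys.SK);
  if (record.eventName === "REMOVE" || !record.dynamodb.NewImage) {
    return { kind: "delete", pk, sk };
  }
  return { kind: "upsert", pk, sk, row: unmarshallImage(record.dynamodb.NewImage) };
}

const example = toSyncAction({
  eventName: "MODIFY",
  dynamodb: {
    Keys: { PK: { S: "ORDER#1001" }, SK: { S: "ITEM#A1" } },
    NewImage: { PK: { S: "ORDER#1001" }, SK: { S: "ITEM#A1" }, quantity: { N: "3" } },
  },
});
console.log(example);
```

The hand-written unmarshalling only covers the attribute types listed in `AttributeValue`. It is there to keep the sketch self-contained; a production handler would more likely rely on the AWS SDK's own conversion utilities.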

The diagram below illustrates the flow of data from DynamoDB to OpenSearch via AWS Lambda:

Detailed Component Breakdown

DynamoDB Stream and Lambda Event Processing

DynamoDB Streams notify the system whenever there’s a data change (insert, update, or delete). The Lambda function processes each event individually, with the BaseEntityProcessor mapping DynamoDB records to OpenSearch fields:

Array Transformation: In DynamoDB, arrays like items are stored as separate rows. In OpenSearch, they are transformed into a single document structure where all items fall under orderItems.
Data Syncing: When an item is added or removed from items in DynamoDB, the entire array is updated in OpenSearch, ensuring it reflects the latest state.
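
The array transformation described above can be sketched as two pure functions that rebuild `orderItems` whenever an item row is written or removed. Apart from `orderItems`, the field names (`itemId`, `sku`, `quantity`, `orderId`) are assumptions for illustration.

```typescript
// Hypothetical document shapes. Only the orderItems name comes from the
// article; the other fields are assumed.
interface OrderItem {
  itemId: string;
  sku: string;
  quantity: number;
}

interface OrderDocument {
  orderId: string;
  orderItems: OrderItem[];
}

// Insert the item, or replace the existing entry with the same itemId,
// and return a new document with the complete, updated array.
function upsertItem(doc: OrderDocument, item: OrderItem): OrderDocument {
  const index = doc.orderItems.findIndex((existing) => existing.itemId === item.itemId);
  const orderItems =
    index === -1
      ? [...doc.orderItems, item]
      : doc.orderItems.map((existing, i) => (i === index ? item : existing));
  return { ...doc, orderItems };
}

// Return a new document whose array no longer contains the given item.
function removeItem(doc: OrderDocument, itemId: string): OrderDocument {
  return { ...doc, orderItems: doc.orderItems.filter((item) => item.itemId !== itemId) };
}

let doc: OrderDocument = { orderId: "1001", orderItems: [] };
doc = upsertItem(doc, { itemId: "A1", sku: "PALLET-S", quantity: 2 }); // INSERT
doc = upsertItem(doc, { itemId: "B7", sku: "CRATE-L", quantity: 1 }); // INSERT
doc = upsertItem(doc, { itemId: "A1", sku: "PALLET-S", quantity: 5 }); // MODIFY
doc = removeItem(doc, "B7"); // REMOVE
console.log(doc.orderItems);
```

Each function returns the full array, so the value written to OpenSearch always reflects the latest state of the item rows.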

OpenSearch Bulk and Partial Updates

To efficiently sync large data sets, we use OpenSearch’s bulk API for batch updates. Here’s how the OpenSearch service processes updates:

Flat Document Structure: Each Order document in OpenSearch contains a complete array of orderItems. When any item in DynamoDB’s items row is modified, OpenSearch is updated to reflect the full set of items under orderItems.
Handling Nested Updates: Information like shipment is stored within specific sections (e.g., shipmentInfo in OpenSearch). Updates to these nested fields are applied directly to their respective sections, preserving document structure and avoiding overwrites of unrelated data.
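
As a sketch of how such updates might be batched, the function below builds the newline-delimited JSON body that the bulk API expects: an action line, followed by a `doc` line for updates, with every line (including the last) terminated by a newline. The `BulkOperation` type, the `buildBulkBody` name, and the `orders` index name are assumptions for this example.

```typescript
// Hypothetical operation type produced by the processors.
type BulkOperation =
  | { kind: "update"; index: string; id: string; doc: { [field: string]: unknown } }
  | { kind: "delete"; index: string; id: string };

// Build the NDJSON request body for the _bulk endpoint.
function buildBulkBody(operations: BulkOperation[]): string {
  const lines: string[] = [];
  for (const op of operations) {
    if (op.kind === "update") {
      // Action line, then the partial document. doc_as_upsert creates the
      // document if it does not exist yet.
      lines.push(JSON.stringify({ update: { _index: op.index, _id: op.id } }));
      lines.push(JSON.stringify({ doc: op.doc, doc_as_upsert: true }));
    } else {
      // Delete actions have no second line.
      lines.push(JSON.stringify({ delete: { _index: op.index, _id: op.id } }));
    }
  }
  // Every line, including the last one, must end with a newline.
  return lines.map((line) => line + "\n").join("");
}

const body = buildBulkBody([
  // Touches only shipmentInfo; other fields of order 1001 stay as they are.
  { kind: "update", index: "orders", id: "1001", doc: { shipmentInfo: { carrier: "ACME", status: "IN_TRANSIT" } } },
  // Sends the complete orderItems array for order 1002.
  { kind: "update", index: "orders", id: "1002", doc: { orderItems: [{ itemId: "A1", quantity: 5 }] } },
  { kind: "delete", index: "orders", id: "1003" },
]);
console.log(body);
```

A partial `doc` update merges object fields into the existing document but replaces arrays wholesale. That matches the behavior described above: the `shipmentInfo` update leaves unrelated fields alone, while `orderItems` is always sent as the full array.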

Implementation and Code Structure

This project’s file structure is organized for efficient management of different entities, event processing, and OpenSearch operations. Here’s an outline of the key directories and files:

📦stream
 ┣ 📂handlers
 ┃ ┣ 📜StreamBatchHandler.ts
 ┣ 📂processors
 ┃ ┣ 📂entity-processors
 ┃ ┃ ┣ 📜OrderEntityProcessor.ts
 ┃ ┃ ┗ 📜UserEntityProcessor.ts
 ┃ ┣ 📂order-processors
 ┃ ┃ ┗ 📜ShipmentProcessor.ts
 ┃ ┣ BaseEntityProcessor.ts
 ┃ ┣ BatchProcessor.ts
 ┃ ┣ EntityProcessorFactory.ts
 ┃ ┣ EventProcessor.ts
 ┣ 📂services
 ┃ ┣ 📜OpenSearchBulkService.ts
 ┣ 📂utils
 ┃ ┗ 📜Logger.ts
 ┣ 📜README.md
 ┣ 📜event.json
 ┣ 📜index.ts

Each directory plays a specific role:

handlers: Contains Lambda functions to process DynamoDB streams.
processors: Contains BaseEntityProcessor, defining generic methods, abstract functions, and shared variables for various entities. Key processors in entity-processors extend BaseEntityProcessor:
OrderEntityProcessor.ts processes core order details, including nested attributes and array-based structures like items.
UserEntityProcessor.ts handles user-related data processing.

The order-processors subfolder provides specialized processors, such as ShipmentProcessor.ts, which OrderEntityProcessor calls to handle shipment details and other order-specific transformations for OpenSearch.
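
To make the relationship between these classes more concrete, here is a minimal sketch of how they might fit together. The class names come from the file tree above, but every method name, the `Row` and `PartialDocument` types, the index names, and the key-prefix routing rule are assumptions. The repository's actual code may be organized differently.

```typescript
// Hypothetical row and output shapes.
type Row = { PK: string; SK: string; [attribute: string]: unknown };

interface PartialDocument {
  index: string; // target OpenSearch index
  id: string; // OpenSearch document ID
  doc: { [field: string]: unknown }; // fields to merge into the document
}

abstract class BaseEntityProcessor {
  constructor(protected readonly index: string) {}

  // Shared helper: strip the "TYPE#" prefix from a key such as "ORDER#1001".
  protected idFromKey(key: string): string {
    return key.slice(key.indexOf("#") + 1);
  }

  // Each entity decides how its rows map to document fields.
  abstract process(row: Row): PartialDocument;
}

// Order-specific helper that OrderEntityProcessor delegates to.
class ShipmentProcessor {
  toShipmentInfo(row: Row): { [field: string]: unknown } {
    return { carrier: row.carrier, status: row.status };
  }
}

class OrderEntityProcessor extends BaseEntityProcessor {
  private readonly shipments = new ShipmentProcessor();

  constructor() {
    super("orders");
  }

  process(row: Row): PartialDocument {
    const id = this.idFromKey(row.PK);
    if (row.SK.startsWith("SHIPMENT#")) {
      return { index: this.index, id, doc: { shipmentInfo: this.shipments.toShipmentInfo(row) } };
    }
    return { index: this.index, id, doc: { status: row.status } };
  }
}

class UserEntityProcessor extends BaseEntityProcessor {
  constructor() {
    super("users");
  }

  process(row: Row): PartialDocument {
    return { index: this.index, id: this.idFromKey(row.PK), doc: { name: row.name } };
  }
}

class EntityProcessorFactory {
  // Pick a processor from the partition-key prefix (assumed convention).
  static forRow(row: Row): BaseEntityProcessor {
    if (row.PK.startsWith("ORDER#")) return new OrderEntityProcessor();
    if (row.PK.startsWith("USER#")) return new UserEntityProcessor();
    throw new Error(`No processor registered for key ${row.PK}`);
  }
}

const shipmentRow: Row = { PK: "ORDER#1001", SK: "SHIPMENT#S9", carrier: "ACME", status: "IN_TRANSIT" };
console.log(EntityProcessorFactory.forRow(shipmentRow).process(shipmentRow));
```

Keeping the shared helpers in the base class and delegating shipment handling to a separate processor means a new entity type only needs a new subclass and one more branch in the factory.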

Optimizing DynamoDB and OpenSearch Sync for Real-World Applications

Some optimization tips to keep in mind when implementing this system in production:

Use Cached Index Checks: Cache the result of each index-existence check so that the check runs only once per entity instead of on every event, reducing load on OpenSearch.
Consider Retry Mechanisms: Implement retry logic in OpenSearch operations to handle intermittent network failures or throttling.
Optimized Partial Updates: With targeted updates on nested structures like orderItems by unique IDs, the system avoids full overwrites in OpenSearch. This method keeps updates focused and minimizes data transfer, making syncing more efficient.
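
The first two tips can be sketched together as follows. `IndexClient` is a hypothetical wrapper interface around whichever OpenSearch client is in use, and `withRetry`, `backoffDelayMs`, and `IndexCache` are illustrative names with assumed defaults (four attempts, 100 ms base delay, 5 s cap).

```typescript
// Hypothetical wrapper around the OpenSearch calls this sketch needs.
interface IndexClient {
  indexExists(index: string): Promise<boolean>;
  createIndex(index: string): Promise<void>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Exponential backoff: baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run the operation, retrying on any error until maxAttempts is reached.
// A production version would likely retry only on throttling or
// transient network errors.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) await wait(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}

// Remembers which indices have already been verified, so OpenSearch is
// asked at most once per index for the lifetime of this object.
class IndexCache {
  private readonly known = new Set<string>();

  constructor(private readonly client: IndexClient) {}

  async ensureIndex(index: string): Promise<void> {
    if (this.known.has(index)) return;
    const exists = await withRetry(() => this.client.indexExists(index));
    if (!exists) await withRetry(() => this.client.createIndex(index));
    this.known.add(index);
  }
}
```

If an `IndexCache` instance is created at module scope in a Lambda function, it can survive across warm invocations, so the existence check is typically paid once per execution environment rather than once per event.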

Conclusion

This architecture demonstrates how integrating DynamoDB (single-table design) with OpenSearch enables scalable, near real-time data synchronization. Using a single-table structure in DynamoDB optimizes data organization and retrieval, minimizing duplicated data while centralizing related entities within a single partition. OpenSearch’s document model complements this by supporting flexible, fast searches on large, complex datasets.

With a reliable syncing mechanism, this system handles nested and array-based data, ensuring that changes in DynamoDB become available for search and analytics in OpenSearch in near real time. This solution is broadly applicable across e-commerce, logistics, and other high-demand industries.
GitHub - idrsdev/dynamo-opensearch-sync
