dev-resources.site
for different kinds of informations.
Implementing Change Data Capute (CDC) with Aurora Serverless v2
Aurora Serverless v2 is now available after a long wait and it claim to solve many issues and limitations from its predecessor.
One of the limitations of Aurora Serverless v1 is the impossibility of using CDC (Change Data Capture) with AWS DMS (Database Migration Service) but now it is possible using the v2.
In this guide we will review step by step the configuration and implementation of CDC with Aurora Serverless v2 and AWS DMS.
Actually, the configuration is pretty much the same as configuring CDC on a provisioned Aurora DB Cluster. Also consider that all the configurations are based on us-east-1 (N. Virginia) region.
Creating the Database
As first step, we need to create the Parameter Group, Subnet Group and the Database cluster. I am using almost all the default parameters, changing only the required ones to enable CDC (those parameters are in italic letters).
Parameter Group
Click on the 'Create Parameter Group' button:
Select:
Parameter Group family: aurora-mysql8.0
(Currently it is the only one compatible with Aurora Serverless v2)
Type:
DB Cluster Parameter Group
Group Name:
pg-servelessv2-cdc
Description:
Enable change data capture
Once the Parameter group is created, select it and click on 'Edit parameters':
We need to change these parameters:
binlog_format: ROW
binlog_checksum: NONE
Subnet group
Using subnet group we can specify in which subnets the database will be deployed. So in this case I will select public subnets to be able to connect to the DB easily.
Click on the 'Create DB subnet group' button:
Name:
sg-db-publicaccess
Description:
Using public subnets for demo only
VPC:
your_VPC_id
Availability Zones:
az_ids (is mandatory to select at least 2)
And now the subnet groups is created:
Database
Click on the 'Create Database' button:
Database creation method: Standard create
Engine type: Amazon Aurora
Edition: Amazon Aurora MySQL-Compatible Edition
Replication features: Single-master (default)
Engine version: Aurora MySQL 3.02.0 (compatible with MySQL 8.0.23)
Templates: Dev/Test
DB cluster identifier: db-test-cdc
Master username: admin
Master password: supersecretpassword
DB instance class: Serverless v2 - new
Capacity range - Minimum ACUs: 0.5
Capacity range - Maximum ACUs: 1
Multi-AZ deployment: Don't create an Aurora Replica
Virtual private cloud (VPC): your_VPC_id
Subnet group: sg-db-publicaccess (the subnet group previously created)
Public access: Yes
VPC security group: Create new
New VPC security group name: rds-publicaccess
Availability Zone: No preference
Database port: 3306
Database authentication options: Password authentication
Initial database name: sampledb
DB cluster parameter group: pg-servelessv2-cdc (the parameter group previously created)
DB parameter group: default.aurora-mysql8.0
Click on Create database (it takes approximately 15 minutes to create completely):
Also modifiy the created Security Group (rds-publicaccess) to allow inboud traffic on port 3306 from 0.0.0.0/0
Creating the S3 bucket
The next step is to create an S3 bucket which will be used as the target destination for the full load and change data capture task.
It does not required any special configuration, we can use a bucket with default creation parameters:
Name: test-dms-s3-target
Creating the DMS Replication Instance
We need to create the DMS replication subnet group. Similar to database subnet group, it helps to specify the subnets that will be used. In this case I am selecting private subnets but can be any with proper connectivity to the database instance and S3 bucket.
Name: sg-dms
Additionally, create the Security group that will be attached to the DMS replication instance. It requires allow outbound traffic.
And create the DMS Replication Instance:
Name: dms-instance
Instance class: dms.t3.small
Engine version: 3.4.6
Allocated storage: 20GB
VPC: The same VPC as the one RDs instance belongs to
Multi AZ: Dev or test worload
Publicly accessible: No
Replication subnet group: sg-dms (the previously created)
Availability zone: No preference
VPC security group(s): dms-replication-instance (the previously created)
KMS key: Default
Inserting sample data into the database
Before create the DMS task I am going to insert sample data into the database to validate the task works as expected. The database imported is from MySQL sample, specifically, the Sakila sample database. These are 2 scripts to execute so it is very quick to replicate.
Connect to the database using the user and password configured and setting the Database endpoint properly (in case of connection errors check the SG allow traffic on port 3306 for the IP range):
Then execute the downloaded scripts: sakila-schema.sql and sakila-data.sql
And now we have populated the database:
Creating the DMS Tasks
Create the Source Endpoint to connect to the database and the Target Endpoint to connect to the S3 bucket:
Before create the Target Endpoint for the S3 bucket, we need to create an IAM role granting access to put/delete objects on the bucket:
Select DMS as the AWS service:
Don't select any policies for now:
Select an proper name and create the role:
And Add permissions -> Create inline policy
The policy is as following (change the bucket name for the corresponding value):
{
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Action" : [
"s3:PutObject",
"s3:DeleteObject"
],
"Resource" : "arn:aws:s3:::test-dms-s3-target/*"
},
{
"Effect" : "Allow",
"Action" : "s3:ListBucket",
"Resource" : "arn:aws:s3:::test-dms-s3-target"
}
]
}
Now, we can create the Target endpoint:
To create the Database Migration Task we use the resources previously created:
Task identifier: full-load-and-cdc-auroraserverless
For Task setting in this demo we can leave the default parameters except for the Enable CloudWatch logs which will be helpful to validate everything is Ok:
For Table mappings we only need to specify the schema (sakila in this case) and leave the other parameters with default values:
And finally click on Create task:
Validating the CDC feature
Lets keep the task running for some minutes untill the state is "Load complete, replication ongoing":
Check from table statistics that the data was loaded successfully:
The relevant fields are Full load rows and Total rows:
If you navigate through the bucket objects, you will notice there is a folder for each table:
And for the first load is structured as follows:
To check the CDC in action, lets insert some rows:
INSERT INTO `sakila`.`customer` (`customer_id`, `store_id`, `first_name`, `last_name`, `email`, `address_id`, `active`, `create_date`, `last_update`) VALUES ('600', '2', 'JOHN', 'SMITH', '[email protected]', '605', '1', '2006-02-14 22:04:37', '2006-02-15 04:57:20');
INSERT INTO `sakila`.`customer` (`customer_id`, `store_id`, `first_name`, `last_name`, `email`, `address_id`, `active`, `create_date`, `last_update`) VALUES ('601', '2', 'WILL', 'CARTER', '[email protected]', '605', '1', '2006-02-14 22:04:37', '2006-02-15 04:57:20');
INSERT INTO `sakila`.`customer` (`customer_id`, `store_id`, `first_name`, `last_name`, `email`, `address_id`, `active`, `create_date`, `last_update`) VALUES ('602', '2', 'DONALD', 'JACKSON', '[email protected]', '605', '1', '2006-02-14 22:04:37', '2006-02-15 04:57:20');
And also delete some rows from the customer table:
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '9');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '15');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '4');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '433');
DELETE FROM `sakila`.`customer` WHERE (`customer_id` = '599');
After a couple of minutes (it could take 2 or 3 minutes), the changes should be reflected on both the task statistics and S3 bucket:
Conclusion
With the new version of Aurora serverless there is not limitation on setting up DMS migration tasks with Change Data Capture. It allows to take advantage of the serverless model and also the configuration is very similar to provisioned version.
Featured ones: