Starting a YugabyteDB lab cluster with AWS CLI

Published at
11/11/2024
Categories
yugabytedb
aws
ec2
multiregion
Author
franckpachot

Here are some command lines to start a YugabyteDB cluster across multiple AWS regions using the AWS CLI. Please note that this setup is for lab purposes only and features no security measures (all ports are open, and the data is transmitted over the public internet without encryption). I use this configuration solely for quick tests.

I set some environment variables, the most important being zones, which lists the availability zones where I will start a node.

export AWS_PAGER=""
KEY_NAME=lab.pub
INSTANCE_TYPE=m7i.large
VOLUMESIZE=500

zones="us-east-1a us-east-1b us-east-1c"


For each zone, I create an SSH key and a security group and launch an instance using the environment variables mentioned above.

for zone in $zones
do
  export AWS_REGION=${zone%?}
  ZONE=${zone}
  # Import the SSH public key
  aws ec2 import-key-pair --key-name $KEY_NAME --public-key-material "$(base64 ~/.ssh/id_rsa.pub)"
  # Security group with everything open (restrict to your own network outside a lab)
  aws ec2 create-security-group \
    --group-name lab-public \
    --description "Security group that allows all traffic"
  aws ec2 authorize-security-group-ingress \
    --protocol -1 \
    --port all \
    --cidr 0.0.0.0/0 \
    --group-id $(
      aws ec2 describe-security-groups \
        --filters "Name=group-name,Values=lab-public" \
        --query "SecurityGroups[*].[GroupId]" \
        --output text
    )
  # Run an instance
  aws ec2 run-instances \
    --count 1 \
    --instance-type $INSTANCE_TYPE \
    --key-name $KEY_NAME \
    --associate-public-ip-address \
    --placement "AvailabilityZone=${ZONE}" \
    --block-device-mappings "DeviceName=/dev/sda1,Ebs={VolumeSize=${VOLUMESIZE}}" \
    --image-id $(
      aws ec2 describe-images --owners 'aws-marketplace' \
        --filters "Name=name,Values=AlmaLinux OS 8*" "Name=architecture,Values=x86_64" \
        --query "Images | sort_by(@, &CreationDate) | [-1].[ImageId]" \
        --output text | tee /dev/stderr
    ) \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=yb},{Key=Environment,Value=lab}]' \
    --security-group-ids $(
      aws ec2 describe-security-groups \
        --filters "Name=group-name,Values=lab-public" \
        --query "SecurityGroups[*].[GroupId]" \
        --output text | tee /dev/stderr
    ) \
    --user-data '#!/bin/bash
sudo dnf update -y
' \
    --output text
done
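The export AWS_REGION=${zone%?} at the top of the loop derives the region name from the zone name: the %? parameter expansion strips the trailing zone letter. A minimal illustration:

```shell
# A zone name is the region name plus one letter;
# ${var%?} removes the final character, turning a zone into its region.
zone="ap-northeast-1a"
region=${zone%?}
echo "$region"   # ap-northeast-1
```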


On re-runs, you can ignore the errors about the already existing key pair, security group, and ingress rule:

An error occurred (InvalidKeyPair.Duplicate) when calling the ImportKeyPair operation: The keypair already exists
An error occurred (InvalidGroup.Duplicate) when calling the CreateSecurityGroup operation: The security group 'lab-public' already exists for VPC
An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 0.0.0.0/0, ALL, ALLOW" already exists
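To keep re-runs quiet, the import can also be guarded with an existence check. A minimal sketch, assuming the same KEY_NAME as above (the helper name is mine):

```shell
# Import the SSH key only when no key pair with that name exists yet.
import_key_if_missing() {
  local key_name=$1 pubkey_file=$2
  if aws ec2 describe-key-pairs --key-names "$key_name" >/dev/null 2>&1
  then
    echo "key pair $key_name already exists, skipping import"
  else
    aws ec2 import-key-pair --key-name "$key_name" \
      --public-key-material "$(base64 "$pubkey_file")"
  fi
}
```

The same pattern works for create-security-group, using describe-security-groups as the existence check.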

I use the tag Environment=lab to identify the instances. I list the instances running under this tag in each zone.

# describe
for zone in $zones
do
  export AWS_REGION=${zone%?}
  aws ec2 describe-instances \
    --filters "Name=tag:Environment,Values=lab" "Name=instance-state-name,Values=running" \
    --query "Reservations[*].Instances[*].[Tags[?Key=='Name'].Value|[0] , InstanceId, State.Name, Placement.AvailabilityZone, PublicDnsName ]" \
    --output text | awk '{$NF="http://"$NF":15433";print}'
done | sort -u



For this lab, I used a different zones environment variable to start a multi-region cluster: zones="us-east-1a ap-northeast-1a ap-southeast-5a". While listing the instances, I realized that some instances I had started before were still running because I forgot to terminate them. Don't make the same mistake if you care about your cloud credits. I will show how to terminate them at the end.
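To catch such forgotten instances, you can sweep every region rather than only the zones of the current lab. A minimal sketch, assuming the Environment=lab tag used above (the function name is mine):

```shell
# List running Environment=lab instances in every region available to the account.
list_lab_instances_everywhere() {
  for region in $(aws ec2 describe-regions --query "Regions[].RegionName" --output text)
  do
    AWS_REGION=$region aws ec2 describe-instances \
      --filters "Name=tag:Environment,Values=lab" "Name=instance-state-name,Values=running" \
      --query "Reservations[*].Instances[*].[InstanceId,Placement.AvailabilityZone]" \
      --output text
  done
}
```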

Currently, nothing is running. Here is how I start YugabyteDB on all nodes.

# ssh to install and start YugabyteDB (uses the SSH key)
join="" # will be set to join the previous node to attach to the cluster
for zone in $zones
do
 export AWS_REGION=${zone%?}
 for host in $(
 aws ec2 describe-instances \
 --filters "Name=tag:Environment,Values=lab" "Name=availability-zone,Values=${zone}" "Name=instance-state-name,Values=running" \
 --query "Reservations[*].Instances[*].[ PublicDnsName ]" \
 --output text | tee /dev/stderr
 ) ; do ssh -o StrictHostKeyChecking=no ec2-user@$host bash -c '
# install python and YugabyteDB
cd ~
sudo dnf install -y python39
[ -f ~/yugabyte/bin/yugabyted ] || {
# from https://download.yugabyte.com/#linux
curl -Ls https://downloads.yugabyte.com/releases/2.23.1.0/yugabyte-2.23.1.0-b220-linux-x86_64.tar.gz | tar xzvf -
ln -s $(ls -d yugabyte-*) yugabyte
yugabyte/bin/post_install.sh
}
# Set up time synchronization (chrony, PTP, ClockBound)
sudo yum install -y chrony
sudo bash yugabyte/bin/configure_ptp.sh
sudo bash yugabyte/bin/configure_clockbound.sh
# find placement info from EC2 metadata
HOST=$(sudo cloud-init query ds.meta-data.public-hostname)
ZONE=$(sudo cloud-init query ds.meta-data.placement.availability-zone)
REGION=$(sudo cloud-init query ds.meta-data.placement.region)
CLOUD=$(sudo cloud-init query ds.meta-data.services.partition)
# start YugabyteDB
cd ~/yugabyte
set -x
./bin/yugabyted start --enhance_time_sync_via_clockbound --enable_pg_parity_early_access --advertise_address=$HOST --cloud_location=$CLOUD.$REGION.$ZONE ' "$join" '
./bin/yugabyted status
./bin/yugabyted connect ysql <<<"select version() ; select host, cloud, region, zone from yb_servers()"
'
  # this host can be used for the next to join to
  join="--join=$host"
 done
done


I have specified the YugabyteDB version I want to use. The script downloads the binaries and starts each node with yugabyted, adding a --join flag for every node after the first.
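The download URL embeds the version and build number, so switching releases only means changing two values. A small sketch (the variable names are mine; the values are the ones used above):

```shell
# Build the release URL from its two variable parts.
YB_VERSION=2.23.1.0
YB_BUILD=b220
YB_URL="https://downloads.yugabyte.com/releases/${YB_VERSION}/yugabyte-${YB_VERSION}-${YB_BUILD}-linux-x86_64.tar.gz"
echo "$YB_URL"
```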

I configured the PTP Hardware Clock and started YugabyteDB with --enhance_time_sync_via_clockbound. Compiling and installing ClockBound takes some time, but it provides safe, precise time synchronization and improves performance.

When they are started, you can access the UI on port 15433.

To terminate the instances, ensure that the zones environment variable is set.

# terminate
for zone in $zones
do
  export AWS_REGION=${zone%?}
  aws ec2 terminate-instances --instance-ids $(
    aws ec2 describe-instances \
      --filters "Name=tag:Environment,Values=lab" \
      --query "Reservations[*].Instances[*].InstanceId" \
      --output text --no-paginate | tee /dev/stderr
  ) --output text
done


If you modify the zones or the tags, ensure consistency when listing the instances. I provided those commands to start a lab quickly. On all nodes, you can check the UI console on port 15433 and connect to the PostgreSQL endpoint on port 5433.
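For the PostgreSQL endpoint, a libpq-style URL against any node's public hostname is enough; port 5433 and the default yugabyte user and database come from the setup above (the helper name is mine):

```shell
# Build a libpq URL for the YSQL endpoint of a given host.
ysql_url() {
  echo "postgresql://yugabyte@${1}:5433/yugabyte"
}
# usage (hostname is a placeholder): psql "$(ysql_url ec2-203-0-113-10.compute-1.amazonaws.com)"
```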
