Starting a YugabyteDB lab cluster with AWS CLI

Published at 11/11/2024
Categories: yugabytedb, aws, ec2, multiregion
Author: franckpachot

Here are some commands to start a YugabyteDB cluster across multiple AWS regions using the AWS CLI. Please note that this setup is for lab purposes only and has no security measures: all ports are open to any address, and data is transmitted over the public internet without encryption. I use this configuration solely for quick tests.

I set some environment variables, the most important being zones, which lists the availability zones where I start one node each.

export AWS_PAGER=""       # do not send AWS CLI output to a pager
KEY_NAME=lab.pub          # name of the key pair imported into EC2
INSTANCE_TYPE=m7i.large
VOLUMESIZE=500            # root volume size in GiB

zones="us-east-1a us-east-1b us-east-1c"

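The loops below derive the region from each zone name with ${zone%?}, which strips the last character. Here is a small sketch of that expansion, assuming standard availability zone names that end with a single letter (it does not hold for Local Zone names such as us-west-2-lax-1a). region_of_zone and regions_of_zones are hypothetical helper names:

```shell
#!/usr/bin/env bash
# ${zone%?} removes the shortest suffix matching "?" (any single character),
# so a standard availability zone name becomes its region name.
region_of_zone() {
  echo "${1%?}"
}

# Distinct regions of a space-separated zone list:
# zones in the same region collapse into one line.
regions_of_zones() {
  local zone
  for zone in $1
  do
    region_of_zone "$zone"
  done | sort -u
}

zones="us-east-1a us-east-1b us-east-1c"
for zone in $zones
do
  echo "$zone -> $(region_of_zone "$zone")"
done
regions_of_zones "$zones"
```

With these three zones, every zone maps to us-east-1, so the per-region key pair and security group are created only on the first iteration.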

For each zone, I import my SSH public key and create a security group in the zone's region, then launch an instance using the environment variables defined above.

for zone in $zones
do
 export AWS_REGION=${zone%?}
 ZONE=${zone}
 # Import the SSH public key (fileb:// passes the file as raw bytes)
 aws ec2 import-key-pair --key-name $KEY_NAME --public-key-material fileb://~/.ssh/id_rsa.pub
 # Security group open to all traffic (put your network instead of 0.0.0.0/0)
 aws ec2 create-security-group \
 --group-name lab-public \
 --description "Security group that allows all traffic"
 aws ec2 authorize-security-group-ingress \
    --protocol -1 \
    --port all \
    --cidr 0.0.0.0/0 \
    --group-id $(
  aws ec2 describe-security-groups \
   --filters "Name=group-name,Values=lab-public" \
   --query "SecurityGroups[*].[GroupId]" \
   --output text
  )
 # run an instance
 aws ec2 run-instances \
 --count 1 \
 --instance-type $INSTANCE_TYPE \
 --key-name $KEY_NAME \
 --associate-public-ip-address \
 --placement "AvailabilityZone=${ZONE}" \
 --block-device-mappings "DeviceName=/dev/sda1,Ebs={VolumeSize=${VOLUMESIZE}}" \
 --image-id $(
 aws ec2 describe-images --owners 'aws-marketplace' \
  --filters "Name=name,Values=AlmaLinux OS 8*" "Name=architecture,Values=x86_64" \
  --query "Images | sort_by(@, &CreationDate) | [-1].[ImageId]" \
  --output text | tee /dev/stderr
  ) \
 --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=yb},{Key=Environment,Value=lab}]' \
 --security-group-ids $(aws ec2 describe-security-groups \
  --filters "Name=group-name,Values=lab-public" \
  --query "SecurityGroups[*].[GroupId]" \
  --output text | tee /dev/stderr ) \
  --user-data '#!/bin/bash
  sudo dnf update -y
  ' \
  --output text
done


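You can ignore the errors about the key pair, security group, and ingress rule that already exist. They are defined per region, so when several zones are in the same region (or after a previous run), only the first iteration creates them: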

An error occurred (InvalidKeyPair.Duplicate) when calling the ImportKeyPair operation: The keypair already exists
An error occurred (InvalidGroup.Duplicate) when calling the CreateSecurityGroup operation: The security group 'lab-public' already exists for VPC
An error occurred (InvalidPermission.Duplicate) when calling the AuthorizeSecurityGroupIngress operation: the specified rule "peer: 0.0.0.0/0, ALL, ALLOW" already exists
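If you prefer a rerunnable loop without this noise, a minimal sketch is to wrap the create commands in a hypothetical ignore_duplicate helper. It assumes that "already exists" errors always carry an error code ending in .Duplicate, as the three messages above do; any other failure is still printed and its exit status returned:

```shell
#!/usr/bin/env bash
# Run a command, treating "already exists" AWS errors as success.
# Assumption: such errors mention an error code ending in ".Duplicate".
ignore_duplicate() {
  local errfile status=0
  errfile=$(mktemp)
  "$@" 2>"$errfile" || status=$?   # keep stdout, capture stderr
  if [ "$status" -ne 0 ] && ! grep -q '\.Duplicate)' "$errfile"; then
    cat "$errfile" >&2             # unexpected error: show it and keep the status
  else
    status=0                       # success, or a resource that already exists
  fi
  rm -f "$errfile"
  return "$status"
}
```

For example: ignore_duplicate aws ec2 create-security-group --group-name lab-public --description "Security group that allows all traffic". The check looks at the error text rather than the exit status because the AWS CLI typically returns the same non-zero status (254 in AWS CLI v2) for any error returned by the service.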

I use the tag Environment=lab to identify the instances. I list the running instances with this tag in each zone's region and turn each public DNS name into the URL of the UI on port 15433.

# describe
for zone in $zones
do
 export AWS_REGION=${zone%?}
 aws ec2 describe-instances \
 --filters "Name=tag:Environment,Values=lab" "Name=instance-state-name,Values=running" \
 --query "Reservations[*].Instances[*].[Tags[?Key=='Name'].Value|[0] , InstanceId, State.Name, Placement.AvailabilityZone, PublicDnsName ]" \
 --output text | awk '{$NF="http://"$NF":15433";print}'
done | sort -u

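The awk at the end of the describe loop replaces the last field of each line, the public DNS name, with the URL of the UI. Here it is applied to a made-up line shaped like the text output of that query (the instance ID and host name are placeholders; to_ui_url is a hypothetical wrapper name):

```shell
#!/usr/bin/env bash
# Same awk program as in the describe loop: rewrite the last field
# (PublicDnsName) as the yugabyted UI URL on port 15433.
to_ui_url() {
  awk '{$NF="http://"$NF":15433";print}'
}

# Fields: Name, InstanceId, State, AvailabilityZone, PublicDnsName
printf 'yb\ti-0123456789abcdef0\trunning\tus-east-1a\tec2-203-0-113-10.compute-1.amazonaws.com\n' | to_ui_url
```

Assigning to $NF makes awk rebuild the line with its default output separator, so the tabs from the AWS CLI text output become single spaces.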


For this lab, I set zones differently to start a multi-region cluster (zones="us-east-1a ap-northeast-1a ap-southeast-5a"). The listing also revealed some instances that I had started earlier and forgot to terminate. Don't make the same mistake if you care about your cloud credits; I show how to terminate them at the end.

At this point, YugabyteDB is not running yet. Here is how I install and start it on all nodes.

# ssh to install and start YugabyteDB (uses the SSH key)
join="" # will be set to join the previous node to attach to the cluster
for zone in $zones
do
 export AWS_REGION=${zone%?}
 for host in $(
 aws ec2 describe-instances \
 --filters "Name=tag:Environment,Values=lab" "Name=availability-zone,Values=${zone}" "Name=instance-state-name,Values=running" \
 --query "Reservations[*].Instances[*].[ PublicDnsName ]" \
 --output text | tee /dev/stderr
 ) ; do ssh -o StrictHostKeyChecking=no ec2-user@$host '
# install python and YugabyteDB
cd ~
sudo dnf install -y python39
[ -f ~/yugabyte/bin/yugabyted ] || {
# from https://download.yugabyte.com/#linux
curl -Ls https://downloads.yugabyte.com/releases/2.23.1.0/yugabyte-2.23.1.0-b220-linux-x86_64.tar.gz | tar xzvf -
ln -s $(ls -d yugabyte-*) yugabyte
yugabyte/bin/post_install.sh
}
# Set up Amazon Time Sync: chrony, PTP hardware clock, and ClockBound
sudo yum install -y chrony
sudo bash yugabyte/bin/configure_ptp.sh
sudo bash yugabyte/bin/configure_clockbound.sh
# find placement info from EC2 metadata
HOST=$(sudo cloud-init query ds.meta-data.public-hostname)
ZONE=$(sudo cloud-init query ds.meta-data.placement.availability-zone)
REGION=$(sudo cloud-init query ds.meta-data.placement.region)
CLOUD=$(sudo cloud-init query ds.meta-data.services.partition)
# start YugabyteDB
cd ~/yugabyte
set -x
./bin/yugabyted start --enhance_time_sync_via_clockbound --enable_pg_parity_early_access --advertise_address=$HOST --cloud_location=$CLOUD.$REGION.$ZONE ' "$join" '
./bin/yugabyted status
./bin/yugabyted connect ysql <<<"select version() ; select host, cloud, region, zone from yb_servers()"
'
  # this host can be used for the next to join to
  join="--join=$host"
 done
done


I have pinned the YugabyteDB version I want to use. The script downloads the binaries, unless they are already installed, and starts each node with yugabyted, adding --join with the previously started node for every node after the first one.
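To see what the start loop does without connecting anywhere, here is a dry-run sketch. yb_download_url rebuilds the release URL used above from a version and a build number, and plan_start prints the yugabyted arguments each host would get. Both names are hypothetical, the host names are placeholders, and the other flags of the real command (placement, ClockBound, PostgreSQL parity) are left out:

```shell
#!/usr/bin/env bash
# Dry run: nothing is downloaded or started.
YB_VERSION=2.23.1.0   # release used in this lab
YB_BUILD=b220         # build number of that release

# Same URL pattern as the tarball downloaded in the start loop
yb_download_url() {
  echo "https://downloads.yugabyte.com/releases/$1/yugabyte-$1-$2-linux-x86_64.tar.gz"
}

# The first host starts alone; every next host joins the one started before it
plan_start() {
  local join="" host
  for host in "$@"
  do
    echo "$host: yugabyted start --advertise_address=$host${join:+ $join}"
    join="--join=$host"
  done
}

echo "download: $(yb_download_url "$YB_VERSION" "$YB_BUILD")"
plan_start node1 node2 node3
```

Joining the previous node rather than always the first one is a choice of the script; any node already in the cluster can serve as the --join target.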

I configured the PTP hardware clock and started YugabyteDB with --enhance_time_sync_via_clockbound. Compiling and installing ClockBound takes some time, but it gives YugabyteDB safe, precise bounds on clock error, which improves performance.

Once the nodes are started, you can access the UI on port 15433 of any node.

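To terminate the instances, make sure the zones variable is set to the zones you started them in. The loop terminates every instance tagged Environment=lab in each zone's region.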

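# terminate
for zone in $zones
do
 export AWS_REGION=${zone%?}
 # lab instances in this region that are not already terminated
 ids=$(
  aws ec2 describe-instances \
   --filters "Name=tag:Environment,Values=lab" "Name=instance-state-name,Values=pending,running,stopping,stopped" \
   --query "Reservations[*].Instances[*].InstanceId" \
   --output text --no-paginate | tee /dev/stderr
 )
 # skip when nothing is left: terminate-instances fails with an empty --instance-ids
 if [ -n "$ids" ]; then
  aws ec2 terminate-instances --instance-ids $ids --output text
 fi
done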

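Because an empty zones variable makes the terminate loop silently do nothing, leaving the instances running, a small guard can help, for example after opening a new terminal. A minimal sketch with a hypothetical require_vars helper; it uses eval for the indirect lookup, so pass it only variable names you control:

```shell
#!/usr/bin/env bash
# Return non-zero, with a message per variable, if any named variable
# is unset or empty.
require_vars() {
  local name value missing=0
  for name in "$@"
  do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Only go on with the terminate loop when zones is set
if require_vars zones; then
  echo "zones: $zones"
fi
```

The same check fits before the launch loop, for example require_vars zones KEY_NAME INSTANCE_TYPE VOLUMESIZE.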

If you change the zones or the tags, keep them consistent across the commands that start, list, and terminate the instances. These commands are meant only to start a lab quickly. On every node, you can open the UI console on port 15433 and connect to the PostgreSQL-compatible endpoint (YSQL) on port 5433.
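A small sketch of the two endpoints for one node. It assumes the default YSQL user and database, both named yugabyte, and uses a placeholder host name; ui_url and ysql_uri are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Endpoints of one lab node (default YSQL user and database assumed)
ui_url()   { echo "http://$1:15433"; }
ysql_uri() { echo "postgresql://yugabyte@$1:5433/yugabyte"; }

host=ec2-203-0-113-10.compute-1.amazonaws.com   # placeholder public DNS name
echo "UI:   $(ui_url "$host")"
echo "YSQL: $(ysql_uri "$host")"
```

With a PostgreSQL client installed locally, psql "$(ysql_uri "$host")" should connect to that node, since the lab security group leaves port 5433 open.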
