Self-host - Part 2 - Zero-Downtime Deployment using Docker Swarm
This blog will be the second in the three-part series (maybe more, we will see) of self-hosting. In the first part, we have explained how to start and secure your self-hosted server. This second part will address zero-downtime deployment using Docker Swarm. The third part will discuss backing up our databases without downtime.
Why zero-downtime deployment? And why should we worry about it that much?
Let's answer these questions as straightforwardly as possible.
You don't want users looking at some error page while you are deploying your application updates.
Namely, you will have updates in your application, and regardless of those updates, you don't want users to be affected. Ideally, you want them to simply switch to a more updated version when ready, and that is exactly what we are going to work on here with the help of Docker Swarm.
Although there are multiple options for zero-downtime deployment, with one of the most common called "Blue-green deployment", most of those require multiple servers to be obtained. Basically, for blue-green deployment, you need to have two servers with the same environment behind some load balancer, and once you upload your updates to one server, the load balancer should switch all traffic from one server to the other. And if something goes wrong, there should be logic for reverting to the old server (which can get quite complex and is easier said than done, especially when there are databases involved).
As we are still building an application that doesn't generate revenue yet, we don't want to pay for multiple servers. That is why we want all of that logic on one server in virtual environments, while achieving the same behavior as with blue-green deployments. That is where Docker Swarm comes into the picture, as it is quite simple to use and supports zero-downtime rolling updates out of the box.
In this article, we will explore how to automate zero-downtime deployment using Docker Swarm and shell scripts, so we don't have to pay for already existing cloud solutions, at least in the beginning until our super application starts generating meaningful revenue.
The goal here is to allow only certain machines and users to deploy application updates. As the project grows, we can allow new contributors to only commit to git, and when we decide (once a week, or once every two weeks, for example), we deploy all of that code from our machine that is allowed to deploy. This gives us control over what is live on the server and what is still in the making. Remember, we implemented 2FA in the previous article, and we will work with it in this article too. The end goal that we want to achieve: type the command ./deploy.sh from your machine, enter the 2FA code, and once it is done, new changes should just magically appear in production.
To achieve the goal of automatic deployment, we will turn to shell scripting and we will have the following steps that we need to complete:
- Initiate Docker Swarm on the server
- Read credentials file and clone repositories from GitHub
- Write down versions for current deployment in a txt file that will be synced with the remote server
- Build all containers in our local machine and save them with proper versions
- Transfer built containers from our local machine to the remote server
- Start containers on the remote server with zero-downtime deployment
- Clean up builds and versions on both the remote server and the local machine
We will tackle each of these steps in the following text, but at this moment, let's get familiar with what the file structure will look like:
- deploy
  - input-data
    - credentials.txt
    - repositories.txt
    - repo-list.txt
    - envs.zip
  - repositories (will be populated once a script is run)
  - scripts
    - build_containers.sh
    - cleanup.sh
    - clone_repos.sh
    - start_containers.sh
    - transfer_containers.sh
    - write_versions.sh
  - output (will be populated once a script is run)
  - deploy.sh
- docker-compose.prod.yaml
Let's break down what these files and directories represent:
- docker-compose.prod.yaml is a compose file created with Swarm in mind for production. It has a deploy key in it, and it will eventually be transferred to the remote server.
- The deploy directory contains all the logic required to automate deployment.
- input-data is the directory that contains txt files that will be loaded in shell scripts from the scripts directory. Those text files contain all the necessary information to clone the repository, connect to the remote server, and conduct deployment. Note that envs.zip is a zip archive, which adds encryption as a layer of security. Once deployment starts, the user will be prompted for input to decrypt the environment variables.
- The repositories directory is a temporary directory and will be cleaned up once deployment is completed. That directory contains all cloned repositories to read data from and build Docker images for the current deployment.
- scripts directory contains all necessary scripts to automate deployment. We will talk about each of those separately.
Now that we have moved these out of the way, we can proceed with each step where we will explain everything in a bit more detail.
Initiate Docker Swarm Cluster
Okay, so first things first, you need to make sure you have Docker installed, and once that is completed, we need to initialize the Docker Swarm cluster on our remote server. The command to do it is quite straightforward:
sudo docker swarm init
You will be asked for the sudo password (remember from part 1, we don't allow usage of docker without sudo). Once you have done that, you will get a message like this:
Swarm initialized: current node ({**node id**}) is now a manager.
To add a worker to this swarm, run the following command:
    docker swarm join --token {**some token**} {**server IP**}:{**some port**}
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
You can copy this text and save it somewhere safe in case we ever need to add more remote servers to the cluster (the join command can also be printed again later with docker swarm join-token worker). But for now, we don't need that. This node is the manager, which means this server decides how many instances/replicas run on worker nodes (nodes that simply carry out the manager's instructions). As we only have one server, the manager manages itself.
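As a hedged sketch, the helpers below check whether Swarm is initialized on the server and print the worker join command again. The function names are made up for illustration; the docker subcommands themselves are standard Docker CLI, run with sudo as set up in part 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; the docker subcommands themselves are standard.

# Prints the Swarm state of this node, for example "active" or "inactive"
swarm_state() {
  sudo docker info --format '{{.Swarm.LocalNodeState}}'
}

# Prints the full "docker swarm join ..." command for new worker nodes
worker_join_command() {
  sudo docker swarm join-token worker
}

# Succeeds only when the given state string means Swarm is initialized
is_swarm_active() {
  [ "$1" = "active" ]
}

# Usage on the server:
#   is_swarm_active "$(swarm_state)" && worker_join_command
```

Nothing runs until the functions are called, so the file can be sourced safely on a machine without Docker.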
As we have initialized the Swarm node in our remote server, we can now proceed with deployment, and primarily, you should start thinking about what services you have in docker-compose, how many replicas you need, what is the strategy for those replicas (for example, if something goes wrong when updating version). These questions and answers vary based on the project and separate requirements. I will show you my example of Rust service, for development and production environments.
This is an example of a Rust service for development and doesn't have any deployment strategy:
api:
  build:
    context: ../api
  ports:
    - 3001:3001
  volumes:
    - ../api/migrations:/usr/api/migrations
    - ../api/src:/usr/api/src
  env_file:
    - ../api/.env
  depends_on:
    db: # Postgres service named "db"
      condition: service_healthy
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:3001/api/health"]
    interval: 10s
    timeout: 10s
    retries: 5
    start_period: 30s
Now here is the version for production, with a deployment section for Docker Swarm:
api:
  image: "api:0.0.${API_VERSION}"
  restart: always
  env_file:
    - ./envs/api.env
  healthcheck:
    test: ["CMD", "/health_check"]
    interval: 10s
    timeout: 10s
    retries: 5
    start_period: 5s
  deploy:
    mode: replicated
    replicas: 2
    update_config:
      parallelism: 2
      order: start-first
      failure_action: rollback
  networks:
    - pg-network
    - main-network
As you can see, we now have a deploy config which determines how we want our service to behave in production. The mode is replicated and replicas is set to 2, which means that we want 2 running replicas of our API service at any given moment. We have also added a strategy for updating replicas to a newer version. With parallelism: 2, two instances can be updated at the same time. With order: start-first, the new version starts before the old one is stopped, so both versions run side by side until the new one is considered healthy, and only then does Swarm shut down the previous version. Finally, failure_action: rollback tells Swarm to roll back to the previous version if something goes wrong. For more information about options to suit your specific project needs, visit this page and see the configs that might suit you.
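To see the update strategy at work, a rollout can be watched and, if needed, reverted by hand. This is a minimal sketch, assuming the stack name demo_stack used later in this article and the service key api from the compose file; Swarm names a stack service as the stack name, an underscore, and the service key. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Swarm names a stack service "<stack>_<service>"
stack_service_name() {
  printf '%s_%s\n' "$1" "$2"
}

# Lists the tasks (replicas) of a service with their current state
show_rollout() {
  sudo docker service ps "$(stack_service_name "$1" "$2")"
}

# Manually reverts a service to its previous definition
rollback_service() {
  sudo docker service rollback "$(stack_service_name "$1" "$2")"
}

# Usage on the server:
#   show_rollout demo_stack api
#   rollback_service demo_stack api
```

With failure_action: rollback in place the manual rollback should rarely be needed, but it is useful when a version is healthy from Swarm's point of view and still wrong from yours.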
Note the ${API_VERSION} part in the image key. We will discuss that part in detail in the section about transferring containers to remote servers.
Note also that there is no depends_on key in Swarm mode because it is not supported by the Swarm cluster. It is considered that the db service will always be up and running in the cluster.
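Since Swarm does not order service startup, one common workaround is to let the dependent service retry until its dependency answers. The following is a sketch only and not part of the original setup: retry is a hypothetical helper, it assumes the image contains bash, and pg_isready (shipped with the Postgres client tools) appears only in the usage comment as an assumption about what is installed.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the last failed attempt
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage in a container entrypoint (assumes pg_isready is installed):
#   retry 30 2 pg_isready -h db -p 5432
```

An application that reconnects to the database on its own makes this unnecessary; the health check and Swarm restarts then cover the startup window.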
Now that we have clarified the difference between the Swarm cluster and pure docker-compose, we can proceed with the actual deployment script.
Deploy.sh
Let's begin with the actual deployment script, its steps, and flow. Contents of the deploy.sh file are:
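#!/usr/bin/env bash
# Stop at the first failing step so a broken build is never deployed
set -e
# Run from the directory of this script, because the scripts below
# resolve their paths from $(pwd)
cd "$(dirname "$0")"
script_dir="$(pwd)"
# Clone repositories
"$script_dir/scripts/clone_repos.sh"
# Write new versions
"$script_dir/scripts/write_versions.sh"
# Build docker images
"$script_dir/scripts/build_containers.sh"
# Transfer images, env files, and the compose file to the remote server
"$script_dir/scripts/transfer_containers.sh"
# Load images and deploy the stack on the remote server
"$script_dir/scripts/start_containers.sh"
# Clean up the local machine
"$script_dir/scripts/cleanup.sh"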
Now, let's go step by step through this script and explain what all of these scripts have as their content and purpose.
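Before the first step, it can help to confirm that the tools the scripts call are installed on the local machine. The sketch below is not part of the original scripts; missing_tools is a hypothetical helper, and the tool list in the usage comment is taken from the commands used throughout this article.

```shell
#!/usr/bin/env bash
# Prints the name of every given tool that is not found on PATH
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Usage at the top of deploy.sh:
#   missing=$(missing_tools git docker rsync ssh unzip)
#   if [ -n "$missing" ]; then
#     echo "Missing tools:" $missing
#     exit 1
#   fi
```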
Cloning repositories
Command from deploy.sh:
"$script_dir/scripts/clone_repos.sh"
The purpose of this script is to clone all repositories listed in input-data txt files and make them available in repositories directory. Contents of this script are the following:
#!/usr/bin/env bash
# Get the directory of the current script
script_dir=$(dirname "$0")
# Load repository URLs and the list of variable names that hold them
source "$script_dir/../input-data/repositories.txt"
source "$script_dir/../input-data/repo-list.txt"
# Start from an empty repositories directory
if [ -d "$script_dir/../repositories" ]; then
  rm -rf "$script_dir/../repositories"
  echo "All files in repositories have been removed."
fi
mkdir -p "$script_dir/../repositories"
# Loop through each variable name
for var in "${repositories[@]}"; do
  # Indirect expansion: get the value of the variable named in $var
  value="${!var}"
  REPO_NAME=$(basename -s .git "$value")
  git clone "$value" "$script_dir/../repositories/$REPO_NAME"
done
Notice these two lines in clone_repos.sh:
source "$script_dir/../input-data/repositories.txt"
source "$script_dir/../input-data/repo-list.txt"
These lines represent the txt files that we will source as our variables. Contents of these files are:
repositories.txt
API_REPO={URL to your repository}
repo-list.txt
repositories=("API_REPO")
URLs from repositories.txt are used to clone those repositories to the repositories directory. Note that you should either have SSH access configured, or use OAuth2 and include the token in the URL, so that cloning works without a username/password prompt.
repo-list.txt basically follows repositories.txt contents. It is made solely for convenience to loop through repositories while cloning them in the repositories directory.
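The interplay of the two files relies on bash indirect expansion, which can be tried in isolation. A minimal sketch: the two assignments stand in for the sourced txt files, the URL is a made-up placeholder, and repo_dir_names is a hypothetical helper that prints the directory name instead of cloning.

```shell
#!/usr/bin/env bash
# Stand-ins for repositories.txt and repo-list.txt; the URL is made up
API_REPO="git@github.com:example/api.git"
repositories=("API_REPO")

# Prints the directory name each repository would be cloned into
repo_dir_names() {
  local var value
  for var in "${repositories[@]}"; do
    # ${!var} expands to the value of the variable whose name is stored in var
    value="${!var}"
    basename "$value" .git
  done
}

repo_dir_names   # prints: api
```

Adding a second repository then means one new line in repositories.txt and one new name in the array in repo-list.txt.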
Now that we have all repositories cloned, we can proceed to the next step.
Writing versions
Command from deploy.sh:
"$script_dir/scripts/write_versions.sh"
The aim of this script is straightforward. Go into each cloned repository, get its version, and write it down in the output/versions.txt file (which will be used on the remote server). The version is determined by the number of commits: the patch version of the built Docker image is set to the total number of commits on the main branch of the git repository. That way, we have an automatic versioning system that follows the latest code change in git.
Contents of write_versions.sh are the following:
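The commit-count idea can be sketched on its own as a small function. This is an illustration under assumptions, not the author's script: image_tag_for is a hypothetical helper, and the branch defaults to main as in the article.

```shell
#!/usr/bin/env bash
# Prints an image tag such as "api:0.0.42" for the repository in DIR.
# image_tag_for is a hypothetical helper; BRANCH defaults to main.
image_tag_for() {
  local name="$1" dir="$2" branch="${3:-main}"
  local count
  # Total number of commits reachable from the branch
  count=$(git -C "$dir" rev-list --count "$branch") || return 1
  echo "$name:0.0.$count"
}

# Usage:
#   image_tag_for api ./repositories/api
```

One property worth knowing: the count only grows as long as history on the branch is not rewritten, so force-pushes to main can make a newer build get a lower number.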
script_dir="$(pwd)/scripts"
repositories_path="$script_dir/../repositories"
output_file="$script_dir/../output/versions.txt"
if [ -f "$output_file" ]; then
  # Remove the file
  rm "$output_file"
fi
# We are going to a directory that is cloned from the previous script
cd "$repositories_path/api"
# This command gets us the total number of commits for the main
# branch
API_VERSION=$(git rev-list --count main)
echo "API_VERSION: $API_VERSION"
# Make output directory and write versions.txt file
mkdir -p "$script_dir/../output"
touch "$output_file"
# Write API version to a txt file
echo "API_VERSION=$API_VERSION" >> "$output_file"
# Version COMPOSE file with a millisecond timestamp
# (%N requires GNU date, which is the default on Linux)
echo "COMPOSE_VERSION=$(($(date +%s%N)/1000000))" >> "$output_file"
echo "SWARM_STACK_NAME=demo_stack" >> "$output_file"
# Extract env files into the output directory
unzip "$script_dir/../input-data/envs.zip" -d "$script_dir/../output/"
From the code above, we can see that, apart from writing the API version to the versions.txt file, we are also writing the compose file version and the Swarm stack name, as well as extracting all env variables to the output directory (for easier sync with the remote server). We version the compose file so that, whenever we change something in it, we deploy to the Swarm cluster with the latest version. With a static file name, we could accidentally deploy with a previous compose version left on the server, which we are trying to avoid. The Swarm stack name is something we should have control over: if we ever decide to change it, we should be able to do that from the deployment script.
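The timestamp used for the compose version relies on the %N format of GNU date, which other date implementations (for example the one on macOS) do not support. If the deploy machine is not Linux, a fallback along these lines may help; timestamp_ms is a hypothetical helper and the fallback simply drops to second precision.

```shell
#!/usr/bin/env bash
# Prints the current Unix time in milliseconds.
# GNU date supports %N (nanoseconds); other date implementations may print
# a literal "N" or "%N", so fall back to second precision in that case.
timestamp_ms() {
  local ns
  ns=$(date +%s%N)
  case "$ns" in
    *[!0-9]*) echo "$(date +%s)000" ;;
    *) echo "$((ns / 1000000))" ;;
  esac
}

timestamp_ms
```

Either branch produces a 13-digit number like the one in the compose file name, so the rest of the flow does not need to change.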
Now that we have the latest versions of our code and compose file written down, we can proceed with the actual build of our Docker images.
Building Docker images
Command from deploy.sh:
"$script_dir/scripts/build_containers.sh"
This script only builds images based on the specified repository and saves them to the output/images directory as .tar file. The goal of saving these images in the specified output directory is to transfer those as .tar files to our remote server and load them from there. Take a look at saving images and loading images with Docker. The contents of this script are the following for our API image and repository:
script_dir="$(pwd)/scripts"
source "$script_dir/../output/versions.txt"
repositories_path="$script_dir/../repositories"
# API
# The directory name must match the name of the cloned repository
cd "$repositories_path/api"
API_IMAGE="api:0.0.$API_VERSION"
docker build -f Dockerfile.prod -t "$API_IMAGE" .
cd "$script_dir"
images_path="$(pwd)/../output/images"
# Start from an empty images directory
if [ -d "$images_path" ]; then
  rm -f "$images_path"/*
  echo "All files in $images_path have been removed."
else
  mkdir -p "$images_path"
fi
# Save the image as a .tar file that can be loaded on the server
docker save -o "$images_path/api.tar" "$API_IMAGE"
If you have more images to build, just repeat the code from the # API line. Also, note that we are using $API_VERSION to version our Docker image in the script. That way we will be able to instruct our compose file to use the proper image version, once loaded on the server.
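Instead of repeating the block for every image, the build and save steps can be wrapped in a function. A minimal sketch under assumptions: build_and_save is a hypothetical helper, every repository is assumed to have a Dockerfile.prod at its root, and the web service and WEB_VERSION in the usage comment are invented for illustration.

```shell
#!/usr/bin/env bash
# Builds one image from CONTEXT_DIR and saves it as OUT_DIR/NAME.tar.
# build_and_save is a hypothetical helper; DOCKER can be overridden
# (for example with "echo") to preview the commands without running them.
build_and_save() {
  local name="$1" version="$2" context_dir="$3" out_dir="$4"
  local image="$name:0.0.$version"
  "${DOCKER:-docker}" build -f "$context_dir/Dockerfile.prod" -t "$image" "$context_dir" || return 1
  "${DOCKER:-docker}" save -o "$out_dir/$name.tar" "$image"
}

# Usage, assuming versions.txt also defines a hypothetical WEB_VERSION:
#   build_and_save api "$API_VERSION" "$repositories_path/api" "$images_path"
#   build_and_save web "$WEB_VERSION" "$repositories_path/web" "$images_path"
```

Running it once with DOCKER=echo is a cheap way to check paths and tags before a real build.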
Transfer the output directory to the remote server
Command from deploy.sh:
"$script_dir/scripts/transfer_containers.sh"
You might be wondering why we are transferring this image (or images if you have more applications) to a remote server and not to some container registry like DockerHub or the Digital Ocean registry. The answer is simple and quite expected, and it's... you guessed it... because of costs. Now one might argue that container registries are quite cheap and there are free tiers available, and yes, that is correct, but since we are at the beginning of our application, we want to have full control and keep services as cheap as possible. By transferring images to a remote server and loading them there, we skip container registries altogether, along with the need to pay for them (or to worry about surpassing the free tier). This way, we control everything, and I would always advocate this approach if you are using only one server with a small team and a not-yet-profitable application.
Contents of this script are the following:
script_dir="$(pwd)/scripts"
source "$script_dir/../input-data/credentials.txt"
output_folder="$script_dir/../output"
input_folder="$script_dir/../input-data"
source "$output_folder/versions.txt"
# Give the compose file a unique, versioned name
cp "$script_dir/../../docker-compose.prod.yaml" "$output_folder/docker-compose.$COMPOSE_VERSION.yaml"
# Sync the contents of output/ and credentials.txt to /app on the server
rsync -arvzP --rsh="ssh -p $remote_port" "$output_folder/" "$input_folder/credentials.txt" "$remote_username@$remote_ip:/app"
Note that in this script, we are using the previously written versions.txt file from the output directory to read $COMPOSE_VERSION. That way, the compose file we are about to deploy gets a unique name on the server.
We are also using the rsync command to transfer everything in the output directory to /app on the server, as well as credentials.txt (because that file contains the sudo password, so we can run commands on the remote server without being prompted for it; remember, we need sudo to run Docker commands on the remote server). For more information about the rsync command and flags used visit this page. Note that our output directory now has the following contents:
- envs
  - api.env
- images
  - api.tar
- docker-compose.1721118119140.yaml (where the number is version/timestamp)
- versions.txt
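The scripts source credentials.txt, but its contents are not listed in this article. Judging only by the variables the scripts read (remote_username, remote_ip, remote_port, and pass), it plausibly looks like the commented sample below; all values are placeholders. The check_credentials helper is hypothetical and simply verifies that each of those variables is set.

```shell
#!/usr/bin/env bash
# Sample credentials.txt (placeholder values):
#   remote_username=deployer
#   remote_ip=203.0.113.10
#   remote_port=22
#   pass='sudo-password-here'

# Succeeds when FILE defines every variable the deployment scripts read.
# The body runs in a subshell so the sourced values do not leak out.
check_credentials() (
  source "$1" || exit 1
  for name in remote_username remote_ip remote_port pass; do
    if [ -z "${!name}" ]; then
      echo "Missing $name in $1" >&2
      exit 1
    fi
  done
)

# Usage:
#   check_credentials ./input-data/credentials.txt || exit 1
```

Because the file holds the sudo password in plain text, restricting it to your own user (chmod 600) and keeping it out of git is a sensible precaution.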
If you have 2FA set on the remote server (as we do), remember that at this moment you will be prompted to enter your one-time code from the application.
Okay, so we have transferred everything that we need to the server. It is time to connect to the server and deploy a new version of our image.
Start containers in production
Command from deploy.sh:
"$script_dir/scripts/start_containers.sh"
The goal of this script is to connect to a remote server and execute commands to load all Docker images on the server and deploy them to the Swarm cluster. Contents of this script are as follows:
script_dir="$(pwd)/scripts"
# Load all variables from credentials.txt
source "$script_dir/../input-data/credentials.txt"
# Connect to the remote server and run the bash script defined
# below
ssh "$remote_username@$remote_ip" -p "$remote_port" "bash -s" << 'ENDSSH'
# From here, the server bash script begins
# Go to /app directory on the server
cd /app
# Load variables that were previously transferred to the server
source "./versions.txt"
source "./credentials.txt"
# Define a path to the images directory
IMAGE_DIR="/app/images"
# Loop through each .tar file in the directory
for tar_file in "$IMAGE_DIR"/*.tar; do
  if [ -f "$tar_file" ]; then
    echo "Loading image from $tar_file..."
    # Load the .tar file as an image on the server
    echo "$pass" | sudo -S docker load -i "$tar_file"
  else
    echo "No .tar files found in $IMAGE_DIR"
  fi
done
echo "All images have been loaded."
# Define API_VERSION to be used in docker-compose
export API_VERSION="${API_VERSION}"
# This is where we deploy all versions to Swarm cluster
echo "$pass" | sudo -S -E docker stack deploy --prune --detach=false -c "docker-compose.$COMPOSE_VERSION.yaml" "$SWARM_STACK_NAME"
# Removing all the files from the /app directory
rm -f docker-compose.*.yaml
rm -rf "/app/images"
rm -rf "/app/envs"
rm "/app/credentials.txt"
rm "/app/versions.txt"
# Wait to consolidate deployment before continuing
sleep 60
# Prune all that is not used (previous versions of images and
# volumes) so we clean after our deployment and do not bloat
# server with unused image versions and volumes
echo "$pass" | sudo -S -E docker system prune -f
echo "$pass" | sudo -S -E docker image prune -a -f
echo "$pass" | sudo -S -E docker volume prune -a -f
# End executing remote server script
ENDSSH
Note that every echo "$pass" | in the script automates typing the sudo password: sudo -S reads the password from standard input before executing the docker command. The $pass variable is sourced from credentials.txt.
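The quotes around the ENDSSH delimiter matter: they decide whether variables such as $pass are expanded on the local machine or by the shell that receives the script. The sketch below shows the difference locally, with a plain bash -s standing in for the ssh call; the variable values are made up.

```shell
#!/usr/bin/env bash
# Local stand-in for the ssh call: "bash -s" reads the script from stdin
pass="local-value"

# Quoted delimiter: the body is sent as-is, so $pass is expanded by the
# receiving shell, where it is set by the script itself
quoted=$(bash -s <<'EOF'
pass="remote-value"
echo "$pass"
EOF
)

# Unquoted delimiter: $pass is expanded locally before the body is sent
unquoted=$(bash -s <<EOF
echo "$pass"
EOF
)

echo "quoted:   $quoted"     # prints: quoted:   remote-value
echo "unquoted: $unquoted"   # prints: unquoted: local-value
```

In the deployment script the quoted form is the one we want, because versions.txt and credentials.txt are sourced on the server and their values must be read there.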
As you can see, the goal is to simply execute the command to deploy everything to the Swarm cluster by reading our current version of the compose file. Think of that compose file as instructions for the Swarm cluster to re-deploy whatever has changed. Note that the command export API_VERSION="${API_VERSION}" introduces an environment variable that is substituted into the compose file before Swarm reads it. That variable works in conjunction with this line in the compose file: image: "api:0.0.${API_VERSION}". That way we will always read the latest written version and deploy it to the Swarm stack.
We are using --detach=false because we want to block our current terminal until deployment is completed. Note that we have removed all files from the /app directory. After cleaning, it is literally empty, including envs. You might wonder why. The environment variables from the env files are read once, when docker stack deploy runs, and are stored as part of the service definition inside the Swarm cluster, so they do not need to stay in a file on the server. Therefore, as soon as we deploy to the Swarm stack, we are free to delete those files (yes, even if Swarm needs to restart containers in production or start more of them, the env variables remain in place and do not need to be written in a file).
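To check that a running service still carries its environment after the files are deleted, the service definition can be inspected. A sketch, assuming the stack name demo_stack and the service key api from this article; service_env is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Prints the environment stored in a stack service's definition as JSON.
# Be careful where this runs: the output contains the secret values.
service_env() {
  local stack="$1" service="$2"
  sudo docker service inspect \
    --format '{{json .Spec.TaskTemplate.ContainerSpec.Env}}' \
    "${stack}_${service}"
}

# Usage on the server:
#   service_env demo_stack api
```

This also shows that anyone who can run docker commands on the manager can read those values, which is one more reason to keep Docker behind sudo.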
Now that deployment is successful, and we see changes in production, it is time to run one last script before wrapping up the deployment process.
Clean everything from the current deployment
Command from deploy.sh:
"$script_dir/scripts/cleanup.sh"
This is the last command that we are running in the deployment process, and it is one of the simpler ones. Contents of this script are as follows:
script_dir="$(pwd)/scripts"
rm -rf "$script_dir/../output"
rm -rf "$script_dir/../repositories"
docker volume prune -a -f
docker image prune -a -f
We are removing the output and repositories directories that were generated by the current deployment. Afterward, we prune Docker images and volumes from our local machine (so we don't bloat our disk across multiple deployments). Note that the -a flag prunes everything that is unused, but you can specify exactly which images and volumes you want to remove based on your current project setup.
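As one example of a more selective cleanup, old api tags can be removed while the version that was just built is kept. This is a sketch under assumptions: drop_current is a hypothetical helper, API_VERSION is assumed to still be available in the shell, and xargs -r (skip the command on empty input) is assumed to be supported, as it is with GNU xargs.

```shell
#!/usr/bin/env bash
# Reads image references from stdin and prints all except the one to keep.
# drop_current is a hypothetical helper name.
drop_current() {
  # -x matches whole lines only, -F treats the reference as a fixed string
  grep -v -x -F -e "$1" || true
}

# Usage, assuming API_VERSION holds the version that was just deployed:
#   docker images --format '{{.Repository}}:{{.Tag}}' api \
#     | drop_current "api:0.0.$API_VERSION" \
#     | xargs -r docker rmi
```

Unlike a blanket prune with -a, this leaves unrelated images on the machine untouched.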
After cleaning up, we have completed our automated deployment process. Now, if we type ./deploy.sh from the deploy directory, it should all work like a charm (well, it should, at least on my Ubuntu 22.04 machine).
Wrapping up
We have successfully deployed our API application using the Docker Swarm cluster and shell scripting on our remote server. The main benefit of this approach, and the reason I am advocating it, is cost: we offload everything to our local machine, so there is no need to pay for registries and CI/CD Actions (or to worry about surpassing a free tier), and we can use our machine however and for as long as we want. Another big benefit is that we are in full control of who can deploy and who can contribute. So, if we start adding people to our team when our application starts generating revenue, we can decide that our machine is the only one that can connect to the remote server and complete deployment, while others can only contribute to Git and write code. We can also allow multiple machines to deploy. It is up to us, and that is the main point: we want to decide, and we don't want to pay for every decision. Once our application starts to scale to multiple containers, we can think about more complex tools like Kubernetes, or paying for a container registry. For starters, this is enough. Let's not over-engineer it (more shell scripting, please!) or overpay without practical necessity.
As mentioned at the beginning of this article, stay tuned for the third and final part (maybe) of this series, where we will tackle both Postgres and MySQL databases' automatic backing up to our local hard drive, and restoring backed-up data (more shell scripting, of course!).
Let me know in the comments if I should share the GitHub repository of the code presented here, or if there is something that is not properly explained and clarified. Your feedback means a lot.