Mitigating disruption during Amazon EKS cluster upgrade with blue/green deployment

Published at: 6/27/2024
Categories: aws, eks, upgrade
Author: haintkit

Co-author @coangha21

Table of Contents

  • In-place and blue/green upgrade strategies
  • Upgrade cluster process
    • Prerequisite
    • Update manifests
    • Bootstrap new cluster
    • Re-deploy add-ons and third-party tools with compatible version
    • Re-deploy workloads
    • Verify workloads
    • DNS switchover
  • Stateful workloads migration
  • Conclusion

Introduction
Keeping your Amazon EKS cluster on a current Kubernetes version is necessary for security, performance optimization, new features, and long-term support. Amazon EKS now offers an extended support plan for older Kubernetes versions, and it comes at a significantly higher cost. An upgrade is never easy and can feel like a business continuity nightmare, so some teams are tempted to postpone the inevitable. In this blog, we walk you through our upgrade process using the blue/green deployment strategy.

We’ll demonstrate this on an EKS cluster with EC2 instances as worker nodes; the same strategy also applies to Fargate. We use the popular AWS Retail Store sample application to demonstrate the steps, and the code is available in the AWS repository. By the end of this blog, you'll have a clear understanding of what an EKS upgrade entails and how to navigate it with confidence.

In-Place vs. Blue/Green upgrade strategies
Upgrading a cluster is a balance between cost and risk. Two strategies are widely used: in-place and blue/green upgrades.

  • In-Place Upgrades: Simpler and more cost-effective. This strategy modifies your existing cluster directly. While this minimizes resource usage, it carries a risk of downtime and limits each upgrade to a single minor version. Rolling back also requires extra steps.
  • Blue/Green Upgrades: This strategy prioritizes zero downtime by creating a brand-new, upgraded cluster (the "green" environment) alongside the existing one (the "blue" environment). You can then migrate workloads individually, which also lets you upgrade across multiple versions at once. However, blue/green deployment requires running two clusters simultaneously, which can be costly and strain regional resource capacity. In addition, the new cluster has a different API endpoint and authentication configuration, so tools such as kubectl and CI/CD pipelines need to be updated.

The in-place method is ideal for cost-sensitive scenarios where downtime is less critical or where the two versions have no breaking changes. For situations that demand high availability or the ability to jump multiple versions, the blue/green strategy is the safer solution, but it is also more resource-intensive and costly. Carefully weigh your specific needs, resource constraints, and infrastructure cost to choose the upgrade method that best suits your cluster.
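The one-minor-version-at-a-time limit of in-place upgrades adds up quickly. As a small illustration (the helper name is hypothetical, not part of any AWS tool), this Python sketch lists the intermediate versions an in-place path would need to go from 1.24, our current version, to 1.29, the version we use for the new cluster later in this blog. It assumes each in-place upgrade moves exactly one minor version; a blue/green upgrade can create the 1.29 cluster directly.

```python
def in_place_upgrade_path(current: str, target: str) -> list[str]:
    """List the minor versions an in-place upgrade must pass through, one at a time."""
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major or tgt_minor <= cur_minor:
        raise ValueError("expected a newer minor version within the same major version")
    # Each entry is one separate control-plane upgrade.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    steps = in_place_upgrade_path("1.24", "1.29")
    print(" -> ".join(["1.24", *steps]))
    print(f"{len(steps)} sequential in-place upgrades")
```

Running it prints the path 1.24 -> 1.25 -> 1.26 -> 1.27 -> 1.28 -> 1.29, that is, five sequential in-place upgrades, each with its own validation and risk window.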

Upgrade cluster process

1. Prerequisite

  • Explore your cluster: Before starting the upgrade, take an inventory of your system so you know exactly what is running in your cluster. Note down your cluster version, add-on versions, and the number of services and applications running. This information helps you choose the right upgrade strategy, identify potential compatibility issues, and plan a smooth migration for all your workloads. It's like gathering intel before a mission: the more you know, the smoother the upgrade!

The current cluster’s version is 1.24, and it is running on extended support.

Currently, four add-ons are running.

The cluster uses EC2 instances as worker nodes.

Karpenter add-on for node autoscaling.

Around 12 services found

The application UI

Deprecated APIs found by kubent (kube-no-trouble)
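kubent inspects the running cluster for you. To show the idea behind such a check, here is a simplified Python sketch that compares manifests (as dicts, for example parsed from YAML) against a small, hand-copied subset of API removals from the Kubernetes deprecated API migration guide. The table is deliberately incomplete, and the function names and the "ui" manifests are hypothetical; rely on kubent or the official guide for the real list.

```python
# Partial list of API removals from the Kubernetes deprecated API migration guide,
# keyed by the release that removes them. Illustrative only, not exhaustive.
REMOVED_APIS = {
    "1.25": {
        ("batch/v1beta1", "CronJob"),
        ("autoscaling/v2beta1", "HorizontalPodAutoscaler"),
        ("policy/v1beta1", "PodDisruptionBudget"),
        ("policy/v1beta1", "PodSecurityPolicy"),
    },
    "1.26": {
        ("autoscaling/v2beta2", "HorizontalPodAutoscaler"),
        ("flowcontrol.apiserver.k8s.io/v1beta1", "FlowSchema"),
        ("flowcontrol.apiserver.k8s.io/v1beta1", "PriorityLevelConfiguration"),
    },
    "1.27": {("storage.k8s.io/v1beta1", "CSIStorageCapacity")},
    "1.29": {
        ("flowcontrol.apiserver.k8s.io/v1beta2", "FlowSchema"),
        ("flowcontrol.apiserver.k8s.io/v1beta2", "PriorityLevelConfiguration"),
    },
}


def minor(version: str) -> int:
    """Return the minor number of a '1.x' version string."""
    return int(version.split(".")[1])


def find_removed_apis(manifests: list, current: str, target: str) -> list:
    """Report manifests whose apiVersion/kind is removed after `current`, up to `target`."""
    findings = []
    for removed_in, apis in REMOVED_APIS.items():
        if minor(current) < minor(removed_in) <= minor(target):
            for manifest in manifests:
                key = (manifest.get("apiVersion"), manifest.get("kind"))
                if key in apis:
                    findings.append((manifest["metadata"]["name"], *key, removed_in))
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical manifests, e.g. parsed from YAML files with a YAML library.
    manifests = [
        {"apiVersion": "autoscaling/v2beta2", "kind": "HorizontalPodAutoscaler", "metadata": {"name": "ui"}},
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "ui"}},
    ]
    for name, api, kind, removed_in in find_removed_apis(manifests, "1.24", "1.29"):
        print(f"{kind}/{name}: {api} is removed in {removed_in}")
```

Only removals strictly after the current version and up to the target count, because anything removed at or before the current version could not still be running in the cluster.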

2. Update manifests
Once we have the list of deprecated APIs, the next step is to update those API versions, either manually or with a tool such as “kubectl convert”; the right choice depends on how many deprecated APIs you have. We recommend updating the API versions manually to avoid unforeseen errors. For example, the kubent result above shows that our HPA apiVersion is removed as of Kubernetes 1.26. Below are the original HPA manifest in the current EKS v1.24 cluster and the updated HPA manifest for the new version, respectively:

Old version

New version
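We recommend editing manifests by hand, but if you have many HPA manifests, a small script can do the mechanical part before a manual review. The Python sketch below (a hypothetical helper, not part of any Kubernetes tooling) works on manifests loaded as dicts. For an "autoscaling/v2beta2" HPA like ours, only the apiVersion changes, because "autoscaling/v2" keeps the v2beta2 spec layout; for the older "autoscaling/v2beta1" (removed in 1.25), it also moves Resource metric targets under the "target" field. Anything it does not recognize is rejected rather than guessed.

```python
import copy


def convert_hpa_to_v2(hpa: dict) -> dict:
    """Return a copy of a HorizontalPodAutoscaler manifest rewritten for autoscaling/v2."""
    if hpa.get("kind") != "HorizontalPodAutoscaler":
        raise ValueError("not a HorizontalPodAutoscaler manifest")
    new = copy.deepcopy(hpa)  # never mutate the caller's manifest
    api = new.get("apiVersion")
    if api == "autoscaling/v2beta2":
        # autoscaling/v2 is the GA form of v2beta2, so the spec layout carries over.
        new["apiVersion"] = "autoscaling/v2"
    elif api == "autoscaling/v2beta1":
        # v2beta1 put flat target fields on Resource metrics; v2 nests them under "target".
        for metric in new.get("spec", {}).get("metrics", []):
            resource = metric.get("resource")
            if metric.get("type") != "Resource" or resource is None:
                raise ValueError("only Resource metrics are handled; convert others by hand")
            if "targetAverageUtilization" in resource:
                resource["target"] = {
                    "type": "Utilization",
                    "averageUtilization": resource.pop("targetAverageUtilization"),
                }
            elif "targetAverageValue" in resource:
                resource["target"] = {
                    "type": "AverageValue",
                    "averageValue": resource.pop("targetAverageValue"),
                }
        new["apiVersion"] = "autoscaling/v2"
    elif api != "autoscaling/v2":
        raise ValueError(f"unexpected apiVersion: {api}")
    return new


if __name__ == "__main__":
    old = {
        "apiVersion": "autoscaling/v2beta1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": "ui"},
        "spec": {"metrics": [{"type": "Resource", "resource": {"name": "cpu", "targetAverageUtilization": 70}}]},
    }
    print(convert_hpa_to_v2(old)["spec"]["metrics"][0]["resource"])
```

Diffing the output against the original manifest keeps the manual review step the recommendation above calls for.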

3. Bootstrap new cluster
Typical options for deploying a new Amazon EKS cluster with your desired Kubernetes version include the AWS Management Console, the eksctl tool, and Terraform. For this blog, we deployed a new cluster named "green-eks" running Kubernetes v1.29 with EC2 worker nodes.

New EKS cluster

EC2 worker nodes
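If you choose eksctl, the green cluster can be described in a ClusterConfig file. The Python sketch below only builds such a config as a dictionary, so that the key decision, pinning the new Kubernetes version, is explicit and checked against the blue cluster's version. The region, node group name, instance type, and sizing are placeholder assumptions; check the field names against the eksctl config file schema, then save the result as YAML (for example with PyYAML) and pass it to "eksctl create cluster -f".

```python
import json


def parse_version(version: str) -> tuple:
    """Turn '1.29' into (1, 29) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def green_cluster_config(name: str, region: str, version: str, blue_version: str) -> dict:
    """Build an eksctl-style ClusterConfig for the green cluster (illustrative values)."""
    if parse_version(version) <= parse_version(blue_version):
        raise ValueError("the green cluster must run a newer Kubernetes version than blue")
    return {
        "apiVersion": "eksctl.io/v1alpha5",
        "kind": "ClusterConfig",
        "metadata": {"name": name, "region": region, "version": version},
        "managedNodeGroups": [
            {
                # Hypothetical node group sizing; match it to your blue cluster's capacity.
                "name": f"{name}-nodes",
                "instanceType": "m5.large",
                "minSize": 2,
                "maxSize": 4,
                "desiredCapacity": 2,
            }
        ],
    }


if __name__ == "__main__":
    config = green_cluster_config("green-eks", "us-east-1", "1.29", blue_version="1.24")
    print(json.dumps(config, indent=2))
```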

4. Re-deploy add-ons and third-party tools with compatible version
Once the "green-eks" cluster was ready, we re-deployed the required custom add-ons and third-party tools. It's crucial to make sure the versions of those add-ons and third-party tools are compatible with the new cluster. For instance, the Amazon VPC CNI documentation lists the suggested version of the add-on to use for each cluster version.

EKS add-ons in the new cluster
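Instead of reading compatibility tables by hand, you can ask EKS which versions of a managed add-on support the new cluster version, for example with "aws eks describe-addon-versions --addon-name vpc-cni --kubernetes-version 1.29". The Python sketch below parses a response of that shape, as returned by boto3's "describe_addon_versions"; the helper names are hypothetical. It only covers EKS managed add-ons, not third-party tools such as Karpenter, whose compatibility you still need to check in their own documentation.

```python
from typing import Optional


def compatible_versions(response: dict, addon_name: str, cluster_version: str) -> list:
    """All versions of `addon_name` listed as compatible with `cluster_version`."""
    found = []
    for addon in response.get("addons", []):
        if addon.get("addonName") != addon_name:
            continue
        for version in addon.get("addonVersions", []):
            if any(c.get("clusterVersion") == cluster_version for c in version.get("compatibilities", [])):
                found.append(version["addonVersion"])
    return found


def default_addon_version(response: dict, addon_name: str, cluster_version: str) -> Optional[str]:
    """The version EKS marks as default for `cluster_version`, or None if there is none."""
    for addon in response.get("addons", []):
        if addon.get("addonName") != addon_name:
            continue
        for version in addon.get("addonVersions", []):
            for compat in version.get("compatibilities", []):
                if compat.get("clusterVersion") == cluster_version and compat.get("defaultVersion"):
                    return version["addonVersion"]
    return None


# Assumed usage with AWS credentials configured (not executed here):
#   import boto3
#   eks = boto3.client("eks")
#   resp = eks.describe_addon_versions(addonName="vpc-cni", kubernetesVersion="1.29")
#   print(default_addon_version(resp, "vpc-cni", "1.29"))
# Large results are paginated via "nextToken"; this sketch reads a single page.
```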

5. Re-deploy workloads
Now that the foundation is laid, we can begin redeploying our workloads to the new "green-eks" cluster.

Application deployment in new cluster
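When redeploying from plain manifests, two things are easy to get wrong while both clusters exist: applying to the wrong cluster, and applying objects before the ones they depend on. The Python sketch below builds (but does not run) "kubectl apply" commands pinned to the green cluster with kubectl's "--context" flag, ordered by an assumed kind priority. It assumes a kubeconfig context named "green-eks" (for example created with "aws eks update-kubeconfig --name green-eks --alias green-eks"); the file paths and the ordering list are hypothetical. If you deploy with Helm or a GitOps tool, point that tool at the new cluster instead.

```python
# Assumed apply order: namespaces and configuration first, then Services,
# then workloads, then objects that reference them. Unknown kinds (for example
# custom resources) go last, after their CustomResourceDefinitions.
KIND_ORDER = [
    "Namespace",
    "ServiceAccount",
    "Secret",
    "ConfigMap",
    "PersistentVolumeClaim",
    "CustomResourceDefinition",
    "Service",
    "Deployment",
    "StatefulSet",
    "HorizontalPodAutoscaler",
    "Ingress",
]


def apply_plan(manifest_kinds: dict, context: str) -> list:
    """Build kubectl commands that apply each manifest file to `context` in order.

    `manifest_kinds` maps a manifest file path to the kind of object it contains.
    """
    rank = {kind: i for i, kind in enumerate(KIND_ORDER)}
    ordered = sorted(
        manifest_kinds,
        key=lambda path: (rank.get(manifest_kinds[path], len(KIND_ORDER)), path),
    )
    # --context pins every command to the given cluster, never the current default.
    return [["kubectl", "--context", context, "apply", "-f", path] for path in ordered]


if __name__ == "__main__":
    plan = apply_plan(
        {
            "ui/hpa.yaml": "HorizontalPodAutoscaler",
            "ui/deployment.yaml": "Deployment",
            "namespace.yaml": "Namespace",
        },
        context="green-eks",
    )
    for command in plan:
        print(" ".join(command))
```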

6. Verify workloads
Once our workloads are successfully deployed in the "green-eks" cluster, it's verification time! The specific tests you run depend on your application development process. You might opt for smoke tests, integration tests, manual tests, or even a simple UI check like the one we did in this blog, for demo purposes only. The key goal is to ensure that everything functions as intended in the new environment.

Application in new cluster

We also verified that the EKS add-ons operate correctly. For example, Karpenter scaled nodes as expected.

Karpenter deployment logs
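A UI check can be complemented by a scripted smoke test that runs the same way every time. The Python sketch below (standard library only; the function names and URL are hypothetical) checks that each URL answers HTTP 200, retrying a few times to ride out slow pod startups. The fetch function is injectable so the logic can be exercised without a network; in practice you would point it at the green cluster's load balancer hostname before the DNS switchover.

```python
import time
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def smoke_test(
    urls: Iterable[str],
    fetch: Callable[[str], int] = http_status,
    attempts: int = 3,
    delay: float = 2.0,
) -> dict:
    """Map each URL to True if it answered 200 within `attempts` tries."""
    results = {}
    for url in urls:
        ok = False
        for attempt in range(attempts):
            try:
                ok = fetch(url) == 200
            except OSError:  # connection errors, timeouts, and HTTP errors raised by urllib
                ok = False
            if ok or attempt == attempts - 1:
                break
            time.sleep(delay)
        results[url] = ok
    return results


# Hypothetical usage against the green cluster before DNS switchover:
#   results = smoke_test(["http://<green-load-balancer-hostname>/"])
#   assert all(results.values()), results
```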

7. DNS Switchover
When the application is ready to serve client requests, the final step is to switch traffic over to the "green-eks" cluster. We do this by updating our DNS records in a DNS service such as Amazon Route 53 or any other DNS provider. Amazon Route 53 provides a weighted routing policy, so we can initially direct a small percentage of users to the new cluster. This allows us to perform a staged rollout and verify that everything functions smoothly before migrating all traffic.

Weighted routing policy (source)
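The weighted switchover can be expressed as a Route 53 change batch. This Python sketch builds the request body that boto3's "change_resource_record_sets" accepts, with two weighted CNAME records sharing one name; the record name, load balancer hostnames, TTL, and the choice of CNAME over alias records are all assumptions. Route 53 sends each record roughly its weight divided by the sum of all weights, so keeping the weights summing to 100 makes them read as percentages. Because of DNS caching, real client traffic follows these shares only approximately, and a short TTL helps clients notice each change sooner.

```python
def weighted_change_batch(
    record_name: str,
    blue_target: str,
    green_target: str,
    green_percent: int,
    ttl: int = 60,
) -> dict:
    """Build a Route 53 ChangeBatch that splits traffic between blue and green.

    Weights are chosen to sum to 100, so each weight reads as a percentage.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes weighted records sharing one name
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"blue/green: {green_percent}% of traffic to green",
        "Changes": [
            record("blue", blue_target, 100 - green_percent),
            record("green", green_target, green_percent),
        ],
    }


def traffic_share(change_batch: dict) -> dict:
    """Expected traffic fraction per record: its weight divided by the sum of weights."""
    weights = {
        change["ResourceRecordSet"]["SetIdentifier"]: change["ResourceRecordSet"]["Weight"]
        for change in change_batch["Changes"]
    }
    total = sum(weights.values())
    return {set_id: weight / total for set_id, weight in weights.items()}


# Assumed usage (needs AWS credentials and your hosted zone ID):
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<hosted-zone-id>",
#       ChangeBatch=weighted_change_batch("<record-name>", "<blue-lb>", "<green-lb>", 10),
#   )
```

A staged rollout is then a series of calls with increasing green_percent (for example 10, then 50, then 100), and a rollback is the same call with green_percent=0.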

Stateful workloads migration

When deploying workloads to a new Kubernetes cluster, stateful workloads need special consideration. These workloads, such as Solr databases or monitoring stacks like Prometheus and Grafana, require data persistence and careful migration strategies. One proven and reliable migration approach for ensuring data integrity is the backup and restore method. We shared our experience migrating a Solr database between EKS clusters in a previous blog, which can serve as a reference guide for migrating your stateful workloads.
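After a backup and restore, it is worth confirming that the data actually arrived before pointing users at it. One cheap check, sketched below in Python with hypothetical collection names and numbers, compares per-collection record counts from blue and green; for Solr, a count can be read from "numFound" in the response to a "q=*:*&rows=0" query. Matching counts do not prove identical content, so treat this as a first sanity check alongside your application tests.

```python
def compare_counts(source: dict, restored: dict) -> dict:
    """Return {name: (source_count, restored_count)} for every mismatch.

    A name missing on one side shows up with None for that side.
    """
    mismatches = {}
    for name in sorted(source.keys() | restored.keys()):
        before, after = source.get(name), restored.get(name)
        if before != after:
            mismatches[name] = (before, after)
    return mismatches


if __name__ == "__main__":
    # Hypothetical per-collection document counts gathered from blue and green.
    blue = {"products": 1200, "orders": 340}
    green = {"products": 1200, "orders": 338}
    print(compare_counts(blue, green) or "counts match")
```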

Conclusion
By leveraging the Blue/Green deployment strategy, we've successfully navigated our EKS upgrade with minimal disruption. This approach offers several benefits:

  • Reduced Downtime: Since you maintain a fully functional "blue" cluster while deploying the upgrade on "green," user traffic experiences minimal interruption.
  • Phased Rollout: A weighted routing policy in Amazon Route 53 allows a staged rollout, letting you test the new cluster with a small percentage of users before migrating all traffic.
  • Rollback: If any issues arise in the new environment, you can easily switch traffic back to the "blue" cluster with minimal overhead.

This blog provides high-level guidelines for the EKS upgrade process, using blue/green deployment to mitigate system disruption. Remember to tailor the specific steps to your application and infrastructure. With careful planning and execution, blue/green deployment can make your EKS upgrade a breeze!
