How to Upgrade Kafka from 1.1.1 with Zero-Downtime: An Applicable Approach

Published at 3/29/2024
Categories: kafka, devops, upgrade, dataplatform
Author: anvaari

As a data engineer, or more specifically a data platform engineer, you may be handed a service that many other systems depend on. Upgrading such a service is a terrifying process. Suppose that service is Kafka, and it's the main component of your company's data stack. Still, ignoring the upgrade because of its complexity is not the solution: every bug fix or new feature can save you from downtime and help you improve the performance of your services. So, what is the solution? How can we make sure that all services depending on Kafka work fine after the upgrade? In this post, I will share my experience with this process.

Main concerns

With a service like Kafka, we know many producers and consumers depend on it. So, what happens to them after an upgrade? Do they keep producing and consuming? What about the Schema Registry and other components that depend on Kafka? So, one of the main concerns is the health of the dependent components.
Also, we want to upgrade Kafka across two major versions (from 1.x to 3.x), so how should we check for deprecated configs? Should we read every changelog one by one? There is a better approach, one that minimizes both the time spent and the probability of downtime.

Proposed approach

Honestly, every time I think about Docker, I marvel at what a beautiful tool it is :D You know? With tools like docker-compose, you can set up a whole stack, independently, in its own separate network.

A better approach is to use Docker to simulate production services in a safe environment. We can set up a whole stack with the same configs but fewer resources, simulate upgrades, and check each component's behavior.

Applied approach for Kafka

To simulate the upgrade process for Kafka, I set up a stack with these components:

  1. Zookeeper Instances -> Coordinator for Kafka Cluster
  2. Kafka Instances -> Main component
  3. Schema Registry -> Persist schema of produced messages
  4. Kafka UI -> Monitor Kafka cluster and see incoming messages in topics
  5. Producers -> Python code to produce data into Kafka topic in Avro format.
  6. Consumer -> Python code to consume data produced by Producer.
  7. Clickhouse -> Analytical database to store data coming from Kafka
  8. Postgres -> OLTP database that stores transactional data
  9. Postgres Producer -> Python code that inserts one record into the Postgres database every 0.1 seconds
  10. Debezium -> Capture each change in Postgres and send it to the corresponding Kafka topic in Avro format.

Now, it's time to prepare the appropriate docker-compose.yaml.

Implementation details

Some Extra Containers

  • kafka-setup-user

    • It uses the same image as Kafka and runs after kafka1 becomes healthy. When it runs, it creates some users and then exits with status 0. See them here
    • It needs one Kafka broker and also the Zookeeper cluster, because SCRAM-SHA credentials are persisted in Zookeeper.
  • kafka-setup-topic

    • It uses the same image as Kafka and creates some topics. See the list here
  • submit-connector

    • It uses the curl image to submit this connector to Debezium. The connector captures the changes in Postgres and sends events to Kafka, and then Clickhouse consumes the data into the appropriate tables.
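The two setup containers boil down to a couple of Kafka CLI calls. Below is a minimal bash sketch of what they might run against Kafka 1.1.1, where both tools still talk to Zookeeper. It is not the stack's actual script: it assumes the Kafka CLI scripts are on PATH, and the host zookeeper1:2181, the user, password, and topic names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the setup containers' work (not the original scripts).
# Assumes the Kafka CLI scripts are on PATH and Zookeeper is reachable at
# zookeeper1:2181 (placeholder host). Set DRY_RUN=1 to print commands only.

ZK="${ZK:-zookeeper1:2181}"

# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# kafka-setup-user: SCRAM credentials are written to Zookeeper, which is
# why this container needs the Zookeeper cluster and not just a broker.
create_scram_user() {
  local user="$1" password="$2"
  run kafka-configs.sh --zookeeper "$ZK" --alter \
    --add-config "SCRAM-SHA-256=[password=${password}]" \
    --entity-type users --entity-name "$user"
}

# kafka-setup-topic: on Kafka 1.1.1, kafka-topics.sh talks to Zookeeper;
# newer releases expect --bootstrap-server instead.
create_topic() {
  local topic="$1" partitions="$2" replication="$3"
  run kafka-topics.sh --zookeeper "$ZK" --create --topic "$topic" \
    --partitions "$partitions" --replication-factor "$replication"
}
```

For example, `DRY_RUN=1 create_topic orders 3 2` prints the command instead of running it, which is handy for checking what a setup container will do before the stack starts.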

Some Extra Notes:

  • The versions of all containers are defined in the .env file. You can change them from this file.

  • Container dependencies are defined explicitly. If one container needs another to be up first, the appropriate healthcheck and depends_on conditions are defined for it.

  • If you take a look at the healthcheck of a container, for example kafka, you will see this command:

   (echo > /dev/tcp/kafka1/9092) &>/dev/null && exit 0 || exit 1

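This shell snippet checks a TCP port inside a container without telnet: bash treats a redirection to /dev/tcp/HOST/PORT as an attempt to open a TCP connection, so the exit status shows whether the port accepts connections.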
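The same /dev/tcp probe can be reused in an entrypoint script that must block until a dependency is reachable. Here is a minimal bash sketch that retries the probe with a timeout. The function names are hypothetical. Note that a single attempt against an unreachable (not just closed) host can take longer than the limit.

```shell
#!/usr/bin/env bash
# Sketch: reuse the healthcheck's /dev/tcp probe with retries and a timeout.
# /dev/tcp/HOST/PORT is a bash feature, so this needs bash, not plain sh.

# port_open HOST PORT: exit status 0 if a TCP connection can be opened.
port_open() {
  # The redirection happens in a subshell, so the socket closes when it exits;
  # errors such as "Connection refused" are discarded.
  (echo > "/dev/tcp/$1/$2") &>/dev/null
}

# wait_for_port HOST PORT [LIMIT_SECONDS]: 0 once the port opens, 1 on timeout.
wait_for_port() {
  local host="$1" port="$2" limit="${3:-30}"
  local deadline=$(( SECONDS + limit ))
  until port_open "$host" "$port"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep 1
  done
  return 0
}
```

For example, `wait_for_port kafka1 9092 60 || exit 1` waits up to a minute for the broker used in the healthcheck above.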

Simulation Process

To run the simulation, you can follow these steps.

Result

All tests were successful. By successful, I mean producers could still produce messages without errors, and consumers could still consume them without errors. No other criteria were investigated; you can define your own metrics for this simulation. Only one problem came up during the process.
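One example of such a metric is total consumer-group lag. If lag stays bounded while brokers are replaced one by one, consumers kept up. Below is a bash/awk sketch that sums the LAG column of `kafka-consumer-groups.sh --describe` output. It finds the column by its header name because the output layout is not the same in every Kafka release. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of one possible extra metric: total consumer-group lag.
# total_lag reads `kafka-consumer-groups.sh --describe` output on stdin and
# prints the sum of the LAG column. The column is found by its header name
# because the layout is not the same in every Kafka release.
total_lag() {
  awk '
    {
      # A header line contains a field named exactly LAG: remember its index.
      is_header = 0
      for (i = 1; i <= NF; i++) if ($i == "LAG") { col = i; is_header = 1 }
      if (is_header) next
      # Data lines: add numeric lag values; "-" (unknown lag) is skipped.
      if (col && $col ~ /^[0-9]+$/) sum += $col
    }
    END { print sum + 0 }
  '
}
```

Usage (the group name is a placeholder): `kafka-consumer-groups.sh --bootstrap-server kafka1:9092 --describe --group my-group | total_lag`. Sampling this before, during, and after each broker restart gives a simple number to compare.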

Problems:

  1. In kafka-setup-user: java.lang.ClassNotFoundException: kafka.security.auth.SimpleAclAuthorizer
    1. This class was deprecated in 2.4.0 and has since been removed, which is why the newer broker image cannot load it. See here
    2. The documentation recommends using kafka.security.authorizer.AclAuthorizer instead. It is fully compatible with the deprecated class, so we replaced it in docker-compose, and it worked.
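If the authorizer is set in a properties file, the fix is a one-line substitution. Here is a minimal bash sketch that assumes a server.properties-style file; the path is a placeholder. Depending on the image, the same setting may instead be passed as an environment variable. For example, images that map KAFKA_* variables to broker properties would use KAFKA_AUTHORIZER_CLASS_NAME.

```shell
#!/usr/bin/env bash
# Sketch: swap the removed SimpleAclAuthorizer for AclAuthorizer in a
# server.properties-style file. Which file (or environment variable) holds
# this setting depends on the Kafka image, so the path is a placeholder.

# replace_authorizer FILE: edits FILE in place and keeps a FILE.bak backup.
replace_authorizer() {
  # Dots are escaped so sed matches them literally.
  sed -i.bak \
    's/^authorizer\.class\.name=kafka\.security\.auth\.SimpleAclAuthorizer$/authorizer.class.name=kafka.security.authorizer.AclAuthorizer/' \
    "$1"
}
```

`-i.bak` is used rather than a bare `-i` so the same command works with both GNU and BSD sed.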

Takeaway:

  • Since the official documentation covers upgrading from any earlier version to 3.6.1 (and to other previous releases), there is no obstacle in this process. Our test also shows that the process works, so we can upgrade our Kafka cluster to the target version.
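For reference, Kafka's official upgrade notes describe a rolling procedure for ZooKeeper-based clusters. First, pin inter.broker.protocol.version to the old version and upgrade brokers one at a time. Then bump the protocol version and do a second rolling restart. Because 1.1.1 is newer than 0.11.0, the notes say only the protocol version needs pinning. The exception is when log.message.format.version was overridden, which then needs one more rolling restart. The bash sketch below prints that order for a docker-compose simulation. It is not the author's procedure. The service names, the .env variable names, and the mapping of INTER_BROKER_PROTOCOL to the broker setting are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the rolling-upgrade order from Kafka's upgrade notes, applied to a
# docker-compose simulation. Hypothetical assumptions: brokers are services
# kafka1..kafka3, .env holds KAFKA_VERSION (image tag) and
# INTER_BROKER_PROTOCOL, and docker-compose.yaml maps INTER_BROKER_PROTOCOL to
# each broker's inter.broker.protocol.version.
# Dry run by default: commands are printed; set DRY_RUN=0 to execute them.

ENV_FILE="${ENV_FILE:-.env}"
BROKERS=(kafka1 kafka2 kafka3)

run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then echo "$*"; else "$@"; fi
}

# set_env KEY VALUE: rewrite the KEY=... line in the .env file.
set_env() {
  run sed -i.bak "s/^$1=.*/$1=$2/" "$ENV_FILE"
}

# Recreate one broker at a time; --wait returns once the service is healthy.
rolling_restart() {
  local broker
  for broker in "${BROKERS[@]}"; do
    run docker compose up -d --no-deps --force-recreate --wait "$broker"
  done
}

# upgrade OLD_PROTOCOL NEW_IMAGE_TAG NEW_PROTOCOL
upgrade() {
  # 1. Pin the inter-broker protocol to the version being upgraded from.
  set_env INTER_BROKER_PROTOCOL "$1"
  # 2. Swap the broker image one broker at a time (downgrade still possible).
  set_env KAFKA_VERSION "$2"
  rolling_restart
  # 3. After checking producers and consumers, bump the protocol and restart
  #    again. From this point on, the cluster can no longer be downgraded.
  set_env INTER_BROKER_PROTOCOL "$3"
  rolling_restart
}
```

For example, `upgrade 1.1 3.6.1 3.6` prints nine commands in order. Running the checks from the Result section between steps 2 and 3 is where the simulation pays off, because that is the last point at which a downgrade is still possible.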

Conclusion

This article suggests an approach for upgrading highly dependent services. We walked through the details of implementing it. As the Result section showed, the simulation caught one problem before the real upgrade, so we can upgrade our Kafka cluster seamlessly, with zero downtime :)
