Apache Kafka Connect Usage Patterns

Published at
1/30/2021
Categories
apachekafka
kafkaconnect
Author
rueedlinger

Kafka Connect is a tool for streaming data between Apache Kafka and other systems like Oracle, DB2, JMS, Elasticsearch, MongoDB, etc. Teams can configure connectors that move large collections of data in and out of Kafka. As a Kafka Connect user you don’t have to write any software when a connector implementation already exists for your system. Depending on your load profile you can run multiple Connect workers, which together form a Connect cluster.

I recently had an interesting discussion about how teams can or should use Apache Kafka Connect. We came up with two usage patterns for Apache Kafka Connect:

  • Shared infrastructure - All teams share the same Kafka Connect cluster.
  • “Microservice” or shared-nothing architecture - Every team has their own Kafka Connect cluster.

[Figure: usage patterns]

Note: I assume that you will run Apache Kafka Connect in distributed mode. This provides scalability and automatic fault tolerance for Kafka Connect.

Shared Infrastructure Usage Pattern

In this usage pattern the Kafka Connect cluster is shared between multiple teams and the platform team is responsible for running the cluster. This means that the resources (memory, logs, configurations, etc.) and the runtime (JARs) are shared between different teams.

When you use the shared infrastructure usage pattern you have to consider the following topics:

Responsibilities:

  • Who gets notified when a connector is failing?
  • Who is responsible for fixing connector failures?
  • How to distinguish between infrastructure problems (memory, connectivity, etc.) and connector problems (schema / data mismatch, configuration errors, etc.)?
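
One way to start answering the first two questions is to read the status document that the Kafka Connect REST API returns for `GET /connectors/{name}/status` and route each failed task to an owner. The sketch below only works on an already fetched status dictionary. The `<team>-<purpose>` connector naming convention, the function names and the sample data are assumptions for illustration and do not come from the original discussion.

```python
from typing import Any


def failed_tasks(status: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one entry per FAILED task in a connector status document."""
    failures = []
    for task in status.get("tasks", []):
        if task.get("state") == "FAILED":
            trace = task.get("trace", "")
            failures.append({
                "connector": status.get("name"),
                "task": task.get("id"),
                "worker": task.get("worker_id"),
                # The first line of the stack trace usually names the exception.
                "reason": trace.splitlines()[0] if trace else "",
            })
    return failures


def owning_team(connector_name: str) -> str:
    """Hypothetical convention: connectors are named '<team>-<purpose>'."""
    return connector_name.split("-", 1)[0]


# Hand-written sample in the shape of a status response (not real output).
sample_status = {
    "name": "billing-jdbc-sink",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "RUNNING", "worker_id": "10.0.0.1:8083"},
        {"id": 1, "state": "FAILED", "worker_id": "10.0.0.2:8083",
         "trace": "org.apache.kafka.connect.errors.DataException: schema mismatch\n\tat ..."},
    ],
}

for failure in failed_tasks(sample_status):
    print(owning_team(failure["connector"]), failure)
```

The first line of the trace can help a human decide whether a failure looks like a data or configuration problem or an infrastructure problem, but the sketch deliberately does not try to classify it automatically.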

Boundaries / Isolation:

  • How do you enforce authentication, authorization and role-based access control?
  • How to ensure that teams can only deploy or modify their own connectors?
  • How do you secure access to sensitive configuration settings like credentials?
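
The last two questions could be approached with checks that run before a connector configuration is accepted, for example in a deployment pipeline or a proxy in front of the Connect REST API. The following is a minimal sketch under two assumptions that are not part of the original article: connectors follow a `<team>-<purpose>` naming convention, and secrets are supplied through Kafka's config provider placeholders (the `${provider:path:key}` syntax used, for example, by `FileConfigProvider`). The list of "sensitive" key fragments and the file path are purely illustrative.

```python
import re

# Matches config provider placeholders such as
# ${file:/opt/secrets/billing.properties:db.password}
PLACEHOLDER = re.compile(r"^\$\{[^:}]+:.+\}$")

# Assumption: key names containing these fragments hold secrets.
SENSITIVE_HINTS = ("password", "secret", "token", "credential")


def plaintext_secrets(config: dict[str, str]) -> list[str]:
    """Return the sensitive-looking keys whose values are not placeholders."""
    return sorted(
        key for key, value in config.items()
        if any(hint in key.lower() for hint in SENSITIVE_HINTS)
        and not PLACEHOLDER.match(value)
    )


def may_modify(team: str, connector_name: str) -> bool:
    """Hypothetical rule: a team may only touch connectors with its own prefix."""
    return connector_name.startswith(team + "-")


config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://db:5432/billing",
    "connection.user": "billing",
    "connection.password": "hunter2",  # would be rejected by the check
}

print(plaintext_secrets(config))
print(may_modify("billing", "billing-jdbc-sink"))
print(may_modify("billing", "search-es-sink"))
```

A check like this only catches accidental plaintext secrets and naming violations. It does not replace authentication and authorization on the REST API itself.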

Coordination:

  • How do you coordinate patches or rollouts of new versions with all the teams?
  • Can a team stop the rollout when there are some breaking changes?

“Microservice” or Shared-nothing Architecture Usage Pattern

In this usage pattern the platform team provides the right tools for the teams to deploy and run their own Kafka Connect cluster. Here we have clear boundaries between the teams and clear responsibilities.
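
In distributed mode, workers that share the same `group.id` form one Connect cluster, and each cluster needs its own config, offset and status topics. A platform team could therefore generate per-team worker settings from a template. The sketch below shows the idea. The `connect-<team>` naming scheme, the function names and the choice of `JsonConverter` are assumptions for illustration, not a recommendation from the original article.

```python
def worker_properties(team: str, bootstrap_servers: str) -> dict[str, str]:
    """Build distributed-mode worker settings that isolate one team's cluster."""
    return {
        "bootstrap.servers": bootstrap_servers,
        # Workers with the same group.id join the same Connect cluster.
        "group.id": f"connect-{team}",
        # Internal topics must be unique per Connect cluster.
        "config.storage.topic": f"connect-{team}-configs",
        "offset.storage.topic": f"connect-{team}-offsets",
        "status.storage.topic": f"connect-{team}-status",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    }


def render(props: dict[str, str]) -> str:
    """Render the settings in Java .properties style, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


print(render(worker_properties("billing", "kafka-1:9092,kafka-2:9092")))
```

Generating these files from one template keeps the clusters consistent while still giving every team its own group and internal topics.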

With the microservice usage pattern you have to consider the following topics:

Operational overhead:

  • Can you live with the operational overhead when every team runs their own Kafka Connect cluster?
  • Should you also organize clusters by domain or functionality?

Skill / Tools:

  • Do your teams have the right skills to run and operate Kafka Connect?
  • What are the right tools to facilitate the daily life with Kafka Connect?
  • How to automate rollouts and updates?
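
For automating connector rollouts, one common building block is the Connect REST endpoint `PUT /connectors/{name}/config`, which creates the connector if it does not exist and updates it otherwise, so the same call can be repeated safely from a pipeline. The sketch below only builds the request with the Python standard library. The base URL, connector name, configuration and function name are hypothetical, and sending the request is left to the caller.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_upsert_request(base_url: str, name: str, config: dict[str, str]) -> Request:
    """Build a create-or-update request for a single connector configuration."""
    url = f"{base_url.rstrip('/')}/connectors/{quote(name, safe='')}/config"
    body = json.dumps(config).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical worker address and connector; 8083 is the default REST port.
request = build_upsert_request(
    "http://localhost:8083/",
    "billing-jdbc-sink",
    {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "billing-events",
        "tasks.max": "1",
    },
)
print(request.get_method(), request.full_url)
```

To actually apply the configuration, the request could be passed to `urllib.request.urlopen(request)` against a running worker. Keeping the connector configurations in version control and applying them this way makes rollouts repeatable.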

Conclusion

Shared infrastructure has the advantage that the teams do not have to care about how to operate and run Kafka Connect. The biggest issue is that all teams share the same runtime and resources. This increases the complexity of security and of responsibilities between the teams.

With the microservice usage pattern it’s clear who is responsible and to blame when an error occurs (You build it, you run it!). The main concerns are that your team needs the right skills to run Kafka Connect, and the operational overhead when every team runs its own Kafka Connect cluster.

We started with the shared infrastructure usage pattern and ended up with the microservice usage pattern. You should not underestimate the effort of providing the right tools and teaching teams how to run and operate Kafka Connect by themselves.
