Apache Kafka

Published at: 5/11/2024
Categories: apachekafka
Author: sachithmayantha

Apache Kafka is a distributed event streaming platform designed to handle real-time data feeds with high throughput and fault tolerance.

Note - Event streaming means capturing data as continuous streams of events and processing them as soon as a change happens.

To understand how Kafka works internally, let's break it down into its key components and their interactions.

Producers - Producers are responsible for publishing data to Kafka. When a producer sends a message to Kafka, it specifies a topic and optionally a partition key. Kafka uses the partition key to determine which partition the message should be written to. If no key is provided, the producer distributes messages across partitions in a round-robin fashion, so that every partition receives messages evenly.
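
The routing described above can be sketched in a few lines of plain Python. This is a toy illustration only, not Kafka's actual client code (the real producer uses a murmur2 hash, and `NUM_PARTITIONS` and `choose_partition` are names invented here): a keyed message hashes to a fixed partition, while an unkeyed one rotates round-robin.

```python
from itertools import cycle
from zlib import crc32

NUM_PARTITIONS = 3
_round_robin = cycle(range(NUM_PARTITIONS))

def choose_partition(key):
    """Toy partitioner: keyed messages hash to a fixed partition,
    unkeyed messages rotate round-robin across all partitions."""
    if key is None:
        return next(_round_robin)        # no key: spread evenly
    return crc32(key) % NUM_PARTITIONS   # keyed: always the same partition
```

The important property is determinism for keyed messages: every call with the same key returns the same partition number.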

Consumers - Consumers read data from Kafka topics. They subscribe to one or more topics and receive messages as they are published. Consumers read messages from oldest to newest using the offset (a unique, per-partition identifier). Kafka allows multiple consumers to read from the same topic in parallel, enabling horizontal scalability.
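
The oldest-to-newest reading model can be mimicked with a list and a cursor. This is a hand-rolled sketch, not the real Kafka consumer API (`poll` and its parameters are made up for illustration): the consumer remembers its offset and each fetch advances it.

```python
# A partition modeled as an ordered list of messages.
partition = ["msg-0", "msg-1", "msg-2", "msg-3"]

def poll(partition, offset, max_records=2):
    """Return the next batch starting at `offset`, plus the new offset."""
    batch = partition[offset : offset + max_records]
    return batch, offset + len(batch)

offset = 0
batch1, offset = poll(partition, offset)   # oldest messages first
batch2, offset = poll(partition, offset)   # then the newer ones
```

Because the offset lives with the consumer rather than the broker mutating the log, many consumers can read the same partition independently at their own pace.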

Basic architecture (image source: https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html)

Topics - A topic is a named stream of data. Messages in Kafka are organized into topics, which are essentially feeds of messages. Each topic can have multiple partitions, which allows for parallelism and scalability. Topics are also highly fault-tolerant and can be replicated across multiple Kafka brokers.

Partitioning - When a message is produced to Kafka, it is assigned to a partition based on the specified partition key (if provided) or using a partitioning algorithm. Kafka guarantees that messages with the same partition key will always go to the same partition, ensuring order within a partition.
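The ordering guarantee above is worth seeing concretely. In this illustrative sketch (the `route` function and event names are invented; Kafka's real partitioner differs), all messages for one key land in a single partition, so their relative order is preserved there:

```python
from collections import defaultdict
from zlib import crc32

def route(messages, num_partitions=3):
    """Assign (key, value) messages to partitions via a stable hash of the key."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[crc32(key.encode()) % num_partitions].append((key, value))
    return dict(partitions)

events = [("order-1", "created"), ("order-2", "created"),
          ("order-1", "paid"), ("order-1", "shipped")]
routed = route(events)
# all "order-1" events share one partition, in their original order
```

Note that ordering holds only within a partition; across partitions (e.g. between "order-1" and "order-2") Kafka makes no ordering promise.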

Brokers - Kafka brokers are the servers responsible for storing and managing the topic partitions. They are the core components of the Kafka cluster. Each broker can handle multiple partitions across different topics.


Replication - Kafka provides fault tolerance through replication. Each partition can be replicated across multiple brokers. When a partition is replicated, one broker is designated as the leader and the others are followers. Each partition has exactly one leader at a time; the followers simply replicate data from the leader, and the leader alone receives produced messages. Thanks to the replicas, a partition's data is not lost even if a broker fails. When a leader goes down, a new leader is elected automatically (historically this was coordinated through ZooKeeper).
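
A failover can be sketched as a tiny state change. This is purely illustrative (real elections go through the cluster controller and in-sync replica sets; the names and the deterministic pick below are invented): when the leader's broker disappears, a surviving follower is promoted.

```python
# One partition's replica set: exactly one leader, the rest followers.
replicas = {"broker-1": "leader", "broker-2": "follower", "broker-3": "follower"}

def elect_new_leader(replicas, failed_broker):
    """Drop the failed broker and promote a surviving replica to leader."""
    replicas.pop(failed_broker)
    new_leader = sorted(replicas)[0]   # deterministic pick, for the sketch only
    replicas[new_leader] = "leader"
    return new_leader
```

The point of the sketch is the invariant: after failover there is still exactly one leader, and the data survives on the remaining replicas.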

ZooKeeper - Historically, Kafka used Apache ZooKeeper for cluster coordination, leader election, and metadata management. However, recent versions of Kafka have been moving away from the ZooKeeper dependency towards internal metadata management (KRaft mode).

Offsets - Every partition contains an ordered stream of data. Each message within a partition is given an incremental ID that represents the message's position within the partition. This ID is referred to as an offset.

Consumer Groups - Consumers are organized into consumer groups. Each message in a topic partition is consumed by only one member of a consumer group. This allows multiple consumers to work together to process a large volume of data in parallel.
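
The "each partition to exactly one group member" rule can be illustrated with a simple round-robin assignment. This mimics the idea only (Kafka's actual assignors are pluggable strategies; `assign` and the names here are invented):

```python
def assign(partitions, consumers):
    """Round-robin each partition to one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

# Four partitions shared by a two-consumer group: no partition is read twice.
assignment = assign(["p0", "p1", "p2", "p3"], ["c1", "c2"])
```

This is why the number of partitions caps a group's parallelism: with more consumers than partitions, the extra consumers sit idle.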

Commit Logs - Kafka stores messages in a distributed commit log. Each partition is essentially an ordered, immutable sequence of messages. Messages are appended to the end of the log and assigned a sequential offset.
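
An append-only log with sequential offsets is a very small data structure. A minimal sketch, assuming nothing about Kafka's on-disk format (the `CommitLog` class is invented for illustration):

```python
class CommitLog:
    """Ordered, immutable sequence: records are only ever appended."""
    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Read the record stored at a given offset."""
        return self._records[offset]
```

Appending never changes existing entries, which is what makes offsets stable identifiers that consumers can safely store and replay from.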

Message Retention - Kafka allows configuring retention policies for topics, specifying how long messages should be retained or how much data should be retained. This allows Kafka to handle use cases ranging from real-time processing to data storage and replay.
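
Time-based retention amounts to dropping records older than a configured window. A sketch under invented names (`apply_retention` and `retention_seconds` are illustrative, not Kafka configuration keys — the real broker setting is `retention.ms` and works on log segments, not individual records):

```python
import time

def apply_retention(log, retention_seconds, now=None):
    """Keep only (timestamp, record) pairs younger than the retention window."""
    now = now if now is not None else time.time()
    return [(ts, rec) for ts, rec in log if now - ts <= retention_seconds]

log = [(100.0, "old"), (900.0, "recent"), (990.0, "new")]
kept = apply_retention(log, retention_seconds=200, now=1000.0)
```

A long (or unlimited) window turns the topic into replayable storage; a short one keeps only a recent buffer for real-time consumers.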

Overall, Kafka's architecture is designed for high scalability, fault tolerance, and real-time processing of streaming data. It's built to handle massive volumes of data with low latency and high throughput, making it a popular choice for building data pipelines, real-time analytics, and event-driven architectures.
