A Bite-Sized Journey into Kafka, Arrow, and Go
CandyFlow isn’t an actual product. It’s a playful concept that shows what happens when you combine Apache Kafka (scalable streaming), Apache Arrow (fast in-memory columnar data), and Go (efficient microservices with concurrency). Used together, these three technologies make for a lean yet powerful data pipeline: in the load tests described below, the query endpoint sustained up to 10k requests per second with a p(95) latency under one millisecond.
1. Why Kafka + Arrow + Go?
- Kafka: A bulletproof message broker that ingests massive volumes of data and streams it in real time.
- Arrow: A columnar in-memory format, perfect for zero-copy reads and near-instant analytics/queries.
- Go: Offers excellent concurrency performance and a lightweight approach for building HTTP endpoints and consumers.
Putting them in Docker Compose means you can spin up a working prototype with minimal overhead, then scale out if you need bigger volumes in production.
2. Under the Hood (Conceptually)
- Producer (Go) → Publishes JSON “candy price” updates to Kafka.
- Consumer (Go + Arrow) → Reads from Kafka, appends each message into an Arrow-based table in memory, then exposes an HTTP endpoint (/cheapest, etc.) to handle user queries instantly.
- topic-init Container → Creates the Kafka topic automatically on startup.
- Zookeeper & Kafka → Provide the robust messaging backbone.
CandyFlow is purely an illustrative name; the “candy price” angle is just for fun. In reality, you could track e-commerce prices, sensor data, or any streaming events that need real-time lookups.
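The consumer's role described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: plain Go slices (one per column) stand in for the Arrow table, hard-coded payloads stand in for messages read from Kafka, and the `PriceUpdate` fields, the `ColumnStore` type, and the `candy` query parameter are all hypothetical names chosen for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// PriceUpdate is a hypothetical shape for one JSON "candy price" message.
type PriceUpdate struct {
	Candy string  `json:"candy"`
	Store string  `json:"store"`
	Price float64 `json:"price"`
}

// ColumnStore keeps one slice per field (a columnar layout). It is a
// stand-in for the Arrow table that the consumer would build.
type ColumnStore struct {
	mu    sync.RWMutex
	candy []string
	store []string
	price []float64
}

// ErrNotFound is returned when no row matches the requested candy.
var ErrNotFound = errors.New("no price recorded for candy")

// AppendJSON decodes one message payload and appends it as a new row.
func (s *ColumnStore) AppendJSON(payload []byte) error {
	var u PriceUpdate
	if err := json.Unmarshal(payload, &u); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.candy = append(s.candy, u.Candy)
	s.store = append(s.store, u.Store)
	s.price = append(s.price, u.Price)
	return nil
}

// Cheapest scans the columns and returns the lowest-priced row for a candy.
func (s *ColumnStore) Cheapest(candy string) (PriceUpdate, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	best := -1
	for i, name := range s.candy {
		if name == candy && (best < 0 || s.price[i] < s.price[best]) {
			best = i
		}
	}
	if best < 0 {
		return PriceUpdate{}, ErrNotFound
	}
	return PriceUpdate{Candy: s.candy[best], Store: s.store[best], Price: s.price[best]}, nil
}

// CheapestHandler serves GET /cheapest?candy=<name> as JSON.
func (s *ColumnStore) CheapestHandler(w http.ResponseWriter, r *http.Request) {
	row, err := s.Cheapest(r.URL.Query().Get("candy"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusNotFound)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(row)
}

func main() {
	s := &ColumnStore{}
	// In the pipeline these payloads would arrive from a Kafka consumer loop.
	for _, msg := range []string{
		`{"candy":"gummy","store":"A","price":1.5}`,
		`{"candy":"gummy","store":"B","price":1.1}`,
	} {
		if err := s.AppendJSON([]byte(msg)); err != nil {
			fmt.Println("skipping bad message:", err)
		}
	}
	// Exercise the handler in-process instead of opening a real port.
	rec := httptest.NewRecorder()
	s.CheapestHandler(rec, httptest.NewRequest(http.MethodGet, "/cheapest?candy=gummy", nil))
	fmt.Print(rec.Body.String())
}
```

A read-write mutex lets many concurrent queries scan the columns while the single writer appends. An Arrow-backed version would swap the slices for Arrow builders and arrays but could keep the same append-then-query structure.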
3. The Performance Numbers
Using k6 load tests, we hammered the consumer endpoint (/cheapest):
- Ramping from 1k RPS to 10k RPS.
- Achieved a p(95) latency of ~0.4–0.5 ms.
- Zero HTTP errors across millions of requests.
- Only rare outliers around 200 ms, likely due to minor GC/network blips.
This level of throughput and sub-millisecond latency is exceptional and shows how Arrow’s columnar structure + Go’s concurrency + Kafka’s streaming capabilities come together seamlessly.
4. Not a Product, but a Teaching Tool
Remember: CandyFlow is not a real candy-price aggregator. It’s an example designed to:
- Demonstrate the synergy of Kafka (for ingestion), Arrow (for in-memory performance), and Go (for concurrency and HTTP).
- Show that near real-time queries (sub-millisecond p(95)) are achievable under heavy load (up to 10k RPS in these tests).
- Inspire you to apply this same concept to e-commerce price trackers, IoT sensor data streams, or real-time analytics.
5. Closing Thoughts
- Cost-Effective & Scalable: The Docker Compose approach is quick to launch and test. You can expand partitions/replicas for bigger use cases.
- Minimal Complexity: A few containers, a small amount of Go code, and a straightforward Arrow schema are all it takes.
- Impressive Performance: Sub-millisecond latencies at 10k+ RPS without throwing specialized hardware or monstrous clusters at the problem.
CandyFlow stands as a sweet demonstration of what’s possible with Kafka, Arrow, and Go—and hopefully sparks ideas for your own real-world streaming and analytics needs!