How Pinterest uses Kafka for Long-Term Data Storage

Published at: 1/15/2025
Categories: programming, devops, career, learning
Author: the_infinity

I spent hours diving into this so you don’t have to!

Here is what I learned:

  • Pinterest doesn't store all data on Kafka brokers forever.
  • Older data is moved to remote storage such as Amazon S3.
  • They built a tool called Segment Uploader to automate this process.
  • The Segment Uploader periodically transfers older data from Kafka brokers to remote storage.
  • Segment Uploader runs as a sidecar alongside the Kafka broker.
  • They also developed a specialized Consumer Library to fetch data intelligently.
  • The library fetches old data directly from remote storage and new data from Kafka brokers.
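
To make the flow above concrete, here is a minimal in-memory sketch of the two pieces: an uploader that moves older segments off the broker, and a consumer-side fetch that reads from whichever tier holds the requested offset. It is not Pinterest's code. The names `Segment`, `Broker`, `RemoteStore`, `upload_older_segments`, `fetch`, and the `keep_local` parameter are all hypothetical, and plain Python lists and dicts stand in for Kafka log segments and S3 objects.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Segment:
    base_offset: int          # offset of the first record in this segment
    records: list[str]

    def contains(self, offset: int) -> bool:
        return self.base_offset <= offset < self.base_offset + len(self.records)


@dataclass
class Broker:
    # Ordered oldest to newest; the last segment is the one still being written.
    segments: list[Segment]


@dataclass
class RemoteStore:
    # Stand-in for remote storage such as S3, keyed by segment base offset.
    segments: dict[int, Segment] = field(default_factory=dict)


def upload_older_segments(broker: Broker, remote: RemoteStore, keep_local: int = 1) -> int:
    """Move all but the newest `keep_local` segments to remote storage.

    Assumption: a real uploader would run periodically next to the broker;
    here it is a single function call that returns how many segments moved.
    """
    if keep_local < 1:
        raise ValueError("keep_local must be at least 1")
    older = broker.segments[:-keep_local]
    for seg in older:
        remote.segments[seg.base_offset] = seg
    broker.segments = broker.segments[-keep_local:]
    return len(older)


def fetch(offset: int, broker: Broker, remote: RemoteStore) -> tuple[str, str]:
    """Return (source, record): new data from the broker, old data from remote."""
    for seg in broker.segments:
        if seg.contains(offset):
            return "broker", seg.records[offset - seg.base_offset]
    for seg in remote.segments.values():
        if seg.contains(offset):
            return "remote", seg.records[offset - seg.base_offset]
    raise KeyError(f"offset {offset} not found in either tier")


broker = Broker([Segment(0, ["a", "b"]), Segment(2, ["c", "d"]), Segment(4, ["e"])])
remote = RemoteStore()
moved = upload_older_segments(broker, remote)

print(moved)                      # number of segments moved to remote
print(fetch(1, broker, remote))   # an old offset
print(fetch(4, broker, remote))   # a recent offset
```

The point of the sketch is that the caller of `fetch` never needs to know where a record lives, which is the job the Consumer Library does in the setup described above.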

By combining Kafka’s real-time capabilities with cost-efficient remote storage, Pinterest ensures scalability, reliability, and efficient long-term data management.


P.S. I recently published an article in my free newsletter covering this case study in depth, with visuals: https://designsystemsweekly.substack.com/p/how-pinterest-leverages-kafka-for
