How Pinterest uses Kafka for Long-Term Data Storage

Published at: 1/15/2025

Categories: programming, devops, career, learning

Author: the_infinity

Main Article: https://dev.to/the_infinity/how-pinterest-uses-kafka-for-long-term-data-storage-1b18

I spent hours diving into this so you don’t have to!

Here is what I learned:

Pinterest doesn't store all data on Kafka brokers forever.
Older data is moved to remote storage such as Amazon S3.
They built a tool called Segment Uploader to automate this process.
The Segment Uploader periodically transfers older data from Kafka brokers to remote storage.
Segment Uploader runs as a sidecar alongside the Kafka broker.
They also developed a specialized Consumer Library to fetch data intelligently.
The library fetches old data directly from remote storage and new data from Kafka brokers.
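
To make the upload step concrete, here is a minimal sketch of what one periodic pass of a segment uploader could look like. It is not Pinterest's implementation: the `Segment` class, the `upload_closed_segments` function, and the plain dict standing in for remote storage are all hypothetical names chosen for illustration, and the rule "every segment except the last one is closed and safe to upload" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical model of one log segment on a broker."""
    base_offset: int   # offset of the first record in the segment
    records: tuple     # record payloads, in offset order


def upload_closed_segments(broker_segments, remote_store):
    """Copy every closed segment that is not yet in remote_store.

    Assumption: the last segment in the list is the active one that is
    still being written, so it is skipped. Returns the base offsets that
    were uploaded during this pass.
    """
    uploaded = []
    for segment in broker_segments[:-1]:
        if segment.base_offset not in remote_store:
            remote_store[segment.base_offset] = segment
            uploaded.append(segment.base_offset)
    return uploaded


broker_log = [
    Segment(0, ("a", "b", "c")),
    Segment(3, ("d", "e")),
    Segment(5, ("f",)),  # active segment, left on the broker
]
remote = {}  # stand-in for an object store such as Amazon S3

first_pass = upload_closed_segments(broker_log, remote)
second_pass = upload_closed_segments(broker_log, remote)
print(first_pass, second_pass)
```

Running the pass twice shows why a periodic uploader should be idempotent: the second call finds nothing new to copy.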

By combining Kafka’s real-time capabilities with cost-efficient remote storage, Pinterest ensures scalability, reliability, and efficient long-term data management.
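
The routing idea behind the consumer library can be sketched in a few lines. This is an illustration under stated assumptions, not the actual library: `fetch`, `remote_segments`, `broker_records`, and `broker_start_offset` are hypothetical names, and the rule "anything at or above the broker's earliest retained offset is served by the broker" is an assumption about how the split might be decided.

```python
def fetch(offset, remote_segments, broker_records, broker_start_offset):
    """Return (source, record) for the requested offset.

    remote_segments maps a segment's base offset to its list of records
    (a stand-in for objects in remote storage). broker_records holds the
    records still on the broker, starting at broker_start_offset.
    """
    # Recent data: serve it from the broker.
    if offset >= broker_start_offset:
        index = offset - broker_start_offset
        if index >= len(broker_records):
            raise KeyError(f"offset {offset} has not been written yet")
        return "broker", broker_records[index]

    # Older data: find the remote segment that covers this offset.
    for base_offset, records in remote_segments.items():
        if base_offset <= offset < base_offset + len(records):
            return "remote", records[offset - base_offset]

    raise KeyError(f"offset {offset} not found in remote storage")


remote_segments = {0: ["a", "b", "c"], 3: ["d", "e"]}
broker_records = ["f", "g"]

old = fetch(1, remote_segments, broker_records, broker_start_offset=5)
boundary = fetch(4, remote_segments, broker_records, broker_start_offset=5)
recent = fetch(6, remote_segments, broker_records, broker_start_offset=5)
print(old, boundary, recent)
```

The caller never needs to know where a record lives. It asks for an offset and the function picks the source.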

PS - I recently published an article in my free newsletter covering this case study in depth, with visuals: https://designsystemsweekly.substack.com/p/how-pinterest-leverages-kafka-for
