dev-resources.site
for different kinds of informations.
Google Pub/Sub
Introduction: The Power of Event-Driven Architectures
Have you ever wondered how large-scale systems like Google, Netflix, or Spotify manage to process millions of real-time events, such as sending notifications, updating dashboards, or recommending personalized content, all without missing a beat? The secret lies in event-driven architectures, and at the heart of such systems is a powerful tool: Google Cloud Pub/Sub.
In a world where businesses demand seamless integration and instant processing of data, traditional point-to-point messaging systems often fall short. Enter Google Pub/Sub, a fully managed messaging service designed to decouple applications, handle massive data streams, and enable real-time communication with unparalleled reliability and scalability.
Why Pub/Sub is Essential for Modern Systems
Modern applications require more than just static workflows; they thrive on dynamic, event-driven processes that can adapt and respond to real-time data. Here are some key reasons why Google Pub/Sub has become indispensable:
Real-Time Processing: From IoT devices generating millions of sensor readings to financial systems detecting fraudulent transactions, Pub/Sub ensures these events are captured and processed in real time.
Scalability: Pub/Sub seamlessly scales to handle millions of messages per second, supporting the ever-growing demands of modern businesses.
Decoupling of Services: It simplifies architectures by allowing independent components to communicate asynchronously, improving maintainability and flexibility.
Global Reach: Built on Google's global infrastructure, Pub/Sub offers low-latency message delivery across regions.
Integration-Friendly: It integrates seamlessly with other Google Cloud services like BigQuery, Dataflow, Cloud Functions, and Cloud Storage, making it a cornerstone for building robust, interconnected systems.
Google Cloud Pub/Sub is a fully managed messaging service designed for real-time communication, enabling independent applications to send and receive messages seamlessly. It's a powerful tool for building scalable and reliable systems.
In this tutorial, we'll delve into the core concepts of Pub/Sub, explore practical examples using the command-line interface (CLI) and Python SDK, and discuss various use cases.
Key Concepts
- Topic: A named resource to which messages are published.
- Subscription: A named resource that receives messages from a specific topic.
- Publisher: An application that sends messages to a topic.
- Subscriber: An application that receives messages from a subscription.
- Message: The core unit of communication in Pub/Sub, consisting of data (payload) and attributes (key-value metadata).
- Acknowledgment (Ack): Subscribers must confirm receipt of a message by acknowledging it. Unacknowledged messages are redelivered to ensure reliable processing.
- Push vs. Pull Subscriptions
- Pull: Subscribers explicitly fetch messages.
- Push: Pub/Sub pushes messages to an HTTP(S) endpoint.
- Dead Letter Queue (DLQ): A separate topic for routing undeliverable messages after a defined number of delivery attempts. Useful for troubleshooting.
- Retention Policy: Controls how long messages are stored if they remain unacknowledged or acknowledged. Default: 7 days.
- Filters: Enable subscribers to receive only messages that match specific criteria based on attributes.
- Ordering Keys: Ensures messages with the same ordering key are delivered in the order they were published.
- Exactly Once Delivery: Guarantees no duplicate messages are delivered when configured correctly.
- Flow Control: Helps subscribers manage the rate of message delivery to prevent being overwhelmed.
- IAM Permissions and Roles: Pub/Sub uses Google Cloud IAM to control access.
- Key roles:
- Publisher: roles/pubsub.publisher
- Subscriber: roles/pubsub.subscriber
- Viewer: roles/pubsub.viewer
- Message Deduplication: Prevents duplicate messages caused by retries or network issues.
- Message Expiration (TTL): Automatically drops messages from a subscription after a specified time-to-live, reducing storage costs.
- Backlog: The collection of unacknowledged messages in a subscription, enabling message replay if required.
- Schema Registry: Defines structured message formats (e.g., Avro, Protocol Buffers), ensuring consistent data validation.
- Batching: Groups multiple messages together for efficient publishing and delivery, improving performance.
- Regional Endpoints: Publish messages to region-specific endpoints for compliance and reduced latency.
- Cross-Project Topics and Subscriptions: Share Pub/Sub resources across projects for multi-project architectures.
- Monitoring and Metrics: Integrated with Cloud Monitoring, providing insights like throughput, acknowledgment latency, and backlog size.
- Snapshot: Captures the state of a subscription at a specific time, enabling message replay from that point.
- Message Encryption: Messages are encrypted in transit and at rest by default, with the option to use Customer-Managed Encryption Keys (CMEK).
Getting Started
Pre-requisites
- Set Up Your Google Cloud Project:
- Create a new Google Cloud project or select an existing one.
- Enable the Pub/Sub API.
-
Install the Google Cloud SDK:
- Download and install the SDK from the official website.
- Authenticate your Google Cloud account using
# using cli gcloud auth login
Creating a Topic and Subscription
# Create a topic using cli
gcloud pubsub topics create my-topic
# Create a subscription using cli
gcloud pubsub subscriptions create my-subscription --topic my-topic
# Using the Python SDK
from google.cloud import pubsub_v1
# Create a Publisher client
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("your-project-id", "my-topic")
publisher.create_topic(topic_path)
# Create a Subscriber client
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("your-project-id", "my-subscription")
subscriber.create_subscription(subscription_path, topic_path)
Publishing Messages
# using cli
gcloud pubsub topics publish my-topic \
--message "Hello, Pub/Sub!"
# using the Python SDK
# Publish a message
message = "Hello, Pub/Sub!"
future = publisher.publish(topic_path, message.encode("utf-8"))
print(future.result())
Subscribing to Messages
# Using the CLI
gcloud pubsub subscriptions pull my-subscription --auto-ack
# Using the Python SDK
def callback(message):
print("Received message: {}".format(message.data))
message.ack()
subscriber.subscribe(subscription_path, callback=callback)
# The subscriber will stay alive until interrupted
while True:
time.sleep(60)
Use Cases with Practical Examples
Triggers with Cloud Functions
Use Case: Automatically trigger a process whenever a new message is published to a Pub/Sub topic. For example, sending an email notification when a new user registers.
Solution: Use Cloud Functions to listen to Pub/Sub topics and handle messages.
Steps:
Pre-Requisites
-- Enable the apis run.googleapis.com, cloudbuild.googleapis.com, eventarc.googleapis.com
-- Grant the role roles/logging.logWriter
to the default compute servie account
gcloud projects add-iam-policy-binding $GCP_PROJECT \
--member="serviceAccount:<ID>[email protected]" \
--role="roles/logging.logWriter"
-
Create a Pub/Sub topic:
gcloud pubsub topics create user-registration
-
Write a Cloud Function:
def send_email_notification(event, context): import base64 message = base64.b64decode(event['data']).decode('utf-8') print(f"Sending email notification for: {message}")
-
Deploy the Cloud Function:
gcloud functions deploy sendEmailNotification \ --runtime python39 \ --trigger-topic user-registration \ --entry-point send_email_notification
-
Publish a message to test:
gcloud pubsub topics publish user-registration --message "New User: John Doe"
Result: The Cloud Function logs the email
notification process.
Streaming Data to BigQuery
Use Case: Process and analyze large volumes of event data, such as IoT sensor readings or transaction logs, by streaming messages from Pub/Sub to BigQuery.
Solution: Use a pre-built Dataflow template to ingest data into BigQuery.
Steps:
-
Create a Pub/Sub topic and subscription:
gcloud pubsub topics create sensor-data gcloud pubsub subscriptions create sensor-data-sub --topic=sensor-data
-
Enable BigQuery API and create a BigQuery dataset and table:
# schema.json [ { "name": "temperature", "type": "FLOAT", "mode": "NULLABLE" }, { "name": "humidity", "type": "FLOAT", "mode": "NULLABLE" } ]
```
bq mk sensor_dataset
bq mk --table sensor_dataset.sensor_table \
schema.json
```
-
Enable the Dataflow API and run the Dataflow job:
# enable dataflow api https://console.developers.google.com/apis/api/dataflow.googleapis.com/overview?project=your-project-id
```bash
# run the dataflow job
gcloud dataflow jobs run pubsub-to-bigquery \
--gcs-location gs://dataflow-templates-us-central1/latest/PubSub_to_BigQuery \
--region us-central1 \
--parameters inputTopic=projects/your-project-id/topics/sensor-data,outputTableSpec=your-project-id:sensor_dataset.sensor_table
```
-
Publish messages to the topic:
gcloud pubsub topics publish sensor-data --message '{"temperature": 25, "humidity": 60}'
Result: The message data should appear in the BigQuery table for further analysis.
Logging Events with Sinks
Use Case: Centralize and process logs by routing Cloud Logging events to a Pub/Sub topic for downstream processing, like alerting or archiving.
Solution: Create a logging sink targeting a Pub/Sub topic.
Steps:
- Create a Pub/Sub topic:
gcloud pubsub topics create log-events
- Create a logging sink:
gcloud logging sinks create log-sink \
pubsub.googleapis.com/projects/your-project-id/topics/log-events
- Set IAM permissions for the sink service account:
gcloud pubsub topics add-iam-policy-binding log-events \
--member=serviceAccount:service-account-email \
--role=roles/pubsub.publisher
- Create a subscriber to process the logs:
from google.cloud import pubsub_v1
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path('your-project-id', 'log-events-sub')
def callback(message):
print(f"Received log: {message.data.decode('utf-8')}")
message.ack()
subscriber.subscribe(subscription_path, callback=callback)
print("Listening for log events...")
Result: Log messages flow into Pub/Sub and can be processed by your custom subscriber.
Streaming Messages to Cloud Storage
Use Case: Archive real-time data, such as application usage metrics or user activity logs, to Cloud Storage for long-term storage.
Solution: Use Dataflow to write Pub/Sub messages to a Cloud Storage bucket.
Steps:
- Create a Pub/Sub topic and subscription:
gcloud pubsub topics create app-metrics
gcloud pubsub subscriptions create app-metrics-sub --topic=app-metrics
- Create a Cloud Storage bucket:
gsutil mb gs://your-bucket-name
- Run the Dataflow job:
gcloud dataflow jobs run pubsub-to-gcs \
--gcs-location gs://dataflow-templates-us-central1/latest/PubSub_to_Cloud_Storage_Text \
--region us-central1 \
--parameters inputTopic=projects/your-project-id/topics/app-metrics,outputDirectory=gs://your-bucket-name/data/,outputFilenamePrefix=metrics,outputFileSuffix=.json
- Publish messages:
gcloud pubsub topics publish app-metrics --message '{"event": "page_view", "user": "123"}'
Result: Messages are written as JSON files in the specified Cloud Storage bucket.
Conclusion
Google Pub/Sub is a versatile tool for building scalable and reliable applications. By understanding the core concepts and leveraging the CLI and Python SDK, you can effectively utilize Pub/Sub to solve a variety of real-world problems.
Featured ones: