The Adventures of Blink S2e9: Gathering Metrics with Prometheus and Grafana

Published at
11/7/2024
Categories
monitoring
flask
python
devops
Author
Ben Link

Hey friends, and welcome to the next Adventure of Blink! If you've been following along, we've done a ton of cool stuff this season:

  • We learned about Docker
  • We configured a MongoDB (in a Docker container with persistent storage)
  • We made a Flask API for the database (also in a Docker container)
  • We've explored Test-Driven Development practices with PyTest
  • We made our tests run every time we commit to our repository using GitHub Actions
  • We created a graphical interface for our program using Tkinter
  • We scanned our project for security vulnerabilities using Snyk

...suffice it to say, we've been really busy. But we're not through yet! Today we're covering another oft-overlooked topic:

Observability!

Observability is a core component of the DevOps mindset... because it's a place where Dev and Ops can easily interact. Ops is usually on the receiving end of support tickets and user complaints... but it's hard to diagnose something like "the application is slow!" without firm evidence of what happened. But if your developers aren't considering metrics and observability behavior when they're coding, you're not going to have those metrics for Ops to confirm a user's complaint.

TL/DR: Youtube

When to apply metrics

The answer to this is... as early as possible! Build metrics into your code while you're writing it, and become accustomed to using them throughout the development process.

Why are they important?

Using metrics in the development process ensures that you understand from the beginning how the application behaves. You'll want to consider things like load testing as you complete your work, ensuring that you see how your code behaves when there are lots of users running it at once.
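To make the load-testing idea a little more concrete, here's a minimal sketch of a load generator that uses only the Python standard library. It isn't part of the Hangman project: the function names are made up for illustration, and the URL assumes the API is published on localhost port 5001, so adjust it to your own setup.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def run_load(call, total_requests=50, workers=10):
    """Invoke `call` total_requests times on a thread pool.

    Returns a list with the latency of each call, in seconds.
    """

    def timed_call(_):
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(timed_call, range(total_requests)))


def summarize(latencies):
    """Return a few basic statistics for a list of latencies."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        # Rough 95th percentile: the value 95% of the way through the sorted list
        "p95": ordered[int(0.95 * (len(ordered) - 1))],
        "max": ordered[-1],
    }


def hit_getall(url="http://localhost:5001/getall"):
    # Assumed URL: change host and port to wherever your API is reachable.
    with urlopen(url, timeout=5) as response:
        response.read()


# Example (needs the API to be running):
# print(summarize(run_load(hit_getall)))
```

Running something like this while you watch your metrics is a cheap way to see how the code behaves with several users at once. For serious load testing you'd likely reach for a dedicated tool instead.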

How metrics are created

We're going to introduce two products to our application environment: Prometheus and Grafana.

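Prometheus is a collection mechanism for the metrics we define in our code. It runs in a Docker container as part of our environment and, on a fixed interval, scrapes a metrics endpoint that our code exposes... yes, that means we have some code changes to make, but it should be pretty easy work. Grafana, which also runs in its own container, is where we'll turn those collected metrics into visual dashboards.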

What metrics are important to us?

In our Hangman game, there's not a lot of processing going on. As a result, adding metrics for the performance of the application itself? Probably not all that useful.

A place where metrics would be useful would be around the API calls. If anything's going to malfunction, it's going to be the data extraction code... after all, that's the place where multiple containers get involved and where data has to flow from one system to another seamlessly. So we'll add our metrics instrumentation to the API code.

Setting up the tools

First, let's add the Prometheus and Grafana containers to our docker-compose.yml file:

  prometheus:
    image: prom/prometheus:latest
    volumes:
      # This prometheus.yml file we will create shortly 😉
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  mongo-exporter:
    image: bitnami/mongodb-exporter:latest
    environment:
      # Note the name of our mongo container here
      MONGODB_URI: "mongodb://mongo:27017" 
    depends_on:
      - mongo

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

Next, let's build the prometheus.yml file that establishes the configuration for Prometheus:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'flask-api'
    # We'll have to create a /metrics endpoint in the API...
    metrics_path: '/metrics'  # Endpoint from Flask app
    static_configs:
      # This target is our api container and port
      - targets: ['hangman-api:5001']

  - job_name: 'mongo'
    metrics_path: '/metrics'  # Endpoint for the MongoDB Exporter
    static_configs:
      - targets: ['mongo-exporter:9216']  # Default port for MongoDB Exporter

That leads us to make our code changes in the API:

from flask import Flask, jsonify, request
# Adding in prometheus_client to help us build the metrics additions
from prometheus_client import generate_latest, Counter
from prometheus_client import multiprocess, CollectorRegistry, Gauge, Histogram
from pymongo import MongoClient
from pymongo.errors import PyMongoError
from bson.objectid import ObjectId
from datetime import datetime
import os

app = Flask(__name__)

# MongoDB connection
mongo_uri = os.getenv("MONGO_URI_API")
db_name = os.getenv("DB_NAME")
collection_name = os.getenv("COLLECTION_NAME")

# When testing locally, we bypass the .env and load the variables manually
# mongo_uri = "mongodb://blink:theadventuresofblink@localhost:27017/hangman?authSource=admin"
# db_name = "hangman"
# collection_name = "phrases"

client = MongoClient(mongo_uri)
db = client[db_name]
collection = db[collection_name]

# Here's where we set up the metrics objects we're going to need:
REQUEST_COUNT = Counter('flask_app_requests_total', 'Total number of requests to the app')
REQUEST_LATENCY = Histogram('flask_app_request_latency_seconds', 'Latency of requests to the app')

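# This route is what Prometheus scrapes to collect the metrics.
# generate_latest() is a library function that serializes every
# registered prometheus_client metric into the text format that
# Prometheus reads. We set the Content-Type explicitly because
# Flask would otherwise label the response as HTML.
@app.route('/metrics')
def metrics():
    return generate_latest(), 200, {'Content-Type': 'text/plain; version=0.0.4; charset=utf-8'}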

@app.route('/getall', methods=['GET'])
def get_all_items():
    # This is an example of how to instrument a method.
    # Notice we increment the request count, and then
    # the request latency is measured by putting the entire
    # method's code inside a `with` statement that captures
    # its timing
    REQUEST_COUNT.inc()
    with REQUEST_LATENCY.time():
        try:
            # Find all records in the collection
            words = list(collection.find({}, {"_id": 0}))  # Exclude _id field from the response
            return jsonify(words), 200
        except Exception as e:
            return jsonify({"error": str(e)}), 500

For brevity's sake I didn't include the rest of the API code... but each route needs to be instrumented individually. You can add more metrics if you'd like to observe different behaviors separately.
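If repeating the same two lines in every route starts to feel tedious, one option is to wrap them in a decorator. This is a sketch rather than what the Hangman API actually does: `instrumented` is a name I made up, and it only assumes that the counter has an `inc()` method and the histogram has a `time()` context manager, which is what `prometheus_client`'s `Counter` and `Histogram` provide.

```python
import functools


def instrumented(counter, histogram):
    """Build a decorator that counts calls to a function and times them.

    `counter` needs an inc() method and `histogram` needs a time()
    context manager (as prometheus_client's Counter and Histogram do).
    """

    def decorator(view):
        # functools.wraps keeps the original function name, which Flask
        # uses as the endpoint name, so decorated routes stay unique.
        @functools.wraps(view)
        def wrapper(*args, **kwargs):
            counter.inc()
            with histogram.time():
                return view(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical usage in the API module (the route decorator goes on top):
#
# @app.route('/getall', methods=['GET'])
# @instrumented(REQUEST_COUNT, REQUEST_LATENCY)
# def get_all_items():
#     ...
```

Passing the metric objects in as arguments also makes it easy to give a route its own counter and histogram when you want to observe it separately.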

Another note: make sure you add prometheus_client to the API's requirements.txt!

blinker==1.8.2
click==8.1.7
dnspython==2.7.0
Flask==3.0.3
itsdangerous==2.2.0
Jinja2==3.1.4
MarkupSafe==3.0.2
prometheus_client==0.21.0
pymongo==4.10.1
Werkzeug==3.1.1

Validating that it all works

Now that we've finished setup, we can start up our application:

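# Older installs that use the standalone docker-compose binary
docker-compose up --build

# Current Docker Desktop / Docker Engine installs with the Compose plugin
docker compose up --build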
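Once the containers are running, a quick sanity check is to fetch the API's /metrics endpoint yourself and confirm that our metric names show up. Here's a small standard-library sketch of that. The helper names are hypothetical, the URL assumes the API is published on localhost port 5001, and the parser is deliberately simplified (it ignores comment lines and doesn't handle timestamps or label values that contain spaces).

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition output into {series: value}.

    Simplified: skips '#' comment lines and assumes each sample line is
    '<series> <value>' with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url="http://localhost:5001/metrics"):
    # Assumed URL: change host and port to wherever your API is reachable.
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Example (needs the API to be running):
# print(fetch_metrics().get("flask_app_requests_total"))
```

If `flask_app_requests_total` goes up after you call `/getall` a few times, the instrumentation is doing its job.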

Our new containers are now reachable at http://localhost:9090 (Prometheus) and http://localhost:3000 (Grafana). Let's start in Prometheus, and set up some metrics queries:

...

Then once we've got them created, we can head over to Grafana to visualize them:

...
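If you're wondering what numbers are worth charting, two common ones fall straight out of the metrics we defined: requests per second (from the counter) and average latency (from the histogram's `_sum` and `_count` series). The arithmetic behind them can be sketched in plain Python. This is a simplification for illustration only; Prometheus's own `rate()` function also deals with counter resets and extrapolation, which this ignores.

```python
def per_second_rate(count_then, count_now, seconds_elapsed):
    """Average requests per second between two readings of a counter."""
    return (count_now - count_then) / seconds_elapsed


def average_latency(sum_then, sum_now, count_then, count_now):
    """Mean request duration over a window.

    Uses two readings of a histogram's _sum (total seconds observed)
    and _count (number of observations) series.
    """
    observed = count_now - count_then
    if observed == 0:
        return 0.0  # no requests happened in the window
    return (sum_now - sum_then) / observed


# Example: the counter went from 120 to 150 between two scrapes 15 seconds
# apart, and the latency sum grew from 3.0 to 4.5 seconds over the same window.
requests_per_second = per_second_rate(120, 150, 15)
mean_latency = average_latency(3.0, 4.5, 120, 150)
```

With those example readings, that works out to 2 requests per second at an average of about 0.05 seconds each.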

Wrapping up

These examples are small and somewhat contrived, in keeping with our theme of exploring these concepts in a small app so we can see how they work without the distraction of scale and complexity. But hopefully you can see from even these examples how much power you have as a developer to see what's happening within your code! This may seem like a lot of work for something that doesn't actually make our game any better or more interesting to play, but the value of metrics is in being able to diagnose more easily when something isn't right.

I hope you've learned a lot this week! We are nearly to the end of Season 2, and I'll tell ya what... it has been such a ride. Our season finale is going to be the long-awaited AI integration - we're going to let a Large Language Model build hangman games for us to play! So tune in next week for another Adventure of Blink!
