
What is Observability?

Published at
1/3/2025
Categories
opensource
metrics
prometheus
observability
Author
michgboxy

Observability is the ability to understand a system's internal state and behavior by examining its external outputs, such as metrics, logs, and traces.

It can be likened to a doctor's ability to diagnose a patient based on the patient's complaints and symptoms.


Observability & the Doctor's analogy:

  1. The Patient's Complaints = External Outputs

    Like a patient shares symptoms, a system provides metrics, logs, and traces as its external outputs.

  2. The Doctor's Diagnosis = Understanding the System's Internal State

    A doctor uses the patient's symptoms to diagnose the illness. In the same way, we use observability tools to analyze a system's outputs and understand its internal state and behavior.

  3. Medical Tests = Observability Tools

    A doctor may use X-rays, scans, and a series of other tests to gather more data about the patient's underlying illness, just as we use monitoring tools, log aggregators, tracing, and visualization systems to get a more detailed insight into a system.


Why Observability Matters

A system's internal state determines its behavior. Observability gives us insight into what is happening inside the system, so we can make sense of that behavior.


The Three Pillars of Observability

1. Metrics

[Figure: system metrics in Grafana, showing several measurable data points]
Metrics are the data points used to measure a system's performance and resource usage over time.

  • Examples are: CPU usage, memory utilization, and request latency.

Purpose: These data points reflect how well the system is performing, so having them on hand points us toward the cause of performance-related issues.
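To make the idea of "data points over time" concrete, here is a minimal sketch in Python using only the standard library. The `MetricsRegistry` class, the metric name `request_latency_ms`, and the latency values are all assumptions made up for illustration; a real setup would normally rely on a metrics tool such as Prometheus rather than a hand-rolled store.

```python
import statistics
import time
from collections import defaultdict


class MetricsRegistry:
    """Hypothetical in-memory store: metric name -> list of (timestamp, value)."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, name, value):
        # Each sample is timestamped so the metric can be viewed over time.
        self.samples[name].append((time.time(), value))

    def mean(self, name):
        return statistics.mean(value for _, value in self.samples[name])

    def maximum(self, name):
        return max(value for _, value in self.samples[name])


registry = MetricsRegistry()

# Made-up request latencies in milliseconds, for illustration only.
for latency_ms in (120, 95, 480, 110):
    registry.record("request_latency_ms", latency_ms)

print(registry.mean("request_latency_ms"))     # 201.25
print(registry.maximum("request_latency_ms"))  # 480
```

Even in this toy version, the maximum (480 ms) stands out against the mean, which is the kind of signal that tells us where to start looking.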

2. Logs

Logs are the event record of a system: a detailed, time-stamped account of the events that occurred in it.

Examples:

  • "User X cannot add a project on trackmention.com at 10:12:15."
  • "trackmention.com database connection timeout at 15:05:17."

Purpose: These logs provide information about what happened in the system at a specific time, helping pinpoint issues.
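As a rough sketch, log lines like the examples above could be produced with Python's standard `logging` module. The logger name `trackmention` and the in-memory buffer are assumptions for illustration; a real service would usually write to stdout or a file and ship the lines to a log aggregator.

```python
import io
import logging

# Write to an in-memory buffer here so the output is easy to inspect.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)

# %(asctime)s adds the timestamp that makes each event traceable in time.
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("trackmention")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep these lines out of the root logger

logger.warning("User %s cannot add a project", "X")
logger.error("database connection timeout")

print(log_buffer.getvalue())
```

Each line carries a timestamp, a severity level, and a message, which is what lets us answer "what happened, and when?" later on.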

3. Traces

[Figure: traces of system requests and their timestamps of occurrence]

[Figure: end-to-end trace of a request's journey and how long it took at each point]

Traces provide the end-to-end record of system requests. For example:

An endpoint that stores user registration data has 3 layers: the handler, the controller/service, and the store layer. Tracing gives us a record of how the request:

  • Hits the route/handler layer.
  • Gets passed to the controller/service layer.
  • Finally reaches the store layer where it is saved in the database.

Purpose: The tracing record provides insights, such as how long it takes for each layer to process the request, so we can easily recognize the part of our system with performance bottlenecks.
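The three-layer journey described above can be sketched with a small timing helper. This is a simplified, single-process illustration and not how Jaeger or OpenTelemetry are implemented; the `span` helper, the function names, and the `time.sleep` stand-in for a database write are all assumptions.

```python
import time
from contextlib import contextmanager

spans = []  # finished spans as (name, duration_in_seconds)


@contextmanager
def span(name):
    """Record how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))


def store_user(user):
    with span("store"):
        time.sleep(0.02)  # stand-in for a database write


def register_service(user):
    with span("service"):
        store_user(user)


def register_handler(user):
    with span("handler"):
        register_service(user)


register_handler({"name": "X"})

# Inner spans finish first, so the order is store, service, handler.
for name, duration in spans:
    print(f"{name}: {duration * 1000:.1f} ms")
```

Each outer span includes the time spent in the layers it calls, so comparing the durations shows which layer contributes most to the total request time.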


Tools for Observability

  • Metrics: Prometheus, Datadog, CloudWatch
  • Logs: Loki, ELK stack (Elasticsearch, Logstash, Kibana)
  • Traces: Jaeger, OpenTelemetry

Each tool has its unique strengths. For instance:

  • Prometheus excels at real-time metrics collection.
  • The ELK stack is ideal for centralized log management.
  • Jaeger specializes in distributed tracing, while OpenTelemetry provides vendor-neutral instrumentation for traces, as well as metrics and logs.

Summary

Observability is essential for understanding the internal workings of a system. By using metrics, logs, and traces along with tools like Prometheus, ELK Stack, and OpenTelemetry, we can diagnose system issues effectively, just like a doctor uses symptoms and tests to diagnose a patient.
