Observability - 4 (Custom Metrics Instrumentation)
This video explores custom metrics instrumentation and alerting using Prometheus and Node.js. It emphasizes the importance of instrumentation in observability and walks through practical examples of metric types such as counters and gauges. It concludes with a demonstration of setting up alerts through Alertmanager, showcasing real-time notifications for application crashes.
What are the different metric types in Prometheus?
Prometheus supports four main types of metrics:
Counter: A counter is a cumulative metric whose value only increases, or resets to zero when the process restarts. It is used to represent values that can only go up (e.g., the number of requests received, errors encountered). Counters cannot be decremented.
Gauge: A gauge is a metric that represents a single numerical value that can go up or down. It is used for values that can fluctuate over time (e.g., current temperature, memory usage).
Histogram: A histogram samples observations (e.g., request durations) and counts them in predefined buckets. It allows you to understand the distribution of the observed values, such as how many requests fall into different duration categories.
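Summary: A summary also samples observations (like a histogram) but reports a total count, the sum of all observed values, and configurable quantiles (e.g., 50th, 95th percentiles) that are calculated inside the application. It is useful for calculating averages or estimating distributions, but the client-side quantile calculation costs more in the application than incrementing histogram buckets, and the precomputed quantiles cannot be meaningfully aggregated across instances.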
Each metric type serves specific use cases, enabling better monitoring and observability of your applications.
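The behavior of the four types can be sketched in a few lines of plain JavaScript. This is a dependency-free illustration only: the class names, bucket bounds, and sample values below are assumptions made for the example, not the API of any Prometheus client library, and the summary uses a naive exact quantile where real libraries use streaming estimates.

```javascript
// Illustrative, dependency-free models of the four metric types.
// These classes are NOT a client library API; names are hypothetical.

class Counter {
  constructor() { this.value = 0; }
  inc(amount = 1) {
    if (amount < 0) throw new Error("a counter can only go up");
    this.value += amount;
  }
}

class Gauge {
  constructor() { this.value = 0; }
  set(v) { this.value = v; }
  inc(amount = 1) { this.value += amount; }
  dec(amount = 1) { this.value -= amount; }
}

class Histogram {
  // upperBounds must be sorted ascending; a +Inf bucket is appended.
  constructor(upperBounds) {
    this.upperBounds = [...upperBounds, Infinity];
    this.bucketCounts = this.upperBounds.map(() => 0);
    this.sum = 0;
    this.count = 0;
  }
  observe(v) {
    // Buckets are cumulative: an observation is counted in every bucket
    // whose upper bound is greater than or equal to the value.
    this.upperBounds.forEach((le, i) => {
      if (v <= le) this.bucketCounts[i] += 1;
    });
    this.sum += v;
    this.count += 1;
  }
}

class Summary {
  constructor() { this.values = []; this.sum = 0; this.count = 0; }
  observe(v) { this.values.push(v); this.sum += v; this.count += 1; }
  // Naive nearest-rank quantile over all stored observations.
  quantile(q) {
    if (this.count === 0) return NaN;
    const sorted = [...this.values].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil(q * sorted.length));
    return sorted[rank - 1];
  }
}

const requests = new Counter();              // total requests seen
const inFlight = new Gauge();                // requests currently being handled
const durations = new Histogram([0.1, 0.5, 1]); // bucket bounds in seconds
const latency = new Summary();

// Simulate four requests with made-up durations (seconds).
for (const seconds of [0.05, 0.3, 0.7, 2]) {
  inFlight.inc();
  requests.inc();
  durations.observe(seconds);
  latency.observe(seconds);
  inFlight.dec();
}

console.log("requests:", requests.value, "in flight:", inFlight.value);
console.log("cumulative buckets:", durations.bucketCounts);
console.log("median latency:", latency.quantile(0.5));
```

The counter rejects negative increments while the gauge moves freely in both directions, which is the practical difference to keep in mind when choosing between them.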
How do you configure alerts in Alertmanager?
To configure alerts with Prometheus and Alertmanager, follow these steps:
Create an Alertmanager Configuration File: This file defines how alerts are grouped, routed, and delivered to receivers. The alert conditions themselves, such as high CPU usage or pod restarts, are defined as Prometheus alerting rules.
Define Alert Rules: In a Prometheus rule file (or the equivalent resource in your Kubernetes setup), specify the conditions under which alerts should be triggered. For instance, you can define an alert for when CPU usage exceeds a certain threshold.
Set Up Notification Channels: Specify where you want the alerts to be sent. This could be email, Slack, or another notification service. For email alerts, you need to provide your email address and SMTP server details.
Configure Email Settings: If using Gmail, enable two-factor authentication and create an app password. This app password is used in the Alertmanager configuration to send emails.
Update the Configuration: Make sure to include your email address and any other necessary details in the Alertmanager configuration file.
Apply the Configuration: Use a command like
kubectl apply -k .
to apply the updated configuration to your Kubernetes cluster.
Test the Alerts: You can test the alerts by triggering conditions that would cause them to fire. For example, you can intentionally crash a pod to see if the alert is sent.
Verify Alert Delivery: Check the configured notification channel (like your email) to confirm that alerts are being sent as expected.
By following these steps, you can effectively set up and manage alerts using Prometheus and Alertmanager.
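To make the flow from condition to notification more concrete, here is a small JavaScript sketch of the two ideas involved: an alert rule that must hold for a period before it fires, and a routing step that picks a receiver from the alert's labels. It only models the behavior. The names makeRule, pickReceiver, and HighCpuUsage, the 0.9 threshold, the 60-second hold time, and the "email" and "default" receivers are all hypothetical, and none of this is actual Prometheus or Alertmanager configuration syntax.

```javascript
// Model of an alerting rule with a hold duration.
// All names and thresholds here are hypothetical, for illustration only.
function makeRule({ name, condition, forSeconds }) {
  let activeSince = null; // time at which the condition first became true
  return {
    name,
    // Evaluate the rule against one sample at time `now` (in seconds).
    evaluate(sample, now) {
      if (!condition(sample)) {
        activeSince = null;
        return "inactive";
      }
      if (activeSince === null) activeSince = now;
      // The alert stays pending until the condition has held long enough.
      return now - activeSince >= forSeconds ? "firing" : "pending";
    },
  };
}

// Model of routing: the first route whose labels all match wins.
const routes = [{ match: { severity: "critical" }, receiver: "email" }];

function pickReceiver(labels, fallback = "default") {
  const route = routes.find((r) =>
    Object.entries(r.match).every(([key, value]) => labels[key] === value)
  );
  return route ? route.receiver : fallback;
}

// Hypothetical rule: CPU usage above 90% for at least 60 seconds.
const rule = makeRule({
  name: "HighCpuUsage",
  condition: (sample) => sample.cpu > 0.9,
  forSeconds: 60,
});

// Evaluate the rule every 30 seconds against made-up samples.
const states = [
  rule.evaluate({ cpu: 0.5 }, 0),
  rule.evaluate({ cpu: 0.95 }, 30),
  rule.evaluate({ cpu: 0.97 }, 60),
  rule.evaluate({ cpu: 0.96 }, 90),
  rule.evaluate({ cpu: 0.4 }, 120),
];

console.log(states.join(" -> "));
console.log("critical alerts go to:", pickReceiver({ severity: "critical" }));
console.log("other alerts go to:", pickReceiver({ severity: "warning" }));
```

The pending state is why a deliberately crashed pod may not produce an email immediately: the condition has to stay true for the rule's hold duration before a notification is sent.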
What is the role of instrumentation in observability?
Instrumentation plays a foundational role in observability by enabling systems to emit the necessary data for understanding their internal state. Observability relies on three main pillars: metrics, logs, and traces. Instrumentation involves embedding code within applications or using external agents to capture these data types, which are essential for identifying and resolving issues, tracking performance, and understanding system behavior.
Here's a breakdown of how instrumentation supports observability:
Metrics Collection: Metrics provide quantitative measurements (like response times, CPU usage, or error rates) that help monitor system health and detect anomalies. Instrumentation collects these metrics by embedding counters, gauges, and histograms within the code, allowing Prometheus or similar tools to track these metrics in real-time.
Logging Context: Logs offer detailed context on events within a system, such as errors, warnings, or informational messages. Instrumenting code to generate structured, meaningful logs enables observability platforms to reconstruct what led up to an issue, offering insights into specific errors or unusual behaviors.
Tracing for Request Flow: Tracing follows individual requests across different services, giving a holistic view of how a request flows through the system. By instrumenting code for distributed tracing (using tools like OpenTelemetry), developers can identify performance bottlenecks, dependencies, and the impact of microservices on each other.
Data for Proactive Monitoring: Instrumentation facilitates proactive monitoring, where systems automatically track key metrics and alert teams to issues before users are affected. This is particularly valuable for scaling, as more complex systems require automated insights to maintain performance and reliability.
Enhanced Debugging and Root Cause Analysis: Instrumentation adds valuable context to observability data, making it easier to pinpoint the root cause of issues. By strategically placing instrumentation in areas likely to fail, developers can enhance their observability system's ability to surface relevant data for faster diagnosis and resolution.
In short, instrumentation is the process that enables observability systems to gather the necessary data, which ultimately allows for effective monitoring, troubleshooting, and optimization of applications.
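As a rough illustration of what embedding instrumentation in code can look like, the JavaScript sketch below wraps a function so that every call updates a request counter, an error counter, and a running duration total, then prints the values as "name value" lines in the style of the Prometheus text format. The helpers instrument and render, the metric names, and the fake clock are assumptions made for this example. A real Node.js service would normally use a Prometheus client library instead of hand-rolled objects.

```javascript
// Wrap a handler so each call records request, error, and duration metrics.
// `instrument` and `render` are hypothetical helpers, not a library API.
function instrument(handler, stats, clock = Date.now) {
  return (...args) => {
    const start = clock();
    stats.requests_total += 1;
    try {
      return handler(...args);
    } catch (err) {
      stats.request_errors_total += 1;
      throw err; // instrumentation must not swallow the error
    } finally {
      // clock() returns milliseconds; durations are recorded in seconds.
      stats.request_duration_seconds_sum += (clock() - start) / 1000;
    }
  };
}

// Render one "metric_name value" line per metric.
function render(stats) {
  return Object.entries(stats)
    .map(([name, value]) => `${name} ${value}`)
    .join("\n");
}

const stats = {
  requests_total: 0,
  request_errors_total: 0,
  request_duration_seconds_sum: 0,
};

// A made-up handler that fails on negative input.
function squareRoot(n) {
  if (n < 0) throw new Error("negative input");
  return Math.sqrt(n);
}

// A fake clock that advances 250 ms per reading keeps the example
// deterministic; the default is Date.now.
let fakeNow = 0;
const tick = () => (fakeNow += 250);

const handled = instrument(squareRoot, stats, tick);
handled(16);
try {
  handled(-1);
} catch (err) {
  // The failure has already been counted by the wrapper.
}

console.log(render(stats));
```

Keeping the instrumentation in a wrapper leaves the handler itself unchanged, so the same approach can be applied to any function worth measuring.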