How to Configure a Remote Data Store for Prometheus

Published at
12/21/2024
Categories
prometheus
monitoring
devops
sre
Author
talonx

Introduction

The Prometheus monitoring tool can store its metrics either locally or remotely. You can configure a remote data store using the remote_write configuration. This article describes the various data store options available as well as how to set up a remote store.

Overview of Remote Storage

By default, Prometheus stores data locally on the machine where it runs. You can set the data directory with the --storage.tsdb.path command line option when starting Prometheus.
In practice, you can point this at a separate disk attached to that machine for higher performance.

However, this may not be possible or optimal in every situation. You might want a data store that is better suited to time series data and has more capacity for longer data retention. Prometheus usually runs in a standalone VM, a Kubernetes pod, or a Docker container, and does not have access to such a data store by default.

A remote store can add these capabilities to Prometheus. The remote storage option can be set by using the remote_write key in the Prometheus configuration YAML file.

Remote Store Architecture

[Figure: Prometheus remote write architecture]

Remote Store Configuration

Basic Syntax

A very simple configuration for a remote store that accepts unauthenticated connections would look like this:

remote_write:
- url: "http://192.168.23.4/api/v1/write"
  name: "production-metrics"

You can have multiple entries under the remote_write key in the same Prometheus configuration, one for each remote store.
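
As a minimal sketch of what several entries look like, the fragment below sends the same samples to two endpoints. The second URL and the name "long-term-archive" are hypothetical placeholders for illustration.

```yaml
remote_write:
# First destination (same endpoint as the basic example)
- url: "http://192.168.23.4/api/v1/write"
  name: "production-metrics"
# Second destination; hypothetical address and name
- url: "http://192.168.23.5/api/v1/write"
  name: "long-term-archive"
```

If you set name, it must be unique across the entries. Prometheus uses it in its own metrics and logs to tell the destinations apart.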

Based on your requirements and the features supported by the remote write server, you can configure other options. Let us look at them one by one.

Security and Authentication

To protect your metrics data in transit, whether it travels over your internal network or the internet, you can enable both TLS and authentication. The remote store server must support these options.

# Remote write configuration for Prometheus
remote_write:
- url: "https://prometheus-data-store.mydb.io/api/v1/write"
  name: "production-metrics"

  # Option A: Bearer token. Prometheus rejects an Authorization entry
  # under headers, so the token goes in the authorization section.
  authorization:
    type: Bearer
    credentials: "<token>"

  # Option B: basic auth. Mutually exclusive with authorization,
  # so it is commented out here to keep the file loadable.
  # basic_auth:
  #   username: "prometheus"
  #   password: "secret-password"

  tls_config:
    insecure_skip_verify: false
    ca_file: "/path/to/ca.pem"
    cert_file: "/path/to/cert.pem"
    key_file: "/path/to/key.pem"

This sample configuration does the following:

  • Sets a Bearer token for authentication through the authorization section, and shows basic auth as a commented-out alternative. Prometheus accepts only one of these per remote_write entry.
  • Adds a tls_config, assuming that a private CA issued the certificate for the remote store's server. If the certificate was issued by a well-known CA, you do not need ca_file. The cert_file and key_file options are only needed if the server requires a client certificate.
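
If you would rather not keep the secret in the main configuration file, basic_auth also accepts a password_file. A minimal sketch, assuming a hypothetical file path that the Prometheus process can read:

```yaml
remote_write:
- url: "https://prometheus-data-store.mydb.io/api/v1/write"
  name: "production-metrics"
  basic_auth:
    username: "prometheus"
    # Hypothetical path; the file contains only the password
    password_file: "/etc/prometheus/remote-write-password"
```

The password and password_file options are mutually exclusive, so set only one of them.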

The authorization section controls how Prometheus sets the Authorization header and offers a few more options. Note that the examples below are alternatives to each other. They are shown together only for illustration, and a remote_write entry can contain just one authorization section.

# Example 1: Default Bearer type with direct credentials
authorization:
  type: Bearer
  credentials: "eyJhbGciOiJIPoI1NiIsInR5cCI6IkpXVCJ9..."

# Example 2: Bearer type with credentials from file. credentials_file is mutually exclusive with credentials
authorization:
  type: Bearer
  credentials_file: "/etc/prometheus/token.txt"

# Example 3: Custom type with direct credentials
authorization:
  type: CustomAuth
  credentials: "secret-token-123"

Remote Write Protocol Configuration

As of this writing, the remote write specification is undergoing a change. You probably don't have to worry about this section unless you are optimizing for very specific cases. The protobuf_message key selects the protobuf message that Prometheus uses when sending metrics: prometheus.WriteRequest (the default, remote write 1.0) or io.prometheus.write.v2.Request (remote write 2.0). Which one you can use depends on what your remote store server supports.

remote_write:
- url: "http://192.168.23.4/api/v1/write"
  name: "production-metrics"

  protobuf_message: prometheus.WriteRequest

Network Configuration

Depending on the network path to your remote store server, you can tune a few connection settings.

The remote_timeout key sets the timeout for requests to the remote write endpoint. The default value is 30s. You would not need to change this unless you have a slow or unreliable network, or there are shorter timeouts in the network path between your Prometheus server and the remote store server.

If your remote store is behind a proxy server, you can configure the proxy details in the YAML.

remote_write:
- url: "http://192.168.23.4/api/v1/write"
  remote_timeout: 45s
  name: "production-metrics"

  # Proxy configuration
  proxy_url: "http://proxy.internal:4200"
  proxy_connect_header:
    "Proxy-Authorization": ["Basic xxxxxxxxxxxxxxxxxxxx"]
    "X-Custom-Proxy-Header": ["app1", "app2"]
  proxy_from_environment: false

  follow_redirects: true
  enable_http2: true

Metrics Configuration

You can use the write_relabel_configs key to modify or drop specific metrics before they are written to the remote store. The relabel syntax is identical to that of relabel_configs in the scrape_configs section. You might want to do this if:

  • You have multiple remote stores and want specific metrics to go to specific stores to avoid unnecessary storage costs.
  • You have one remote store and want certain metrics to stay only in Prometheus' local storage instead of being written there.

remote_write:
- url: "http://192.168.23.4/api/v1/write"
  name: "production-metrics"

  write_relabel_configs:
    # Drop every series whose metric name starts with test_metric
    - source_labels: [__name__]
      regex: 'test_metric.*'
      action: drop
    # Drop every series labelled environment="staging"
    - source_labels: [environment]
      regex: 'staging'
      action: drop
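
The first use case above, routing specific metrics to specific stores, could be sketched as follows. Both URLs, both names, and the node_ prefix are hypothetical and chosen only for illustration.

```yaml
remote_write:
# Store A (hypothetical): receives only metrics whose name starts with node_
- url: "http://store-a.example.internal/api/v1/write"
  name: "infra-metrics"
  write_relabel_configs:
    - source_labels: [__name__]
      regex: 'node_.*'
      action: keep
# Store B (hypothetical): receives everything except the node_ metrics
- url: "http://store-b.example.internal/api/v1/write"
  name: "app-metrics"
  write_relabel_configs:
    - source_labels: [__name__]
      regex: 'node_.*'
      action: drop
```

Each entry applies its own write_relabel_configs, so the keep rule on one store does not affect what the other store receives.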

Queue Configuration

The queue_config has settings to fine tune the queue that is used to write to remote storage. Prometheus creates an internal queue for each remote write server. As it collects metrics, Prometheus maintains a write-ahead log (WAL) that it can replay if there's a crash. Each remote destination queue picks up metrics data from the WAL and sends it to the remote store server. Each queue can also have multiple shards, which can be used to configure the amount of parallelism for each queue.

You will have to tune the queue settings only if you have a very high volume of data, or if the remote store is struggling to keep up with your Prometheus server.
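
For orientation, a queue_config block might look like the sketch below. The values are illustrative assumptions, not recommendations, and defaults differ between Prometheus versions, so check the documentation for your release before changing them.

```yaml
remote_write:
- url: "http://192.168.23.4/api/v1/write"
  name: "production-metrics"
  queue_config:
    # Samples buffered per shard before reading from the WAL blocks
    capacity: 10000
    # Bounds on the number of parallel shards for this queue
    min_shards: 1
    max_shards: 50
    # Upper limit on the number of samples in a single request
    max_samples_per_send: 2000
    # Send a partial batch once a sample has waited this long
    batch_send_deadline: 5s
    # Initial and maximum retry delay; the delay doubles on each retry
    min_backoff: 30ms
    max_backoff: 5s
```

More shards and larger batches raise throughput, but they also increase memory use on the Prometheus side and load on the remote store.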

You can check out these great writeups on tuning the queue settings for remote_write.

Remote Storage Options

A non-exhaustive list of software that can receive Prometheus remote write data, either natively or through an adapter, includes:

  • Thanos
  • VictoriaMetrics
  • Splunk
  • OpenTSDB
  • Kafka
  • InfluxDB
  • Google BigQuery

Troubleshooting

Prometheus failing to write to the remote storage

This can be caused by a number of issues:

Network connectivity between Prometheus and the remote store

Check if you can reach the remote store using ping or curl.

If there is a proxy in between, it might be dropping packets or might not be running

Check if the proxy is running. Verify that the proxy configuration as well as the Prometheus remote_write proxy settings are correct. Check the proxy server's logs for any errors. The proxy might be blocking large packets.

Requests are timing out due to network issues

Run a traceroute from your Prometheus server to the remote store to see if packets are being dropped.

Requests are timing out due to the remote store not being able to keep up

Tune the queue configuration. If the timeouts start suddenly, find the root cause before changing any settings. For example:

  • The number of metrics might have increased due to autoscaling events or an increase in cardinality.
  • The remote store might have disk issues.

Best Practices

  • Back up your data in the remote store.
  • Add security and authentication between your Prometheus and the remote store server. If your remote store does not support this natively, you can add a proxy like nginx in between and configure it to have TLS and authentication.
  • Monitor your remote store metrics for indications of trouble.
  • If you are in a regulated industry, ensure that your remote store is compliant with your requirements. For example, if it's managed by a cloud vendor, check that their security certifications are sufficient for your needs.
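
For the monitoring point above, one option is to alert from the Prometheus side when samples fail to reach the remote store. The sketch below is a rule file, separate from the main configuration and loaded through rule_files. The group name, alert name, threshold, and duration are assumptions, and the metric name should be checked against the /metrics output of your Prometheus version, since remote storage metrics have been renamed between releases.

```yaml
groups:
- name: remote-write
  rules:
  - alert: RemoteWriteSamplesFailing
    # Fires when samples have been failing to send for 15 minutes
    expr: rate(prometheus_remote_storage_samples_failed_total[5m]) > 0
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Prometheus is failing to send samples to a remote store"
```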

Conclusion

The remote store functionality in Prometheus offers a scalable and flexible way of adding a dedicated storage backend for Prometheus metrics. You can use the remote store for increased data retention, durability of data, and offline data analysis.
