How to use OpenTelemetry to expose custom Prometheus metrics from Node.js applications
A standard way of exposing metrics for a Node.js application is to use the prom-client package, which has everything one would need. But we are in 2023, almost 2024, and it is time to use industry standards, and here I am talking about OpenTelemetry. Let's work out together how to use OpenTelemetry JS to expose custom metrics.
I don't think there is a need to introduce the OpenTelemetry project anymore: it is the CNCF project with the second-highest velocity at the time of writing, vendor-neutral and open source.
Example Node.js project
The following commands set up a Node.js project with some OpenTelemetry packages. You might notice when using OpenTelemetry packages that the list in package.json grows quite fast. But rest assured, most of them are lightweight. For instance, @opentelemetry/exporter-prometheus is 24.4 kB minified and gzipped.
mkdir otel-prom && cd otel-prom
npm init -y
touch index.js
npm install --save @opentelemetry/exporter-prometheus @opentelemetry/api @opentelemetry/sdk-metrics
# Note: I am using the following versions:
# "@opentelemetry/api": "^1.7.0",
# "@opentelemetry/exporter-prometheus": "^0.45.0",
# "@opentelemetry/sdk-metrics": "^1.18.0",
- @opentelemetry/sdk-metrics and @opentelemetry/exporter-prometheus are required to register metrics and export them in Prometheus format.
- @opentelemetry/api is optional, and contains some useful utilities & interfaces.
- You could even add @opentelemetry/sdk-node to get some native Node.js metrics.
Note: The Otel packages shown in this blog are currently considered experimental packages under active development. New releases may include breaking changes.
Then proceed to edit the file index.js with the following content:
const { ValueType } = require("@opentelemetry/api");
const { PrometheusExporter } = require("@opentelemetry/exporter-prometheus");
const { MeterProvider } = require("@opentelemetry/sdk-metrics");
const http = require("http");
// Mock a potential database call to query data
const queryDatabaseStuff = () =>
  new Promise((resolve) =>
    // pass a function to setTimeout so the promise resolves after 50 ms
    setTimeout(() => resolve(Math.floor(Math.random() * 100)), 50)
  );
const startMetricsExporter = () => {
  console.log(`starting prometheus metrics server`);
  // you can choose in the options the port, and if you want to start a webserver
  const exporter = new PrometheusExporter({
    port: 9100,
  });
  const meterProvider = new MeterProvider();
  meterProvider.addMetricReader(exporter);
  const meter = meterProvider.getMeter("prometheus");
  // create the gauge
  const outdatedDataCountGauge = meter.createObservableGauge("outdated_data_count", {
    valueType: ValueType.INT,
    description: "outdated data count",
  });
  // callbacks are executed when the /metrics endpoint is hit
  outdatedDataCountGauge.addCallback(async (result) => {
    const outdatedDataCount = await queryDatabaseStuff();
    result.observe(outdatedDataCount);
  });
};
// simulate a normal webserver, it could be fastify, express, etc.
// Just to illustrate you could run the exporter along a classic webserver
const startNormalWebServer = () => {
  console.log(`starting normal web server`);
  http
    .createServer(function (req, res) {
      res.write("Hello World!");
      res.end();
    })
    .listen(8089);
};
startMetricsExporter();
startNormalWebServer();
Run the application with node index.js to start two web servers. You can go to localhost:8089 to see the normal web server, and localhost:9100/metrics to see the metrics endpoint.
Here are more detailed explanations of what the code is doing:
- queryDatabaseStuff() mocks a DB call that would probably fetch some data count, like outdated data.
- meter.createObservableGauge then registers a gauge, to which you can pass a description and the expected value type. The part not very well documented in OTel is the addCallback function, which allows you to run a function when the metrics endpoint is called.
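To make the callback mechanics a bit more concrete, here is a minimal sketch of a callback that reports several values at once. It relies on observe() accepting an optional second argument with attributes, which the Prometheus exporter is expected to render as labels. The function queryOutdatedCountsByTable and the table attribute are hypothetical names made up for this illustration, and the fake result object only stands in for the one OpenTelemetry passes to the callback.

```javascript
// Hypothetical stand-in for a query returning outdated row counts per table.
const queryOutdatedCountsByTable = async () => ({ users: 3, orders: 12 });

// Same shape as the callback passed to addCallback in index.js.
// Each observe() call with a different attribute set is reported separately.
const observeOutdatedCounts = async (result) => {
  const counts = await queryOutdatedCountsByTable();
  for (const [table, count] of Object.entries(counts)) {
    result.observe(count, { table });
  }
};

// Dry run with a fake result object, only to show what the callback reports.
const fakeResult = {
  observations: [],
  observe(value, attributes) {
    this.observations.push({ value, attributes });
  },
};
observeOutdatedCounts(fakeResult).then(() =>
  console.log(fakeResult.observations)
);
```

In index.js, the callback would be registered with outdatedDataCountGauge.addCallback(observeOutdatedCounts) in place of the inline function.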
Be aware that this approach suits relatively light database queries. If your query spans multiple databases or runs complex aggregations that can slow down the system, a more effective strategy is to run the query in the background at fixed intervals, store the latest result, and simply read it in the callback instead of running the full query on every scrape. This keeps the time-consuming database work out of metrics collection and keeps the metrics endpoint responsive.
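The background-refresh strategy could look roughly like the sketch below. It is only one way to do it, under a few assumptions: createCachedValue is a hypothetical helper written for this post, the 60-second interval is arbitrary, and queryDatabaseStuff is again a stand-in for the real query.

```javascript
// Hypothetical stand-in for an expensive query; replace with the real call.
const queryDatabaseStuff = async () => Math.floor(Math.random() * 100);

// Keeps the most recent query result in memory and refreshes it on a timer.
const createCachedValue = (queryFn, refreshIntervalMs) => {
  let latest; // stays undefined until the first refresh succeeds

  const refresh = async () => {
    try {
      latest = await queryFn();
    } catch (err) {
      // keep serving the previous value if the query fails
      console.error("refresh failed", err);
    }
  };

  // unref() so that this timer alone does not keep the process alive
  const timer = setInterval(refresh, refreshIntervalMs);
  timer.unref();

  return {
    refresh,
    get: () => latest,
    stop: () => clearInterval(timer),
  };
};

const outdatedDataCache = createCachedValue(queryDatabaseStuff, 60_000);
outdatedDataCache.refresh(); // warm the cache at startup

// Same shape as the callback passed to addCallback in index.js: it only
// reads the cached value, so a scrape never waits on the database.
const observeOutdatedData = (result) => {
  const value = outdatedDataCache.get();
  if (value !== undefined) {
    result.observe(value);
  }
};
```

The callback is then registered with outdatedDataCountGauge.addCallback(observeOutdatedData). The trade-off is freshness: the exported value can be up to one refresh interval old, so the interval should be chosen with the scrape interval in mind.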
About the options passed to PrometheusExporter, I would advise keeping the default, which is to start a web server on a different port. That way you are sure you won't expose the metrics by mistake to the outside world.
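As an illustration of keeping the metrics server on its own port, here is a small sketch that builds the options object from the environment. The port and endpoint fields are exporter options; METRICS_PORT and buildExporterOptions are hypothetical names invented for this example, and the 9100 fallback simply mirrors the value used in index.js.

```javascript
// Builds the options object intended for `new PrometheusExporter(options)`.
// METRICS_PORT is a hypothetical environment variable chosen for this sketch.
const buildExporterOptions = (env = process.env) => {
  const port = Number.parseInt(env.METRICS_PORT ?? "9100", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid METRICS_PORT: ${env.METRICS_PORT}`);
  }
  return {
    port, // dedicated port, separate from the application's 8089
    endpoint: "/metrics", // path served by the exporter
    // preventServerStart is left unset so the exporter starts its own server
  };
};

console.log(buildExporterOptions({ METRICS_PORT: "9100" }));
```

In index.js this would be used as new PrometheusExporter(buildExporterOptions()), which keeps the port configurable per environment without ever sharing the application's port.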
And... with Kubernetes? ☸️
If you use Kubernetes as your underlying platform, you can have a Service exposing 2 ports: one used by the ingress for the application web server, and the other used by the ServiceMonitor watching the metrics web server.
---
apiVersion: apps/v1
kind: Deployment
# skip the boring part
...
        ports:
          - containerPort: 8089
            name: http
            protocol: TCP
          - containerPort: 9100
            name: metrics
            protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
  labels:
    app: my-web-app
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8089
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: metrics
  selector:
    app: my-web-app
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-servicemonitor
  labels:
    app: my-web-app
spec:
  endpoints:
    - interval: 30s
      port: metrics
  jobLabel: ''
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      app: my-web-app
This makes it much easier to avoid exposing the metrics endpoint to the outside: you don't have to juggle with ingress path routing to exclude /metrics.
Conclusion
OpenTelemetry is a great project, and I am glad to see it growing so fast, but it can be hard to get familiar with the OTel terms, and documentation is still missing for observability with Node.js in general.
Hope this post helps, and won't be deprecated too fast!