Exporting WebSocket Metrics to Prometheus #

You can read CPU and memory from your ws Node.js process, but those numbers tell you nothing about whether connections are climbing toward your file-descriptor ceiling, whether clients are silently dropping, or whether a single oversized frame is stalling the event loop. You landed here because you want first-class WebSocket signals — active sockets, churn rate, payload sizes, heartbeat latency — scraped by Prometheus and plotted in Grafana, without bolting on a heavyweight APM agent. This page wires prom-client directly into a ws server, exposes a GET /metrics endpoint, and gives you the matching scrape_config. It also covers the two traps that quietly corrupt these dashboards: high-cardinality labels and gauge drift after a crash.

This is one of the core signals in WebSocket Observability & Monitoring; pair it with distributed tracing for end-to-end visibility into your real-time backend.

Root cause #

Standard HTTP metrics middleware records request count and latency around a request/response cycle that begins and ends in milliseconds. A WebSocket connection inverts that model: a single connection event opens a session that may live for hours, carrying thousands of asynchronous frames in both directions. There is no response to hang a histogram observation off of, so the metrics that matter are not request-shaped — they are connection-lifecycle and frame-shaped.

That forces a specific instrument choice:

  • Active connections is a level that goes up and down, so it is a Gauge: you inc() on the connection event and dec() on close. A Counter would be wrong because Counters can only increase.
  • Connects and disconnects are monotonic totals, so they are Counters. Their rate (rate(ws_connections_opened_total[5m])) gives you churn, and you label disconnects by close code to separate clean 1000 shutdowns from abnormal 1006 drops.
  • Message size and heartbeat round-trip time are distributions you want quantiles over, so they are Histograms with explicit buckets sized to your payloads and your network.

Two failure modes are inherent to this lifecycle. First, gauge drift: the active-connections Gauge is only correct if every inc() is balanced by exactly one dec(). If the process crashes, every in-flight connection’s dec() is lost — but because Prometheus scrapes a fresh process after restart, the Gauge resets to 0, which is actually the saving grace. The real drift comes from forgetting to register a close handler on some code path (an error during the upgrade, a handler that throws before wiring close), so a connection increments but never decrements within the same live process. Second, high cardinality: every distinct label-value combination is a separate time series in Prometheus’s memory. Labelling a Counter by userId, sessionId, or raw remoteAddress mints a new series per user and will exhaust Prometheus’s heap. Labels must be bounded enumerations — close code, message type, tenant tier — never unbounded identifiers.

[Figure: WebSocket metrics pipeline. The ws server pushes instrument updates to prom-client (a Gauge, Counters, and Histograms) on connection, close, message, and pong events; the instruments are served as text exposition on GET /metrics; Prometheus pulls that endpoint every 15s for Grafana.]

Resolution #

The module below defines five instruments, wires them to ws lifecycle events, and serves the registry on a separate HTTP endpoint. The heartbeat-RTT histogram reuses the ping/pong machinery you already need for liveness; see Connection Lifecycle & Heartbeats for the missed-pong termination logic that complements this measurement.

import http from 'node:http';
import { WebSocketServer, WebSocket } from 'ws';
import { Registry, Gauge, Counter, Histogram, collectDefaultMetrics } from 'prom-client';

const HEARTBEAT_INTERVAL_MS = 30_000;

const registry = new Registry();
collectDefaultMetrics({ register: registry }); // event-loop lag, heap, gc — free baseline

// Level that rises and falls: MUST be a Gauge, never a Counter.
const activeConnections = new Gauge({
  name: 'ws_connections_active',
  help: 'Currently open WebSocket connections',
  registers: [registry],
});

// Monotonic total of opened connections.
const connectionsOpened = new Counter({
  name: 'ws_connections_opened_total',
  help: 'Total WebSocket connections accepted',
  registers: [registry],
});

// Disconnects labelled by close code: a BOUNDED set (1000, 1001, 1006, ...).
// Never label by userId/sessionId/IP: that is unbounded cardinality.
const connectionsClosed = new Counter({
  name: 'ws_connections_closed_total',
  help: 'Total WebSocket connections closed',
  labelNames: ['code'] as const,
  registers: [registry],
});

// Inbound frame size in bytes. Buckets span typical JSON up to large payloads.
const messageBytes = new Histogram({
  name: 'ws_message_bytes',
  help: 'Inbound WebSocket message size in bytes',
  buckets: [64, 256, 1_024, 4_096, 16_384, 65_536, 262_144],
  registers: [registry],
});

// Heartbeat round-trip time in seconds (Prometheus convention: seconds, not ms).
const heartbeatRttSeconds = new Histogram({
  name: 'ws_heartbeat_rtt_seconds',
  help: 'Ping-to-pong round-trip time',
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5],
  registers: [registry],
});

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws: WebSocket & { pingSentAt?: number }) => {
  activeConnections.inc(); // +1 level
  connectionsOpened.inc(); // +1 total

  // CRITICAL: register close BEFORE any code that can throw, so dec() always runs.
  ws.on('close', (code) => {
    activeConnections.dec(); // balances the inc() above
    connectionsClosed.inc({ code: String(code) }); // close code is bounded
  });

  ws.on('message', (data, isBinary) => {
    // Buffer.byteLength avoids the UTF-16 string-length trap on text frames.
    const size = isBinary
      ? (data as Buffer).byteLength
      : Buffer.byteLength(data.toString());
    messageBytes.observe(size);
  });

  ws.on('pong', () => {
    if (ws.pingSentAt !== undefined) {
      heartbeatRttSeconds.observe((Date.now() - ws.pingSentAt) / 1000);
      // Clear the stamp so an unsolicited or duplicate pong is not recorded as an RTT.
      ws.pingSentAt = undefined;
    }
  });
});

// Heartbeat loop: stamp the send time, observe RTT when the pong returns.
setInterval(() => {
  for (const ws of wss.clients) {
    if (ws.readyState !== WebSocket.OPEN) continue;
    (ws as WebSocket & { pingSentAt?: number }).pingSentAt = Date.now();
    ws.ping(); // protocol-level frame; the 'pong' handler records the RTT
  }
}, HEARTBEAT_INTERVAL_MS).unref(); // unref so the timer never blocks shutdown

// Separate plain-HTTP server for scraping — keep /metrics off the WS port's public surface.
http
  .createServer(async (req, res) => {
    if (req.url === '/metrics') {
      res.setHeader('Content-Type', registry.contentType);
      res.end(await registry.metrics()); // serialize the whole registry as text
      return;
    }
    res.statusCode = 404;
    res.end();
  })
  .listen(9091); // distinct port; firewall it to the Prometheus subnet only

The matching Prometheus scrape job, annotated:

scrape_configs:
  - job_name: ws-server        # appears as the `job` label on every series
    metrics_path: /metrics     # matches the endpoint above
    scrape_interval: 15s       # finer than the 30s heartbeat so you see drift
    static_configs:
      - targets:
          - ws-node-1:9091     # the dedicated metrics port, not 8080
          - ws-node-2:9091
        labels:
          tier: realtime       # bounded label, safe to add for grouping

With multiple replicas behind a load balancer, scrape every instance directly (each target above) rather than through the balancer, so ws_connections_active is per-node and sum(ws_connections_active) gives the fleet total. This is the same per-node visibility you need when reasoning about the broader Backend WebSocket Connection Management registry across a horizontally scaled deployment.

Operational checklist #

  • Confirm curl localhost:9091/metrics returns all five ws_* series plus the collectDefaultMetrics baseline
  • Verify ws_connections_active returns to 0 after closing all clients; a non-zero idle value means an unbalanced inc()/dec()
  • Grep the codebase for any userId, sessionId, or remoteAddress used as a label value
  • Right-size histogram buckets to your real payloads; check that observations land across buckets, not all in the last (+Inf) bucket
  • Bind the :9091 metrics listener so that only the Prometheus subnet can reach it
  • Add a Prometheus alert on rate(ws_connections_closed_total{code="1006"}[5m]) to catch spikes in abnormal drops
  • After a deploy, accept that gauges reset to 0 on restart; rely on Counters (rate(...)) for trends that span restarts
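
The first checklist item can be automated. The sketch below is one possible check, not part of prom-client: `missingMetrics` and `REQUIRED_METRICS` are hypothetical names, and it assumes the endpoint emits a `# TYPE <name> <type>` line for every metric family, as the Prometheus text exposition format does.

```typescript
// Metric names taken from the instrument definitions in the module above.
const REQUIRED_METRICS: readonly string[] = [
  'ws_connections_active',
  'ws_connections_opened_total',
  'ws_connections_closed_total',
  'ws_message_bytes',
  'ws_heartbeat_rtt_seconds',
];

// Hypothetical helper: returns the required names that have no "# TYPE" line
// in the scraped text, preserving the order of `required`.
function missingMetrics(exposition: string, required: readonly string[]): string[] {
  const declared = new Set<string>();
  for (const line of exposition.split('\n')) {
    const match = /^# TYPE (\S+) \S+/.exec(line);
    if (match) declared.add(match[1]);
  }
  return required.filter((name) => !declared.has(name));
}
```

Feed it the body of a GET to localhost:9091/metrics in a smoke test and fail the deploy if the returned array is non-empty.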

FAQ #

Why is my active-connections gauge stuck above zero with no clients connected? #

An inc() ran on a connection whose dec() never fired in the same process — usually because the close handler was registered after a line that threw, or on a path (upgrade error, auth rejection) that bypasses it. Register the close listener as the very first thing inside the connection handler. Note that a process crash does not cause this: the restarted process starts the gauge at 0.
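
One way to make the pairing harder to break is to put the inc() and the close listener in a single helper, so no code path can get one without the other. The following is a minimal sketch under assumptions: `trackConnection` and `LevelGauge` are hypothetical names, not part of ws or prom-client. The structural `LevelGauge` type is meant to be satisfied by a prom-client Gauge as well as by a test double.

```typescript
import { EventEmitter } from 'node:events';

// Anything with inc()/dec(); a prom-client Gauge should fit this shape.
interface LevelGauge {
  inc(): void;
  dec(): void;
}

// Hypothetical helper: increments once and guarantees at most one matching
// decrement, whether it is triggered by 'close' or by a manual release.
function trackConnection(socket: EventEmitter, gauge: LevelGauge): () => void {
  let released = false;
  const release = (): void => {
    if (released) return; // idempotent: a second call must not double-decrement
    released = true;
    gauge.dec();
  };
  gauge.inc();
  socket.once('close', release);
  return release;
}
```

Calling `trackConnection(ws, activeConnections)` as the first statement of the connection handler replaces the separate inc() and dec() calls. The returned function can be called from a catch block on setup paths that fail before a close event would fire.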

How do I avoid high-cardinality labels? #

Only ever use labels whose value set is small and bounded: WebSocket close code (a handful of RFC 6455 values), message type from a fixed enum, or a tenant tier. Never use userId, sessionId, requestId, or raw remoteAddress: each unique value creates its own time series, which Prometheus holds in memory while it is active and keeps on disk until retention expires, so an unbounded label will eventually OOM Prometheus.
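
Even the close code deserves a guard, because a client may send any application-defined code in the 3000-4999 range. The sketch below shows one possible normalizer. `closeCodeLabel` and the bucket names `3xxx`, `4xxx`, and `other` are assumptions for illustration, not a convention of ws or Prometheus.

```typescript
// Hypothetical helper: maps any close code onto at most 19 label values.
function closeCodeLabel(code: number): string {
  // Protocol-level codes (1000-1015) are few, so keep them verbatim;
  // this preserves code="1006" for alerting on abnormal drops.
  if (Number.isInteger(code) && code >= 1000 && code <= 1015) return String(code);
  // Library/framework range: collapse into one bucket.
  if (code >= 3000 && code <= 3999) return '3xxx';
  // Private-use range: collapse into one bucket.
  if (code >= 4000 && code <= 4999) return '4xxx';
  // Anything else is malformed or unexpected.
  return 'other';
}
```

In the close handler this would read `connectionsClosed.inc({ code: closeCodeLabel(code) })`. If your application defines a few specific 4xxx codes you care about, an explicit allowlist for those keeps them distinguishable while staying bounded.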

Should /metrics be on the same port as the WebSocket server? #

Run it on a separate HTTP listener (:9091 above) so you can firewall the metrics surface to the Prometheus subnet without exposing it to WebSocket clients, and so scrape traffic never competes with the WebSocket upgrade path. Some teams mount it on the same port behind a path guard; the separate-port approach is cleaner to secure.

Why measure heartbeat RTT in seconds instead of milliseconds? #

Prometheus convention is base SI units, so durations are seconds. Histogram quantile functions, Grafana panels, and recording rules across the ecosystem assume _seconds suffixes. Divide your millisecond measurement by 1000 at observation time, as the resolution does.
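
The conversion can be wrapped in a small timer so the unit is fixed in one place. This sketch deliberately swaps Date.now() for the monotonic performance.now(), so a wall-clock adjustment between ping and pong cannot produce a negative or inflated RTT. `startTimer` and `Clock` are hypothetical names introduced for illustration.

```typescript
import { performance } from 'node:perf_hooks';

// A clock returns milliseconds; injectable so the helper can be unit-tested.
type Clock = () => number;

// Hypothetical helper: captures a start time and returns a function that
// reports elapsed time in seconds, the unit a *_seconds histogram expects.
function startTimer(now: Clock = () => performance.now()): () => number {
  const startedAtMs = now();
  return () => (now() - startedAtMs) / 1000;
}
```

In the heartbeat loop you would store the function returned by `startTimer()` on the socket in place of the timestamp, and the pong handler would pass its result to `heartbeatRttSeconds.observe(...)`.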

Do I need a Histogram or a Summary for message size? #

Use a Histogram. Summaries compute quantiles per-instance, and those quantiles cannot be aggregated across replicas: averaging or summing a p95 over a load-balanced fleet is statistically meaningless. Histograms expose bucket counts that aggregate cleanly, letting histogram_quantile() compute fleet-wide p95/p99 from all nodes at once.
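
To see why bucket counts aggregate and quantiles do not, here is a simplified sketch of the idea behind histogram_quantile(): sum the cumulative counts bucket by bucket, then interpolate linearly inside the bucket that holds the target rank. It is an illustration under assumptions, not Prometheus's actual implementation. The names `NodeHistogram`, `mergeHistograms`, and `estimateQuantile` are hypothetical, every node is assumed to use identical bucket bounds with at least one finite bound, and q is assumed to lie in (0, 1].

```typescript
// cumulativeCounts[i] is the number of observations <= upperBounds[i];
// the last bound is +Infinity, as in the exposed le="+Inf" bucket.
interface NodeHistogram {
  upperBounds: number[];
  cumulativeCounts: number[];
}

// Summing per-bucket counts is valid only when all nodes share the same bounds.
function mergeHistograms(nodes: NodeHistogram[]): NodeHistogram {
  const upperBounds = nodes[0].upperBounds;
  const cumulativeCounts = upperBounds.map((_, i) =>
    nodes.reduce((sum, node) => sum + node.cumulativeCounts[i], 0),
  );
  return { upperBounds, cumulativeCounts };
}

// Find the first bucket whose cumulative count reaches the target rank,
// then interpolate linearly between that bucket's lower and upper bounds.
function estimateQuantile(q: number, h: NodeHistogram): number {
  const total = h.cumulativeCounts[h.cumulativeCounts.length - 1];
  if (total === 0) return NaN;
  const rank = q * total;
  const i = h.cumulativeCounts.findIndex((count) => count >= rank);
  const upper = h.upperBounds[i];
  // Rank falls in the +Inf bucket: report the highest finite bound instead.
  if (!Number.isFinite(upper)) return h.upperBounds[i - 1];
  const lower = i === 0 ? 0 : h.upperBounds[i - 1];
  const below = i === 0 ? 0 : h.cumulativeCounts[i - 1];
  const inBucket = h.cumulativeCounts[i] - below;
  return lower + (upper - lower) * ((rank - below) / inBucket);
}
```

The +Inf branch is also why the checklist asks you to right-size buckets: once the target rank lands in the last bucket, the estimate is pinned to the highest finite bound and carries no real information.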
