Scaling Real-Time Infrastructure Beyond One Node #

A single Node.js process holding 30,000 sockets works until it doesn’t — until you need a second box, a rolling deploy, or a zone failover. The moment a connection lives on node A and the event that should reach it originates on node B, in-process broadcast breaks silently. This guide covers the four mechanics that turn a single WebSocket server into a horizontally scaled fleet: cross-node message fan-out over a broker, distributed presence tracking, at-least-once delivery guarantees, and elastic autoscaling on Kubernetes. It is written for backend and platform engineers moving a real-time app from one box to many while keeping latency, ordering, and delivery semantics intact.

The broker is the keystone. Every node publishes locally produced events and subscribes to the channels that matter to its connected clients, so a message produced on node A reaches a socket pinned to node C without either node knowing the other exists. The rest of this guide builds out from that single idea.

Infrastructure baseline #

Before any fan-out logic, three things must already be configured. First, the load balancer has to forward the WebSocket upgrade and hold the connection open longer than your heartbeat interval. The proxy work, sticky routing, and upgrade headers belong to Backend WebSocket Connection Management; scaling assumes that foundation is solid.

# nginx.conf — upgrade passthrough for a WebSocket upstream pool
upstream ws_pool {
  least_conn;
  server 10.0.0.11:8080 max_fails=2 fail_timeout=10s;
  server 10.0.0.12:8080 max_fails=2 fail_timeout=10s;
  server 10.0.0.13:8080 max_fails=2 fail_timeout=10s;
}
location /ws {
  proxy_pass http://ws_pool;
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";
  proxy_read_timeout 90s;          # must exceed HEARTBEAT_INTERVAL_MS
}

Second, because a reconnecting client may land on any node, you must either pin sessions or make every node stateless behind the broker. Stateless fan-out is the more scalable choice, but rolling deploys still benefit from Load Balancer Sticky Sessions to avoid mass reconnect storms when a pod drains.

Third, provision Redis with headroom for pub/sub traffic. Pub/sub messages are not persisted: Redis only queues them in each subscriber's client output buffer, and it disconnects a subscriber whose buffer exceeds the configured limit, so size the client output limits explicitly.

# shell: apply at runtime with redis-cli (the same directives can live in redis.conf)
# CONFIG SET does not persist across restarts unless followed by CONFIG REWRITE
# class    hard-limit  soft-limit  soft-seconds
redis-cli config set client-output-buffer-limit "pubsub 64mb 32mb 60"
redis-cli config set tcp-keepalive 60
ulimit -n 65535   # run on each WS node: every socket consumes a file descriptor

Core cross-node message fan-out #

The central mechanism is a publish/subscribe loop on every node. Each node keeps a local registry of its own sockets, subscribes to the channels it needs, and on receiving a published event delivers it to matching local sockets only. The deep dive lives in Redis Pub/Sub Fan-Out; below is the minimal runnable shape using ioredis and ws.

// Cross-node fan-out: one publisher connection, one subscriber connection.
// ioredis requires a dedicated connection for SUBSCRIBE mode.
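import * as crypto from 'node:crypto';   // explicit import: older Node.js releases lack a global `crypto`
import { WebSocketServer, WebSocket } from 'ws';
import { Redis } from 'ioredis';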

const FANOUT_CHANNEL = 'ws:broadcast';
const NODE_ID = process.env.HOSTNAME ?? crypto.randomUUID();

const pub = new Redis(process.env.REDIS_URL!);          // publish + commands
const sub = new Redis(process.env.REDIS_URL!);          // subscribe-only mode
const localSockets = new Map<string, WebSocket>();      // this node's clients

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws, req) => {
  const clientId = (req.headers['x-client-id'] as string) ?? crypto.randomUUID();
  localSockets.set(clientId, ws);
  ws.on('close', () => localSockets.delete(clientId));
});

// Publish a broadcast that every node (including this one) will receive.
async function broadcast(payload: unknown, targetClientId?: string) {
  const envelope = JSON.stringify({
    origin: NODE_ID,                 // lets nodes drop self-echo if desired
    target: targetClientId ?? null,  // null = broadcast to all sockets
    sentAt: Date.now(),              // used to compute fanout latency
    data: payload,
  });
  await pub.publish(FANOUT_CHANNEL, envelope);
}

// Every node subscribes once and delivers only to its own local sockets.
await sub.subscribe(FANOUT_CHANNEL);
sub.on('message', (_channel, raw) => {
  let msg: { target: string | null; data: unknown };
  try {
    msg = JSON.parse(raw);
  } catch {
    return;                                              // ignore malformed envelopes
  }
  const body = JSON.stringify(msg.data);                 // serialize once per message
  for (const [id, ws] of localSockets) {
    if (msg.target && msg.target !== id) continue;       // targeted delivery
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(body);                                     // local delivery only
    }
  }
});

Two connections are the non-negotiable detail: once an ioredis client enters subscriber mode it cannot issue normal commands, so the publisher needs its own connection. For channel-scoped traffic (per-room or per-tenant), swap the single global channel for one channel per room, such as ws:room:<roomId>, and subscribe only to the rooms that have at least one local socket. A pattern subscription (psubscribe ws:room:*) is simpler to wire up, but it delivers every room's events to every node, so it does not cut wasted deliveries on its own.
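
A node can limit itself to the rooms it actually hosts by subscribing to a room's channel when the first local member joins and unsubscribing when the last one leaves. The sketch below covers only the bookkeeping; the class name `RoomRefCounter`, the `BrokerAction` type, and the `ws:room:<roomId>` channel naming are illustrative assumptions, not part of ioredis or ws.

```typescript
// Tells the caller when a broker subscription must change.
type BrokerAction = 'subscribe' | 'unsubscribe' | 'none';

// Reference-counts local room membership (hypothetical helper).
class RoomRefCounter {
  private readonly members = new Map<string, number>(); // roomId -> local socket count

  // Assumed channel naming convention for per-room traffic.
  static channelFor(roomId: string): string {
    return `ws:room:${roomId}`;
  }

  // Call when a local socket joins a room.
  join(roomId: string): BrokerAction {
    const count = this.members.get(roomId) ?? 0;
    this.members.set(roomId, count + 1);
    return count === 0 ? 'subscribe' : 'none';   // first local member
  }

  // Call when a local socket leaves a room or disconnects.
  leave(roomId: string): BrokerAction {
    const count = this.members.get(roomId) ?? 0;
    if (count === 0) return 'none';              // unknown room: nothing to do
    if (count === 1) {
      this.members.delete(roomId);
      return 'unsubscribe';                      // last local member left
    }
    this.members.set(roomId, count - 1);
    return 'none';
  }

  // Rooms that currently have at least one local socket.
  hostedRooms(): string[] {
    return Array.from(this.members.keys());
  }
}
```

The caller would act on the returned value, for example calling `sub.subscribe(RoomRefCounter.channelFor(roomId))` when `join` returns `'subscribe'`. Keeping the counter free of I/O makes it easy to unit-test separately from the Redis connection.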

Scaling & architecture #

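Fan-out solves message reach, but three architectural concerns surface as the fleet grows: fan-out cost, presence accuracy, and delivery durability. The data path for a single targeted message runs from the sender's socket on node A, through a publish from node A to Redis, out to every subscribed node, and ends with a local send on the one node (say node C) that holds the target socket; every other node receives the message and discards it.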

The cost worth watching: with a single global channel, every node receives every published message regardless of whether it hosts a relevant socket. At N nodes, each publish turns into N deliveries. Per-room channels, subscribed only on the nodes that host a member of that room, contain it; for large rooms or strict ordering, Redis Streams vs Pub/Sub compares the trade-off, since Streams add consumer groups and replay at the price of more bookkeeping.

Presence — who is online, in which room, on which node — is its own distributed-state problem. A naive in-memory set per node fragments the moment you scale out. Presence & Online Tracking covers Redis-backed presence with TTL heartbeats so a crashed node’s users expire instead of appearing online forever.
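
The expiry logic behind TTL heartbeats can be sketched without a broker. The `PresenceTracker` class below is a hypothetical in-memory stand-in: in a Redis-backed version each entry would typically be a key written with an expiry and refreshed on every heartbeat, whereas here a plain map and an explicit clock argument model the same behaviour.

```typescript
// In-memory model of TTL-based presence (illustrative only).
class PresenceTracker {
  private readonly expiresAt = new Map<string, number>(); // userId -> deadline in ms
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Each heartbeat pushes the user's deadline forward by one TTL.
  heartbeat(userId: string, nowMs: number): void {
    this.expiresAt.set(userId, nowMs + this.ttlMs);
  }

  // Clean disconnect: remove immediately instead of waiting for expiry.
  disconnect(userId: string): void {
    this.expiresAt.delete(userId);
  }

  // A user is online only while the deadline lies in the future.
  isOnline(userId: string, nowMs: number): boolean {
    const deadline = this.expiresAt.get(userId);
    return deadline !== undefined && deadline > nowMs;
  }

  // Everyone whose heartbeat has not lapsed at `nowMs`.
  online(nowMs: number): string[] {
    const result: string[] = [];
    this.expiresAt.forEach((deadline, userId) => {
      if (deadline > nowMs) result.push(userId);
    });
    return result;
  }
}
```

If a node crashes, its users stop sending heartbeats and drop out after one TTL without any cleanup code running. A TTL of two to three heartbeat intervals is a common starting point, since it tolerates one missed heartbeat.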

Delivery durability matters once a dropped message is a correctness bug rather than a cosmetic one. Pub/sub is fire-and-forget: a client mid-reconnect misses everything published during the gap. Message Delivery Guarantees layers acknowledgements and replay on top of fan-out to reach at-least-once semantics.
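
The acknowledgement-and-replay idea can be illustrated with sequence numbers and a bounded buffer. The `ReplayBuffer` class below is a hypothetical in-memory stand-in for a durable log such as a Redis Stream; it assumes the client reports the last sequence number it processed when it reconnects and discards any duplicates it receives.

```typescript
interface SequencedMessage<T> {
  seq: number;   // monotonically increasing per channel
  data: T;
}

// Bounded in-memory replay log (illustrative, not durable).
class ReplayBuffer<T> {
  private readonly buffer: SequencedMessage<T>[] = [];
  private nextSeq = 1;
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Assign the next sequence number and retain the message for replay.
  append(data: T): SequencedMessage<T> {
    const msg: SequencedMessage<T> = { seq: this.nextSeq, data };
    this.nextSeq += 1;
    this.buffer.push(msg);
    if (this.buffer.length > this.maxSize) this.buffer.shift(); // oldest falls off
    return msg;
  }

  // Everything the client has not acknowledged yet, in order.
  replayAfter(lastAckedSeq: number): SequencedMessage<T>[] {
    return this.buffer.filter((m) => m.seq > lastAckedSeq);
  }

  // True when messages the client still needs were already evicted,
  // so replay alone cannot close the gap and a full resync is required.
  hasGap(lastAckedSeq: number): boolean {
    const first = this.buffer[0];
    const oldest = first ? first.seq : this.nextSeq;
    return lastAckedSeq + 1 < oldest;
  }
}
```

Replay can resend a message the client already processed but had not yet acknowledged, which is why this gives at-least-once rather than exactly-once delivery; deduplicating on `seq` at the client closes that gap.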

Finally, the fleet must size itself to load. CPU is a poor scaling signal for WebSocket nodes because idle connections cost almost nothing; the right signal is active connection count or broker lag. Horizontal Scaling on Kubernetes drives autoscaling on those custom metrics, and you should plan for graceful pod drain so scale-down does not sever live sockets without a reconnect signal.
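
To see how a connection-count target translates into pod count, the function below applies the ratio rule used by the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric), and clamps the result to configured bounds. It is a simplified sketch: the real controller also applies a tolerance band and stabilization windows, and the function and parameter names are assumptions chosen for illustration.

```typescript
// Simplified model of the HPA replica calculation for a per-pod average metric.
function desiredReplicas(
  currentReplicas: number,
  avgConnectionsPerPod: number,    // e.g. mean of ws_connections_active across pods
  targetConnectionsPerPod: number, // the per-pod value the autoscaler aims for
  minReplicas: number,
  maxReplicas: number,
): number {
  const ratio = avgConnectionsPerPod / targetConnectionsPerPod;
  const raw = Math.ceil(currentReplicas * ratio);
  // Clamp to the configured bounds, as the autoscaler does.
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}
```

For example, 4 pods averaging 30,000 connections against a target of 20,000 per pod yields 6 replicas. Setting the target well below measured per-node capacity leaves room for the reconnect surge that follows a pod drain.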

Observability checklist #

You cannot scale what you cannot see. Export these metrics from every node and the broker; wire them into the conventions described in WebSocket Observability & Monitoring so dashboards stay consistent across the fleet.

`ws_connections_active`: gauge of live sockets per node; the primary autoscale signal and the basis for even load distribution.
`fanout_publish_latency_seconds`: histogram of broker publish round-trip; rising p99 means Redis is saturated or networked too far away.
`fanout_delivery_lag_seconds`: `now - envelope.sentAt` at delivery; the true end-to-end fan-out latency a user feels.
`redis_pubsub_lag`: backlog or dropped-message count per subscriber; non-zero means subscribers can't keep up and are at risk of disconnect.
`ws_messages_dropped_total`: counter labelled by reason (`buffer_overflow`, `socket_closed`, `no_local_match`).
`ws_reconnect_storm_rate`: reconnects per second; spikes during deploys indicate missing sticky routing or absent drain handling.
`presence_set_size`: distinct online users; should track `ws_connections_active` minus duplicate sessions, and diverging values reveal leaked presence keys.
`node_cpu_seconds` and event-loop lag per pod: a saturated event loop delays every send regardless of broker health.
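
As an illustration of how `fanout_delivery_lag_seconds` could be computed, the sketch below derives the lag from the envelope's `sentAt` timestamp and records it in cumulative buckets, similar in spirit to a Prometheus histogram. `LagHistogram` and `deliveryLagSeconds` are hypothetical names, not a metrics-library API; a real deployment would hand the value to whatever metrics client it already uses.

```typescript
// Lag between publish on one node and delivery on another, in seconds.
// Clamped at zero because the two timestamps come from different machines
// and their clocks may disagree slightly.
function deliveryLagSeconds(sentAtMs: number, nowMs: number): number {
  return Math.max(0, (nowMs - sentAtMs) / 1000);
}

interface Bucket {
  le: number;     // inclusive upper bound in seconds
  count: number;  // observations less than or equal to `le` (cumulative)
}

// Minimal cumulative-bucket histogram (illustrative only).
class LagHistogram {
  private readonly buckets: Bucket[];
  private observations = 0;

  constructor(upperBoundsSeconds: number[]) {
    this.buckets = upperBoundsSeconds
      .slice()
      .sort((a, b) => a - b)
      .map((le) => ({ le, count: 0 }));
  }

  observe(seconds: number): void {
    this.observations += 1;
    for (const bucket of this.buckets) {
      if (seconds <= bucket.le) bucket.count += 1;
    }
  }

  // Copy of the current state, safe to hand to an exporter.
  snapshot(): { buckets: Bucket[]; count: number } {
    return {
      buckets: this.buckets.map((b) => ({ le: b.le, count: b.count })),
      count: this.observations,
    };
  }
}
```

Because `sentAt` is stamped on the publishing node and read on the delivering node, this metric is only as accurate as clock synchronization across the fleet; treat small values as noise and watch the trend.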

Failure modes #

Failure	Symptom	Root cause	Mitigation
Split-brain fan-out	Messages reach some clients, not others	A node lost its Redis subscriber connection but kept serving sockets	Health-check the subscriber; fail readiness probe if `sub` is disconnected so the pod is pulled
Pub/sub buffer overflow	Subscribers randomly dropped under load	Slow consumer exceeds `client-output-buffer-limit pubsub`	Raise the limit, shard channels per room, or move hot paths to Streams
Ghost presence	Users shown online after a node crash	In-memory presence never expired	Redis presence keys with heartbeat TTL; sweep on `expired` keyspace events
Missed messages on reconnect	Client gap after a brief disconnect	Fire-and-forget pub/sub has no replay	Add sequence numbers and at-least-once replay from a Stream
Scale-down socket sever	Mass disconnects on deploy	Pod terminated before draining sockets	`preStop` drain hook sending a reconnect close frame, generous `terminationGracePeriodSeconds`
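
The drain mitigation in the last row can be sketched as closing sockets in waves, so clients reconnect gradually instead of all at once. The code below is a minimal sketch under assumptions: `ClosableSocket` is a hypothetical interface covering the `close(code, reason)` method that WebSocket objects expose, `planBatches` and `drain` are illustrative names, and close code 1001 ("going away") is the RFC 6455 code for an endpoint shutting down.

```typescript
// Minimal shape of a socket for draining purposes (assumed interface).
interface ClosableSocket {
  close(code?: number, reason?: string): void;
}

const GOING_AWAY = 1001; // RFC 6455: the endpoint is going away, e.g. server shutdown

// Split items round-robin into at most `batches` groups of near-equal size.
function planBatches<T>(items: T[], batches: number): T[][] {
  const n = Math.max(1, Math.min(batches, items.length));
  const plan: T[][] = [];
  for (let i = 0; i < n; i += 1) plan.push([]);
  items.forEach((item, i) => {
    plan[i % n].push(item);
  });
  return plan;
}

// Close one batch immediately, then one more every `intervalMs`.
// Resolves once the final batch has been asked to close.
function drain(
  sockets: ClosableSocket[],
  batches: number,
  intervalMs: number,
): Promise<void> {
  const plan = planBatches(sockets, batches);
  return new Promise((resolve) => {
    let index = 0;
    const tick = (): void => {
      for (const socket of plan[index]) {
        socket.close(GOING_AWAY, 'server draining');
      }
      index += 1;
      if (index < plan.length) setTimeout(tick, intervalMs);
      else resolve();
    };
    tick();
  });
}
```

A shutdown handler would typically mark the pod unready first, then call something like `drain(openSockets, 10, 1000)`, keeping the total drain time below `terminationGracePeriodSeconds`. Clients should add jitter to their reconnect delay so each wave spreads out further.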

Explore this area #

This area breaks into four focused topics that build on the fan-out core above. Start with Redis Pub/Sub Fan-Out for the broker-level broadcast mechanics, channel patterns, and the pub/sub-versus-Streams decision. Move to Presence & Online Tracking to build a distributed online-status system with TTL heartbeats that survives node failure. Read Message Delivery Guarantees when a dropped message becomes a correctness issue and you need acknowledgements and replay for at-least-once delivery. Finish with Horizontal Scaling on Kubernetes to autoscale the fleet on connection-count metrics and drain pods without breaking live sockets.

FAQ #

Why not just use sticky sessions and skip the broker entirely? #

Sticky sessions keep a given client on one node, but they do nothing for cross-node delivery. If user A on node 1 sends a chat message to user B on node 2, only a broker can carry it across. Stickiness and fan-out solve different problems: stickiness preserves per-connection locality during reconnects, while the broker delivers events between nodes. You almost always want both.

Does Redis pub/sub guarantee delivery? #

No. Redis pub/sub is fire-and-forget with no buffering, no acknowledgement, and no replay. A subscriber that is disconnected, reconnecting, or too slow simply misses messages. If a missed message is a correctness bug, layer acknowledgements and replay — typically with Redis Streams or a durable log — on top, as covered in Message Delivery Guarantees.

Should I scale WebSocket nodes on CPU? #

Rarely. Idle WebSocket connections consume memory and file descriptors but almost no CPU, so a CPU-based autoscaler under-provisions badly — you can hit the connection ceiling at 20% CPU. Scale on ws_connections_active or broker lag using a custom-metrics autoscaler instead.

What changes for Socket.IO versus raw ws? #

Socket.IO ships its own Redis adapter that wraps this exact fan-out pattern, so you write less plumbing but accept its framing, room model, and reconnect protocol. With raw ws you build the publish/subscribe loop yourself, as shown above, which keeps the wire format under your control and avoids the Socket.IO handshake overhead. The scaling principles — broker fan-out, TTL presence, replay for durability — are identical either way.

How many connections can one node hold before I must scale out? #

It depends on per-connection memory and message rate, but a tuned Node.js process with a 65k file-descriptor limit commonly handles 30k–50k mostly-idle sockets. The trigger to add nodes is rising event-loop lag or send latency, not a fixed count. Watch ws_connections_active against measured per-node capacity and autoscale before you reach it.

Redis Pub/Sub Fan-Out — broker-level broadcast, channel patterns, and pub/sub versus Streams.
Presence & Online Tracking — distributed online status with Redis TTL heartbeats.
Message Delivery Guarantees — acknowledgements and replay for at-least-once delivery.
Horizontal Scaling on Kubernetes — autoscaling on connection metrics with KEDA and graceful drain.
Backend WebSocket Connection Management — the single-node lifecycle, heartbeat, and routing foundation this builds on.
