Scaling WebSocket broadcast across an N-node fleet with Redis Pub/Sub #
You ran a single Node process happily broadcasting to every connected socket with one loop over wss.clients. Then you added a second instance behind your load balancer, and broadcasts went dark: a client on node A never sees a message published by a client on node B. Each process only holds references to the sockets it terminated itself, so an in-memory fan-out can never reach connections living on a sibling node.
The fix is a shared message bus. Redis Pub/Sub gives every node a common channel: any node PUBLISHes an event, every node SUBSCRIBEd to that channel receives it, and each node re-emits the payload to its own local ws clients. This page shows the dual-client pattern, a versioned JSON envelope, and the one subtle trap — double-delivery — that bites teams the first time they wire it up. It sits under Redis Pub/Sub Fan-Out within the broader work of scaling real-time infrastructure.
Root cause #
A WebSocket connection is terminated by exactly one process. After the HTTP Upgrade, the TCP socket is owned by the kernel of one host and its file descriptor by one Node event loop. wss.clients is an in-memory Set of the WebSocket objects wrapping those sockets, so it is intrinsically node-local and has no awareness of connections on other instances. Sticky sessions (covered in Load Balancer Sticky Sessions) keep a given client pinned to its node, but they do nothing to bridge a message across nodes. Broadcast is the opposite problem: one event must reach all clients regardless of which node terminated them.
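The locality described above can be reproduced without sockets or Redis. The following is a minimal in-memory sketch; FakeNode and localBroadcast are hypothetical names standing in for one server process and its loop over wss.clients.

```typescript
// Hypothetical stand-in for one server process: it only knows its own clients.
class FakeNode {
  // Maps a client name to the messages that client has received.
  readonly clients = new Map<string, string[]>();

  connect(clientName: string): void {
    this.clients.set(clientName, []);
  }

  // Equivalent of looping over wss.clients: reaches local connections only.
  localBroadcast(message: string): void {
    this.clients.forEach((inbox) => {
      inbox.push(message);
    });
  }
}

const nodeA = new FakeNode();
const nodeB = new FakeNode();
nodeA.connect("alice");
nodeB.connect("bob");

// A message handled on node B is looped over node B's clients only.
nodeB.localBroadcast("hello from bob");

console.log(nodeA.clients.get("alice")); // [] : alice, on node A, got nothing
console.log(nodeB.clients.get("bob"));   // [ 'hello from bob' ]
```

Nothing in node B's loop can reach alice, because her connection object exists only in node A's memory.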
Redis Pub/Sub is a fire-and-forget message router. When a client calls SUBSCRIBE channel, Redis flags that connection as a subscriber; any PUBLISH channel payload is immediately copied to every current subscriber’s socket. There is no queue and no replay — if a node is disconnected at publish time, it never sees that message. That at-most-once delivery is exactly the boundary that pushes some teams toward streams instead; the trade-off is laid out in Redis Streams vs Pub/Sub for WebSocket fan-out.
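To make the no-queue, no-replay behaviour concrete, here is a small in-memory model of those semantics. TinyBus is a hypothetical class written for illustration, not a Redis client; it only mimics the rule that a payload goes to whoever is subscribed at publish time.

```typescript
type Listener = (payload: string) => void;

// Hypothetical in-memory model of Pub/Sub semantics: no queue, no replay.
class TinyBus {
  private listeners = new Map<string, Set<Listener>>();

  // Registers a listener and returns a function that removes it again.
  subscribe(channel: string, listener: Listener): () => void {
    let set = this.listeners.get(channel);
    if (!set) {
      set = new Set<Listener>();
      this.listeners.set(channel, set);
    }
    set.add(listener);
    const current = set;
    return () => {
      current.delete(listener);
    };
  }

  // Copies the payload to current subscribers and returns how many there were.
  publish(channel: string, payload: string): number {
    const set = this.listeners.get(channel);
    if (!set) return 0;
    set.forEach((listener) => listener(payload));
    return set.size;
  }
}

const bus = new TinyBus();
const seenByNodeA: string[] = [];

const firstCount = bus.publish("ws:broadcast", "m1"); // nobody listening: dropped
const disconnect = bus.subscribe("ws:broadcast", (p) => seenByNodeA.push(p));
bus.publish("ws:broadcast", "m2"); // delivered
disconnect();                      // simulate the node losing its connection
bus.publish("ws:broadcast", "m3"); // dropped for this node, never replayed
bus.subscribe("ws:broadcast", (p) => seenByNodeA.push(p));
bus.publish("ws:broadcast", "m4"); // delivered after resubscribing

console.log(firstCount, seenByNodeA); // 0 [ 'm2', 'm4' ]
```

The node never learns that m1 and m3 existed, which is the gap a stream-based design is meant to close.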
Two constraints shape the implementation. First, a Redis connection in subscriber mode cannot issue normal commands: under RESP2, the protocol ioredis uses by default, once a connection has run SUBSCRIBE, calling PUBLISH (or GET, SET) on it fails, and only the subscribe-family commands plus PING and QUIT are accepted. (RESP3 lifts this restriction, but the pattern here does not rely on it.) So you need two Redis clients per node: one dedicated subscriber and one normal publisher. Second, the publishing node is itself a subscriber, so it receives its own broadcast back over the bus. If that node had already written the message to its local sockets at publish time, those clients get the payload twice. The clean rule: never write to local sockets directly on publish. Write to them only in the subscriber callback, so every node (including the origin) follows one identical path.
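The double-delivery trap is easy to see in a simulation. The sketch below uses a hypothetical in-memory bus (makeBus) and fake nodes (makeNode) in place of Redis and real sockets; the eagerLocalWrite flag is an illustrative switch that turns the bug on.

```typescript
type Handler = (payload: string) => void;

// Hypothetical in-memory bus standing in for Redis Pub/Sub.
function makeBus() {
  const handlers: Handler[] = [];
  return {
    subscribe(handler: Handler): void {
      handlers.push(handler);
    },
    publish(payload: string): void {
      for (const handler of handlers) handler(payload);
    },
  };
}

// One fake node with a single local client whose received frames are recorded.
function makeNode(bus: ReturnType<typeof makeBus>, eagerLocalWrite: boolean) {
  const received: string[] = [];
  bus.subscribe((payload) => {
    received.push(payload); // subscriber path
  });
  return {
    received,
    broadcast(payload: string): void {
      if (eagerLocalWrite) received.push(payload); // the bug: write on publish
      bus.publish(payload); // the origin also gets this back as a subscriber
    },
  };
}

// Buggy wiring: the origin writes locally AND receives its own echo.
const busA = makeBus();
const buggyOrigin = makeNode(busA, true);
const buggyPeer = makeNode(busA, true);
buggyOrigin.broadcast("tick");

// Single-path wiring: local delivery happens only in the subscriber handler.
const busB = makeBus();
const cleanOrigin = makeNode(busB, false);
const cleanPeer = makeNode(busB, false);
cleanOrigin.broadcast("tick");

console.log(buggyOrigin.received.length, buggyPeer.received.length); // 2 1
console.log(cleanOrigin.received.length, cleanPeer.received.length); // 1 1
```

Only the origin is affected in the buggy wiring, which is why the problem often goes unnoticed until someone tests from the same node they publish on.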
Resolution #
The implementation below uses ioredis and ws. Note the two distinct Redis clients, the typed JSON envelope, and the rule that local delivery happens only inside the subscriber handler.
import Redis from "ioredis";
import { WebSocketServer, WebSocket, type RawData } from "ws";
import { randomUUID } from "node:crypto";

const BROADCAST_CHANNEL = "ws:broadcast"; // shared by every node in the fleet
const NODE_ID = process.env.NODE_ID ?? randomUUID(); // identifies the origin node

// Versioned envelope: keeps payloads forward-compatible and self-describing.
interface BroadcastEnvelope {
  v: 1; // schema version; bump on breaking changes
  id: string; // unique message id (dedupe / tracing)
  origin: string; // NODE_ID that published; used to skip self-delivery if needed
  type: string; // application event name, e.g. "price.update"
  data: unknown; // the actual payload
  ts: number; // epoch ms, set at publish time
}

// Two clients: a subscriber (locked to subscribe mode) and a normal publisher.
const sub = new Redis(process.env.REDIS_URL!);
const pub = new Redis(process.env.REDIS_URL!); // MUST be separate: sub mode blocks commands

const wss = new WebSocketServer({ port: 8080 });

// 1. Local delivery happens ONLY here, driven by the bus, never directly on publish.
// Every node (including the origin) delivers through this one path, so no
// client receives the same message twice.
sub.on("message", (channel: string, raw: string) => {
  if (channel !== BROADCAST_CHANNEL) return;
  let env: BroadcastEnvelope;
  try {
    env = JSON.parse(raw);
  } catch {
    return; // drop malformed payloads rather than crash the handler
  }
  if (env.v !== 1) return; // ignore envelopes from an incompatible schema
  const frame = JSON.stringify({ type: env.type, data: env.data, ts: env.ts });
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) {
      client.send(frame); // re-emit to this node's local sockets
    }
  }
});

// 2. SUBSCRIBE must complete before any PUBLISH can be observed by this node.
sub.subscribe(BROADCAST_CHANNEL).catch((err) => {
  console.error("subscribe failed", err);
  process.exit(1); // a node that cannot subscribe is silently broken, so fail loud
});

// 3. Publishing path: build the envelope, hand it to Redis, and return.
// Do NOT touch wss.clients here; the message comes back via the subscriber.
function broadcast(type: string, data: unknown): void {
  const env: BroadcastEnvelope = {
    v: 1,
    id: randomUUID(),
    origin: NODE_ID,
    type,
    data,
    ts: Date.now(),
  };
  pub.publish(BROADCAST_CHANNEL, JSON.stringify(env)).catch((err) => {
    console.error("publish failed", err); // surfaced for retry/alerting upstream
  });
}

// Example: any client message triggers a fleet-wide broadcast.
wss.on("connection", (socket: WebSocket) => {
  socket.on("message", (raw: RawData) => {
    broadcast("chat.message", { text: raw.toString() });
  });
});
The double-delivery guard is structural, not a runtime check: because no node ever writes to local sockets on the publish path, the origin node receives its own message back through sub.on("message") and delivers it through the same single code path as every other node. If you ever do want to suppress echo on the origin node, compare env.origin === NODE_ID inside the handler and skip. Be aware that NODE_ID identifies a node, not a connection, so this skips every socket on the origin node; to spare only the sending client (for example, because it already optimistically rendered the event), the envelope would also need to carry a per-connection id. Either way, keep the skip an explicit, opt-in branch rather than scattering local writes across both paths.
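One way to keep the echo skip explicit is to isolate the decision in a small pure function. This is a sketch under assumptions: shouldDeliverLocally, DeliveryOptions, and the suppressOriginEcho flag are hypothetical names, and EnvelopeLike is a trimmed copy of the envelope shape used above.

```typescript
// Trimmed copy of the envelope fields this decision needs.
interface EnvelopeLike {
  origin: string;
  type: string;
  data: unknown;
}

// Hypothetical option bag: echo suppression is off unless explicitly requested.
interface DeliveryOptions {
  nodeId: string;
  suppressOriginEcho?: boolean;
}

// Returns true when this node should re-emit the envelope to its local sockets.
function shouldDeliverLocally(env: EnvelopeLike, opts: DeliveryOptions): boolean {
  if (opts.suppressOriginEcho === true && env.origin === opts.nodeId) {
    return false; // opt-in branch: this node published the message itself
  }
  return true; // default: every node, origin included, delivers
}

const sample: EnvelopeLike = {
  origin: "node-a",
  type: "chat.message",
  data: { text: "hi" },
};

console.log(shouldDeliverLocally(sample, { nodeId: "node-a" }));                           // true
console.log(shouldDeliverLocally(sample, { nodeId: "node-a", suppressOriginEcho: true })); // false
console.log(shouldDeliverLocally(sample, { nodeId: "node-b", suppressOriginEcho: true })); // true
```

Inside the subscriber handler this would sit right after the version check, leaving the publish path untouched.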
Operational checklist #
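- Confirm two separate Redis connections exist per node; sharing one client between SUBSCRIBE and PUBLISH fails, because a connection in subscriber mode rejects ordinary commands.
- Verify sub.subscribe() resolves at startup, and exit the process if it rejects.
- Send a test broadcast and assert clients on every node receive it, each exactly once.
- Wrap JSON.parse in a try/catch so a malformed payload is dropped rather than crashing the handler.
- Bump and gate on the envelope v field whenever the schema changes incompatibly.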
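The JSON.parse and version-gate items can be folded into one helper that also checks field types, since JSON.parse itself returns untyped data. The following is a sketch, not part of the implementation above: parseEnvelope is a hypothetical name, and the interface is repeated so the block stands alone.

```typescript
interface BroadcastEnvelope {
  v: 1;
  id: string;
  origin: string;
  type: string;
  data: unknown;
  ts: number;
}

// Hypothetical helper: returns a checked envelope, or null for anything that
// is malformed or carries a schema version this node does not understand.
function parseEnvelope(raw: string): BroadcastEnvelope | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof value !== "object" || value === null) return null;

  const { v, id, origin, type, data, ts } = value as Record<string, unknown>;
  if (v !== 1) return null; // version gate
  if (typeof id !== "string" || typeof origin !== "string") return null;
  if (typeof type !== "string" || typeof ts !== "number") return null;
  return { v: 1, id, origin, type, data, ts };
}

const good = parseEnvelope(
  JSON.stringify({ v: 1, id: "m-1", origin: "node-a", type: "price.update", data: { px: 10 }, ts: 1700000000000 }),
);
const future = parseEnvelope(
  JSON.stringify({ v: 2, id: "m-2", origin: "node-a", type: "price.update", data: {}, ts: 1700000000000 }),
);
const garbage = parseEnvelope("{not json");

console.log(good?.type, future, garbage); // price.update null null
```

Returning null for every rejection keeps the handler to a single early return; a node that wants metrics on dropped payloads could return a reason instead.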
FAQ #
Why do I need two Redis clients per node? #
A Redis connection that has issued SUBSCRIBE enters subscriber mode and, under RESP2, rejects ordinary commands like PUBLISH. You need one client dedicated to subscribing and re-emitting, and a second normal client for publishing. Both can point at the same Redis server.
How do I avoid delivering a message twice on the publishing node? #
Never write to wss.clients directly in your publish function. Write to local sockets only inside the subscriber’s message handler. The origin node receives its own published message back over the bus and delivers it through that one path, so every node follows identical logic and no client is double-served.
What happens to messages if a node is disconnected from Redis? #
Redis Pub/Sub is at-most-once with no replay. Any broadcast published while a node’s subscriber connection is down is lost for that node’s clients. If you need durability or catch-up after a gap, compare the trade-offs in Redis Streams vs Pub/Sub for WebSocket fan-out.
Does this still work with sticky sessions enabled? #
Yes, and they solve different problems. Sticky sessions keep one client pinned to one node across its lifetime; the Redis bridge fans a single event out to clients on all nodes. Run both: sticky routing for the connection, Pub/Sub for broadcast.
Can I target a subset of clients instead of broadcasting to everyone? #
Use distinct channels — for example ws:room:<id> — and have each node subscribe only to the channels it has interested clients for. The dual-client pattern and envelope stay the same; you just route by channel name instead of one global channel.
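Subscribing only to channels with interested local clients implies some bookkeeping: subscribe when the first local client joins a room, unsubscribe when the last one leaves. Below is a minimal reference-counting sketch. RoomRouter and ChannelSubscriber are hypothetical names; with ioredis the two interface methods would map onto sub.subscribe(channel) and sub.unsubscribe(channel), and a recording fake is used here so the example runs without Redis.

```typescript
// Hypothetical minimal surface of a subscriber connection.
interface ChannelSubscriber {
  subscribe(channel: string): void;
  unsubscribe(channel: string): void;
}

// Tracks how many local clients care about each room so the node subscribes
// on the first join and unsubscribes on the last leave.
class RoomRouter {
  private members = new Map<string, number>();
  private sub: ChannelSubscriber;

  constructor(sub: ChannelSubscriber) {
    this.sub = sub;
  }

  static channelFor(roomId: string): string {
    return `ws:room:${roomId}`;
  }

  join(roomId: string): void {
    const count = this.members.get(roomId) ?? 0;
    if (count === 0) this.sub.subscribe(RoomRouter.channelFor(roomId));
    this.members.set(roomId, count + 1);
  }

  leave(roomId: string): void {
    const count = this.members.get(roomId) ?? 0;
    if (count === 0) return; // leave without a matching join: ignore
    if (count === 1) {
      this.members.delete(roomId);
      this.sub.unsubscribe(RoomRouter.channelFor(roomId));
    } else {
      this.members.set(roomId, count - 1);
    }
  }
}

// Recording fake in place of a real Redis subscriber connection.
const calls: string[] = [];
const fakeSub: ChannelSubscriber = {
  subscribe: (channel) => {
    calls.push(`SUBSCRIBE ${channel}`);
  },
  unsubscribe: (channel) => {
    calls.push(`UNSUBSCRIBE ${channel}`);
  },
};

const router = new RoomRouter(fakeSub);
router.join("42");  // first local member: SUBSCRIBE
router.join("42");  // second member: no extra Redis call
router.leave("42");
router.leave("42"); // last member gone: UNSUBSCRIBE

console.log(calls); // [ 'SUBSCRIBE ws:room:42', 'UNSUBSCRIBE ws:room:42' ]
```

In the subscriber handler, the channel argument then selects which local sockets receive the frame, instead of looping over all of wss.clients.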
Related #
- Redis Pub/Sub Fan-Out — the parent area covering bridge patterns for multi-node fan-out.
- Redis Streams vs Pub/Sub for WebSocket fan-out — when at-most-once Pub/Sub is not enough and you need replay.
- Scaling Real-Time Infrastructure — the broader pillar on running WebSocket fleets horizontally.
- Load Balancer Sticky Sessions — pinning connections to nodes, the complement to cross-node broadcast.