Redis Streams vs Pub/Sub for WebSocket Fan-Out

You already bridge your WebSocket nodes with Redis pub/sub fan-out, and it works until a node restarts mid-deploy and every client connected to it silently loses the messages published during the gap. Now you are staring at a Redis docs page weighing PUBLISH/SUBSCRIBE against XADD/XREADGROUP, trying to decide whether the upgrade to Streams is worth the added moving parts. This guide helps you make that decision: it lays out exactly what pub/sub buys you, what Streams buys you, and the latency, memory, ordering, and reconnect-replay trade-offs that should drive the call, then shows a Streams consumer-group fan-out you can drop into a node.

Root cause

The two primitives sit at opposite ends of a durability spectrum, and the difference is structural, not a tuning knob.

Pub/sub is fire-and-forget. When a node calls PUBLISH ws:room:42 <envelope>, Redis delivers that envelope to whatever clients are subscribed at that instant and then forgets it. There is no log, no offset, no acknowledgement. A subscriber that connects one millisecond later, or a node restarting through a deploy, never sees the message — it was never stored. The PUBLISH return value tells you how many subscribers received it, which is the only receipt you will ever get. This is why a Redis pub/sub broadcast is perfect for live cursors and typing indicators and structurally wrong for anything where a dropped message is a correctness bug.

Streams are an append-only log. XADD ws:room:42 * field value appends an entry with a monotonically increasing ID (<ms>-<seq>) and keeps it until you trim. Consumers read everything after a given ID with XREAD (or an ID range with XRANGE), or, the interesting part for fan-out, join a consumer group with XREADGROUP. A group tracks a last-delivered ID and a Pending Entries List (PEL) of messages handed out but not yet XACK-ed. After a node crashes between reading and acknowledging, those exact entries can be reclaimed on restart with XAUTOCLAIM. That gives you at-least-once delivery, replay from any offset, and natural backpressure: slow consumers simply read fewer entries per XREADGROUP call rather than being drowned by an unthrottled push.
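Because entry IDs have the form <ms>-<seq>, ordering two entries means comparing the millisecond part first and the sequence part second. A plain string comparison gets this wrong as soon as the parts differ in digit count. The following is a minimal sketch; `compareStreamIds` and `compareDigits` are hypothetical helper names, not part of any Redis client, and the code assumes IDs in the canonical form Redis returns (no leading zeros).

```typescript
// Hypothetical helpers, not part of ioredis or any Redis client.
// Assumes canonical IDs as returned by Redis: "<ms>-<seq>", no leading zeros.

// Compare two non-negative integers given as digit strings, without
// converting to Number (the parts are 64-bit values in Redis).
function compareDigits(a: string, b: string): number {
  if (a.length !== b.length) return a.length < b.length ? -1 : 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Negative if a is older than b, positive if newer, 0 if identical.
function compareStreamIds(a: string, b: string): number {
  const pa = /^(\d+)-(\d+)$/.exec(a);
  const pb = /^(\d+)-(\d+)$/.exec(b);
  if (!pa || !pb) throw new Error(`not a stream entry ID: ${pa ? b : a}`);
  const byMs = compareDigits(pa[1], pb[1]);
  return byMs !== 0 ? byMs : compareDigits(pa[2], pb[2]);
}
```

A client that remembers the last ID it rendered can use a comparison like this to decide where to resume a replay.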

The cost is real. Streams consume memory proportional to retained entries, demand a trimming strategy (MAXLEN/MINID), and force you to think about acknowledgement, idempotency, and stuck-consumer recovery — concerns pub/sub lets you ignore entirely because it never remembers anything.

Side-by-side

| Dimension | Pub/Sub | Streams |
| --- | --- | --- |
| Delivery semantics | At-most-once (fire-and-forget) | At-least-once with consumer groups |
| Persistence / replay | None; gone after publish | Append-only log, replay by ID |
| Missed while offline? | Lost forever | Retained until trimmed |
| Ordering | Per-channel, not durable | Strict per-stream by entry ID |
| Acknowledgement | None | `XACK` + Pending Entries List |
| Backpressure | Push, can overwhelm slow subs | Pull, `COUNT` bounds each read |
| Latency | Lowest (no write, no fsync path) | Low, plus one write + read round-trip |
| Memory cost | ~0 (no storage) | Grows with retained entries |
| Crash recovery | Nothing to recover | `XAUTOCLAIM` reclaims the PEL |
| Operational complexity | Minimal | Trimming, idempotency, claim loops |

The single sentence that decides it: if a subscriber being momentarily absent is allowed to mean a message is gone forever, pub/sub is correct and cheaper. If absence must mean delayed, not lost, you need a log — and on Redis that log is Streams. For the formal delivery-semantics framing behind this, see message delivery guarantees.

Resolution

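Here is a Streams-based fan-out for a single node. It runs a consumer-group reader loop that pulls new entries, delivers them to local WebSocket sockets, acknowledges, and on startup reclaims anything a previous crash left pending. Each node reads through its own consumer group, because a group hands each entry to only one of its consumers and fan-out needs every node to see every entry. The node is the single consumer in its group, so the PEL belongs to that node and crash recovery is targeted.

import { randomUUID } from 'node:crypto';
import { WebSocketServer, WebSocket } from 'ws';
import Redis from 'ioredis';

const STREAM_KEY = 'ws:room:42';                          // one stream per room
// Set NODE_ID to a value that survives restarts; the random fallback starts a
// fresh group on every boot and leaves the previous one (and its PEL) orphaned.
const NODE_ID = process.env.NODE_ID ?? randomUUID();
const GROUP = `ws-fanout:${NODE_ID}`;                     // one group per node: it sees every entry
const CONSUMER = NODE_ID;                                 // the only consumer in this node's group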
const BLOCK_MS = 5000;                                    // long-poll window per read
const BATCH = 50;                                         // backpressure: cap entries/read
const MAXLEN = 10_000;                                    // trim cap to bound memory

const redis = new Redis(process.env.REDIS_URL!);          // reader: blocks in XREADGROUP
const writer = new Redis(process.env.REDIS_URL!);         // writer: XADD path, never blocks
const localSockets = new Set<WebSocket>();                // sockets owned by THIS node

// Create the group once; ignore BUSYGROUP if it already exists. MKSTREAM lets the
// group exist before the first XADD. '$' = deliver only entries added from now on.
async function ensureGroup() {
  try {
    await redis.xgroup('CREATE', STREAM_KEY, GROUP, '$', 'MKSTREAM');
  } catch (err: any) {
    if (!String(err?.message).includes('BUSYGROUP')) throw err;
  }
}

// Publish path: append an entry. '*' lets Redis assign the monotonic ID, and
// MAXLEN '~' trims approximately (cheap) so the log cannot grow unbounded.
function publish(payload: unknown) {
  return writer.xadd(STREAM_KEY, 'MAXLEN', '~', MAXLEN, '*', 'data', JSON.stringify(payload));
}

// Deliver one stream entry to every open local socket, then report whether
// delivery succeeded so the caller can decide to XACK.
function deliver(fields: string[]): boolean {
  const idx = fields.indexOf('data');
  if (idx === -1) return true;                            // malformed: ack to skip
  const frame = fields[idx + 1];
  for (const ws of localSockets) {
    if (ws.readyState === WebSocket.OPEN) ws.send(frame); // local send only
  }
  return true;
}

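// On startup, reclaim entries that were handed out but never acked: the
// crash-recovery path. XAUTOCLAIM walks the group's PEL from '0-0' and moves
// every entry idle for at least MIN_IDLE_MS to this consumer, whoever held it.
// A node that restarts faster than MIN_IDLE_MS skips its newest pending
// entries on this pass, so tune the threshold to your restart time.
const MIN_IDLE_MS = 60_000;
async function reclaimPending() {
  let cursor = '0-0';
  do {
    // Redis 7 appends a third reply element (deleted IDs); it is ignored here.
    const [next, entries] = await redis.xautoclaim(
      STREAM_KEY, GROUP, CONSUMER, MIN_IDLE_MS, cursor, 'COUNT', BATCH,
    ) as [string, [string, string[]][]];
    for (const [id, fields] of entries) {
      if (deliver(fields)) await redis.xack(STREAM_KEY, GROUP, id);
    }
    cursor = next;
  } while (cursor !== '0-0');                             // '0-0' = scan complete
}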

// Main loop: block for new entries, deliver, ack. '>' means "messages never
// delivered to any consumer in this group" — the live tail.
async function consume() {
  while (true) {
    const res = await redis.xreadgroup(
      'GROUP', GROUP, CONSUMER, 'COUNT', BATCH, 'BLOCK', BLOCK_MS, 'STREAMS', STREAM_KEY, '>',
    ) as [string, [string, string[]][]][] | null;
    if (!res) continue;                                  // BLOCK timed out — loop again
    for (const [, entries] of res) {
      for (const [id, fields] of entries) {
        if (deliver(fields)) await redis.xack(STREAM_KEY, GROUP, id); // at-least-once
      }
    }
  }
}

const wss = new WebSocketServer({ port: 8080 });
wss.on('connection', (ws) => {
  localSockets.add(ws);
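  ws.on('message', (data) => {
    try {
      // Malformed JSON must not escape the handler and crash the process.
      publish(JSON.parse(data.toString())).catch((err) => console.error('XADD failed', err));
    } catch {
      ws.close(1007, 'invalid JSON');                      // 1007 = invalid payload data
    }
  });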
  ws.on('close', () => localSockets.delete(ws));
});

await ensureGroup();
await reclaimPending();                                  // recover before tailing live
consume().catch((err) => { console.error('consumer loop died', err); process.exit(1); });

Two things make this safe. First, XACK runs only after the frame has been handed to every open local socket, so a crash between read and send leaves the entry in the PEL for XAUTOCLAIM to recover; that is what makes it at-least-once rather than at-most-once. Note that the ack confirms a local ws.send() call, not receipt by the browser. Second, BLOCK plus a bounded COUNT is the backpressure mechanism: a node that falls behind reads at most BATCH entries per cycle and the rest wait in the log instead of being pushed at the node faster than it can flush to sockets. Because delivery is at-least-once, client frames must be idempotent (carry an entry or message ID the client dedupes on); otherwise a redelivery after reclaim shows up as a duplicate.
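The client-side half of that idempotency requirement could look like the sketch below. It assumes every frame carries the stream entry ID in an `id` field, which the server code above does not yet add to the payload; `RecentIds`, `Frame`, and `handleFrame` are hypothetical names. A bounded set of recent IDs is used instead of a single high-water mark so that a reclaimed entry that was never actually sent is not mistaken for a duplicate.

```typescript
// Hypothetical client-side helper. Assumes each frame is JSON of the form
// { id: "<stream entry ID>", data: ... }; the server must add the id.
type Frame = { id: string; data: unknown };

class RecentIds {
  private readonly seen = new Set<string>();
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  // Returns true the first time an ID is offered, false for a repeat.
  accept(id: string): boolean {
    if (this.seen.has(id)) return false;
    this.seen.add(id);
    if (this.seen.size > this.capacity) {
      // Sets iterate in insertion order, so the first value is the oldest.
      const oldest = this.seen.values().next().value;
      if (oldest !== undefined) this.seen.delete(oldest);
    }
    return true;
  }
}

// Parse a frame and render it only if its ID has not been seen recently.
function handleFrame(raw: string, ids: RecentIds, render: (data: unknown) => void): void {
  const frame = JSON.parse(raw) as Frame;
  if (ids.accept(frame.id)) render(frame.data);
}
```

The capacity bounds client memory. It only needs to cover the span over which a redelivery can plausibly arrive, which is an assumption to validate against your own reclaim timing.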

Operational checklist

Confirm `XADD` uses `MAXLEN '~' <cap>` (or a `MINID` time bound) so the stream cannot grow without limit and exhaust Redis memory.
Verify each node reads through its **own** consumer group, so every node receives every entry, under a consumer name that is **stable** across restarts, so `XAUTOCLAIM` can recover that node's stuck entries after a crash.
Run `reclaimPending()` (or `XAUTOCLAIM`) on startup before entering the live `>` loop, so a crashed node's unacked entries are replayed, not stranded.
Make client-facing frames idempotent (include the stream entry ID) so an at-least-once redelivery is deduped on the client, not rendered twice.
Alarm on PEL depth (`XPENDING ws:room:42 <group>`): a growing pending count means a consumer is stuck or dead and entries are not being acknowledged.
Load-test the `BLOCK`/`COUNT` pair under your real message rate; too small a `COUNT` caps throughput, too large defeats backpressure.
Document the rollback: if Streams overhead is not justified, the fallback is the simpler [pub/sub broadcast](/scaling-real-time-infrastructure/redis-pubsub-fanout/scaling-websocket-broadcast-with-redis-pubsub/); keep both paths behind a flag during migration.
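The PEL-depth alarm from the checklist can be driven by the summary form of XPENDING. The sketch below only parses that reply; the tuple shape is an assumption based on the Redis reply format (total, smallest ID, largest ID, then per-consumer counts as strings), and `overloadedConsumers` is a hypothetical helper name.

```typescript
// Assumed shape of the XPENDING summary reply:
// [total pending, smallest ID, largest ID, [[consumer, count-as-string], ...]].
// An empty PEL is reported with nulls in the last three positions.
type PendingSummary = [number, string | null, string | null, [string, string][] | null];

// Hypothetical helper: names of consumers whose pending count exceeds the threshold.
function overloadedConsumers(summary: PendingSummary, threshold: number): string[] {
  const perConsumer = summary[3] ?? [];
  return perConsumer
    .filter(([, count]) => Number(count) > threshold)
    .map(([name]) => name);
}
```

With ioredis, the input would come from something like `await redis.xpending(STREAM_KEY, GROUP)` cast to `PendingSummary`. Run it on a timer and alert when the returned list is non-empty for several checks in a row.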

FAQ

Is Redis Streams slower than pub/sub for fan-out?

Marginally, and almost never enough to matter for WebSocket traffic. Pub/sub skips the write entirely, so its tail latency is the lowest Redis offers. Streams add an XADD write plus an XREADGROUP read, typically a sub-millisecond round-trip on a local instance. The latency you actually feel in production is dominated by network and socket flush, not by which Redis primitive carries the message. Choose on durability needs first; if both satisfy correctness, then prefer pub/sub for the lowest latency.

How do I stop a Redis Stream from eating all my memory?

Trim it. Pass MAXLEN '~' <cap> on every XADD to keep roughly the last N entries (the ~ makes trimming approximate and cheap), or use MINID to drop entries older than a timestamp. Without trimming, the log is append-only forever and will grow until Redis hits maxmemory. Size the cap to your worst-case reconnect-replay window — how far back a returning client legitimately needs to catch up — not to “keep everything.”
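Sizing the cap to the replay window is simple arithmetic: peak message rate times the longest absence you want to cover, times some headroom. A sketch under those assumptions follows; `maxlenFor` and its default headroom of 1.5 are illustrative choices, not Redis recommendations.

```typescript
// Hypothetical sizing helper. The rate and window are values you measure;
// the 1.5 default headroom is an arbitrary illustrative margin.
function maxlenFor(peakMsgsPerSec: number, replayWindowSec: number, headroom = 1.5): number {
  if (peakMsgsPerSec < 0 || replayWindowSec < 0 || headroom < 1) {
    throw new RangeError('rate and window must be >= 0, headroom must be >= 1');
  }
  return Math.ceil(peakMsgsPerSec * replayWindowSec * headroom);
}

// Example: 200 msg/s at peak, clients may be away for up to 30 s.
const cap = maxlenFor(200, 30);   // 200 * 30 * 1.5 = 9000
```

Because `~` trims approximately, the stream can hold somewhat more than the cap at any moment, so treat the result as a floor on retention rather than an exact length.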

What happens to unacknowledged messages if a node crashes?

They stay in that consumer’s Pending Entries List. When the node restarts under the same consumer name, XAUTOCLAIM (or XCLAIM after XPENDING) reclaims entries idle longer than your threshold and redelivers them, so nothing read-but-not-acked is lost. This is the entire point of consumer groups over plain XREAD. If the node never comes back, run a reaper that claims the dead consumer’s PEL onto a live one so its backlog still drains.

Can I keep pub/sub for some channels and Streams for others?

Yes, and you usually should. Route ephemeral, high-frequency, loss-tolerant traffic (cursors, presence pings, typing) over pub/sub fan-out for minimum latency and zero storage, and route must-not-drop traffic (chat history, order updates, notifications) over Streams. They share the same Redis and the same node process; only the transport per message class differs. Pick per topic, not per system.
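One way to express "pick per topic" is a small lookup that the publish path consults. The class names below mirror the examples in this answer and are placeholders. Defaulting unknown classes to the durable path is a design assumption of this sketch, on the reasoning that an unnecessary write is usually cheaper than a lost message.

```typescript
type Transport = 'pubsub' | 'stream';

// Hypothetical message classes; substitute your own topic names.
const TRANSPORT_BY_CLASS: Record<string, Transport> = {
  cursor: 'pubsub',
  presence: 'pubsub',
  typing: 'pubsub',
  chat: 'stream',
  order: 'stream',
  notification: 'stream',
};

// Unknown classes fall back to the durable path (an assumption of this sketch).
function transportFor(messageClass: string): Transport {
  return TRANSPORT_BY_CLASS[messageClass] ?? 'stream';
}
```

The publisher would then call PUBLISH or XADD depending on the result, keeping the routing decision in one place.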

Do consumer groups distribute messages or copy them to every node?

Within one group, each entry goes to exactly one consumer — that is load distribution, not fan-out, and it is the opposite of what broadcasting to all sockets needs. For WebSocket fan-out you want every node to deliver to its own local sockets, so give each node its own group (every group sees every entry) while keeping one consumer per group for crash recovery. Use a single shared group only when you want competing consumers to split the work, such as a pool of workers persisting messages.
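The difference is easier to see in a toy model. The class below is not Redis and is not how Redis implements groups; it mimics only one rule, that each group keeps its own cursor and hands an entry to whichever consumer asks first. `ToyStream` is a hypothetical name, and unlike a group created with `$`, its groups start from the beginning of the log.

```typescript
// Toy in-memory model of consumer-group delivery. It is NOT Redis: it only
// mimics the rule that each group has one cursor shared by its consumers.
class ToyStream {
  private entries: string[] = [];
  // group name -> index of the next entry that group has not handed out yet
  private cursors = new Map<string, number>();

  add(entry: string): void {
    this.entries.push(entry);
  }

  // Rough analogue of XREADGROUP with '>': return the next undelivered
  // entries for this group, regardless of which consumer is asking.
  read(group: string, count: number): string[] {
    const start = this.cursors.get(group) ?? 0;
    const batch = this.entries.slice(start, start + count);
    this.cursors.set(group, start + batch.length);
    return batch;
  }
}

const stream = new ToyStream();
['m1', 'm2', 'm3', 'm4'].forEach((m) => stream.add(m));

// Shared group: nodes A and B compete, so each sees only part of the stream.
const sharedA = stream.read('shared', 2);   // ['m1', 'm2']
const sharedB = stream.read('shared', 2);   // ['m3', 'm4']

// Group per node: each group has its own cursor, so both nodes see everything.
const ownA = stream.read('node-a', 4);      // ['m1', 'm2', 'm3', 'm4']
const ownB = stream.read('node-b', 4);      // ['m1', 'm2', 'm3', 'm4']
```

With the shared group, clients on node A would never see m3 or m4. With one group per node, every node can deliver the full stream to its own sockets.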

Redis Pub/Sub Fan-Out for WebSockets — the parent area: bridging nodes with fire-and-forget pub/sub, the cheaper default.
Scaling WebSocket Broadcast with Redis Pub/Sub — the pub/sub broadcast loop with batching and backpressure, and your rollback path from Streams.
Message Delivery Guarantees — at-least-once delivery, acknowledgements, and idempotency, the semantics Streams enforce.
Scaling Real-Time Infrastructure — fan-out, presence, delivery guarantees, and horizontal scaling across the fleet.
