Server-Side WebSocket Routing Patterns #
A single physical socket carries dozens of logically distinct message types: presence pings, chat events, document edits, telemetry, control frames. The moment you accept more than one message shape on a connection, you need a router — a layer that reads each inbound frame, decides which handler owns it, and dispatches without blocking the event loop. Get this wrong and the failure is not subtle: a tenant’s “delete-all” command leaks into a neighbouring tenant’s channel, a malformed route field throws inside the message loop and kills every socket on that worker, or an unbounded JSON.parse on attacker-supplied frames pins a CPU core at 100%.
This guide covers the dispatch layer that sits between socket acceptance and your business logic. The parent area, Backend WebSocket Connection Management, handles the handshake and the connection lifecycle; here we focus on what happens to each frame after the socket is open and before it reaches application code: parsing, validating, namespacing by tenant, rate-limiting per channel, and fanning out across nodes. The target is O(1) dispatch lookup, hard isolation between tenants, and bounded fan-out during broadcast storms.
Prerequisites #
Routing assumes a healthy, authenticated, single-server connection already exists. Before applying anything here, confirm:
- Liveness is handled. Dead sockets must be evicted by a heartbeat before they reach the router, or you will dispatch to half-open connections. See Connection Lifecycle & Heartbeats.
- Identity is established on the upgrade. The router keys every decision off a trusted tenant/user identity. That identity must be pinned at handshake time, not read from the message body — see WebSocket Authentication & Authorization.
- Affinity exists for in-memory routing tables. If your route map lives in process memory, clients must return to the same node — see Load Balancer Sticky Sessions. For multi-node broadcast you will instead lean on Redis Pub/Sub fan-out.
Core implementation #
The router is a Map from a string key to a typed handler. Resolution is O(1); validation happens before dispatch; every tenant decision is keyed off the identity pinned at handshake, never off the message body. The connection state lives in a WeakMap so a closed socket is garbage-collected without manual bookkeeping.
import type { WebSocket } from 'ws';

// Identity pinned during the upgrade; never trust the message body for this.
interface ConnContext {
  tenantId: string; // e.g. "acme", set once at handshake
  userId: string;
  subscriptions: Set<string>; // channels this socket has joined, namespaced
}

type RouteHandler = (
  payload: unknown,
  ctx: ConnContext,
  ws: WebSocket,
) => void | Promise<void>;

const MAX_FRAME_BYTES = 64 * 1024; // reject oversized frames before parsing
const routes = new Map<string, RouteHandler>();
const contexts = new WeakMap<WebSocket, ConnContext>();

export function registerRoute(name: string, handler: RouteHandler): void {
  routes.set(name, handler); // build the table once, at startup
}

// Namespacing: a channel is ALWAYS scoped by the connection's tenant.
// "room:42" from tenant "acme" becomes "acme:room:42", so no cross-tenant collision.
export function namespaced(ctx: ConnContext, channel: string): string {
  return `${ctx.tenantId}:${channel}`;
}

export function routeMessage(ws: WebSocket, raw: Buffer): void {
  const ctx = contexts.get(ws);
  if (!ctx) return; // socket not yet registered: drop

  if (raw.byteLength > MAX_FRAME_BYTES) {
    return closeWith(ws, 1009, 'frame too large'); // 1009 = message too big
  }

  let route: string;
  let payload: unknown;
  try {
    // Destructuring sits inside the try so a frame like "null" is also caught.
    ({ route, payload } = JSON.parse(raw.toString('utf8')));
  } catch {
    return sendError(ws, 'MALFORMED_FRAME'); // never let a parse throw escape
  }

  // Allowlist check BEFORE lookup: an unknown route is a protocol error.
  // A Map has no inherited keys, so "__proto__" or "constructor" simply miss.
  if (typeof route !== 'string' || !routes.has(route)) {
    return sendError(ws, 'UNKNOWN_ROUTE');
  }
  const handler = routes.get(route)!;

  // Isolate handler failures: one bad payload must not kill the socket loop.
  // Calling the handler inside .then() turns a synchronous throw into a
  // rejection, so sync and async failures both reach the same catch.
  Promise.resolve()
    .then(() => handler(payload, ctx, ws))
    .catch((err) => {
      console.error(`route ${route} failed for ${ctx.tenantId}`, err);
      sendError(ws, 'HANDLER_ERROR');
    });
}

export function registerConnection(ws: WebSocket, ctx: ConnContext): void {
  contexts.set(ws, ctx); // WeakMap: GC-friendly, no leak on close
}

function sendError(ws: WebSocket, code: string): void {
  if (ws.readyState === ws.OPEN) {
    ws.send(JSON.stringify({ type: 'ERROR', code, ts: Date.now() }));
  }
}

function closeWith(ws: WebSocket, code: number, reason: string): void {
  if (ws.readyState === ws.OPEN) ws.close(code, reason);
}
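The size check and parse at the top of routeMessage can also be pulled into a pure function, which makes the validation testable without a socket. The following is a minimal sketch under stated assumptions, not part of the router above: parseEnvelope, Envelope, and ParseResult are hypothetical names, and the function takes a Uint8Array (which a Node Buffer is) so it has no dependency on ws.

```typescript
// Hypothetical helper: validates the frame envelope before any route lookup.
const MAX_FRAME_BYTES = 64 * 1024;

interface Envelope {
  route: string;
  payload: unknown;
}

type ParseResult =
  | { ok: true; frame: Envelope }
  | { ok: false; code: 'FRAME_TOO_LARGE' | 'MALFORMED_FRAME' };

function parseEnvelope(
  raw: Uint8Array,
  maxBytes: number = MAX_FRAME_BYTES,
): ParseResult {
  // Size check first, so an oversized frame is never handed to JSON.parse.
  if (raw.byteLength > maxBytes) return { ok: false, code: 'FRAME_TOO_LARGE' };

  let parsed: unknown;
  try {
    parsed = JSON.parse(new TextDecoder().decode(raw));
  } catch {
    return { ok: false, code: 'MALFORMED_FRAME' };
  }

  // JSON.parse also returns null, numbers, strings and arrays; only a plain
  // object with a string `route` counts as a well-formed envelope here.
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return { ok: false, code: 'MALFORMED_FRAME' };
  }
  const { route, payload } = parsed as { route?: unknown; payload?: unknown };
  if (typeof route !== 'string') return { ok: false, code: 'MALFORMED_FRAME' };

  return { ok: true, frame: { route, payload } };
}
```

A caller would branch on the ok flag, sending the error code back to the client on failure and doing the route allowlist check on success. Returning a result object rather than throwing keeps the read path free of exceptions.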
Subscription handlers join channels through the namespaced helper so a tenant can never address another tenant’s room — the prefix is derived from the trusted context, not from anything the client sent. A per-channel rate limiter wraps the handler before it broadcasts:
// Token bucket per (tenant, channel). Keeps one noisy room from starving
// the worker and bounds fan-out during a broadcast storm.
const PUBLISH_REFILL_PER_SEC = 20;
const PUBLISH_BURST = 50;

interface Bucket { tokens: number; updatedAt: number; }
// Never pruned here; evict idle keys in production so the Map stays bounded.
const buckets = new Map<string, Bucket>();

function allowPublish(key: string): boolean {
  const now = Date.now();
  const b = buckets.get(key) ?? { tokens: PUBLISH_BURST, updatedAt: now };
  // Refill in proportion to the time elapsed since the last call.
  const refill = ((now - b.updatedAt) / 1000) * PUBLISH_REFILL_PER_SEC;
  b.tokens = Math.min(PUBLISH_BURST, b.tokens + refill);
  b.updatedAt = now;
  buckets.set(key, b);
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}

// `redis` is assumed to be a connected client created at startup whose
// publish(channel, message) returns a promise (as ioredis and node-redis do).
registerRoute('publish', async (payload, ctx) => {
  const { channel, body } = (payload ?? {}) as { channel?: unknown; body?: unknown };
  if (typeof channel !== 'string') return; // malformed payload: drop
  const ns = namespaced(ctx, channel); // tenant-scoped channel id
  if (!ctx.subscriptions.has(ns)) return; // must be subscribed to publish
  if (!allowPublish(ns)) return; // drop over-rate publishes
  // Await so a Redis failure rejects into the router's catch instead of
  // surfacing as an unhandled rejection.
  await redis.publish(ns, JSON.stringify({ from: ctx.userId, body }));
});
Once connections are spread across more than one node, that redis.publish is what carries the message to sockets on other workers: each node subscribes to the channels its local clients hold and re-emits inbound Redis messages to them. The full multi-node fan-out topology (sharding, ordering, and back-pressure) is covered in Redis Pub/Sub fan-out.
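The local half of that hand-off, tracking which sockets on this node hold which channel and re-emitting to them, can be sketched as a small registry. This is an illustrative sketch rather than the guide's implementation: LocalSubscriptions and Sendable are hypothetical names, and the Redis subscribe and unsubscribe calls are deliberately left to the caller, signalled by the boolean return values.

```typescript
// Minimal socket shape for this sketch; the `ws` WebSocket satisfies it.
interface Sendable {
  send(data: string): void;
}

// Hypothetical per-node registry: namespaced channel -> local sockets.
class LocalSubscriptions<S extends Sendable> {
  private readonly byChannel = new Map<string, Set<S>>();

  // Returns true when this is the node's first subscriber for the channel,
  // which is the moment to subscribe to it in Redis.
  add(channel: string, socket: S): boolean {
    let sockets = this.byChannel.get(channel);
    const first = sockets === undefined;
    if (!sockets) {
      sockets = new Set<S>();
      this.byChannel.set(channel, sockets);
    }
    sockets.add(socket);
    return first;
  }

  // Returns true when the last local subscriber left (time to unsubscribe).
  remove(channel: string, socket: S): boolean {
    const sockets = this.byChannel.get(channel);
    if (!sockets) return false;
    sockets.delete(socket);
    if (sockets.size > 0) return false;
    this.byChannel.delete(channel);
    return true;
  }

  // Re-emit one inbound message to every local subscriber; returns the count.
  deliver(channel: string, message: string): number {
    const sockets = this.byChannel.get(channel);
    if (!sockets) return 0;
    let sent = 0;
    for (const socket of sockets) {
      try {
        socket.send(message);
        sent += 1;
      } catch {
        // One broken socket must not stop delivery to the rest.
      }
    }
    return sent;
  }
}
```

The Redis subscriber's message callback would call deliver(channel, message) with the namespaced channel it received. Because the registry is keyed by the already-namespaced channel, a message for one tenant's room can only reach sockets that joined under that tenant's prefix.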
Configuration reference #
| Parameter | Type | Default | Production value | Notes |
|---|---|---|---|---|
| MAX_FRAME_BYTES | number | none | 65536 | Reject before JSON.parse; mirror in ws maxPayload. |
| PUBLISH_REFILL_PER_SEC | number | unlimited | 20 | Steady-state publishes per channel per second. |
| PUBLISH_BURST | number | unlimited | 50 | Token-bucket ceiling; absorbs short spikes. |
| routes table build | enum | per-message | startup-only | Populate Map once; never mutate per-connection. |
| namespaced prefix source | string | n/a | handshake context | Derive tenant from pinned identity, never the body. |
| ws.maxPayload (server opt) | number | 104857600 | 65536 | Library-level cap; backstops MAX_FRAME_BYTES. |
| backpressure threshold | number | none | ws.bufferedAmount > 1MB | Pause/drop on slow consumers before OOM. |
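One way to keep these values in a single place is a small typed loader that falls back to the production values from the table. This is a sketch under assumptions: the environment variable names (WS_MAX_FRAME_BYTES and so on), the RouterConfig shape, and the reading of "1MB" as 1024 * 1024 bytes are all hypothetical choices, not something the router above defines.

```typescript
interface RouterConfig {
  maxFrameBytes: number;
  publishRefillPerSec: number;
  publishBurst: number;
  backpressureBytes: number;
}

// Production values from the configuration table.
const DEFAULTS: RouterConfig = {
  maxFrameBytes: 65536,
  publishRefillPerSec: 20,
  publishBurst: 50,
  backpressureBytes: 1024 * 1024, // assumed reading of "1MB"
};

// Parse a positive integer, failing loudly at startup rather than at runtime.
function positiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === '') return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Hypothetical variable names; adjust to your deployment's conventions.
function loadRouterConfig(env: Record<string, string | undefined>): RouterConfig {
  return {
    maxFrameBytes: positiveInt(env.WS_MAX_FRAME_BYTES, DEFAULTS.maxFrameBytes, 'WS_MAX_FRAME_BYTES'),
    publishRefillPerSec: positiveInt(env.WS_PUBLISH_REFILL_PER_SEC, DEFAULTS.publishRefillPerSec, 'WS_PUBLISH_REFILL_PER_SEC'),
    publishBurst: positiveInt(env.WS_PUBLISH_BURST, DEFAULTS.publishBurst, 'WS_PUBLISH_BURST'),
    backpressureBytes: positiveInt(env.WS_BACKPRESSURE_BYTES, DEFAULTS.backpressureBytes, 'WS_BACKPRESSURE_BYTES'),
  };
}
```

Calling loadRouterConfig(process.env) once at startup and passing maxFrameBytes to both the router and the ws server's maxPayload option keeps the two caps from drifting apart.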
Edge cases & gotchas #
- Tenant leakage via the message body. The single most dangerous bug. If channel is used un-prefixed, tenant-A can subscribe to tenant-B's room by guessing its name. Always run channel strings through namespaced(ctx, …) so the tenant prefix comes from the handshake identity, not the frame.
- A throwing handler killing the whole worker. An uncaught exception inside the message loop propagates up and can crash the process, dropping every socket on that node. Wrap dispatch so that both synchronous throws and rejected promises land in one catch, for example Promise.resolve().then(() => handler(...)).catch(...), and never await un-guarded handler code in the read path.
- Unbounded parse on hostile frames. JSON.parse on a multi-megabyte frame blocks the event loop. Enforce MAX_FRAME_BYTES and the library-level maxPayload so a malicious client cannot stall dispatch for everyone.
- Slow consumers and back-pressure. A subscriber that stops reading makes ws.bufferedAmount grow without bound during fan-out, so memory climbs until the process runs out. Check bufferedAmount before broadcasting and drop or disconnect laggards rather than buffering forever.
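The back-pressure check in the last item can be sketched as a bounded broadcast. This is a minimal illustration under assumptions: broadcastBounded, BufferedSocket, and DISCONNECT_FACTOR are hypothetical names, the 1 MiB limit is one reading of the threshold in the configuration table, and the two-tier policy (skip moderate laggards, close severe ones with code 1013, "try again later") is one possible choice among several.

```typescript
// Minimal socket shape for this sketch; the `ws` WebSocket exposes all three.
interface BufferedSocket {
  readonly bufferedAmount: number; // bytes queued but not yet flushed
  send(data: string): void;
  close(code?: number, reason?: string): void;
}

const BACKPRESSURE_BYTES = 1024 * 1024; // assumed reading of "1MB"
const DISCONNECT_FACTOR = 4; // hypothetical: close once 4x over the limit

interface BroadcastStats {
  sent: number;
  dropped: number;
  closed: number;
}

function broadcastBounded(
  sockets: Iterable<BufferedSocket>,
  message: string,
  limit: number = BACKPRESSURE_BYTES,
): BroadcastStats {
  const stats: BroadcastStats = { sent: 0, dropped: 0, closed: 0 };
  for (const ws of sockets) {
    if (ws.bufferedAmount > limit * DISCONNECT_FACTOR) {
      // Far behind: disconnect instead of letting the queue keep growing.
      ws.close(1013, 'slow consumer');
      stats.closed += 1;
    } else if (ws.bufferedAmount > limit) {
      // Moderately behind: skip this message for this socket only.
      stats.dropped += 1;
    } else {
      ws.send(message);
      stats.sent += 1;
    }
  }
  return stats;
}
```

Returning the counts makes it straightforward to export dropped and closed as metrics, which is usually the first signal that a broadcast storm is outpacing a subset of clients.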
Verification #
Confirm the router behaves under both normal and adversarial traffic:
# 1. Sockets are actually open and owned by the node process (not half-open).
ss -tnp 'sport = :8080' | head

# 2. Drive a tenant-isolation probe: subscribe as tenant A, attempt a
#    cross-tenant publish, assert it is dropped (expects no delivery).
wscat -c "wss://api.example.com/ws?token=$TENANT_A_JWT" \
  -x '{"route":"publish","payload":{"channel":"tenant-b:secret","body":1}}'

// 3. Smoke-test assertion: unknown routes must be rejected, not silently
//    dispatched. sendFrame and flood are your test harness's own helpers.
import assert from 'node:assert/strict';

const res = await sendFrame(ws, { route: '__proto__', payload: {} });
assert.equal(res.code, 'UNKNOWN_ROUTE');

// 4. Rate limiter caps fan-out: a burst above PUBLISH_BURST yields drops.
//    The bound below assumes the flood completes within about one second.
const accepted = await flood(ws, 'publish', 200);
assert.ok(accepted <= PUBLISH_BURST + PUBLISH_REFILL_PER_SEC);
In Chrome DevTools, open the Network panel, select the WebSocket connection (the WS filter narrows the list), and watch its Messages tab during a publish round-trip: an over-rate or cross-tenant frame should produce an ERROR frame or no echo, never a delivered payload.
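The bound asserted in step 4 can be checked offline by running the same token-bucket arithmetic against a clock the test controls. The sketch below mirrors the allowPublish logic from the core implementation; makeLimiter and floodLocal are hypothetical names introduced only for this illustration, and the injected clock replaces Date.now so the result is deterministic.

```typescript
const PUBLISH_REFILL_PER_SEC = 20;
const PUBLISH_BURST = 50;

interface Bucket { tokens: number; updatedAt: number; }

// Same arithmetic as allowPublish, but with an injectable clock (in ms).
function makeLimiter(now: () => number): (key: string) => boolean {
  const buckets = new Map<string, Bucket>();
  return (key: string): boolean => {
    const t = now();
    const b = buckets.get(key) ?? { tokens: PUBLISH_BURST, updatedAt: t };
    const refill = ((t - b.updatedAt) / 1000) * PUBLISH_REFILL_PER_SEC;
    b.tokens = Math.min(PUBLISH_BURST, b.tokens + refill);
    b.updatedAt = t;
    buckets.set(key, b);
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  };
}

// Attempt n publishes back to back and count how many are admitted.
function floodLocal(allow: (key: string) => boolean, key: string, n: number): number {
  let accepted = 0;
  for (let i = 0; i < n; i++) {
    if (allow(key)) accepted += 1;
  }
  return accepted;
}

let clockMs = 0;
const allow = makeLimiter(() => clockMs);

// Clock frozen: only the initial burst of 50 tokens is available.
const firstWave = floodLocal(allow, 'acme:room:42', 200);

// One second later: 1 s * 20 tokens/s = 20 more publishes are admitted.
clockMs += 1000;
const secondWave = floodLocal(allow, 'acme:room:42', 200);

console.log(firstWave, secondWave); // 50 20
```

Over any one-second window the limiter therefore admits at most PUBLISH_BURST + PUBLISH_REFILL_PER_SEC = 70 publishes per channel, which is the ceiling the smoke test asserts.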
Guides in this area #
- Multi-Tenant WebSocket Channel Namespacing — the deep dive on scoping every channel by tenant, enforcing isolation on subscribe/publish, and preventing cross-tenant leakage at the routing layer.
FAQ #
Should I use Socket.IO namespaces or roll my own router? #
Socket.IO namespaces and rooms give you a namespacing primitive for free, but they couple you to the Socket.IO protocol and its reconnection model. With the raw ws package you own the dispatch table and the wire format, which is what this guide assumes. If you already run Socket.IO, map its namespace to the tenant prefix shown here and keep the same allowlist and rate-limit guards — the security properties do not come from the library.
How do I route messages to a socket connected to a different node? #
In-process Map lookup only reaches sockets on the local worker. For cross-node delivery, publish to a channel that every node subscribes to and let each node re-emit to its local subscribers. That is exactly the Redis Pub/Sub fan-out pattern; the redis.publish(ns, …) call in the publish handler above is the hand-off point.
Where should tenant identity come from? #
From the connection context pinned during the upgrade handshake, validated by WebSocket Authentication & Authorization. Never read tenantId from the message body — a client can set any value there, which defeats namespacing entirely.
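A subscribe handler that follows this rule might look like the sketch below. It is an illustration under assumptions rather than the guide's code: handleSubscribe, CHANNEL_NAME, and MAX_CHANNEL_LENGTH are hypothetical, the channel-name policy is an arbitrary example, and any per-user authorization check (whether this user may join this room) is out of scope here.

```typescript
interface ConnContext {
  tenantId: string; // pinned at handshake
  userId: string;
  subscriptions: Set<string>;
}

// Hypothetical policy: colon-separated segments of letters, digits, _ and -.
const CHANNEL_NAME = /^[a-z0-9_-]+(:[a-z0-9_-]+)*$/i;
const MAX_CHANNEL_LENGTH = 128;

function namespaced(ctx: ConnContext, channel: string): string {
  return `${ctx.tenantId}:${channel}`;
}

// Returns the namespaced channel that was joined, or null if the payload
// was rejected. Any tenantId field in the payload is never read.
function handleSubscribe(payload: unknown, ctx: ConnContext): string | null {
  if (typeof payload !== 'object' || payload === null) return null;
  const { channel } = payload as { channel?: unknown };
  if (
    typeof channel !== 'string' ||
    channel.length > MAX_CHANNEL_LENGTH ||
    !CHANNEL_NAME.test(channel)
  ) {
    return null;
  }
  const ns = namespaced(ctx, channel); // prefix comes from the trusted context
  ctx.subscriptions.add(ns);
  return ns;
}
```

Even when a client sends { channel: "tenant-b:secret", tenantId: "tenant-b" } over a tenant-a connection, the joined channel is "tenant-a:tenant-b:secret", which lives entirely inside tenant-a's namespace. Registering it is a one-liner along the lines of registerRoute('subscribe', (payload, ctx) => { handleSubscribe(payload, ctx); }).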
Does an in-memory route table work behind a load balancer? #
Yes, as long as the load balancer keeps a client pinned to the node that holds its subscriptions. Without affinity, a reconnect can land on a node whose Map has no record of the client’s channels. Pair the router with Load Balancer Sticky Sessions, or externalise the subscription registry to Redis so any node can serve any client.
Related #
- Multi-Tenant WebSocket Channel Namespacing — scoping channels per tenant and enforcing isolation on every dispatch.
- WebSocket Authentication & Authorization — pinning the trusted identity the router keys every decision off.
- Redis Pub/Sub Fan-Out — carrying routed messages across nodes once one worker is no longer enough.
- Load Balancer Sticky Sessions — keeping clients pinned to the node that owns their in-memory route table.
- Connection Lifecycle & Heartbeats — evicting dead sockets before they reach the dispatch layer.