WebSocket Message Delivery Guarantees #

A WebSocket frame is only as reliable as the TCP connection underneath it. While the socket stays OPEN, frames arrive in order exactly once — TCP guarantees that. But the moment the connection drops, every frame still buffered in the kernel send queue, in flight on the wire, or sitting in the server’s application memory waiting to be written vanishes silently. Neither side gets an error for the messages that were almost sent. The client reconnects, the socket reopens, and the stream continues from “now” — leaving a gap nobody noticed until a user reports a missing order confirmation, a stale balance, or a chat message that never showed up.

This is the core failure mode this guide solves: a network blip between two healthy peers drops in-flight messages, and the default WebSocket API gives you no way to detect or recover them. TCP ordering and onclose events are not delivery guarantees. To get an actual guarantee — at-most-once, at-least-once, or exactly-once — you must build an application-level protocol on top of the socket: sequence numbers, acknowledgements, a server-side outbox, and a dedup window. The rest of this page is that protocol, in TypeScript.

Prerequisites #

This guide assumes you already operate a WebSocket service and want to add a delivery guarantee on top of it. Before implementing acks and replay buffers, you need:

  • A working reconnection loop on the client. Delivery guarantees are meaningless without reconnect, because the whole point is to survive a drop and resume. See Auto-Reconnection Strategies for backoff, jitter, and resume-token handling.
  • A way to identify a client across reconnects (a stable clientId or session token), so the server can key an outbox to the right peer even when the socket object changes.
  • For multi-node deployments, a shared fan-out layer so a message published on node A reaches the socket pinned to node B. Redis Pub/Sub Fan-Out covers that broadcast path; the outbox described here sits between fan-out and the socket write.
  • The broader capacity, sticky-routing, and shutdown concerns in Scaling Real-Time Infrastructure — delivery guarantees interact with every one of them.

The three delivery semantics differ only in how much machinery you add. At-most-once is the raw socket: fire and forget, zero acks, messages can be lost but never duplicated. At-least-once adds acks plus resend: every message is delivered, possibly more than once after a reconnect. Exactly-once layers a dedup window on top of at-least-once so duplicates are discarded at the consumer. Pure exactly-once across an unreliable network is impossible; what you actually ship is at-least-once delivery plus idempotent handling — effectively-once.

[Figure: Ack and resend with a sequence-numbered outbox. The server sends sequence-numbered messages into a per-client outbox, the client acks each one, and unacked messages survive a reconnect and are replayed. In the example, the outbox holds seq 7, 8, and 9; seq 8 is sent and acked; the connection drops before seq 9 is acked, so the outbox keeps 9 until an ack arrives. On reconnect the server replays seq >= 9, and the client, which dedups by seq, discards any seq it has already seen.]

Core implementation #

Every message that needs a guarantee is wrapped in an ack envelope: a monotonically increasing seq, a type, and the payload. Acks reference the seq they confirm. The server keeps an outbox per client (an ordered map of sent-but-unacked envelopes) and a timer that resends anything still unacked after a timeout. The client remembers the seqs it has recently processed in a sliding window and discards any duplicate that falls inside it.

// envelope.ts: the wire contract shared by client and server
export interface Envelope<T = unknown> {
  seq: number; // monotonic per-client sequence number, assigned by sender
  type: string; // e.g. "order.update"; routes to a handler on the client
  payload: T; // application data
}

// Acks travel as their own frame so they never collide with data envelopes.
export interface Ack { kind: "ack"; seq: number } // confirms one seq
export interface Nack { kind: "nack"; seq: number } // reject: handler failed, do NOT advance

// server-outbox.ts: at-least-once delivery with a per-client replay buffer
import { WebSocket } from "ws";
import type { Envelope, Ack, Nack } from "./envelope";

const ACK_TIMEOUT_MS = 5_000; // wait this long for an ack before resending
const MAX_RESEND = 5; // give up (and alert) after this many attempts
const MAX_OUTBOX = 1_000; // hard cap on buffered unacked messages per client

// timer stays unset until the first transmit() arms it
interface Pending { env: Envelope; attempts: number; timer?: NodeJS.Timeout }

export class ClientOutbox {
  private seq = 0; // last seq assigned; the next message gets seq + 1
  private pending = new Map<number, Pending>(); // seq -> unacked envelope, in send order
  private ws: WebSocket | null = null; // current socket (swapped on reconnect)

  // Attach (or re-attach after reconnect) the live socket and flush the backlog.
  attach(ws: WebSocket) {
    this.ws = ws;
    for (const p of this.pending.values()) this.transmit(p); // replay every unacked seq, oldest first
  }

  // Enqueue a new message: assign a seq, buffer it, then send.
  send<T>(type: string, payload: T) {
    if (this.pending.size >= MAX_OUTBOX) throw new Error("outbox overflow"); // backpressure
    const env: Envelope<T> = { seq: ++this.seq, type, payload };
    const p: Pending = { env, attempts: 0 };
    this.pending.set(env.seq, p);
    this.transmit(p);
  }

  // Write the envelope if the socket is open, then arm the resend timer.
  private transmit(p: Pending) {
    clearTimeout(p.timer); // safe when the timer is still unset
    if (this.ws?.readyState === WebSocket.OPEN) {
      p.attempts++;
      this.ws.send(JSON.stringify(p.env)); // duplicates are fine: the client dedups by seq
    }
    // Arm the timer regardless. While the socket is closed the timeout only re-checks
    // (attempts is not incremented); the real replay happens on the next attach().
    p.timer = setTimeout(() => this.onTimeout(p.env.seq), ACK_TIMEOUT_MS);
  }

  private onTimeout(seq: number) {
    const p = this.pending.get(seq);
    if (!p) return; // already acked, nothing to do
    if (p.attempts >= MAX_RESEND) {
      this.pending.delete(seq); // stop the loop
      console.error(`delivery failed seq=${seq} after ${MAX_RESEND} attempts`);
      return; // emit a metric / dead-letter here
    }
    this.transmit(p); // resend and re-arm
  }

  // Client confirmed delivery: drop it from the outbox so it is never resent.
  onAck(ack: Ack) {
    const p = this.pending.get(ack.seq);
    if (!p) return; // duplicate ack: ignore
    clearTimeout(p.timer);
    this.pending.delete(ack.seq);
  }

  // Client rejected the message: leave it pending so the resend timer retries it.
  onNack(_nack: Nack) { /* optionally apply a longer backoff before retransmit */ }
}

// client-dedup.ts: exactly-once *processing* via a sliding dedup window
import type { Envelope, Ack } from "./envelope";

const DEDUP_WINDOW = 4_096; // remember this many recent seqs to reject duplicates

export class DedupConsumer {
  private seen = new Set<number>(); // seqs processed in the current window
  private order: number[] = []; // FIFO of seqs, to evict the oldest

  // Returns true if the envelope is new and was handled; false if it was a duplicate.
  consume(env: Envelope, handle: (e: Envelope) => void, sendAck: (a: Ack) => void): boolean {
    if (this.seen.has(env.seq)) {
      sendAck({ kind: "ack", seq: env.seq }); // RE-ack: our earlier ack probably never reached the server
      return false; // do NOT run the handler again
    }
    handle(env); // apply the side effect once; if it throws, nothing below runs and no ack is sent
    this.seen.add(env.seq);
    this.order.push(env.seq);
    if (this.order.length > DEDUP_WINDOW) {
      this.seen.delete(this.order.shift()!); // evict the oldest tracked seq
    }
    sendAck({ kind: "ack", seq: env.seq }); // ack AFTER the handler succeeds
    return true;
  }
}

The ordering rule that makes this correct: the client sends the ack after the handler runs, not on receipt. If the process crashes mid-handle, no ack is sent, the server’s timer fires, and the message is redelivered. Acking on receipt would turn at-least-once into at-most-once the instant a handler throws.
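Because data envelopes and ack frames share one socket, the receiving side has to tell them apart before it can call the right method. The sketch below is one way to do that, assuming every frame is a JSON text frame shaped like the Envelope, Ack, and Nack interfaces above. The names parseFrame and isControl are hypothetical helpers introduced here for illustration.

```typescript
// Hypothetical frame router. Types are repeated so the sketch stands alone.
interface Envelope<T = unknown> { seq: number; type: string; payload: T }
interface Ack { kind: "ack"; seq: number }
interface Nack { kind: "nack"; seq: number }
type Frame = Envelope | Ack | Nack;

// Parse one text frame; returns null for anything that is not a valid protocol frame.
function parseFrame(raw: string): Frame | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON: ignore it rather than crash the socket handler
  }
  if (typeof data !== "object" || data === null) return null;
  const obj = data as Record<string, unknown>;
  const seq = obj.seq;
  const kind = obj.kind;
  const msgType = obj.type;
  if (typeof seq !== "number" || !Number.isInteger(seq)) return null;
  if (kind === "ack") return { kind: "ack", seq };
  if (kind === "nack") return { kind: "nack", seq };
  if (typeof msgType === "string" && "payload" in obj) {
    return { seq, type: msgType, payload: obj.payload };
  }
  return null;
}

// Type guard: true for ack/nack control frames, false for data envelopes.
function isControl(frame: Frame): frame is Ack | Nack {
  return "kind" in frame;
}
```

On the server, a message handler could pass control frames to onAck or onNack and everything else to application logic; on the client, data envelopes would go to consume. Dropping malformed frames silently is a design choice of this sketch, and a production service would probably count them as a metric.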

Configuration reference #

| Parameter | Type | Default | Production value | Notes |
| --- | --- | --- | --- | --- |
| ACK_TIMEOUT_MS | number (ms) | 5000 | 2000–5000 | Set above p99 round-trip plus handler time, or you resend healthy messages. |
| MAX_RESEND | number | 5 | 3–8 | Caps retransmits; on exhaustion, dead-letter the message and alert. |
| MAX_OUTBOX | number | 1000 | 500–5000 | Per-client cap. Hitting it means the consumer is too slow; apply backpressure. |
| DEDUP_WINDOW | number (seqs) | 4096 | 2x in-flight max | Must exceed the largest possible resend burst, or duplicates slip past. |
| outbox persistence | enum | in-memory | Redis / disk | In-memory loses the buffer on server restart; persist for cross-restart guarantees. |
| resume token TTL | number (s) | 30 | 60–300 | How long the server holds an outbox for a disconnected client before discarding. |
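The note on ACK_TIMEOUT_MS (above p99 round-trip plus handler time) can be turned into a small calculation if you already collect latency samples. The following is a rough sketch under stated assumptions: percentile and suggestAckTimeoutMs are hypothetical helpers, the samples are in milliseconds, and the 1.5x headroom factor is an illustrative choice rather than a recommendation from this guide.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((x, y) => x - y); // copy so the caller's array is untouched
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Suggest an ack timeout: p99 round-trip plus p99 handler time, scaled by a headroom factor.
// The default headroom of 1.5 is an assumption made for this sketch.
function suggestAckTimeoutMs(rttMs: number[], handlerMs: number[], headroom: number = 1.5): number {
  const base = percentile(rttMs, 99) + percentile(handlerMs, 99);
  return Math.ceil(base * headroom);
}
```

With only a handful of samples the p99 is simply the maximum, so a value computed this way is only as good as the measurement window behind it.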

Edge cases & gotchas #

Duplicate delivery after reconnect. This is by design in at-least-once, not a bug. A message resent because its ack was lost in the drop will arrive twice. The dedup window absorbs it, but only if the window is larger than the worst-case resend burst. If you size DEDUP_WINDOW too small, an old seq is evicted before its duplicate arrives and the handler runs twice. Make handlers idempotent anyway: a true exactly-once guarantee over a lossy network does not exist, so design the side effect (upsert by id, conditional write) to tolerate replays.
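As an illustration of "upsert by id, conditional write", here is a minimal sketch of an idempotent handler target. It assumes each update carries an orderId and a monotonically increasing version; both fields, and the OrderStore class itself, are hypothetical and not part of the envelope defined earlier.

```typescript
// Hypothetical payload: the version field is an assumption that makes replays detectable.
interface OrderUpdate { orderId: string; version: number; status: string }

class OrderStore {
  private orders = new Map<string, OrderUpdate>();

  // Upsert keyed by orderId, guarded by version.
  // Returns true if state changed; a replay or a stale update is a no-op and returns false.
  apply(update: OrderUpdate): boolean {
    const current = this.orders.get(update.orderId);
    if (current !== undefined && current.version >= update.version) return false;
    this.orders.set(update.orderId, update);
    return true;
  }

  get(orderId: string): OrderUpdate | undefined {
    return this.orders.get(orderId);
  }
}
```

If the dedup window misses a duplicate, applying the same update a second time leaves the store unchanged, so the replay is harmless.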

Unbounded outbox growth. A client that disconnects and never returns leaves its outbox pinned in memory, and a slow consumer lets unacked messages pile up to MAX_OUTBOX. Both are memory leaks at scale. Enforce the cap, expire abandoned outboxes after the resume-token TTL, and treat “outbox at cap” as a backpressure signal that should pause upstream production rather than silently drop data.
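Expiring abandoned outboxes could look roughly like the registry below. This is a sketch, not the guide's implementation: OutboxRegistry, markDisconnected, and sweep are hypothetical names, the outbox type is left generic, and timestamps are passed in explicitly so the logic is easy to test.

```typescript
// Hypothetical registry that remembers when each client's outbox lost its socket.
interface OutboxEntry<O> { outbox: O; disconnectedAt: number | null }

class OutboxRegistry<O> {
  private entries = new Map<string, OutboxEntry<O>>();
  private ttlMs: number;
  private create: () => O;

  constructor(ttlMs: number, create: () => O) {
    this.ttlMs = ttlMs;   // how long a disconnected client's outbox is kept
    this.create = create; // factory for a fresh outbox
  }

  // Call on connect or reconnect: reuse the surviving outbox or create a new one.
  acquire(clientId: string): O {
    let entry = this.entries.get(clientId);
    if (entry === undefined) {
      entry = { outbox: this.create(), disconnectedAt: null };
      this.entries.set(clientId, entry);
    }
    entry.disconnectedAt = null; // the client is live again
    return entry.outbox;
  }

  // Call from the socket's close handler.
  markDisconnected(clientId: string, now: number = Date.now()): void {
    const entry = this.entries.get(clientId);
    if (entry !== undefined) entry.disconnectedAt = now;
  }

  // Run periodically; discards outboxes disconnected for at least ttlMs and returns their ids.
  sweep(now: number = Date.now()): string[] {
    const expired: string[] = [];
    this.entries.forEach((entry, clientId) => {
      if (entry.disconnectedAt !== null && now - entry.disconnectedAt >= this.ttlMs) {
        this.entries.delete(clientId); // deleting during Map iteration is safe in JavaScript
        expired.push(clientId);
      }
    });
    return expired;
  }
}
```

A real outbox would also need its resend timers cancelled when it is discarded, which this sketch leaves to the caller using the ids returned by sweep.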

Ack storms. Acking every single message doubles your frame count and, under high throughput, the ack traffic alone can saturate the socket. Batch acks: send one cumulative ack for the highest contiguous seq received every N messages or every M milliseconds. The server then treats ack seq=42 as confirming everything up to 42. This cuts ack volume by an order of magnitude but requires the server to track a contiguous high-water mark, not just individual seqs.
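The cumulative scheme described above might be sketched as follows, with a count-based batch and a flush method the caller would invoke from a timer. CumulativeAckTracker and applyCumulativeAck are hypothetical names, and the sketch assumes seqs start at 1, as in the outbox shown earlier.

```typescript
// Client side: track the highest contiguous seq so one ack can cover a run of messages.
class CumulativeAckTracker {
  private highWater = 0;             // every seq <= highWater has been processed
  private ahead = new Set<number>(); // processed seqs that sit above a gap
  private lastAcked = 0;             // highest seq already acked
  private sinceLastAck = 0;          // new messages recorded since the last ack
  private batchSize: number;

  constructor(batchSize: number) {
    this.batchSize = batchSize;
  }

  // Record a processed seq. Returns the seq to ack now, or null if no ack is due.
  record(seq: number): number | null {
    if (seq <= this.highWater || this.ahead.has(seq)) {
      // Duplicate: our earlier ack may have been lost, so repeat the current high-water mark.
      return this.highWater > 0 ? this.highWater : null;
    }
    this.ahead.add(seq);
    while (this.ahead.delete(this.highWater + 1)) this.highWater++; // advance over contiguous seqs
    this.sinceLastAck++;
    return this.sinceLastAck >= this.batchSize ? this.flush() : null;
  }

  // Call from a timer so a quiet stream still gets acked. Returns null if nothing new to confirm.
  flush(): number | null {
    if (this.highWater === this.lastAcked) return null;
    this.lastAcked = this.highWater;
    this.sinceLastAck = 0;
    return this.highWater;
  }
}

// Server side: a cumulative ack confirms every pending seq at or below ackSeq.
// Returns the confirmed entries so the caller can cancel their resend timers.
function applyCumulativeAck<P>(pending: Map<number, P>, ackSeq: number): P[] {
  const confirmed: P[] = [];
  pending.forEach((entry, seq) => {
    if (seq <= ackSeq) {
      pending.delete(seq);
      confirmed.push(entry);
    }
  });
  return confirmed;
}
```

The tracker never acks past a gap, so a missing seq keeps everything after it unconfirmed until the server's resend fills the hole.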

Sequence number resets. If a client reconnects with a fresh session (new clientId) the server starts a new outbox at seq=1, and a stale dedup window on the client could wrongly reject the new low seqs. Tie the seq space to the session identity and clear the dedup window whenever the session — not just the socket — is replaced.
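One way to tie the dedup state to the session rather than the socket is sketched below. It assumes the server tells the client which session it is attached to, for example in a hello frame; that handshake, the sessionId value, and the SessionDedup class are all hypothetical and not part of the protocol defined earlier. Window eviction is omitted to keep the sketch short.

```typescript
// Hypothetical session-aware dedup state: cleared only when the session identity changes.
class SessionDedup {
  private sessionId: string | null = null;
  private seen = new Set<number>();

  // Call whenever a socket opens and the server reports the session it belongs to.
  bind(sessionId: string): void {
    if (sessionId !== this.sessionId) {
      this.sessionId = sessionId;
      this.seen.clear(); // new seq space: old seqs must not shadow the new low ones
    }
    // Same session on a new socket: keep the window so replayed seqs are still rejected.
  }

  // Returns true the first time a seq is seen in the current session, false for a duplicate.
  isNew(seq: number): boolean {
    if (this.seen.has(seq)) return false;
    this.seen.add(seq);
    return true;
  }
}
```

A plain socket reconnect with the same sessionId leaves the window intact, which is what lets the replay after a drop be deduplicated.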

Verification #

Confirm acks flow and the outbox drains. In Chrome DevTools, open the Network tab, select the WS connection, and watch the Messages pane: every data envelope should be followed by a matching ack frame within ACK_TIMEOUT_MS. A growing run of unacked seqs means the consumer is stalled.

Force a drop and prove replay. With the socket open, kill the connection at the TCP layer and watch the outbox replay on reattach:

# Find the established WebSocket connection for the server process (port 8080 in this example)
ss -tnp | grep ':8080'
# Simulate a blip without a clean close by dropping every outgoing packet on eth0.
# This affects ALL egress traffic on that interface, including SSH, so run it from a local console or on a test host.
sudo tc qdisc add dev eth0 root netem loss 100%
# After a few seconds, restore the interface; the client reconnects and the outbox replays.
sudo tc qdisc del dev eth0 root netem

Assert on metrics, not vibes. Track these and alert on the first two:

// Metrics worth exporting from the outbox (e.g. to Prometheus)
ws_outbox_pending{client} // current unacked count — should hover near zero
ws_delivery_resend_total{type} // resends; a spike means acks or sockets are failing
ws_delivery_failed_total{type} // MAX_RESEND exhausted — these are real lost messages
ws_dedup_rejected_total // duplicates caught — nonzero is normal after reconnects

A healthy system shows ws_outbox_pending returning to zero between bursts, occasional ws_dedup_rejected_total after reconnects, and ws_delivery_failed_total flat at zero.

Guides in this area #

For a complete, end-to-end build of the acknowledgement protocol — envelope schema, the resend state machine, and an idempotent consumer wired into a real handler — work through At-Least-Once WebSocket Delivery with Acknowledgements.

FAQ #

Does WebSocket guarantee message delivery on its own? #

No. While the socket is OPEN, TCP gives you in-order, exactly-once delivery of the bytes it accepts, but that guarantee holds only for as long as the connection stays up. Anything buffered in the application, in the kernel send queue, or in flight when the connection drops is lost with no error surfaced to either side. Any real delivery guarantee is an application-level protocol of sequence numbers and acks layered on top of the socket.

What is the difference between at-least-once and exactly-once over WebSocket? #

At-least-once means every message is delivered but may arrive more than once after a reconnect-triggered resend. Exactly-once means each message affects the consumer once and only once. True exactly-once delivery across an unreliable network is provably impossible, so in practice you implement at-least-once delivery plus a dedup window and idempotent handlers — “effectively once.” The duplicates still arrive on the wire; the consumer discards them.

How do I avoid resending messages that were actually delivered? #

The resend timer fires when an ack does not arrive within ACK_TIMEOUT_MS. Set that timeout comfortably above your p99 round-trip time plus handler execution time. If it is too tight, slow-but-successful messages get resent, inflating ws_delivery_resend_total and load. The dedup window on the client makes these spurious resends harmless, but tuning the timeout keeps wasted traffic low.

Where should the outbox live for multi-node deployments? #

The outbox is per client and the client is pinned to one node via sticky routing, so the outbox can live in that node’s memory for speed. The risk is a node restart wiping every outbox at once. For guarantees that survive restarts, persist the outbox to Redis or disk keyed by clientId. Note that fan-out across nodes (via Redis Pub/Sub Fan-Out) feeds the outbox; it does not replace it.

Do I need acks for both directions? #

Usually the server-to-client direction is where gaps hurt most, because the server pushes state the client cannot re-request easily. Client-to-server messages can ride the same protocol in reverse: the server maintains a dedup window and the client an outbox. Apply guarantees only to the directions and message types that need them — heartbeats and presence pings are fine as at-most-once.
