When to use WebSockets over Server-Sent Events
You shipped a real-time feature on Server-Sent Events because EventSource reconnects for free, and now clients sit at readyState: CLOSED without ever firing onerror, sequence numbers diverge, and the UI lags behind server truth under load. You are landing here because you need a precise answer to one question: is this a tuning problem on your current SSE stack, or a signal that the workload actually needs WebSockets? The short version — if clients mutate state frequently and need ordered, acknowledged, bidirectional delivery, SSE is the wrong transport and no amount of timeout tuning will fix it. This guide isolates the failure boundary, explains the root cause, and gives one production-ready WebSocket configuration that survives proxy idle timeouts.
Before guessing, isolate where frames die. Compare the OS-level socket count from ss -tn state established '( sport = :8080 )' | wc -l against the connection count your application tracks. Subtract one from the ss result, because wc -l also counts the header line that ss prints. Then watch the DevTools Network tab for frames stuck in (pending), and audit your logs for missed pong acknowledgments. If the two counts disagree, you have zombie sockets, not an application bug.
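Here is a minimal sketch of that comparison. It assumes you capture the raw output of the ss command above (without the wc -l) and read your application's own count separately. With the ws library, that count is wss.clients.size. The helper names countEstablished and socketDrift are hypothetical.

```typescript
// Hypothetical helpers: compare the kernel's view of open sockets with the app's.
// `ssOutput` is assumed to be the raw text printed by:
//   ss -tn state established '( sport = :8080 )'

// Count socket rows, skipping blank lines and the column header that ss prints
// first ("Recv-Q Send-Q ..." with a state filter, "State Recv-Q ..." without one).
function countEstablished(ssOutput: string): number {
  return ssOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => !line.startsWith("Recv-Q") && !line.startsWith("State"))
    .length;
}

interface Drift {
  osCount: number;
  appCount: number;
  zombieSuspected: boolean; // the two views of "open sockets" disagree
}

function socketDrift(ssOutput: string, appCount: number): Drift {
  const osCount = countEstablished(ssOutput);
  return { osCount, appCount, zombieSuspected: osCount !== appCount };
}

// Example: two established sockets in the kernel, three tracked in memory.
const sample = [
  "Recv-Q Send-Q Local Address:Port Peer Address:Port",
  "0      0      10.0.0.5:8080      10.0.0.9:51234",
  "0      0      10.0.0.5:8080      10.0.0.9:51240",
].join("\n");

console.log(socketDrift(sample, 3)); // { osCount: 2, appCount: 3, zombieSuspected: true }
```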
Root cause
Two distinct problems hide behind “SSE keeps dropping,” and they have different fixes.
The first is transport-level. Reverse proxies such as Nginx, AWS ALB, and Cloudflare terminate TCP connections that sit idle past their idle window. They often do this silently: they send no WebSocket close frame, and on some hops they send no FIN or RST either. The browser’s EventSource masks this because it auto-reconnects, so SSE appears resilient even while every long-idle stream is being recycled. The moment you switch to WebSockets for bidirectional traffic, the same idle timeout becomes visible as a half-open socket. Because no close frame was ever exchanged, the onclose event does not fire until the next frame write fails. TCP keepalives do not save you here. They operate below the proxy’s application-layer idle accounting, and their intervals are usually longer than the proxy window (the Linux default tcp_keepalive_time is two hours). WebSocket ping/pong frames, by contrast, are protocol frames the proxy forwards and counts as activity. That is exactly why an application-layer heartbeat is the correct liveness mechanism.
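The timing argument can be checked mechanically. The sketch below uses hypothetical names and asks which hops would reap a connection before the next periodic signal arrives. Replace the hop list with your own values. Here, 60s is Nginx's default proxy_read_timeout, the load balancer value is only illustrative, and 7200s is the Linux default tcp_keepalive_time. Following the reasoning above, the model treats keepalive probes as invisible to the proxy.

```typescript
// Hypothetical model: which proxy hops close a connection before the next signal?
// Hop names and idle windows are illustrative assumptions, not measured values.

interface Hop {
  name: string;
  idleTimeoutMs: number; // the hop drops the connection after this long without traffic
}

interface PeriodicSignal {
  name: string;
  intervalMs: number;
  countsAsActivity: boolean; // false for TCP keepalive probes, which carry no payload
}

// A hop reaps the connection if it never sees the signal, or if the signal's
// interval is not strictly shorter than the hop's idle window.
function hopsThatReap(signal: PeriodicSignal, hops: Hop[]): string[] {
  return hops
    .filter((hop) => !signal.countsAsActivity || signal.intervalMs >= hop.idleTimeoutMs)
    .map((hop) => hop.name);
}

const hops: Hop[] = [
  { name: "nginx default proxy_read_timeout", idleTimeoutMs: 60_000 },
  { name: "load balancer (assumed 60s)", idleTimeoutMs: 60_000 },
];

const tcpKeepalive: PeriodicSignal = {
  name: "TCP keepalive (Linux default tcp_keepalive_time)",
  intervalMs: 7_200_000,
  countsAsActivity: false,
};
const wsPing: PeriodicSignal = { name: "WebSocket ping", intervalMs: 30_000, countsAsActivity: true };

console.log(hopsThatReap(tcpKeepalive, hops).length); // 2: every hop reaps the socket
console.log(hopsThatReap(wsPing, hops).length); // 0: the heartbeat keeps it alive
```

Treating an interval equal to the idle window as a failure is deliberate, because at equality the heartbeat and the proxy race each other.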
The second is structural, and it is the real decision. SSE is unidirectional by specification: the server writes a text/event-stream body, and the client cannot send on that channel. To push a mutation, the client opens a separate HTTP request. You therefore lose ordering guarantees between the upstream POST and the downstream event, and you pay a full request round-trip per mutation. For a live feed that is fine. For collaborative editing, multiplayer state, or any acknowledged mutation queue, you need one ordered, full-duplex channel. The WebSocket vs SSE vs WebRTC comparison calls this the bidirectional requirement. No timeout tuning converts a unidirectional transport into an ordered bidirectional one.
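The same ordering problem applies between successive mutations sent as separate requests. A toy model with made-up latencies shows it. Independent requests are applied in arrival order, while a single ordered stream (one WebSocket over one TCP connection) preserves send order. The function names and timings are hypothetical.

```typescript
// Toy model with hypothetical latencies: in what order does the server apply mutations?

interface Mutation {
  id: string;
  sentAtMs: number; // when the client sent it
  latencyMs: number; // one-way delay for this message
}

// Separate HTTP requests (the SSE pattern): each mutation travels independently,
// so the server applies them in arrival order, which may differ from send order.
function orderViaIndependentRequests(ms: Mutation[]): string[] {
  return [...ms]
    .sort((a, b) => a.sentAtMs + a.latencyMs - (b.sentAtMs + b.latencyMs))
    .map((m) => m.id);
}

// One ordered stream (a WebSocket on a single TCP connection): a later message
// cannot be delivered before an earlier one, so arrival order equals send order.
function orderViaSingleStream(ms: Mutation[]): string[] {
  return [...ms].sort((a, b) => a.sentAtMs - b.sentAtMs).map((m) => m.id);
}

const mutations: Mutation[] = [
  { id: "rename", sentAtMs: 0, latencyMs: 120 }, // a slow request, e.g. on a cold connection
  { id: "delete", sentAtMs: 10, latencyMs: 40 },
];

console.log(orderViaIndependentRequests(mutations)); // [ 'delete', 'rename' ]
console.log(orderViaSingleStream(mutations)); // [ 'rename', 'delete' ]
```

The single-stream model deliberately ignores per-message latency. TCP delivers bytes in order, so a slow early message holds back later ones (head-of-line blocking) instead of being overtaken.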
Resolution
The fix has two halves: make the proxy idle window strictly larger than the heartbeat so the proxy never kills a live socket first, and track liveness in the application so genuinely dead sockets are reaped. Set proxy_read_timeout well above the ping interval — for a long-lived socket with a 30s heartbeat, 86400s (24h) is the common production choice because it delegates all liveness to the application layer.
```typescript
import { WebSocketServer, WebSocket } from "ws";

const HEARTBEAT_INTERVAL_MS = 30_000; // must be < proxy_read_timeout below
const wss = new WebSocketServer({ port: 8080 });

// Per-socket liveness flag: the periodic ping keeps proxies from idling the
// connection out, and the flag lets us reap sockets whose peer has vanished.
interface Liveness extends WebSocket {
  isAlive: boolean;
}

wss.on("connection", (raw) => {
  const ws = raw as Liveness;
  ws.isAlive = true;
  ws.on("pong", () => { ws.isAlive = true; }); // pong proves the peer is reachable
  // Without a listener, an "error" event (e.g. an invalid frame) throws and crashes the process.
  ws.on("error", (err) => console.error("websocket error:", err));

  const heartbeat = setInterval(() => {
    if (!ws.isAlive) {
      // No pong since the last tick: a zombie half-open socket. Reap it now
      // rather than waiting for a write to fail minutes later.
      clearInterval(heartbeat);
      ws.terminate(); // terminate(), not close(): the peer cannot complete a close handshake
      return;
    }
    ws.isAlive = false; // cleared until the next pong flips it back to true
    ws.ping(); // liveness is decided by whether a pong arrives, not by this call succeeding
  }, HEARTBEAT_INTERVAL_MS);

  ws.on("close", () => clearInterval(heartbeat)); // prevent the timer leaking

  ws.on("message", (data) => {
    try {
      const { seq } = JSON.parse(data.toString());
      if (!Number.isInteger(seq)) throw new Error("missing or non-integer seq");
      // Acknowledge by sequence number so the client can confirm ordered delivery.
      ws.send(JSON.stringify({ ack: seq, status: "ok" }));
    } catch {
      ws.send(JSON.stringify({ error: "INVALID_PAYLOAD" }));
    }
  });
});
```
The matching Nginx location block sets both proxy timeouts far above the heartbeat:

```nginx
location /ws {
    proxy_pass http://backend;              # upstream "backend" defined elsewhere
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400s;              # >> heartbeat; application owns liveness
    proxy_send_timeout 86400s;
    proxy_buffering off;                    # never buffer a streamed/duplex body
}
```
The acknowledgment-by-sequence pattern is the concrete reason to leave SSE: the client can confirm each mutation landed in order. If your traffic is genuinely server-to-client and you only adopted WebSockets to dodge proxy timeouts, the cheaper move is HTTP/2 SSE — see HTTP/2 Server Push vs WebSocket for why multiplexing erases SSE’s connection-count penalty.
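On the client side, the acknowledgment pattern needs a little bookkeeping. This minimal sketch assumes the { seq } request and { ack, status } reply shapes from the server code above. The AckTracker class is hypothetical. It takes the send function as a parameter so it runs without a browser; in a page you would pass frame => socket.send(frame).

```typescript
// Hypothetical client-side bookkeeping for sequence-numbered mutations.
// Request frames carry { seq }; the server replies { ack: seq, status: "ok" }.

type Send = (frame: string) => void;

class AckTracker {
  private readonly send: Send;
  private nextSeq = 1;
  private readonly pending: number[] = []; // sent but unacknowledged, in send order

  constructor(send: Send) {
    this.send = send;
  }

  // Stamp a mutation with the next sequence number and send it.
  submit(mutation: Record<string, unknown>): number {
    const seq = this.nextSeq++;
    this.pending.push(seq);
    this.send(JSON.stringify({ ...mutation, seq }));
    return seq;
  }

  // Process one server frame. Returns false when an ack does not match the oldest
  // pending seq, which on a single ordered channel signals a protocol bug.
  onMessage(raw: string): boolean {
    const msg = JSON.parse(raw) as { ack?: unknown } | null;
    const ack = msg?.ack;
    if (typeof ack !== "number") return true; // not an ack; error frames are handled elsewhere
    if (this.pending[0] !== ack) return false;
    this.pending.shift();
    return true;
  }

  unacknowledged(): number[] {
    return [...this.pending];
  }
}

const outbox: string[] = [];
const tracker = new AckTracker((frame) => outbox.push(frame));
tracker.submit({ op: "rename", name: "Q3 plan" }); // seq 1
tracker.submit({ op: "delete", id: 7 }); // seq 2
tracker.onMessage('{"ack":1,"status":"ok"}'); // true: acknowledged in order
console.log(tracker.unacknowledged()); // [ 2 ]
```

The sketch exposes one gap. The server's INVALID_PAYLOAD reply carries no seq, so the client cannot tell which mutation it refers to. Echoing the seq whenever the server could parse one would close that gap.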
Operational checklist
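- Confirm proxy_read_timeout (and every other hop’s idle timeout) exceeds the heartbeat interval.
- Verify that dead sockets are reaped with ws.terminate(), not ws.close().
- Clear the heartbeat interval on close and on terminate.
- Expose ws_connections_active and compare it against the OS-level ss count.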
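For the last item, one option is to publish the count as a gauge in the Prometheus text exposition format. To stay dependency-free, this sketch renders the format by hand. The helper name and HELP text are hypothetical, and the value passed in is assumed to come from your server (for ws, wss.clients.size).

```typescript
// Hypothetical helper: render the ws_connections_active gauge in the
// Prometheus text exposition format, without a client library.
function renderConnectionGauge(active: number): string {
  if (!Number.isInteger(active) || active < 0) {
    throw new RangeError(`active connection count must be a non-negative integer, got ${active}`);
  }
  return (
    [
      "# HELP ws_connections_active WebSocket connections currently open in this process.",
      "# TYPE ws_connections_active gauge",
      `ws_connections_active ${active}`,
    ].join("\n") + "\n" // the format requires a line feed after the last line
  );
}

console.log(renderConnectionGauge(3));
```

Serve the string from an HTTP endpoint with Content-Type: text/plain; version=0.0.4 so it can be scraped and compared against the ss count over time. For anything beyond a single gauge, a client library such as prom-client is the more maintainable choice.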
FAQ
Does SSE work behind a proxy that buffers responses?
Only if buffering is disabled for that route. A buffering proxy holds the text/event-stream body until its buffer fills, so events arrive in bursts instead of in real time. For SSE, set proxy_buffering off in Nginx or have the application send an X-Accel-Buffering: no response header, just as you do for WebSockets.
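The same fix can also travel with the response itself. This minimal sketch assumes a Node HTTP handler. It sets the text/event-stream headers plus X-Accel-Buffering: no, which Nginx honors per response. It also formats events per the SSE wire format, where each line of a multi-line payload needs its own data: field. The names sseHeaders and formatEvent are hypothetical.

```typescript
// Hypothetical helpers for an SSE response that a buffering Nginx hop passes through unbuffered.

const sseHeaders: Record<string, string> = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "X-Accel-Buffering": "no", // tells Nginx not to buffer this response
};

interface SseEvent {
  data: string;
  event?: string; // optional event type; clients subscribe with addEventListener(type, ...)
  id?: string; // optional id; EventSource resends it as Last-Event-ID on reconnect
}

// Serialize one event. Each line of `data` becomes its own "data:" field, and a
// blank line terminates the event; the browser dispatches it once that line arrives.
function formatEvent(e: SseEvent): string {
  let out = "";
  if (e.id !== undefined) out += `id: ${e.id}\n`;
  if (e.event !== undefined) out += `event: ${e.event}\n`;
  for (const line of e.data.split(/\r\n|\r|\n/)) out += `data: ${line}\n`;
  return out + "\n";
}

console.log(JSON.stringify(formatEvent({ id: "42", event: "price", data: "line one\nline two" })));
```

In a node:http handler, you would call res.writeHead(200, sseHeaders) once, then res.write(formatEvent(...)) for each event.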
Why does my WebSocket onclose never fire when the proxy times out?
Because the proxy dropped the TCP connection without a WebSocket close frame, leaving a half-open socket. Your runtime only learns the peer is gone on the next failed write or missed pong. The heartbeat above closes that gap by reaping any socket that misses a pong.
Can I avoid switching to WebSockets by tuning SSE timeouts?
If your data is strictly server-to-client, yes — raise the idle timeout and rely on EventSource reconnection. Tuning cannot give you client-to-server ordering or per-mutation acknowledgments, so if clients mutate state frequently you need WebSockets regardless of timeouts.
How many concurrent SSE streams is too many on HTTP/1.1?
Browsers cap connections per origin at roughly six on HTTP/1.1, and each SSE stream holds one open. A few dashboards exhaust that budget and block other requests. On HTTP/2 the streams multiplex over one connection, which removes the cap as a deciding factor.
Is TCP keepalive enough instead of application pings?
No. TCP keepalive intervals default to hours and sit below the proxy’s idle accounting, so the proxy still reaps the socket first. Application-layer ping/pong frames register as activity that the proxy forwards, and they give you deterministic liveness.
Related
- WebSocket vs SSE vs WebRTC: Protocol Guide — the full decision matrix across all three transports.
- HTTP/2 Server Push vs WebSocket — when multiplexed SSE beats a WebSocket upgrade.
- Configuring Nginx for WebSocket Upgrades — the upgrade and timeout headers referenced above, in full.
- Real-Time Protocol Selection & Architecture — the broader area covering handshakes, security, and protocol choice.