Load Balancer Sticky Sessions for WebSocket #

A user opens your collaborative editor, types for ten minutes against backend node-3, then a brief network blip triggers a reconnect. The load balancer, free to pick any healthy target, lands them on node-7. Their in-memory document cursor, presence color, and unacked edit buffer all lived on node-3, and now they are gone. The client sees a blank canvas, fires a full resync, and you watch a spike of duplicate GET /document/:id requests hammer your database. This is the failure that session affinity exists to prevent: when WebSocket state lives in process memory, the routing layer must pin each client to the same backend instance for the life of the connection and, ideally, across reconnects.

This page covers the L4/L7 routing tier: how nginx, HAProxy, and AWS ALB implement affinity for long-lived upgraded connections, how to preserve the Upgrade handshake through the proxy, and how to fail over without corrupting state. Application-level connection handling sits one layer up in Backend WebSocket Connection Management.

Prerequisites #

Before configuring affinity, make sure the following are already in place:

A working WebSocket upgrade path through your proxy. Header preservation (Upgrade/Connection) is covered in Configuring nginx for WebSocket upgrades; affinity without a clean upgrade is meaningless.
Server-side heartbeats so the routing layer and the application agree on liveness — see Connection Lifecycle & Heartbeats. Affinity drift is invisible until a half-open connection is detected.
A client reconnect policy with backoff, since affinity will break during deploys — see Auto-Reconnection Strategies.
An honest audit of what state is actually in-process. If everything you need is already in Redis or Postgres, you may not need affinity at all (see the first gotcha below).

How affinity pins a client to a backend #

The hard part of sticky routing is not the happy path — it is what happens on the reconnect. A new TCP connection arrives with no inherent memory of where the last one went; affinity reconstructs that memory from a cookie or a hash of the client. The diagram below traces both: the first connection establishes the pin, and the reconnect honours it.
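
The two mechanisms can be sketched as a single routing function. This is a minimal illustration, not how any particular proxy is implemented: the `Backend` type, the `resolveBackend` name, and the FNV-1a hash are all assumptions made for the example.

```typescript
// Hypothetical model of a backend; a real proxy tracks far more than this.
interface Backend {
  id: string;       // value stored in the affinity cookie, e.g. "node1"
  healthy: boolean; // result of the proxy's last health check
}

// Illustrative 32-bit FNV-1a string hash; not the function any real proxy uses.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return hash;
}

// Resolve the backend for a new TCP connection.
// 1. A cookie naming a healthy backend wins (the reconnect path).
// 2. Otherwise hash the client key over the healthy backends (the first-connect path).
function resolveBackend(
  backends: Backend[],
  clientKey: string,
  affinityCookie?: string,
): Backend | undefined {
  const healthy = backends.filter((b) => b.healthy);
  if (healthy.length === 0) return undefined;
  const pinned = healthy.find((b) => b.id === affinityCookie);
  if (pinned) return pinned;
  return healthy[fnv1a(clientKey) % healthy.length];
}
```

Note that a cookie pointing at an unhealthy backend falls through to the hash path, so a dead pin degrades to a fresh placement instead of a blackhole.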

Core implementation #

Affinity is configured at the proxy, but the application still has to cooperate at shutdown so the pin can be released cleanly. The TypeScript below runs alongside your ws server: it tracks live sockets and, on SIGTERM, tells clients to reconnect before the proxy yanks the target — which is the difference between a graceful failover and a wall of 1006 abnormal closures.

import { WebSocketServer, WebSocket } from "ws";

const DRAIN_GRACE_MS = 2_000;        // give clients time to ack the drain notice
const CLOSE_GOING_AWAY = 1001;       // RFC 6455: endpoint going away (deploy/restart)

const wss = new WebSocketServer({ port: 8080 });
const liveSockets = new Set<WebSocket>();

wss.on("connection", (ws) => {
  liveSockets.add(ws);
  ws.on("close", () => liveSockets.delete(ws)); // keep the set honest
});

// During a rolling deploy the orchestrator deregisters this target, then SIGTERMs us.
// We must push clients off BEFORE the LB stops sending traffic, or affinity breaks mid-flight.
process.on("SIGTERM", () => {
  wss.close(); // stop accepting new sockets; anything arriving now would be cut off at exit
  for (const ws of liveSockets) {
    if (ws.readyState !== WebSocket.OPEN) continue;
    // Tell the client to back off and reconnect — it will land on a healthy target
    ws.send(JSON.stringify({ type: "SERVER_DRAIN", reconnectInMs: DRAIN_GRACE_MS }));
    ws.close(CLOSE_GOING_AWAY, "draining"); // clean close => no 1006 on the client
  }
  // Exit only after the grace window so in-flight frames flush
  setTimeout(() => process.exit(0), DRAIN_GRACE_MS);
});
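
On the client, the drain notice needs a matching handler. The sketch below is one possible shape, assuming the `SERVER_DRAIN` message format from the server code above; `parseDrain`, `reconnectDelayMs`, and the 1-second jitter default are hypothetical names and values chosen for illustration.

```typescript
// Shape of the drain notice sent by the server code above.
interface ServerDrain {
  type: "SERVER_DRAIN";
  reconnectInMs: number;
}

// Narrow an incoming text frame to a drain notice, or return null.
function parseDrain(raw: string): ServerDrain | null {
  try {
    const msg: unknown = JSON.parse(raw);
    if (
      typeof msg === "object" && msg !== null &&
      (msg as { type?: unknown }).type === "SERVER_DRAIN" &&
      typeof (msg as { reconnectInMs?: unknown }).reconnectInMs === "number"
    ) {
      return msg as ServerDrain;
    }
  } catch {
    // Not JSON: treat it as an ordinary application frame.
  }
  return null;
}

// Delay before reconnecting: the server's hint plus up to `jitterMs` of random
// spread, so a drained node's clients do not all reconnect in the same instant.
function reconnectDelayMs(
  drain: ServerDrain,
  jitterMs = 1_000,                       // assumed value; tune per deployment
  random: () => number = Math.random,     // injectable for deterministic tests
): number {
  return Math.max(0, drain.reconnectInMs) + Math.floor(random() * jitterMs);
}
```

In a browser client this would typically sit in the `onmessage` handler: if `parseDrain(event.data)` returns a notice, schedule the reconnect with `setTimeout` using `reconnectDelayMs(notice)`.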

The proxy side is where the actual pinning happens. Three common deployments follow.

nginx (open-source) — ip_hash affinity. The sticky cookie directive ships only with nginx Plus or the third-party nginx-sticky-module-ng. On stock nginx, ip_hash is the no-module fallback — it hashes the client address to a backend, so the same client returns to the same node as long as its source IP is stable.

upstream ws_backend {
  ip_hash;                       # pin by client IP (first 3 octets for IPv4)
  server 10.0.1.10:8080;
  server 10.0.1.11:8080;
}

server {
  location /ws/ {
    proxy_pass http://ws_backend;
    proxy_http_version 1.1;                      # required for upgrade
    proxy_set_header Upgrade $http_upgrade;      # preserve the handshake
    proxy_set_header Connection "upgrade";
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_read_timeout 86400s;                    # don't kill idle long-lived sockets
  }
}
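
The "first 3 octets" detail matters more than it looks: every client in the same /24 shares a hash key. The sketch below illustrates that keying only. The hash is a stand-in (nginx's internal function is different), IPv6 is omitted, and `ipv4HashKey` and `pickIndex` are hypothetical names.

```typescript
// Key used for hashing: the first three octets of an IPv4 address, so every
// client in the same /24 shares a key. IPv6 handling is omitted in this sketch.
function ipv4HashKey(ip: string): string {
  const octets = ip.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) throw new Error(`not an IPv4 address: ${ip}`);
  return octets.slice(0, 3).join(".");
}

// Illustrative string hash only; it stands in for the proxy's real hash.
function pickIndex(key: string, backendCount: number): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (Math.imul(h, 31) + key.charCodeAt(i)) >>> 0; // unsigned 32-bit arithmetic
  }
  return h % backendCount;
}
```

Two clients at 203.0.113.7 and 203.0.113.250 produce the same key and therefore the same backend, which is the mechanism behind the NAT concentration described under the gotchas.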

HAProxy — cookie-based affinity with leastconn. This is the most precise of the three: HAProxy injects a SERVERID cookie naming the chosen server, and subsequent requests carrying it skip the balancer and go straight back. leastconn then governs only new, unpinned connections — the right load metric for long-lived sockets, since round-robin counts requests, not concurrent connections.

backend ws_nodes
  balance leastconn                              # new conns -> least-loaded node
  cookie SERVERID insert indirect nocache        # inject affinity cookie, don't cache it
  option http-server-close
  option forwardfor                              # add X-Forwarded-For for the app
  server node1 10.0.1.10:8080 check cookie node1 # cookie value pins to this server
  server node2 10.0.1.11:8080 check cookie node2
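
To see why the balancing algorithm matters for unpinned connections, here is a small simulation. It is a sketch under simplified assumptions (no disconnects, ties go to the first node), and `NodeLoad`, `pickLeastConn`, `pickRoundRobin`, and `simulate` are hypothetical names, not HAProxy internals.

```typescript
interface NodeLoad {
  id: string;
  activeConnections: number; // sockets currently open on this node
}

// Least-connections choice: the node with the fewest open sockets.
// Ties go to the earliest node in the list (a simplification of this sketch).
function pickLeastConn(nodes: NodeLoad[]): NodeLoad {
  if (nodes.length === 0) throw new Error("no nodes available");
  return nodes.reduce((best, n) =>
    n.activeConnections < best.activeConnections ? n : best,
  );
}

// Round-robin choice for comparison: rotates regardless of current load.
function pickRoundRobin(nodes: NodeLoad[], counter: number): NodeLoad {
  if (nodes.length === 0) throw new Error("no nodes available");
  return nodes[counter % nodes.length];
}

// Simulate `newConnections` arrivals and return the final per-node counts.
function simulate(
  initial: NodeLoad[],
  newConnections: number,
  strategy: "leastconn" | "roundrobin",
): number[] {
  const nodes = initial.map((n) => ({ ...n })); // do not mutate the caller's array
  for (let i = 0; i < newConnections; i++) {
    const target =
      strategy === "leastconn" ? pickLeastConn(nodes) : pickRoundRobin(nodes, i);
    target.activeConnections += 1;
  }
  return nodes.map((n) => n.activeConnections);
}
```

Starting from one node holding 3000 sockets and a freshly restarted node holding none, 1000 new connections end at 3000/1000 under least-connections and 3500/500 under round-robin.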

AWS ALB — application cookie stickiness (Terraform). Managed L7 balancers preserve the upgrade automatically, but stickiness and timeouts must be set explicitly. The deep walkthrough — idle-timeout math, health checks, and draining — is in Configuring AWS ALB for WebSocket sticky sessions.

resource "aws_lb_target_group" "ws_sticky" {
  name     = "ws-realtime-sync"
  port     = 8080
  protocol = "HTTP"                  # ALB upgrades HTTP listeners to WS transparently
  vpc_id   = var.vpc_id

  stickiness {
    type            = "app_cookie"   # honour an app-set cookie, not a duration cookie
    cookie_name     = "WS_SESSION_ID"
    cookie_duration = 3600
    enabled         = true
  }

  deregistration_delay = 300         # seconds; let live sockets drain before removal

  health_check {
    path                = "/health"
    matcher             = "200"
    interval            = 15
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}
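
With `app_cookie` stickiness the application is responsible for issuing the cookie named in `cookie_name`. The helper below is a rough sketch of building that `Set-Cookie` value; `buildAffinityCookie` is a hypothetical name, and the attribute choices (`HttpOnly`, `Secure`, `SameSite=Lax`) and the restriction to URL-safe session ids are assumptions of this example. It also assumes the cookie is set on an ordinary HTTP response, such as login or page load, before the socket is opened.

```typescript
const COOKIE_NAME = "WS_SESSION_ID"; // must match cookie_name in the target group
const COOKIE_MAX_AGE_SEC = 3600;     // mirrors cookie_duration in the target group

// Build the Set-Cookie header value for the affinity cookie.
// The caller supplies the session id (for example a random UUID).
function buildAffinityCookie(
  sessionId: string,
  maxAgeSec: number = COOKIE_MAX_AGE_SEC,
): string {
  // Restricting to URL-safe characters is a choice made for this sketch.
  if (!/^[A-Za-z0-9_-]+$/.test(sessionId)) {
    throw new Error("session id must be cookie-safe");
  }
  return [
    `${COOKIE_NAME}=${sessionId}`,
    `Max-Age=${maxAgeSec}`,
    "Path=/",
    "HttpOnly",
    "Secure",
    "SameSite=Lax",
  ].join("; ");
}
```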

Configuration reference #

Parameter	Type	Default	Production value	Notes
`proxy_read_timeout` (nginx)	duration	`60s`	`86400s`	nginx closes the connection when the backend sends nothing for this long; at the default, quiet sockets are killed by the proxy, not the app.
`balance` (HAProxy)	enum	`roundrobin`	`leastconn`	Round-robin counts requests; long-lived sockets need connection-count balancing.
`cookie ... insert indirect nocache`	directive	off	on	Injects `SERVERID`; `nocache` stops shared caches from leaking it.
ALB idle timeout	seconds	`60`	`≥90` (or `3600`)	Must exceed your heartbeat interval, or the ALB drops the socket.
ALB `stickiness.type`	enum	`lb_cookie`	`app_cookie`	App cookie survives target-group changes and is app-controlled.
`deregistration_delay.timeout_seconds`	seconds	`300`	`300`+	Window for live sockets to drain on deploy before forced close.
`ip_hash` (nginx)	flag	off	situational	Breaks for clients behind rotating NAT/CGNAT; prefer cookies when available.

Edge cases & gotchas #

You may not need affinity at all. If your in-process state is only ephemeral (presence color, cursor position) and everything durable already lives in Redis or Postgres, sticky routing buys you nothing and costs you balance. Externalize state and route freely — that is the path taken by horizontal scaling on Kubernetes, where pods are deliberately interchangeable. Audit before you pin.
ip_hash collapses behind NAT and mobile networks. Carrier-grade NAT puts thousands of users on one egress IP, so they all hash to one node, while a client roaming WiFi-to-cellular changes IP and loses its pin mid-session. Cookie affinity is immune to both; reach for ip_hash only when you genuinely cannot set a cookie.
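Affinity is not failover. A cookie can pin to a node that has died. The proxy must health-check and re-pin on failure rather than blackholing the client. When a node hits an unrecoverable error but can still send frames, have it close the socket with 1011 so the client reconnects and resyncs deliberately instead of being silently re-routed to a cold node. After a hard crash no close frame is sent at all and the client sees 1006, so the reconnect path must handle both.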
Idle timeout shorter than heartbeat = phantom drops. If the ALB idle timeout (60s) is below your heartbeat interval, the balancer severs sockets the app still believes are healthy. Always set the LB idle timeout above the ping interval, never the reverse.

Verification #

Confirm the upgrade, the pin, and the balance with these checks:

# 1. Confirm the WebSocket actually upgraded through the proxy (look for 101)
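# --http1.1 stops curl negotiating HTTP/2, where the Upgrade header is not valid;
# the key must be a base64-encoded 16-byte value or strict servers answer 400
curl -i -N --http1.1 -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" -H "Sec-WebSocket-Version: 13" \
  https://example.com/ws/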

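# 2. Capture the SERVERID cookie, then replay it; with `insert indirect`, a request
#    that already carries a valid cookie should receive no new Set-Cookie
curl -s -i -c /tmp/ws.jar https://example.com/ws/ | grep -i 'set-cookie: SERVERID'
curl -s -i -b /tmp/ws.jar https://example.com/ws/ | grep -ci 'set-cookie: SERVERID'   # expect 0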

# 3. On each backend, count established sockets on the listen port to spot affinity skew
#    (-H drops the header row so it is not counted)
ss -Htn state established '( sport = :8080 )' | wc -l

In Prometheus, assert that no node holds a disproportionate share of connections — a sign the pin is concentrating load on a single target:

# Fire when any node carries > 40% of all active connections
max(
  sum by (node) (ws_connections_active)
  / scalar(sum(ws_connections_active))
) > 0.40

Pair this with browser DevTools: open the Network tab, filter to WS, reconnect, and verify that each upgrade request sends the same affinity cookie value.

Guides in this area #

Configuring AWS ALB for WebSocket sticky sessions — the platform-specific walkthrough: target-group stickiness, idle-timeout math against your heartbeat, health checks, and connection draining for zero-downtime deploys.

FAQ #

Do I need sticky sessions if I use Redis for state? #

Usually not. If every piece of state a reconnecting client needs is already in Redis or another shared store, any node can serve any client, and affinity only constrains your load balancing. Keep sticky routing for genuinely in-process state — unacked buffers, in-memory CRDT documents — that would be too latency-sensitive to externalize on every message.

Does sticky session affinity survive a WebSocket reconnect? #

Only if the affinity is cookie-based and the client resends the cookie. Browser WebSocket clients send cookies on the upgrade request automatically, so HAProxy SERVERID and ALB app_cookie affinity hold across reconnects. ip_hash survives only as long as the client’s source IP is stable, which fails on mobile networks.

Why use `leastconn` instead of round-robin for WebSockets? #

Round-robin balances by request count, but a WebSocket is one request that stays open for hours. After a deploy, round-robin sends an equal number of new connections to each node while ignoring that some nodes already hold thousands of live sockets. leastconn balances by concurrent connections, which is what actually loads a real-time backend.

How long should the load balancer idle timeout be? #

Strictly longer than your application heartbeat interval. A 30-second ping with a 60-second ALB idle timeout is safe; a 90-second ping with the same timeout will see the balancer drop sockets the app thinks are alive. For very quiet connections, raise the ALB idle timeout to 3600 seconds rather than relying on traffic to keep it open.
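
The rule is simple enough to check automatically, for instance in a deploy-time config test. The function below is a minimal sketch: `TimeoutConfig`, `checkTimeouts`, and the 10-second safety margin are assumptions for illustration, not values from any vendor documentation.

```typescript
interface TimeoutConfig {
  heartbeatIntervalSec: number; // how often the application sends a ping
  lbIdleTimeoutSec: number;     // load balancer idle timeout
}

// Returns a list of problems; an empty list means the pair is consistent.
function checkTimeouts(cfg: TimeoutConfig, safetyMarginSec = 10): string[] {
  const problems: string[] = [];
  const slack = cfg.lbIdleTimeoutSec - cfg.heartbeatIntervalSec;
  if (slack <= 0) {
    problems.push(
      `idle timeout ${cfg.lbIdleTimeoutSec}s is not above heartbeat ` +
        `${cfg.heartbeatIntervalSec}s: the balancer will drop live sockets`,
    );
  } else if (slack < safetyMarginSec) {
    problems.push(
      `only ${slack}s between heartbeat and idle timeout: ` +
        `a delayed ping could still trip the balancer`,
    );
  }
  return problems;
}
```

With the figures above, a 30-second ping against a 60-second timeout returns no problems, while a 90-second ping against the same timeout returns one.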

Configuring AWS ALB for WebSocket sticky sessions — ALB-specific stickiness, timeouts, and draining.
Connection Lifecycle & Heartbeats — detecting half-open sockets so affinity drift becomes visible.
Auto-Reconnection Strategies — backoff and jitter so reconnects after a failover don’t storm the routing layer.
Horizontal Scaling on Kubernetes — the stateless alternative where pods are interchangeable and affinity is unnecessary.
