Web Patterns
Real-time communication, async messaging, and event-driven integration patterns
Overview
Web patterns are communication strategies that go beyond the traditional HTTP request/response cycle. While REST APIs and standard HTTP serve most web interactions well, modern architectures demand more: real-time UIs that update instantly, event-driven microservices that react to state changes, and third-party integrations that push data to your systems as events occur.
These patterns solve fundamentally different problems. WebSockets enable full-duplex, persistent connections for real-time bidirectional communication. Webhooks let external systems notify you when something happens. Server-Sent Events stream one-way data from server to client over plain HTTP. Long polling simulates push over standard HTTP for maximum compatibility.
Pattern WebSockets
Full-duplex persistent TCP connections. Both client and server can send messages at any time. Used for chat, live dashboards, multiplayer games, and collaborative editing.
Pattern Webhooks
HTTP callbacks — a server POSTs to your endpoint when an event occurs. Used for third-party integrations (GitHub, Stripe, Slack), CI/CD pipelines, and event-driven workflows.
Pattern Server-Sent Events
One-way server-to-client streaming over HTTP. Built on the EventSource API. Used for live feeds, notifications, progress bars, and stock tickers.
Pattern Long Polling
Client sends a request, server holds it until data is available. Simulates push over standard HTTP. Used when WebSocket infrastructure isn't available or broad compatibility is needed.
Why Real-time UIs
Users expect live updates — typing indicators, real-time collaboration, live dashboards, instant notifications. Traditional polling wastes bandwidth and introduces latency.
Why Event-Driven Architecture
Microservices need to react to events without tight coupling. Webhooks and message queues enable loose coupling between services, reducing dependencies and improving resilience.
WebSockets
WebSockets provide full-duplex, persistent connections over a single TCP socket. Unlike HTTP where the client must initiate every exchange, a WebSocket connection allows both client and server to send messages independently at any time. The connection starts as an HTTP request and upgrades to the WebSocket protocol (RFC 6455).
The upgrade handshake
A WebSocket connection begins with an HTTP/1.1 upgrade request. The client sends a standard HTTP request with special headers, and the server responds with 101 Switching Protocols to confirm the upgrade.
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: http://example.com
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
After the handshake, the connection switches from HTTP to the WebSocket frame-based protocol. The TCP connection stays open and both sides communicate using lightweight frames.
Protocol details
URL ws:// vs wss://
ws:// is unencrypted WebSocket (port 80). wss:// is WebSocket over TLS (port 443). Always use wss:// in production — it prevents man-in-the-middle attacks and works through most proxies and firewalls that block non-TLS WebSocket traffic.
Frames Frame format
WebSocket messages are transmitted as frames. Each frame has an opcode: 0x1 (text), 0x2 (binary), 0x8 (close), 0x9 (ping), 0xA (pong). Client-to-server frames are masked with a 32-bit key. Messages can be split across multiple frames for large payloads.
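The client-side masking can be illustrated with a short sketch; the mask key and payload below are the example values from RFC 6455:

```javascript
// Client-to-server payloads are XOR-masked with the frame's 4-byte mask key
// (RFC 6455 §5.3): byte i of the payload is XORed with maskKey[i % 4].
// Masking and unmasking are the same operation, since XOR is its own inverse.
function unmask(payload, maskKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskKey[i % 4];
  }
  return out;
}

// RFC 6455 §5.7 example: "Hello" masked with key 37 fa 21 3d
const mask = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = unmask(Buffer.from('Hello'), mask);
console.log(masked); // → <Buffer 7f 9f 4d 51 58>
console.log(unmask(masked, mask).toString()); // → Hello
```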
Health Ping/Pong heartbeats
The protocol includes built-in ping/pong frames for connection health monitoring. Either side can send a ping; the other must respond with a pong. Use this to detect dead connections — idle TCP connections can silently break through NATs and load balancers.
Use cases When to use WebSockets
Chat applications, live dashboards, multiplayer games, collaborative editing (Google Docs-style), live sports scores, financial trading terminals, IoT device communication, real-time analytics displays.
WebSockets trade statelessness for performance. Unlike HTTP, the server must maintain state for each connection. This has significant implications for scaling (connections are pinned to specific servers), load balancing (need sticky sessions or L4 balancing), and resource management (each connection consumes a file descriptor and memory).
WebSocket Implementation
Implementing WebSockets requires both server-side and client-side code. The server manages connections and broadcasts messages, while the client establishes the connection and handles events.
Server-side: Node.js (ws library)
const WebSocket = require('ws');
const server = new WebSocket.Server({ port: 8080 });
const clients = new Set();
server.on('connection', (ws, req) => {
const clientIP = req.socket.remoteAddress;
console.log(`New connection from ${clientIP}`);
clients.add(ws);
// Send a welcome message
ws.send(JSON.stringify({ type: 'welcome', message: 'Connected!' }));
ws.on('message', (data) => {
let message;
try {
message = JSON.parse(data);
} catch {
return; // ignore malformed JSON instead of crashing on an uncaught throw
}
// Broadcast to all other connected clients
for (const client of clients) {
if (client !== ws && client.readyState === WebSocket.OPEN) {
client.send(JSON.stringify(message));
}
}
});
ws.on('close', (code, reason) => {
console.log(`Connection closed: ${code} ${reason}`);
clients.delete(ws);
});
ws.on('error', (err) => {
console.error('WebSocket error:', err.message);
clients.delete(ws);
});
// Ping/pong heartbeat
ws.isAlive = true;
ws.on('pong', () => { ws.isAlive = true; });
});
// Heartbeat interval: detect broken connections
const heartbeat = setInterval(() => {
server.clients.forEach((ws) => {
if (!ws.isAlive) return ws.terminate();
ws.isAlive = false;
ws.ping();
});
}, 30000);
// Stop the heartbeat timer when the server shuts down
server.on('close', () => clearInterval(heartbeat));
Server-side: Go (gorilla/websocket)
The gorilla/websocket repository was archived in December 2022; it has since been revived under new maintainers, but development remains conservative. For new Go projects, consider github.com/coder/websocket (formerly nhooyr.io/websocket), which is actively maintained, supports context.Context for cancellation/timeouts, and handles concurrent writes safely.
package main
import (
"log"
"net/http"
"github.com/gorilla/websocket"
)
var upgrader = websocket.Upgrader{
ReadBufferSize: 1024,
WriteBufferSize: 1024,
CheckOrigin: func(r *http.Request) bool {
// Validate origin in production!
return true
},
}
func handleWS(w http.ResponseWriter, r *http.Request) {
conn, err := upgrader.Upgrade(w, r, nil)
if err != nil {
log.Printf("Upgrade error: %v", err)
return
}
defer conn.Close()
for {
messageType, msg, err := conn.ReadMessage()
if err != nil {
log.Printf("Read error: %v", err)
break
}
// Echo the message back
if err := conn.WriteMessage(messageType, msg); err != nil {
log.Printf("Write error: %v", err)
break
}
}
}
func main() {
http.HandleFunc("/ws", handleWS)
log.Fatal(http.ListenAndServe(":8080", nil))
}
Client-side: JavaScript
class WebSocketClient {
constructor(url) {
this.url = url;
this.reconnectDelay = 1000;
this.maxReconnectDelay = 30000;
this.connect();
}
connect() {
this.ws = new WebSocket(this.url);
this.ws.onopen = () => {
console.log('Connected');
this.reconnectDelay = 1000; // Reset on successful connection
};
this.ws.onmessage = (event) => {
const data = JSON.parse(event.data);
this.handleMessage(data);
};
this.ws.onclose = (event) => {
console.log(`Disconnected: ${event.code} ${event.reason}`);
this.reconnect();
};
this.ws.onerror = (error) => {
console.error('WebSocket error:', error);
};
}
reconnect() {
// Exponential backoff with jitter
const jitter = Math.random() * 1000;
const delay = Math.min(this.reconnectDelay + jitter, this.maxReconnectDelay);
console.log(`Reconnecting in ${Math.round(delay)}ms...`);
setTimeout(() => {
this.reconnectDelay *= 2; // Exponential backoff
this.connect();
}, delay);
}
send(data) {
if (this.ws.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify(data));
}
}
handleMessage(data) {
// Override this in your application
console.log('Received:', data);
}
}
// Usage
const client = new WebSocketClient('wss://api.example.com/ws');
Always implement reconnection with exponential backoff. WebSocket connections will drop — from network switches, load balancer timeouts, server restarts, or mobile network changes. Without backoff, thousands of clients reconnecting simultaneously will overwhelm your server (thundering herd).
Webhooks
Webhooks are HTTP callbacks — when an event occurs in a source system, it sends an HTTP POST request to a URL you've registered. Unlike polling (repeatedly asking "has anything changed?"), webhooks push data to you in near-real-time. They are the backbone of third-party integrations and event-driven architectures.
How webhooks work
Payload format
Webhook payloads are typically JSON. The source system includes metadata in HTTP headers to help you route and verify the request.
# Example: GitHub webhook delivery
# Headers:
# Content-Type: application/json
# X-GitHub-Event: pull_request
# X-GitHub-Delivery: 72d3162e-cc78-11e3-81ab-4c9367dc0958
# X-Hub-Signature-256: sha256=abc123...
#
# Body:
# {
# "action": "opened",
# "number": 42,
# "pull_request": {
# "title": "Add new feature",
# "user": { "login": "octocat" },
# "merged": false
# }
# }
Common webhook providers
VCS GitHub Webhooks
Events for pushes, PRs, issues, releases, deployments, and more. Verified with HMAC-SHA256 signatures. GitHub does not auto-retry failed deliveries; you must manually redeliver or automate redelivery via the REST API (within 3 days). Configurable per-repository or organization-wide.
Payments Stripe Events
Events for charges, subscriptions, invoices, disputes, and payouts. Signed with HMAC-SHA256. Includes a timestamp for replay attack prevention. Retries with exponential backoff for up to 3 days in live mode (immediately, 5 min, 30 min, 2 h, 5 h, 10 h, then every 12 h).
Chat Slack Events API
Events for messages, reactions, channel changes, app mentions. Uses a URL verification challenge during registration. Requires responding within 3 seconds (defer heavy processing to a background queue). Retries up to 3 times with exponential backoff; apps with >95% failure rate may be temporarily disabled.
Design Idempotency
Webhook deliveries use at-least-once semantics — the same event may be delivered multiple times (retries after timeouts, network errors). Your receiver must be idempotent. Use the delivery ID or event ID to deduplicate.
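A minimal dedup sketch follows; the in-memory Map is a stand-in (assumption) for Redis or a database table with a TTL in production:

```javascript
// Idempotent webhook handling: deduplicate on the provider's delivery ID
// before doing any side effects, so a retried delivery is a no-op.
const seen = new Map(); // deliveryId -> first-seen timestamp

function handleDelivery(deliveryId, process) {
  if (seen.has(deliveryId)) {
    return 'duplicate'; // acknowledge the retry, but skip reprocessing
  }
  seen.set(deliveryId, Date.now());
  process();
  return 'processed';
}

let charges = 0;
handleDelivery('72d3162e-cc78', () => charges++); // first delivery
handleDelivery('72d3162e-cc78', () => charges++); // provider retry, same ID
console.log(charges); // → 1
```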
Retry policies
Webhook providers retry failed deliveries with different strategies:
- GitHub — does not auto-retry failed deliveries. Failed events can be manually redelivered via the UI or API within 3 days. You can build automated redelivery with a GitHub Actions workflow or script polling the REST API.
- Stripe — retries over up to 3 days with exponential backoff (immediately, 5 min, 30 min, 2 h, 5 h, 10 h, then every 12 h). Sandbox mode retries only 3 times over a few hours.
- Slack — retries up to 3 times with exponential backoff. If the Delayed Events feature is enabled, Slack follows the initial retries with hourly retries for up to 24 hours. Apps with >95% failure rate over 60 minutes may be temporarily disabled.
- General pattern — respond with 200 OK quickly, then process asynchronously. If you return a 5xx or time out, expect retries.
Return 200 OK as fast as possible. Process the webhook payload asynchronously (queue it for background processing). If your endpoint takes longer than the provider's timeout (GitHub: 10 seconds, Slack: 3 seconds, Stripe: varies), the delivery will be marked as failed — leading to retries (where supported) and potential duplicate processing.
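The acknowledge-fast pattern can be sketched with an in-memory queue (an assumption standing in for a real broker like RabbitMQ or Redis):

```javascript
// "Acknowledge fast, process later": the handler only enqueues and returns;
// a worker drains the queue off the request path.
const queue = [];

function webhookHandler(payload) {
  queue.push(payload);    // O(1): no slow work on the request path
  return { status: 200 }; // respond well under the provider's timeout
}

function drainQueue(processFn) {
  const results = [];
  while (queue.length > 0) {
    results.push(processFn(queue.shift())); // heavy work happens here
  }
  return results;
}

webhookHandler({ event: 'push', id: 1 });
webhookHandler({ event: 'push', id: 2 });
console.log(drainQueue(p => p.id)); // → [ 1, 2 ]
```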
Lightweight webhook server on Linux
The webhook tool is a tiny Go binary that listens for HTTP requests and runs shell commands. Perfect for auto-deploy workflows: a commit is pushed to GitHub → the webhook fires → your server runs git pull and restarts the service.
# Install webhook (Debian/Ubuntu)
sudo apt install webhook
# Or download the binary directly
wget https://github.com/adnanh/webhook/releases/latest/download/webhook-linux-amd64.tar.gz
tar xzf webhook-linux-amd64.tar.gz
sudo mv webhook /usr/local/bin/
Define hooks in a JSON config file:
[
{
"id": "deploy",
"execute-command": "/opt/deploy.sh",
"command-working-directory": "/home/deploy/myapp",
"pass-arguments-to-command": [
{ "source": "payload", "name": "head_commit.message" }
],
"trigger-rule": {
"and": [
{
"match": {
"type": "payload-hmac-sha256",
"secret": "your-webhook-secret",
"parameter": {
"source": "header",
"name": "X-Hub-Signature-256"
}
}
},
{
"match": {
"type": "value",
"value": "refs/heads/main",
"parameter": {
"source": "payload",
"name": "ref"
}
}
}
]
}
}
]
The deploy script it runs:
#!/bin/bash
# /opt/deploy.sh
set -euo pipefail
cd /home/deploy/myapp
git fetch origin
git reset --hard origin/main
# restart service, rebuild, etc.
systemctl restart myapp
echo "Deployed at $(date)"
# Start the webhook listener
webhook -hooks /etc/webhook/hooks.json -port 9000 -verbose
# Your endpoint is now:
# http://your-server:9000/hooks/deploy
# Run as a systemd service for production
# [Unit]
# Description=Webhook listener
# After=network.target
#
# [Service]
# ExecStart=/usr/local/bin/webhook -hooks /etc/webhook/hooks.json -port 9000
# Restart=always
# User=deploy
#
# [Install]
# WantedBy=multi-user.target
In GitHub, go to Settings → Webhooks → Add webhook, set the Payload URL to https://your-server.com/hooks/deploy, content type to application/json, add your secret, and select Just the push event.
Exposing webhooks behind NAT with Cloudflare Zero Trust
If your server is behind NAT (home lab, internal network) and has no public IP, you can use Cloudflare Tunnel to expose just the webhook endpoint publicly — no port forwarding, no firewall changes.
# Install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -o cloudflared.deb
sudo dpkg -i cloudflared.deb
# Authenticate with Cloudflare
cloudflared tunnel login
# Create a tunnel
cloudflared tunnel create my-webhook
# Configure the tunnel to proxy to your local webhook listener
cat > ~/.cloudflared/config.yml <<EOF
tunnel: my-webhook
credentials-file: /home/deploy/.cloudflared/<tunnel-id>.json
ingress:
- hostname: hooks.yourdomain.com
service: http://localhost:9000
- service: http_status:404
EOF
# Route DNS
cloudflared tunnel route dns my-webhook hooks.yourdomain.com
# Run the tunnel (or install as a systemd service)
cloudflared tunnel run my-webhook
# Install as a service for production:
# sudo cloudflared service install
The tunnel creates an outbound-only connection from your server to Cloudflare's edge. No inbound ports need to be open. Cloudflare proxies incoming HTTPS requests through the tunnel to your local webhook listener. You get free TLS, DDoS protection, and access policies — you can even restrict the tunnel to only accept requests from GitHub's webhook IP ranges using Cloudflare Access policies.
Webhook Security
Webhook endpoints are public HTTP URLs — anyone who knows the URL can send requests to it. Without verification, an attacker could forge webhook payloads and trick your system into taking unauthorized actions. HMAC signature verification is the primary defense.
HMAC-SHA256 signature verification
The webhook provider signs each payload with a shared secret using HMAC-SHA256. Your server recomputes the signature and compares it to the one in the header. If they match, the payload is authentic.
# Python: Verify a GitHub webhook signature
import hmac
import hashlib
def verify_github_signature(payload_body, signature_header, secret):
    """Verify that the payload was sent from GitHub."""
    if not signature_header:
        return False
    # GitHub sends the header as: sha256=<hex digest>
    expected_signature = 'sha256=' + hmac.new(
        secret.encode('utf-8'),
        payload_body,
        hashlib.sha256
    ).hexdigest()
    # Timing-safe comparison to prevent timing attacks
    return hmac.compare_digest(expected_signature, signature_header)
# In your webhook handler:
# payload = request.get_data()
# signature = request.headers.get('X-Hub-Signature-256')
# if not verify_github_signature(payload, signature, WEBHOOK_SECRET):
# abort(403)
#!/bin/bash
# Bash: Verify a webhook HMAC-SHA256 signature
WEBHOOK_SECRET="your-webhook-secret"
PAYLOAD='{"action":"opened","number":42}'
EXPECTED_SIG="sha256=abc123..."
# Compute the HMAC signature
COMPUTED_SIG="sha256=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" | awk '{print $2}')"
if [ "$COMPUTED_SIG" = "$EXPECTED_SIG" ]; then
echo "Signature valid"
else
echo "Signature INVALID - rejecting payload"
exit 1
fi
Replay attack prevention
Even with valid signatures, an attacker could capture a legitimate webhook delivery and replay it later. Prevent this by validating the timestamp included in the webhook headers.
# Stripe-style timestamp validation
import time
def verify_timestamp(timestamp_header, tolerance_seconds=300):
    """Reject payloads older than 5 minutes."""
    try:
        timestamp = int(timestamp_header)
    except (ValueError, TypeError):
        return False
    current_time = int(time.time())
    return abs(current_time - timestamp) < tolerance_seconds
Additional security measures
Network IP Allowlisting
Some providers publish their webhook source IP ranges (e.g., GitHub publishes theirs at /meta). Restrict your webhook endpoint to only accept traffic from these IPs. Use this as defense in depth, not as a replacement for HMAC verification.
Transport TLS Requirement
Always use https:// for your webhook endpoint. Without TLS, payloads (including signatures and secrets) are transmitted in plaintext. Most providers refuse to deliver to HTTP endpoints in production.
Rotation Secret Rotation
Rotate your webhook signing secrets periodically. During rotation, temporarily accept signatures from both the old and new secret. Update the secret in the provider's configuration, then remove the old secret from your verification logic.
Comparison Timing-Safe Comparison
Always use hmac.compare_digest() (Python), crypto.timingSafeEqual() (Node.js), or equivalent. Standard string comparison (==) leaks information about the expected signature through timing differences, enabling byte-by-byte brute-force attacks.
Server-Sent Events
Server-Sent Events (SSE) provide a one-way, server-to-client streaming channel over standard HTTP. The client opens a persistent HTTP connection, and the server sends events as they occur. SSE is simpler than WebSockets, works natively with HTTP infrastructure, and includes built-in reconnection via the EventSource API.
How SSE works
The server responds with Content-Type: text/event-stream and sends events as plain text, each separated by two newlines. The connection stays open indefinitely.
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
event: message
id: 1
data: {"user": "alice", "text": "Hello!"}
event: message
id: 2
data: {"user": "bob", "text": "Hi there!"}
event: notification
id: 3
data: {"type": "user_joined", "user": "charlie"}
: this is a comment (heartbeat to keep connection alive)
Server implementation (Node.js)
const http = require('http');
http.createServer((req, res) => {
if (req.url === '/events') {
res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Access-Control-Allow-Origin': '*',
});
let id = 0;
// Send an event every 2 seconds
const interval = setInterval(() => {
id++;
res.write(`event: update\n`);
res.write(`id: ${id}\n`);
res.write(`data: ${JSON.stringify({ time: new Date().toISOString(), count: id })}\n\n`);
}, 2000);
// Send heartbeat comment every 15 seconds
const heartbeat = setInterval(() => {
res.write(': heartbeat\n\n');
}, 15000);
req.on('close', () => {
clearInterval(interval);
clearInterval(heartbeat);
});
} else {
res.writeHead(404);
res.end();
}
}).listen(8080);
Client: EventSource API
// Browser-native EventSource API
const source = new EventSource('/events');
// Default "message" event
source.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Message:', data);
};
// Named events
source.addEventListener('notification', (event) => {
const data = JSON.parse(event.data);
console.log('Notification:', data);
});
// Connection lifecycle
source.onopen = () => console.log('SSE connection opened');
source.onerror = (event) => {
if (source.readyState === EventSource.CONNECTING) {
console.log('Reconnecting...');
} else {
console.error('SSE error');
}
};
// Close the connection when done
// source.close();
Auto-reconnection & Last-Event-ID
The EventSource API automatically reconnects when the connection drops. On reconnection, the browser sends a Last-Event-ID header with the ID of the last received event, allowing the server to resume from where it left off. The server can control the reconnection delay with a retry: field.
retry: 5000
event: update
id: 42
data: {"status": "processing"}
event: update
id: 43
data: {"status": "complete"}
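On the server side, resuming from Last-Event-ID can be sketched as replaying from a buffer of recent events. The buffer contents and the "update" event name here are illustrative assumptions, not part of the EventSource spec:

```javascript
// Replay any buffered events newer than the ID the reconnecting client
// reports in its Last-Event-ID header.
const buffer = [
  { id: 41, data: '{"status":"queued"}' },
  { id: 42, data: '{"status":"processing"}' },
  { id: 43, data: '{"status":"complete"}' },
];

function eventsSince(lastEventId) {
  const since = parseInt(lastEventId, 10) || 0; // header absent on first connect
  return buffer
    .filter(e => e.id > since)
    .map(e => `event: update\nid: ${e.id}\ndata: ${e.data}\n\n`)
    .join('');
}

// A client reconnecting with Last-Event-ID: 42 only gets event 43 replayed
console.log(eventsSince('42'));
```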
Use SSE instead of WebSockets when you only need server-to-client communication. SSE works through HTTP proxies and CDNs without special configuration, supports automatic reconnection out of the box, and consumes far fewer server resources than WebSocket connections. It's ideal for live feeds, notification streams, and progress indicators.
Long Polling
Long polling is a technique where the client sends an HTTP request, and the server holds the request open until new data is available or a timeout occurs. Once the server responds, the client immediately sends a new request. This creates a near-real-time push effect using only standard HTTP — no special protocols or APIs required.
How long polling works
Server implementation
const express = require('express');
const app = express();
let events = [];
let waitingClients = [];
app.get('/updates', (req, res) => {
const since = parseInt(req.query.since) || 0;
// Check if there are already new events
const newEvents = events.filter(e => e.id > since);
if (newEvents.length > 0) {
return res.json({ events: newEvents });
}
// No new events: hold the connection
const client = { res, since, timeout: null };
waitingClients.push(client);
// Timeout after 30 seconds: respond with an empty event list
client.timeout = setTimeout(() => {
waitingClients = waitingClients.filter(c => c !== client);
res.json({ events: [] });
}, 30000);
// Clean up if the client disconnects
req.on('close', () => {
clearTimeout(client.timeout);
waitingClients = waitingClients.filter(c => c !== client);
});
});
// When new data arrives, notify all waiting clients
function publishEvent(event) {
event.id = events.length + 1;
events.push(event);
waitingClients.forEach(client => {
clearTimeout(client.timeout); // prevent a duplicate response when the timeout fires
const newEvents = events.filter(e => e.id > client.since);
client.res.json({ events: newEvents });
});
waitingClients = [];
}
app.listen(8080);
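The matching client is a loop that immediately re-polls after each response. A sketch, assuming a /updates endpoint shaped like the server above; pollOnce is factored out for clarity and fetchFn stands in for the global fetch:

```javascript
// One poll cycle: ask for events newer than the last seen id, then advance
// the cursor past the newest event received.
async function pollOnce(fetchFn, since) {
  const res = await fetchFn(`/updates?since=${since}`);
  const { events } = await res.json();
  const next = events.length > 0 ? events[events.length - 1].id : since;
  return { events, next };
}

async function pollLoop(fetchFn, onEvents) {
  let since = 0;
  for (;;) {
    // The server holds this request until data arrives or its timeout fires
    const { events, next } = await pollOnce(fetchFn, since);
    if (events.length > 0) onEvents(events);
    since = next; // then immediately issue the next request
  }
}
```

In a real client you would also back off briefly on network errors before re-polling, as in the WebSocket reconnection example earlier.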
Timeout handling
The server should return an empty response after a timeout period (typically 20-30 seconds). This prevents proxy servers and load balancers from terminating the connection, and keeps NAT mappings alive. The client then immediately reconnects.
When long polling is still useful
- Broad compatibility — works everywhere HTTP works, including behind restrictive corporate proxies that block WebSocket upgrades
- Simple infrastructure — no special server support, works with any HTTP framework and standard load balancers
- The Comet pattern — long polling is part of the broader "Comet" family of techniques for server push over HTTP, which also includes streaming and hidden iframe approaches
- Fallback — many real-time libraries (Socket.IO, SignalR) use long polling as a fallback when WebSocket connections fail
Long polling has higher latency and more HTTP overhead than WebSockets or SSE. Each "cycle" requires a full HTTP request/response. With many clients, the server holds many open connections waiting for data. For new projects with modern clients, prefer WebSockets (bidirectional) or SSE (server-to-client) over long polling.
Comparison
Choosing the right real-time pattern depends on your requirements: direction of communication, infrastructure constraints, complexity budget, and scale.
Protocol comparison
| Feature | WebSockets | SSE | Long Polling | Short Polling |
|---|---|---|---|---|
| Direction | Bidirectional | Server → Client | Server → Client | Server → Client |
| Protocol | WS (over TCP) | HTTP | HTTP | HTTP |
| Connection | Persistent | Persistent | Semi-persistent | New per request |
| Browser support | All modern | All modern (not supported in legacy IE/old Edge <79) | Universal | Universal |
| Complexity | High | Low | Medium | Low |
| Reconnection | Manual | Automatic (built-in) | Manual | N/A |
| Proxy support | Problematic (needs L4) | Good (standard HTTP) | Good | Excellent |
| Overhead | Low (frames) | Low (text stream) | Medium (HTTP each cycle) | High (HTTP each interval) |
When to use what
WebSockets Use when
- You need bidirectional communication
- Low latency is critical (gaming, trading)
- High message frequency from both sides
- You control the infrastructure (can configure L4 LB)
SSE Use when
- You only need server-to-client streaming
- You want automatic reconnection
- You need to work through HTTP proxies/CDNs
- Use cases: feeds, notifications, progress updates
Long Polling Use when
- WebSockets and SSE are blocked (corporate proxies)
- You need maximum compatibility
- You're building a fallback transport
- Low-frequency updates are acceptable
Short Polling Use when
- Updates are infrequent (every 30s+)
- Simplicity is more important than latency
- Stateless infrastructure is required
- Example: checking build status every minute
Start with the simplest pattern that meets your requirements. If you only need server-to-client updates, use SSE. If you need bidirectional communication, use WebSockets. If you're receiving events from third parties, use webhooks. Only fall back to long polling or short polling when infrastructure constraints prevent the better options.
Scaling Patterns
Real-time connections are stateful — each WebSocket or SSE connection is pinned to a specific server process. This fundamentally changes how you scale compared to stateless HTTP APIs. Understanding backpressure and connection management is essential.
Sticky sessions
Since WebSocket connections are persistent, the load balancer must route all traffic for a given connection to the same backend server. This is called sticky sessions (or session affinity).
# Nginx: WebSocket proxy with sticky sessions
upstream websocket_backend {
ip_hash; # Sticky sessions based on client IP
server ws-server-1:8080;
server ws-server-2:8080;
server ws-server-3:8080;
}
server {
listen 443 ssl;
server_name ws.example.com;
location /ws {
proxy_pass http://websocket_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_read_timeout 86400; # 24h for persistent connections
}
}
Redis pub/sub for multi-instance WebSocket
When you have multiple WebSocket servers, a message sent to one server needs to be broadcast to clients connected to all servers. Redis pub/sub acts as a message bus between instances.
const Redis = require('ioredis');
const WebSocket = require('ws');
const pub = new Redis();
const sub = new Redis();
const wss = new WebSocket.Server({ port: 8080 });
// Subscribe to the broadcast channel
sub.subscribe('ws:broadcast');
sub.on('message', (channel, message) => {
// Forward to all local WebSocket clients
wss.clients.forEach(client => {
if (client.readyState === WebSocket.OPEN) {
client.send(message);
}
});
});
// When a local client sends a message, publish to Redis
wss.on('connection', (ws) => {
ws.on('message', (data) => {
// Publish to all instances via Redis
pub.publish('ws:broadcast', data.toString());
});
});
Connection limits and resource management
Limits Per-server connections
Each WebSocket connection consumes a file descriptor and ~10-50 KB of memory. A single server can typically handle 10,000-100,000 concurrent connections depending on hardware and message rate. Monitor with ulimit -n and increase if needed.
LB Load balancer config
L4 (TCP) load balancers work best for WebSockets — they forward raw TCP without understanding HTTP. L7 (HTTP) load balancers must explicitly support the WebSocket upgrade. Cloud LBs: use NLB (AWS), TCP LB (GCP), or configure ALB with WebSocket support.
Horizontal scaling strategies
- Shard by user/room/topic — route connections to specific servers based on a key (e.g., chat room ID). Reduces cross-server messaging.
- Redis/NATS pub/sub — use a message broker as a fan-out layer between server instances. Every instance subscribes to relevant channels.
- Dedicated WebSocket tier — separate your WebSocket servers from your HTTP API servers. Scale them independently based on connection count vs request rate.
- Connection draining — during deployments, stop accepting new connections on old instances and let existing connections drain gracefully before terminating.
For large-scale real-time systems, consider using a managed service (AWS API Gateway WebSockets, Ably, Pusher) or a dedicated real-time framework (Socket.IO with Redis adapter, Phoenix Channels). Building a production-grade WebSocket infrastructure from scratch is significantly more complex than building a REST API.
Message Queues
Message queues and event brokers are the backend counterpart to the client-facing web patterns. While WebSockets and SSE handle client communication, message queues handle inter-service communication in event-driven architectures. They complement webhooks and WebSockets by providing reliable, decoupled messaging between backend services.
Pub/Sub pattern
In publish/subscribe, producers send messages to a topic (not directly to consumers). Any number of consumers can subscribe to the topic and receive a copy of each message. This decouples producers from consumers completely.
# Architecture: Webhook + Message Queue + WebSocket
#
# 1. GitHub sends webhook to your API (webhook)
# 2. API validates and publishes to queue (message queue)
# 3. Worker consumes from queue (async processing)
# 4. Worker pushes update via WebSocket (real-time UI)
#
# Flow:
# GitHub --webhook--> API --> RabbitMQ --> Worker --> WebSocket --> Browser
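A toy in-process version shows the decoupling: the producer publishes to a topic without knowing its consumers, and every subscriber gets a copy. Real deployments use a broker (RabbitMQ, Kafka, NATS) for durability and cross-process delivery.

```javascript
// Minimal pub/sub bus: topics map to subscriber lists; publish fans out
// the message to every handler registered on that topic.
class PubSub {
  constructor() { this.topics = new Map(); }
  subscribe(topic, handler) {
    if (!this.topics.has(topic)) this.topics.set(topic, []);
    this.topics.get(topic).push(handler);
  }
  publish(topic, message) {
    for (const handler of this.topics.get(topic) ?? []) handler(message);
  }
}

const bus = new PubSub();
const received = [];
bus.subscribe('orders', m => received.push(`billing:${m}`));
bus.subscribe('orders', m => received.push(`shipping:${m}`));
bus.publish('orders', 'order-1'); // both subscribers receive a copy
console.log(received); // → [ 'billing:order-1', 'shipping:order-1' ]
```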
Common message brokers
AMQP RabbitMQ
Feature-rich message broker supporting multiple patterns: queues, exchanges, routing keys, dead letter queues. Best for task distribution and complex routing. Supports message acknowledgment, persistence, and priority queues.
Streaming Apache Kafka
Distributed event streaming platform. Messages are persisted to a commit log and retained for a configurable duration. Best for event sourcing, log aggregation, and high-throughput streaming. Consumers can replay events from any point.
Lightweight NATS
High-performance, lightweight messaging system. Supports pub/sub, request/reply, and queue groups. NATS JetStream adds persistence and exactly-once publishing via message deduplication (using Nats-Msg-Id headers). Best for cloud-native microservices needing low-latency messaging.
Patterns Event Sourcing & CQRS
Event sourcing stores state changes as an immutable sequence of events (instead of current state). CQRS (Command Query Responsibility Segregation) separates read and write models. Both patterns pair naturally with message queues for propagating state changes across services.
Connecting the patterns
In a production architecture, these patterns work together:
- Webhooks receive events from external systems (GitHub, Stripe, Slack)
- Message queues decouple processing and provide reliability (retry, dead letter queues)
- WebSockets/SSE push processed results to connected clients in real-time
- Event sourcing provides an audit trail and enables rebuilding state from events
Observability
Real-time systems need specialized monitoring. Traditional HTTP request metrics (latency, error rate, throughput) don't capture the full picture when you have persistent connections, async message delivery, and event-driven flows.
WebSocket metrics
| Metric | What it tells you | Alert threshold |
|---|---|---|
| ws_connections_active | Current open WebSocket connections | Approaching server limit (e.g., > 80% of max FDs) |
| ws_connections_total | Total connections opened (counter) | Sudden spikes (reconnection storm) |
| ws_messages_sent_total | Messages sent to clients per second | Drops to zero (server issue) or spikes (flood) |
| ws_messages_received_total | Messages received from clients per second | Unexpected spikes (abuse or bug) |
| ws_message_latency_ms | Time from message publish to client delivery | p99 > 500ms |
| ws_errors_total | Connection errors, failed sends | > 1% of active connections |
Prometheus metrics for WebSocket connections
```javascript
const client = require('prom-client');

// Gauge: current active connections
const wsConnections = new client.Gauge({
  name: 'ws_connections_active',
  help: 'Number of active WebSocket connections',
  labelNames: ['server_id'],
});

// Counter: total messages
const wsMessagesSent = new client.Counter({
  name: 'ws_messages_sent_total',
  help: 'Total WebSocket messages sent',
  labelNames: ['event_type'],
});

// Histogram: message latency
const wsLatency = new client.Histogram({
  name: 'ws_message_latency_ms',
  help: 'WebSocket message delivery latency in ms',
  buckets: [5, 10, 25, 50, 100, 250, 500, 1000],
});

// In your WebSocket handler (wss is a WebSocketServer from the ws package):
wss.on('connection', (ws) => {
  wsConnections.inc({ server_id: process.env.SERVER_ID });
  ws.on('close', () => {
    wsConnections.dec({ server_id: process.env.SERVER_ID });
  });
});
```
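Feeding the latency histogram requires timestamping each message at publish time and observing the delta at delivery. A stdlib-only sketch of that timing logic (the envelope format is an assumption; the computed value is what you would pass to the histogram's observe method):

```javascript
// Wrap outgoing messages in an envelope that carries the publish timestamp.
function envelope(eventType, payload) {
  return { eventType, payload, publishedAt: Date.now() };
}

// At send time, compute the publish-to-delivery latency in milliseconds.
function latencyMs(message, now = Date.now()) {
  return now - message.publishedAt;
}

const msg = envelope('price_update', { symbol: 'ACME', price: 42 });
// Simulate 25ms of queueing/transit before delivery:
console.log(latencyMs(msg, msg.publishedAt + 25)); // 25
```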
Webhook delivery monitoring
DLQ Dead letter queues
When webhook deliveries fail after all retries, send them to a dead letter queue. Monitor DLQ depth as a critical alert. DLQ messages need manual investigation — they represent events your system failed to process.
Dashboard Retry dashboards
Track webhook delivery attempts: first-attempt success rate, retry count distribution, time to successful delivery, and failure reasons (timeout, 5xx, connection refused). High retry rates indicate your webhook processor is too slow or unstable.
Key webhook metrics
```
# Prometheus-style metrics for webhook processing
webhook_deliveries_received_total{source="github",status="200"}
webhook_deliveries_received_total{source="stripe",status="400"}
webhook_processing_duration_seconds{source="github",quantile="0.99"}
webhook_signature_failures_total{source="github"}
webhook_dlq_depth{source="stripe"}
webhook_retry_count{source="github",attempt="1"}
webhook_retry_count{source="github",attempt="2"}
webhook_retry_count{source="github",attempt="3"}
```
Log every webhook delivery with: source, event type, delivery ID, processing time, and outcome. This creates an audit trail for debugging failed deliveries and understanding event flow through your system. Retain logs for at least 30 days.
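A structured log entry carrying those fields might look like this (the field names and outcome values are illustrative, not a standard schema):

```javascript
// Emit one structured JSON log line per webhook delivery.
function logWebhookDelivery({ source, eventType, deliveryId, durationMs, outcome }) {
  const entry = {
    ts: new Date().toISOString(),
    msg: 'webhook_delivery',
    source,
    event_type: eventType,
    delivery_id: deliveryId,
    processing_ms: durationMs,
    outcome, // e.g. 'processed' | 'queued' | 'failed' | 'signature_invalid'
  };
  console.log(JSON.stringify(entry));
  return entry;
}

logWebhookDelivery({
  source: 'github',
  eventType: 'push',
  deliveryId: '72d3162e-cc78-11e3-81ab-4c9367dc0958',
  durationMs: 12,
  outcome: 'processed',
});
```

One line of JSON per delivery keeps the audit trail queryable by source, event type, or delivery ID in whatever log aggregator you already run.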
Production Checklist
WebSocket checklist
- Use `wss://` (TLS) in production — never `ws://`. Unencrypted WebSockets are blocked by many proxies and expose data to interception.
- Implement ping/pong heartbeats — detect dead connections on both server and client. Set a 30-second interval and terminate connections that miss two consecutive pongs.
- Implement reconnection with exponential backoff — start at 1s, double each attempt, cap at 30s, add random jitter. Prevents thundering herd on server restarts.
- Validate origin header — check the `Origin` header during the upgrade handshake to prevent cross-site WebSocket hijacking (CSWSH).
- Set connection limits per user/IP — prevent resource exhaustion from a single client opening thousands of connections.
- Configure load balancer for WebSocket — use L4 (TCP) load balancing or ensure your L7 LB supports WebSocket upgrade. Set idle timeout to match your heartbeat interval.
- Use Redis pub/sub for multi-instance broadcast — if you run more than one WebSocket server, you need a message bus between them.
- Monitor active connection count — alert before hitting file descriptor limits. Track connection churn rate for anomaly detection.
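The backoff schedule from the checklist (start at 1s, double each attempt, cap at 30s, add jitter) can be sketched as a pure function, independent of any WebSocket library. The jitter amount here (up to 1s) is an assumption; tune it to your fleet size:

```javascript
// Delay before reconnection attempt n (0-based): 1s, 2s, 4s, ... capped at
// 30s, plus up to 1s of random jitter so restarted clients don't reconnect
// in lockstep (the thundering herd the checklist warns about).
function reconnectDelayMs(attempt, { baseMs = 1000, capMs = 30000 } = {}) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp + Math.random() * 1000;
}

// Attempts 0..5 give base delays of 1s, 2s, 4s, 8s, 16s, 30s (plus jitter).
for (let n = 0; n < 6; n++) {
  console.log(n, Math.round(reconnectDelayMs(n)));
}
```

A client would call this after each `close` event, reset the attempt counter on a successful `open`, and schedule the next connection attempt with `setTimeout`.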
Webhook checklist
- Verify HMAC signatures on every delivery — never process unverified payloads. Use timing-safe comparison.
- Validate timestamps for replay prevention — reject payloads older than 5 minutes to prevent replay attacks.
- Return 200 OK immediately, process async — queue the payload for background processing. Providers will retry on slow responses.
- Implement idempotent handlers — deduplicate by delivery ID or event ID. The same webhook may be delivered multiple times.
- Use HTTPS endpoints only — never expose webhook endpoints over plain HTTP.
- Set up dead letter queues — capture failed webhook processing for manual review and replay.
- Rotate webhook secrets periodically — support dual-secret verification during rotation windows.
- IP allowlist as defense in depth — restrict webhook endpoints to known provider IP ranges where available.
SSE checklist
- Set `Cache-Control: no-cache` — prevent proxies from caching the event stream.
- Include event IDs for resumption — the EventSource API sends `Last-Event-ID` on reconnection. Your server must handle this header.
- Send periodic heartbeat comments — a `: heartbeat\n\n` every 15 seconds keeps the connection alive through proxies.
- Set appropriate `retry:` values — control client reconnection delay based on your server's capacity.
- Handle connection cleanup on server — detect client disconnects and clean up resources (timers, subscriptions).
General checklist
- Monitor all real-time connections — active connections, message rates, error rates, and latency percentiles for every pattern.
- Implement backpressure handling — decide what to do when consumers can't keep up: buffer, drop, or rate-limit.
- Plan for graceful degradation — what happens when WebSocket servers are down? Fall back to polling. What happens when webhook processing is slow? Queue and retry.
- Load test with realistic connection counts — test with thousands of concurrent WebSocket connections, not just HTTP request throughput.
- Document your webhook retry behavior — if you're a webhook provider, clearly document retry policies, timeout values, and expected response codes.