Web Patterns
Real-time communication, async messaging, and event-driven integration patterns
Overview
Web patterns are communication strategies that go beyond the traditional HTTP request/response cycle. While REST APIs and standard HTTP serve most web interactions well, modern architectures demand more: real-time UIs that update instantly, event-driven microservices that react to state changes, and third-party integrations that push data to your systems as events occur.
These patterns solve fundamentally different problems. WebSockets enable full-duplex, persistent connections for real-time bidirectional communication. Webhooks let external systems notify you when something happens. Server-Sent Events stream one-way data from server to client over plain HTTP. Long polling simulates push over standard HTTP for maximum compatibility.
Pattern WebSockets
Full-duplex persistent TCP connections. Both client and server can send messages at any time. Used for chat, live dashboards, multiplayer games, and collaborative editing.
Pattern Webhooks
HTTP callbacks — a server POSTs to your endpoint when an event occurs. Used for third-party integrations (GitHub, Stripe, Slack), CI/CD pipelines, and event-driven workflows.
Pattern Server-Sent Events
One-way server-to-client streaming over HTTP. Built on the EventSource API. Used for live feeds, notifications, progress bars, and stock tickers.
Pattern Long Polling
Client sends a request, server holds it until data is available. Simulates push over standard HTTP. Used when WebSocket infrastructure isn't available or broad compatibility is needed.
Why Real-time UIs
Users expect live updates — typing indicators, real-time collaboration, live dashboards, instant notifications. Traditional polling wastes bandwidth and introduces latency.
Why Event-Driven Architecture
Microservices need to react to events without tight coupling. Webhooks and message queues enable loose coupling between services, reducing dependencies and improving resilience.
WebSockets
WebSockets provide full-duplex, persistent connections over a single TCP socket. Unlike HTTP where the client must initiate every exchange, a WebSocket connection allows both client and server to send messages independently at any time. The connection starts as an HTTP request and upgrades to the WebSocket protocol (RFC 6455).
The upgrade handshake
A WebSocket connection begins with an HTTP/1.1 upgrade request. The client sends a standard HTTP request with special headers, and the server responds with 101 Switching Protocols to confirm the upgrade.
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Origin: http://example.com
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
After the handshake, the connection switches from HTTP to the WebSocket frame-based protocol. The TCP connection stays open and both sides communicate using lightweight frames.
Protocol details
URL ws:// vs wss://
ws:// is unencrypted WebSocket (port 80). wss:// is WebSocket over TLS (port 443). Always use wss:// in production — it prevents man-in-the-middle attacks and works through most proxies and firewalls that block non-TLS WebSocket traffic.
Frames Frame format
WebSocket messages are transmitted as frames. Each frame has an opcode: 0x1 (text), 0x2 (binary), 0x8 (close), 0x9 (ping), 0xA (pong). Client-to-server frames are masked with a 32-bit key. Messages can be split across multiple frames for large payloads.
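The client-side masking can be illustrated with a short sketch; the mask key and payload below are the example values from RFC 6455:

```javascript
// Client-to-server payloads are XOR-masked with the frame's 4-byte mask key
// (RFC 6455 §5.3): byte i of the payload is XORed with maskKey[i % 4].
// Masking and unmasking are the same operation, since XOR is its own inverse.
function unmask(payload, maskKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskKey[i % 4];
  }
  return out;
}

// RFC 6455 §5.7 example: "Hello" masked with key 37 fa 21 3d
const mask = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = unmask(Buffer.from('Hello'), mask);
console.log(masked); // → <Buffer 7f 9f 4d 51 58>
console.log(unmask(masked, mask).toString()); // → Hello
```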
Health Ping/Pong heartbeats
The protocol includes built-in ping/pong frames for connection health monitoring. Either side can send a ping; the other must respond with a pong. Use this to detect dead connections — idle TCP connections can silently break through NATs and load balancers.
Use cases When to use WebSockets
Chat applications, live dashboards, multiplayer games, collaborative editing (Google Docs-style), live sports scores, financial trading terminals, IoT device communication, real-time analytics displays.
WebSockets trade statelessness for performance. Unlike HTTP, the server must maintain state for each connection. This has significant implications for scaling (connections are pinned to specific servers), load balancing (need sticky sessions or L4 balancing), and resource management (each connection consumes a file descriptor and memory).
WebSocket Implementation
Implementing WebSockets requires both server-side and client-side code. The server manages connections and broadcasts messages, while the client establishes the connection and handles events.
Server-side: Node.js (ws library)
const WebSocket = require('ws');
const server = new WebSocket.Server({ port: 8080 });
const clients = new Set();
server.on('connection', (ws, req) => {
const clientIP = req.socket.remoteAddress;
console.log(`New connection from ${clientIP}`);
clients.add(ws);
// Send a welcome message
ws.send(JSON.stringify({ type: 'welcome', message: 'Connected!' }));
ws.on('message', (data) => {
let message;
try {
message = JSON.parse(data);
} catch {
return; // ignore malformed JSON instead of crashing on an uncaught throw
}
// Broadcast to all other connected clients
for (const client of clients) {
if (client !== ws && client.readyState === WebSocket.OPEN) {
client.send(JSON.stringify(message));
}
}
});
ws.on('close', (code, reason) => {
console.log(`Connection closed: ${code} ${reason}`);
clients.delete(ws);
});
ws.on('error', (err) => {
console.error('WebSocket error:', err.message);
clients.delete(ws);
});
// Ping/pong heartbeat
ws.isAlive = true;
ws.on('pong', () => { ws.isAlive = true; });
});
// Heartbeat interval: detect broken connections
const heartbeat = setInterval(() => {
server.clients.forEach((ws) => {
if (!ws.isAlive) return ws.terminate();
ws.isAlive = false;
ws.ping();
});
}, 30000);
// Stop the heartbeat timer when the server shuts down
server.on('close', () => clearInterval(heartbeat));
Server-side: Go (gorilla/websocket)
The gorilla/websocket repository was archived in December 2022; it has since been revived under new maintainers, but development remains conservative. For new Go projects, consider github.com/coder/websocket (formerly nhooyr.io/websocket), which is actively maintained, supports context.Context for cancellation/timeouts, and handles concurrent writes safely.
package main
import (
"log"
"net/http"
"github.com/gorilla/websocket"
)
var upgrader = websocket.Upgrader{
ReadBufferSize: 1024,
WriteBufferSize: 1024,
CheckOrigin: func(r *http.Request) bool {
// Validate origin in production!
return true
},
}
func handleWS(w http.ResponseWriter, r *http.Request) {
conn, err := upgrader.Upgrade(w, r, nil)
if err != nil {
log.Printf("Upgrade error: %v", err)
return
}
defer conn.Close()
for {
messageType, msg, err := conn.ReadMessage()
if err != nil {
log.Printf("Read error: %v", err)
break
}
// Echo the message back
if err := conn.WriteMessage(messageType, msg); err != nil {
log.Printf("Write error: %v", err)
break
}
}
}
func main() {
http.HandleFunc("/ws", handleWS)
log.Fatal(http.ListenAndServe(":8080", nil))
}
Client-side: JavaScript
class WebSocketClient {
constructor(url) {
this.url = url;
this.reconnectDelay = 1000;
this.maxReconnectDelay = 30000;
this.connect();
}
connect() {
this.ws = new WebSocket(this.url);
this.ws.onopen = () => {
console.log('Connected');
this.reconnectDelay = 1000; // Reset on successful connection
};
this.ws.onmessage = (event) => {
const data = JSON.parse(event.data);
this.handleMessage(data);
};
this.ws.onclose = (event) => {
console.log(`Disconnected: ${event.code} ${event.reason}`);
this.reconnect();
};
this.ws.onerror = (error) => {
console.error('WebSocket error:', error);
};
}
reconnect() {
// Exponential backoff with jitter
const jitter = Math.random() * 1000;
const delay = Math.min(this.reconnectDelay + jitter, this.maxReconnectDelay);
console.log(`Reconnecting in ${Math.round(delay)}ms...`);
setTimeout(() => {
this.reconnectDelay *= 2; // Exponential backoff
this.connect();
}, delay);
}
send(data) {
if (this.ws.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify(data));
}
}
handleMessage(data) {
// Override this in your application
console.log('Received:', data);
}
}
// Usage
const client = new WebSocketClient('wss://api.example.com/ws');
Always implement reconnection with exponential backoff. WebSocket connections will drop — from network switches, load balancer timeouts, server restarts, or mobile network changes. Without backoff, thousands of clients reconnecting simultaneously will overwhelm your server (thundering herd).
Webhooks
Webhooks are HTTP callbacks — when an event occurs in a source system, it sends an HTTP POST request to a URL you've registered. Unlike polling (repeatedly asking "has anything changed?"), webhooks push data to you in near-real-time. They are the backbone of third-party integrations and event-driven architectures.
How webhooks work
Payload format
Webhook payloads are typically JSON. The source system includes metadata in HTTP headers to help you route and verify the request.
# Example: GitHub webhook delivery
# Headers:
# Content-Type: application/json
# X-GitHub-Event: pull_request
# X-GitHub-Delivery: 72d3162e-cc78-11e3-81ab-4c9367dc0958
# X-Hub-Signature-256: sha256=abc123...
#
# Body:
# {
# "action": "opened",
# "number": 42,
# "pull_request": {
# "title": "Add new feature",
# "user": { "login": "octocat" },
# "merged": false
# }
# }
Common webhook providers
VCS GitHub Webhooks
Events for pushes, PRs, issues, releases, deployments, and more. Verified with HMAC-SHA256 signatures. GitHub does not auto-retry failed deliveries; you must manually redeliver or automate redelivery via the REST API (within 3 days). Configurable per-repository or organization-wide.
Payments Stripe Events
Events for charges, subscriptions, invoices, disputes, and payouts. Signed with HMAC-SHA256. Includes a timestamp for replay attack prevention. Retries with exponential backoff for up to 3 days in live mode (immediately, 5 min, 30 min, 2 h, 5 h, 10 h, then every 12 h).
Chat Slack Events API
Events for messages, reactions, channel changes, app mentions. Uses a URL verification challenge during registration. Requires responding within 3 seconds (defer heavy processing to a background queue). Retries up to 3 times with exponential backoff; apps with >95% failure rate may be temporarily disabled.
Design Idempotency
Webhook deliveries use at-least-once semantics — the same event may be delivered multiple times (retries after timeouts, network errors). Your receiver must be idempotent. Use the delivery ID or event ID to deduplicate.
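A minimal dedup sketch follows; the in-memory Map is a stand-in (assumption) for Redis or a database table with a TTL in production:

```javascript
// Idempotent webhook handling: deduplicate on the provider's delivery ID
// before doing any side effects, so a retried delivery is a no-op.
const seen = new Map(); // deliveryId -> first-seen timestamp

function handleDelivery(deliveryId, process) {
  if (seen.has(deliveryId)) {
    return 'duplicate'; // acknowledge the retry, but skip reprocessing
  }
  seen.set(deliveryId, Date.now());
  process();
  return 'processed';
}

let charges = 0;
handleDelivery('72d3162e-cc78', () => charges++); // first delivery
handleDelivery('72d3162e-cc78', () => charges++); // provider retry, same ID
console.log(charges); // → 1
```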
Retry policies
Webhook providers retry failed deliveries with different strategies:
- GitHub — does not auto-retry failed deliveries. Failed events can be manually redelivered via the UI or API within 3 days. You can build automated redelivery with a GitHub Actions workflow or script polling the REST API.
- Stripe — retries over up to 3 days with exponential backoff (immediately, 5 min, 30 min, 2 h, 5 h, 10 h, then every 12 h). Sandbox mode retries only 3 times over a few hours.
- Slack — retries up to 3 times with exponential backoff. If the Delayed Events feature is enabled, Slack follows the initial retries with hourly retries for up to 24 hours. Apps with >95% failure rate over 60 minutes may be temporarily disabled.
- General pattern — respond with 200 OK quickly, then process asynchronously. If you return a 5xx or time out, expect retries.
Return 200 OK as fast as possible. Process the webhook payload asynchronously (queue it for background processing). If your endpoint takes longer than the provider's timeout (GitHub: 10 seconds, Slack: 3 seconds, Stripe: varies), the delivery will be marked as failed — leading to retries (where supported) and potential duplicate processing.
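The acknowledge-fast pattern can be sketched with an in-memory queue (an assumption standing in for a real broker like RabbitMQ or Redis):

```javascript
// "Acknowledge fast, process later": the handler only enqueues and returns;
// a worker drains the queue off the request path.
const queue = [];

function webhookHandler(payload) {
  queue.push(payload);    // O(1): no slow work on the request path
  return { status: 200 }; // respond well under the provider's timeout
}

function drainQueue(processFn) {
  const results = [];
  while (queue.length > 0) {
    results.push(processFn(queue.shift())); // heavy work happens here
  }
  return results;
}

webhookHandler({ event: 'push', id: 1 });
webhookHandler({ event: 'push', id: 2 });
console.log(drainQueue(p => p.id)); // → [ 1, 2 ]
```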
Lightweight webhook server on Linux
The webhook tool is a tiny Go binary that listens for HTTP requests and runs shell commands. Perfect for auto-deploy workflows: a commit is pushed to GitHub → the webhook fires → your server runs git pull and restarts the service.
# Install webhook (Debian/Ubuntu)
sudo apt install webhook
# Or download the binary directly
wget https://github.com/adnanh/webhook/releases/latest/download/webhook-linux-amd64.tar.gz
tar xzf webhook-linux-amd64.tar.gz
sudo mv webhook /usr/local/bin/
Define hooks in a JSON config file:
[
{
"id": "deploy",
"execute-command": "/opt/deploy.sh",
"command-working-directory": "/home/deploy/myapp",
"pass-arguments-to-command": [
{ "source": "payload", "name": "head_commit.message" }
],
"trigger-rule": {
"and": [
{
"match": {
"type": "payload-hmac-sha256",
"secret": "your-webhook-secret",
"parameter": {
"source": "header",
"name": "X-Hub-Signature-256"
}
}
},
{
"match": {
"type": "value",
"value": "refs/heads/main",
"parameter": {
"source": "payload",
"name": "ref"
}
}
}
]
}
}
]
The deploy script it runs:
#!/bin/bash
# /opt/deploy.sh
set -euo pipefail
cd /home/deploy/myapp
git fetch origin
git reset --hard origin/main
# restart service, rebuild, etc.
systemctl restart myapp
echo "Deployed at $(date)"
# Start the webhook listener
webhook -hooks /etc/webhook/hooks.json -port 9000 -verbose
# Your endpoint is now:
# http://your-server:9000/hooks/deploy
# Run as a systemd service for production
# [Unit]
# Description=Webhook listener
# After=network.target
#
# [Service]
# ExecStart=/usr/local/bin/webhook -hooks /etc/webhook/hooks.json -port 9000
# Restart=always
# User=deploy
#
# [Install]
# WantedBy=multi-user.target
In GitHub, go to Settings → Webhooks → Add webhook, set the Payload URL to https://your-server.com/hooks/deploy, content type to application/json, add your secret, and select Just the push event.
Exposing webhooks behind NAT with Cloudflare Zero Trust
If your server is behind NAT (home lab, internal network) and has no public IP, you can use Cloudflare Tunnel to expose just the webhook endpoint publicly — no port forwarding, no firewall changes.
# Install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -o cloudflared.deb
sudo dpkg -i cloudflared.deb
# Authenticate with Cloudflare
cloudflared tunnel login
# Create a tunnel
cloudflared tunnel create my-webhook
# Configure the tunnel to proxy to your local webhook listener
cat > ~/.cloudflared/config.yml <<EOF
tunnel: my-webhook
credentials-file: /home/deploy/.cloudflared/<tunnel-id>.json
ingress:
- hostname: hooks.yourdomain.com
service: http://localhost:9000
- service: http_status:404
EOF
# Route DNS
cloudflared tunnel route dns my-webhook hooks.yourdomain.com
# Run the tunnel (or install as a systemd service)
cloudflared tunnel run my-webhook
# Install as a service for production:
# sudo cloudflared service install
The tunnel creates an outbound-only connection from your server to Cloudflare's edge. No inbound ports need to be open. Cloudflare proxies incoming HTTPS requests through the tunnel to your local webhook listener. You get free TLS, DDoS protection, and access policies — you can even restrict the tunnel to only accept requests from GitHub's webhook IP ranges using Cloudflare Access policies.
Webhook Security
Webhook endpoints are public HTTP URLs — anyone who knows the URL can send requests to it. Without verification, an attacker could forge webhook payloads and trick your system into taking unauthorized actions. HMAC signature verification is the primary defense.
HMAC-SHA256 signature verification
The webhook provider signs each payload with a shared secret using HMAC-SHA256. Your server recomputes the signature and compares it to the one in the header. If they match, the payload is authentic.
# Python: Verify a GitHub webhook signature
import hmac
import hashlib
def verify_github_signature(payload_body, signature_header, secret):
    """Verify that the payload was sent from GitHub."""
    if not signature_header:
        return False
    # GitHub sends the header as: sha256=<hex digest>
    expected_signature = 'sha256=' + hmac.new(
        secret.encode('utf-8'),
        payload_body,
        hashlib.sha256
    ).hexdigest()
    # Timing-safe comparison to prevent timing attacks
    return hmac.compare_digest(expected_signature, signature_header)
# In your webhook handler:
# payload = request.get_data()
# signature = request.headers.get('X-Hub-Signature-256')
# if not verify_github_signature(payload, signature, WEBHOOK_SECRET):
# abort(403)
#!/bin/bash
# Bash: Verify a webhook HMAC-SHA256 signature
WEBHOOK_SECRET="your-webhook-secret"
PAYLOAD='{"action":"opened","number":42}'
EXPECTED_SIG="sha256=abc123..."
# Compute the HMAC signature
COMPUTED_SIG="sha256=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" | awk '{print $2}')"
if [ "$COMPUTED_SIG" = "$EXPECTED_SIG" ]; then
echo "Signature valid"
else
echo "Signature INVALID - rejecting payload"
exit 1
fi
Replay attack prevention
Even with valid signatures, an attacker could capture a legitimate webhook delivery and replay it later. Prevent this by validating the timestamp included in the webhook headers.
# Stripe-style timestamp validation
import time
def verify_timestamp(timestamp_header, tolerance_seconds=300):
    """Reject payloads older than 5 minutes."""
    try:
        timestamp = int(timestamp_header)
    except (ValueError, TypeError):
        return False
    current_time = int(time.time())
    return abs(current_time - timestamp) < tolerance_seconds
Additional security measures
Network IP Allowlisting
Some providers publish their webhook source IP ranges (e.g., GitHub publishes theirs at /meta). Restrict your webhook endpoint to only accept traffic from these IPs. Use this as defense in depth, not as a replacement for HMAC verification.
Transport TLS Requirement
Always use https:// for your webhook endpoint. Without TLS, payloads (including signatures and secrets) are transmitted in plaintext. Most providers refuse to deliver to HTTP endpoints in production.
Rotation Secret Rotation
Rotate your webhook signing secrets periodically. During rotation, temporarily accept signatures from both the old and new secret. Update the secret in the provider's configuration, then remove the old secret from your verification logic.
Comparison Timing-Safe Comparison
Always use hmac.compare_digest() (Python), crypto.timingSafeEqual() (Node.js), or equivalent. Standard string comparison (==) leaks information about the expected signature through timing differences, enabling byte-by-byte brute-force attacks.
Server-Sent Events
Server-Sent Events (SSE) provide a one-way, server-to-client streaming channel over standard HTTP. The client opens a persistent HTTP connection, and the server sends events as they occur. SSE is simpler than WebSockets, works natively with HTTP infrastructure, and includes built-in reconnection via the EventSource API.
How SSE works
The server responds with Content-Type: text/event-stream and sends events as plain text, each separated by two newlines. The connection stays open indefinitely.
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
event: message
id: 1
data: {"user": "alice", "text": "Hello!"}
event: message
id: 2
data: {"user": "bob", "text": "Hi there!"}
event: notification
id: 3
data: {"type": "user_joined", "user": "charlie"}
: this is a comment (heartbeat to keep connection alive)
Server implementation (Node.js)
const http = require('http');
http.createServer((req, res) => {
if (req.url === '/events') {
res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
'Connection': 'keep-alive',
'Access-Control-Allow-Origin': '*',
});
let id = 0;
// Send an event every 2 seconds
const interval = setInterval(() => {
id++;
res.write(`event: update\n`);
res.write(`id: ${id}\n`);
res.write(`data: ${JSON.stringify({ time: new Date().toISOString(), count: id })}\n\n`);
}, 2000);
// Send heartbeat comment every 15 seconds
const heartbeat = setInterval(() => {
res.write(': heartbeat\n\n');
}, 15000);
req.on('close', () => {
clearInterval(interval);
clearInterval(heartbeat);
});
} else {
res.writeHead(404);
res.end();
}
}).listen(8080);
Client: EventSource API
// Browser-native EventSource API
const source = new EventSource('/events');
// Default "message" event
source.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Message:', data);
};
// Named events
source.addEventListener('notification', (event) => {
const data = JSON.parse(event.data);
console.log('Notification:', data);
});
// Connection lifecycle
source.onopen = () => console.log('SSE connection opened');
source.onerror = (event) => {
if (source.readyState === EventSource.CONNECTING) {
console.log('Reconnecting...');
} else {
console.error('SSE error');
}
};
// Close the connection when done
// source.close();
Auto-reconnection & Last-Event-ID
The EventSource API automatically reconnects when the connection drops. On reconnection, the browser sends a Last-Event-ID header with the ID of the last received event, allowing the server to resume from where it left off. The server can control the reconnection delay with a retry: field.
retry: 5000
event: update
id: 42
data: {"status": "processing"}
event: update
id: 43
data: {"status": "complete"}
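On the server side, resuming from Last-Event-ID can be sketched as replaying from a buffer of recent events. The buffer contents and the "update" event name here are illustrative assumptions, not part of the EventSource spec:

```javascript
// Replay any buffered events newer than the ID the reconnecting client
// reports in its Last-Event-ID header.
const buffer = [
  { id: 41, data: '{"status":"queued"}' },
  { id: 42, data: '{"status":"processing"}' },
  { id: 43, data: '{"status":"complete"}' },
];

function eventsSince(lastEventId) {
  const since = parseInt(lastEventId, 10) || 0; // header absent on first connect
  return buffer
    .filter(e => e.id > since)
    .map(e => `event: update\nid: ${e.id}\ndata: ${e.data}\n\n`)
    .join('');
}

// A client reconnecting with Last-Event-ID: 42 only gets event 43 replayed
console.log(eventsSince('42'));
```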
Use SSE instead of WebSockets when you only need server-to-client communication. SSE works through HTTP proxies and CDNs without special configuration, supports automatic reconnection out of the box, and consumes far fewer server resources than WebSocket connections. It's ideal for live feeds, notification streams, and progress indicators.
Long Polling
Long polling is a technique where the client sends an HTTP request, and the server holds the request open until new data is available or a timeout occurs. Once the server responds, the client immediately sends a new request. This creates a near-real-time push effect using only standard HTTP — no special protocols or APIs required.
How long polling works
Server implementation
const express = require('express');
const app = express();
let events = [];
let waitingClients = [];
app.get('/updates', (req, res) => {
const since = parseInt(req.query.since) || 0;
// Check if there are already new events
const newEvents = events.filter(e => e.id > since);
if (newEvents.length > 0) {
return res.json({ events: newEvents });
}
// No new events: hold the connection
const client = { res, since, timeout: null };
waitingClients.push(client);
// Timeout after 30 seconds: respond with an empty event list
client.timeout = setTimeout(() => {
waitingClients = waitingClients.filter(c => c !== client);
res.json({ events: [] });
}, 30000);
// Clean up if the client disconnects
req.on('close', () => {
clearTimeout(client.timeout);
waitingClients = waitingClients.filter(c => c !== client);
});
});
// When new data arrives, notify all waiting clients
function publishEvent(event) {
event.id = events.length + 1;
events.push(event);
waitingClients.forEach(client => {
clearTimeout(client.timeout); // prevent a duplicate response when the timeout fires
const newEvents = events.filter(e => e.id > client.since);
client.res.json({ events: newEvents });
});
waitingClients = [];
}
app.listen(8080);
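The matching client is a loop that immediately re-polls after each response. A sketch, assuming a /updates endpoint shaped like the server above; pollOnce is factored out for clarity and fetchFn stands in for the global fetch:

```javascript
// One poll cycle: ask for events newer than the last seen id, then advance
// the cursor past the newest event received.
async function pollOnce(fetchFn, since) {
  const res = await fetchFn(`/updates?since=${since}`);
  const { events } = await res.json();
  const next = events.length > 0 ? events[events.length - 1].id : since;
  return { events, next };
}

async function pollLoop(fetchFn, onEvents) {
  let since = 0;
  for (;;) {
    // The server holds this request until data arrives or its timeout fires
    const { events, next } = await pollOnce(fetchFn, since);
    if (events.length > 0) onEvents(events);
    since = next; // then immediately issue the next request
  }
}
```

In a real client you would also back off briefly on network errors before re-polling, as in the WebSocket reconnection example earlier.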
Timeout handling
The server should return an empty response after a timeout period (typically 20-30 seconds). This prevents proxy servers and load balancers from terminating the connection, and keeps NAT mappings alive. The client then immediately reconnects.
When long polling is still useful
- Broad compatibility — works everywhere HTTP works, including behind restrictive corporate proxies that block WebSocket upgrades
- Simple infrastructure — no special server support, works with any HTTP framework and standard load balancers
- The Comet pattern — long polling is part of the broader "Comet" family of techniques for server push over HTTP, which also includes streaming and hidden iframe approaches
- Fallback — many real-time libraries (Socket.IO, SignalR) use long polling as a fallback when WebSocket connections fail
Long polling has higher latency and more HTTP overhead than WebSockets or SSE. Each "cycle" requires a full HTTP request/response. With many clients, the server holds many open connections waiting for data. For new projects with modern clients, prefer WebSockets (bidirectional) or SSE (server-to-client) over long polling.
Comparison
Choosing the right real-time pattern depends on your requirements: direction of communication, infrastructure constraints, complexity budget, and scale.
Protocol comparison
| Feature | WebSockets | SSE | Long Polling | Short Polling |
|---|---|---|---|---|
| Direction | Bidirectional | Server → Client | Server → Client | Server → Client |
| Protocol | WS (over TCP) | HTTP | HTTP | HTTP |
| Connection | Persistent | Persistent | Semi-persistent | New per request |
| Browser support | All modern | All modern (not supported in legacy IE/old Edge <79) | Universal | Universal |
| Complexity | High | Low | Medium | Low |
| Reconnection | Manual | Automatic (built-in) | Manual | N/A |
| Proxy support | Problematic (needs L4) | Good (standard HTTP) | Good | Excellent |
| Overhead | Low (frames) | Low (text stream) | Medium (HTTP each cycle) | High (HTTP each interval) |
When to use what
WebSockets Use when
- You need bidirectional communication
- Low latency is critical (gaming, trading)
- High message frequency from both sides
- You control the infrastructure (can configure L4 LB)
SSE Use when
- You only need server-to-client streaming
- You want automatic reconnection
- You need to work through HTTP proxies/CDNs
- Use cases: feeds, notifications, progress updates
Long Polling Use when
- WebSockets and SSE are blocked (corporate proxies)
- You need maximum compatibility
- You're building a fallback transport
- Low-frequency updates are acceptable
Short Polling Use when
- Updates are infrequent (every 30s+)
- Simplicity is more important than latency
- Stateless infrastructure is required
- Example: checking build status every minute
Start with the simplest pattern that meets your requirements. If you only need server-to-client updates, use SSE. If you need bidirectional communication, use WebSockets. If you're receiving events from third parties, use webhooks. Only fall back to long polling or short polling when infrastructure constraints prevent the better options.
Scaling Patterns
Real-time connections are stateful — each WebSocket or SSE connection is pinned to a specific server process. This fundamentally changes how you scale compared to stateless HTTP APIs. Understanding backpressure and connection management is essential.
Sticky sessions
Since WebSocket connections are persistent, the load balancer must route all traffic for a given connection to the same backend server. This is called sticky sessions (or session affinity).
# Nginx: WebSocket proxy with sticky sessions
upstream websocket_backend {
ip_hash; # Sticky sessions based on client IP
server ws-server-1:8080;
server ws-server-2:8080;
server ws-server-3:8080;
}
server {
listen 443 ssl;
server_name ws.example.com;
location /ws {
proxy_pass http://websocket_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_read_timeout 86400; # 24h for persistent connections
}
}
Redis pub/sub for multi-instance WebSocket
When you have multiple WebSocket servers, a message sent to one server needs to be broadcast to clients connected to all servers. Redis pub/sub acts as a message bus between instances.
const Redis = require('ioredis');
const WebSocket = require('ws');
const pub = new Redis();
const sub = new Redis();
const wss = new WebSocket.Server({ port: 8080 });
// Subscribe to the broadcast channel
sub.subscribe('ws:broadcast');
sub.on('message', (channel, message) => {
// Forward to all local WebSocket clients
wss.clients.forEach(client => {
if (client.readyState === WebSocket.OPEN) {
client.send(message);
}
});
});
// When a local client sends a message, publish to Redis
wss.on('connection', (ws) => {
ws.on('message', (data) => {
// Publish to all instances via Redis
pub.publish('ws:broadcast', data.toString());
});
});
Connection limits and resource management
Limits Per-server connections
Each WebSocket connection consumes a file descriptor and ~10-50 KB of memory. A single server can typically handle 10,000-100,000 concurrent connections depending on hardware and message rate. Monitor with ulimit -n and increase if needed.
LB Load balancer config
L4 (TCP) load balancers work best for WebSockets — they forward raw TCP without understanding HTTP. L7 (HTTP) load balancers must explicitly support the WebSocket upgrade. Cloud LBs: use NLB (AWS), TCP LB (GCP), or configure ALB with WebSocket support.
Horizontal scaling strategies
- Shard by user/room/topic — route connections to specific servers based on a key (e.g., chat room ID). Reduces cross-server messaging.
- Redis/NATS pub/sub — use a message broker as a fan-out layer between server instances. Every instance subscribes to relevant channels.
- Dedicated WebSocket tier — separate your WebSocket servers from your HTTP API servers. Scale them independently based on connection count vs request rate.
- Connection draining — during deployments, stop accepting new connections on old instances and let existing connections drain gracefully before terminating.
For large-scale real-time systems, consider using a managed service (AWS API Gateway WebSockets, Ably, Pusher) or a dedicated real-time framework (Socket.IO with Redis adapter, Phoenix Channels). Building a production-grade WebSocket infrastructure from scratch is significantly more complex than building a REST API.
Message Queues
Message queues and event brokers are the backend counterpart to the client-facing web patterns. While WebSockets and SSE handle client communication, message queues handle inter-service communication in event-driven architectures. They complement webhooks and WebSockets by providing reliable, decoupled messaging between backend services.
Pub/Sub pattern
In publish/subscribe, producers send messages to a topic (not directly to consumers). Any number of consumers can subscribe to the topic and receive a copy of each message. This decouples producers from consumers completely.
# Architecture: Webhook + Message Queue + WebSocket
#
# 1. GitHub sends webhook to your API (webhook)
# 2. API validates and publishes to queue (message queue)
# 3. Worker consumes from queue (async processing)
# 4. Worker pushes update via WebSocket (real-time UI)
#
# Flow:
# GitHub --webhook--> API --> RabbitMQ --> Worker --> WebSocket --> Browser
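A toy in-process version shows the decoupling: the producer publishes to a topic without knowing its consumers, and every subscriber gets a copy. Real deployments use a broker (RabbitMQ, Kafka, NATS) for durability and cross-process delivery.

```javascript
// Minimal pub/sub bus: topics map to subscriber lists; publish fans out
// the message to every handler registered on that topic.
class PubSub {
  constructor() { this.topics = new Map(); }
  subscribe(topic, handler) {
    if (!this.topics.has(topic)) this.topics.set(topic, []);
    this.topics.get(topic).push(handler);
  }
  publish(topic, message) {
    for (const handler of this.topics.get(topic) ?? []) handler(message);
  }
}

const bus = new PubSub();
const received = [];
bus.subscribe('orders', m => received.push(`billing:${m}`));
bus.subscribe('orders', m => received.push(`shipping:${m}`));
bus.publish('orders', 'order-1'); // both subscribers receive a copy
console.log(received); // → [ 'billing:order-1', 'shipping:order-1' ]
```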
Common message brokers
AMQP RabbitMQ
Feature-rich message broker supporting multiple patterns: queues, exchanges, routing keys, dead letter queues. Best for task distribution and complex routing. Supports message acknowledgment, persistence, and priority queues.
Streaming Apache Kafka
Distributed event streaming platform. Messages are persisted to a commit log and retained for a configurable duration. Best for event sourcing, log aggregation, and high-throughput streaming. Consumers can replay events from any point.
Lightweight NATS
High-performance, lightweight messaging system. Supports pub/sub, request/reply, and queue groups. NATS JetStream adds persistence and exactly-once publishing via message deduplication (using Nats-Msg-Id headers). Best for cloud-native microservices needing low-latency messaging.
Patterns Event Sourcing & CQRS
Event sourcing stores state changes as an immutable sequence of events (instead of current state). CQRS (Command Query Responsibility Segregation) separates read and write models. Both patterns pair naturally with message queues for propagating state changes across services.
Connecting the patterns
In a production architecture, these patterns work together:
- Webhooks receive events from external systems (GitHub, Stripe, Slack)
- Message queues decouple processing and provide reliability (retry, dead letter queues)
- WebSockets/SSE push processed results to connected clients in real-time
- Event sourcing provides an audit trail and enables rebuilding state from events
Observability
Real-time systems need specialized monitoring. Traditional HTTP request metrics (latency, error rate, throughput) don't capture the full picture when you have persistent connections, async message delivery, and event-driven flows.
WebSocket metrics
| Metric | What it tells you | Alert threshold |
|---|---|---|
| ws_connections_active | Current open WebSocket connections | Approaching server limit (e.g., > 80% of max FDs) |
| ws_connections_total | Total connections opened (counter) | Sudden spikes (reconnection storm) |
| ws_messages_sent_total | Messages sent to clients per second | Drops to zero (server issue) or spikes (flood) |
| ws_messages_received_total | Messages received from clients per second | Unexpected spikes (abuse or bug) |
| ws_message_latency_ms | Time from message publish to client delivery | p99 > 500ms |
| ws_errors_total | Connection errors, failed sends | > 1% of active connections |
Prometheus metrics for WebSocket connections
```javascript
const client = require('prom-client');

// Gauge: current active connections
const wsConnections = new client.Gauge({
  name: 'ws_connections_active',
  help: 'Number of active WebSocket connections',
  labelNames: ['server_id'],
});

// Counter: total messages
const wsMessagesSent = new client.Counter({
  name: 'ws_messages_sent_total',
  help: 'Total WebSocket messages sent',
  labelNames: ['event_type'],
});

// Histogram: message latency
const wsLatency = new client.Histogram({
  name: 'ws_message_latency_ms',
  help: 'WebSocket message delivery latency in ms',
  buckets: [5, 10, 25, 50, 100, 250, 500, 1000],
});

// In your WebSocket handler (wss is a WebSocketServer from the ws package):
wss.on('connection', (ws) => {
  wsConnections.inc({ server_id: process.env.SERVER_ID });
  ws.on('close', () => {
    wsConnections.dec({ server_id: process.env.SERVER_ID });
  });
});
```
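Feeding the latency histogram requires timestamping each message at publish time and observing the delta at delivery. A stdlib-only sketch of that timing logic (the envelope format is an assumption; the computed value is what you would pass to the histogram's observe method):

```javascript
// Wrap outgoing messages in an envelope that carries the publish timestamp.
function envelope(eventType, payload) {
  return { eventType, payload, publishedAt: Date.now() };
}

// At send time, compute the publish-to-delivery latency in milliseconds.
function latencyMs(message, now = Date.now()) {
  return now - message.publishedAt;
}

const msg = envelope('price_update', { symbol: 'ACME', price: 42 });
// Simulate 25ms of queueing/transit before delivery:
console.log(latencyMs(msg, msg.publishedAt + 25)); // 25
```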
Webhook delivery monitoring
DLQ Dead letter queues
When webhook deliveries fail after all retries, send them to a dead letter queue. Monitor DLQ depth as a critical alert. DLQ messages need manual investigation — they represent events your system failed to process.
Dashboard Retry dashboards
Track webhook delivery attempts: first-attempt success rate, retry count distribution, time to successful delivery, and failure reasons (timeout, 5xx, connection refused). High retry rates indicate your webhook processor is too slow or unstable.
Key webhook metrics
```
# Prometheus-style metrics for webhook processing
webhook_deliveries_received_total{source="github",status="200"}
webhook_deliveries_received_total{source="stripe",status="400"}
webhook_processing_duration_seconds{source="github",quantile="0.99"}
webhook_signature_failures_total{source="github"}
webhook_dlq_depth{source="stripe"}
webhook_retry_count{source="github",attempt="1"}
webhook_retry_count{source="github",attempt="2"}
webhook_retry_count{source="github",attempt="3"}
```
Log every webhook delivery with: source, event type, delivery ID, processing time, and outcome. This creates an audit trail for debugging failed deliveries and understanding event flow through your system. Retain logs for at least 30 days.
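A structured log entry carrying those fields might look like this (the field names and outcome values are illustrative, not a standard schema):

```javascript
// Emit one structured JSON log line per webhook delivery.
function logWebhookDelivery({ source, eventType, deliveryId, durationMs, outcome }) {
  const entry = {
    ts: new Date().toISOString(),
    msg: 'webhook_delivery',
    source,
    event_type: eventType,
    delivery_id: deliveryId,
    processing_ms: durationMs,
    outcome, // e.g. 'processed' | 'queued' | 'failed' | 'signature_invalid'
  };
  console.log(JSON.stringify(entry));
  return entry;
}

logWebhookDelivery({
  source: 'github',
  eventType: 'push',
  deliveryId: '72d3162e-cc78-11e3-81ab-4c9367dc0958',
  durationMs: 12,
  outcome: 'processed',
});
```

One line of JSON per delivery keeps the audit trail queryable by source, event type, or delivery ID in whatever log aggregator you already run.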
Production Checklist
WebSocket checklist
- Use `wss://` (TLS) in production — never `ws://`. Unencrypted WebSockets are blocked by many proxies and expose data to interception.
- Implement ping/pong heartbeats — detect dead connections on both server and client. Set a 30-second interval and terminate connections that miss two consecutive pongs.
- Implement reconnection with exponential backoff — start at 1s, double each attempt, cap at 30s, add random jitter. Prevents thundering herd on server restarts.
- Validate origin header — check the `Origin` header during the upgrade handshake to prevent cross-site WebSocket hijacking (CSWSH).
- Set connection limits per user/IP — prevent resource exhaustion from a single client opening thousands of connections.
- Configure load balancer for WebSocket — use L4 (TCP) load balancing or ensure your L7 LB supports WebSocket upgrade. Set idle timeout to match your heartbeat interval.
- Use Redis pub/sub for multi-instance broadcast — if you run more than one WebSocket server, you need a message bus between them.
- Monitor active connection count — alert before hitting file descriptor limits. Track connection churn rate for anomaly detection.
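The backoff schedule from the checklist (start at 1s, double each attempt, cap at 30s, add jitter) can be sketched as a pure function, independent of any WebSocket library. The jitter amount here (up to 1s) is an assumption; tune it to your fleet size:

```javascript
// Delay before reconnection attempt n (0-based): 1s, 2s, 4s, ... capped at
// 30s, plus up to 1s of random jitter so restarted clients don't reconnect
// in lockstep (the thundering herd the checklist warns about).
function reconnectDelayMs(attempt, { baseMs = 1000, capMs = 30000 } = {}) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp + Math.random() * 1000;
}

// Attempts 0..5 give base delays of 1s, 2s, 4s, 8s, 16s, 30s (plus jitter).
for (let n = 0; n < 6; n++) {
  console.log(n, Math.round(reconnectDelayMs(n)));
}
```

A client would call this after each `close` event, reset the attempt counter on a successful `open`, and schedule the next connection attempt with `setTimeout`.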
Webhook checklist
- Verify HMAC signatures on every delivery — never process unverified payloads. Use timing-safe comparison.
- Validate timestamps for replay prevention — reject payloads older than 5 minutes to prevent replay attacks.
- Return 200 OK immediately, process async — queue the payload for background processing. Providers will retry on slow responses.
- Implement idempotent handlers — deduplicate by delivery ID or event ID. The same webhook may be delivered multiple times.
- Use HTTPS endpoints only — never expose webhook endpoints over plain HTTP.
- Set up dead letter queues — capture failed webhook processing for manual review and replay.
- Rotate webhook secrets periodically — support dual-secret verification during rotation windows.
- IP allowlist as defense in depth — restrict webhook endpoints to known provider IP ranges where available.
SSE checklist
- Set `Cache-Control: no-cache` — prevent proxies from caching the event stream.
- Include event IDs for resumption — the EventSource API sends `Last-Event-ID` on reconnection. Your server must handle this header.
- Send periodic heartbeat comments — a `: heartbeat\n\n` every 15 seconds keeps the connection alive through proxies.
- Set appropriate `retry:` values — control client reconnection delay based on your server's capacity.
- Handle connection cleanup on server — detect client disconnects and clean up resources (timers, subscriptions).
General checklist
- Monitor all real-time connections — active connections, message rates, error rates, and latency percentiles for every pattern.
- Implement backpressure handling — decide what to do when consumers can't keep up: buffer, drop, or rate-limit.
- Plan for graceful degradation — what happens when WebSocket servers are down? Fall back to polling. What happens when webhook processing is slow? Queue and retry.
- Load test with realistic connection counts — test with thousands of concurrent WebSocket connections, not just HTTP request throughput.
- Document your webhook retry behavior — if you're a webhook provider, clearly document retry policies, timeout values, and expected response codes.