API Production Guide

Overview

An API (Application Programming Interface) is a contract that defines how software components communicate. APIs are the backbone of modern distributed systems — they connect microservices, expose functionality to external developers, power mobile apps, and enable third-party integrations. Designing and operating APIs well is a core infrastructure discipline.

API-first design means treating your API as a first-class product: the API contract is defined before implementation begins. This enables parallel development (front-end and back-end teams work simultaneously against the same contract), keeps interfaces consistent, and produces a better developer experience.

Type REST

Resource-oriented architecture over HTTP. Uses standard methods (GET, POST, PUT, DELETE) and status codes. The most widely adopted API style. Stateless, cacheable, and simple to understand. Best for CRUD-heavy, public-facing APIs.

Type GraphQL

A query language for APIs. Clients request exactly the data they need in a single request, which avoids the over-fetching and under-fetching common with fixed endpoints. Schema-typed and introspectable. Best for complex data graphs and mobile clients with bandwidth constraints.

Type gRPC

Google's high-performance RPC framework using Protocol Buffers for serialization. Binary protocol over HTTP/2. Supports streaming. Best for internal service-to-service communication where performance matters.

Type SOAP

XML-based protocol with WSDL contracts. Heavy, verbose, but highly standardized with built-in WS-Security. Still used in enterprise and financial systems. Generally avoided for new projects due to complexity.

Scope Internal APIs

APIs consumed within your organization. Higher trust, faster iteration, less strict versioning. Used for microservice communication. Can use gRPC or REST. Security still matters (zero-trust), but documentation standards may be more relaxed.

Scope External / Public APIs

APIs consumed by third-party developers. Require rigorous documentation, stable versioning, rate limiting, and developer onboarding (API keys, SDKs, sandbox environments). Breaking changes are extremely costly.

REST

REST (Representational State Transfer) is an architectural style for building APIs over HTTP. RESTful APIs model the domain as resources (nouns) identified by URLs, manipulated via standard HTTP methods (verbs), and represented in formats like JSON. REST is stateless — each request contains all the information needed to process it.

HTTP methods

Method	Purpose	Idempotent	Safe
`GET`	Retrieve a resource or collection	Yes	Yes
`POST`	Create a new resource	No	No
`PUT`	Replace a resource entirely	Yes	No
`PATCH`	Partially update a resource	No*	No
`DELETE`	Remove a resource	Yes	No

*PATCH can be made idempotent depending on the implementation, but the spec does not guarantee it.
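
To make the Idempotent column concrete, the following minimal Python sketch models PUT and POST against an in-memory store. The `put_user` and `post_user` helpers and the dictionary store are hypothetical stand-ins for illustration, not part of any framework.

```python
# Hypothetical in-memory store used only to illustrate idempotency.
import itertools

store = {}
_next_id = itertools.count(1)

def put_user(user_id, body):
    """PUT semantics: replace the resource at a known ID."""
    store[user_id] = dict(body)
    return user_id

def post_user(body):
    """POST semantics: the server assigns a new ID on every call."""
    user_id = next(_next_id)
    store[user_id] = dict(body)
    return user_id

# Repeating a PUT leaves the store in the same state.
put_user(100, {"name": "Alice"})
put_user(100, {"name": "Alice"})
count_after_puts = len(store)

# Repeating a POST creates a second resource.
first = post_user({"name": "Bob"})
second = post_user({"name": "Bob"})
count_after_posts = len(store)
```

This is why a client can usually retry a timed-out PUT safely, while a retried POST may create a duplicate unless the server deduplicates it.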

Resource naming conventions

Use nouns, not verbs: /users not /getUsers
Use plural names: /users, /orders, /products
Nest resources to show relationships: /users/42/orders
Use kebab-case for multi-word resources: /order-items
Use query parameters for filtering, sorting, pagination: /users?status=active&sort=created_at&limit=20
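
As a small illustration of the query-parameter convention, the sketch below builds a collection URL with the standard library. The `build_list_url` helper and its parameter names are assumptions chosen to mirror the example URL above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_list_url(base, status=None, sort=None, limit=20):
    """Build a collection URL with filtering, sorting, and pagination parameters."""
    params = {"status": status, "sort": sort, "limit": limit}
    # Drop parameters the caller did not set so the URL stays minimal.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}?{query}" if query else base

url = build_list_url("/users", status="active", sort="created_at", limit=20)

# Parsing the URL back recovers the same parameters.
parsed = parse_qs(urlsplit(url).query)
```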

Status codes

2xx Success

200 OK — general success (GET, PUT, PATCH)
201 Created — resource created (POST), include Location header
204 No Content — success with no response body (DELETE)

3xx Redirection

301 Moved Permanently — resource URL changed permanently
304 Not Modified — client cache is still valid (ETag/If-None-Match)

4xx Client Error

400 Bad Request — malformed request syntax or validation error
401 Unauthorized — missing or invalid authentication
403 Forbidden — authenticated but not authorized
404 Not Found — resource does not exist
409 Conflict — state conflict (e.g., duplicate creation)
422 Unprocessable Entity — valid syntax but semantic errors
429 Too Many Requests — rate limit exceeded

5xx Server Error

500 Internal Server Error — unexpected server failure
502 Bad Gateway — upstream service returned invalid response
503 Service Unavailable — server overloaded or in maintenance
504 Gateway Timeout — upstream service timed out

Example: RESTful resource operations

# List users (with pagination)
GET /api/v1/users?page=1&limit=20
Accept: application/json

# Get a single user
GET /api/v1/users/42
Accept: application/json

# Create a user
POST /api/v1/users
Content-Type: application/json
{
  "name": "Alice",
  "email": "alice@example.com",
  "role": "admin"
}
# Response: 201 Created
# Location: /api/v1/users/43

# Update a user (partial)
PATCH /api/v1/users/42
Content-Type: application/json
{
  "role": "viewer"
}

# Delete a user
DELETE /api/v1/users/42
# Response: 204 No Content

Content negotiation

Clients specify their preferred response format via the Accept header, and their request body format via Content-Type. A well-designed API respects these headers.

# Client requests JSON
Accept: application/json

# Client sends JSON
Content-Type: application/json

# Client requests XML (if supported)
Accept: application/xml

# Content negotiation with versioning
Accept: application/vnd.myapi.v2+json

HATEOAS

Hypermedia As The Engine Of Application State. The API response includes links to related actions and resources, allowing clients to discover the API dynamically rather than hardcoding URLs.

{
  "id": 42,
  "name": "Alice",
  "email": "alice@example.com",
  "_links": {
    "self": { "href": "/api/v1/users/42" },
    "orders": { "href": "/api/v1/users/42/orders" },
    "update": { "href": "/api/v1/users/42", "method": "PATCH" },
    "delete": { "href": "/api/v1/users/42", "method": "DELETE" }
  }
}
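
A client can use these links instead of building URLs itself. The sketch below is one possible approach, assuming the `_links` layout shown above; `resolve_link` is a hypothetical helper name.

```python
def resolve_link(resource, rel, default_method="GET"):
    """Return (method, href) for a link relation, or None if the server did not offer it."""
    link = resource.get("_links", {}).get(rel)
    if link is None:
        return None
    return link.get("method", default_method), link["href"]

user = {
    "id": 42,
    "_links": {
        "self": {"href": "/api/v1/users/42"},
        "orders": {"href": "/api/v1/users/42/orders"},
        "delete": {"href": "/api/v1/users/42", "method": "DELETE"},
    },
}

orders_request = resolve_link(user, "orders")
delete_request = resolve_link(user, "delete")
missing = resolve_link(user, "archive")  # action not offered for this resource
```

A missing relation tells the client that the action is not available in the current state, which is the point of letting hypermedia drive the workflow.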

Key concept

Statelessness is REST's most important constraint. The server never stores client session state between requests. Every request must include all necessary context (authentication token, pagination cursor, etc.). This makes REST APIs horizontally scalable — any server in the pool can handle any request.

GraphQL

GraphQL is a query language and runtime for APIs, developed by Facebook in 2012 and open-sourced in 2015. Instead of multiple endpoints returning fixed data shapes, GraphQL exposes a single endpoint where clients specify exactly what fields they need. The server returns precisely that data — nothing more, nothing less.

Schema definition

A GraphQL API is defined by its schema, written in SDL (Schema Definition Language). The schema defines types, queries (reads), mutations (writes), and subscriptions (real-time).

# Schema Definition Language (SDL)
type User {
  id: ID!
  name: String!
  email: String!
  role: Role!
  orders: [Order!]!
  createdAt: DateTime!
}

type Order {
  id: ID!
  total: Float!
  status: OrderStatus!
  items: [OrderItem!]!
}

enum Role {
  ADMIN
  USER
  VIEWER
}

enum OrderStatus {
  PENDING
  SHIPPED
  DELIVERED
  CANCELLED
}

type Query {
  user(id: ID!): User
  users(limit: Int = 20, offset: Int = 0): [User!]!
  order(id: ID!): Order
}

type Mutation {
  createUser(input: CreateUserInput!): User!
  updateUser(id: ID!, input: UpdateUserInput!): User!
  deleteUser(id: ID!): Boolean!
}

input CreateUserInput {
  name: String!
  email: String!
  role: Role!
}

input UpdateUserInput {
  name: String
  email: String
  role: Role
}

type Subscription {
  orderStatusChanged(orderId: ID!): Order!
}

Queries and mutations

# Query: fetch exactly the fields you need
query GetUser {
  user(id: "42") {
    name
    email
    orders {
      id
      total
      status
    }
  }
}

# Mutation: create a user
mutation CreateUser {
  createUser(input: {
    name: "Alice"
    email: "alice@example.com"
    role: ADMIN
  }) {
    id
    name
  }
}

# Using variables (preferred for production)
query GetUser($userId: ID!) {
  user(id: $userId) {
    name
    email
  }
}
# Variables: { "userId": "42" }

# Fragments: reuse field selections
fragment UserFields on User {
  id
  name
  email
  role
}

query GetUsers {
  users(limit: 10) {
    ...UserFields
    orders {
      id
      total
    }
  }
}

N+1 problem and DataLoader

When resolving nested fields, a naive implementation executes one database query per parent item. Fetching 20 users with their orders means 1 query for users + 20 queries for orders = 21 total. DataLoader solves this by batching and caching: all 20 order queries are combined into a single SELECT ... WHERE user_id IN (...) query.

// DataLoader batching example (Node.js)
const DataLoader = require('dataloader');

const orderLoader = new DataLoader(async (userIds) => {
  // Single query: SELECT * FROM orders WHERE user_id IN (1, 2, 3, ...)
  const orders = await db.query(
    'SELECT * FROM orders WHERE user_id = ANY($1)',
    [userIds]
  );
  // Map results back to input order
  return userIds.map(id => orders.filter(o => o.user_id === id));
});

// In the User resolver
const resolvers = {
  User: {
    orders: (user) => orderLoader.load(user.id),
  },
};
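
The query counts from the paragraph above can be reproduced without a real database. The following Python sketch uses a hypothetical `FakeDB` class that only counts calls; it illustrates the batching idea and is not the DataLoader library itself.

```python
from collections import defaultdict

class FakeDB:
    """Hypothetical in-memory stand-in for a database that counts queries."""
    def __init__(self, orders):
        self.orders = orders
        self.query_count = 0

    def orders_for_user(self, user_id):
        self.query_count += 1
        return [o for o in self.orders if o["user_id"] == user_id]

    def orders_for_users(self, user_ids):
        self.query_count += 1
        wanted = set(user_ids)
        return [o for o in self.orders if o["user_id"] in wanted]

def load_naive(db, user_ids):
    # One query per user: the N in N+1.
    return {uid: db.orders_for_user(uid) for uid in user_ids}

def load_batched(db, user_ids):
    # One query for all users, then group rows by user in memory.
    grouped = defaultdict(list)
    for order in db.orders_for_users(user_ids):
        grouped[order["user_id"]].append(order)
    return {uid: grouped[uid] for uid in user_ids}

user_ids = list(range(1, 21))
orders = [{"id": i, "user_id": (i % 20) + 1} for i in range(40)]

naive_db, batched_db = FakeDB(orders), FakeDB(orders)
naive = load_naive(naive_db, user_ids)
batched = load_batched(batched_db, user_ids)
```

Both loaders return the same data; the naive one issues 20 order queries and the batched one issues a single query.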

Introspection

GraphQL APIs are self-documenting. Clients can query the schema itself using introspection queries. This powers tools like GraphiQL and Apollo Studio.

# Introspection: list all types
{
  __schema {
    types {
      name
      kind
    }
  }
}

# Introspection: get fields of a type
{
  __type(name: "User") {
    fields {
      name
      type { name kind }
    }
  }
}

When to use GraphQL vs REST

GraphQL shines

Complex data relationships (graphs)
Multiple client types (web, mobile, IoT) needing different data shapes
Reducing number of HTTP round trips
Rapid front-end iteration without back-end changes
Teams with strong schema-first culture

REST is better

Simple CRUD resources with well-defined endpoints
Heavy use of HTTP caching (CDN, ETags)
File uploads and downloads
Public APIs (REST is more universally understood)
When you want strong HTTP semantics (status codes, methods)

Warning

Disable introspection in production for public-facing GraphQL APIs. It exposes your entire schema to potential attackers. Use persisted queries (allowlisted query strings) to prevent arbitrary query execution and mitigate denial-of-service via deeply nested queries.
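
One way to implement persisted queries is to identify each allowed query by a hash of its text and reject anything else. The sketch below assumes SHA-256 as the identifier and an in-memory allowlist; `ALLOWED_QUERIES` and `lookup_persisted_query` are hypothetical names, and real servers typically load the allowlist from a build artifact.

```python
import hashlib

def query_hash(query: str) -> str:
    """Identify a query by the SHA-256 hash of its text."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()

# Queries registered at build time (hypothetical allowlist).
ALLOWED_QUERIES = {
    query_hash(q): q
    for q in [
        "query GetUser($userId: ID!) { user(id: $userId) { name email } }",
    ]
}

def lookup_persisted_query(requested_hash: str) -> str:
    """Return the registered query text, or raise if the hash is unknown."""
    try:
        return ALLOWED_QUERIES[requested_hash]
    except KeyError:
        raise PermissionError("query is not on the allowlist") from None

known = query_hash("query GetUser($userId: ID!) { user(id: $userId) { name email } }")
```

Clients then send only the hash and variables, so an attacker cannot submit an arbitrary deeply nested query.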

gRPC

gRPC is a high-performance, open-source RPC (Remote Procedure Call) framework developed by Google. It uses Protocol Buffers (protobuf) as its interface definition language and serialization format, runs over HTTP/2, and supports multiple programming languages via code generation. gRPC is a common choice for internal service-to-service communication in high-performance systems.

Service definition with .proto files

// user_service.proto
syntax = "proto3";

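package user.v1;

// Required for the google.protobuf.Timestamp field used in User
import "google/protobuf/timestamp.proto";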

option go_package = "github.com/myorg/api/user/v1";

// Service definition
service UserService {
  // Unary RPC
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc CreateUser(CreateUserRequest) returns (CreateUserResponse);
  rpc DeleteUser(DeleteUserRequest) returns (DeleteUserResponse);

  // Server streaming: server sends multiple responses
  rpc ListUsers(ListUsersRequest) returns (stream User);

  // Client streaming: client sends multiple requests
  rpc UploadUsers(stream CreateUserRequest) returns (UploadUsersResponse);

  // Bidirectional streaming
  rpc SyncUsers(stream SyncRequest) returns (stream SyncResponse);
}

message User {
  string id = 1;
  string name = 2;
  string email = 3;
  Role role = 4;
  google.protobuf.Timestamp created_at = 5;
}

enum Role {
  ROLE_UNSPECIFIED = 0;
  ROLE_ADMIN = 1;
  ROLE_USER = 2;
  ROLE_VIEWER = 3;
}

message GetUserRequest {
  string id = 1;
}

message GetUserResponse {
  User user = 1;
}

message CreateUserRequest {
  string name = 1;
  string email = 2;
  Role role = 3;
}

message CreateUserResponse {
  User user = 1;
}

message DeleteUserRequest {
  string id = 1;
}

message DeleteUserResponse {}

message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message UploadUsersResponse {
  int32 created_count = 1;
}

message SyncRequest {
  User user = 1;
}

message SyncResponse {
  string status = 1;
  User user = 2;
}

Streaming types

Type	Client	Server	Use case
Unary	1 request	1 response	Standard request/response (like REST)
Server streaming	1 request	N responses	Large result sets, real-time feeds
Client streaming	N requests	1 response	File upload, batch operations
Bidirectional	N requests	N responses	Chat, live sync, interactive sessions

Code generation

# Install protoc compiler and language plugins
# Generate Go code
protoc --go_out=. --go-grpc_out=. user_service.proto

# Generate Python code
python -m grpc_tools.protoc -I. \
  --python_out=. --grpc_python_out=. user_service.proto

# Generate TypeScript code (using ts-proto)
protoc --plugin=./node_modules/.bin/protoc-gen-ts_proto \
  --ts_proto_out=. user_service.proto

Metadata and deadlines

gRPC supports metadata (key-value pairs sent as HTTP/2 headers) for passing auth tokens, trace IDs, and other context. Deadlines propagate timeout expectations across service boundaries — if a deadline is exceeded, the call is cancelled.

# Python gRPC client with metadata and deadline
import grpc
from user.v1 import user_service_pb2, user_service_pb2_grpc

channel = grpc.insecure_channel('localhost:50051')
stub = user_service_pb2_grpc.UserServiceStub(channel)

# Set metadata (auth token, trace ID)
metadata = [
    ('authorization', 'Bearer eyJhbG...'),
    ('x-request-id', 'req-abc-123'),
]

# Set deadline (5 seconds from now)
response = stub.GetUser(
    user_service_pb2.GetUserRequest(id='42'),
    metadata=metadata,
    timeout=5.0,  # deadline in seconds
)

print(f"User: {response.user.name}")
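
When a service calls another service while handling a request, it should pass on the time that is left rather than a fresh timeout. The sketch below shows that bookkeeping with a hypothetical `Deadline` class and an injected fake clock; in gRPC Python servers, `ServicerContext.time_remaining()` reports a comparable value for the incoming call.

```python
import time

class Deadline:
    """Absolute deadline that can be handed to downstream calls."""
    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_seconds

    def remaining(self):
        """Seconds left before the deadline, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        return self.remaining() == 0.0

# A fake clock keeps the example deterministic.
now = [100.0]
deadline = Deadline(5.0, clock=lambda: now[0])

now[0] += 2.0                               # 2 seconds spent in this service
downstream_timeout = deadline.remaining()   # pass this as timeout= to the next call

now[0] += 4.0                               # the downstream call overran the budget
```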

HTTP/2 transport

gRPC uses HTTP/2, which provides multiplexing (multiple RPCs over a single TCP connection), header compression (HPACK), and bidirectional streaming. This makes gRPC significantly more efficient than REST over HTTP/1.1 for high-throughput internal communication. Binary protobuf encoding is also usually more compact than the equivalent JSON, often by a factor of several for numeric or repetitive data, although the exact ratio depends on the payload.
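
To show where the size difference comes from, the sketch below hand-encodes three string fields in the protobuf wire format (a tag byte, a length, then the raw bytes) and compares the result with compact JSON. It is a simplified illustration covering only string fields, using field numbers 1 to 3 as in the `User` message above; real code should use the generated classes.

```python
import json

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a string field: tag (wire type 2), length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(data)) + data

user = {"id": "42", "name": "Alice", "email": "alice@example.com"}

proto_bytes = (
    encode_string_field(1, user["id"])
    + encode_string_field(2, user["name"])
    + encode_string_field(3, user["email"])
)
json_bytes = json.dumps(user, separators=(",", ":")).encode("utf-8")
```

For this string-only message the protobuf form is 30 bytes against 54 for JSON, because field names are replaced by one-byte tags. Messages dominated by numbers or repeated structures tend to shrink more.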

API Documentation

Good API documentation is the difference between an API that gets adopted and one that gets abandoned. The OpenAPI Specification (formerly Swagger) is the industry standard for documenting REST APIs. It provides a machine-readable contract that powers documentation UIs, client SDKs, and testing tools.

Sample OpenAPI spec

# openapi.yaml
openapi: 3.1.0
info:
  title: User API
  description: Manage user accounts
  version: 1.0.0
  contact:
    name: Platform Team
    email: platform@example.com

servers:
  - url: https://api.example.com/v1
    description: Production
  - url: https://staging-api.example.com/v1
    description: Staging

paths:
  /users:
    get:
      summary: List users
      operationId: listUsers
      tags:
        - Users
      parameters:
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 100
        - name: offset
          in: query
          schema:
            type: integer
            default: 0
      responses:
        '200':
          description: A list of users
          content:
            application/json:
              schema:
                type: object
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/User'
                  total:
                    type: integer
        '401':
          $ref: '#/components/responses/Unauthorized'

    post:
      summary: Create a user
      operationId: createUser
      tags:
        - Users
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateUserInput'
      responses:
        '201':
          description: User created
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'
          headers:
            Location:
              schema:
                type: string
        '400':
          $ref: '#/components/responses/BadRequest'

  /users/{id}:
    get:
      summary: Get a user by ID
      operationId: getUser
      tags:
        - Users
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: The user
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'
        '404':
          $ref: '#/components/responses/NotFound'

components:
  schemas:
    User:
      type: object
      required: [id, name, email, role, created_at]
      properties:
        id:
          type: string
          format: uuid
        name:
          type: string
          example: Alice
        email:
          type: string
          format: email
        role:
          type: string
          enum: [admin, user, viewer]
        created_at:
          type: string
          format: date-time

    CreateUserInput:
      type: object
      required: [name, email, role]
      properties:
        name:
          type: string
          minLength: 1
          maxLength: 100
        email:
          type: string
          format: email
        role:
          type: string
          enum: [admin, user, viewer]

    Error:
      type: object
      required: [type, title, status]
      properties:
        type:
          type: string
          format: uri
        title:
          type: string
        status:
          type: integer
        detail:
          type: string
        instance:
          type: string
          format: uri

  responses:
    Unauthorized:
      description: Authentication required
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
    BadRequest:
      description: Invalid request
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
    NotFound:
      description: Resource not found
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'

  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT

security:
  - BearerAuth: []

Documentation tools

Tool Swagger UI

Interactive API explorer generated from an OpenAPI spec. Developers can try API calls directly from the browser. Bundled with many API frameworks. Hosted at /docs or /swagger.

Tool Redoc

Clean, responsive, three-panel documentation UI. Better reading experience than Swagger UI for reference documentation. Supports nested schemas and markdown descriptions.

Tool Postman Collections

Shareable collections of API requests with examples, variables, and tests. Can be generated from OpenAPI specs. Great for onboarding and manual testing. Export as JSON for version control.

Practice API Changelog

Maintain a changelog documenting every API change: new endpoints, deprecated fields, behavior changes. Include the date, version, and migration instructions. Notify consumers via email or webhook before breaking changes.

Recommendation

Write descriptions for every operation, parameter, and schema property. Include realistic examples. The best API docs read like a tutorial, not a schema dump. Use example fields in your OpenAPI spec — they appear in Swagger UI's "Try it out" feature and in generated SDKs.

API Authentication

Every production API needs authentication (who is calling?) and often authorization (what are they allowed to do?). The choice of auth method depends on the API's audience, security requirements, and operational complexity.

Auth methods compared

Method	Best for	Security	Complexity
API Keys	Simple integrations, internal services	Low (shared secret)	Low
Bearer Tokens (OAuth 2.0)	User-facing APIs, third-party access	High (scoped, expiring)	Medium
Basic Auth	Simple internal tools, CI/CD	Low (base64, not encrypted)	Low
Mutual TLS (mTLS)	Service-to-service, zero-trust	Very high (certificate-based)	High
HMAC Signatures	Webhooks, tamper-proof requests	High (request integrity)	Medium

Header formats

# API Key (in header)
curl -H "X-API-Key: sk_live_abc123def456" \
  https://api.example.com/v1/users

# API Key (in query parameter — less secure, logged in URLs)
curl "https://api.example.com/v1/users?api_key=sk_live_abc123def456"

# Bearer Token (OAuth 2.0 / JWT)
curl -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyXzQyIiwic2NvcGUiOiJyZWFkOnVzZXJzIHdyaXRlOnVzZXJzIiwiZXhwIjoxNzExMDAwMDAwfQ.signature" \
  https://api.example.com/v1/users

# Basic Auth (base64-encoded username:password)
curl -H "Authorization: Basic YWxpY2U6cGFzc3dvcmQxMjM=" \
  https://api.example.com/v1/users
# Decoded: alice:password123

# HMAC Signature (webhook verification)
curl -H "X-Signature-256: sha256=5d7cee6c5e37b...abcdef" \
  -H "X-Timestamp: 1711000000" \
  -d '{"event":"user.created"}' \
  https://api.example.com/webhooks

HMAC signature verification

# Server-side HMAC webhook verification (Python)
import hmac
import hashlib

def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    received = signature.replace('sha256=', '')
    return hmac.compare_digest(expected, received)
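
The curl example above also sends an X-Timestamp header. One common way to use it is to include the timestamp in the signed message and reject old requests, which limits replay attacks. The sketch below assumes a signing scheme of `<timestamp>.<payload>` and a five-minute tolerance; both are illustrative assumptions, not a scheme defined by this guide. It requires Python 3.9 or later for `str.removeprefix`.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # assumed tolerance, tune per deployment

def sign(payload: bytes, timestamp: str, secret: str) -> str:
    """Sign "<timestamp>.<payload>" so the timestamp cannot be altered."""
    message = timestamp.encode() + b"." + payload
    return hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()

def verify_with_timestamp(payload, signature, timestamp, secret, now=None):
    """Reject stale timestamps first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(now - sent_at) > MAX_SKEW_SECONDS:
        return False
    expected = sign(payload, timestamp, secret)
    received = signature.removeprefix("sha256=")
    return hmac.compare_digest(expected, received)

body = b'{"event":"user.created"}'
header = "sha256=" + sign(body, "1711000000", "whsec_example")
```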

Rate limiting per API key

Different API keys can have different rate limits. Free-tier keys might get 100 requests/minute, paid keys 10,000. The API gateway enforces this by looking up the key's plan and applying the corresponding rate limit.

Critical

Never send API keys in query parameters for production APIs — they appear in server logs, browser history, and CDN caches. Always use headers. For Bearer tokens, set short expiration times (15–60 minutes) and implement token refresh. For mTLS, automate certificate rotation.

API Versioning

APIs evolve. Versioning strategies let you introduce changes without breaking existing consumers. The key is to make the version explicit so clients can migrate at their own pace.

Versioning strategies

Strategy	Example	Pros	Cons
URL path	`/v1/users`	Simple, visible, cacheable	URL pollution, hard to share resources across versions
Query parameter	`/users?version=1`	Easy to add, optional	Easy to forget, poor cache behavior
Header	`Accept: application/vnd.api.v1+json`	Clean URLs, proper HTTP semantics	Hidden, harder to test in browser
Content negotiation	`Accept: application/vnd.api+json; version=2`	Most RESTful approach	Rarely used, tooling support varies

Breaking vs non-breaking changes

Safe Non-breaking changes

Adding new endpoints
Adding new optional fields to request/response
Adding new enum values (if clients handle unknown values)
Adding new query parameters
Relaxing validation (accepting wider input)

Breaking Breaking changes

Removing or renaming fields
Changing field types
Removing endpoints
Adding required fields to requests
Changing status codes or error formats
Tightening validation (rejecting previously valid input)
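
Additive changes are only safe when clients tolerate data they do not recognize. The sketch below shows one way a client might do that: it ignores unknown fields and maps unknown enum values to a fallback. The `Role.UNKNOWN` member and `parse_user` function are hypothetical client-side constructs.

```python
from dataclasses import dataclass
from enum import Enum

class Role(Enum):
    ADMIN = "admin"
    USER = "user"
    VIEWER = "viewer"
    UNKNOWN = "unknown"  # client-side fallback, never sent by the server

    @classmethod
    def parse(cls, value):
        """Map unrecognized values to UNKNOWN instead of raising."""
        try:
            return cls(value)
        except ValueError:
            return cls.UNKNOWN

@dataclass
class User:
    id: str
    name: str
    role: Role

def parse_user(payload: dict) -> User:
    """Read only the fields this client needs and ignore everything else."""
    return User(
        id=str(payload["id"]),
        name=payload["name"],
        role=Role.parse(payload.get("role")),
    )

# A newer server added a field and an enum value; the client still works.
user = parse_user({"id": 42, "name": "Alice", "role": "auditor", "team": "platform"})
```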

Deprecation and the Sunset header

RFC 8594 defines the Sunset header to communicate when an API version or endpoint will be removed. RFC 9745 standardizes the Deprecation header field, which signals that a resource has been or will be deprecated. Combine both headers to give consumers advance warning of an endpoint's full lifecycle.

# Response headers for a deprecated endpoint
HTTP/1.1 200 OK
Deprecation: @1719792000
Sunset: Sun, 01 Nov 2026 00:00:00 GMT
Link: <https://api.example.com/v2/users>; rel="successor-version"

# Response body can include deprecation notice
{
  "data": [...],
  "_deprecation": {
    "message": "v1 is deprecated. Migrate to v2 by November 2026.",
    "successor": "https://api.example.com/v2/users",
    "sunset": "2026-11-01T00:00:00Z"
  }
}
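
On the consumer side, these headers can be parsed to warn about upcoming removals. The following sketch uses only the standard library; `parse_lifecycle_headers` and `days_until_sunset` are hypothetical helper names, and the header values are the ones from the example response.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_lifecycle_headers(headers: dict) -> dict:
    """Extract deprecation and sunset instants from response headers."""
    result = {"deprecated_at": None, "sunset_at": None}
    deprecation = headers.get("Deprecation")
    if deprecation and deprecation.startswith("@"):
        # RFC 9745 uses a structured-field date: "@" followed by Unix seconds.
        result["deprecated_at"] = datetime.fromtimestamp(int(deprecation[1:]), tz=timezone.utc)
    sunset = headers.get("Sunset")
    if sunset:
        # RFC 8594 uses the HTTP-date format.
        result["sunset_at"] = parsedate_to_datetime(sunset)
    return result

def days_until_sunset(headers: dict, now: datetime):
    """Return whole days left before removal, or None if no Sunset header is present."""
    sunset_at = parse_lifecycle_headers(headers)["sunset_at"]
    return None if sunset_at is None else (sunset_at - now).days

headers = {
    "Deprecation": "@1719792000",
    "Sunset": "Sun, 01 Nov 2026 00:00:00 GMT",
}
info = parse_lifecycle_headers(headers)
remaining = days_until_sunset(headers, datetime(2026, 10, 2, tzinfo=timezone.utc))
```

A client library could log a warning whenever `remaining` drops below a threshold, so the migration is noticed before the endpoint disappears.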

Warning

URL path versioning (/v1/) is the most common strategy because it's the simplest to implement and reason about. However, avoid creating a new version for every change. Only bump the major version for genuinely breaking changes. Use additive, non-breaking changes within a version as much as possible.

Rate Limiting & Throttling

Rate limiting protects your API from abuse, ensures fair usage among consumers, and prevents any single client from overwhelming your infrastructure. It is a critical production concern for every public and internal API.

Algorithms

Algorithm Token Bucket

A bucket holds tokens up to a maximum capacity. Each request consumes a token. Tokens are refilled at a fixed rate. If the bucket is empty, requests are rejected. Allows short bursts while maintaining an average rate.
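
A minimal single-process sketch of the token bucket follows. The `TokenBucket` class and its injected clock are illustrative assumptions; a production limiter would usually keep this state in a shared store so all gateway instances see the same counts.

```python
import time

class TokenBucket:
    """Token bucket: capacity bounds the burst, refill_rate bounds the average."""
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Consume one token if available; return False when the bucket is empty."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

# Fake clock for a deterministic walk-through.
now = [0.0]
bucket = TokenBucket(capacity=3, refill_rate=1.0, clock=lambda: now[0])

burst = [bucket.allow() for _ in range(4)]       # 3 allowed, then empty
now[0] += 2.0                                    # 2 seconds pass, 2 tokens refill
after_wait = [bucket.allow() for _ in range(3)]  # 2 allowed, then empty again
```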

Algorithm Leaky Bucket

Requests enter a queue (bucket) and are processed at a fixed rate. Excess requests overflow and are dropped. Produces a smooth, constant output rate. No bursting.

Algorithm Sliding Window

Tracks requests in a rolling time window. Combines the simplicity of fixed windows with the accuracy of per-request tracking. Prevents the boundary spike problem of fixed windows.
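
One variant of this idea is the sliding window log, which stores a timestamp per accepted request and discards those older than the window. The sketch below assumes that variant; `SlidingWindowLimiter` is a hypothetical name, and the caller supplies the current time to keep the example deterministic.

```python
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any rolling window of `window_seconds`."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def allow(self, now):
        # Drop timestamps that have fallen out of the rolling window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        if len(self._timestamps) < self.limit:
            self._timestamps.append(now)
            return True
        return False

limiter = SlidingWindowLimiter(limit=100, window_seconds=60)

late = sum(limiter.allow(59.5) for _ in range(100))    # all accepted
early = sum(limiter.allow(60.5) for _ in range(100))   # still inside the same rolling window
later = sum(limiter.allow(119.6) for _ in range(100))  # earlier requests have expired
```

The trade-off is memory: the log keeps one entry per accepted request, which is why some implementations approximate it with weighted counters instead.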

Algorithm Fixed Window

Counts requests in discrete time windows (e.g., per minute). Simple to implement but vulnerable to boundary spikes: a client can make 2x the limit by timing requests at the window boundary.
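
The boundary spike is easy to reproduce. In the sketch below, a hypothetical `FixedWindowLimiter` with a limit of 100 per 60 seconds accepts 200 requests that arrive within about one second of each other, because they fall on either side of a window boundary.

```python
class FixedWindowLimiter:
    """Count requests per discrete window of `window_seconds`."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._window = None
        self._count = 0

    def allow(self, now):
        window = int(now // self.window_seconds)
        if window != self._window:
            # A new window started: reset the counter.
            self._window, self._count = window, 0
        if self._count < self.limit:
            self._count += 1
            return True
        return False

limiter = FixedWindowLimiter(limit=100, window_seconds=60)

# 100 requests at the end of one window, 100 at the start of the next.
late = sum(limiter.allow(59.5) for _ in range(100))
early = sum(limiter.allow(60.5) for _ in range(100))
accepted_in_one_second = late + early
```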

Rate limit headers

# Common (de facto) rate limit response headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000          # Max requests per window
X-RateLimit-Remaining: 847       # Requests remaining in current window
X-RateLimit-Reset: 1711000060    # Unix timestamp when window resets

# When rate limit is exceeded
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711000060
Retry-After: 30                  # Seconds until the client should retry
Content-Type: application/problem+json

{
  "type": "https://api.example.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded 1000 requests per minute. Retry after 30 seconds."
}

Per-user vs per-endpoint limits

Scope	Example	Purpose
Per API key	1000 req/min per key	Fair usage across consumers
Per endpoint	100 req/min on `POST /users`	Protect expensive operations
Per IP	50 req/min per IP	Brute-force protection (login endpoints)
Global	50,000 req/min total	Infrastructure protection

API quotas

Quotas are longer-term limits: "10,000 API calls per day" or "1,000,000 per month." They are distinct from rate limits (which are per-second or per-minute). Quotas map to billing tiers. When a quota is exhausted, return 429 with a Retry-After header set to the quota reset time.

Recommendation

Always return rate limit headers on every response, not just when limits are exceeded. This lets clients implement proactive backoff before hitting the limit. Use the Retry-After header (RFC 9110, Section 10.2.3) to tell clients exactly when to retry.
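
A client can use these headers to slow down before it is rejected. The sketch below is one possible policy, assuming the X-RateLimit-* header names shown above and a hypothetical `low_watermark` threshold; it spreads the remaining calls over the rest of the window once the budget is nearly spent.

```python
def seconds_to_wait(headers: dict, now: float, low_watermark: int = 10) -> float:
    """Decide how long a client should pause before its next request."""
    # An explicit Retry-After (in seconds) takes priority.
    retry_after = headers.get("Retry-After")
    if retry_after is not None and retry_after.isdigit():
        return float(retry_after)

    remaining = int(headers.get("X-RateLimit-Remaining", low_watermark + 1))
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    if remaining > low_watermark:
        return 0.0
    # Budget is nearly spent: spread the remaining calls over the rest of the window.
    window_left = max(0.0, reset_at - now)
    return window_left / (remaining + 1)

plenty = seconds_to_wait(
    {"X-RateLimit-Remaining": "847", "X-RateLimit-Reset": "1711000060"}, now=1711000000
)
nearly_out = seconds_to_wait(
    {"X-RateLimit-Remaining": "4", "X-RateLimit-Reset": "1711000060"}, now=1711000000
)
rejected = seconds_to_wait({"Retry-After": "30"}, now=1711000000)
```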

Error Handling

Consistent, informative error responses are essential for API usability. Developers spend more time debugging errors than reading success responses. Use a standardized error format across your entire API surface.

RFC 9457 Problem Details

RFC 9457 (which obsoletes RFC 7807) defines a standard JSON format for HTTP API error responses. It provides a consistent structure that clients can parse programmatically. RFC 9457 is fully backward-compatible with RFC 7807 and adds a common problem type registry and clearer guidance on representing multiple problems.

{
  "type": "https://api.example.com/errors/validation-failed",
  "title": "Validation Failed",
  "status": 422,
  "detail": "The request body contains invalid fields.",
  "instance": "/api/v1/users",
  "errors": [
    {
      "field": "email",
      "code": "invalid_format",
      "message": "Must be a valid email address"
    },
    {
      "field": "name",
      "code": "too_short",
      "message": "Must be at least 1 character",
      "meta": {
        "min_length": 1,
        "actual_length": 0
      }
    }
  ],
  "request_id": "req-abc-123"
}

Error response structure

Field	Type	Purpose
`type`	URI	A URL identifying the error type (can link to documentation)
`title`	String	Short, human-readable summary
`status`	Integer	HTTP status code (matches the response status)
`detail`	String	Detailed, human-readable explanation specific to this occurrence
`instance`	URI	URI reference identifying the specific occurrence
`errors`	Array	Field-level validation errors (extension)
`request_id`	String	Correlation ID for debugging (extension)
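
On the server side, a small helper can keep every error response in this shape. The sketch below is framework-agnostic and returns a plain tuple; `problem_response` is a hypothetical helper, while `application/problem+json` and the `about:blank` default type come from RFC 9457.

```python
import json

PROBLEM_CONTENT_TYPE = "application/problem+json"

def problem_response(status, title, type_uri="about:blank", detail=None, instance=None, **extensions):
    """Build (status, headers, body) for an RFC 9457 problem details response."""
    problem = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        problem["detail"] = detail
    if instance is not None:
        problem["instance"] = instance
    # Extension members such as errors or request_id sit alongside the standard ones.
    problem.update(extensions)
    headers = {"Content-Type": PROBLEM_CONTENT_TYPE}
    return status, headers, json.dumps(problem)

status, headers, body = problem_response(
    422,
    "Validation Failed",
    type_uri="https://api.example.com/errors/validation-failed",
    detail="The request body contains invalid fields.",
    instance="/api/v1/users",
    errors=[{"field": "email", "code": "invalid_format"}],
    request_id="req-abc-123",
)
```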

Retry-safe vs terminal errors

Retryable Safe to retry

429 Too Many Requests — retry after backoff
500 Internal Server Error — transient failures
502 Bad Gateway — upstream temporarily down
503 Service Unavailable — server overloaded
504 Gateway Timeout — upstream timed out

Terminal Do not retry

400 Bad Request — fix the request first
401 Unauthorized — refresh the token, then retry
403 Forbidden — insufficient permissions
404 Not Found — resource does not exist
409 Conflict — resolve the conflict first
422 Unprocessable Entity — fix validation errors

Client-side error handling

// Robust API client with retry logic
async function apiRequest(url, options = {}, retries = 3) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    const response = await fetch(url, {
      ...options,
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${getToken()}`,
        ...options.headers,
      },
    });

    if (response.ok) return response.json();

    const error = await response.json();

    // Terminal errors: don't retry
    if ([400, 401, 403, 404, 409, 422].includes(response.status)) {
      throw new ApiError(error);
    }

    // Rate limited: wait and retry
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After') || 5;
      await sleep(retryAfter * 1000);
      continue;
    }

    // Server error: exponential backoff
    if (response.status >= 500 && attempt < retries) {
      await sleep(Math.pow(2, attempt) * 1000);
      continue;
    }

    throw new ApiError(error);
  }
}

Key concept

Always include a request_id in error responses. This lets developers correlate their failed request with your server logs. Use the same ID in your structured logging, distributed traces, and support tickets. A single ID that threads through the entire request lifecycle saves hours of debugging.
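
A minimal sketch of that threading follows, assuming the ID travels in an `X-Request-ID` header (a common convention, not something this guide mandates). The helper names are hypothetical.

```python
import uuid

def ensure_request_id(headers: dict) -> str:
    """Reuse the caller's X-Request-ID when present, otherwise mint a new one."""
    incoming = (headers.get("X-Request-ID") or "").strip()
    return incoming or f"req-{uuid.uuid4().hex}"

def with_request_id(problem: dict, request_id: str) -> dict:
    """Attach the correlation ID to an error body without mutating the input."""
    return {**problem, "request_id": request_id}

def log_fields(request_id: str, message: str) -> dict:
    """Structured log record that carries the same ID as the response."""
    return {"request_id": request_id, "message": message}

rid = ensure_request_id({"X-Request-ID": "req-abc-123"})
generated = ensure_request_id({})
error_body = with_request_id({"title": "Not Found", "status": 404}, rid)
record = log_fields(rid, "user lookup failed")
```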

API Gateway

An API gateway sits between clients and your backend services. It handles cross-cutting concerns — authentication, rate limiting, routing, transformation, caching, and logging — so your services don't have to. Every production API deployment should have a gateway layer.

Gateway tools

Tool	Type	Key features
Kong	Open source / Enterprise	Plugin ecosystem, Lua/Go extensibility, DB-less mode, Kubernetes-native
Apache APISIX	Open source	High performance (etcd-based), dynamic routing, Wasm plugins
AWS API Gateway	Managed	Lambda integration, WebSocket support, usage plans, no infrastructure to manage
Envoy	Open source proxy	L4/L7 proxy, xDS control plane, gRPC-native, often used with Istio or Consul Connect
Traefik	Open source	Auto-discovery, Let's Encrypt, Docker/K8s native, middleware chains

Gateway patterns

Pattern Edge Gateway

A single gateway at the edge of your network handling all external traffic. Handles TLS termination, auth, rate limiting, and routes to internal services. The most common pattern.

Pattern Micro-Gateway

Per-team or per-service gateways. Each team manages their own gateway config. Reduces blast radius and enables independent deployment. Used in large organizations with many teams.

Pattern BFF (Backend for Frontend)

A dedicated gateway per client type (web, mobile, IoT). Each BFF aggregates and transforms data from backend services into the shape that specific client needs. Eliminates the "one size fits all" API problem. The mobile BFF might return minimal payloads while the web BFF returns richer data. Each BFF is owned by the front-end team that consumes it.
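
The difference between two BFFs can be as small as two shaping functions over the same backend data. The sketch below uses hard-coded dictionaries in place of real service calls; `mobile_profile` and `web_profile` are hypothetical names.

```python
# Hypothetical backend responses; in practice these come from service calls.
USER = {"id": "42", "name": "Alice", "email": "alice@example.com", "role": "admin"}
ORDERS = [
    {"id": "o1", "total": 25.0, "status": "SHIPPED", "items": ["a", "b"]},
    {"id": "o2", "total": 10.0, "status": "PENDING", "items": ["c"]},
]

def mobile_profile(user, orders):
    """Mobile BFF: minimal payload with a precomputed summary."""
    return {"id": user["id"], "name": user["name"], "order_count": len(orders)}

def web_profile(user, orders):
    """Web BFF: richer payload with the full order list."""
    return {**user, "orders": orders, "order_total": sum(o["total"] for o in orders)}

mobile = mobile_profile(USER, ORDERS)
web = web_profile(USER, ORDERS)
```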

Recommendation

Start with a single edge gateway. Only introduce BFF or micro-gateway patterns when you have distinct client types with genuinely different data needs, or when the edge gateway becomes a bottleneck for team autonomy. Complexity has a cost.

Testing & Monitoring

APIs need testing at multiple levels: contract validation, integration correctness, and performance under load. In production, APIs need continuous monitoring for latency, error rates, and availability.

Contract testing

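Pact is a widely used consumer-driven contract testing framework. Each consumer records the requests it sends and the responses it expects while running against a mock provider. The provider build then replays those recorded interactions against the real provider, so neither side needs a running instance of the other.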

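// Pact consumer test (JavaScript, Jest; uses the global fetch in Node 18+)
const { Pact, Matchers } = require('@pact-foundation/pact');

// Mock provider that records the interactions the consumer expects
const provider = new Pact({
  consumer: 'WebApp',
  provider: 'UserAPI',
});

describe('User API Contract', () => {
  beforeAll(() => provider.setup());
  // Fail the test if a registered interaction was not exercised as described
  afterEach(() => provider.verify());
  // Write the pact file and shut down the mock server
  afterAll(() => provider.finalize());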

  it('returns a user by ID', async () => {
    await provider.addInteraction({
      state: 'user 42 exists',
      uponReceiving: 'a request for user 42',
      withRequest: {
        method: 'GET',
        path: '/api/v1/users/42',
        headers: { Accept: 'application/json' },
      },
      willRespondWith: {
        status: 200,
        headers: { 'Content-Type': 'application/json' },
        body: {
          id: '42',
          name: Matchers.string('Alice'),
          email: Matchers.email(),
        },
      },
    });

    const response = await fetch(
      `${provider.mockService.baseUrl}/api/v1/users/42`,
      { headers: { Accept: 'application/json' } }
    );
    expect(response.status).toBe(200);
  });
});

Load testing

// k6 load test script
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },   // Ramp up to 50 users
    { duration: '3m', target: 50 },   // Stay at 50
    { duration: '1m', target: 200 },  // Spike to 200
    { duration: '2m', target: 200 },  // Stay at 200
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95) < 500'],  // 95th percentile < 500ms
    http_req_failed: ['rate < 0.01'],     // Error rate < 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/v1/users', {
    headers: { Authorization: `Bearer ${__ENV.API_TOKEN}` },
  });

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'has users': (r) => JSON.parse(r.body).data.length > 0,
  });

  sleep(1);
}

Monitoring and SLAs

Metric Latency Percentiles

Track p50, p95, and p99 latencies per endpoint. Averages hide outliers. A p95 of 200ms means 5% of requests take longer than 200ms — those 5% might be your most important customers.
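
The following sketch shows how an average can look healthy while the tail is not. It uses a nearest-rank percentile over a made-up sample of 100 request durations; the `percentile` helper and the data are illustrative assumptions, and production systems usually compute percentiles from histograms rather than raw samples.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 100 hypothetical request durations in ms: 94 fast, 6 very slow.
latencies_ms = [100] * 94 + [3000] * 6

mean = statistics.mean(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

For this sample the mean is 274 ms and p50 is 100 ms, while p95 and p99 are both 3000 ms. A dashboard that shows only the mean would miss that 6% of requests take three seconds.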

Metric Error Rate

Track 4xx and 5xx rates separately. A spike in 5xx means your service is broken. A spike in 4xx might mean a client released a buggy update, or your rate limits are too aggressive.
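
A minimal sketch of keeping the two rates apart, assuming you have a batch of response status codes for a time window. The function name `error_rates` is hypothetical.

```python
from collections import Counter


def error_rates(status_codes):
    """Return (client_error_rate, server_error_rate) as fractions of all requests."""
    if not status_codes:
        return 0.0, 0.0
    # Group by status class: 2 for 2xx, 4 for 4xx, 5 for 5xx, and so on.
    classes = Counter(code // 100 for code in status_codes)
    total = len(status_codes)
    return classes[4] / total, classes[5] / total
```

Alerting on the two values separately lets a 5xx alert page the service owner while a 4xx spike opens a lower-priority investigation into client behavior or rate limits.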

Practice Health Check Endpoints

Expose GET /health (basic liveness) and GET /ready (full readiness including dependencies). Health checks should be fast, unauthenticated, and not cached. Use them for load balancer checks and Kubernetes probes.
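
The split between the two endpoints can be sketched without committing to a web framework. The functions below return a status code and a JSON-serializable body; `liveness`, `readiness`, and the shape of `checks` are assumptions for illustration.

```python
def liveness():
    """/health: the process is up and can answer. No dependency calls."""
    return 200, {"status": "ok"}


def readiness(checks):
    """/ready: run every dependency check; any failure yields 503.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is healthy. Exceptions count as
    failures so a broken check cannot crash the endpoint.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failing"
        except Exception:
            results[name] = "failing"
    ready = all(state == "ok" for state in results.values())
    body = {"status": "ready" if ready else "not_ready", "checks": results}
    return (200 if ready else 503), body
```

In a real service each check should have a short timeout, since a readiness probe that hangs on a slow dependency defeats the "fast" requirement above.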

Practice Synthetic Monitoring

Run automated API calls from external locations on a schedule (every 1–5 minutes) so that outages are detected before real users report them. Tools: Datadog Synthetic Monitoring, Pingdom, Checkly, or a simple cron + curl script.
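
As a rough sketch of what a single synthetic check does, the function below makes one GET request and reports whether it returned 200 within a latency budget. The name `probe`, the 500 ms budget, and the injectable `opener` parameter (there so the function can be exercised without a network) are assumptions; scheduling and alerting are left to cron or a monitoring tool.

```python
import time
import urllib.request


def probe(url, timeout_s=5.0, max_latency_ms=500, opener=urllib.request.urlopen):
    """Make one synthetic GET and report whether it met expectations."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as response:
            status = response.status
    except Exception as exc:
        # Timeouts, DNS failures, refused connections, and HTTP error
        # statuses raised by urlopen all count as a failed probe.
        return {"url": url, "ok": False, "error": type(exc).__name__}
    latency_ms = (time.monotonic() - started) * 1000
    return {
        "url": url,
        "ok": status == 200 and latency_ms <= max_latency_ms,
        "status": status,
        "latency_ms": round(latency_ms, 1),
    }
```

Running the same probe from several regions helps separate a real outage from a local network problem at one probe location.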

OpenTelemetry for API tracing

# Python: instrument a Flask API with OpenTelemetry
from flask import Flask
from opentelemetry import trace
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Setup tracer
provider = TracerProvider()
exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317")
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# Auto-instrument Flask and outgoing HTTP requests
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()

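# Every incoming request now generates a server span with:
# - HTTP method, route, and status code
# - Child spans for outgoing `requests` calls, with trace context
#   propagated to downstream services
# The exact attribute set depends on the instrumentation version.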

Key concept

Combine the "four golden signals" for API monitoring: latency (how fast), traffic (how much), errors (how many failures), and saturation (how full is the system). Set SLOs (Service Level Objectives) for each: e.g., "p99 latency < 1s, error rate < 0.1%, availability > 99.9%."
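
An SLO becomes easier to act on when it is converted into an error budget. The sketch below does that arithmetic for the availability and error-rate examples above; the function names and the 30-day window are assumptions, and both functions expect an SLO strictly below 100%.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of allowed unavailability in the window for an availability SLO."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent, total_requests, failed_requests):
    """Fraction of a request-based error budget still unspent.

    Returns a negative number once the budget has been exceeded.
    """
    allowed_failures = total_requests * (100 - slo_percent) / 100
    return 1 - failed_requests / allowed_failures
```

For a 99.9% availability target over 30 days, `error_budget_minutes(99.9)` comes to roughly 43.2 minutes of downtime. With the same target, 250 failures out of 1,000,000 requests leaves about three quarters of the budget.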

Production Checklist

Define the API contract first — write the OpenAPI spec or .proto file before writing implementation code. Review the contract with consumers.
Use HTTPS everywhere — never serve API traffic over plain HTTP, even internally. TLS is non-negotiable. Use TLS 1.3 where possible.
Authenticate every request — choose an auth strategy (OAuth 2.0 for external, mTLS or API keys for internal). Validate tokens at the gateway.
Implement rate limiting — protect every endpoint with rate limits. Return standard headers (X-RateLimit-*, Retry-After). Set per-key and per-endpoint limits.
Version your API — use URL path versioning (/v1/) for simplicity. Document breaking vs non-breaking changes. Use the Sunset header for deprecations.
Standardize error responses — adopt RFC 9457 Problem Details (successor to RFC 7807). Include request_id in every error. Distinguish retryable from terminal errors.
Document every endpoint — maintain a complete OpenAPI spec with descriptions, examples, and schemas. Serve interactive docs via Swagger UI or Redoc.
Add health check endpoints — implement /health (liveness) and /ready (readiness). Use them for load balancers, Kubernetes probes, and synthetic monitoring.
Implement request validation — validate all inputs (types, ranges, formats, required fields) at the boundary. Reject invalid requests early with clear field-level error messages.
Set up monitoring and alerting — track latency percentiles (p50, p95, p99), error rates, and throughput per endpoint. Alert on SLO breaches. Use OpenTelemetry for distributed tracing.
Write contract tests — use Pact or similar tools to verify that API changes don't break consumers. Run contract tests in CI/CD before every deployment.
Load test before launch — use k6 or wrk to simulate expected traffic patterns. Identify bottlenecks, establish baseline performance, and set autoscaling thresholds.
Use an API gateway — deploy Kong, APISIX, or a cloud-managed gateway for centralized auth, rate limiting, logging, and routing. Don't reimplement these in every service.
Log structured request data — log method, path, status code, latency, request_id, and user_id for every request. Use structured JSON logging. Correlate with traces.
Plan for deprecation — publish a deprecation policy. Give consumers at least 6 months notice before removing any endpoint or field. Monitor deprecated endpoint usage before removal.