API
API design, documentation, authentication, and lifecycle management
Overview
An API (Application Programming Interface) is a contract that defines how software components communicate. APIs are the backbone of modern distributed systems — they connect microservices, expose functionality to external developers, power mobile apps, and enable third-party integrations. Designing and operating APIs well is a core infrastructure discipline.
API-first design means treating your API as a first-class product. The API contract is defined before implementation begins. This enables parallel development (front-end and back-end teams work simultaneously), ensures consistency, and produces better developer experiences. The API is the product.
Type REST
Resource-oriented architecture over HTTP. Uses standard methods (GET, POST, PUT, DELETE) and status codes. The most widely adopted API style. Stateless, cacheable, and simple to understand. Best for CRUD-heavy, public-facing APIs.
Type GraphQL
A query language for APIs. Clients request exactly the data they need in a single request. Eliminates over-fetching and under-fetching. Schema-typed and introspectable. Best for complex data graphs and mobile clients with bandwidth constraints.
Type gRPC
Google's high-performance RPC framework using Protocol Buffers for serialization. Binary protocol over HTTP/2. Supports streaming. Best for internal service-to-service communication where performance matters.
Type SOAP
XML-based protocol with WSDL contracts. Heavy, verbose, but highly standardized with built-in WS-Security. Still used in enterprise and financial systems. Generally avoided for new projects due to complexity.
Scope Internal APIs
APIs consumed within your organization. Higher trust, faster iteration, less strict versioning. Used for microservice communication. Can use gRPC or REST. Security still matters (zero-trust), but documentation standards may be more relaxed.
Scope External / Public APIs
APIs consumed by third-party developers. Require rigorous documentation, stable versioning, rate limiting, and developer onboarding (API keys, SDKs, sandbox environments). Breaking changes are extremely costly.
REST
REST (Representational State Transfer) is an architectural style for building APIs over HTTP. RESTful APIs model the domain as resources (nouns) identified by URLs, manipulated via standard HTTP methods (verbs), and represented in formats like JSON. REST is stateless — each request contains all the information needed to process it.
HTTP methods
| Method | Purpose | Idempotent | Safe |
|---|---|---|---|
| GET | Retrieve a resource or collection | Yes | Yes |
| POST | Create a new resource | No | No |
| PUT | Replace a resource entirely | Yes | No |
| PATCH | Partially update a resource | No* | No |
| DELETE | Remove a resource | Yes | No |
*PATCH can be made idempotent depending on the implementation, but the spec does not guarantee it.
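The distinction can be made concrete: a PATCH that *sets* a field is idempotent, while one that applies a *relative* change is not. A minimal sketch (the resource dict and field names are illustrative):

```python
# Idempotency depends on the patch semantics, not on the verb itself.
user = {"login_count": 5, "role": "admin"}

def patch_set(resource, field, value):
    # "Set" semantics: applying twice gives the same result (idempotent)
    resource[field] = value

def patch_increment(resource, field, delta):
    # "Relative" semantics: applying twice doubles the effect (not idempotent)
    resource[field] += delta

patch_set(user, "role", "viewer")
patch_set(user, "role", "viewer")        # second call changes nothing

patch_increment(user, "login_count", 1)
patch_increment(user, "login_count", 1)  # second call changes state again
print(user)  # {'login_count': 7, 'role': 'viewer'}
```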
Resource naming conventions
- Use nouns, not verbs: `/users`, not `/getUsers`
- Use plural names: `/users`, `/orders`, `/products`
- Nest resources to show relationships: `/users/42/orders`
- Use kebab-case for multi-word resources: `/order-items`
- Use query parameters for filtering, sorting, pagination: `/users?status=active&sort=created_at&limit=20`
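Query strings like the one above are best built with the standard library rather than by string concatenation, which handles escaping for you. A sketch using Python's urllib (the endpoint and parameter names are illustrative):

```python
from urllib.parse import urlencode

# Illustrative filter/sort/pagination parameters
params = {"status": "active", "sort": "created_at", "limit": 20}
url = "/users?" + urlencode(params)
print(url)  # /users?status=active&sort=created_at&limit=20
```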
Status codes
2xx Success
- `200 OK` — general success (GET, PUT, PATCH)
- `201 Created` — resource created (POST); include a `Location` header
- `204 No Content` — success with no response body (DELETE)
3xx Redirection
- `301 Moved Permanently` — resource URL changed permanently
- `304 Not Modified` — client cache is still valid (ETag/If-None-Match)
4xx Client Error
- `400 Bad Request` — malformed request syntax or validation error
- `401 Unauthorized` — missing or invalid authentication
- `403 Forbidden` — authenticated but not authorized
- `404 Not Found` — resource does not exist
- `409 Conflict` — state conflict (e.g., duplicate creation)
- `422 Unprocessable Entity` — valid syntax but semantic errors
- `429 Too Many Requests` — rate limit exceeded
5xx Server Error
- `500 Internal Server Error` — unexpected server failure
- `502 Bad Gateway` — upstream service returned invalid response
- `503 Service Unavailable` — server overloaded or in maintenance
- `504 Gateway Timeout` — upstream service timed out
Example: RESTful resource operations
# List users (with pagination)
GET /api/v1/users?page=1&limit=20
Accept: application/json
# Get a single user
GET /api/v1/users/42
Accept: application/json
# Create a user
POST /api/v1/users
Content-Type: application/json
{
"name": "Alice",
"email": "alice@example.com",
"role": "admin"
}
# Response: 201 Created
# Location: /api/v1/users/43
# Update a user (partial)
PATCH /api/v1/users/42
Content-Type: application/json
{
"role": "viewer"
}
# Delete a user
DELETE /api/v1/users/42
# Response: 204 No Content
Content negotiation
Clients specify their preferred response format via the Accept header, and their request body format via Content-Type. A well-designed API respects these headers.
# Client requests JSON
Accept: application/json
# Client sends JSON
Content-Type: application/json
# Client requests XML (if supported)
Accept: application/xml
# Content negotiation with versioning
Accept: application/vnd.myapi.v2+json
HATEOAS
Hypermedia As The Engine Of Application State. The API response includes links to related actions and resources, allowing clients to discover the API dynamically rather than hardcoding URLs.
{
"id": 42,
"name": "Alice",
"email": "alice@example.com",
"_links": {
"self": { "href": "/api/v1/users/42" },
"orders": { "href": "/api/v1/users/42/orders" },
"update": { "href": "/api/v1/users/42", "method": "PATCH" },
"delete": { "href": "/api/v1/users/42", "method": "DELETE" }
}
}
Statelessness is REST's most important constraint. The server never stores client session state between requests. Every request must include all necessary context (authentication token, pagination cursor, etc.). This makes REST APIs horizontally scalable — any server in the pool can handle any request.
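One way to see the constraint: the server derives everything from the request itself. A toy stateless handler, where the pagination cursor travels with each request instead of living in a server-side session (all names are illustrative):

```python
USERS = [f"user-{i}" for i in range(1, 8)]

def handle_list_users(request: dict) -> dict:
    """Stateless handler: cursor and page size come from the request,
    so any server replica can process it."""
    cursor = request.get("cursor", 0)
    limit = request.get("limit", 3)
    page = USERS[cursor:cursor + limit]
    next_cursor = cursor + limit if cursor + limit < len(USERS) else None
    return {"data": page, "next_cursor": next_cursor}

# The client echoes the cursor back on each request
r1 = handle_list_users({"limit": 3})
r2 = handle_list_users({"limit": 3, "cursor": r1["next_cursor"]})
print(r2["data"])  # ['user-4', 'user-5', 'user-6']
```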
GraphQL
GraphQL is a query language and runtime for APIs, developed by Facebook in 2012 and open-sourced in 2015. Instead of multiple endpoints returning fixed data shapes, GraphQL exposes a single endpoint where clients specify exactly what fields they need. The server returns precisely that data — nothing more, nothing less.
Schema definition
A GraphQL API is defined by its schema, written in SDL (Schema Definition Language). The schema defines types, queries (reads), mutations (writes), and subscriptions (real-time).
# Schema Definition Language (SDL)
# DateTime is not built in; custom scalars must be declared
scalar DateTime
type User {
id: ID!
name: String!
email: String!
role: Role!
orders: [Order!]!
createdAt: DateTime!
}
type Order {
id: ID!
total: Float!
status: OrderStatus!
items: [OrderItem!]!
}
enum Role {
ADMIN
USER
VIEWER
}
enum OrderStatus {
PENDING
SHIPPED
DELIVERED
CANCELLED
}
type Query {
user(id: ID!): User
users(limit: Int = 20, offset: Int = 0): [User!]!
order(id: ID!): Order
}
type Mutation {
createUser(input: CreateUserInput!): User!
updateUser(id: ID!, input: UpdateUserInput!): User!
deleteUser(id: ID!): Boolean!
}
input CreateUserInput {
name: String!
email: String!
role: Role!
}
input UpdateUserInput {
name: String
email: String
role: Role
}
type Subscription {
orderStatusChanged(orderId: ID!): Order!
}
Queries and mutations
# Query: fetch exactly the fields you need
query GetUser {
user(id: "42") {
name
email
orders {
id
total
status
}
}
}
# Mutation: create a user
mutation CreateUser {
createUser(input: {
name: "Alice"
email: "alice@example.com"
role: ADMIN
}) {
id
name
}
}
# Using variables (preferred for production)
query GetUser($userId: ID!) {
user(id: $userId) {
name
email
}
}
# Variables: { "userId": "42" }
# Fragments: reuse field selections
fragment UserFields on User {
id
name
email
role
}
query GetUsers {
users(limit: 10) {
...UserFields
orders {
id
total
}
}
}
N+1 problem and DataLoader
When resolving nested fields, a naive implementation executes one database query per parent item. Fetching 20 users with their orders means 1 query for users + 20 queries for orders = 21 total. DataLoader solves this by batching and caching: all 20 order queries are combined into a single SELECT ... WHERE user_id IN (...) query.
// DataLoader batching example (Node.js)
const DataLoader = require('dataloader');
const orderLoader = new DataLoader(async (userIds) => {
// Single query: SELECT * FROM orders WHERE user_id IN (1, 2, 3, ...)
const orders = await db.query(
'SELECT * FROM orders WHERE user_id = ANY($1)',
[userIds]
);
// Map results back to input order
return userIds.map(id => orders.filter(o => o.user_id === id));
});
// In the User resolver
const resolvers = {
User: {
orders: (user) => orderLoader.load(user.id),
},
};
Introspection
GraphQL APIs are self-documenting. Clients can query the schema itself using introspection queries. This powers tools like GraphiQL and Apollo Studio.
# Introspection: list all types
{
__schema {
types {
name
kind
}
}
}
# Introspection: get fields of a type
{
__type(name: "User") {
fields {
name
type { name kind }
}
}
}
When to use GraphQL vs REST
GraphQL shines
- Complex data relationships (graphs)
- Multiple client types (web, mobile, IoT) needing different data shapes
- Reducing number of HTTP round trips
- Rapid front-end iteration without back-end changes
- Teams with strong schema-first culture
REST is better
- Simple CRUD resources with well-defined endpoints
- Heavy use of HTTP caching (CDN, ETags)
- File uploads and downloads
- Public APIs (REST is more universally understood)
- When you want strong HTTP semantics (status codes, methods)
Disable introspection in production for public-facing GraphQL APIs. It exposes your entire schema to potential attackers. Use persisted queries (allowlisted query strings) to prevent arbitrary query execution and mitigate denial-of-service via deeply nested queries.
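A crude depth check illustrates the mitigation: reject a query whose nesting exceeds a limit before executing it. This is a naive sketch based on brace counting; a real implementation would walk the parsed AST (e.g. with graphql-core) and also ignore braces inside string literals:

```python
def query_depth(query: str) -> int:
    """Naive maximum brace depth of a GraphQL query string."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth

MAX_DEPTH = 5
q = "{ user(id: 42) { orders { items { product { supplier { name } } } } } }"
if query_depth(q) > MAX_DEPTH:
    print("rejected: query too deep")
```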
gRPC
gRPC is a high-performance, open-source RPC (Remote Procedure Call) framework developed by Google. It uses Protocol Buffers (protobuf) as its interface definition language and serialization format, runs over HTTP/2, and supports multiple programming languages via code generation. gRPC is the standard for internal service-to-service communication in high-performance systems.
Service definition with .proto files
// user_service.proto
syntax = "proto3";
package user.v1;
import "google/protobuf/timestamp.proto";
option go_package = "github.com/myorg/api/user/v1";
// Service definition
service UserService {
// Unary RPC
rpc GetUser(GetUserRequest) returns (GetUserResponse);
rpc CreateUser(CreateUserRequest) returns (CreateUserResponse);
rpc DeleteUser(DeleteUserRequest) returns (DeleteUserResponse);
// Server streaming: server sends multiple responses
rpc ListUsers(ListUsersRequest) returns (stream User);
// Client streaming: client sends multiple requests
rpc UploadUsers(stream CreateUserRequest) returns (UploadUsersResponse);
// Bidirectional streaming
rpc SyncUsers(stream SyncRequest) returns (stream SyncResponse);
}
message User {
string id = 1;
string name = 2;
string email = 3;
Role role = 4;
google.protobuf.Timestamp created_at = 5;
}
enum Role {
ROLE_UNSPECIFIED = 0;
ROLE_ADMIN = 1;
ROLE_USER = 2;
ROLE_VIEWER = 3;
}
message GetUserRequest {
string id = 1;
}
message GetUserResponse {
User user = 1;
}
message CreateUserRequest {
string name = 1;
string email = 2;
Role role = 3;
}
message CreateUserResponse {
User user = 1;
}
message DeleteUserRequest {
string id = 1;
}
message DeleteUserResponse {}
message ListUsersRequest {
int32 page_size = 1;
string page_token = 2;
}
message UploadUsersResponse {
int32 created_count = 1;
}
message SyncRequest {
User user = 1;
}
message SyncResponse {
string status = 1;
User user = 2;
}
Streaming types
| Type | Client | Server | Use case |
|---|---|---|---|
| Unary | 1 request | 1 response | Standard request/response (like REST) |
| Server streaming | 1 request | N responses | Large result sets, real-time feeds |
| Client streaming | N requests | 1 response | File upload, batch operations |
| Bidirectional | N requests | N responses | Chat, live sync, interactive sessions |
Code generation
# Install protoc compiler and language plugins
# Generate Go code
protoc --go_out=. --go-grpc_out=. user_service.proto
# Generate Python code
python -m grpc_tools.protoc -I. \
--python_out=. --grpc_python_out=. user_service.proto
# Generate TypeScript code (using ts-proto)
protoc --plugin=./node_modules/.bin/protoc-gen-ts_proto \
--ts_proto_out=. user_service.proto
Metadata and deadlines
gRPC supports metadata (key-value pairs sent as HTTP/2 headers) for passing auth tokens, trace IDs, and other context. Deadlines propagate timeout expectations across service boundaries — if a deadline is exceeded, the call is cancelled.
# Python gRPC client with metadata and deadline
import grpc
from user.v1 import user_service_pb2, user_service_pb2_grpc
channel = grpc.insecure_channel('localhost:50051')
stub = user_service_pb2_grpc.UserServiceStub(channel)
# Set metadata (auth token, trace ID)
metadata = [
('authorization', 'Bearer eyJhbG...'),
('x-request-id', 'req-abc-123'),
]
# Set deadline (5 seconds from now)
response = stub.GetUser(
user_service_pb2.GetUserRequest(id='42'),
metadata=metadata,
timeout=5.0, # deadline in seconds
)
print(f"User: {response.user.name}")
gRPC uses HTTP/2 which provides multiplexing (multiple RPCs over a single TCP connection), header compression (HPACK), and bidirectional streaming. This makes gRPC significantly more efficient than REST over HTTP/1.1 for high-throughput internal communication. The binary protobuf encoding is 3–10x smaller than JSON for the same data.
API Documentation
Good API documentation is the difference between an API that gets adopted and one that gets abandoned. The OpenAPI Specification (formerly Swagger) is the industry standard for documenting REST APIs. It provides a machine-readable contract that powers documentation UIs, client SDKs, and testing tools.
Sample OpenAPI spec
# openapi.yaml
openapi: 3.1.0
info:
title: User API
description: Manage user accounts
version: 1.0.0
contact:
name: Platform Team
email: platform@example.com
servers:
- url: https://api.example.com/v1
description: Production
- url: https://staging-api.example.com/v1
description: Staging
paths:
/users:
get:
summary: List users
operationId: listUsers
tags:
- Users
parameters:
- name: limit
in: query
schema:
type: integer
default: 20
maximum: 100
- name: offset
in: query
schema:
type: integer
default: 0
responses:
'200':
description: A list of users
content:
application/json:
schema:
type: object
properties:
data:
type: array
items:
$ref: '#/components/schemas/User'
total:
type: integer
'401':
$ref: '#/components/responses/Unauthorized'
post:
summary: Create a user
operationId: createUser
tags:
- Users
requestBody:
required: true
content:
application/json:
schema:
$ref: '#/components/schemas/CreateUserInput'
responses:
'201':
description: User created
content:
application/json:
schema:
$ref: '#/components/schemas/User'
headers:
Location:
schema:
type: string
'400':
$ref: '#/components/responses/BadRequest'
/users/{id}:
get:
summary: Get a user by ID
operationId: getUser
tags:
- Users
parameters:
- name: id
in: path
required: true
schema:
type: string
format: uuid
responses:
'200':
description: The user
content:
application/json:
schema:
$ref: '#/components/schemas/User'
'404':
$ref: '#/components/responses/NotFound'
components:
schemas:
User:
type: object
required: [id, name, email, role, created_at]
properties:
id:
type: string
format: uuid
name:
type: string
example: Alice
email:
type: string
format: email
role:
type: string
enum: [admin, user, viewer]
created_at:
type: string
format: date-time
CreateUserInput:
type: object
required: [name, email, role]
properties:
name:
type: string
minLength: 1
maxLength: 100
email:
type: string
format: email
role:
type: string
enum: [admin, user, viewer]
Error:
type: object
required: [type, title, status]
properties:
type:
type: string
format: uri
title:
type: string
status:
type: integer
detail:
type: string
instance:
type: string
format: uri
responses:
Unauthorized:
description: Authentication required
content:
application/json:
schema:
$ref: '#/components/schemas/Error'
BadRequest:
description: Invalid request
content:
application/json:
schema:
$ref: '#/components/schemas/Error'
NotFound:
description: Resource not found
content:
application/json:
schema:
$ref: '#/components/schemas/Error'
securitySchemes:
BearerAuth:
type: http
scheme: bearer
bearerFormat: JWT
security:
- BearerAuth: []
Documentation tools
Tool Swagger UI
Interactive API explorer generated from an OpenAPI spec. Developers can try API calls directly from the browser. Bundled with many API frameworks. Hosted at /docs or /swagger.
Tool Redoc
Clean, responsive, three-panel documentation UI. Better reading experience than Swagger UI for reference documentation. Supports nested schemas and markdown descriptions.
Tool Postman Collections
Shareable collections of API requests with examples, variables, and tests. Can be generated from OpenAPI specs. Great for onboarding and manual testing. Export as JSON for version control.
Practice API Changelog
Maintain a changelog documenting every API change: new endpoints, deprecated fields, behavior changes. Include the date, version, and migration instructions. Notify consumers via email or webhook before breaking changes.
Write descriptions for every operation, parameter, and schema property. Include realistic examples. The best API docs read like a tutorial, not a schema dump. Use example fields in your OpenAPI spec — they appear in Swagger UI's "Try it out" feature and in generated SDKs.
API Authentication
Every production API needs authentication (who is calling?) and often authorization (what are they allowed to do?). The choice of auth method depends on the API's audience, security requirements, and operational complexity.
Auth methods compared
| Method | Best for | Security | Complexity |
|---|---|---|---|
| API Keys | Simple integrations, internal services | Low (shared secret) | Low |
| Bearer Tokens (OAuth 2.0) | User-facing APIs, third-party access | High (scoped, expiring) | Medium |
| Basic Auth | Simple internal tools, CI/CD | Low (base64, not encrypted) | Low |
| Mutual TLS (mTLS) | Service-to-service, zero-trust | Very high (certificate-based) | High |
| HMAC Signatures | Webhooks, tamper-proof requests | High (request integrity) | Medium |
Header formats
# API Key (in header)
curl -H "X-API-Key: sk_live_abc123def456" \
https://api.example.com/v1/users
# API Key (in query parameter — less secure, logged in URLs)
curl "https://api.example.com/v1/users?api_key=sk_live_abc123def456"
# Bearer Token (OAuth 2.0 / JWT)
curl -H "Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJ1c2VyXzQyIiwic2NvcGUiOiJyZWFkOnVzZXJzIHdyaXRlOnVzZXJzIiwiZXhwIjoxNzExMDAwMDAwfQ.signature" \
https://api.example.com/v1/users
# Basic Auth (base64-encoded username:password)
curl -H "Authorization: Basic YWxpY2U6cGFzc3dvcmQxMjM=" \
https://api.example.com/v1/users
# Decoded: alice:password123
# HMAC Signature (webhook verification)
curl -H "X-Signature-256: sha256=5d7cee6c5e37b...abcdef" \
-H "X-Timestamp: 1711000000" \
-d '{"event":"user.created"}' \
https://api.example.com/webhooks
HMAC signature verification
# Server-side HMAC webhook verification (Python)
import hmac
import hashlib
def verify_webhook(payload: bytes, signature: str, secret: str) -> bool:
expected = hmac.new(
secret.encode(),
payload,
hashlib.sha256
).hexdigest()
received = signature.replace('sha256=', '')
return hmac.compare_digest(expected, received)
Rate limiting per API key
Different API keys can have different rate limits. Free-tier keys might get 100 requests/minute, paid keys 10,000. The API gateway enforces this by looking up the key's plan and applying the corresponding rate limit.
Never send API keys in query parameters for production APIs — they appear in server logs, browser history, and CDN caches. Always use headers. For Bearer tokens, set short expiration times (15–60 minutes) and implement token refresh. For mTLS, automate certificate rotation.
API Versioning
APIs evolve. Versioning strategies let you introduce changes without breaking existing consumers. The key is to make the version explicit so clients can migrate at their own pace.
Versioning strategies
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URL path | /v1/users | Simple, visible, cacheable | URL pollution, hard to share resources across versions |
| Query parameter | /users?version=1 | Easy to add, optional | Easy to forget, poor cache behavior |
| Header | Accept: application/vnd.api.v1+json | Clean URLs, proper HTTP semantics | Hidden, harder to test in browser |
| Content negotiation | Accept: application/vnd.api+json; version=2 | Most RESTful approach | Rarely used, tooling support varies |
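For the header-based strategies, the server has to parse the version out of the Accept value. A regex sketch for the vendor media type shown in the table (the fallback default is an assumption, not part of any spec):

```python
import re

def accept_version(accept_header: str, default: int = 1) -> int:
    """Extract the major version from a vendor media type like
    application/vnd.api.v2+json; fall back to a default when absent."""
    m = re.search(r"vnd\.[\w.-]+?\.v(\d+)\+json", accept_header)
    return int(m.group(1)) if m else default

print(accept_version("application/vnd.api.v2+json"))  # 2
print(accept_version("application/json"))             # 1
```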
Breaking vs non-breaking changes
Safe Non-breaking changes
- Adding new endpoints
- Adding new optional fields to request/response
- Adding new enum values (if clients handle unknown values)
- Adding new query parameters
- Relaxing validation (accepting wider input)
Breaking Breaking changes
- Removing or renaming fields
- Changing field types
- Removing endpoints
- Adding required fields to requests
- Changing status codes or error formats
- Tightening validation (rejecting previously valid input)
Deprecation and the Sunset header
RFC 8594 defines the Sunset header to communicate when an API version or endpoint will be removed. RFC 9745 standardizes the Deprecation header field, which signals that a resource has been or will be deprecated. Combine both headers to give consumers advance warning of an endpoint's full lifecycle.
# Response headers for a deprecated endpoint
HTTP/1.1 200 OK
Deprecation: @1719792000
Sunset: Sat, 01 Nov 2026 00:00:00 GMT
Link: <https://api.example.com/v2/users>; rel="successor-version"
# Response body can include deprecation notice
{
"data": [...],
"_deprecation": {
"message": "v1 is deprecated. Migrate to v2 by November 2026.",
"successor": "https://api.example.com/v2/users",
"sunset": "2026-11-01T00:00:00Z"
}
}
URL path versioning (/v1/) is the most common strategy because it's the simplest to implement and reason about. However, avoid creating a new version for every change. Only bump the major version for genuinely breaking changes. Use additive, non-breaking changes within a version as much as possible.
Rate Limiting & Throttling
Rate limiting protects your API from abuse, ensures fair usage among consumers, and prevents any single client from overwhelming your infrastructure. It is a critical production concern for every public and internal API.
Algorithms
Algorithm Token Bucket
A bucket holds tokens up to a maximum capacity. Each request consumes a token. Tokens are refilled at a fixed rate. If the bucket is empty, requests are rejected. Allows short bursts while maintaining an average rate.
Algorithm Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. Excess requests overflow and are dropped. Produces a smooth, constant output rate. No bursting.
Algorithm Sliding Window
Tracks requests in a rolling time window. Combines the simplicity of fixed windows with the accuracy of per-request tracking. Prevents the boundary spike problem of fixed windows.
Algorithm Fixed Window
Counts requests in discrete time windows (e.g., per minute). Simple to implement but vulnerable to boundary spikes: a client can make 2x the limit by timing requests at the window boundary.
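The token bucket described above fits in a few lines. A minimal in-memory sketch (per-process only; enforcing limits across replicas requires a shared store such as Redis):

```python
import time

class TokenBucket:
    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.refill_rate,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)  # burst of 5, 1 req/s sustained
results = [bucket.allow() for _ in range(7)]
print(results)  # first 5 allowed, then rejected
```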
Rate limit headers
# Standard rate limit response headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000 # Max requests per window
X-RateLimit-Remaining: 847 # Requests remaining in current window
X-RateLimit-Reset: 1711000060 # Unix timestamp when window resets
# When rate limit is exceeded
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711000060
Retry-After: 30 # Seconds until the client should retry
Content-Type: application/json
{
"type": "https://api.example.com/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "You have exceeded 1000 requests per minute. Retry after 30 seconds."
}
Per-user vs per-endpoint limits
| Scope | Example | Purpose |
|---|---|---|
| Per API key | 1000 req/min per key | Fair usage across consumers |
| Per endpoint | 100 req/min on POST /users | Protect expensive operations |
| Per IP | 50 req/min per IP | Brute-force protection (login endpoints) |
| Global | 50,000 req/min total | Infrastructure protection |
API quotas
Quotas are longer-term limits: "10,000 API calls per day" or "1,000,000 per month." They are distinct from rate limits (which are per-second or per-minute). Quotas map to billing tiers. When a quota is exhausted, return 429 with a Retry-After header set to the quota reset time.
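The Retry-After value for an exhausted quota is just the time until the window resets. A sketch assuming a daily quota that resets at midnight UTC (the reset schedule is an assumption; align it with your billing window):

```python
from datetime import datetime, timedelta, timezone

def seconds_until_daily_reset(now=None):
    """Seconds until the next midnight UTC, usable as a Retry-After value."""
    now = now or datetime.now(timezone.utc)
    tomorrow = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((tomorrow - now).total_seconds())

# Example: at 23:59:30 UTC the quota resets in 30 seconds
t = datetime(2026, 1, 1, 23, 59, 30, tzinfo=timezone.utc)
print(seconds_until_daily_reset(t))  # 30
```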
Always return rate limit headers on every response, not just when limits are exceeded. This lets clients implement proactive backoff before hitting the limit. Use the Retry-After header (RFC 9110, Section 10.2.3) to tell clients exactly when to retry.
Error Handling
Consistent, informative error responses are essential for API usability. Developers spend more time debugging errors than reading success responses. Use a standardized error format across your entire API surface.
RFC 9457 Problem Details
RFC 9457 (which obsoletes RFC 7807) defines a standard JSON format for HTTP API error responses. It provides a consistent structure that clients can parse programmatically. RFC 9457 is fully backward-compatible with RFC 7807 and adds a common problem type registry and clearer guidance on representing multiple problems.
{
"type": "https://api.example.com/errors/validation-failed",
"title": "Validation Failed",
"status": 422,
"detail": "The request body contains invalid fields.",
"instance": "/api/v1/users",
"errors": [
{
"field": "email",
"code": "invalid_format",
"message": "Must be a valid email address"
},
{
"field": "name",
"code": "too_short",
"message": "Must be at least 1 character",
"meta": {
"min_length": 1,
"actual_length": 0
}
}
],
"request_id": "req-abc-123"
}
Error response structure
| Field | Type | Purpose |
|---|---|---|
| type | URI | A URL identifying the error type (can link to documentation) |
| title | String | Short, human-readable summary |
| status | Integer | HTTP status code (matches the response status) |
| detail | String | Detailed, human-readable explanation specific to this occurrence |
| instance | URI | URI reference identifying the specific occurrence |
| errors | Array | Field-level validation errors (extension) |
| request_id | String | Correlation ID for debugging (extension) |
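A small helper that assembles a problem-details body from those fields keeps error responses consistent across endpoints. A sketch (the member names follow RFC 9457; `errors` and `request_id` are the extensions from the example above):

```python
def problem(status, title, type_uri, detail=None, instance=None, **extensions):
    """Build an RFC 9457 problem-details body; extra keyword arguments
    (errors, request_id, ...) are merged in as extension members."""
    body = {"type": type_uri, "title": title, "status": status}
    if detail:
        body["detail"] = detail
    if instance:
        body["instance"] = instance
    body.update(extensions)
    return body

resp = problem(
    422, "Validation Failed",
    "https://api.example.com/errors/validation-failed",
    detail="The request body contains invalid fields.",
    instance="/api/v1/users",
    request_id="req-abc-123",
)
print(resp["status"], resp["request_id"])  # 422 req-abc-123
```

Serve such bodies with the `application/problem+json` media type, as RFC 9457 specifies.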
Retry-safe vs terminal errors
Retryable Safe to retry
- `429 Too Many Requests` — retry after backoff
- `500 Internal Server Error` — transient failures
- `502 Bad Gateway` — upstream temporarily down
- `503 Service Unavailable` — server overloaded
- `504 Gateway Timeout` — upstream timed out
Terminal Do not retry
- `400 Bad Request` — fix the request first
- `401 Unauthorized` — refresh the token, then retry
- `403 Forbidden` — insufficient permissions
- `404 Not Found` — resource does not exist
- `409 Conflict` — resolve the conflict first
- `422 Unprocessable Entity` — fix validation errors
Client-side error handling
// Robust API client with retry logic
// Assumes a fetch-compatible runtime, a getToken() helper, and an ApiError class
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function apiRequest(url, options = {}, retries = 3) {
for (let attempt = 1; attempt <= retries; attempt++) {
const response = await fetch(url, {
...options,
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${getToken()}`,
...options.headers,
},
});
if (response.ok) return response.json();
const error = await response.json();
// Terminal errors: don't retry
if ([400, 401, 403, 404, 409, 422].includes(response.status)) {
throw new ApiError(error);
}
// Rate limited: wait and retry
if (response.status === 429) {
const retryAfter = response.headers.get('Retry-After') || 5;
await sleep(retryAfter * 1000);
continue;
}
// Server error: exponential backoff
if (response.status >= 500 && attempt < retries) {
await sleep(Math.pow(2, attempt) * 1000);
continue;
}
throw new ApiError(error);
}
}
Always include a request_id in error responses. This lets developers correlate their failed request with your server logs. Use the same ID in your structured logging, distributed traces, and support tickets. A single ID that threads through the entire request lifecycle saves hours of debugging.
API Gateway
An API gateway sits between clients and your backend services. It handles cross-cutting concerns — authentication, rate limiting, routing, transformation, caching, and logging — so your services don't have to. Every production API deployment should have a gateway layer.
What a gateway does
- Authentication and authorization enforcement
- Rate limiting and quota enforcement
- Request routing to backend services
- Request and response transformation
- Response caching
- TLS termination
- Centralized logging and metrics
Gateway tools
| Tool | Type | Key features |
|---|---|---|
| Kong | Open source / Enterprise | Plugin ecosystem, Lua/Go extensibility, DB-less mode, Kubernetes-native |
| Apache APISIX | Open source | High performance (etcd-based), dynamic routing, Wasm plugins |
| AWS API Gateway | Managed | Lambda integration, WebSocket support, usage plans, no infrastructure to manage |
| Envoy | Open source proxy | L4/L7 proxy, xDS control plane, gRPC-native, often used with Istio or Consul Connect |
| Traefik | Open source | Auto-discovery, Let's Encrypt, Docker/K8s native, middleware chains |
Gateway patterns
Pattern Edge Gateway
A single gateway at the edge of your network handling all external traffic. Handles TLS termination, auth, rate limiting, and routes to internal services. The most common pattern.
Pattern Micro-Gateway
Per-team or per-service gateways. Each team manages their own gateway config. Reduces blast radius and enables independent deployment. Used in large organizations with many teams.
Pattern BFF (Backend for Frontend)
A dedicated gateway per client type (web, mobile, IoT). Each BFF aggregates and transforms data from backend services into the shape that specific client needs. Eliminates the "one size fits all" API problem. The mobile BFF might return minimal payloads while the web BFF returns richer data. Each BFF is owned by the front-end team that consumes it.
Start with a single edge gateway. Only introduce BFF or micro-gateway patterns when you have distinct client types with genuinely different data needs, or when the edge gateway becomes a bottleneck for team autonomy. Complexity has a cost.
Testing & Monitoring
APIs need testing at multiple levels: contract validation, integration correctness, and performance under load. In production, APIs need continuous monitoring for latency, error rates, and availability.
Contract testing
Pact is the most popular contract testing framework. It verifies that the API provider honors the contract expected by each consumer, without requiring a running instance of the consumer.
// Pact consumer test (JavaScript)
const { Pact, Matchers } = require('@pact-foundation/pact');
const provider = new Pact({
consumer: 'WebApp',
provider: 'UserAPI',
});
describe('User API Contract', () => {
beforeAll(() => provider.setup());
afterAll(() => provider.finalize());
it('returns a user by ID', async () => {
await provider.addInteraction({
state: 'user 42 exists',
uponReceiving: 'a request for user 42',
withRequest: {
method: 'GET',
path: '/api/v1/users/42',
headers: { Accept: 'application/json' },
},
willRespondWith: {
status: 200,
headers: { 'Content-Type': 'application/json' },
body: {
id: '42',
name: Matchers.string('Alice'),
email: Matchers.email(),
},
},
});
const response = await fetch(
`${provider.mockService.baseUrl}/api/v1/users/42`,
{ headers: { Accept: 'application/json' } }
);
expect(response.status).toBe(200);
});
});
Load testing
// k6 load test script
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },  // Ramp up to 50 users
    { duration: '3m', target: 50 },  // Stay at 50
    { duration: '1m', target: 200 }, // Spike to 200
    { duration: '2m', target: 200 }, // Stay at 200
    { duration: '1m', target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95) < 500'], // 95th percentile < 500ms
    http_req_failed: ['rate < 0.01'],   // Error rate < 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/v1/users', {
    headers: { Authorization: `Bearer ${__ENV.API_TOKEN}` },
  });
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'has users': (r) => JSON.parse(r.body).data.length > 0,
  });
  sleep(1);
}
Monitoring and SLAs
Metric Latency Percentiles
Track p50, p95, and p99 latencies per endpoint. Averages hide outliers. A p95 of 200ms means 5% of requests take longer than 200ms — those 5% might be your most important customers.
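To see why averages hide outliers, consider 100 requests where 95 are fast and 5 are pathologically slow. The mean looks healthy and even p95 looks healthy; only p99 surfaces the tail. A small sketch using the nearest-rank percentile method on synthetic numbers:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[max(0, k)]

# 100 requests: 95 at 100ms, 5 at 3000ms
latencies_ms = [100] * 95 + [3000] * 5

mean = sum(latencies_ms) / len(latencies_ms)
print(round(mean))                   # 245  -- looks acceptable
print(percentile(latencies_ms, 50))  # 100
print(percentile(latencies_ms, 95))  # 100  -- still looks fine
print(percentile(latencies_ms, 99))  # 3000 -- the outliers surface here
```

In production you would pull these percentiles from your metrics backend (histograms) rather than raw samples, but the lesson is the same: track the tail, not the mean.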
Metric Error Rate
Track 4xx and 5xx rates separately. A spike in 5xx means your service is broken. A spike in 4xx might mean a client released a buggy update, or your rate limits are too aggressive.
Practice Health Check Endpoints
Expose GET /health (basic liveness) and GET /ready (full readiness including dependencies). Health checks should be fast, unauthenticated, and not cached. Use them for load balancer checks and Kubernetes probes.
Practice Synthetic Monitoring
Run automated API calls from external locations on a schedule (every 1–5 minutes). Detects outages before real users report them. Tools: Datadog Synthetic, Pingdom, Checkly, or a simple cron + curl script.
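The "cron + curl" end of that spectrum can be sketched in a few lines of stdlib Python: one probe per critical endpoint, checking both status and latency. The URL and latency budget here are placeholders.

```python
# Minimal synthetic probe: run from cron every 1-5 minutes against a few
# critical endpoints. URL and max_latency_s are placeholder values.
import time
import urllib.error
import urllib.request

def probe(url, timeout=5, max_latency_s=2.0):
    """Return (ok, status, latency_s) for one synthetic check."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code  # 4xx/5xx still yields a status to report
    except (urllib.error.URLError, TimeoutError):
        return (False, None, time.monotonic() - start)
    latency = time.monotonic() - start
    return (200 <= status < 300 and latency < max_latency_s, status, latency)
```

A real setup would probe from several regions and page only after consecutive failures, to avoid alerting on a single transient blip.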
OpenTelemetry for API tracing
# Python: instrument a Flask API with OpenTelemetry
from opentelemetry import trace
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Setup tracer
provider = TracerProvider()
exporter = OTLPSpanExporter(endpoint="http://otel-collector:4317")
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
# Auto-instrument Flask and outgoing HTTP requests
app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()
# Every request now generates a trace with:
# - HTTP method, path, status code
# - Request/response sizes
# - Downstream service calls (propagated context)
Combine the "four golden signals" for API monitoring: latency (how fast), traffic (how much), errors (how many failures), and saturation (how full is the system). Set SLOs (Service Level Objectives) for each: e.g., "p99 latency < 1s, error rate < 0.1%, availability > 99.9%."
Production Checklist
- Define the API contract first — write the OpenAPI spec or .proto file before writing implementation code. Review the contract with consumers.
- Use HTTPS everywhere — never serve API traffic over plain HTTP, even internally. TLS is non-negotiable. Use TLS 1.3 where possible.
- Authenticate every request — choose an auth strategy (OAuth 2.0 for external, mTLS or API keys for internal). Validate tokens at the gateway.
- Implement rate limiting — protect every endpoint with rate limits. Return standard headers (X-RateLimit-*, Retry-After). Set per-key and per-endpoint limits.
- Version your API — use URL path versioning (/v1/) for simplicity. Document breaking vs non-breaking changes. Use the Sunset header for deprecations.
- Standardize error responses — adopt RFC 9457 Problem Details (successor to RFC 7807). Include request_id in every error. Distinguish retryable from terminal errors.
- Document every endpoint — maintain a complete OpenAPI spec with descriptions, examples, and schemas. Serve interactive docs via Swagger UI or Redoc.
- Add health check endpoints — implement /health (liveness) and /ready (readiness). Use them for load balancers, Kubernetes probes, and synthetic monitoring.
- Implement request validation — validate all inputs (types, ranges, formats, required fields) at the boundary. Reject invalid requests early with clear field-level error messages.
- Set up monitoring and alerting — track latency percentiles (p50, p95, p99), error rates, and throughput per endpoint. Alert on SLO breaches. Use OpenTelemetry for distributed tracing.
- Write contract tests — use Pact or similar tools to verify that API changes don't break consumers. Run contract tests in CI/CD before every deployment.
- Load test before launch — use k6 or wrk to simulate expected traffic patterns. Identify bottlenecks, establish baseline performance, and set autoscaling thresholds.
- Use an API gateway — deploy Kong, APISIX, or a cloud-managed gateway for centralized auth, rate limiting, logging, and routing. Don't reimplement these in every service.
- Log structured request data — log method, path, status code, latency, request_id, and user_id for every request. Use structured JSON logging. Correlate with traces.
- Plan for deprecation — publish a deprecation policy. Give consumers at least six months' notice before removing any endpoint or field. Monitor deprecated endpoint usage before removal.
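As a closing sketch, the "standardize error responses" item might look like this in Flask. The problem "type" URI and the request_id scheme are illustrative, not prescribed by the RFC.

```python
# RFC 9457 Problem Details sketch: a structured error body with the
# application/problem+json media type. The type URI and request_id
# generation here are illustrative assumptions.
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.errorhandler(404)
def not_found(e):
    problem = {
        "type": "https://api.example.com/problems/not-found",
        "title": "Resource not found",
        "status": 404,
        "detail": f"No resource at {request.path}",
        "instance": request.path,
        "request_id": str(uuid.uuid4()),  # correlate with logs and traces
    }
    resp = jsonify(problem)
    resp.status_code = 404
    resp.headers["Content-Type"] = "application/problem+json"  # RFC 9457 media type
    return resp
```

Clients can branch on the stable "type" URI rather than parsing the human-readable "detail" string, which is free to change between releases.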