Elastic Stack
Elasticsearch, Kibana, Logstash, and Beats — search, logging, observability, and SIEM
Overview
The Elastic Stack (formerly the ELK Stack) is a collection of open-source tools for ingesting, storing, searching, and visualizing data. The core components are Elasticsearch (distributed search and analytics engine), Kibana (visualization and management UI), Logstash (server-side data processing pipeline), and Beats (lightweight data shippers). When Fluentd replaces Logstash, the stack is sometimes called EFK.
Elasticsearch
A distributed, RESTful search and analytics engine built on Apache Lucene. Stores documents as JSON, indexes them using an inverted index, and provides near-real-time full-text search, structured queries, and aggregations across massive datasets.
Kibana
The web UI for the Elastic Stack. Create dashboards, explore data with Discover, build visualizations with Lens, manage indices and ILM policies, configure security, and monitor cluster health.
Logstash
A server-side data processing pipeline. Reads data from multiple sources (files, syslog, Kafka, Beats), transforms it with filters (grok, mutate, geoip), and ships it to Elasticsearch or other destinations. Heavy but flexible.
Beats
Lightweight, single-purpose data shippers installed on edge hosts. Filebeat (logs), Metricbeat (metrics), Packetbeat (network), Heartbeat (uptime), Auditbeat (audit). Ship directly to Elasticsearch or through Logstash.
Common use cases
- Log analytics — centralize logs from applications, containers, and infrastructure. Search, filter, and correlate across millions of log lines.
- Full-text search — power search features in applications (e-commerce catalogs, documentation sites, knowledge bases).
- Observability — combine logs, metrics, and APM traces in a single platform. Elastic APM instruments applications for distributed tracing.
- SIEM — Elastic Security provides threat detection, investigation, and response. Ingests security events, runs detection rules, and integrates with MITRE ATT&CK.
- Infrastructure monitoring — collect system and service metrics with Metricbeat. Visualize CPU, memory, disk, and network across fleets of servers.
Elasticsearch Architecture
Elasticsearch is a distributed system. Data is stored across multiple nodes in a cluster, divided into shards for parallelism and replicated for fault tolerance. Understanding the architecture is essential for capacity planning and troubleshooting.
Core concepts
Index
An index is a collection of documents with similar characteristics. Analogous to a database table. Each index has a mapping (schema) defining field types. Indices are typically time-based for log data: logs-2026.03.19.
Document
The basic unit of information in Elasticsearch. A JSON object stored in an index. Each document has a unique _id and is assigned to a shard by a routing algorithm (by default, hash(_routing) % number_of_primary_shards, where _routing defaults to the _id).
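The routing idea can be sketched in a few lines of Python. This is a simplified stand-in, not the real implementation: Elasticsearch uses a Murmur3 hash (and, in recent versions, a routing factor to support shard splitting), while this sketch uses MD5 for illustration.

```python
import hashlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Simplified stand-in for Elasticsearch document routing:
    hash the routing value (default: _id) and take it modulo the
    primary shard count."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_primary_shards

# The same _id always lands on the same shard, which is why
# number_of_shards cannot be changed without reindexing:
# a different shard count changes every document's assignment.
assert route_to_shard("log-42", 5) == route_to_shard("log-42", 5)
```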
Shards
An index is split into primary shards (default: 1). Each primary shard has zero or more replica shards on different nodes for redundancy. Shards are Lucene indices under the hood — the actual unit of storage and search.
Segments
Each shard is composed of immutable segments. When documents are indexed, they are written to an in-memory buffer, then flushed to a segment on disk. Segments are periodically merged (compacted) to reduce count and reclaim space from deleted documents.
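The buffer/flush/merge lifecycle can be modeled with a toy Python class. This is a conceptual sketch only — real Lucene segments are on-disk inverted-index files, and deletes are tracked in per-segment bitsets — but the mechanics are the same: segments are immutable once written, deletes are tombstones, and only a merge reclaims the space.

```python
class Shard:
    """Toy model of a Lucene shard: writes buffer in memory, a refresh
    flushes the buffer into a new immutable segment, and a merge
    rewrites several segments into one, dropping deleted docs."""
    def __init__(self):
        self.buffer: list[dict] = []
        self.segments: list[list[dict]] = []  # immutable once flushed
        self.deleted: set[str] = set()        # tombstones until a merge

    def index(self, doc: dict) -> None:
        self.buffer.append(doc)

    def refresh(self) -> None:
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []

    def delete(self, doc_id: str) -> None:
        self.deleted.add(doc_id)  # the segment itself is never modified

    def merge(self) -> None:
        live = [d for seg in self.segments for d in seg
                if d["_id"] not in self.deleted]
        self.segments = [live]
        self.deleted.clear()

shard = Shard()
for i in range(4):
    shard.index({"_id": f"doc-{i}"})
    shard.refresh()              # four tiny one-doc segments
shard.delete("doc-0")
shard.merge()                    # one segment; the tombstoned doc is gone
print(len(shard.segments), len(shard.segments[0]))  # 1 3
```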
Node roles
| Role | Config | Purpose |
|---|---|---|
| Master-eligible | node.roles: [master] | Participates in cluster state management (index creation, shard allocation). Dedicated masters recommended in production (3 for quorum). |
| Data | node.roles: [data] | Stores data and executes search/aggregation. Can be further specialized: data_hot, data_warm, data_cold, data_frozen. |
| Ingest | node.roles: [ingest] | Runs ingest pipelines (pre-processing documents before indexing). Lightweight transformations like geoip, date parsing, field removal. |
| Coordinating-only | node.roles: [] | Routes requests, scatters queries to data nodes, gathers and reduces results. Acts as a smart load balancer. No data, no master election. |
| ML | node.roles: [ml] | Runs machine learning jobs (anomaly detection, classification). Isolate ML workloads from search traffic. |
| Transform | node.roles: [transform] | Executes transform jobs that pivot or aggregate data into summary indices. |
Inverted index
Elasticsearch uses an inverted index for full-text search. Instead of mapping documents to words (forward index), it maps each unique term to the list of documents containing that term. This allows sub-second lookups even across billions of documents. Text fields are analyzed (tokenized, lowercased, stemmed) before indexing; keyword fields are stored as-is for exact matching.
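A minimal Python sketch of the data structure makes the idea concrete. The analyzer here only tokenizes on whitespace and lowercases; a real analyzer also strips punctuation and may stem.

```python
from collections import defaultdict

def analyze(text: str) -> list[str]:
    # Minimal analyzer: whitespace tokenization + lowercasing.
    return text.lower().split()

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each unique term to the set of doc IDs containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

docs = {
    1: "connection timeout to database",
    2: "database connection restored",
    3: "user login succeeded",
}
index = build_inverted_index(docs)

# A term lookup is a single dictionary access regardless of corpus
# size; multi-term queries intersect or union the posting lists.
print(sorted(index["connection"]))                      # [1, 2]
print(sorted(index["connection"] & index["timeout"]))   # [1]
```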
# elasticsearch.yml — production node configuration
cluster.name: production-logs
node.name: es-node-01
node.roles: [master, data_hot]
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
- es-node-01:9300
- es-node-02:9300
- es-node-03:9300
cluster.initial_master_nodes:
- es-node-01
- es-node-02
- es-node-03
# Security (required since 8.x)
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12
Indexing & Mappings
A mapping defines how documents and their fields are stored and indexed. Getting mappings right is critical — changing the mapping of an existing field requires reindexing all data.
Explicit mapping
PUT /logs-app
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"refresh_interval": "5s"
},
"mappings": {
"properties": {
"@timestamp": { "type": "date" },
"message": { "type": "text", "analyzer": "standard" },
"level": { "type": "keyword" },
"service": { "type": "keyword" },
"host": { "type": "keyword" },
"duration_ms": { "type": "integer" },
"request_id": { "type": "keyword" },
"user_agent": { "type": "text", "fields": {
"raw": { "type": "keyword" }
}},
"geo": {
"properties": {
"lat": { "type": "float" },
"lon": { "type": "float" },
"location": { "type": "geo_point" }
}
}
}
}
}
Common field types
| Type | Use case | Notes |
|---|---|---|
| text | Full-text search | Analyzed (tokenized). Not suitable for sorting or aggregation. Use the .keyword sub-field for exact match. |
| keyword | Exact values, filtering, aggregation | Not analyzed. IDs, status codes, tags, hostnames. Dynamic mapping sets ignore_above: 256 on auto-generated .keyword sub-fields; explicitly mapped keyword fields have no limit unless you set one. |
| date | Timestamps | Stored internally as epoch millis. Supports multiple formats via the format parameter. |
| integer / long | Whole numbers | Use long for values exceeding 2^31 - 1. |
| float / double | Decimal numbers | scaled_float is more efficient when precision is fixed (e.g., currency). |
| boolean | True/false flags | Stored as true / false. |
| geo_point | Latitude/longitude | Enables geo queries (distance, bounding box) and map visualizations in Kibana. |
| nested | Arrays of objects | Preserves the relationship between fields in each object. Without nested, object arrays are flattened. |
Dynamic mapping
If you index a document into a non-existent index (or a field not in the mapping), Elasticsearch creates the mapping automatically. This is convenient for development but dangerous in production — a typo in a field name creates a new field, and string fields get both text and keyword sub-fields, wasting disk.
Set "dynamic": "strict" on production indices to reject documents with unmapped fields. This prevents mapping explosions from malformed data. Use index templates to enforce consistent mappings across time-based indices.
Index templates
PUT /_index_template/logs-template
{
"index_patterns": ["logs-*"],
"data_stream": {},
"priority": 100,
"template": {
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"index.lifecycle.name": "logs-policy"
},
"mappings": {
"dynamic": "strict",
"properties": {
"@timestamp": { "type": "date" },
"message": { "type": "text" },
"level": { "type": "keyword" },
"service": { "type": "keyword" },
"host": { "type": "keyword" }
}
}
}
}
Index Lifecycle Management (ILM)
ILM automates index management through phases: hot (actively written/queried), warm (read-only, less frequent queries), cold (infrequent access, compressed), and delete (purged). Each phase can trigger actions like rollover, shrink, force merge, and searchable snapshot.
PUT /_ilm/policy/logs-policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_primary_shard_size": "50gb",
"max_age": "1d"
},
"set_priority": { "priority": 100 }
}
},
"warm": {
"min_age": "7d",
"actions": {
"shrink": { "number_of_shards": 1 },
"forcemerge": { "max_num_segments": 1 },
"set_priority": { "priority": 50 }
}
},
"cold": {
"min_age": "30d",
"actions": {
"set_priority": { "priority": 0 }
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
Since Elasticsearch 7.10, ILM migrates indices between data tiers (data_hot, data_warm, data_cold, data_frozen) based on node roles, so explicit allocate actions with custom node attributes are no longer needed: assign the correct node.roles to each node and ILM handles tier migration automatically.
Data streams
Data streams are the modern way to handle time-series data in Elasticsearch. They are an abstraction over rolling indices — you write to a single data stream name (logs-nginx-default) and Elasticsearch automatically manages backing indices, rollover, and ILM. Data streams are append-only: writes must use the create op_type; to update or delete individual documents, you must target the backing index directly.
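A sketch of day-to-day data stream usage, assuming an index template with a matching pattern and "data_stream": {} exists (like the logs-template shown earlier):

```
# Write a document — the data stream and its first backing index
# are created automatically on first write
POST /logs-nginx-default/_doc
{
  "@timestamp": "2026-03-19T12:00:00Z",
  "message": "GET /health 200"
}

# Inspect the stream and its backing indices (named .ds-logs-nginx-default-*)
GET /_data_stream/logs-nginx-default

# Trigger a manual rollover (ILM normally handles this)
POST /logs-nginx-default/_rollover
```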
Search & Query DSL
Elasticsearch's Query DSL is a JSON-based language for building search queries. Queries fall into two categories: full-text queries (analyzed, scored by relevance) and term-level queries (exact matches, typically run in filter context, not scored).
Basic queries
// Match query — full-text search on analyzed fields
GET /logs-*/_search
{
"query": {
"match": {
"message": "connection timeout"
}
}
}
// Term query — exact match on keyword fields
GET /logs-*/_search
{
"query": {
"term": {
"level": "ERROR"
}
}
}
// Range query — numeric or date ranges
GET /logs-*/_search
{
"query": {
"range": {
"@timestamp": {
"gte": "2026-03-19T00:00:00Z",
"lte": "2026-03-19T23:59:59Z"
}
}
}
}
Bool query (combining conditions)
GET /logs-*/_search
{
"query": {
"bool": {
"must": [
{ "match": { "message": "database" } }
],
"filter": [
{ "term": { "level": "ERROR" } },
{ "range": { "@timestamp": { "gte": "now-1h" } } }
],
"must_not": [
{ "term": { "service": "healthcheck" } }
],
"should": [
{ "term": { "service": "api-gateway" } }
],
"minimum_should_match": 0
}
},
"size": 50,
"sort": [{ "@timestamp": "desc" }],
"_source": ["@timestamp", "level", "service", "message"]
}
must clauses contribute to the relevance score. filter clauses do not — they simply include/exclude documents and are cached by Elasticsearch for performance. Always use filter for exact-match conditions (level, status codes, date ranges) and must only for full-text search where scoring matters.
Aggregations
GET /logs-*/_search
{
"size": 0,
"query": {
"range": { "@timestamp": { "gte": "now-24h" } }
},
"aggs": {
"errors_by_service": {
"terms": { "field": "service", "size": 20 },
"aggs": {
"error_count": {
"filter": { "term": { "level": "ERROR" } }
},
"avg_duration": {
"avg": { "field": "duration_ms" }
}
}
},
"errors_over_time": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1h"
},
"aggs": {
"error_rate": {
"filter": { "term": { "level": "ERROR" } }
}
}
},
"unique_users": {
"cardinality": { "field": "user_id" }
}
}
}
Full-text vs structured search
| Aspect | Full-text (text fields) | Structured (keyword fields) |
|---|---|---|
| Analysis | Tokenized, lowercased, stemmed | Stored as-is (exact value) |
| Query type | match, multi_match, match_phrase | term, terms, range, exists |
| Scoring | Yes — BM25 relevance scoring | No — binary match (use in filter context) |
| Use case | Search bars, log message search | Filtering by status, service, host, level |
| Aggregation | Not directly (use .keyword sub-field) | Yes — terms, histograms, cardinality |
Kibana
Kibana is the visualization and management layer of the Elastic Stack. It provides a browser-based UI for exploring data, building dashboards, managing cluster settings, and configuring security.
Discover
Interactive log/event explorer. Select a data view (index pattern), set a time range, and search with KQL or Lucene syntax. View individual documents, expand fields, and add filter pills. The starting point for most investigations.
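A few KQL expressions of the kind typed into the Discover search bar (field names follow the log mapping used elsewhere in this document). Quoted strings are phrase matches; unquoted terms on a text field match any of the terms.

```
level: "ERROR" and service: "api-gateway"
message: "connection timeout"
duration_ms >= 500 and not service: healthcheck
host: web-* and level: (ERROR or WARN)
```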
Lens
Drag-and-drop visualization builder. Supports bar charts, line charts, pie charts, heatmaps, gauges, tables, and more. Suggests chart types based on your data. The primary way to create visualizations in modern Kibana (replaces legacy Visualize).
Dashboards
Combine multiple Lens visualizations, saved searches, and Markdown panels into a single view. Dashboards support global filters, time pickers, and drill-down links. Export/import dashboards as saved objects (NDJSON).
Spaces
Logical groupings for dashboards, visualizations, and saved objects. Use spaces to separate environments (prod/staging) or teams (platform/security/app). Each space has its own set of data views and dashboards. RBAC controls who can access each space.
Data views (index patterns)
A data view tells Kibana which Elasticsearch indices to query. For example, logs-* matches all indices starting with logs-. Data views define the time field (typically @timestamp) and can include runtime fields for on-the-fly calculations.
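Runtime fields can also be defined per-request at the Elasticsearch level. A sketch computing a hypothetical duration_s field from the duration_ms field used in the mapping examples above:

```
GET /logs-*/_search
{
  "runtime_mappings": {
    "duration_s": {
      "type": "double",
      "script": { "source": "emit(doc['duration_ms'].value / 1000.0)" }
    }
  },
  "query": { "range": { "duration_s": { "gte": 1.0 } } },
  "fields": ["duration_s"]
}
```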
Additional features
- Canvas — pixel-perfect, presentation-ready dashboards. Supports custom CSS, images, and live data elements. Use for TV screens and executive reports.
- Maps — geospatial data visualization. Plot geo_point fields on a map, draw heat layers, and create choropleth maps from aggregations.
- Alerts — define rules that trigger actions (email, Slack, PagerDuty, webhook) when conditions are met (e.g., error rate exceeds threshold).
- Dev Tools — interactive console for sending REST API requests to Elasticsearch. Essential for debugging queries and managing indices.
- Stack Monitoring — monitor Elasticsearch, Kibana, Logstash, and Beats health from within Kibana. Track JVM heap, indexing rate, search latency, and thread pool stats.
# kibana.yml — production configuration
server.host: "0.0.0.0"
server.port: 5601
server.name: "kibana-prod"
elasticsearch.hosts: ["https://es-node-01:9200", "https://es-node-02:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "${KIBANA_ES_PASSWORD}"
elasticsearch.ssl.certificateAuthorities: ["/etc/kibana/certs/ca.crt"]
server.ssl.enabled: true
server.ssl.certificate: /etc/kibana/certs/kibana.crt
server.ssl.key: /etc/kibana/certs/kibana.key
xpack.encryptedSavedObjects.encryptionKey: "min-32-character-encryption-key-here"
xpack.security.encryptionKey: "min-32-character-encryption-key-here"
xpack.reporting.encryptionKey: "min-32-character-encryption-key-here"
logging.root.level: info
Logstash
Logstash is a server-side data processing pipeline with three stages: input (receive data), filter (transform data), and output (send data). Each stage uses plugins from a rich ecosystem.
Pipeline configuration
# /etc/logstash/conf.d/nginx-logs.conf
input {
beats {
port => 5044
ssl_enabled => true
ssl_certificate => "/etc/logstash/certs/logstash.crt"
ssl_key => "/etc/logstash/certs/logstash.key"
}
}
filter {
# Parse Nginx access log format
grok {
match => {
"message" => '%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status:int} %{NUMBER:bytes:int} "%{DATA:referrer}" "%{DATA:user_agent}"'
}
}
# Parse the timestamp
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
target => "@timestamp"
remove_field => ["timestamp"]
}
# Add GeoIP data
geoip {
source => "client_ip"
target => "geo"
}
# Parse user agent string
useragent {
source => "user_agent"
target => "ua"
}
# Remove unnecessary fields
mutate {
remove_field => ["message", "host", "agent"]
convert => { "status" => "integer" }
}
# Tag 5xx errors
if [status] >= 500 {
mutate {
add_tag => ["server_error"]
}
}
}
output {
elasticsearch {
hosts => ["https://es-node-01:9200", "https://es-node-02:9200"]
index => "logs-nginx-%{+YYYY.MM.dd}"
user => "logstash_writer"
password => "${LOGSTASH_ES_PASSWORD}"
ssl_enabled => true
ssl_certificate_authorities => ["/etc/logstash/certs/ca.crt"]
}
}
Multiple pipelines
Logstash supports running multiple pipelines simultaneously. Each pipeline has its own config file, worker threads, and queue. This isolates workloads and prevents a slow pipeline from blocking others.
# /etc/logstash/pipelines.yml
- pipeline.id: nginx
path.config: "/etc/logstash/conf.d/nginx-logs.conf"
pipeline.workers: 4
pipeline.batch.size: 250
- pipeline.id: application
path.config: "/etc/logstash/conf.d/app-logs.conf"
pipeline.workers: 2
pipeline.batch.size: 125
- pipeline.id: syslog
path.config: "/etc/logstash/conf.d/syslog.conf"
pipeline.workers: 2
Elasticsearch has built-in ingest pipelines that can do lightweight transformations (grok, date, geoip, rename) without Logstash. For simple log parsing, ingest pipelines are faster and require no extra infrastructure. Use Logstash when you need complex conditional logic, multiple inputs/outputs, stateful processing (aggregation, deduplication), or output to non-Elasticsearch destinations.
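As an illustration, a minimal ingest pipeline covering roughly the same Nginx parsing steps as the Logstash config above (the pipeline name nginx-access is arbitrary):

```
PUT /_ingest/pipeline/nginx-access
{
  "description": "Parse Nginx access logs without Logstash",
  "processors": [
    { "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:client_ip} - %{DATA:user} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:status:int} %{NUMBER:bytes:int}"]
    }},
    { "date": { "field": "timestamp", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"], "target_field": "@timestamp" } },
    { "geoip": { "field": "client_ip", "target_field": "geo" } },
    { "remove": { "field": ["message", "timestamp"] } }
  ]
}
```

Apply it at index time with the pipeline request parameter (?pipeline=nginx-access), or set index.default_pipeline on the index so it runs for every write.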
Beats
Beats are lightweight, single-purpose data shippers that run on your servers, containers, or edge devices. They consume minimal resources and send data directly to Elasticsearch or through Logstash for additional processing.
Filebeat (logs)
Tails log files and forwards lines to Elasticsearch or Logstash. Handles log rotation, multiline events (stack traces), and back-pressure. Modules provide pre-built configs for Nginx, Apache, MySQL, system logs, Docker, Kubernetes, and many more.
Metricbeat (metrics)
Collects system and service metrics. Ships CPU, memory, disk, network stats, plus service-specific metrics from Nginx, MySQL, PostgreSQL, Redis, Docker, Kubernetes, Prometheus endpoints, and more.
Packetbeat (network)
Captures network traffic and decodes application-layer protocols (HTTP, DNS, MySQL, PostgreSQL, TLS). Provides real-time network monitoring without instrumenting applications.
Heartbeat (uptime)
Probes endpoints (HTTP, TCP, ICMP) at regular intervals to monitor availability and response time. Powers Uptime monitoring in Kibana. Runs from a central location, not on each monitored host.
Auditbeat (security)
Collects audit events from the Linux audit framework and file integrity monitoring. Tracks file changes, user logins, process execution, and socket connections. Feeds into Elastic Security for threat detection and compliance.
Filebeat configuration
# /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: filestream
id: app-logs
paths:
- /var/log/myapp/*.log
parsers:
- multiline:
pattern: '^\d{4}-\d{2}-\d{2}'
negate: true
match: after
fields:
service: myapp
environment: production
fields_under_root: true
- type: container
paths:
- /var/lib/docker/containers/*/*.log
processors:
- add_docker_metadata: ~
# Modules (pre-built configs)
filebeat.modules:
- module: nginx
access:
enabled: true
var.paths: ["/var/log/nginx/access.log"]
error:
enabled: true
- module: system
syslog:
enabled: true
auth:
enabled: true
# Output directly to Elasticsearch
output.elasticsearch:
hosts: ["https://es-node-01:9200", "https://es-node-02:9200"]
username: "filebeat_writer"
password: "${FILEBEAT_ES_PASSWORD}"
ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
indices:
- index: "logs-nginx-%{+yyyy.MM.dd}"
when.contains:
fileset.module: "nginx"
- index: "logs-app-%{+yyyy.MM.dd}"
when.equals:
service: "myapp"
# Or output to Logstash for additional processing
# output.logstash:
# hosts: ["logstash-01:5044"]
# ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
Beats vs Logstash
| Aspect | Beats | Logstash |
|---|---|---|
| Resource usage | ~30-50 MB RAM per agent | ~1 GB+ RAM (JVM-based) |
| Deployment | On every host (edge agent) | Centralized servers |
| Processing | Basic (processors, modules) | Full transformation pipeline (grok, conditionals, aggregation) |
| Outputs | Elasticsearch, Logstash, Kafka, Redis | 200+ output plugins |
| Buffering | In-memory + registry file | Persistent queue (disk-backed) |
| Best for | Simple collection, direct-to-ES shipping | Complex parsing, multiple inputs, routing logic |
The most common production architecture is Beats → Logstash → Elasticsearch. Beats collect on the edge (lightweight), Logstash centralizes parsing and enrichment (powerful), and Elasticsearch stores and indexes. For simpler setups, Beats → Elasticsearch with ingest pipelines eliminates the Logstash tier entirely.
Security
Since Elasticsearch 8.x, security is enabled by default: the first startup auto-configures TLS encryption and authentication out of the box. Earlier versions required manually enabling X-Pack security features.
TLS/SSL setup
Elasticsearch uses TLS for two layers: transport (node-to-node communication on port 9300) and HTTP (client-to-node on port 9200). Both must be encrypted in production.
# Generate a Certificate Authority
bin/elasticsearch-certutil ca --out elastic-stack-ca.p12 --pass ""
# Generate node certificates signed by the CA
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12 \
--out elastic-certificates.p12 --pass "" \
--dns es-node-01,es-node-02,es-node-03 \
--ip 10.0.1.10,10.0.1.11,10.0.1.12
# Generate HTTP certificate (for client connections)
bin/elasticsearch-certutil http
# Reset password for a built-in user (setup-passwords is deprecated)
bin/elasticsearch-reset-password -u elastic
Authentication realms
| Realm | Type | Notes |
|---|---|---|
| Native | Built-in | Users stored in the .security index. Managed via API or Kibana. Default realm. Good for service accounts and small teams. |
| LDAP | External | Authenticate against Active Directory or OpenLDAP. Map LDAP groups to Elasticsearch roles. |
| SAML | SSO | Enterprise SSO via Okta, Microsoft Entra ID (formerly Azure AD), OneLogin. Kibana acts as the SAML Service Provider. Requires a Platinum/Enterprise license. |
| OIDC | SSO | OpenID Connect for modern identity providers. Similar to SAML but uses OAuth2 under the hood. Requires Platinum/Enterprise license. |
| PKI | Certificate | Client certificate authentication. Useful for node-to-node auth and automated systems. |
| API Keys | Token | Long-lived or expiring tokens for programmatic access. Scoped to specific indices and privileges. Best for applications and CI/CD. |
RBAC (Role-Based Access Control)
// Create a role: read-only access to logs indices
POST /_security/role/logs_reader
{
"cluster": ["monitor"],
"indices": [
{
"names": ["logs-*"],
"privileges": ["read", "view_index_metadata"],
"field_security": {
"grant": ["@timestamp", "message", "level", "service", "host"]
}
}
],
"applications": [
{
"application": "kibana-.kibana",
"privileges": ["feature_discover.read", "feature_dashboard.read"],
"resources": ["space:production"]
}
]
}
// Create a role: write access for Logstash
POST /_security/role/logstash_writer
{
"cluster": ["manage_index_templates", "monitor", "manage_ilm"],
"indices": [
{
"names": ["logs-*", "metrics-*"],
"privileges": ["write", "create_index", "manage", "auto_configure"]
}
]
}
// Create an API key for a service
POST /_security/api_key
{
"name": "filebeat-prod-01",
"role_descriptors": {
"filebeat_writer": {
"cluster": ["monitor"],
"indices": [
{
"names": ["logs-filebeat-*"],
"privileges": ["write", "create_index", "auto_configure"]
}
]
}
},
"expiration": "365d"
}
Never run Elasticsearch without TLS in production. Unencrypted traffic exposes credentials, query data, and index contents. The elastic superuser should only be used for initial setup — create dedicated service accounts with minimal privileges for every application, Beat, and Logstash instance.
Cluster Operations
Cluster health
Elasticsearch reports cluster health as a color: green (all primary and replica shards assigned), yellow (all primaries assigned, some replicas unassigned), red (some primary shards unassigned — data loss possible).
# Check cluster health
GET /_cluster/health
# See which shards are unassigned and why
GET /_cluster/allocation/explain
# List all indices with health and shard counts
GET /_cat/indices?v&s=index
# See shard allocation across nodes
GET /_cat/shards?v&s=index
Shard allocation and rebalancing
Elasticsearch automatically allocates shards across nodes and rebalances when nodes join or leave. You can control allocation with awareness attributes (rack, zone) and filters.
// Force awareness: spread replicas across zones
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.awareness.attributes": "zone",
"cluster.routing.allocation.awareness.force.zone.values": "us-east-1a,us-east-1b,us-east-1c"
}
}
// Exclude a node from allocation (for maintenance)
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._name": "es-node-03"
}
}
// Re-enable allocation after maintenance
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._name": ""
}
}
Snapshot and restore
// Register a snapshot repository (S3)
PUT /_snapshot/s3-backups
{
"type": "s3",
"settings": {
"bucket": "my-es-backups",
"region": "us-east-1",
"base_path": "production",
"compress": true
}
}
// Create a snapshot (all indices)
PUT /_snapshot/s3-backups/snapshot-2026-03-20?wait_for_completion=false
// Create a snapshot (specific indices)
PUT /_snapshot/s3-backups/logs-snapshot
{
"indices": "logs-2026.03.*",
"ignore_unavailable": true,
"include_global_state": false
}
// Restore from snapshot
POST /_snapshot/s3-backups/snapshot-2026-03-20/_restore
{
"indices": "logs-2026.03.19",
"rename_pattern": "(.+)",
"rename_replacement": "restored-$1"
}
Rolling upgrades
- Disable shard allocation: PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": "primaries"}}
- Stop non-essential indexing and perform a flush: POST /_flush (synced flush was removed in 8.0; a normal flush has the same effect since 7.6).
- Stop the Elasticsearch node, upgrade the package, start the node.
- Wait for the node to rejoin the cluster: GET /_cat/nodes
- Re-enable allocation: PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": null}}
- Wait for green health: GET /_cluster/health?wait_for_status=green
- Repeat for each node.
Hot-warm-cold architecture
Tiered storage optimizes cost by placing recent, frequently-accessed data on fast SSDs (hot tier) and moving older data to cheaper storage (warm, cold). Combined with ILM, this is fully automated.
Hot: active writes + queries
Fast NVMe/SSD storage. Handles all indexing and most search traffic. Nodes tagged node.roles: [data_hot]. Typically retains 1-7 days of data.
Warm: read-only, regular queries
SSD or fast HDD. Indices are shrunk and force-merged. Nodes tagged node.roles: [data_warm]. Retains 7-30 days typically.
Cold: infrequent access
Cheap HDD or object storage. Data is searchable but slow. Nodes tagged node.roles: [data_cold]. Retains 30-365 days.
Frozen: archive
Searchable snapshots backed by object storage (S3, GCS, Azure Blob). Data is not stored locally — fetched from the snapshot on demand. Near-zero local storage cost.
Docker Deployment
Docker Compose is the fastest way to stand up a development or small production Elastic Stack. The following deploys a 3-node Elasticsearch cluster with Kibana and Filebeat.
# docker-compose.yml — 3-node Elasticsearch + Kibana + Filebeat
services:
es01:
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
container_name: es01
environment:
- node.name=es01
- cluster.name=docker-cluster
- discovery.seed_hosts=es02,es03
- cluster.initial_master_nodes=es01,es02,es03
- node.roles=master,data_hot,ingest
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-changeme}
- xpack.security.enabled=true
- xpack.security.http.ssl.enabled=true
- xpack.security.http.ssl.keystore.path=certs/http.p12
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
- xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- bootstrap.memory_lock=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- es01-data:/usr/share/elasticsearch/data
- ./certs:/usr/share/elasticsearch/config/certs:ro
ports:
- "9200:9200"
networks:
- elastic
healthcheck:
test: ["CMD-SHELL", "curl -fsSk https://localhost:9200/_cluster/health || exit 1"]
interval: 10s
timeout: 5s
retries: 12
restart: unless-stopped
es02:
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
container_name: es02
environment:
- node.name=es02
- cluster.name=docker-cluster
- discovery.seed_hosts=es01,es03
- cluster.initial_master_nodes=es01,es02,es03
- node.roles=master,data_hot,ingest
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-changeme}
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
- xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- bootstrap.memory_lock=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- es02-data:/usr/share/elasticsearch/data
- ./certs:/usr/share/elasticsearch/config/certs:ro
networks:
- elastic
restart: unless-stopped
es03:
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
container_name: es03
environment:
- node.name=es03
- cluster.name=docker-cluster
- discovery.seed_hosts=es01,es02
- cluster.initial_master_nodes=es01,es02,es03
- node.roles=master,data_warm
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-changeme}
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
- xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
- bootstrap.memory_lock=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- es03-data:/usr/share/elasticsearch/data
- ./certs:/usr/share/elasticsearch/config/certs:ro
networks:
- elastic
restart: unless-stopped
kibana:
image: docker.elastic.co/kibana/kibana:8.17.0
container_name: kibana
environment:
- ELASTICSEARCH_HOSTS=https://es01:9200
- ELASTICSEARCH_USERNAME=kibana_system
- ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD:-changeme}
- ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca.crt
- SERVER_SSL_ENABLED=true
- SERVER_SSL_CERTIFICATE=config/certs/kibana.crt
- SERVER_SSL_KEY=config/certs/kibana.key
- XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${ENCRYPTION_KEY}
volumes:
- ./certs:/usr/share/kibana/config/certs:ro
ports:
- "5601:5601"
networks:
- elastic
depends_on:
es01:
condition: service_healthy
restart: unless-stopped
filebeat:
image: docker.elastic.co/beats/filebeat:8.17.0
container_name: filebeat
user: root
command: filebeat -e --strict.perms=false
environment:
- ELASTICSEARCH_HOSTS=https://es01:9200
- ELASTICSEARCH_USERNAME=filebeat_writer
- ELASTICSEARCH_PASSWORD=${FILEBEAT_PASSWORD:-changeme}
volumes:
- ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
- ./certs:/usr/share/filebeat/certs:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- filebeat-data:/usr/share/filebeat/data
networks:
- elastic
depends_on:
es01:
condition: service_healthy
restart: unless-stopped
volumes:
es01-data:
es02-data:
es03-data:
filebeat-data:
networks:
elastic:
driver: bridge
Elasticsearch enforces bootstrap checks in production mode (when network.host is not localhost). Key requirements: vm.max_map_count must be at least 262144 on the Docker host (sysctl -w vm.max_map_count=262144), memory locking must be enabled (bootstrap.memory_lock=true + Docker memlock ulimit), and file descriptor limits must be high enough (65536+).
Before running docker compose up, set the required kernel parameter on the Docker host:
# Set on running system
sudo sysctl -w vm.max_map_count=262144
# Persist across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
Performance Tuning
JVM heap sizing
Elasticsearch runs on the JVM. Heap allocation is the single most impactful performance setting.
- Set `-Xms` and `-Xmx` to the same value — avoids heap resize pauses during operation.
- No more than 50% of physical RAM — the other 50% is used by Lucene for the filesystem cache (critical for search performance).
- No more than ~30-31 GB — beyond this threshold, the JVM loses compressed ordinary object pointers (CompressedOops), which wastes memory. Two 30 GB nodes beat one 64 GB node.
- Use G1GC — the default since Elasticsearch 8.x. No need to tune GC settings for most workloads.
For a 64 GB server: set heap to -Xms30g -Xmx30g. The remaining 34 GB becomes filesystem cache for Lucene. For a 16 GB server: -Xms8g -Xmx8g. Never go below 1 GB heap.
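As a quick sanity check, the sizing rules above can be expressed as a small helper (a sketch; `recommended_heap_gb` is a hypothetical name, not an Elasticsearch tool):

```python
def recommended_heap_gb(ram_gb: float) -> float:
    """Suggest an Elasticsearch heap size following the common rules:
    at most 50% of physical RAM, capped below the ~31 GB CompressedOops
    threshold, and never below 1 GB."""
    heap = min(ram_gb / 2, 30.0)  # 50% rule, capped under the compressed-oops limit
    return max(heap, 1.0)         # never go below 1 GB

# A 64 GB server gets a 30 GB heap; a 16 GB server gets 8 GB.
print(recommended_heap_gb(64))  # 30.0
print(recommended_heap_gb(16))  # 8.0
```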
Indexing performance
Bulk API Batch writes
Always use the _bulk API for indexing. Sending individual documents is 5-10x slower. Optimal bulk size is typically 5-15 MB per request (experiment to find the sweet spot for your cluster).
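A minimal sketch of size-capped batching for the `_bulk` API, assuming a hypothetical `logs-app` index; real code would POST each body with `Content-Type: application/x-ndjson` and check the response for per-item errors:

```python
import json

def bulk_batches(docs, index="logs-app", max_bytes=10 * 1024 * 1024):
    """Yield newline-delimited _bulk request bodies, each capped at
    max_bytes (within the 5-15 MB sweet spot by default). Each document
    becomes an action line plus a source line, as the _bulk API expects."""
    batch, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index}})
        source = json.dumps(doc)
        entry_size = len(action) + len(source) + 2  # two trailing newlines
        if batch and size + entry_size > max_bytes:
            yield "".join(batch)  # flush the current batch before it overflows
            batch, size = [], 0
        batch.append(action + "\n" + source + "\n")
        size += entry_size
    if batch:
        yield "".join(batch)
```

Each yielded string is one `POST /_bulk` body; tuning `max_bytes` is the "experiment to find the sweet spot" step.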
Refresh Interval tuning
Default refresh_interval is 1 second (near-real-time). For bulk ingestion, set to 30s or -1 (disable) during the load, then reset. Each refresh creates a new Lucene segment — fewer refreshes = less merge overhead.
Replicas Disable during load
Set number_of_replicas: 0 during initial bulk indexing. Re-enable after the load completes. Replication doubles the indexing work — skipping it during bulk loads significantly improves throughput.
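The refresh and replica items above form a load-then-restore pattern; a sketch of the two `PUT /<index>/_settings` request bodies (values are illustrative):

```python
# Before the bulk load: pause refreshes and drop replicas.
bulk_load_settings = {
    "index": {
        "refresh_interval": "-1",   # disable refresh during the load
        "number_of_replicas": 0,    # skip replication work
    }
}

# After the load: restore near-real-time refresh and replicas.
restore_settings = {
    "index": {
        "refresh_interval": "1s",
        "number_of_replicas": 1,
    }
}
```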
Mapping Avoid dynamic mapping
Dynamic mapping detects types at index time, adding overhead. Define explicit mappings. Disable _source only if you never need to reindex or return full documents (rare).
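A minimal explicit mapping with `"dynamic": "strict"` might look like the following (index and field names are illustrative), sent as the body of a create-index request:

```python
# Body of: PUT /logs-app  (hypothetical index name)
explicit_mapping = {
    "mappings": {
        "dynamic": "strict",  # reject documents containing unmapped fields
        "properties": {
            "@timestamp": {"type": "date"},
            "message": {"type": "text"},       # full-text search
            "service": {"type": "keyword"},    # exact match and aggregations
            "status": {"type": "integer"},
        },
    }
}
```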
Search optimization
- Use `filter` context — filters are cached and skip scoring. Move all exact-match conditions out of `must` into `filter`.
- Limit `_source` fields — return only fields you need with `"_source": ["field1", "field2"]`.
- Avoid deep pagination — `from + size` is O(n) and capped at 10,000 hits by default (`index.max_result_window`). For paging through large result sets, use `search_after` with a point in time (PIT). The scroll API is no longer recommended for this purpose.
- Force merge read-only indices — merge to a single segment with `POST /index/_forcemerge?max_num_segments=1`. Dramatically faster searches on historical data.
- Use `keyword` for aggregations — never aggregate on `text` fields. Use the `.keyword` sub-field or map fields as `keyword` from the start.
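A sketch of a search body that applies several of these points at once: exact matches in `filter` context, a trimmed `_source`, and `search_after` pagination (index and field names are hypothetical; the `_shard_doc` tiebreaker assumes a point-in-time search):

```python
# Body of: POST /logs-app/_search  (hypothetical index)
search_body = {
    "size": 100,
    "_source": ["@timestamp", "message"],  # return only the fields needed
    "query": {
        "bool": {
            "must": [
                {"match": {"message": "timeout"}}  # scored full-text clause
            ],
            "filter": [
                # exact-match conditions: cached, no scoring overhead
                {"term": {"service": "checkout"}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ],
        }
    },
    "sort": [{"@timestamp": "asc"}, {"_shard_doc": "asc"}],
}

# Page forward by feeding the last hit's sort values back in:
next_page = dict(search_body, search_after=["2024-01-01T00:00:00Z", 42])
```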
Circuit breakers
Elasticsearch has built-in circuit breakers that prevent operations from consuming too much memory and crashing the node. If you see CircuitBreakingException, you are hitting a limit.
| Breaker | Default | Purpose |
|---|---|---|
| `indices.breaker.total.limit` | 95% of heap (with `use_real_memory: true`, the default; 70% otherwise) | Total memory limit across all breakers. |
| `indices.breaker.fielddata.limit` | 40% of heap | Field data cache (text field aggregations). Avoid loading text fields into field data. |
| `indices.breaker.request.limit` | 60% of heap | Per-request memory (large aggregations, sorting on high-cardinality fields). |
| `network.breaker.inflight_requests.limit` | 100% of heap | In-flight HTTP request data. |
Production Checklist
- Dedicated master nodes — use 3 dedicated master-eligible nodes for cluster stability. Never run master and heavy data workloads on the same node.
- TLS everywhere — enable TLS on both transport and HTTP layers. Use certificates from a trusted CA (or Elasticsearch's built-in CA).
- Authentication and RBAC — create dedicated users/roles for every application, Beat, and Logstash instance. Never share the `elastic` superuser.
- Heap sizing — set `-Xms` = `-Xmx`, no more than 50% of RAM, no more than 31 GB. Leave the rest for filesystem cache.
- Disable swapping — set `bootstrap.memory_lock: true` and configure OS-level `memlock` unlimited.
- Increase `vm.max_map_count` — set to at least 262144. Required for Lucene's memory-mapped files.
- Explicit mappings — define mappings for all indices. Use `"dynamic": "strict"` to prevent mapping explosions.
- ILM policies — configure index lifecycle management for all time-series data. Automate rollover, shrink, force merge, and deletion.
- Snapshot backups — configure automated snapshots to S3, GCS, or Azure Blob. Test restores regularly. Snapshots are the only reliable backup method.
- Monitoring — enable Stack Monitoring in Kibana or ship metrics to a dedicated monitoring cluster. Watch heap usage, GC time, indexing rate, search latency, and thread pool rejections.
- Shard sizing — aim for 20-50 GB per shard. Too many small shards waste resources; too few large shards limit parallelism. Monitor shard count per node (target under 1000).
- Log rotation — configure Elasticsearch log rotation in `log4j2.properties`. Unbounded logs will fill disk and crash the node.
- Network firewall — restrict ports 9200 (HTTP) and 9300 (transport) to trusted IPs only. Never expose Elasticsearch directly to the internet.
- Rolling upgrade plan — document and test the rolling upgrade procedure. Always read the breaking changes in release notes before upgrading.
- Capacity planning — estimate daily ingestion volume, retention period, and replica count. Formula: `total_storage = daily_volume × retention_days × (1 + num_replicas) × 1.1` (10% overhead).
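The capacity formula above, sketched as a helper (the function name and example numbers are illustrative):

```python
def required_storage_gb(daily_gb, retention_days, replicas, overhead=1.1):
    """Total cluster storage per the checklist formula:
    daily_volume x retention_days x (1 + num_replicas) x 1.1 overhead."""
    return daily_gb * retention_days * (1 + replicas) * overhead

# Example: 50 GB/day, 30-day retention, 1 replica.
print(round(required_storage_gb(50, 30, 1), 1))  # 3300.0
```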