Elastic Stack
Elasticsearch, Kibana, Logstash, and Beats — search, logging, observability, and SIEM
Overview
The Elastic Stack (formerly the ELK Stack) is a collection of open-source tools for ingesting, storing, searching, and visualizing data. The core components are Elasticsearch (distributed search and analytics engine), Kibana (visualization and management UI), Logstash (server-side data processing pipeline), and Beats (lightweight data shippers). When Fluentd replaces Logstash, the stack is sometimes called EFK.
Elasticsearch
A distributed, RESTful search and analytics engine built on Apache Lucene. Stores documents as JSON, indexes them using an inverted index, and provides near-real-time full-text search, structured queries, and aggregations across massive datasets.
Kibana
The web UI for the Elastic Stack. Create dashboards, explore data with Discover, build visualizations with Lens, manage indices and ILM policies, configure security, and monitor cluster health.
Logstash
A server-side data processing pipeline. Reads data from multiple sources (files, syslog, Kafka, Beats), transforms it with filters (grok, mutate, geoip), and ships it to Elasticsearch or other destinations. Heavy but flexible.
Beats
Lightweight, single-purpose data shippers installed on edge hosts. Filebeat (logs), Metricbeat (metrics), Packetbeat (network), Heartbeat (uptime), Auditbeat (audit). Ship directly to Elasticsearch or through Logstash.
Common use cases
- Log analytics — centralize logs from applications, containers, and infrastructure. Search, filter, and correlate across millions of log lines.
- Full-text search — power search features in applications (e-commerce catalogs, documentation sites, knowledge bases).
- Observability — combine logs, metrics, and APM traces in a single platform. Elastic APM instruments applications for distributed tracing.
- SIEM — Elastic Security provides threat detection, investigation, and response. Ingests security events, runs detection rules, and integrates with MITRE ATT&CK.
- Infrastructure monitoring — collect system and service metrics with Metricbeat. Visualize CPU, memory, disk, and network across fleets of servers.
Elasticsearch Architecture
Elasticsearch is a distributed system. Data is stored across multiple nodes in a cluster, divided into shards for parallelism and replicated for fault tolerance. Understanding the architecture is essential for capacity planning and troubleshooting.
Core concepts
Index
An index is a collection of documents with similar characteristics. Analogous to a database table. Each index has a mapping (schema) defining field types. Indices are typically time-based for log data: logs-2026.03.19.
Document
The basic unit of information in Elasticsearch. A JSON object stored in an index. Each document has a unique _id and is assigned to a shard by a routing algorithm (by default, hash(_routing) % number_of_primary_shards, where _routing defaults to the _id).
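The routing idea can be sketched in a few lines of Python. This is a simplified stand-in, not the real implementation: Elasticsearch uses a Murmur3 hash (and, in recent versions, a routing factor to support shard splitting), while this sketch uses MD5 for illustration.

```python
import hashlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Simplified stand-in for Elasticsearch document routing:
    hash the routing value (default: _id) and take it modulo the
    primary shard count."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_primary_shards

# The same _id always lands on the same shard, which is why
# number_of_shards cannot be changed without reindexing:
# a different shard count changes every document's assignment.
assert route_to_shard("log-42", 5) == route_to_shard("log-42", 5)
```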
Shards
An index is split into primary shards (default: 1). Each primary shard has zero or more replica shards on different nodes for redundancy. Shards are Lucene indices under the hood — the actual unit of storage and search.
Segments
Each shard is composed of immutable segments. When documents are indexed, they are written to an in-memory buffer, then flushed to a segment on disk. Segments are periodically merged (compacted) to reduce count and reclaim space from deleted documents.
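The buffer/flush/merge lifecycle can be modeled with a toy Python class. This is a conceptual sketch only — real Lucene segments are on-disk inverted-index files, and deletes are tracked in per-segment bitsets — but the mechanics are the same: segments are immutable once written, deletes are tombstones, and only a merge reclaims the space.

```python
class Shard:
    """Toy model of a Lucene shard: writes buffer in memory, a refresh
    flushes the buffer into a new immutable segment, and a merge
    rewrites several segments into one, dropping deleted docs."""
    def __init__(self):
        self.buffer: list[dict] = []
        self.segments: list[list[dict]] = []  # immutable once flushed
        self.deleted: set[str] = set()        # tombstones until a merge

    def index(self, doc: dict) -> None:
        self.buffer.append(doc)

    def refresh(self) -> None:
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []

    def delete(self, doc_id: str) -> None:
        self.deleted.add(doc_id)  # the segment itself is never modified

    def merge(self) -> None:
        live = [d for seg in self.segments for d in seg
                if d["_id"] not in self.deleted]
        self.segments = [live]
        self.deleted.clear()

shard = Shard()
for i in range(4):
    shard.index({"_id": f"doc-{i}"})
    shard.refresh()              # four tiny one-doc segments
shard.delete("doc-0")
shard.merge()                    # one segment; the tombstoned doc is gone
print(len(shard.segments), len(shard.segments[0]))  # 1 3
```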
Node roles
| Role | Config | Purpose |
|---|---|---|
| Master-eligible | node.roles: [master] | Participates in cluster state management (index creation, shard allocation). Dedicated masters recommended in production (3 for quorum). |
| Data | node.roles: [data] | Stores data and executes search/aggregation. Can be further specialized: data_hot, data_warm, data_cold, data_frozen. |
| Ingest | node.roles: [ingest] | Runs ingest pipelines (pre-processing documents before indexing). Lightweight transformations like geoip, date parsing, field removal. |
| Coordinating-only | node.roles: [] | Routes requests, scatters queries to data nodes, gathers and reduces results. Acts as a smart load balancer. No data, no master election. |
| ML | node.roles: [ml] | Runs machine learning jobs (anomaly detection, classification). Isolate ML workloads from search traffic. |
| Transform | node.roles: [transform] | Executes transform jobs that pivot or aggregate data into summary indices. |
Inverted index
Elasticsearch uses an inverted index for full-text search. Instead of mapping documents to words (forward index), it maps each unique term to the list of documents containing that term. This allows sub-second lookups even across billions of documents. Text fields are analyzed (tokenized, lowercased, stemmed) before indexing; keyword fields are stored as-is for exact matching.
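A minimal Python sketch of the data structure makes the idea concrete. The analyzer here only tokenizes on whitespace and lowercases; a real analyzer also strips punctuation and may stem.

```python
from collections import defaultdict

def analyze(text: str) -> list[str]:
    # Minimal analyzer: whitespace tokenization + lowercasing.
    return text.lower().split()

def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each unique term to the set of doc IDs containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

docs = {
    1: "connection timeout to database",
    2: "database connection restored",
    3: "user login succeeded",
}
index = build_inverted_index(docs)

# A term lookup is a single dictionary access regardless of corpus
# size; multi-term queries intersect or union the posting lists.
print(sorted(index["connection"]))                      # [1, 2]
print(sorted(index["connection"] & index["timeout"]))   # [1]
```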
# elasticsearch.yml — production node configuration
cluster.name: production-logs
node.name: es-node-01
node.roles: [master, data_hot]
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.seed_hosts:
- es-node-01:9300
- es-node-02:9300
- es-node-03:9300
cluster.initial_master_nodes:
- es-node-01
- es-node-02
- es-node-03
# Security (required since 8.x)
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12
Indexing & Mappings
A mapping defines how documents and their fields are stored and indexed. Getting mappings right is critical — changing the mapping of an existing field requires reindexing all data.
Explicit mapping
PUT /logs-app
{
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"refresh_interval": "5s"
},
"mappings": {
"properties": {
"@timestamp": { "type": "date" },
"message": { "type": "text", "analyzer": "standard" },
"level": { "type": "keyword" },
"service": { "type": "keyword" },
"host": { "type": "keyword" },
"duration_ms": { "type": "integer" },
"request_id": { "type": "keyword" },
"user_agent": { "type": "text", "fields": {
"raw": { "type": "keyword" }
}},
"geo": {
"properties": {
"lat": { "type": "float" },
"lon": { "type": "float" },
"location": { "type": "geo_point" }
}
}
}
}
}
Common field types
| Type | Use case | Notes |
|---|---|---|
| text | Full-text search | Analyzed (tokenized). Not suitable for sorting or aggregation. Use the .keyword sub-field for exact match. |
| keyword | Exact values, filtering, aggregation | Not analyzed. IDs, status codes, tags, hostnames. Dynamic mapping sets ignore_above: 256 on auto-generated .keyword sub-fields; explicitly mapped keyword fields have no limit unless you set one. |
| date | Timestamps | Stored internally as epoch millis. Supports multiple formats via the format parameter. |
| integer / long | Whole numbers | Use long for values exceeding 2^31 - 1. |
| float / double | Decimal numbers | scaled_float is more efficient when precision is fixed (e.g., currency). |
| boolean | True/false flags | Stored as true / false. |
| geo_point | Latitude/longitude | Enables geo queries (distance, bounding box) and map visualizations in Kibana. |
| nested | Arrays of objects | Preserves the relationship between fields in each object. Without nested, object arrays are flattened. |
Dynamic mapping
If you index a document into a non-existent index (or a field not in the mapping), Elasticsearch creates the mapping automatically. This is convenient for development but dangerous in production — a typo in a field name creates a new field, and string fields get both text and keyword sub-fields, wasting disk.
Set "dynamic": "strict" on production indices to reject documents with unmapped fields. This prevents mapping explosions from malformed data. Use index templates to enforce consistent mappings across time-based indices.
Index templates
PUT /_index_template/logs-template
{
"index_patterns": ["logs-*"],
"data_stream": {},
"priority": 100,
"template": {
"settings": {
"number_of_shards": 2,
"number_of_replicas": 1,
"index.lifecycle.name": "logs-policy"
},
"mappings": {
"dynamic": "strict",
"properties": {
"@timestamp": { "type": "date" },
"message": { "type": "text" },
"level": { "type": "keyword" },
"service": { "type": "keyword" },
"host": { "type": "keyword" }
}
}
}
}
Index Lifecycle Management (ILM)
ILM automates index management through phases: hot (actively written/queried), warm (read-only, less frequent queries), cold (infrequent access, compressed), and delete (purged). Each phase can trigger actions like rollover, shrink, force merge, and searchable snapshot.
PUT /_ilm/policy/logs-policy
{
"policy": {
"phases": {
"hot": {
"min_age": "0ms",
"actions": {
"rollover": {
"max_primary_shard_size": "50gb",
"max_age": "1d"
},
"set_priority": { "priority": 100 }
}
},
"warm": {
"min_age": "7d",
"actions": {
"shrink": { "number_of_shards": 1 },
"forcemerge": { "max_num_segments": 1 },
"set_priority": { "priority": 50 }
}
},
"cold": {
"min_age": "30d",
"actions": {
"set_priority": { "priority": 0 }
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
Since Elasticsearch 7.10, ILM migrates indices between data tiers (data_hot, data_warm, data_cold, data_frozen) based on node roles, so explicit allocate actions with custom node attributes are no longer needed: assign the correct node.roles to each node and ILM handles tier migration automatically.
Data streams
Data streams are the modern way to handle time-series data in Elasticsearch. They are an abstraction over rolling indices — you write to a single data stream name (logs-nginx-default) and Elasticsearch automatically manages backing indices, rollover, and ILM. Data streams are append-only: writes must use the create op_type; to update or delete individual documents, you must target the backing index directly.
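A sketch of day-to-day data stream usage, assuming an index template with a matching pattern and "data_stream": {} exists (like the logs-template shown earlier):

```
# Write a document — the data stream and its first backing index
# are created automatically on first write
POST /logs-nginx-default/_doc
{
  "@timestamp": "2026-03-19T12:00:00Z",
  "message": "GET /health 200"
}

# Inspect the stream and its backing indices (named .ds-logs-nginx-default-*)
GET /_data_stream/logs-nginx-default

# Trigger a manual rollover (ILM normally handles this)
POST /logs-nginx-default/_rollover
```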
Search & Query DSL
Elasticsearch's Query DSL is a JSON-based language for building search queries. Queries fall into two categories: full-text queries (analyzed, scored by relevance) and term-level queries (exact matches, typically run in filter context, not scored).
Basic queries
// Match query — full-text search on analyzed fields
GET /logs-*/_search
{
"query": {
"match": {
"message": "connection timeout"
}
}
}
// Term query — exact match on keyword fields
GET /logs-*/_search
{
"query": {
"term": {
"level": "ERROR"
}
}
}
// Range query — numeric or date ranges
GET /logs-*/_search
{
"query": {
"range": {
"@timestamp": {
"gte": "2026-03-19T00:00:00Z",
"lte": "2026-03-19T23:59:59Z"
}
}
}
}
Bool query (combining conditions)
GET /logs-*/_search
{
"query": {
"bool": {
"must": [
{ "match": { "message": "database" } }
],
"filter": [
{ "term": { "level": "ERROR" } },
{ "range": { "@timestamp": { "gte": "now-1h" } } }
],
"must_not": [
{ "term": { "service": "healthcheck" } }
],
"should": [
{ "term": { "service": "api-gateway" } }
],
"minimum_should_match": 0
}
},
"size": 50,
"sort": [{ "@timestamp": "desc" }],
"_source": ["@timestamp", "level", "service", "message"]
}
must clauses contribute to the relevance score. filter clauses do not — they simply include/exclude documents and are cached by Elasticsearch for performance. Always use filter for exact-match conditions (level, status codes, date ranges) and must only for full-text search where scoring matters.
Aggregations
GET /logs-*/_search
{
"size": 0,
"query": {
"range": { "@timestamp": { "gte": "now-24h" } }
},
"aggs": {
"errors_by_service": {
"terms": { "field": "service", "size": 20 },
"aggs": {
"error_count": {
"filter": { "term": { "level": "ERROR" } }
},
"avg_duration": {
"avg": { "field": "duration_ms" }
}
}
},
"errors_over_time": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1h"
},
"aggs": {
"error_rate": {
"filter": { "term": { "level": "ERROR" } }
}
}
},
"unique_users": {
"cardinality": { "field": "user_id" }
}
}
}
Full-text vs structured search
| Aspect | Full-text (text fields) | Structured (keyword fields) |
|---|---|---|
| Analysis | Tokenized, lowercased, stemmed | Stored as-is (exact value) |
| Query type | match, multi_match, match_phrase | term, terms, range, exists |
| Scoring | Yes — BM25 relevance scoring | No — binary match (use in filter context) |
| Use case | Search bars, log message search | Filtering by status, service, host, level |
| Aggregation | Not directly (use .keyword sub-field) | Yes — terms, histograms, cardinality |
Kibana
Kibana is the visualization and management layer of the Elastic Stack. It provides a browser-based UI for exploring data, building dashboards, managing cluster settings, and configuring security.
Discover
Interactive log/event explorer. Select a data view (index pattern), set a time range, and search with KQL or Lucene syntax. View individual documents, expand fields, and add filter pills. The starting point for most investigations.
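A few KQL expressions of the kind typed into the Discover search bar (field names follow the log mapping used elsewhere in this document). Quoted strings are phrase matches; unquoted terms on a text field match any of the terms.

```
level: "ERROR" and service: "api-gateway"
message: "connection timeout"
duration_ms >= 500 and not service: healthcheck
host: web-* and level: (ERROR or WARN)
```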
Lens
Drag-and-drop visualization builder. Supports bar charts, line charts, pie charts, heatmaps, gauges, tables, and more. Suggests chart types based on your data. The primary way to create visualizations in modern Kibana (replaces legacy Visualize).
Dashboards
Combine multiple Lens visualizations, saved searches, and Markdown panels into a single view. Dashboards support global filters, time pickers, and drill-down links. Export/import dashboards as saved objects (NDJSON).
Spaces
Logical groupings for dashboards, visualizations, and saved objects. Use spaces to separate environments (prod/staging) or teams (platform/security/app). Each space has its own set of data views and dashboards. RBAC controls who can access each space.
Data views (index patterns)
A data view tells Kibana which Elasticsearch indices to query. For example, logs-* matches all indices starting with logs-. Data views define the time field (typically @timestamp) and can include runtime fields for on-the-fly calculations.
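Runtime fields can also be defined per-request at the Elasticsearch level. A sketch computing a hypothetical duration_s field from the duration_ms field used in the mapping examples above:

```
GET /logs-*/_search
{
  "runtime_mappings": {
    "duration_s": {
      "type": "double",
      "script": { "source": "emit(doc['duration_ms'].value / 1000.0)" }
    }
  },
  "query": { "range": { "duration_s": { "gte": 1.0 } } },
  "fields": ["duration_s"]
}
```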
Additional features
- Canvas — pixel-perfect, presentation-ready dashboards. Supports custom CSS, images, and live data elements. Use for TV screens and executive reports.
- Maps — geospatial data visualization. Plot geo_point fields on a map, draw heat layers, and create choropleth maps from aggregations.
- Alerts — define rules that trigger actions (email, Slack, PagerDuty, webhook) when conditions are met (e.g., error rate exceeds threshold).
- Dev Tools — interactive console for sending REST API requests to Elasticsearch. Essential for debugging queries and managing indices.
- Stack Monitoring — monitor Elasticsearch, Kibana, Logstash, and Beats health from within Kibana. Track JVM heap, indexing rate, search latency, and thread pool stats.
# kibana.yml — production configuration
server.host: "0.0.0.0"
server.port: 5601
server.name: "kibana-prod"
elasticsearch.hosts: ["https://es-node-01:9200", "https://es-node-02:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "${KIBANA_ES_PASSWORD}"
elasticsearch.ssl.certificateAuthorities: ["/etc/kibana/certs/ca.crt"]
server.ssl.enabled: true
server.ssl.certificate: /etc/kibana/certs/kibana.crt
server.ssl.key: /etc/kibana/certs/kibana.key
xpack.encryptedSavedObjects.encryptionKey: "min-32-character-encryption-key-here"
xpack.security.encryptionKey: "min-32-character-encryption-key-here"
xpack.reporting.encryptionKey: "min-32-character-encryption-key-here"
logging.root.level: info
Logstash
Logstash is a server-side data processing pipeline with three stages: input (receive data), filter (transform data), and output (send data). Each stage uses plugins from a rich ecosystem.
Pipeline configuration
# /etc/logstash/conf.d/nginx-logs.conf
input {
beats {
port => 5044
ssl_enabled => true
ssl_certificate => "/etc/logstash/certs/logstash.crt"
ssl_key => "/etc/logstash/certs/logstash.key"
}
}
filter {
# Parse Nginx access log format
grok {
match => {
"message" => '%{IPORHOST:client_ip} - %{DATA:user} \[%{HTTPDATE:timestamp}\] "%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}" %{NUMBER:status:int} %{NUMBER:bytes:int} "%{DATA:referrer}" "%{DATA:user_agent}"'
}
}
# Parse the timestamp
date {
match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
target => "@timestamp"
remove_field => ["timestamp"]
}
# Add GeoIP data
geoip {
source => "client_ip"
target => "geo"
}
# Parse user agent string
useragent {
source => "user_agent"
target => "ua"
}
# Remove unnecessary fields
mutate {
remove_field => ["message", "host", "agent"]
convert => { "status" => "integer" }
}
# Tag 5xx errors
if [status] >= 500 {
mutate {
add_tag => ["server_error"]
}
}
}
output {
elasticsearch {
hosts => ["https://es-node-01:9200", "https://es-node-02:9200"]
index => "logs-nginx-%{+YYYY.MM.dd}"
user => "logstash_writer"
password => "${LOGSTASH_ES_PASSWORD}"
ssl_enabled => true
ssl_certificate_authorities => ["/etc/logstash/certs/ca.crt"]
}
}
Multiple pipelines
Logstash supports running multiple pipelines simultaneously. Each pipeline has its own config file, worker threads, and queue. This isolates workloads and prevents a slow pipeline from blocking others.
# /etc/logstash/pipelines.yml
- pipeline.id: nginx
path.config: "/etc/logstash/conf.d/nginx-logs.conf"
pipeline.workers: 4
pipeline.batch.size: 250
- pipeline.id: application
path.config: "/etc/logstash/conf.d/app-logs.conf"
pipeline.workers: 2
pipeline.batch.size: 125
- pipeline.id: syslog
path.config: "/etc/logstash/conf.d/syslog.conf"
pipeline.workers: 2
Elasticsearch has built-in ingest pipelines that can do lightweight transformations (grok, date, geoip, rename) without Logstash. For simple log parsing, ingest pipelines are faster and require no extra infrastructure. Use Logstash when you need complex conditional logic, multiple inputs/outputs, stateful processing (aggregation, deduplication), or output to non-Elasticsearch destinations.
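As an illustration, a minimal ingest pipeline covering roughly the same Nginx parsing steps as the Logstash config above (the pipeline name nginx-access is arbitrary):

```
PUT /_ingest/pipeline/nginx-access
{
  "description": "Parse Nginx access logs without Logstash",
  "processors": [
    { "grok": {
        "field": "message",
        "patterns": ["%{IPORHOST:client_ip} - %{DATA:user} \\[%{HTTPDATE:timestamp}\\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:status:int} %{NUMBER:bytes:int}"]
    }},
    { "date": { "field": "timestamp", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"], "target_field": "@timestamp" } },
    { "geoip": { "field": "client_ip", "target_field": "geo" } },
    { "remove": { "field": ["message", "timestamp"] } }
  ]
}
```

Apply it at index time with the pipeline request parameter (?pipeline=nginx-access), or set index.default_pipeline on the index so it runs for every write.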
Beats
Beats are lightweight, single-purpose data shippers that run on your servers, containers, or edge devices. They consume minimal resources and send data directly to Elasticsearch or through Logstash for additional processing.
Filebeat (logs)
Tails log files and forwards lines to Elasticsearch or Logstash. Handles log rotation, multiline events (stack traces), and back-pressure. Modules provide pre-built configs for Nginx, Apache, MySQL, system logs, Docker, Kubernetes, and many more.
Metricbeat (metrics)
Collects system and service metrics. Ships CPU, memory, disk, network stats, plus service-specific metrics from Nginx, MySQL, PostgreSQL, Redis, Docker, Kubernetes, Prometheus endpoints, and more.
Packetbeat (network)
Captures network traffic and decodes application-layer protocols (HTTP, DNS, MySQL, PostgreSQL, TLS). Provides real-time network monitoring without instrumenting applications.
Heartbeat (uptime)
Probes endpoints (HTTP, TCP, ICMP) at regular intervals to monitor availability and response time. Powers Uptime monitoring in Kibana. Runs from a central location, not on each monitored host.
Auditbeat (security)
Collects audit events from the Linux audit framework and file integrity monitoring. Tracks file changes, user logins, process execution, and socket connections. Feeds into Elastic Security for threat detection and compliance.
Filebeat configuration
# /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: filestream
id: app-logs
paths:
- /var/log/myapp/*.log
parsers:
- multiline:
pattern: '^\d{4}-\d{2}-\d{2}'
negate: true
match: after
fields:
service: myapp
environment: production
fields_under_root: true
- type: container
paths:
- /var/lib/docker/containers/*/*.log
processors:
- add_docker_metadata: ~
# Modules (pre-built configs)
filebeat.modules:
- module: nginx
access:
enabled: true
var.paths: ["/var/log/nginx/access.log"]
error:
enabled: true
- module: system
syslog:
enabled: true
auth:
enabled: true
# Output directly to Elasticsearch
output.elasticsearch:
hosts: ["https://es-node-01:9200", "https://es-node-02:9200"]
username: "filebeat_writer"
password: "${FILEBEAT_ES_PASSWORD}"
ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
indices:
- index: "logs-nginx-%{+yyyy.MM.dd}"
when.contains:
fileset.module: "nginx"
- index: "logs-app-%{+yyyy.MM.dd}"
when.equals:
service: "myapp"
# Or output to Logstash for additional processing
# output.logstash:
# hosts: ["logstash-01:5044"]
# ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
Beats vs Logstash
| Aspect | Beats | Logstash |
|---|---|---|
| Resource usage | ~30-50 MB RAM per agent | ~1 GB+ RAM (JVM-based) |
| Deployment | On every host (edge agent) | Centralized servers |
| Processing | Basic (processors, modules) | Full transformation pipeline (grok, conditionals, aggregation) |
| Outputs | Elasticsearch, Logstash, Kafka, Redis | 200+ output plugins |
| Buffering | In-memory + registry file | Persistent queue (disk-backed) |
| Best for | Simple collection, direct-to-ES shipping | Complex parsing, multiple inputs, routing logic |
The most common production architecture is Beats → Logstash → Elasticsearch. Beats collect on the edge (lightweight), Logstash centralizes parsing and enrichment (powerful), and Elasticsearch stores and indexes. For simpler setups, Beats → Elasticsearch with ingest pipelines eliminates the Logstash tier entirely.
Security
Since Elasticsearch 8.x, security is enabled by default: the first startup auto-configures TLS encryption and authentication out of the box. Earlier versions required manually enabling X-Pack security features.
TLS/SSL setup
Elasticsearch uses TLS for two layers: transport (node-to-node communication on port 9300) and HTTP (client-to-node on port 9200). Both must be encrypted in production.
# Generate a Certificate Authority
bin/elasticsearch-certutil ca --out elastic-stack-ca.p12 --pass ""
# Generate node certificates signed by the CA
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12 \
--out elastic-certificates.p12 --pass "" \
--dns es-node-01,es-node-02,es-node-03 \
--ip 10.0.1.10,10.0.1.11,10.0.1.12
# Generate HTTP certificate (for client connections)
bin/elasticsearch-certutil http
# Reset password for a built-in user (setup-passwords is deprecated)
bin/elasticsearch-reset-password -u elastic
Authentication realms
| Realm | Type | Notes |
|---|---|---|
| Native | Built-in | Users stored in the .security index. Managed via API or Kibana. Default realm. Good for service accounts and small teams. |
| LDAP | External | Authenticate against Active Directory or OpenLDAP. Map LDAP groups to Elasticsearch roles. |
| SAML | SSO | Enterprise SSO via Okta, Microsoft Entra ID (formerly Azure AD), OneLogin. Kibana acts as the SAML Service Provider. Requires a Platinum/Enterprise license. |
| OIDC | SSO | OpenID Connect for modern identity providers. Similar to SAML but uses OAuth2 under the hood. Requires Platinum/Enterprise license. |
| PKI | Certificate | Client certificate authentication. Useful for node-to-node auth and automated systems. |
| API Keys | Token | Long-lived or expiring tokens for programmatic access. Scoped to specific indices and privileges. Best for applications and CI/CD. |
RBAC (Role-Based Access Control)
// Create a role: read-only access to logs indices
POST /_security/role/logs_reader
{
"cluster": ["monitor"],
"indices": [
{
"names": ["logs-*"],
"privileges": ["read", "view_index_metadata"],
"field_security": {
"grant": ["@timestamp", "message", "level", "service", "host"]
}
}
],
"applications": [
{
"application": "kibana-.kibana",
"privileges": ["feature_discover.read", "feature_dashboard.read"],
"resources": ["space:production"]
}
]
}
// Create a role: write access for Logstash
POST /_security/role/logstash_writer
{
"cluster": ["manage_index_templates", "monitor", "manage_ilm"],
"indices": [
{
"names": ["logs-*", "metrics-*"],
"privileges": ["write", "create_index", "manage", "auto_configure"]
}
]
}
// Create an API key for a service
POST /_security/api_key
{
"name": "filebeat-prod-01",
"role_descriptors": {
"filebeat_writer": {
"cluster": ["monitor"],
"indices": [
{
"names": ["logs-filebeat-*"],
"privileges": ["write", "create_index", "auto_configure"]
}
]
}
},
"expiration": "365d"
}
Never run Elasticsearch without TLS in production. Unencrypted traffic exposes credentials, query data, and index contents. The elastic superuser should only be used for initial setup — create dedicated service accounts with minimal privileges for every application, Beat, and Logstash instance.
Cluster Operations
Cluster health
Elasticsearch reports cluster health as a color: green (all primary and replica shards assigned), yellow (all primaries assigned, some replicas unassigned), red (some primary shards unassigned — data loss possible).
# Check cluster health
GET /_cluster/health
# See which shards are unassigned and why
GET /_cluster/allocation/explain
# List all indices with health and shard counts
GET /_cat/indices?v&s=index
# See shard allocation across nodes
GET /_cat/shards?v&s=index
Shard allocation and rebalancing
Elasticsearch automatically allocates shards across nodes and rebalances when nodes join or leave. You can control allocation with awareness attributes (rack, zone) and filters.
// Force awareness: spread replicas across zones
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.awareness.attributes": "zone",
"cluster.routing.allocation.awareness.force.zone.values": "us-east-1a,us-east-1b,us-east-1c"
}
}
// Exclude a node from allocation (for maintenance)
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._name": "es-node-03"
}
}
// Re-enable allocation after maintenance
PUT /_cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._name": ""
}
}
Snapshot and restore
// Register a snapshot repository (S3)
PUT /_snapshot/s3-backups
{
"type": "s3",
"settings": {
"bucket": "my-es-backups",
"region": "us-east-1",
"base_path": "production",
"compress": true
}
}
// Create a snapshot (all indices)
PUT /_snapshot/s3-backups/snapshot-2026-03-20?wait_for_completion=false
// Create a snapshot (specific indices)
PUT /_snapshot/s3-backups/logs-snapshot
{
"indices": "logs-2026.03.*",
"ignore_unavailable": true,
"include_global_state": false
}
// Restore from snapshot
POST /_snapshot/s3-backups/snapshot-2026-03-20/_restore
{
"indices": "logs-2026.03.19",
"rename_pattern": "(.+)",
"rename_replacement": "restored-$1"
}
Rolling upgrades
- Disable shard allocation: PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": "primaries"}}
- Stop non-essential indexing and perform a flush: POST /_flush (synced flush was removed in 8.0; a normal flush has the same effect since 7.6).
- Stop the Elasticsearch node, upgrade the package, start the node.
- Wait for the node to rejoin the cluster: GET /_cat/nodes
- Re-enable allocation: PUT /_cluster/settings {"transient": {"cluster.routing.allocation.enable": null}}
- Wait for green health: GET /_cluster/health?wait_for_status=green
- Repeat for each node.
Hot-warm-cold architecture
Tiered storage optimizes cost by placing recent, frequently-accessed data on fast SSDs (hot tier) and moving older data to cheaper storage (warm, cold). Combined with ILM, this is fully automated.
Hot: active writes + queries
Fast NVMe/SSD storage. Handles all indexing and most search traffic. Nodes tagged node.roles: [data_hot]. Typically retains 1-7 days of data.
Warm: read-only, regular queries
SSD or fast HDD. Indices are shrunk and force-merged. Nodes tagged node.roles: [data_warm]. Retains 7-30 days typically.
Cold: infrequent access
Cheap HDD or object storage. Data is searchable but slow. Nodes tagged node.roles: [data_cold]. Retains 30-365 days.
Frozen: archive
Searchable snapshots backed by object storage (S3, GCS, Azure Blob). Data is not stored locally — fetched from the snapshot on demand. Near-zero local storage cost.
Docker Deployment
Docker Compose is the fastest way to stand up a development or small production Elastic Stack. The following deploys a 3-node Elasticsearch cluster with Kibana and Filebeat.
# docker-compose.yml — 3-node Elasticsearch + Kibana + Filebeat
services:
es01:
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
container_name: es01
environment:
- node.name=es01
- cluster.name=docker-cluster
- discovery.seed_hosts=es02,es03
- cluster.initial_master_nodes=es01,es02,es03
- node.roles=master,data_hot,ingest
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-changeme}
- xpack.security.enabled=true
- xpack.security.http.ssl.enabled=true
- xpack.security.http.ssl.keystore.path=certs/http.p12
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
- xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- bootstrap.memory_lock=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- es01-data:/usr/share/elasticsearch/data
- ./certs:/usr/share/elasticsearch/config/certs:ro
ports:
- "9200:9200"
networks:
- elastic
healthcheck:
test: ["CMD-SHELL", "curl -fsSk https://localhost:9200/_cluster/health || exit 1"]
interval: 10s
timeout: 5s
retries: 12
restart: unless-stopped
es02:
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
container_name: es02
environment:
- node.name=es02
- cluster.name=docker-cluster
- discovery.seed_hosts=es01,es03
- cluster.initial_master_nodes=es01,es02,es03
- node.roles=master,data_hot,ingest
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-changeme}
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
- xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
- "ES_JAVA_OPTS=-Xms2g -Xmx2g"
- bootstrap.memory_lock=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- es02-data:/usr/share/elasticsearch/data
- ./certs:/usr/share/elasticsearch/config/certs:ro
networks:
- elastic
restart: unless-stopped
es03:
image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
container_name: es03
environment:
- node.name=es03
- cluster.name=docker-cluster
- discovery.seed_hosts=es01,es02
- cluster.initial_master_nodes=es01,es02,es03
- node.roles=master,data_warm
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD:-changeme}
- xpack.security.enabled=true
- xpack.security.transport.ssl.enabled=true
- xpack.security.transport.ssl.keystore.path=certs/elastic-certificates.p12
- xpack.security.transport.ssl.truststore.path=certs/elastic-certificates.p12
- "ES_JAVA_OPTS=-Xms1g -Xmx1g"
- bootstrap.memory_lock=true
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
volumes:
- es03-data:/usr/share/elasticsearch/data
- ./certs:/usr/share/elasticsearch/config/certs:ro
networks:
- elastic
restart: unless-stopped
kibana:
image: docker.elastic.co/kibana/kibana:8.17.0
container_name: kibana
environment:
- ELASTICSEARCH_HOSTS=https://es01:9200
- ELASTICSEARCH_USERNAME=kibana_system
- ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD:-changeme}
- ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=config/certs/ca.crt
- SERVER_SSL_ENABLED=true
- SERVER_SSL_CERTIFICATE=config/certs/kibana.crt
- SERVER_SSL_KEY=config/certs/kibana.key
- XPACK_ENCRYPTEDSAVEDOBJECTS_ENCRYPTIONKEY=${ENCRYPTION_KEY}
volumes:
- ./certs:/usr/share/kibana/config/certs:ro
ports:
- "5601:5601"
networks:
- elastic
depends_on:
es01:
condition: service_healthy
restart: unless-stopped
filebeat:
image: docker.elastic.co/beats/filebeat:8.17.0
container_name: filebeat
user: root
command: filebeat -e --strict.perms=false
environment:
- ELASTICSEARCH_HOSTS=https://es01:9200
- ELASTICSEARCH_USERNAME=filebeat_writer
- ELASTICSEARCH_PASSWORD=${FILEBEAT_PASSWORD:-changeme}
volumes:
- ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
- ./certs:/usr/share/filebeat/certs:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
- filebeat-data:/usr/share/filebeat/data
networks:
- elastic
depends_on:
es01:
condition: service_healthy
restart: unless-stopped
volumes:
es01-data:
es02-data:
es03-data:
filebeat-data:
networks:
elastic:
driver: bridge
Elasticsearch enforces bootstrap checks in production mode (when network.host is not localhost). Key requirements: vm.max_map_count must be at least 262144 on the Docker host (sysctl -w vm.max_map_count=262144), memory locking must be enabled (bootstrap.memory_lock=true + Docker memlock ulimit), and file descriptor limits must be high enough (65536+).
Before running docker compose up, set the required kernel parameter on the Docker host:
# Set on running system
sudo sysctl -w vm.max_map_count=262144
# Persist across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
Performance Tuning
JVM heap sizing
Elasticsearch runs on the JVM. Heap allocation is the single most impactful performance setting.
- Set `-Xms` and `-Xmx` to the same value — avoids heap resize pauses during operation.
- No more than 50% of physical RAM — the other 50% is used by Lucene for the filesystem cache (critical for search performance).
- No more than ~30-31 GB — beyond this threshold, the JVM loses compressed ordinary object pointers (CompressedOops), which wastes memory. Two 30 GB nodes beat one 64 GB node.
- Use G1GC — the default since Elasticsearch 8.x. No need to tune GC settings for most workloads.
For a 64 GB server: set heap to -Xms30g -Xmx30g. The remaining 34 GB becomes filesystem cache for Lucene. For a 16 GB server: -Xms8g -Xmx8g. Never go below 1 GB heap.
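As a quick sanity check, the sizing rules above can be expressed as a small helper (a sketch; `recommended_heap_gb` is a hypothetical name, not an Elasticsearch tool):

```python
def recommended_heap_gb(ram_gb: float) -> float:
    """Suggest an Elasticsearch heap size following the common rules:
    at most 50% of physical RAM, capped below the ~31 GB CompressedOops
    threshold, and never below 1 GB."""
    heap = min(ram_gb / 2, 30.0)  # 50% rule, capped under the compressed-oops limit
    return max(heap, 1.0)         # never go below 1 GB

# A 64 GB server gets a 30 GB heap; a 16 GB server gets 8 GB.
print(recommended_heap_gb(64))  # 30.0
print(recommended_heap_gb(16))  # 8.0
```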
Indexing performance
Bulk API Batch writes
Always use the _bulk API for indexing. Sending individual documents is 5-10x slower. Optimal bulk size is typically 5-15 MB per request (experiment to find the sweet spot for your cluster).
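A minimal sketch of size-capped batching for the `_bulk` API, assuming a hypothetical `logs-app` index; real code would POST each body with `Content-Type: application/x-ndjson` and check the response for per-item errors:

```python
import json

def bulk_batches(docs, index="logs-app", max_bytes=10 * 1024 * 1024):
    """Yield newline-delimited _bulk request bodies, each capped at
    max_bytes (within the 5-15 MB sweet spot by default). Each document
    becomes an action line plus a source line, as the _bulk API expects."""
    batch, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index}})
        source = json.dumps(doc)
        entry_size = len(action) + len(source) + 2  # two trailing newlines
        if batch and size + entry_size > max_bytes:
            yield "".join(batch)  # flush the current batch before it overflows
            batch, size = [], 0
        batch.append(action + "\n" + source + "\n")
        size += entry_size
    if batch:
        yield "".join(batch)
```

Each yielded string is one `POST /_bulk` body; tuning `max_bytes` is the "experiment to find the sweet spot" step.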
Refresh Interval tuning
Default refresh_interval is 1 second (near-real-time). For bulk ingestion, set to 30s or -1 (disable) during the load, then reset. Each refresh creates a new Lucene segment — fewer refreshes = less merge overhead.
Replicas Disable during load
Set number_of_replicas: 0 during initial bulk indexing. Re-enable after the load completes. Replication doubles the indexing work — skipping it during bulk loads significantly improves throughput.
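The refresh and replica items above form a load-then-restore pattern; a sketch of the two `PUT /<index>/_settings` request bodies (values are illustrative):

```python
# Before the bulk load: pause refreshes and drop replicas.
bulk_load_settings = {
    "index": {
        "refresh_interval": "-1",   # disable refresh during the load
        "number_of_replicas": 0,    # skip replication work
    }
}

# After the load: restore near-real-time refresh and replicas.
restore_settings = {
    "index": {
        "refresh_interval": "1s",
        "number_of_replicas": 1,
    }
}
```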
Mapping Avoid dynamic mapping
Dynamic mapping detects types at index time, adding overhead. Define explicit mappings. Disable _source only if you never need to reindex or return full documents (rare).
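A minimal explicit mapping with `"dynamic": "strict"` might look like the following (index and field names are illustrative), sent as the body of a create-index request:

```python
# Body of: PUT /logs-app  (hypothetical index name)
explicit_mapping = {
    "mappings": {
        "dynamic": "strict",  # reject documents containing unmapped fields
        "properties": {
            "@timestamp": {"type": "date"},
            "message": {"type": "text"},       # full-text search
            "service": {"type": "keyword"},    # exact match and aggregations
            "status": {"type": "integer"},
        },
    }
}
```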
Search optimization
- Use `filter` context — filters are cached and skip scoring. Move all exact-match conditions out of `must` into `filter`.
- Limit `_source` fields — return only fields you need with `"_source": ["field1", "field2"]`.
- Avoid deep pagination — `from + size` is O(n) and capped at 10,000 hits by default (`index.max_result_window`). For paging through large result sets, use `search_after` with a point in time (PIT). The scroll API is no longer recommended for this purpose.
- Force merge read-only indices — merge to a single segment with `POST /index/_forcemerge?max_num_segments=1`. Dramatically faster searches on historical data.
- Use `keyword` for aggregations — never aggregate on `text` fields. Use the `.keyword` sub-field or map fields as `keyword` from the start.
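A sketch of a search body that applies several of these points at once: exact matches in `filter` context, a trimmed `_source`, and `search_after` pagination (index and field names are hypothetical; the `_shard_doc` tiebreaker assumes a point-in-time search):

```python
# Body of: POST /logs-app/_search  (hypothetical index)
search_body = {
    "size": 100,
    "_source": ["@timestamp", "message"],  # return only the fields needed
    "query": {
        "bool": {
            "must": [
                {"match": {"message": "timeout"}}  # scored full-text clause
            ],
            "filter": [
                # exact-match conditions: cached, no scoring overhead
                {"term": {"service": "checkout"}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ],
        }
    },
    "sort": [{"@timestamp": "asc"}, {"_shard_doc": "asc"}],
}

# Page forward by feeding the last hit's sort values back in:
next_page = dict(search_body, search_after=["2024-01-01T00:00:00Z", 42])
```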
Circuit breakers
Elasticsearch has built-in circuit breakers that prevent operations from consuming too much memory and crashing the node. If you see CircuitBreakingException, you are hitting a limit.
| Breaker | Default | Purpose |
|---|---|---|
| `indices.breaker.total.limit` | 95% of heap (with `use_real_memory: true`, the default; 70% otherwise) | Total memory limit across all breakers. |
| `indices.breaker.fielddata.limit` | 40% of heap | Field data cache (text field aggregations). Avoid loading text fields into field data. |
| `indices.breaker.request.limit` | 60% of heap | Per-request memory (large aggregations, sorting on high-cardinality fields). |
| `network.breaker.inflight_requests.limit` | 100% of heap | In-flight HTTP request data. |
Production Checklist
- Dedicated master nodes — use 3 dedicated master-eligible nodes for cluster stability. Never run master and heavy data workloads on the same node.
- TLS everywhere — enable TLS on both transport and HTTP layers. Use certificates from a trusted CA (or Elasticsearch's built-in CA).
- Authentication and RBAC — create dedicated users/roles for every application, Beat, and Logstash instance. Never share the `elastic` superuser.
- Heap sizing — set `-Xms` = `-Xmx`, no more than 50% of RAM, no more than 31 GB. Leave the rest for filesystem cache.
- Disable swapping — set `bootstrap.memory_lock: true` and configure OS-level `memlock` unlimited.
- Increase `vm.max_map_count` — set to at least 262144. Required for Lucene's memory-mapped files.
- Explicit mappings — define mappings for all indices. Use `"dynamic": "strict"` to prevent mapping explosions.
- ILM policies — configure index lifecycle management for all time-series data. Automate rollover, shrink, force merge, and deletion.
- Snapshot backups — configure automated snapshots to S3, GCS, or Azure Blob. Test restores regularly. Snapshots are the only reliable backup method.
- Monitoring — enable Stack Monitoring in Kibana or ship metrics to a dedicated monitoring cluster. Watch heap usage, GC time, indexing rate, search latency, and thread pool rejections.
- Shard sizing — aim for 20-50 GB per shard. Too many small shards waste resources; too few large shards limit parallelism. Monitor shard count per node (target under 1000).
- Log rotation — configure Elasticsearch log rotation in `log4j2.properties`. Unbounded logs will fill disk and crash the node.
- Network firewall — restrict ports 9200 (HTTP) and 9300 (transport) to trusted IPs only. Never expose Elasticsearch directly to the internet.
- Rolling upgrade plan — document and test the rolling upgrade procedure. Always read the breaking changes in release notes before upgrading.
- Capacity planning — estimate daily ingestion volume, retention period, and replica count. Formula: `total_storage = daily_volume × retention_days × (1 + num_replicas) × 1.1` (10% overhead).
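The capacity formula above, sketched as a helper (the function name and example numbers are illustrative):

```python
def required_storage_gb(daily_gb, retention_days, replicas, overhead=1.1):
    """Total cluster storage per the checklist formula:
    daily_volume x retention_days x (1 + num_replicas) x 1.1 overhead."""
    return daily_gb * retention_days * (1 + replicas) * overhead

# Example: 50 GB/day, 30-day retention, 1 replica.
print(round(required_storage_gb(50, 30, 1), 1))  # 3300.0
```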