reduce network hops and infrastructure cost.
3. Storage Layer: Elasticsearch cluster configured with dedicated roles (master, data_hot, data_warm, data_cold, coordinating).
4. Schema: Enforce ECS. All logs must be structured JSON at the source.
Implementation Steps
1. Structured Logging at Source (TypeScript)
Parsing unstructured logs in Logstash is computationally expensive and fragile. The optimal pattern is structured logging in the application code.
// src/logging/logger.ts
import pino from 'pino';

// Configure Pino to emit ECS-style JSON field names.
// Alternative: the @elastic/ecs-pino-format package provides a ready-made ECS formatter.
const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  // pino writes the message under "msg" by default; ECS and the Filebeat
  // input (json.message_key: message) expect "message"
  messageKey: 'message',
  base: {
    service: { name: 'payment-service' },
    environment: process.env.NODE_ENV || 'production',
  },
  formatters: {
    level: (label: string) => ({ log: { level: label } }),
  },
  // pino.stdTimeFunctions.isoTime writes a "time" key; ECS expects "@timestamp"
  timestamp: () => `,"@timestamp":"${new Date().toISOString()}"`,
});

export { logger };

// Usage in application (a separate module next to logger.ts)
import { logger } from './logger';

export async function processPayment(transactionId: string, amount: number) {
  logger.info({
    event: {
      dataset: 'payment.processed',
      action: 'create',
    },
    transaction: { id: transactionId, amount },
    user: { id: 'user_123' },
  }, 'Payment processed successfully');
}
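To make the target shape concrete, the following dependency-free sketch builds the kind of JSON line the logger above is expected to emit. `toEcsLine` and its parameters are hypothetical names used only for illustration; they are not part of pino or any Elastic library.

```typescript
// Hypothetical helper (not a pino API): builds one ECS-shaped JSON log line.
type EcsFields = Record<string, unknown>;

function toEcsLine(
  level: string,
  message: string,
  fields: EcsFields,
  now: Date = new Date(),
): string {
  const record = {
    ...fields,                        // event.*, transaction.*, user.*, ...
    // Core keys come last so caller-supplied fields cannot overwrite them.
    '@timestamp': now.toISOString(),  // ECS timestamp field
    log: { level },                   // ECS nests the level under log.level
    message,
  };
  return JSON.stringify(record);
}

// One line per event, ready to be appended to /var/log/app/*.json
const sampleLine = toEcsLine(
  'info',
  'Payment processed successfully',
  {
    event: { dataset: 'payment.processed', action: 'create' },
    transaction: { id: 'tx_1', amount: 42 },
  },
  new Date('2024-01-01T00:00:00.000Z'),
);
```

Because every event is a single self-describing JSON object, downstream components only need to decode it, not parse free text.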
2. Filebeat Configuration
Filebeat reads the structured JSON and ships it directly. This eliminates the need for heavy Grok parsing in Logstash if the source is well-structured.
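# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.json
    json.keys_under_root: true
    json.add_error_key: true
    json.message_key: message
    # Tag events so Logstash can route them; the filter reads this as [pipeline].
    # (output.logstash has no pipeline.id option.)
    fields:
      pipeline: payment-service-pipeline
    fields_under_root: true

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

output.logstash:
  hosts: ["logstash-primary:5044"]
  loadbalance: true
  # The Logstash beats input has TLS enabled, so Filebeat must trust its CA.
  # The path below is an assumption; adjust it to your deployment.
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]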
3. Logstash Pipeline (For Enrichment)
Use Logstash only for enrichment, not basic parsing.
# logstash/conf.d/payment.conf
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate_authorities => ["/etc/logstash/certs/ca.crt"]
    ssl_certificate => "/etc/logstash/certs/logstash.crt"
    ssl_key => "/etc/logstash/certs/logstash.key"
  }
}

filter {
  if [pipeline] == "payment-service-pipeline" {
    # Add geo-location for IP addresses
    geoip {
      source => "[source][ip]"
      target => "[source][geo]"
    }

    # User agent parsing
    useragent {
      source => "[http][request][user_agent]"
      target => "[user_agent]"
    }

    # Drop debug logs in production (the logger sets environment from NODE_ENV,
    # which defaults to "production")
    if [log][level] == "debug" and [environment] == "production" {
      drop { }
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://es-hot-01:9200", "https://es-hot-02:9200"]
    api_key => "${ES_API_KEY}"
    # Write through the ILM rollover alias. A dated index name would bypass
    # rollover, and the "pipeline" option names an ingest pipeline, not an ILM policy.
    ilm_enabled => true
    ilm_rollover_alias => "logs-payment"
    ilm_policy => "payment-ilm-policy"
  }
}
4. Elasticsearch Index Template and ILM Policy
This is the core of the optimization. The ILM policy defines lifecycle phases, and the template enforces mapping.
// ilm-policy.json
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "set_priority": { "priority": 50 },
          "allocate": {
            "include": { "data": "warm" },
            "number_of_replicas": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "set_priority": { "priority": 0 },
          "allocate": {
            "include": { "data": "cold" }
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
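As a rough illustration of how the phase thresholds in this policy play out, the sketch below maps an index's age to the phase it should have entered. The thresholds are copied from the policy; `expectedPhase` is a hypothetical helper, not an Elasticsearch API. It assumes age is measured from rollover, which is how ILM computes `min_age` for indices that roll over, and it ignores the delay introduced by ILM's periodic polling.

```typescript
type Phase = 'hot' | 'warm' | 'cold' | 'delete';

// min_age thresholds from the policy above, in days, checked from oldest to newest.
const PHASE_MIN_AGE_DAYS: ReadonlyArray<[Phase, number]> = [
  ['delete', 90],
  ['cold', 30],
  ['warm', 7],
  ['hot', 0],
];

// Returns the phase an index should have entered, given its age since rollover.
function expectedPhase(daysSinceRollover: number): Phase {
  for (const [phase, minAgeDays] of PHASE_MIN_AGE_DAYS) {
    if (daysSinceRollover >= minAgeDays) return phase;
  }
  return 'hot'; // negative ages (clock skew) are treated as hot
}

const phaseAfterTwoWeeks = expectedPhase(14); // 'warm'
```

With these settings an index spends about a week on hot nodes, roughly three weeks on warm nodes, and two months on cold nodes before deletion.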
// index-template.json
{
  "index_patterns": ["logs-payment-*"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "payment-ilm-policy",
          "rollover_alias": "logs-payment"
        },
        "number_of_shards": 3,
        "number_of_replicas": 1,
        "refresh_interval": "30s",
        "codec": "best_compression"
      }
    },
    "mappings": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ],
      "properties": {
        "@timestamp": { "type": "date" },
        "message": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "trace.id": { "type": "keyword" },
        "transaction.id": { "type": "keyword" }
      }
    }
  }
}
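The policy and template only take effect once they are installed, and a rollover alias needs an initial write index. The sketch below lists the three REST calls in the order they would typically be sent. The template name `logs-payment-template` and the bootstrap index name `logs-payment-000001` are assumptions for illustration, and `buildSetupRequests` is a hypothetical helper; send the resulting requests with whatever HTTP client you use. On 8.x clusters a built-in template may also match `logs-*-*` names, so the custom template may need an explicit `priority`; verify this against your version.

```typescript
interface SetupRequest {
  method: 'PUT';
  path: string;
  body: unknown;
}

// Builds the setup calls in dependency order: policy, then template, then first index.
// `policy` and `template` are the parsed contents of ilm-policy.json and index-template.json.
function buildSetupRequests(policy: unknown, template: unknown): SetupRequest[] {
  return [
    // 1. The ILM policy must exist before any index references it.
    { method: 'PUT', path: '/_ilm/policy/payment-ilm-policy', body: policy },
    // 2. The template applies settings and mappings to every logs-payment-* index.
    //    (Template name is an assumption.)
    { method: 'PUT', path: '/_index_template/logs-payment-template', body: template },
    // 3. Bootstrap index: the alias named in rollover_alias points here as the write index.
    //    (Index name is an assumption following the usual -000001 convention.)
    {
      method: 'PUT',
      path: '/logs-payment-000001',
      body: { aliases: { 'logs-payment': { is_write_index: true } } },
    },
  ];
}
```

Keeping the calls as plain data makes the order explicit and lets the same list be replayed against each environment.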
Pitfall Guide
1. Dynamic Mapping Explosions
- Mistake: Allowing Elasticsearch to auto-create fields for every unique key in incoming logs.
- Impact: Logs that use dynamic values as keys (e.g., user IDs or request IDs as field names) create thousands of distinct mapped fields. This bloats the cluster state, which can cause OutOfMemoryError and prevent the master node from functioning.
- Fix: Use strict mapping in index templates. Set dynamic: strict for sensitive indices, or use dynamic_templates to force unknown strings to keyword and cap the field count with index.mapping.total_fields.limit.
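One way to spot this problem before it reaches the cluster is to count distinct field paths in a sample of log documents. The sketch below is a rough estimate only: `countDistinctFields` is a hypothetical helper, it counts leaf fields but not the intermediate object fields that Elasticsearch also tracks, and the sample documents are invented for illustration.

```typescript
type LogDoc = Record<string, unknown>;

// Collects dotted leaf-field paths (e.g. "user.id") from one parsed log document.
function collectFieldPaths(doc: LogDoc, prefix: string, out: Set<string>): void {
  for (const key of Object.keys(doc)) {
    const path = prefix === '' ? key : `${prefix}.${key}`;
    const value = doc[key];
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      collectFieldPaths(value as LogDoc, path, out); // recurse into nested objects
    } else {
      out.add(path);
    }
  }
}

// Number of distinct leaf fields a sample of documents would add to the mapping.
function countDistinctFields(docs: LogDoc[]): number {
  const paths = new Set<string>();
  for (const doc of docs) collectFieldPaths(doc, '', paths);
  return paths.size;
}

// IDs used as keys: every new user adds a new field.
const idsAsKeys: LogDoc[] = [
  { users: { u1: { clicks: 1 } } },
  { users: { u2: { clicks: 3 } } },
  { users: { u3: { clicks: 2 } } },
];
// IDs used as values: the field set stays constant.
const idsAsValues: LogDoc[] = [
  { user: { id: 'u1' }, clicks: 1 },
  { user: { id: 'u2' }, clicks: 3 },
  { user: { id: 'u3' }, clicks: 2 },
];
```

In the first sample the field count grows with the number of users; in the second it stays at two regardless of how many users appear.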
2. Logstash as a Bottleneck
- Mistake: Chaining multiple Logstash instances with heavy Grok filters and Ruby scripts.
- Impact: Each Logstash pipeline worker processes its batch on a single thread. Complex filters block those threads, causing backpressure. Filebeat falls behind, and logs can be lost if files are rotated away before they are read.
- Fix: Offload parsing to the application layer using structured logging. Use ingest pipelines on Ingest Nodes for simple transformations. If Logstash is required, tune pipeline.workers and pipeline.batch.size based on CPU cores and memory.
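When tuning these two settings, it helps to remember that the number of events held in memory at once is roughly the product of workers and batch size. The sketch below does that arithmetic; `avgEventBytes` is an assumption you would measure yourself, not a Logstash setting, and the result ignores per-event object overhead on the JVM.

```typescript
// Events in flight for one pipeline: workers x batch size.
function inflightEvents(workers: number, batchSize: number): number {
  return workers * batchSize;
}

// Approximate payload bytes held in flight, given a measured average event size.
function inflightBytes(workers: number, batchSize: number, avgEventBytes: number): number {
  return inflightEvents(workers, batchSize) * avgEventBytes;
}

// Example: 8 workers, a batch size of 125, and events of about 2 KiB each.
const exampleInflightEvents = inflightEvents(8, 125);       // 1000 events
const exampleInflightBytes = inflightBytes(8, 125, 2048);   // 2,048,000 bytes
```

Raising the batch size improves throughput per bulk request but increases heap pressure in direct proportion, so the two values are best adjusted together.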
3. Ignoring Shard Sizing
- Mistake: Creating indices with too many small shards or too few massive shards.
- Impact: Small shards increase overhead (each shard consumes heap memory and file handles). Massive shards (>50GB) cause slow recovery, unbalanced load, and slow queries.
- Fix: Target shard sizes between 10GB and 50GB. Use ILM rollover based on max_primary_shard_size to maintain optimal shard dimensions.
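The shard count follows from simple arithmetic on daily volume and target shard size. The sketch below assumes `dailyGb` is primary data written per rollover period (one day here) and uses a 30 GB target inside the 10GB to 50GB range; `primaryShardsFor` is a hypothetical helper, not an Elasticsearch API.

```typescript
// Number of primary shards needed so each shard lands near the target size.
function primaryShardsFor(dailyGb: number, targetShardGb: number = 30): number {
  if (dailyGb <= 0 || targetShardGb <= 0) {
    throw new RangeError('dailyGb and targetShardGb must be positive');
  }
  return Math.max(1, Math.ceil(dailyGb / targetShardGb));
}

// 120 GB/day at a 30 GB target needs 4 primaries.
const shardsFor120Gb = primaryShardsFor(120);
// 10 GB/day fits in a single primary.
const shardsFor10Gb = primaryShardsFor(10);
```

As a cross-check against the template in step 4: with 3 primaries and 120 GB of primary data per day, the 1-day rollover fires first with shards of about 40 GB each, which stays within the target range.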
4. Grok Regex Backtracking
- Mistake: Using inefficient regular expressions in Grok filters.
- Impact: Regex backtracking can consume 100% CPU on a Logstash node, halting ingestion.
- Fix: Test patterns with the Grok Debugger in Kibana. Prefer specific, anchored patterns over greedy matches. Where the log format is delimiter-based, use the dissect filter instead of grok.
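The difference between a specific and a greedy pattern can be shown with ordinary regular expressions. The sketch below uses JavaScript's regex engine rather than the engine Grok uses, so timings will differ, but the matching behavior illustrated is the same idea. The log line format and field names are invented for illustration.

```typescript
interface ParsedLine {
  timestamp: string;
  level: string;
  service: string;
  message: string;
}

// Anchored and specific: each leading field is "non-space characters",
// so the engine has exactly one way to split the line.
const SPECIFIC_PATTERN = /^(\S+) (\S+) (\S+) (.*)$/;

// Greedy and unanchored: every (.*) first swallows the rest of the line,
// then backtracks, and the fields end up split at the wrong spaces.
const GREEDY_PATTERN = /(.*) (.*) (.*) (.*)/;

function parseLine(line: string): ParsedLine | null {
  const m = SPECIFIC_PATTERN.exec(line);
  if (m === null) return null;
  return { timestamp: m[1], level: m[2], service: m[3], message: m[4] };
}

const exampleLine =
  '2024-01-01T00:00:00Z INFO payment-service Payment processed successfully';
const parsedLine = parseLine(exampleLine);
// With the greedy pattern, the first group captures
// "2024-01-01T00:00:00Z INFO payment-service" instead of just the timestamp.
const greedyMatch = GREEDY_PATTERN.exec(exampleLine);
```

The greedy version is both slower, because of the backtracking, and wrong, because the first group absorbs three fields.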
5. Storing Raw Logs Without Parsing
- Mistake: Ingesting raw text logs and relying on Kibana's discover interface for ad-hoc parsing.
- Impact: Queries on text fields are slow and resource-intensive. You cannot aggregate or filter efficiently.
- Fix: Parse logs at ingestion. Extract fields into structured JSON. Store the raw message in a message field for fallback, but query against extracted fields.
6. Network Bandwidth Saturation
- Mistake: Shipping uncompressed logs from hundreds of nodes to a central cluster.
- Impact: Network congestion affects application traffic. Ingestion latency spikes.
- Fix: Enable compression in Filebeat/Logstash output. Use local aggregation where possible. Monitor network throughput and tune bulk_max_size.
7. Security Misconfiguration
- Mistake: Running ELK without TLS, authentication, or RBAC in production.
- Impact: Data exfiltration, unauthorized access to sensitive logs, and cluster manipulation.
- Fix: Enable X-Pack security. Enforce TLS for all internal and external traffic. Use API keys or service accounts for ingestion. Implement RBAC to restrict Kibana access.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High Volume (>10TB/day) | Dedicated Logstash Cluster + Data Tiers | Isolates processing load; hot nodes focus on indexing/querying; warm/cold tiers reduce hardware costs. | High initial infra cost; low operational cost per GB. |
| Medium Volume (1-10TB/day) | Ingest Nodes + ILM | Eliminates Logstash overhead; Ingest Nodes scale with data nodes; simpler architecture. | Moderate cost; efficient resource utilization. |
| Low Volume / Startup | Single Node + Filebeat Direct | Rapid deployment; minimal ops overhead; sufficient for debugging. | Low cost; limited scalability. |
| Compliance / Audit | WORM Index + Cold Storage | Immutable logs; long-term retention on cheap storage; strict access controls. | Higher storage cost for compliance; mitigates risk. |
| Real-time Alerting | ES with TSDB or Watcher | Time-series data store optimizes metric queries; Watcher enables threshold alerts. | Moderate cost; high value for incident response. |
Configuration Template
Docker Compose for Local Development:
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.12.0
    volumes:
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
      - /var/log:/var/log:ro
    depends_on:
      - elasticsearch

volumes:
  es_data:
    driver: local
Quick Start Guide
- Initialize Cluster: Run docker compose up -d to start Elasticsearch and Kibana. Wait until GET /_cluster/health reports green or yellow; yellow is normal on a single node once indices with replicas exist.
- Configure Filebeat: Create filebeat.yml with the output pointing to http://localhost:9200 (use http://elasticsearch:9200 when Filebeat runs inside the Compose network). Enable the system module or a custom log input.
- Load Assets: Run filebeat setup -e to load index templates, dashboards, and ILM policies into Elasticsearch.
- Start Ingestion: Run filebeat -e to begin shipping logs. Verify data arrival in Kibana via Stack Management > Index Management.
- Visualize: Navigate to Discover in Kibana, create a data view (called an index pattern before Kibana 8) for filebeat-*, and start querying logs. Use the pre-loaded dashboards for the system module.
Note: This quick start is for development. Production deployments require TLS, authentication, multi-node clusters, and persistent storage backed by reliable block storage.