Essential Elasticsearch Commands for Cluster Health & Troubleshooting EFK Stack Issues
In the world of DevOps, monitoring your logging stack is crucial. When working with the EFK (Elasticsearch, Fluentd, Kibana) stack, having a reliable set of Elasticsearch commands and troubleshooting techniques is key to maintaining smooth operations. In this article, I’ll walk you through some essential Elasticsearch commands for cluster health and share common EFK stack issues with debugging tips to tackle them effectively.
Part 1: Essential Elasticsearch Commands for Checking Cluster Health
1. Checking Cluster Health
The /_cluster/health endpoint provides an overview of the cluster's health, status, and availability.
GET _cluster/health
Explanation: This command shows the cluster status (green, yellow, or red), the number of active nodes, and unassigned shards. A “yellow” status means all primary shards are assigned but some replicas are not, so the cluster is operational with reduced redundancy; “red” means at least one primary shard is unassigned, so some data is unavailable until it is recovered.
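If you are waiting for a cluster to recover, say after a rolling restart, the same endpoint accepts a couple of handy query parameters. A small sketch using the standard wait_for_status and level options:
GET _cluster/health?wait_for_status=yellow&timeout=30s
GET _cluster/health?level=indices
The first call waits (up to the timeout) until the cluster reaches at least yellow; the second breaks the health report down per index, which makes it easier to spot the offending index.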
2. Viewing Cluster Stats
Use the following command to get a detailed view of your cluster, including node information, disk usage, and indices.
GET _cluster/stats
Explanation: This command provides comprehensive cluster-wide statistics, such as total data size, shard counts, and JVM memory usage aggregated across nodes, which is useful for identifying bottlenecks and planning scaling operations.
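Sizes and memory figures are reported in bytes by default; adding the standard human flag makes the output much easier to read:
GET _cluster/stats?human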
3. Inspecting Node Stats
Node-level stats can reveal if any node is overutilized, helping to diagnose memory, CPU, or storage issues.
GET _nodes/stats
Explanation: This provides stats for each node in the cluster, including CPU, memory usage, and filesystem data. It’s essential for identifying performance issues on specific nodes.
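The full response is large, so it often helps to request only the metric groups you need. For example, to look at just JVM, OS, and filesystem stats:
GET _nodes/stats/jvm,os,fs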
4. Quick Overview of Nodes
If you need a summary view of all nodes, use the following command:
GET _cat/nodes?v
Explanation: This command outputs details about each node in the cluster, such as CPU usage, memory allocation, and IP addresses, giving a quick insight into node health.
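You can also choose exactly which columns to display and sort them, which is handy when hunting for the busiest node. A sketch using standard _cat column names:
GET _cat/nodes?v&h=name,ip,node.role,heap.percent,ram.percent,cpu,load_1m&s=heap.percent:desc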
5. Checking Index Health
Managing indices is a critical part of Elasticsearch maintenance. Use the _cat/indices API for an overview of all indices.
GET _cat/indices?v
Explanation: This provides information on each index, including status, size, and document count. Look out for red or yellow statuses, which indicate underlying issues with shard allocation or indexing.
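To zero in on problem indices, the same API can filter by health and sort by size, for example:
GET _cat/indices?v&health=red
GET _cat/indices?v&health=yellow&s=store.size:desc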
6. Reviewing Shard Distribution
Check shard distribution with the following command to ensure they are balanced across nodes:
GET _cat/shards?v
Explanation: This displays all shards with their assigned nodes, status, and primary or replica roles. Uneven distribution may suggest issues with shard allocation and require balancing.
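If you only care about shards that are not assigned, you can trim the output to a few columns, including the reason they are unassigned:
GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason,node&s=state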
7. Monitoring Running Tasks
Long-running tasks can slow down your cluster, so monitoring them helps identify performance bottlenecks.
GET _tasks
Explanation: This command lists ongoing tasks within the cluster, including search and indexing operations. You can cancel long-running tasks to alleviate load by specifying the task ID.
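For example, to list only search tasks in detail and then cancel a specific one (the task ID below is a placeholder, take the real one from the _tasks output):
GET _tasks?detailed=true&actions=*search*
POST _tasks/<node_id>:<task_number>/_cancel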
Part 2: Common Issues in the EFK Stack and How to Debug Them
1. Elasticsearch Performance Issues
High CPU or Memory Usage
Elasticsearch can consume significant resources, especially under heavy indexing. High memory usage may indicate a need to tune JVM settings or adjust heap sizes.
- Debugging Tip: Use GET _nodes/stats to identify the problematic nodes. Consider increasing the heap size or adjusting garbage collection settings.
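To check heap pressure quickly without wading through the full node stats, you can combine the jvm metric with the standard filter_path response filter:
GET _nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent
As a rule of thumb, heap usage that stays above roughly 75% is a sign of memory pressure.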
Shard Allocation Failures
If a cluster shows unassigned shards, it might be due to insufficient resources or misconfigured shard allocation settings.
- Debugging Tip: Use the following command to explain why shards aren’t allocated:
GET _cluster/allocation/explain
- The output will highlight the reason behind unassigned shards, such as insufficient disk space or conflicting shard settings.
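Without a request body, the API explains the first unassigned shard it finds; you can also point it at a specific shard. The index name below is just an example:
GET _cluster/allocation/explain
{
  "index": "my-logs-2024.11.01",
  "shard": 0,
  "primary": true
}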
2. Fluentd and Fluent Bit Issues
Data Not Forwarding to Elasticsearch
If logs aren’t reaching Elasticsearch, Fluentd or Fluent Bit might be misconfigured or experiencing connection issues.
- Debugging Tip: Check Fluentd/Fluent Bit logs to verify the Elasticsearch endpoint and authentication details. A common mistake is using the wrong Elasticsearch URL or forgetting to set the ssl_verify option correctly for secured clusters.
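For reference, here is a minimal sketch of a Fluentd match block, assuming the fluent-plugin-elasticsearch output plugin; the tag pattern, host, and credentials are placeholders you would replace with your own:
<match kubernetes.**>
  @type elasticsearch
  # placeholders: point these at your Elasticsearch service
  host elasticsearch.logging.svc
  port 9200
  scheme https
  # keep certificate verification on for secured clusters
  ssl_verify true
  user fluentd
  password changeme
  # write to daily logstash-* indices
  logstash_format true
</match>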
High Latency or Data Loss
Data might be lost or delayed if Fluentd encounters buffering issues or the Elasticsearch instance is overwhelmed.
- Debugging Tip: In Fluentd, set proper flush_interval and retry_limit values to manage buffering. Monitoring Fluentd’s buffer status is also helpful.
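A sketch of what that could look like with the Fluentd v1 buffer section (where retry_max_times plays the role of the older retry_limit parameter); the values here are illustrative, not recommendations:
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  <buffer>
    # persist chunks on disk so a restart does not drop buffered logs
    @type file
    path /var/log/fluentd-buffers/es.buffer
    # how often chunks are flushed to Elasticsearch
    flush_interval 10s
    # give up retrying a failed flush after this many attempts
    retry_max_times 5
    chunk_limit_size 8MB
    queue_limit_length 32
  </buffer>
</match>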
3. Kibana Issues
Unable to Connect to Elasticsearch
If Kibana cannot connect to Elasticsearch, there may be an issue with the Kibana configuration file (kibana.yml).
- Debugging Tip: Verify that the elasticsearch.hosts URL in kibana.yml points to the correct Elasticsearch endpoint. Check for any SSL certificate issues if your cluster is configured with TLS.
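For reference, a minimal sketch of the relevant kibana.yml settings; the host, credentials, and certificate path are placeholders:
# where Kibana reaches Elasticsearch
elasticsearch.hosts: ["https://elasticsearch.logging.svc:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "changeme"
# needed when Elasticsearch uses a certificate signed by a private CA
elasticsearch.ssl.certificateAuthorities: ["/etc/kibana/certs/ca.crt"]
elasticsearch.ssl.verificationMode: certificate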
Index Patterns Not Found
If Kibana cannot find index patterns, it may be due to missing data or incorrect time filters.
- Debugging Tip: Refresh index patterns in Kibana by going to the “Index Patterns” section under “Stack Management.” Ensure that you have selected the correct time range to view data.
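If you are unsure whether the data ever reached Elasticsearch at all, a quick check from Kibana’s Dev Tools helps; the logstash-* prefix below assumes Fluentd’s logstash_format naming, so adjust it to your own index names:
GET _cat/indices/logstash-*?v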
By using these commands and troubleshooting tips, you can effectively manage your Elasticsearch cluster and tackle common issues in the EFK stack. Monitoring cluster health and quickly identifying bottlenecks can go a long way in keeping your logging pipeline smooth and reliable.
If you have any questions or need more in-depth debugging tips, feel free to reach out to me. I’d be happy to connect and share insights!