Why is there a difference in Elasticsearch store usage between Managed cluster nodes?

jonghpark · 03 Jan 2023

Question

I can see that disk usage differs between the cluster nodes, and I want to know why it is not equally distributed.
For example,

Node1 (id=1) current disk space usage: 70%
Node2 (id=2) current disk space usage: 60%
Node3 (id=3) current disk space usage: 50%

Answer

That is because Elasticsearch distributes data by the number of shards, not necessarily by the size of the data itself.
For example, in a multi-node cluster, each node can hold roughly the same number of shards (~61 in this case) while the Used Disk value differs from node to node, because the shards are not all the same size.
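
To illustrate the point above, here is a minimal Python sketch. It is a toy model and not Elasticsearch's actual allocator: the shard sizes are hypothetical, and the function allocate_by_count is an illustrative helper that only balances the shard count per node.

```python
# Illustrative only: hypothetical shard sizes in GB, not taken from a real cluster.
shard_sizes_gb = [50, 5, 5, 40, 10, 5, 30, 5, 5]


def allocate_by_count(sizes, node_count):
    """Place each shard on the node that currently holds the fewest shards.

    Shard size is deliberately ignored, which is the assumption being shown.
    """
    nodes = [[] for _ in range(node_count)]
    for size in sizes:
        target = min(nodes, key=len)  # ties go to the lowest-numbered node
        target.append(size)
    return nodes


nodes = allocate_by_count(shard_sizes_gb, 3)
shard_counts = [len(n) for n in nodes]
disk_used_gb = [sum(n) for n in nodes]

for i, (count, used) in enumerate(zip(shard_counts, disk_used_gb), start=1):
    print(f"Node{i}: {count} shards, {used} GB used")
```

In this sketch every node ends up with the same number of shards, yet the disk used per node is very different, which is the same pattern as in the question.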

In addition, Elasticsearch does not actively rebalance data based on disk usage unless one of its disk watermark thresholds is exceeded.
The thresholds (settings under cluster.routing.allocation.disk) are watermark.low = 85%, watermark.high = 90%, and watermark.flood_stage = 95%.
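
As a rough check against the thresholds above, the following Python sketch classifies the three nodes from the question. The function name watermark_status is hypothetical, and treating a threshold as exceeded only when usage is strictly above it is an assumption; the sketch does not call Elasticsearch.

```python
# Thresholds quoted above, as percent of disk used.
WATERMARK_LOW = 85
WATERMARK_HIGH = 90
WATERMARK_FLOOD_STAGE = 95


def watermark_status(used_percent):
    """Return the highest watermark exceeded by a node, or None if none is.

    Assumption: a watermark counts as exceeded only when usage is strictly above it.
    """
    if used_percent > WATERMARK_FLOOD_STAGE:
        return "flood_stage"
    if used_percent > WATERMARK_HIGH:
        return "high"
    if used_percent > WATERMARK_LOW:
        return "low"
    return None


# Disk usage values from the question.
usage = {"Node1": 70, "Node2": 60, "Node3": 50}
statuses = {node: watermark_status(pct) for node, pct in usage.items()}

for node, status in statuses.items():
    print(f"{node}: {usage[node]}% used, watermark exceeded: {status}")
```

None of the three nodes is above even the low watermark, so under the behavior described here no disk-driven relocation would be expected, and the 70% / 60% / 50% spread can persist.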

For more details, read: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#modules-cluster

ChadTurner · 10 Jan 2023

Great write-up on a common question. Thanks @jonghpark