I can see that disk usage differs between the cluster nodes, and I want to know why it is not equally distributed.
- Node1 (id=1) current disk space usage: 70%
- Node2 (id=2) current disk space usage: 60%
- Node3 (id=3) current disk space usage: 50%
This happens because Elasticsearch balances data across nodes primarily by the number of shards, not by the size of the data those shards hold.
For example, in the multi-node cluster, the number of shards is roughly the same on every node (~61), but the Used Disk value differs because the shards themselves are not the same size.
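The effect can be reproduced with a small sketch. It is a simplified illustration of count-based placement, not Elasticsearch's actual allocator, and the shard sizes and the `assign_by_count` helper are made up for the example.

```python
def assign_by_count(shard_sizes_gb, node_ids):
    """Place each shard on the node that currently holds the fewest shards.

    Shard size is deliberately ignored, which is the point of the example.
    """
    shards_per_node = {node: [] for node in node_ids}
    for size in shard_sizes_gb:
        # Ties are broken by node order, since min() returns the first minimum.
        target = min(node_ids, key=lambda n: len(shards_per_node[n]))
        shards_per_node[target].append(size)
    return shards_per_node


# Hypothetical shard sizes in GB, not taken from a real cluster.
shard_sizes_gb = [50, 5, 1, 40, 10, 2, 30, 20, 3]
placement = assign_by_count(shard_sizes_gb, [1, 2, 3])

for node, shards in placement.items():
    print(f"node {node}: {len(shards)} shards, {sum(shards)} GB used")
```

Every node ends up with the same number of shards, yet the disk space they consume is very different.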
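In addition, Elasticsearch does not move shards just to even out disk usage as long as no node crosses one of the disk watermark thresholds.
The default thresholds are cluster.routing.allocation.disk.watermark.low = 85%, cluster.routing.allocation.disk.watermark.high = 90%, and cluster.routing.allocation.disk.watermark.flood_stage = 95%.
Above the low watermark, no new shards are allocated to the node; above the high watermark, Elasticsearch tries to relocate shards away from it; above the flood stage, indices with a shard on that node are made read-only. At 70%, 60%, and 50%, none of the nodes here has reached even the low watermark.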
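As a rough check, the reported usage can be compared against the default thresholds. This is only a sketch: the `watermark_state` function is a hypothetical helper, not part of any Elasticsearch client, and it assumes the percentage-based defaults are in effect.

```python
# Default disk watermarks, as percentages of used disk space.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0


def watermark_state(used_pct, low=LOW, high=HIGH, flood_stage=FLOOD_STAGE):
    """Return the highest watermark that the given disk usage exceeds."""
    if used_pct > flood_stage:
        return "flood_stage"
    if used_pct > high:
        return "high"
    if used_pct > low:
        return "low"
    return "below all watermarks"


# Disk usage per node id, taken from the question.
usage_pct = {1: 70.0, 2: 60.0, 3: 50.0}
states = {node: watermark_state(pct) for node, pct in usage_pct.items()}

for node, state in states.items():
    print(f"node {node}: {usage_pct[node]}% used, {state}")
```

All three nodes come out below the low watermark, so the disk-based allocation rules have no reason to relocate any shards.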
For more details, read: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#modules-cluster