on 03 Jan 2023 02:39 AM - edited on 03 Jan 2023 09:58 AM by Karolina_Linda
I can see that disk usage differs between the cluster nodes, and I want to know why it is not equally distributed. For example:
Node1 (id=1) current disk space usage: 70%
Node2 (id=2) current disk space usage: 60%
Node3 (id=3) current disk space usage: 50%
That is because the data is distributed based on the number of shards and not necessarily on the size of the data itself. Shards can differ considerably in size, so nodes holding the same number of shards can still use different amounts of disk. For example, in a multi-node cluster we can see that each node holds ~61 shards, but the Used Disk value differs from node to node.
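The effect described above can be illustrated with a minimal sketch. It is not Elasticsearch's actual allocator, only a simplified round-robin placement that balances shard counts and ignores shard sizes. The function name `assign_by_count` and the shard sizes are hypothetical and chosen purely for illustration.

```python
def assign_by_count(shard_sizes_gb, node_ids):
    """Place shards round-robin so every node gets the same shard count.

    Simplified illustration only: shard sizes are ignored on purpose.
    """
    nodes = {node_id: [] for node_id in node_ids}
    for index, size in enumerate(shard_sizes_gb):
        target = node_ids[index % len(node_ids)]
        nodes[target].append(size)
    return nodes


# Hypothetical shard sizes in GB (not taken from a real cluster).
shard_sizes_gb = [50, 20, 5, 40, 10, 5, 30, 10, 5]

placement = assign_by_count(shard_sizes_gb, [1, 2, 3])
for node_id, sizes in placement.items():
    print(f"Node{node_id}: {len(sizes)} shards, {sum(sizes)} GB used")
```

Every node ends up with three shards, yet the disk space used per node differs widely, which mirrors the situation in the question.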
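Apart from that, Elasticsearch does not actively rebalance the data by disk usage as long as none of the watermark thresholds set inside Elasticsearch is exceeded. The thresholds are watermark.low = 85%, watermark.high = 90%, and watermark.flood_stage = 95%. All three nodes in the example above are well below 85%, so no disk-based relocation is triggered.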
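As a rough sketch of that reasoning, the snippet below compares the disk usage of the three example nodes with the thresholds listed above. The helper `watermark_status` is a hypothetical name, and the check is a simplification that only reports which threshold, if any, a node has exceeded. It does not query a cluster.

```python
# Thresholds from the text above, checked from highest to lowest.
WATERMARKS = [
    ("flood_stage", 95.0),
    ("high", 90.0),
    ("low", 85.0),
]


def watermark_status(used_percent):
    """Return the highest watermark exceeded, or "ok" if none is."""
    for name, threshold in WATERMARKS:
        if used_percent > threshold:
            return name
    return "ok"


# Disk usage per node id, taken from the example in the question.
usage = {1: 70.0, 2: 60.0, 3: 50.0}

status = {node_id: watermark_status(used) for node_id, used in usage.items()}
for node_id, state in status.items():
    print(f"Node{node_id} ({usage[node_id]}% used): {state}")
```

All three nodes report "ok", so from Elasticsearch's point of view there is no reason to move shards between them.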