on 03 Jan 2023 02:39 AM - edited on 03 Jan 2023 09:58 AM by Karolina_Linda
I can see that disk usage differs between the cluster nodes, and I want to know why it is not equally distributed.
That is because Elasticsearch distributes data based on the number of shards per node, not on the size of the data those shards contain.
For example, in a multi-node cluster, the number of shards can be roughly the same on every node (~61 each), while the Used Disk value still differs from node to node.
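To illustrate the effect, here is a minimal sketch in Python. It is not how Elasticsearch actually places shards (the real allocator is more involved); it only assumes a simple round-robin placement by count, and the shard sizes and node names are made up for the example.

```python
from collections import defaultdict

# Hypothetical shard sizes in GB and node names, chosen only for illustration.
SHARD_SIZES_GB = [50, 2, 8, 1, 30, 4, 12, 3, 20]
NODES = ["node-1", "node-2", "node-3"]


def assign_by_count(shard_sizes, nodes):
    """Place shards round-robin so every node ends up with the same shard count.

    This is a simplification: it balances only on count and ignores shard size.
    """
    placement = defaultdict(list)
    for index, size in enumerate(shard_sizes):
        placement[nodes[index % len(nodes)]].append(size)
    return dict(placement)


def summarize(placement):
    """Return {node: (shard_count, used_gb)} for a placement."""
    return {node: (len(sizes), sum(sizes)) for node, sizes in placement.items()}


if __name__ == "__main__":
    summary = summarize(assign_by_count(SHARD_SIZES_GB, NODES))
    for node, (count, used_gb) in summary.items():
        print(f"{node}: {count} shards, {used_gb} GB used")
```

With these made-up sizes, every node holds three shards, but the disk used ranges from 32 GB to 63 GB.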
Apart from that, Elasticsearch does not actively rebalance the data unless the disk watermark thresholds set inside Elasticsearch are exceeded.
By default, these thresholds are watermark.low = 85%, watermark.high = 90%, and watermark.flood_stage = 95% of disk used.
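As a rough sketch of how these thresholds can be read, the Python snippet below classifies a node by its disk usage percentage. The function name and the returned labels are made up for this example, and the thresholds are hardcoded to the values above; in a real cluster they may have been changed in the settings.

```python
# Watermark thresholds from the article, as percentages of disk used.
# Assumption: the cluster uses these values unchanged.
WATERMARK_LOW = 85.0
WATERMARK_HIGH = 90.0
WATERMARK_FLOOD_STAGE = 95.0


def watermark_state(disk_used_percent: float) -> str:
    """Return which watermark (if any) a node's disk usage has exceeded.

    Hypothetical helper for illustration only; it is not an Elasticsearch API.
    """
    if disk_used_percent > WATERMARK_FLOOD_STAGE:
        # Indices with a shard on this node get a read-only block.
        return "flood_stage"
    if disk_used_percent > WATERMARK_HIGH:
        # Elasticsearch tries to relocate shards away from this node.
        return "high"
    if disk_used_percent > WATERMARK_LOW:
        # Elasticsearch stops allocating new shards to this node.
        return "low"
    # No threshold exceeded, so disk usage alone triggers no relocation.
    return "ok"


if __name__ == "__main__":
    for used in (60.0, 87.5, 92.0, 97.0):
        print(f"{used}% used -> {watermark_state(used)}")
```

A node at 60% and another at 80% both fall into the "ok" range, which is why such a difference in Used Disk can persist without Elasticsearch moving any shards.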
For more details, read: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#modules-cluster