Shards did not rebalance immediately after hardware failure on data node #116203

Harif-Rahman · 2024-11-04T19:01:12Z

Elasticsearch Version

5.6.16

Installed Plugins

No response

Java Version

bundled

OS Version

Linux cluster-1-master-2 4.14.326-245.539.amzn2.x86_64 #1 SMP Tue Sep 26 09:59:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

We encountered an issue where Elasticsearch did not automatically rebalance shards after a hardware failure on one of our data nodes. This caused an extended period of degraded performance until manual intervention was performed.

The EC2 instance became unhealthy at 2024-10-28 20:40 UTC.
The EC2 instance became healthy again at 2024-10-28 21:10 UTC.

Elasticsearch rebalanced the shards only at 2024-10-28 21:10 UTC.

We would like to understand why it took so long to rebalance the shards. The master node logs are attached:

es_oct_28.log

ES cluster settings

Cluster level settings.

cluster.name: fc-use1-00-conversation-cluster-1

Discovery settings.

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: [XXX]

discovery.zen.ping.timeout: 15s

discovery.zen.fd.ping_interval: 10s
discovery.zen.fd.ping_timeout: 10s
discovery.zen.fd.ping_retries: 6

action.auto_create_index: true

indices.memory.index_buffer_size: 30%

indices.store.throttle.max_bytes_per_sec: 100mb

Node level settings.

node.data: false
node.master: true
node.name: cluster-1-master-1

cluster.routing.allocation.awareness.force.zone.values: zone-a,zone-b,zone-c,zone-d,zone-e,zone-f
cluster.routing.allocation.awareness.attributes: zone

http.enabled: true

Loopback interface
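
As a rough sanity check on the `discovery.zen.fd.*` settings listed above, the sketch below estimates how long the master might take to declare an unresponsive node failed. It rests on assumptions that are not confirmed anywhere in this thread: each failed ping waits out the full `ping_timeout`, retries are sent back to back, and up to one `ping_interval` may pass before the first failing ping. The helper names `parse_seconds` and `estimate_detection_window` are hypothetical.

```python
def parse_seconds(value: str) -> int:
    """Parse a duration such as '10s' into whole seconds (only the 's' suffix is handled)."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported duration: {value!r}")
    return int(value[:-1])


# Values copied from the discovery settings in this report.
fd_settings = {
    "discovery.zen.fd.ping_interval": "10s",
    "discovery.zen.fd.ping_timeout": "10s",
    "discovery.zen.fd.ping_retries": 6,
}


def estimate_detection_window(settings: dict) -> tuple:
    """Return (lower, upper) seconds until a silent node is considered failed.

    Assumption: every retry times out after ping_timeout and retries follow
    each other immediately; the upper bound adds one ping_interval of idle
    time before the first failing ping.
    """
    interval = parse_seconds(settings["discovery.zen.fd.ping_interval"])
    timeout = parse_seconds(settings["discovery.zen.fd.ping_timeout"])
    retries = int(settings["discovery.zen.fd.ping_retries"])
    lower = retries * timeout
    upper = interval + retries * timeout
    return lower, upper


low, high = estimate_detection_window(fd_settings)
print(f"estimated fault detection window: {low}s to {high}s")
```

Under those assumptions the estimate is about one minute, compared with the roughly 30 minutes between 20:40 and 21:10 UTC reported above.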

Steps to Reproduce

1. Simulate a hardware failure on one of the data nodes by stopping or disconnecting the instance.
2. Observe the state of shard allocation and rebalancing.
3. Notice that Elasticsearch does not immediately initiate shard rebalancing across the available nodes.
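
The observation step above could be scripted roughly as follows. This is a minimal sketch that polls the cluster health API (`GET /_cluster/health`) and prints the shard counters over time, so the moment unassigned shards start being reallocated can be read from the timestamps. The function names, the polling interval, and the iteration count are hypothetical, and the base URL has to be supplied by the caller.

```python
import json
import time
import urllib.request

# Shard-related counters returned by the cluster health API.
SHARD_FIELDS = (
    "active_shards",
    "relocating_shards",
    "initializing_shards",
    "unassigned_shards",
    "delayed_unassigned_shards",
)


def summarize_health(health: dict) -> str:
    """Format one cluster health response as a single log line."""
    counts = " ".join(f"{name}={health.get(name, 'n/a')}" for name in SHARD_FIELDS)
    status = health.get("status", "n/a")
    data_nodes = health.get("number_of_data_nodes", "n/a")
    return f"status={status} data_nodes={data_nodes} {counts}"


def fetch_health(base_url: str) -> dict:
    """Fetch the cluster health document from a node's HTTP endpoint."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as response:
        return json.load(response)


def poll_health(base_url: str, interval_s: float = 10.0, iterations: int = 6) -> None:
    """Print a timestamped health summary every interval_s seconds."""
    for _ in range(iterations):
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        print(stamp, summarize_health(fetch_health(base_url)))
        time.sleep(interval_s)
```

Calling `poll_health("http://localhost:9200")` while the data node is stopped would log one line per interval; the address is an assumption (the default HTTP port on the local host) and should be replaced with a reachable node in the cluster.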

Logs (if relevant)

No response

DaveCTurner · 2024-11-05T09:53:49Z

Thank you very much for your interest in Elasticsearch. Unfortunately the issue you have reported relates to Elasticsearch version 5.6.16 which is very old and has passed end-of-life. We will not investigate issues related to unsupported versions here on GitHub, so I am closing this to indicate that no action is needed from the Elasticsearch development team. It's possible that you will find a volunteer to help you with this issue on the community forums, but our strong recommendation would be to upgrade to a supported version of Elasticsearch as a matter of some urgency. If you can reproduce your issue on a supported version then please open a fresh bug report.

Quoting the bug report form:

Please also check your OS is supported, and that the version of Elasticsearch has not passed end-of-life. If you are using an unsupported OS or an unsupported version then the issue is likely to be closed.

Harif-Rahman added the >bug and needs:triage labels on Nov 4, 2024

DaveCTurner closed this as not planned on Nov 5, 2024
