We have sporadic evidence that the healthcheck timeout doesn't work properly.
It seems to be caused by a few subtle bugs.
- Refactor ClusterMonitor to allow testing.
- Ensure that a new item isn't added to the cluster when restarting with old tasks (addNewTaskToCluster).
- Test that the correct number of items is in the cluster and is being monitored.
- Refactor the cluster state out of ClusterMonitor: the monitor should have a cluster state (has-a), not manage it (manages-a).
- Remove the method that checks for too many executors. That check belongs in the scheduler.
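A minimal sketch of the "has-a" split and the idempotent add from the list above, assuming tasks are identified by a unique task ID. ClusterState, ClusterMonitor.add_new_task_to_cluster (standing in for addNewTaskToCluster) and their fields are hypothetical illustrations, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ClusterState:
    """Hypothetical holder of the tasks the framework believes are in the cluster."""
    tasks: Dict[str, str] = field(default_factory=dict)  # task_id -> hostname

    def add_task(self, task_id: str, hostname: str) -> bool:
        """Add a task only if its ID is not already known.

        Returns True if the task was new, False if it was already present
        (e.g. an old task reported again after a framework restart).
        """
        if task_id in self.tasks:
            return False
        self.tasks[task_id] = hostname
        return True


class ClusterMonitor:
    """Monitors tasks. It *has* a ClusterState instead of managing its lifecycle,
    so a test can pass in a prepared state and inspect it afterwards."""

    def __init__(self, state: ClusterState):
        self.state = state
        self.monitored: Set[str] = set()

    def add_new_task_to_cluster(self, task_id: str, hostname: str) -> None:
        # Idempotent: re-adding a known task neither duplicates it in the
        # state nor registers a second healthcheck for it.
        self.state.add_task(task_id, hostname)
        self.monitored.add(task_id)
```

With the state injected, the "correct number of items" test becomes a plain unit test: add a task, add it again as if after a restart, and check that both the state and the monitored set contain exactly one entry.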
Normally, if you kill an executor, Mesos detects it and sends a TASK_KILLED message straight away, and reconciliation works fine.
But sometimes, for example when ZooKeeper holds the wrong state (the state says the executor is running, but it is not), the healthchecks seem to get stuck somewhere. I suggest writing a system test based on this scenario to see if we can replicate it.
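A minimal sketch of a healthcheck timeout that still fires for a task that never answers, which is the stale-ZooKeeper case described above. It assumes the timeout window starts when monitoring begins and that the clock can be injected; HealthcheckTracker and its methods are hypothetical names, not the project's implementation. The fake clock lets a unit test complement the proposed system test without sleeping.

```python
import time
from typing import Callable, Dict, List


class HealthcheckTracker:
    """Hypothetical tracker that flags tasks whose healthchecks have timed out.

    The clock is injected so a test can advance time deterministically.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_ok: Dict[str, float] = {}

    def register(self, task_id: str) -> None:
        # Start the timeout window when monitoring begins, so a task that
        # never responds (e.g. stale state claims it is running) still expires.
        # setdefault keeps the existing timestamp if the task is re-registered.
        self.last_ok.setdefault(task_id, self.clock())

    def record_success(self, task_id: str) -> None:
        # A successful healthcheck resets the task's timeout window.
        self.last_ok[task_id] = self.clock()

    def timed_out(self) -> List[str]:
        # Tasks whose last success (or registration) is older than the timeout.
        now = self.clock()
        return sorted(t for t, ts in self.last_ok.items() if now - ts > self.timeout_s)
```

For example, with a 30 s timeout, a "ghost" task registered at t=0 that never responds is reported at t=40, while a task that answered at t=20 is not.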
Possibly seen in:
#285