Can't put MachineNode into 'scheduled for maintenance' mode (regression) #1089

skryzhny · 2016-11-03T14:46:24Z

Original Jira issue (CODENVY-413) description:

Several times we have had problems with a MachineNode and needed to do maintenance on it.
For example, a MachineNode has a problem and is scheduled to reboot, so I don't want new containers to be created on it.
On the other hand, the containers that already exist on that MachineNode keep working fine.
For now, the only way to do maintenance on a MachineNode is to exclude it from swarm's list of nodes. That prevents the IDE from running commands in that MachineNode's containers and from stopping workspaces properly.
I want to be able to mark a MachineNode as 'scheduled for maintenance' or similar.
In this 'scheduled for maintenance' mode, new docker containers must not be allowed to start on this MachineNode.
A possible way to achieve this is described in:
https://github.com/docker/swarm/issues/1508
https://github.com/docker/swarm/issues/2134

After we put the MachineNode into maintenance state by adding the corresponding label (using the script attached to the original issue), we hit the following problem:
Builds are still scheduled to this MachineNode, while runs are denied on it.
As a result, after about half an hour this MachineNode became the least used node, and every attempt to start a workspace chose to build its container on it. But running containers on this node is disabled, so the workspace can't start.
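The failure can be illustrated with a minimal sketch. It assumes a "least used" placement strategy (fewest containers wins) and a hypothetical maintenance label named `maintenance`; the names and logic are illustrative only and are not taken from Codenvy's or swarm's code. The point is that a node excluded only for runs keeps winning the "least used" comparison for builds, while applying the same filter to both kinds of request sends both elsewhere.

```python
from dataclasses import dataclass, field

# Hypothetical label name; the real label is the one set by the script
# attached to CODENVY-413.
MAINTENANCE_LABEL = "maintenance"


@dataclass
class Node:
    name: str
    containers: int  # number of containers on the node, used as its "load"
    labels: dict = field(default_factory=dict)


def in_maintenance(node: Node) -> bool:
    return node.labels.get(MAINTENANCE_LABEL) == "true"


def pick_least_used(nodes, respect_maintenance: bool) -> Node:
    """Return the candidate node with the fewest containers.

    With respect_maintenance=False, nodes marked for maintenance stay
    candidates, which mirrors the reported bug where builds keep landing there.
    """
    candidates = [
        n for n in nodes if not (respect_maintenance and in_maintenance(n))
    ]
    if not candidates:
        raise RuntimeError("no schedulable nodes")
    return min(candidates, key=lambda n: n.containers)


nodes = [
    # The node under maintenance has drained down and is now the least used.
    Node("node1", containers=2, labels={MAINTENANCE_LABEL: "true"}),
    Node("node2", containers=5),
    Node("node3", containers=4),
]

# Reported behaviour: builds ignore the label, runs honour it.
build_node = pick_least_used(nodes, respect_maintenance=False)
run_node = pick_least_used(nodes, respect_maintenance=True)
print(build_node.name, run_node.name)  # node1 node3
```

In the expected behaviour both builds and runs would call `pick_least_used(nodes, respect_maintenance=True)` and land on `node3`.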

We need a reliable test for this functionality, included in the release QA cycle.

Reproduction Steps:

Start a multi-node environment

Run at least 3 workspaces per MachineNode

Put a MachineNode into 'scheduled for maintenance' mode

Expected behavior:

All builds and runs of workspaces are scheduled to the other node(s).

Observed behavior:

Builds are still scheduled to this node, while runs are scheduled to other nodes.

Codenvy version: 5.0.0-M6
OS and version: CentOS 7.2

Additional information:

Problem only started happening recently, didn't happen in an older version of Codenvy: No
Problem can be reliably reproduced: Yes


mmorhun · 2016-11-23T09:38:36Z

While investigating this issue, I found that we cannot reload the docker config on a node more than twice. This means that once we mark a node for maintenance, we must wait until all workspaces on that node are stopped and then restart docker. Otherwise, if we just cancel maintenance, we will fail to mark the node for maintenance again.
This may also cause other problems if we want to change the docker config on a node dynamically.
I created an issue for this: #1211

mmorhun · 2016-11-23T14:34:34Z

Actually, maintenance mode works, but adding the corresponding label to a node causes some strange behaviour in swarm and on other nodes, which makes the feature useless in practice.
I created a separate issue for this problem: #1215. Once it is fixed, maintenance should work as expected.

mmorhun · 2016-11-24T12:53:37Z

Also, Docker swarm mode adds a native AVAILABILITY flag for controlling node maintenance.
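For reference, Docker's swarm mode documentation describes three availability values, set with `docker node update --availability <value> <node>`: `active` (the scheduler can assign new tasks to the node), `pause` (no new tasks, existing tasks keep running) and `drain` (no new tasks, existing tasks are shut down and rescheduled on other nodes). The minimal sketch below only models those documented semantics to show which value matches the requirement from the original report; it does not talk to Docker, and it assumes the workloads are swarm-mode service tasks rather than containers started through the standalone swarm used here.

```python
from enum import Enum


class Availability(Enum):
    """Node availability values documented for Docker swarm mode."""

    ACTIVE = "active"
    PAUSE = "pause"
    DRAIN = "drain"


def accepts_new_tasks(a: Availability) -> bool:
    # Only an active node receives newly scheduled tasks.
    return a is Availability.ACTIVE


def keeps_existing_tasks(a: Availability) -> bool:
    # Drain shuts down existing tasks and reschedules them elsewhere.
    return a is not Availability.DRAIN


def matches_maintenance_requirement(a: Availability) -> bool:
    # CODENVY-413: deny new containers, leave running ones untouched.
    return not accepts_new_tasks(a) and keeps_existing_tasks(a)


print([a.value for a in Availability if matches_maintenance_requirement(a)])  # ['pause']
```

Under these assumptions `pause` is the value that fits 'scheduled for maintenance', while `drain` is closer to taking the node out entirely.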

skryzhny added the severity/blocker label Nov 3, 2016

vkuznyetsov added kind/bug sprint/next-sprint status/open-for-dev team/enterprise labels Nov 3, 2016

vkuznyetsov assigned mmorhun Nov 3, 2016

vkuznyetsov added sprint/current-sprint and removed sprint/next-sprint labels Nov 10, 2016

mmorhun added status/in-progress and removed status/open-for-dev labels Nov 15, 2016

mmorhun closed this as completed Nov 23, 2016

mmorhun removed the status/in-progress label Nov 23, 2016

vkuznyetsov removed the sprint/current-sprint label Nov 24, 2016
