Can't put MachineNode into 'scheduled for maintenance' mode (regression) #1089

Closed
skryzhny opened this issue Nov 3, 2016 · 3 comments

Comments

@skryzhny
Contributor

skryzhny commented Nov 3, 2016

Original Jira issue (CODENVY-413) description:

Several times we have had problems with a MachineNode and needed to do maintenance on it.
The MachineNode is having problems and is scheduled for a reboot, so I don't want new containers created on it.
On the other hand, the containers that already exist on that MachineNode are working OK.
For now, the only way to do maintenance on a MachineNode is to exclude it from swarm's list of nodes. But that prevents the IDE from running commands in the MachineNode's containers and from stopping workspaces properly.
I want to be able to mark a MachineNode as 'scheduled for maintenance' or similar.
In this 'scheduled for maintenance' mode, new docker containers must be denied from starting on this MachineNode.
A possible way to achieve this is described in:
https://github.com/docker/swarm/issues/1508
https://github.com/docker/swarm/issues/2134

After we put a MachineNode into the maintenance state by adding the corresponding label, using the script attached to the original issue, we have the following problem:
Builds are still scheduled to this MachineNode, while runs are denied on it.
As a result, after about half an hour that MachineNode became the least used one, and every attempt to start a workspace chose to build the container on this node. But running containers on this node is disabled, so the workspace can't start.
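
The failure described above can be illustrated with a toy model. This is only a sketch, not Codenvy's or swarm's actual scheduler: the `Node` class, the `maintenance` label name, and the "least used node wins" rule are assumptions made for illustration, as is the assumption that a run has to land on the node where the image was just built.

```python
from dataclasses import dataclass, field

# Hypothetical label name; the real label set by the maintenance script may differ.
MAINTENANCE_LABEL = "maintenance"


@dataclass
class Node:
    name: str
    containers: int
    labels: dict = field(default_factory=dict)


def in_maintenance(node):
    return node.labels.get(MAINTENANCE_LABEL) == "true"


def pick_node(nodes, honor_maintenance):
    """Pick the least used node, optionally skipping nodes in maintenance."""
    candidates = [n for n in nodes if not (honor_maintenance and in_maintenance(n))]
    if not candidates:
        raise RuntimeError("no node available")
    return min(candidates, key=lambda n: n.containers)


def start_workspace(nodes, builds_honor_maintenance):
    """Build on the chosen node, then try to run on that same node."""
    build_node = pick_node(nodes, honor_maintenance=builds_honor_maintenance)
    if in_maintenance(build_node):
        # Runs are denied on a maintenance node, so the workspace cannot start.
        return f"failed: image built on {build_node.name}, but runs are denied there"
    build_node.containers += 1
    return f"started on {build_node.name}"


if __name__ == "__main__":
    nodes = [Node("node1", 0, {MAINTENANCE_LABEL: "true"}), Node("node2", 3)]
    print(start_workspace(nodes, builds_honor_maintenance=False))
    print(start_workspace(nodes, builds_honor_maintenance=True))
```

In this model, once the maintenance node drains and becomes the least used one, every build that ignores the label is sent to it and every workspace start fails, which matches the behaviour reported here. When builds honor the label as well, the workspace starts on the other node.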

We need to make a reliable test for this functionality and include it in the release QA cycle.

Reproduction Steps:

Start Multinode env

Run at least 3 workspaces per MachineNode

Put a MachineNode in 'scheduled for maintenance' mode

Expected behavior:

All builds and runs of workspaces are scheduled to the other node(s).

Observed behavior:

Builds are still scheduled to this node, while runs are scheduled to the other node(s).

Codenvy version: 5.0.0-M6
OS and version: Centos 7.2

Additional information:

Problem only started happening recently, didn't happen in an older version of Codenvy: No
Problem can be reliably reproduced: Yes

@mmorhun
Contributor

mmorhun commented Nov 23, 2016

While investigating this issue, I found that we cannot reload the docker config on a node more than twice. This means that if we mark a node for maintenance, we must wait until all workspaces on this node are stopped and then restart docker. Otherwise, if we just cancel maintenance, we will fail to mark the node for maintenance again.
This may also cause other problems if we want to change the docker config on a node dynamically.
I created an issue for this: #1211

@mmorhun
Contributor

mmorhun commented Nov 23, 2016

Actually, maintenance mode works, but adding the corresponding label to a node causes some strange behaviour in swarm and the other nodes, which makes it useless in practice.
I created a separate issue for this problem: #1215. Once it is fixed, maintenance should work as expected.

@mmorhun
Contributor

mmorhun commented Nov 24, 2016

Also, docker swarm mode adds a native AVAILABILITY flag to control node maintenance.
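
For reference, a minimal sketch of how that flag might be driven from a script. `docker node update --availability` is the swarm mode CLI command and accepts `active`, `pause`, and `drain`. As far as I understand, `pause` stops new tasks from being assigned to the node while leaving existing ones running, which is closest to what this issue asks for, whereas `drain` also moves existing tasks off the node. The helper names and the node name `node1` are hypothetical, and this applies to swarm mode only, not to the standalone swarm setup this issue appears to be about.

```python
import subprocess

VALID_AVAILABILITY = ("active", "pause", "drain")


def availability_command(node, availability):
    """Build the docker CLI call that changes a swarm mode node's availability."""
    if availability not in VALID_AVAILABILITY:
        raise ValueError(f"availability must be one of {VALID_AVAILABILITY}")
    return ["docker", "node", "update", "--availability", availability, node]


def set_availability(node, availability):
    """Run the command; must be executed against a swarm manager."""
    subprocess.run(availability_command(node, availability), check=True)


if __name__ == "__main__":
    # Only prints the command; call set_availability() to actually apply it.
    print(" ".join(availability_command("node1", "pause")))
```

Setting the node back to `active` with the same helper would end the maintenance window.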
