True zero-downtime deployment by request draining #21

wowu · 2023-12-16T11:06:04Z

Currently there's no way of telling Traefik that the old container is about to be stopped, so it might route requests to a container that is shutting down. I'm creating this issue to track progress on figuring out the best way to implement this.

The problem was mentioned in this StackOverflow question: https://stackoverflow.com/questions/75918681/how-to-avoid-downtime-when-using-docker-rollout-with-traefik

Current idea

The easiest way seems to be failing healthchecks before the container is stopped, so that Traefik stops routing new requests to the unhealthy container(s). This can be achieved by adding ! test -f /drain to the container healthcheck, that is, "fail if a file named drain exists in /". docker-rollout can then create this file before stopping the old container.
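To illustrate how the ! test -f /drain check flips once the file exists, here is a minimal sketch that can be run outside a container. The temporary directory standing in for / and the healthcheck and status helper names are assumptions made for this example only; they are not part of docker-rollout.

```shell
# Hypothetical stand-in for the container's root directory, so this runs anywhere.
root="$(mktemp -d)"

# The healthcheck expression: succeeds only while no drain file exists.
healthcheck() {
  ! test -f "$root/drain"
}

# Translate the exit status into a readable label.
status() {
  if healthcheck; then echo healthy; else echo unhealthy; fi
}

status_before="$(status)"   # no drain file yet
touch "$root/drain"         # what the tool or a hook would do before stopping
status_after="$(status)"    # the check now fails

echo "before: $status_before, after: $status_after"
rm -rf "$root"
```

In a real container the check would look at /drain directly, and Docker would mark the container unhealthy only after the configured number of consecutive failures.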

I'm not sure if this behavior should be hardcoded in the tool, as there might be better ways of implementing request draining for proxies other than Traefik / nginx. Implementing hook support would allow docker-rollout users to implement true zero-downtime deployment in two steps:

1. Add && ! test -f /drain to the current container healthcheck in the compose file
2. Add a hook like --before-stop "docker exec $1 touch /drain && sleep 10" to create the file manually


rogerdz · 2024-01-10T02:55:37Z

https://doc.traefik.io/traefik/middlewares/http/retry/
Maybe this can help?

immortaly007 · 2024-10-04T10:28:28Z

I like the current idea, but wouldn't the "sleep" timeout need to be at least the health check interval multiplied by the health check "retries" count? Otherwise the container might not enter the "unhealthy" state before the shutdown sequence starts.
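As a rough sketch of that arithmetic: the interval, retries, and margin values below are hypothetical placeholders, and the min_drain_sleep helper is made up for illustration. The extra margin for the proxy to notice the state change is an assumption, not a measured figure.

```shell
# Hypothetical healthcheck settings; substitute the values from your compose file.
interval=5   # seconds between checks (healthcheck interval)
retries=3    # consecutive failures before the container is "unhealthy"
margin=2     # extra slack for the proxy to react; an assumption

# Lower bound for the drain sleep: interval * retries, plus the margin.
min_drain_sleep() {
  echo $(( $1 * $2 + $3 ))
}

drain_sleep="$(min_drain_sleep "$interval" "$retries" "$margin")"
echo "sleep at least ${drain_sleep}s before stopping the old container"
```

With these example values, the sleep 10 from the hook above would be too short, since three failed checks at a 5-second interval already take about 15 seconds.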

Maybe another option (specifically for Traefik): in the --before-stop hook, add the label traefik.enable=false. (Note that I have not tested how quickly Traefik would pick up this change.)

wowu · 2024-10-07T07:23:25Z

That's a useful insight, thanks! We should describe the requirements for the sleep in the docs if we go for the hooks solution, or wait the required time in docker-rollout itself.

Docker container labels are immutable (at least for now: moby/moby#21721), so we cannot use them to deregister the container from Traefik.

wowu · 2024-11-03T14:18:02Z

I created a sample implementation in #36. It would be amazing if someone could test it with their setup to confirm it works correctly 😄

wowu mentioned this issue Feb 18, 2024

zero downtime is a promise not held in reality: can you add a hook (so we can reload the proxy)? #23

Closed

pedroterzero mentioned this issue Jun 14, 2024

Old instance 50% of the time #32

Closed

wowu linked a pull request Nov 3, 2024 that will close this issue

feat: true zero-downtime deployment with request draining #36

Open
