Feature Request: Hot Reload #25

Closed
elhackeado opened this issue Apr 10, 2024 · 9 comments

Labels
F-Configuration  Functionality relating to configuration
F-Reload  Functionality relating to graceful reloading

Comments

@elhackeado

Feature Description:

Hot reloading functionality will enable river to dynamically reload its configuration file without requiring a restart of the application or service. This capability improves system flexibility, uptime, and ease of maintenance by allowing administrators to make configuration changes on the fly while the application is still running.

How has Nginx implemented it?

In order for nginx to re-read the configuration file, a HUP signal should be sent to the master process. The master process first checks the syntax validity, then tries to apply new configuration, that is, to open log files and new listen sockets. If this fails, it rolls back changes and continues to work with old configuration. If this succeeds, it starts new worker processes, and sends messages to old worker processes requesting them to shut down gracefully. Old worker processes close listen sockets and continue to service old clients. After all clients are serviced, old worker processes are shut down.
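
For concreteness, sending that signal typically looks like the following (the paths assume the default /usr/local/nginx install prefix used in the example below; adjust for your layout):

# Ask the master process to re-read the configuration
kill -HUP $(cat /usr/local/nginx/logs/nginx.pid)

# Equivalent shortcut: the nginx binary looks up the master PID and sends HUP for you
/usr/local/nginx/sbin/nginx -s reload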

Let’s illustrate this by example. Imagine that nginx is run on FreeBSD and the command

ps axw -o pid,ppid,user,%cpu,vsz,wchan,command | egrep '(nginx|PID)'
produces the following output:

  PID  PPID USER    %CPU   VSZ WCHAN  COMMAND
33126     1 root     0.0  1148 pause  nginx: master process /usr/local/nginx/sbin/nginx
33127 33126 nobody   0.0  1380 kqread nginx: worker process (nginx)
33128 33126 nobody   0.0  1364 kqread nginx: worker process (nginx)
33129 33126 nobody   0.0  1364 kqread nginx: worker process (nginx)

If HUP is sent to the master process, the output becomes:

  PID  PPID USER    %CPU   VSZ WCHAN  COMMAND
33126     1 root     0.0  1164 pause  nginx: master process /usr/local/nginx/sbin/nginx
33129 33126 nobody   0.0  1380 kqread nginx: worker process is shutting down (nginx)
33134 33126 nobody   0.0  1368 kqread nginx: worker process (nginx)
33135 33126 nobody   0.0  1368 kqread nginx: worker process (nginx)
33136 33126 nobody   0.0  1368 kqread nginx: worker process (nginx)

One of the old worker processes with PID 33129 still continues to work. After some time it exits:

  PID  PPID USER    %CPU   VSZ WCHAN  COMMAND
33126     1 root     0.0  1164 pause  nginx: master process /usr/local/nginx/sbin/nginx
33134 33126 nobody   0.0  1368 kqread nginx: worker process (nginx)
33135 33126 nobody   0.0  1368 kqread nginx: worker process (nginx)
33136 33126 nobody   0.0  1368 kqread nginx: worker process (nginx)

[SOURCE] https://nginx.org/en/docs/control.html

Any limitations with Nginx's approach?

Too-frequent hot reloading can make connections unstable and cause business data to be lost.

When NGINX executes the reload command, the old worker process keeps serving existing connections and disconnects automatically once it has processed all remaining requests. However, if a connection is dropped before the client has received responses to all of its requests, the business data for those remaining requests is lost forever. Naturally, this draws the attention of client-side users.

In some circumstances, recycling the old worker process takes so long that it affects normal business operations.

For example, when proxying the WebSocket protocol, NGINX can't know whether a request has finished because it doesn't parse the frame headers. So even though the worker process receives the quit command from the master process, it can't exit until those connections raise exceptions, time out, or disconnect.

Here is another example: when NGINX acts as a reverse proxy for TCP and UDP traffic, it has no notion of request boundaries, so it can't tell when a connection can finally be shut down.

Therefore, the old worker process usually takes a long time to exit, especially in industries like live streaming, media, and speech recognition; sometimes the recycling time can reach half an hour or even longer. Meanwhile, if users reload the server frequently, many shutting-down worker processes pile up and can eventually lead to NGINX running out of memory (OOM), which can seriously affect the business.
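
As a rough operational sketch (not from the article; it just reuses the ps output format shown above), one way to avoid stacking up old workers is to check how many are still draining before issuing another reload:

# Count old workers still draining connections; the [o] stops grep from matching its own command line
ps axw -o command | grep -c 'worker process is shutting d[o]wn'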

APISIX solved this problem in its own way; do check out this article before making any design decisions: https://api7.ai/blog/how-nginx-reload-work

@moderation

Envoy proxy has implemented hot restart and it is used at scale. See Envoy hot restart from Envoy creator @mattklein123 and the recent documentation.

@taikulawo

> Envoy proxy has implemented hot restart and it is used at scale. See Envoy hot restart from Envoy creator @mattklein123 and the recent documentation.

The author is asking about reloading the configuration, not restarting the binary itself. There is a difference.

@jamesmunns
Collaborator

jamesmunns commented Apr 12, 2024

As a note, pingora already supports hot-reload: https://github.com/cloudflare/pingora/blob/main/docs/user_guide/start_stop.md edit: also https://github.com/cloudflare/pingora/blob/main/docs/user_guide/graceful.md

It is likely river will take a similar path, doing a hot-reload (e.g. starting and stopping the binary, but maintaining connections).

It's possible this could be implemented in a way that doesn't require starting a new process, but as this is implemented within pingora itself, it's likely River will mimic their implementation 1:1.
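
For reference, the graceful-upgrade flow those pingora docs describe looks roughly like the sketch below. The binary name, config file, and pid-file path are placeholders; the --daemon/--upgrade/-c flags and the SIGQUIT signal are taken from the linked pages, and River's eventual interface may differ:

# Start the new instance; --upgrade tells it to take over the listening sockets
# from the running instance instead of binding them fresh
./proxy_binary --daemon --upgrade -c proxy.yaml

# Then ask the old instance to shut down gracefully, draining existing connections
kill -QUIT $(cat /run/proxy_old.pid)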

@jamesmunns jamesmunns added F-Configuration Functionality relating to configuration F-Reload Functionality relating to graceful reloading labels Apr 12, 2024
@jamesmunns jamesmunns modified the milestones: Backlog, Kickstart Spike 1 Apr 12, 2024
@jamesmunns
Collaborator

Putting this in the "Backlog" milestone, as I'm not sure if this will make it into River before the end of April, but it might.

@Et7f3

Et7f3 commented Apr 13, 2024

I'll also add another technique that can be applied to processes like Docker containers: https://iximiuz.com/en/posts/multiple-containers-same-port-reverse-proxy/

@elhackeado
Author

> As a note, pingora already supports hot-reload: https://github.com/cloudflare/pingora/blob/main/docs/user_guide/start_stop.md edit: also https://github.com/cloudflare/pingora/blob/main/docs/user_guide/graceful.md
@jamesmunns I believe Pingora's Graceful Upgrade is the way to go. Since Pingora is already battle-tested in production, at this point I would rather rely on Pingora's way of doing it than introduce something new that would need time to mature.

@studersi

There is also a different Rust-based reverse proxy project with a strong focus on changing configurations without any downtime or lost connections: https://github.com/sozu-proxy/sozu.

Quote from their website (https://github.com/sozu-proxy/sozu):

SŌZU is a HTTP reverse proxy built in Rust, that can handle fine grained configuration changes at runtime without reloads, and designed to never ever stop.

I am not sure how they achieve it exactly but their implementation might be worth looking into when designing this feature.

@jamesmunns jamesmunns modified the milestones: Backlog, Kickstart Spike 2 May 24, 2024
@jamesmunns
Collaborator

Noting that this has been scheduled for the just-starting-now milestone; there should be some progress on this in the next few weeks.

@jamesmunns
Collaborator

This was implemented by #49, please feel free to open an issue if there are any follow-on needs!
