Improve or eliminate the TCP Kubernetes worker scheme #278

Open
ghjm opened this issue Dec 16, 2020 · 0 comments

Currently, Receptor has two ways of running Kubernetes jobs. It can launch a pod that runs the same kind of worker as a work-command (i.e., the pod command just expects to receive stdin and then produce stdout) and use the k8s logger function to retrieve the resulting stdout. Alternatively, it can launch a worker that is expected to make a one-time TCP connection back to the Receptor process and then handle stdin/stdout over that connection.
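To make the second option concrete, here is a minimal sketch of what a TCP-scheme worker could look like. It is not Receptor's actual worker code: the names `handle_stream` and `run_tcp_worker` are hypothetical, and it assumes the end of stdin is signalled by the peer half-closing the connection.

```python
import socket


def handle_stream(conn, process):
    """Read all input until EOF, send back the processed result, then half-close.

    `process` is a hypothetical callable mapping input bytes to output bytes.
    """
    chunks = []
    while True:
        data = conn.recv(65536)
        if not data:  # peer half-closed its side: treat as end of "stdin"
            break
        chunks.append(data)
    conn.sendall(process(b"".join(chunks)))
    conn.shutdown(socket.SHUT_WR)  # signal end of "stdout"


def run_tcp_worker(host, port, process):
    """Make the one-time connection back to the controlling process (assumed host/port)."""
    with socket.create_connection((host, port)) as conn:
        handle_stream(conn, process)
```

Because the whole exchange lives on a single connection, nothing in this sketch lets the worker carry on if the other end goes away.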

The advantage of the TCP scheme is that Receptor work streams are not tied up with Kubernetes internals. For example, a worker that produces many gigabytes of output via the Kubernetes logger probably forces Kubernetes to store all that data somewhere.

The disadvantage of the TCP scheme is that, in its current form, it cannot survive a Receptor restart. With other types of work, such as work-command or the logger version of work-kubernetes, Receptor resumes monitoring already-running jobs after a restart. This is not possible with the current TCP scheme because the worker loses its TCP connection when Receptor shuts down.

So our choices are either:

  • Drop the TCP worker scheme and only use the k8s logger, or
  • Come up with a new and more complex TCP worker scheme that solves this problem.

To do the latter, we would need some protocol over the TCP stream that allows reconnecting and resuming a stream from a given byte position, and we would need workers to support this. The most user-friendly way of handling this would probably be to have a worker pod include two containers: a normal stdin/stdout worker, and a Receptor-provided image that handles the TCP communication.
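The resume-from-byte-position idea could be sketched roughly as follows. This is only an illustration under assumptions, not a proposed wire protocol: `ResumableOutput` stands in for the buffer the Receptor-provided container might keep, `ResumingReader` stands in for the Receptor side, and both names are hypothetical. The sketch keeps all output in memory, which a real design would need to bound.

```python
class ResumableOutput:
    """Hypothetical sidecar-side buffer of the worker's stdout."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, data: bytes) -> None:
        # Called as the stdin/stdout worker produces output.
        self._buf += data

    def read_from(self, offset: int, max_bytes: int = 65536) -> bytes:
        # Serve a chunk starting at the byte position the reader asks for.
        if offset < 0 or offset > len(self._buf):
            raise ValueError("offset outside buffered range")
        return bytes(self._buf[offset:offset + max_bytes])


class ResumingReader:
    """Hypothetical Receptor-side reader that remembers how far it got."""

    def __init__(self, already_received: bytes = b""):
        # After a restart, this would be restored from persisted state.
        self.received = bytearray(already_received)

    @property
    def offset(self) -> int:
        return len(self.received)

    def pull(self, source: ResumableOutput, max_bytes: int = 65536) -> int:
        # Ask for data starting at our current offset; return bytes gained.
        chunk = source.read_from(self.offset, max_bytes)
        self.received += chunk
        return len(chunk)
```

A reader that has pulled the first few bytes, gone away, and come back with its saved data only needs to present its offset to continue without loss or duplication.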
