Fix graceful shutdown for tasks #8718

Open
jihoonson opened this issue Oct 22, 2019 · 4 comments

Comments

@jihoonson
Contributor

Affected Version

0.14, 0.15, 0.16

Description

Since #6828, task shutdown is always a graceful shutdown. The change was made so that some task types could clean up their resources. For example, the Hadoop task kills its Hadoop job when it is stopped (#6828), and the parallel indexing task kills all of its running subtasks (#7041).

However, this differs from how stream ingestion tasks had been using graceful shutdown. On graceful shutdown, they immediately start persisting all in-memory segments to disk and publishing those segments before they stop. The purpose is to create a checkpoint when they stop, not to clean up resources. As a result, #6828 unintentionally changed the behavior of the Kafka/Kinesis indexing service when the supervisor kills tasks.

I think we need to distinguish between immediate stop and graceful stop again. Immediate stop always involves resource cleanup. Graceful stop may involve extra work in addition to the necessary resource cleanup.
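
To make the proposed distinction concrete, here is a minimal sketch of how the two stop modes might differ between a batch-style task and a streaming-style task. Every name in it (`ShutdownSketch`, `StopMode`, `StoppableTask`, `BatchLikeTask`, `StreamLikeTask`) is hypothetical and chosen for illustration only; none of this is Druid's actual API or the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in Druid.
final class ShutdownSketch {
    enum StopMode { IMMEDIATE, GRACEFUL }

    interface StoppableTask {
        void stop(StopMode mode);
    }

    // Batch-style task: both modes only release resources.
    static final class BatchLikeTask implements StoppableTask {
        final List<String> log = new ArrayList<>();

        @Override
        public void stop(StopMode mode) {
            log.add("kill external job");
        }
    }

    // Streaming-style task: a graceful stop checkpoints first, then cleans up.
    static final class StreamLikeTask implements StoppableTask {
        final List<String> log = new ArrayList<>();

        @Override
        public void stop(StopMode mode) {
            if (mode == StopMode.GRACEFUL) {
                log.add("persist in-memory segments");
                log.add("publish segments");
            }
            // Resource cleanup happens in both modes.
            log.add("release resources");
        }
    }

    public static void main(String[] args) {
        StreamLikeTask stream = new StreamLikeTask();
        stream.stop(StopMode.IMMEDIATE);
        System.out.println("immediate: " + stream.log);

        StreamLikeTask checkpointed = new StreamLikeTask();
        checkpointed.stop(StopMode.GRACEFUL);
        System.out.println("graceful: " + checkpointed.log);
    }
}
```

With a split like this, a supervisor that only wants a task gone could request the immediate mode and skip the persist/publish work, which is the pre-#6828 behavior described above.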

@ankit0811
Contributor

@jihoonson let me know if you need any help here
Happy to contribute :)

@jihoonson
Contributor Author

@ankit0811 cool, I'm not planning to fix this for now. Are you interested in taking it?

@ankit0811
Contributor

Sure, I can pick this up.
To be sure I understand this correctly: we need a clear separation of what graceful shutdown does, depending on the type of cleanup.
For realtime tasks, cleanup means persisting the segments.
For batch tasks, cleanup means destroying the job to save resources?

So essentially a split of responsibilities depending on the type of ingestion?

@jihoonson
Contributor Author

I think tasks can be categorized into two groups: restorable tasks and non-restorable tasks. Restorable tasks should store their last status on disk when graceful shutdown is called, but can stop without doing so if force shutdown is called. Non-restorable tasks just clean up their resources on both graceful shutdown and force shutdown.
Currently, realtime tasks are restorable while batch tasks are not. You can tell whether a given task is restorable by calling Task.canRestore().
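
The categorization above could be sketched roughly as follows. Only the method name `canRestore()` comes from this thread; the `SketchTask` interface, the `shutdownSteps` helper, and the step descriptions are assumptions made up for illustration and do not reflect Druid's real `Task` interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the restorable / non-restorable split.
final class RestorableSketch {
    // Stand-in for a task; only the canRestore() name is taken from the thread.
    interface SketchTask {
        boolean canRestore();
    }

    // Returns the steps a shutdown would run, in order.
    static List<String> shutdownSteps(SketchTask task, boolean graceful) {
        List<String> steps = new ArrayList<>();
        // Only a restorable task saves state, and only on graceful shutdown.
        if (graceful && task.canRestore()) {
            steps.add("store last status on disk");
        }
        // Resource cleanup runs on both graceful and force shutdown.
        steps.add("clean up resources");
        return steps;
    }

    public static void main(String[] args) {
        SketchTask realtime = () -> true;  // restorable
        SketchTask batch = () -> false;    // non-restorable

        System.out.println("realtime, graceful: " + shutdownSteps(realtime, true));
        System.out.println("realtime, force:    " + shutdownSteps(realtime, false));
        System.out.println("batch, graceful:    " + shutdownSteps(batch, true));
        System.out.println("batch, force:       " + shutdownSteps(batch, false));
    }
}
```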
