Fix graceful shutdown for tasks #8718

jihoonson · 2019-10-22T19:20:14Z

Affected Version

0.14, 0.15, 0.16

Description

Since #6828, task shutdown is always a graceful shutdown. This was so that some task types could clean up their resources. For example, the Hadoop task kills its Hadoop job when it's stopped (#6828), and the parallel indexing task kills all of its running sub tasks (#7041).

However, this is different from how the stream ingestion tasks had been using graceful shutdown. On graceful shutdown, they immediately start persisting all in-memory segments to disk and publishing segments before they stop. This is to create a checkpoint when they stop, not for resource cleanup. As a result, #6828 unexpectedly changed the behavior of the Kafka/Kinesis indexing service when the supervisor kills tasks.

I think we need to distinguish immediate stop and graceful stop again. Immediate stop always involves resource cleanup. Graceful stop may involve extra work in addition to necessary resource cleanup.
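A minimal sketch of the distinction, just to illustrate the idea. All class and method names here (`StoppableTask`, `beforeGracefulStop`, `cleanUpResources`, and the two example subclasses) are hypothetical and are not existing Druid APIs; the `steps` list exists only to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Druid's actual API.
abstract class StoppableTask {
    // Records the steps taken so the ordering is easy to inspect.
    final List<String> steps = new ArrayList<>();

    // Cleanup required on every stop, e.g. killing a Hadoop job or sub tasks.
    abstract void cleanUpResources();

    // Extra work done only on graceful stop; a no-op by default.
    void beforeGracefulStop() {
    }

    // Immediate stop: resource cleanup only.
    final void stopImmediately() {
        cleanUpResources();
    }

    // Graceful stop: optional extra work, then the same resource cleanup.
    final void stopGracefully() {
        beforeGracefulStop();
        cleanUpResources();
    }
}

// Stands in for a stream ingestion task that checkpoints before stopping.
class StreamTaskSketch extends StoppableTask {
    @Override
    void beforeGracefulStop() {
        steps.add("persist and publish segments");
    }

    @Override
    void cleanUpResources() {
        steps.add("release resources");
    }
}

// Stands in for a batch task that only has resources to clean up.
class BatchTaskSketch extends StoppableTask {
    @Override
    void cleanUpResources() {
        steps.add("kill external job");
    }
}
```

With this shape, a batch task behaves the same on both kinds of stop, while a stream task does its checkpoint work only when stopped gracefully.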

ankit0811 · 2019-10-29T19:02:51Z

@jihoonson let me know if you need any help here
Happy to contribute :)

jihoonson · 2019-10-29T20:19:57Z

@ankit0811 cool, I'm not planning to fix this for now. Are you interested in taking it?

ankit0811 · 2019-10-29T21:47:54Z

Sure, I can pick this up.
To be sure I understand this correctly: we need a clear separation of graceful shutdown behavior
depending on the type of cleanup.
For realtime tasks, cleanup means persisting the segments.
For batch tasks, cleanup means destroying the job to save resources?

So essentially a split of responsibilities depending on the type of ingestion?

jihoonson · 2019-10-30T21:08:00Z

I think tasks can be categorized into two groups, i.e., restorable tasks and non-restorable tasks. Restorable tasks should store their last status on disk when graceful shutdown is called, but can stop without doing so if force shutdown is called. Non-restorable tasks just clean up their resources on both graceful shutdown and force shutdown.
Currently, realtime tasks are restorable while batch tasks are not. You can tell whether a given task is restorable by calling Task.canRestore().
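Roughly, the decision could look like the sketch below. Only `canRestore()` comes from the real `Task` interface. `TaskLike` is a hypothetical stand-in so the example compiles on its own, and `ShutdownMode`, `ShutdownAction`, and `ShutdownPolicy` are made-up names for illustration.

```java
// Hypothetical stand-in exposing only the canRestore() method mentioned above.
interface TaskLike {
    boolean canRestore();
}

// Hypothetical: the two kinds of shutdown request.
enum ShutdownMode { GRACEFUL, FORCE }

// Hypothetical: what the task should do in response.
enum ShutdownAction { SAVE_STATE_THEN_CLEAN_UP, CLEAN_UP_ONLY }

final class ShutdownPolicy {
    private ShutdownPolicy() {
    }

    // Restorable tasks save their last status only on graceful shutdown.
    // Every other combination just cleans up resources.
    static ShutdownAction decide(TaskLike task, ShutdownMode mode) {
        if (task.canRestore() && mode == ShutdownMode.GRACEFUL) {
            return ShutdownAction.SAVE_STATE_THEN_CLEAN_UP;
        }
        return ShutdownAction.CLEAN_UP_ONLY;
    }
}
```

For example, `ShutdownPolicy.decide(() -> true, ShutdownMode.FORCE)` yields `CLEAN_UP_ONLY`, which matches the "can stop without it" case for restorable tasks.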

jihoonson added the Bug, Area - Batch Ingestion, and Area - Streaming Ingestion labels on Oct 22, 2019
