local task job: add overtime mechanism, to not kill task prematurely when auxiliary processes are running #39890

Merged
merged 1 commit into from
Jun 13, 2024


mobuchowski
Contributor

@mobuchowski mobuchowski commented May 28, 2024

Currently, LocalTaskJobRunner kills a worker process if it is still running after its task state has been set to SUCCESS. This can happen in a few ways: most commonly, either the state is set externally, or relatively slow listeners or other auxiliary processes (such as the mini scheduler) are still running.

This PR introduces a mechanism that gives the task some additional time to finish its work (20 seconds by default, configurable), after which the process is terminated.
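To illustrate the idea, here is a minimal sketch of such an overtime check. It is not the code from this PR: the `OvertimeTracker` class, its `should_terminate` method, and the way timestamps are passed in are all hypothetical names chosen for the example. Only the 20 second default comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OvertimeTracker:
    """Hypothetical helper: decides when a finished task's process may be killed."""

    # Grace period in seconds; 20 is the default mentioned in the PR description.
    threshold: float = 20.0
    # Time at which we first saw SUCCESS while the process was still alive.
    success_seen_at: Optional[float] = None

    def should_terminate(self, state: str, process_running: bool, now: float) -> bool:
        # Nothing to do unless the task is marked successful but still running.
        if state != "success" or not process_running:
            self.success_seen_at = None
            return False
        # Start the overtime clock the first time this condition is observed.
        if self.success_seen_at is None:
            self.success_seen_at = now
        # Terminate only once the grace period has been exceeded.
        return (now - self.success_seen_at) > self.threshold


# Usage: call the check from each heartbeat with the current time.
tracker = OvertimeTracker(threshold=20.0)
for tick in (0.0, 5.0, 15.0, 30.0):
    if tracker.should_terminate("success", process_running=True, now=tick):
        print(f"t={tick}: overtime exceeded, terminating process")
    else:
        print(f"t={tick}: still within overtime, leaving process alone")
```

Passing the clock value in explicitly keeps the sketch easy to test. A real runner would more likely read the current time and the task's end date itself.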

@boring-cyborg boring-cyborg bot added area:providers area:Scheduler including HA (high availability) scheduler provider:openlineage AIP-53 labels May 28, 2024
@mobuchowski mobuchowski force-pushed the listener-task-timeout branch from 460fadf to 32e46dd Compare May 30, 2024 12:57
@mobuchowski mobuchowski marked this pull request as ready for review May 30, 2024 13:01
@mobuchowski mobuchowski requested a review from potiuk May 30, 2024 13:01
@mobuchowski mobuchowski force-pushed the listener-task-timeout branch 2 times, most recently from 430c4fa to e2acd51 Compare June 4, 2024 16:52
@mobuchowski mobuchowski changed the title from "local task job: add timeout, to not kill on_task_instance_success listener by heartbeat_callback" to "local task job: add overtime mechanism, to not kill task prematurely when auxiliary processes are running" Jun 4, 2024
@mobuchowski mobuchowski force-pushed the listener-task-timeout branch from e2acd51 to ec1b194 Compare June 5, 2024 21:25
@mobuchowski mobuchowski force-pushed the listener-task-timeout branch 2 times, most recently from f0128ce to 3d4661d Compare June 10, 2024 14:40
@potiuk
Member

potiuk commented Jun 11, 2024

Nice and simple. I think we should have another committer to take a look as well.

Contributor

@jscheffl jscheffl left a comment

Code-wise looks good.

@mobuchowski mobuchowski force-pushed the listener-task-timeout branch from 3d4661d to 7b0404b Compare June 12, 2024 14:53
@mobuchowski mobuchowski merged commit fa65a20 into main Jun 13, 2024
51 checks passed
@eladkal eladkal deleted the listener-task-timeout branch June 29, 2024 17:32
@ephraimbuddy ephraimbuddy added this to the Airflow 2.10.0 milestone Jul 1, 2024
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Jul 1, 2024
@ephraimbuddy ephraimbuddy added type:bug-fix Changelog: Bug Fixes and removed type:improvement Changelog: Improvements labels Jul 1, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
kaxil added a commit to astronomer/airflow that referenced this pull request Dec 3, 2024
This PR ports the overtime feature on `LocalTaskJob` (added in apache#39890) to the Supervisor.
It allows the Task process to be terminated when it exceeds the configured success overtime threshold, which is useful when we add a Listener to the Task process.

closes apache#44356

Also added `TaskState` to update state and send end_date from task process to the supervisor.
kaxil added a commit that referenced this pull request Dec 3, 2024
LefterisXefteris pushed a commit to LefterisXefteris/airflow that referenced this pull request Jan 5, 2025
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
Labels
area:providers area:Scheduler including HA (high availability) scheduler provider:openlineage AIP-53 type:bug-fix Changelog: Bug Fixes
6 participants