Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-72: Task SDK support for on_task_instance_* listeners, make OpenLineage compatible #45294

Merged
merged 1 commit into from
Feb 11, 2025

Conversation

mobuchowski
Copy link
Contributor

@mobuchowski mobuchowski commented Dec 30, 2024

With AIP-72, there is no access to the database session from the worker process, and the runtime objects have some differences to the db models. This PR contains three commits that deal with that situation:

  • First commit adjusts on_task_instance_* listeners interface to AIP-72: drops session argument and makes task_instance argument an instance of RuntimeTaskInstance class, not database model
  • Second commit adds basic support for calling listeners in Task SDK, additionally adding some context fields that allow OL and other listeners not use DB.
  • Third commit adjusts OpenLineage integration to work with new interface and Task SDK.

Some followup work:

  • wrap listener calls into Activity to make logging better visible from UI, and distinct from task logs
  • add separate interface for API to capture manual/API task changes. Impossible to reuse current interface since object would be a different type (not RuntimeTaskInstance)
  • add support for more types of listeners, more in task Ensure Listeners work with Task SDK #45491

closes #45423

@boring-cyborg boring-cyborg bot added area:Executors-core LocalExecutor & SequentialExecutor area:providers area:Scheduler including HA (high availability) scheduler area:task-sdk provider:openlineage AIP-53 labels Dec 30, 2024
@mobuchowski mobuchowski force-pushed the tasksdk-call-listeners branch 4 times, most recently from f07a78e to 81b886b Compare January 2, 2025 08:20
@potiuk
Copy link
Member

potiuk commented Jan 2, 2025

@mobuchowski - can you please rebase that one -> we found and issue with @jscheffl with the new caching scheme - fixed in #45347 that would run "main" version of the tests. I am asking in all affected PRs to rebase.

@potiuk potiuk force-pushed the tasksdk-call-listeners branch from 81b886b to 15fd95a Compare January 2, 2025 12:26
@potiuk
Copy link
Member

potiuk commented Jan 2, 2025

Actually - I rebased it now.

@mobuchowski
Copy link
Contributor Author

I will rebase very soon as I'm working on some of the test failures anyway 🙂

@mobuchowski mobuchowski force-pushed the tasksdk-call-listeners branch 7 times, most recently from 35a94e2 to 86297af Compare January 6, 2025 16:35
@mobuchowski mobuchowski marked this pull request as ready for review January 6, 2025 16:52
@mobuchowski mobuchowski force-pushed the tasksdk-call-listeners branch from 86297af to 6d88389 Compare January 6, 2025 17:04
@mobuchowski mobuchowski changed the title [draft] AIP-72: call on_task_instance_* listeners AIP-72: call on_task_instance_* listeners Jan 6, 2025
@mobuchowski mobuchowski force-pushed the tasksdk-call-listeners branch 4 times, most recently from c2e2165 to 305fff0 Compare January 24, 2025 12:21
@mobuchowski mobuchowski force-pushed the tasksdk-call-listeners branch 8 times, most recently from 3dc5ae4 to 39172c3 Compare February 4, 2025 09:26
@vikramkoka
Copy link
Contributor

@mobuchowski
Hope all is well. Just checking in how this is coming along?

You had also mentioned that you would be raising another PR once #45732 was merged. Since that is already merged, is that follow-up PR already in progress.

Thank you!

@mobuchowski mobuchowski force-pushed the tasksdk-call-listeners branch from 39172c3 to 0705a85 Compare February 7, 2025 19:12
@mobuchowski mobuchowski requested a review from jscheffl as a code owner February 7, 2025 19:12
… TaskSDK, make OpenLineage provider support Airflow 3's listener interface

Signed-off-by: Maciej Obuchowski <[email protected]>
@mobuchowski mobuchowski force-pushed the tasksdk-call-listeners branch from 0705a85 to 61da140 Compare February 7, 2025 19:39
@mobuchowski
Copy link
Contributor Author

Hey @vikramkoka - rebased this PR and fixed conflicts, it should be good to merge.

Haven't started on followup to #45732 but will try to find some time next week. I don't think we need to hold up this PR for that, especially since rebasing 2000 lines PR on fast moving target takes significant amount of time.

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this works. Should have a follow-up PR changing OL identifier for DagRun and TaskInstance to use run_id.

@mobuchowski mobuchowski merged commit 0047a68 into apache:main Feb 11, 2025
91 checks passed
potiuk added a commit to potiuk/airflow that referenced this pull request Feb 12, 2025
So far cleaning airflow installation only happened in canary runs
and it caused some PRs not failing when they should - for exmaple
the apache#45294 was green when it should fail because uuid6 package was
not removed before installing old version of Airlfow.

Cleaning airflow installation is fast with uv so we should be
ok with running it always for compatibility tests.
potiuk added a commit to potiuk/airflow that referenced this pull request Feb 12, 2025
So far cleaning airflow installation only happened in canary runs
and it caused some PRs not failing when they should - for exmaple
the apache#45294 was green when it should fail because uuid6 package was
not removed before installing old version of Airlfow.

Cleaning airflow installation is fast with uv so we should be
ok with running it always for compatibility tests.
potiuk added a commit that referenced this pull request Feb 12, 2025
…46693)

So far cleaning airflow installation only happened in canary runs
and it caused some PRs not failing when they should - for exmaple
the #45294 was green when it should fail because uuid6 package was
not removed before installing old version of Airlfow.

Cleaning airflow installation is fast with uv so we should be
ok with running it always for compatibility tests.
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
… TaskSDK, make OpenLineage provider support Airflow 3's listener interface (apache#45294)

Signed-off-by: Maciej Obuchowski <[email protected]>
ambika-garg pushed a commit to ambika-garg/airflow that referenced this pull request Feb 17, 2025
…pache#46693)

So far cleaning airflow installation only happened in canary runs
and it caused some PRs not failing when they should - for exmaple
the apache#45294 was green when it should fail because uuid6 package was
not removed before installing old version of Airlfow.

Cleaning airflow installation is fast with uv so we should be
ok with running it always for compatibility tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Executors-core LocalExecutor & SequentialExecutor area:providers area:Scheduler including HA (high availability) scheduler area:task-sdk provider:openlineage AIP-53
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure Task Instance Listeners work with Task SDK
7 participants