Jumping tasks in grid #27523

Closed
2 tasks done
fokmess opened this issue Nov 5, 2022 · 20 comments

@fokmess

fokmess commented Nov 5, 2022

Apache Airflow version

2.4.2

What happened

Some tasks get reordered in the grid view while the DAG is running.

Jumping_Tasks

Screenshot 2022-11-06 at 00 02 24

Screenshot 2022-11-06 at 00 02 19

What you think should happen instead

No response

How to reproduce

When I generate tasks from a PostgreSQL query and run any DAG, this bug shows up.

Operating System

Ubuntu 20.04.5 LTS

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.0.0
apache-airflow-providers-elasticsearch==4.2.1
apache-airflow-providers-ftp==3.0.0
apache-airflow-providers-google==8.1.0
apache-airflow-providers-http==3.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-postgres==5.0.0

Deployment

Virtualenv installation

Deployment details

No response

Anything else

It happens on every run, in random task groups.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@fokmess fokmess added area:core kind:bug This is a clearly a bug labels Nov 5, 2022
@boring-cyborg

boring-cyborg bot commented Nov 5, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@bbovenzi bbovenzi added area:UI Related to UI/UX. For Frontend Developers. and removed area:core labels Nov 7, 2022
@bbovenzi
Contributor

bbovenzi commented Nov 7, 2022

I'll look into this. Quick question, does this happen for all of your dags or only specific ones?

@fokmess
Author

fokmess commented Nov 8, 2022

This happens with task groups in DAGs that are dynamically generated from a single Python file:

globals()[dag_id] = create_dag(...)
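
For readers unfamiliar with the pattern, here is a minimal sketch of registering generated DAGs through `globals()`. It is illustrative only: `FakeDag`, `CONFIG`, and the body of `create_dag` are hypothetical stand-ins (so the snippet runs without Airflow installed), not the poster's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDag:
    """Stand-in for airflow.DAG so the sketch runs without Airflow installed."""
    dag_id: str
    task_ids: list = field(default_factory=list)


def create_dag(dag_id, task_names):
    # Hypothetical factory: a real one would build an airflow.DAG and add
    # operators in the order given by task_names.
    return FakeDag(dag_id=dag_id, task_ids=list(task_names))


# Hypothetical configuration; in the report above it comes from a PostgreSQL query.
CONFIG = {
    "dag_a": ["extract", "transform", "load"],
    "dag_b": ["fetch", "publish"],
}

for dag_id, task_names in CONFIG.items():
    # Airflow picks up DAG objects bound to module-level names.
    globals()[dag_id] = create_dag(dag_id, task_names)
```

The order in which tasks are created inside the factory is the "definition order" that the grid view is expected to follow, so the order of the input matters.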

@bbovenzi
Contributor

bbovenzi commented Nov 9, 2022

Ahh ok. @ashb I remember that you wrote up the task ordering logic. Sounds like there might be an issue parsing the tasks for programmatically generated DAGs?

Also, this is probably a duplicate of #23542

@ashb
Member

ashb commented Nov 9, 2022

Maybe we need to add a "fallback" sort to sort by task_id as a last resort so that the order is stable.
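
A rough sketch of what such a fallback could look like, assuming tasks are represented as `(task_id, primary_key)` pairs. Both the data shape and the name `stable_order` are made up for illustration; this is not Airflow's actual ordering code.

```python
def stable_order(tasks):
    """Sort by the primary key, breaking ties on task_id.

    `tasks` is a list of (task_id, primary_key) pairs; this shape and the
    numeric primary key are assumptions made for the sketch.
    """
    return sorted(tasks, key=lambda t: (t[1], t[0]))


first = [("b", 1), ("a", 1), ("c", 0)]
second = [("a", 1), ("c", 0), ("b", 1)]  # same tasks, different arrival order

print(stable_order(first))
print(stable_order(first) == stable_order(second))
```

Because the tie-breaker is part of the sort key, two inputs containing the same tasks produce the same order regardless of how they arrived.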

@pierrejeambrun
Member

@fokmess I see that you are willing to make a PR to fix this, I am assigning you. 😄

Don't hesitate to ask for some pointers if needed

@ashb
Member

ashb commented Nov 16, 2022

I've got a fix for this. In working out where the problem was, I found that the fix is one line of code (and a new test).

@ashb ashb self-assigned this Nov 16, 2022
@ashb
Member

ashb commented Nov 17, 2022

I'm actually not sure about my fix. Yes, it does make the sort order stable, but right now the order should be using the "file definition order" -- i.e. tasks added to the dag first appear in the sort first. And that "feels" like a better default sort for most people than forcing it to use task_id sorting.

@fokmess What is the "input" for your dynamic DAG? Is it based on something that might be "unordered"? (Please note that dicts in Python 3.7+ are guaranteed to keep insertion order, and CPython 3.6 already did so, so anything like loading json or yaml from a file will also be ordered.)
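
A small illustration of the insertion-order point, using the standard library `json` module. The task names are hypothetical.

```python
import json

# Keys come back in the order they appear in the document.
raw = '{"load_users": {}, "clean_users": {}, "aggregate": {}}'
config = json.loads(raw)

task_ids = list(config)
print(task_ids)
```

So a DAG generated by iterating over such a config should see the same task order on every parse, as long as the file itself does not change.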

@ashb
Member

ashb commented Nov 17, 2022

Ah:

When I generate tasks from a PostgreSQL query and run any DAG, this bug shows up.

So yeah, I guess we need to decide whether this is something we should change in Airflow, or whether the right fix here is to say "add an ORDER BY to your query to generate it in a stable order".

I'm leaning towards the latter.
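
A minimal sketch of the ORDER BY approach. It uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs anywhere; the table `task_config` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_config (task_id TEXT PRIMARY KEY, position INTEGER)")
conn.executemany(
    "INSERT INTO task_config VALUES (?, ?)",
    [("load", 3), ("extract", 1), ("transform", 2)],
)

# Without ORDER BY the database makes no promise about row order, so the
# generated tasks could be defined in a different order on each DAG parse.
rows = conn.execute(
    "SELECT task_id FROM task_config ORDER BY position, task_id"
).fetchall()
task_ids = [row[0] for row in rows]
print(task_ids)
conn.close()
```

Adding `task_id` as a secondary sort column keeps the result deterministic even if two rows share the same `position`.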

@potiuk
Member

potiuk commented Nov 17, 2022

So yeah, I guess we need to decide whether this is something we should change in Airflow, or whether the right fix here is to say "add an ORDER BY to your query to generate it in a stable order".

I'm leaning towards the latter.

Agree - sorting by "definition order" is generally way better than alphabetic sorting in DAGs. We have a similar issue about sorting the template_fields - #27026 (comment) - where I believe we should keep the "natural order", with neither sorting nor parameterisation. For the DAG author (or the programmer generating DAGs, if they come from a DB) this gives full control over the order in which things appear in the UI, without adding any new options or parameters. If they want to sort alphabetically, they can still do it at writing/generation time. Nothing blocks them from doing so.
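
As a sketch of sorting at generation time: if the input is genuinely unordered (a Python `set` here, whose iteration order for strings can differ between interpreter runs), the generator can impose an order before creating tasks. The names below are hypothetical.

```python
def ordered_task_ids(names):
    # sorted() returns a new list in a deterministic (alphabetical) order,
    # regardless of the iteration order of the input collection.
    return sorted(names)


unordered = {"publish", "fetch", "validate"}
task_ids = ordered_task_ids(unordered)
print(task_ids)
```

Any other deterministic key would work equally well; the point is only that the order is fixed by the DAG author rather than by the container.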

@pierrejeambrun
Member

Removing the area:UI for area:webserver as this is most likely where changes will happen (if any)

@pierrejeambrun pierrejeambrun added area:webserver Webserver related Issues and removed area:UI Related to UI/UX. For Frontend Developers. labels Nov 17, 2022
@ashb
Member

ashb commented Nov 19, 2022

Changing my idea of a fix from a code fix to a documentation change against docs/apache-airflow/howto/dynamic-dag-generation.rst

@ashb ashb added kind:documentation and removed kind:bug This is a clearly a bug area:webserver Webserver related Issues labels Nov 19, 2022
@JCoder01
Contributor

I'm seeing this behavior too in a couple of my dags. I don't load anything from a database, and the dags are generated just using with DAG(...) as dag:

@bt-

bt- commented Nov 28, 2022

I am running into the same thing with my DAG. I have the tasks within my TaskGroups dynamically generated based on a list parsed from a yaml file. After watching more closely, it is my TaskGroups that move around in the grid view. The tasks within the TaskGroup aren't moving.

@JCoder01
Contributor

Taking a longer look at mine, I'm seeing the same thing. This occurs on a dag with a task group: when the task group is expanded, the task group and downstream tasks move around. The view alternates between the correct order and one where the expanded task group and its downstream tasks appear before the completed upstream tasks.

@uranusjr uranusjr added area:UI Related to UI/UX. For Frontend Developers. kind:documentation and removed kind:documentation labels Nov 29, 2022
@NilsJPWerner

I have the same issue as @JCoder01. When the task groups are expanded they reorder constantly. I can post a smaller example that shows the behavior.

@NilsJPWerner

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.task_group import TaskGroup

# StataOperator is a custom operator from the poster's project (not shown here).


def create_updater_tasks(run_id: str):
    track_matching = StataOperator(name="track_matching")

    with TaskGroup("create_expost_data_with_wps") as create_expost_data_group:
        DummyOperator(task_id="do_stuff_1")

    with TaskGroup("create_exante_data") as create_exante_data_group:
        DummyOperator(task_id="do_stuff_2")

    merge_expost_exante_data = StataOperator(name="merge_expost_exante_data")

    start_updater = DummyOperator(task_id="start_updater")
    end_updater = DummyOperator(task_id="end_updater")

    start_updater >> track_matching
    track_matching >> create_expost_data_group >> merge_expost_exante_data
    track_matching >> create_exante_data_group >> merge_expost_exante_data >> end_updater
    return end_updater


def create_morning_block_tasks():
    with TaskGroup("morning_block") as morning_block_tasks:
        DummyOperator(task_id="do_stuff_3")
    # Return the group so the caller can wire it downstream of the updater.
    return morning_block_tasks


# The helpers are defined above so they exist when the DAG body runs.
with DAG(
  #misc
) as dag:
    run_id = "{{ dag_run.conf.get('run_id') or data_interval_end.int_timestamp }}"
    updater = create_updater_tasks(run_id)
    morning_block = create_morning_block_tasks()
    updater >> morning_block

And here are some examples of it jumping around
Screen Shot 2022-12-14 at 3 34 53 PM
Screen Shot 2022-12-14 at 3 35 15 PM

@potiuk potiuk added this to the Airflow 2.5.1 milestone Dec 30, 2022
@potiuk
Member

potiuk commented Dec 30, 2022

FYI @ashb @bbovenzi - the example above changes the diagnosis a bit. It seems this happens not only with dynamic DAGs and unordered task creation (which I agree is not a bug in Airflow but a bug in DAG generation). It might also be caused by a lack of consistent ordering when there are multiple TaskGroups. Marked it for 2.5.1 as this seems like an easy fix for that case.

potiuk added a commit to potiuk/airflow that referenced this issue Dec 30, 2022
The description is more clear now what Dynamic DAG generation is
vs. Dynamic Task Mapping and note is added to the users to pay
attention about the stable sorting that should be applied when
generating DAGS.

Related: apache#27523
potiuk added a commit that referenced this issue Dec 30, 2022
ephraimbuddy pushed a commit that referenced this issue Jan 12, 2023
(cherry picked from commit 36d887b)
@eladkal eladkal removed this from the Airflow 2.6.1 milestone Apr 28, 2023
@JCoder01
Contributor

JCoder01 commented May 1, 2023

@NilsJPWerner, @fokmess, are you still seeing this? I'm on 2.5.3 and the ordering seems consistent now.

@eladkal eladkal added Can't Reproduce The problem cannot be reproduced pending-response and removed Can't Reproduce The problem cannot be reproduced labels May 1, 2023
@potiuk
Member

potiuk commented May 1, 2023

Closing as fixed. We can reopen if it is not.
