Question about TaskInstance.is_queueable #102

Closed
blake-nicholson opened this issue Jul 2, 2015 · 3 comments

@blake-nicholson

Would you mind clarifying the following condition in TaskInstance.is_queueable:

if self.execution_date > datetime.now() - self.task.schedule_interval:

I would have expected:

if self.execution_date > datetime.now():

Consider the use case of having a daily task. Let's suppose the task ran early this morning for 2015-07-01. However, the task failed for some reason. After fixing the issue, I go to backfill using a command such as:

airflow backfill -s 2015-07-01 -e 2015-07-01 dag_id

As the code is currently written, the backfill will not execute, due to the conditional that I mention above. I could hack it to work by passing in yesterday's date to backfill and then using "{{ tomorrow_ds }}" in my tasks, but this seems like it shouldn't be necessary.
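To make the difference between the two conditions concrete, here is a minimal sketch that evaluates both for the 2015-07-01 run. The function names and the sample clock times are hypothetical and chosen only for illustration; this is not Airflow's actual implementation, just the two comparisons quoted above.

```python
from datetime import datetime, timedelta


def blocked_by_current_check(execution_date, now, schedule_interval):
    # Mirrors the quoted condition: True means the instance is held back.
    return execution_date > now - schedule_interval


def blocked_by_proposed_check(execution_date, now):
    # Mirrors the condition proposed in this issue.
    return execution_date > now


execution_date = datetime(2015, 7, 1)       # the run being backfilled
daily = timedelta(days=1)                   # assumed schedule_interval
same_day = datetime(2015, 7, 1, 10, 0)      # hypothetical "now" on July 1st
next_day = datetime(2015, 7, 2, 0, 0, 1)    # hypothetical "now" just after midnight

print(blocked_by_current_check(execution_date, same_day, daily))   # True
print(blocked_by_proposed_check(execution_date, same_day))         # False
print(blocked_by_current_check(execution_date, next_day, daily))   # False
```

Under these assumptions, the current check holds the 2015-07-01 run back for the whole of July 1st and releases it only once July 2nd has begun.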

If you're okay with the above change, I'm happy to submit a PR. If I should be taking a different approach, please let me know. Thanks!

@artwr
Contributor

artwr commented Jul 2, 2015

@mistercrunch can probably provide a more detailed explanation, but here is my take:

The way our ETL is set up, we use execution_date as the date of the data we want to process (there are a few exceptions, but it is mostly true). Under this scheme, the task for July 1st does not run on July 1st, because its data dependencies have usually not all landed yet. We wait until that schedule_interval is over and then start processing the data.
In your example, this means the daily task for July 1st actually kicks off in the first seconds of July 2nd, once the instance passes this check.

I hope this helps.

@mistercrunch
Member

I just added a note to the docs to clarify this, here: http://pythonhosted.org/airflow/scheduler.html

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

Most likely you want to stamp your partitions with 2016-01-01 for the period covering 2016-01-01 - 2016-01-01T23:59:59, and you want the job that processes that period to be triggered soon after 2016-01-02. That's how Airflow does it anyway.
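The rule in the note can be sketched as simple date arithmetic: a run stamped with an execution_date covers one schedule_interval starting at that stamp, and it becomes eligible to start when that period ends. The helper name below is hypothetical and is not part of Airflow's API; it only restates the note in code.

```python
from datetime import datetime, timedelta


def covered_period(execution_date, schedule_interval):
    # Returns (period_start, period_end); period_end is also the
    # earliest moment the run stamped execution_date may be triggered.
    period_start = execution_date
    period_end = execution_date + schedule_interval
    return period_start, period_end


start, end = covered_period(datetime(2016, 1, 1), timedelta(days=1))
print(start.isoformat())  # 2016-01-01T00:00:00
print(end.isoformat())    # 2016-01-02T00:00:00
```

So the run stamped 2016-01-01 processes data for all of 2016-01-01 and is triggered at or shortly after 2016-01-02T00:00:00.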

@blake-nicholson
Author

Got it, thanks! That's a bit different than I had planned to use it, but will modify my approach.

mobuchowski pushed a commit to mobuchowski/airflow that referenced this issue Jan 4, 2022
* Add check for file path when extracting task location
* continued: Add check for file path when extracting task location
* Update message for task location error
* continued: Update message for task location error
* Add check for url building git url

Signed-off-by: wslulciuc