Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow attaching to previously launched task in ECSOperator #16685

Merged
merged 11 commits into from
Jul 15, 2021

Conversation

pmalafosse
Copy link
Contributor

@pmalafosse pmalafosse commented Jun 28, 2021

FYI @darwinyip

This PR adds a parameter reattach_prev_task to ECSOperator.

Before:
Until now we could use 'reattach' which was reattaching a running ECS Task (if there was one running) of the same 'family' instead of creating a new one. The problem was that if we had workflows using the same ECS Task Definition in several tasks, it didn't know which one to reattach and we could only use concurrency=1 in some pipelines for example (when we launch the same ECS task in parallel from Airflow with different configurations).

Now:
Now with reattach_prev_task instead, when we launch a new ECS task, it will store temporarily the ECS Task ARN in XCOM. If there is an issue during the run (typically connection problem between Airflow and ECS for long-running tasks or Airflow worker restarting which was then still running those tasks in the background without Airflow being aware of it):

  • self._start_task will store the ECS task ARN in XCOM (in a 'fake' task_id equal to f"{self.task_id}_task_arn"
  • in the next execution, it will check if this task ARN is still running and if so it will reattach it to the operator, otherwise it will create a new one
  • when the operator runs succesfully it will delete the XCOM value

I didn't change the logic of 'reattach' to do that directly because I didn't know if it had been designed for other use cases

Update 2021-07-01:
After discussing with @darwinyip I made the change to 'reattach' directly instead of creating a new flag

@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Jun 28, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 28, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@pmalafosse
Copy link
Contributor Author

FYI @feluelle @o-nikolas @markhopson as you were involved in other PRs for this ECSOperator

@darwinyip
Copy link
Contributor

darwinyip commented Jul 1, 2021

This flag doesn't seem like a good idea. I agree with the underlying issue this is trying to solve.
What about just updating reattach directly with this new logic? The flag only has a simple use case in mind with no consideration for concurrency, and that was okay at that time.

A more abrupt approach would be forcing every ECS task to take the name airflow + dag_id + task_id and have reattach check on that.

@pmalafosse
Copy link
Contributor Author

This flag doesn't seem like a good idea. I agree with the underlying issue this is trying to solve.
What about just updating reattach directly with this new logic? The flag only has a simple use case in mind with no consideration for concurrency, and that was okay at that time.

A more abrupt approach would be forcing every ECS task to take the name airflow + dag_id + task_id and have reattach check on that.

Yes that makes sense, I will update 'reattach' logic directly then, I was not sure if it had an other user case I hadn't thought about.

Unfortunately we can't define the ECS task_id from our side, it is a unique ID generated by Amazon (and if we could there would be problems with retries and clearing tasks anyway).

@pmalafosse
Copy link
Contributor Author

I just made the change to 'reattach' instead @darwinyip

@kaxil
Copy link
Member

kaxil commented Jul 1, 2021

cc @subashcanapathy If anyone in your team can review this, it would be great as it affects AWS Hooks

@subashcanapathy
Copy link
Contributor

@o-nikolas @ferruzzi Tagging for review

@ferruzzi
Copy link
Contributor

ferruzzi commented Jul 6, 2021

Looks fine to me. I'm surprised there aren't existing helpers for xcom get/put.

airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Show resolved Hide resolved
airflow/providers/amazon/aws/operators/ecs.py Outdated Show resolved Hide resolved
@pmalafosse
Copy link
Contributor Author

Thanks a lot for the review @ferruzzi @o-nikolas , I made the small changes you mentioned

@o-nikolas
Copy link
Contributor

Thanks a lot for the review @ferruzzi @o-nikolas , I made the small changes you mentioned

Thanks! I reviewed the latest commit, and it looks like there's one remaining line where you should be using the new constants. Let me know if I'm mistaken.

@pmalafosse
Copy link
Contributor Author

@o-nikolas thanks for spotting that! There were 2 lines not using the constants yet indeed. Fixed now

Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the contribution :)

@kaxil
Copy link
Member

kaxil commented Jul 14, 2021

Can you rebase on latest main please @pmalafosse

@pmalafosse
Copy link
Contributor Author

done @kaxil it's the first time I do that so I'm not sure I did it right
I followed those instructions FYI https://medium.com/@topspinj/how-to-git-rebase-into-a-forked-repo-c9f05e821c8a

@kaxil
Copy link
Member

kaxil commented Jul 14, 2021

done @kaxil it's the first time I do that so I'm not sure I did it right
I followed those instructions FYI https://medium.com/@topspinj/how-to-git-rebase-into-a-forked-repo-c9f05e821c8a

You did it right :) -- let's wait for the CI to pass.

@kaxil kaxil changed the title reattach_prev_task ECSOperator Allow attaching to previously launched task in ECSOperator Jul 14, 2021
@pmalafosse
Copy link
Contributor Author

Sorry there were some conflicts after the rebase that I hadn't fixed completely, fixed now (I ran pytest locally and it passed) @kaxil .

@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jul 14, 2021
@kaxil kaxil merged commit fc0250f into apache:main Jul 15, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 15, 2021

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants