Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5362] Reorder imports #5944

Merged
merged 1 commit into from
Oct 2, 2019

Conversation

KevinYang21
Copy link
Member

@KevinYang21 KevinYang21 commented Aug 29, 2019

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    Imports in the projects are not ordered properly and there's no unified way to sort them. This causes big headaches resolving conflicts in rebasing/cherry-picking and the lookings of the code base. What're included in the PR:
  1. Add flake8 isort plugin in to check import soring in pre-commit/CI
  2. Added isort config in setup.cfg.
  3. Sort existing imports following the isort rules.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    No code change, just changing imports.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

Code Quality

  • Passes flake8

@KevinYang21 KevinYang21 force-pushed the kevin_yang_import_format branch 11 times, most recently from 3addf68 to 4421f97 Compare August 30, 2019 18:59
@KevinYang21 KevinYang21 changed the title [WIP][AIRFLOW-XXXX] Reorder imports [WIP][AIRFLOW-5362] Reorder imports Aug 31, 2019
@KevinYang21 KevinYang21 changed the title [WIP][AIRFLOW-5362] Reorder imports [AIRFLOW-5362] Reorder imports Aug 31, 2019
@KevinYang21
Copy link
Member Author

@ashb PTAL, this should make it easier for your cherry-picking during release :D Owe this to you.

@KevinYang21 KevinYang21 requested review from mik-laj, ashb and BasPH August 31, 2019 01:58
@KevinYang21
Copy link
Member Author

What I'm not that sure is if/where I should put instruction about isort so that people can just sort their files in one command. CONTRIBUTING.md, CI output or somewhere else?

@potiuk
Copy link
Member

potiuk commented Aug 31, 2019

Really nice @KevinYang21 ! I love it.

It's a huge change though and it will have a lot of conflicts with #5786 so we have to think carefully which one to merge first (I believe it will be easier to have isort re-run after the pylint changes).

I do not think we need explicitly state the instructions about isort in documentation as long as we add sorting script as another pre-commit check. We already have several pre-commit checks that modify the sources (for example the licence checks).

Pre-commit works in the way that if it modifies any of the files while running - the whole pre-commit fails and it asks the user to add/commit the modified files. Also if someone does not use pre-commits (yet!) the checks fail in CI because some files were modified. Then in CI there is a very clear instruction "please run pre-commit and then add/commit changed files". I think it's better than comprehensive documentation because if travis fails you simply get instructions what to do :). And with properly implemented pre-commit check it will be super-easy to run by anyone even if they don't have pre-commits installed as git hooks. They would simply need to run 'pre-commit run sort--imports' if we add it with this id.

I can help with adding it if needed :). With isort I think we do not even need docker image to run - you can simply specify isort as additional_dependencies as ['isort'] of python pre-commit hook and add isort configuration and specify the isort command to execute (see https://pre-commit.com/#pre-commit-configyaml---hooks) . In such case pre-commit will automatically create virtualenv, add isort as dependency and run the command specified (and if it won't modify any of the files it will be automatically "passed").

@potiuk potiuk self-requested a review August 31, 2019 14:14
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome @KevinYang21!

As Jarek mentions it's probably easier to merge the other PR first as all (most?) of these changes are automatic.

setup.py Show resolved Hide resolved
@@ -16,14 +16,14 @@
'scatterChart', 'discreteBarChart', 'multiBarChart']


from . import ipynb
from .cumulativeLineChart import cumulativeLineChart
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you exclude anything under airflow/_vendor from changes please?

Perhaps skip=airflow/_vendor in setup.cfg might do the trick? Should do given skip = build,.tox,venv is used in some of isort's examples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another thought.

I think one big change to order all imports in one go is not a good idea. We should probably use the same approach as we did for pylint.

Since we can have all of this as pre-commit hooks - we could do filtering in the same way we do for pylint pre-commit hooks/ci_scripts using the very same "pylint_todo.txt" file.
I think those two are fairly connected and we should use the list to only introduce such changes when we remove files from the pylint_todo.txt. And this way we make sure it will be gradually introduced.

Once we add it as pre-commit hook, it will have the property that someone fixing pylint will also have to make sure that imports are sorted. I also think about adding pydocstyle in exactly the same way -> so make sure the style is defined and checked for all files that are not on the pylint_todo.txt. This way reviewers will not have to even check if the documentation style we have is good and follows our agreed conventions. I think this should be automated rather than taking precious time of reviewers :).

What do you think (@KevinYang21 @BasPH @ashb @mik-laj ) ? I can add it quickly while I will be adding pydocstyle (I plan to test it while fixing @mik-laj comments for #5786) and I can add isort in very similar way there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this change should be combined with automatic code formatting. Some automatic formatting tools also sort imports. If we now introduce import sorting, we will limit the choice of a tool that will be compatible, or we will have to re-sort all imports this time in a manner compatible with another tool.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should treat pylint + pydoc + import sort differently than whole source file formatting change.

Black formatting (or whatever we choose) changes whole files. Introducing it will basically rewrite ALL the lines of code we have. It will make it next to impossible to merge in-progress PRs (and in some cases those changes will be very difficult to resolve conflicts in those rebases).

On the other hand - sorting imports/adding pylint changes/updating documentation is a very small and localised change to a small part of the code - even if it is done in all files.
So changing import orders or adding docs is far less invasive than formatting change (you can still easily merge other changes after you applied them).

That's why I think it's worth to sort imports now separately. Even if formatting tool will change the sort order later, it will be done together with plenty of other changes for those files, so this will be pretty much no-issue if also sorting order for imports change then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change shouldn't affect all PRs, only ones that new imports.

With the git blame --ignore-revs-file option (I thought there was a default way to set that but it's a client only setting) I would prefer changes like this to be a single commit rather than multiple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashb - If you refer to import ordering I think import sort is rather non-invasive so I do not think it falls into the 'git blame' case. It is rather localised and does not really affect the "meat" of the source code. But If you refer to reformatting - I agree that re-formatting the code if we decide to do it should be done as single big change and one that we can exclude with --ignore-revs-file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. Some goodies from Chromium team: git hyper-blame command - one that can ignore commits stored in committed .git-blame-ignore-revs (https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html)

:D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will exclude unnecessary packages. Given how automatic isort can be esp. with pre-commit, I prefer to roll the change out in one commit.

@KevinYang21
Copy link
Member Author

@potiuk The other PR can be merged first for sure--beauty of isort is that it comes with a sorting engine so I can redo this PR in a couple mins. And the pre-commit approach sounds good, I'll take care of it.

@KevinYang21 KevinYang21 force-pushed the kevin_yang_import_format branch 4 times, most recently from f680276 to b0f68ff Compare September 5, 2019 23:00
@potiuk
Copy link
Member

potiuk commented Sep 6, 2019

OK. Sure. Let me merge my PR and you can redo yours afterwards. I am ok with one single commit for everything as well as linking it with pylint changes. Both work for me.

@KevinYang21
Copy link
Member Author

@potiuk I added a python pre-commit hook for isort, do you mind help me double check if it is what you had in mind please?

@ashb Added the skip list and tested locally.

I'll redo the PR right before I merge it.

@KevinYang21 KevinYang21 force-pushed the kevin_yang_import_format branch from b0f68ff to 1b243d7 Compare September 6, 2019 21:11
- id: isort
name: Run isort to sort imports
language: python
entry: isort -rc .
Copy link
Member

@potiuk potiuk Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove -rc . please. Pre-commit by default passes changed filenames as command arguments (default configuration is pass_filenames: true).

This has a number of advantages:

  • in regular pre-commit it will only pass the names of files that changed/were added in this commit/series of commit when pushing (this makes it much faster for smaller commits)
  • when you have big number of changes it will automatically split the command to several commands run in parallel (equal to the number of threads you have available on your PC) - distributing the changed files between those. If you have 4 CPUs with 2 threads each it will split the big list of files between 8 isort commands and run them in parallel. This means that if you remove all your changes now and run pre-commit run --all-files it will run 8 times faster than isort -rc ..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roger that 👍 I was under the impression that happens only after u set the files but obviously I should go read doc first :P

@KevinYang21 KevinYang21 force-pushed the kevin_yang_import_format branch 4 times, most recently from 9c45828 to b389e83 Compare September 11, 2019 23:18
@KevinYang21
Copy link
Member Author

@potiuk PTAL, will rebase right before merge.

@KevinYang21 KevinYang21 force-pushed the kevin_yang_import_format branch from b389e83 to 049538a Compare September 12, 2019 00:35
from typing import Optional, Callable
from airflow import version
from airflow.utils.log.logging_mixin import LoggingMixin
isort:skip_file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this now show up in the API section of the rendered docs?

Is there a reason we are skipping this file but making changes to it? Did perhaps we need a blank line first?

Suggested change
isort:skip_file
isort:skip_file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There're a lot implied dependencies inside multiple imports in this file. I tried a couple times try to reorder a couple but it keeps getting me more errors so I decided to skip it with the maximum changes I tried that won't break it so I don't have to add expection comment in multiple lines. And I'll for sure add that blank line :P

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And sry I'm missing the docs part. Do you mind help pointing me to this rendered docs you referred to please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. I just merged the pylint changes few days ago - so rebasing this one and re-running the isort should be good to go @KevinYang21

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thank you

@ashb
Copy link
Member

ashb commented Sep 30, 2019

https://pypi.org/project/zimports/ might help with making this automatic as part of the pre-commit hook.

@KevinYang21 KevinYang21 force-pushed the kevin_yang_import_format branch from 049538a to 7b44ff3 Compare October 1, 2019 22:42
@KevinYang21 KevinYang21 force-pushed the kevin_yang_import_format branch from 7b44ff3 to 1a85aa9 Compare October 2, 2019 00:34
@codecov-io
Copy link

Codecov Report

Merging #5944 into master will decrease coverage by 0.01%.
The diff coverage is 84.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5944      +/-   ##
==========================================
- Coverage   80.06%   80.04%   -0.02%     
==========================================
  Files         610      610              
  Lines       35286    35261      -25     
==========================================
- Hits        28251    28226      -25     
  Misses       7035     7035
Impacted Files Coverage Δ
airflow/contrib/operators/pubsub_operator.py 100% <ø> (ø) ⬆️
airflow/lineage/datasets.py 84.72% <ø> (ø) ⬆️
airflow/executors/local_executor.py 82.17% <ø> (ø) ⬆️
airflow/example_dags/example_gcs_to_gcs.py 100% <ø> (ø) ⬆️
airflow/contrib/hooks/redis_hook.py 100% <ø> (ø) ⬆️
...irflow/contrib/operators/gcp_container_operator.py 100% <ø> (ø) ⬆️
airflow/contrib/hooks/ftp_hook.py 69.02% <ø> (ø) ⬆️
airflow/contrib/operators/gcp_tasks_operator.py 0% <ø> (ø) ⬆️
airflow/kubernetes/volume.py 100% <ø> (ø) ⬆️
...low/contrib/example_dags/example_winrm_operator.py 0% <ø> (ø) ⬆️
... and 315 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02b478e...1a85aa9. Read the comment docs.

@KevinYang21
Copy link
Member Author

@ashb Ya if we use flake8-import-order then zimports can serve as the sorting engine. Actually I started with flake8-import-order and moved to isort primarily because isort got the sorting check and engine in one package. I would still prefer to use isort for that reason and also for that it has been around for much longer than zimports, which I assume more people are comfortable with it and is more mature. Does it make sense?

@potiuk Rebased again, mind take a quick pass before I merge it? Thank you.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice!

@potiuk potiuk merged commit 8898920 into apache:master Oct 2, 2019
@potiuk
Copy link
Member

potiuk commented Oct 2, 2019

Thanks @KevinYang21 -> I merged and cherry-picked (I rerun isort) to v1-10-test. This way it will be easier to cherry-pick changes.

@potiuk
Copy link
Member

potiuk commented Oct 2, 2019

Turned out that sorting v1-10-test causes some circular imports which are rather difficult to fix without splitting some of the files as it was done in 2.0.0 so I reverted it back in v1-10-test :(

kaxil pushed a commit that referenced this pull request Oct 2, 2019
@houqp
Copy link
Member

houqp commented Oct 7, 2019

@potiuk @kaxil looks like commit 8898920 got merged into master, then was force updated to d719e1f later on. This broke our internal pipeline. Do you know what happened?

@kaxil
Copy link
Member

kaxil commented Oct 7, 2019

@houqp Yes, the main reason was the PR before this (#6096) was merged with "Rebase + Merge" which caused many un-wanted commits (instead of a single commit) and hence we had to squash those commits and force-push.

What was your internal pipeline doing???

@houqp
Copy link
Member

houqp commented Oct 10, 2019

@kaxil we have an internal pipeline that pulls latest commit from master for deployment purpose. Whenever there is a force update on the master, it invalidates our internal tree, which requires human intervention to get it unblocked.

@potiuk
Copy link
Member

potiuk commented Oct 10, 2019

It was my fault @houqp (sorry @kaxil you had to do that). I mistakenly rebased a change instead of squashing and Kaxil had to correct my mistake. Apologies ! @kaxil - maybe we can simply completely disable rebasing option in GitHub ? I sometimes switch to it and GitHub remembers the last setting unfortunately :(

potiuk pushed a commit that referenced this pull request Nov 5, 2019
potiuk pushed a commit that referenced this pull request Nov 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants