Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add precommit in workflow to standardize code style #755

Closed
charlesdong1991 opened this issue Sep 6, 2019 · 14 comments
Closed

Add precommit in workflow to standardize code style #755

charlesdong1991 opened this issue Sep 6, 2019 · 14 comments

Comments

@charlesdong1991
Copy link
Contributor

This is not a function but more like coding/contributing related.

I know now quite many open source projects add black for instance in their pre-commit pipeline to kind of standardize the code style. I tried to black several files and found some look more 'pretty'. Not sure if it is something Koalas wants to have as well.

If you like it, I would be happy to work on this.

@ueshin
Copy link
Collaborator

ueshin commented Sep 6, 2019

Basically I'd prefer to have a source code formatter to standardize the code style.

I'm wondering whether the pre-commit hook can format only the updated codes since it's possibly difficult to review so many code changes in a PR for enabling the formatter. If it can, we will eventually have fully formatted codes after a while. If we won't have so many changes in the PR, then it'd be no problem.

Also, do you know other formatters? We might want to compare formatters if any.

@charlesdong1991
Copy link
Contributor Author

charlesdong1991 commented Sep 6, 2019

Thanks for your comment! @ueshin

I'm wondering whether the pre-commit hook can format only the updated codes since it's possibly difficult to review so many code changes in a PR for enabling the formatter. If it can, we will eventually have fully formatted codes after a while. If we won't have so many changes in the PR, then it'd be no problem.

Indeed, most of files I believe will need to be reformatted, and it's difficult to review at once. What I plan to do is to have a WIP PR to implement the pre-commit hook, and then create PRs for each or several files using a selected formatter to avoid huge amount of changes at the same time, once all files or most of files are reformatted, then we merge the pre-cmmit hook PR and then afterwards, all new changes have to follow the new formatting (actually users can just do black filename.py before they push the PR to avoid failure on code format). And since this will be basically only reformatting, so reviewing will become pretty easy, i think as long as the tests can be passed, and then should be good. In the long-term, I think having a source code formatter is nice for a project, especially an open source project.

Also, do you know other formatters? We might want to compare formatters if any.

Sorry about this, I only have experience in black and it's quite popular and pretty good. I think google also has a yapf formatter, but I never used it, so it's hard to compare it for me.

@ueshin
Copy link
Collaborator

ueshin commented Sep 6, 2019

Sounds great. I agree that the formatter is nice for a project.
Sounds like black is good enough for us. yapf says it's still in "alpha" stage.
The way how to apply the formatter to the whole project might need to be considered, though.
WDYT? @HyukjinKwon

Let me try the black in my local for a while in the meantime.

@ueshin
Copy link
Collaborator

ueshin commented Sep 6, 2019

Oh, almost all files changed. We definitely shouldn't do in one PR. haha

@charlesdong1991
Copy link
Contributor Author

yeah, i tried locally as well, since some warnings already showed in pycharm for me, so I would expect almost all files have to be changed. I am submitting a PR for pre-commit, and then shall I create PRs to reformat one batch of files in each PR?

Does this sound good to you? @ueshin @HyukjinKwon

@HyukjinKwon
Copy link
Member

How many files and lines should it be changed? I think it might be fine since we don't currently backport and maintain minor or maintenance releases unless it changes everything with some thousands changes.

@charlesdong1991
Copy link
Contributor Author

almost all *.py files i think, I will create a Draft PR, and present those for all. You could take a look and decide what to do next. @HyukjinKwon

@deepyaman
Copy link
Contributor

@ueshin @HyukjinKwon @charlesdong1991 Is standardized formatting using pre-commit and black something you all are still open to/looking to do? To address some of the comments I’ve read above:

  • Yes, the diff will be huge and almost everything. That being said, you shouldn’t really need to verify much, since the functionality remains the same, verified by Black checking the AST.
  • Black can’t/doesn’t modify comments, since they have no meaning to the AST passer. The only place I’ve seen this cause issues is where you have inline comments with special meaning (e.g. # pylint: disable=...), but I don’t see that from what little I’ve seen the codebase so far. You all obviously know better here.
  • It looks like you all had an issue with the 3.5 build in ENH: Add config for pre-commit #756. We’ve handled this previously (see https://github.com/quantumblacklabs/kedro/blob/develop/.pre-commit-config.yaml#L102-L107).

Overall a big fan of Black—and the fact that it eliminates time spent formatting Python/code style discussions altogether—and happy to pick this back up if you all agree!

@HyukjinKwon
Copy link
Member

Please open a PR. I am interested in taking a look!

@deepyaman
Copy link
Contributor

Great! I'll try and pick this up in the next week.

@deepyaman
Copy link
Contributor

Please open a PR. I am interested in taking a look!

@HyukjinKwon Sorry for the delay in getting to this. #1284 is ready for review, to see (1) if this is something you all want to purse and (2) what the next steps would be in terms of cleaning it up, or (potentially) customizing it to fit the project needs.

@deepyaman
Copy link
Contributor

FWIW, just noticed Pandas also uses Black now (0.25.*, 1.*.*). See https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#python-pep8--black.

@HyukjinKwon
Copy link
Member

Hm, does pandas also apply the change to the whole code base?

@charlesdong1991
Copy link
Contributor Author

charlesdong1991 commented Feb 20, 2020

yeah, pandas started using black quite a while ago, and it was the initial reason to propose this here @deepyaman

@HyukjinKwon FYI, I recall that pandas first applied black for batches of files first (so several PRs for reformatting), and then in the meantime add black as part of linting check in CI, so CI won't pass if users do not black their PR or if the current codebase have codes incompatible with black, and then merge it and black the rest of code alongside (so need to keep rebasing).

It is pretty much alike the PR @deepyaman has, but maybe with several pre-cursor PRs for reformatting first could make this one smaller and easier to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants