-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change _iterate_file_descrs to _get_file_items #10142
Change _iterate_file_descrs to _get_file_items #10142
Conversation
Returning a materialized list instead of an iterator lets us do simple progress reporting in the future. As the list of FileItems is very lightweight, there shouldn't be a performance hit.
Why not just call |
Good question. Either way, it kind of ends up being the same thing: we either edit code so that we do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
If Pierre agrees let's merge this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhhh, this seems to fail CI? 😅
Is that an actual failure? And if so, do you understand why?
Will check it out, thank you @DanielNoord. |
@DanielNoord - this failure actually exists in commit cb12577, immediately prior to my two new commits. I went back a bit just using git reset, and unless there are steps that I'm missing, at appears that this error has been present since at least 713e563 (Thu Oct 5 22:12:56 2023 +0200, Merge branch 'maintenance/3.0.x' into merge-maintenance-into-main):
I haven't read the tests, but I'm assuming that each test can be run independently (no global state trash etc). When I run all of the tests as at that commit 713e563, I get a bunch of failures, including that bad test Just for fun, rechecking cb12577 which was the base of my branch:
I got few failures than the older commit, but that same test failed (on my machine). |
If we change the
Here's git blame for that function, it's been around a while:
Cheers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the list of FileItems is very lightweight
I'm not sure this is always true, and I'm not sure it's the main consideration. We need to understand why it was done this way before changing it (the first implementation is not often an iterator for no good reasons).
I don't have the time to read everything to confirm it right now but it seems this is required for multiprocessing, I dug into the git history and here's the MR that introduced this:
Last comment might be interesting: #3016 (comment)
Hi @Pierre-Sassoulas , thanks for checking. I was basing this on my review of the code, but that may not have been deep enough. You know, the only reason I suggested this change was to allow for a progress message, “linting 1000 files” eg. Instead of making any changes, I could just make a 2 line function to get the count of items, using mostly the same logic as in the iterator method, which is only called when needed. Would that feel better to you? If yes, this PR can be closed and I can add that in my smaller progress logging draft PR. |
Sorry for not answering very fast, life got in the way. Do you mean something like a file counter inside the iterator loop to be able to know how many file were processed ? Sounds good to me because I think keeping the iterator is better. Multiprocessing didn't get much love yet, so I'm not definitely certain an iterator is required for multiprocessing, and it's just me being cautious and assuming the author and reviewers of #3016 knew what they were doing. |
Not a problem @Pierre-Sassoulas , I appreciate that this is open source so we all do what we can. I’ll try to submit a smaller PR soon, let’s keep this open until that one is submitted just so people know it’s in progress. Cheers, jz |
Closing this per @Pierre-Sassoulas 's feedback. I've opened a simpler PR for progress reporting, #10178 , which doesn't require any iterator changes. Thanks 👋 |
Returning a materialized list instead of an iterator lets us do simple progress reporting in the future. As the list of FileItems is very lightweight, there shouldn't be a performance hit.
Type of Changes
Refs
This work was initially done as part of draft PR #10134 , and it makes sense as a small refactor of the internal method.