GH-72904: Add `glob.translate()` function #106703
If a sequence of path separators is given to the new argument, `translate()` produces a pattern that matches similarly to `pathlib.Path.glob()`. Specifically:

- A `*` pattern segment matches precisely one path segment.
- A `**` pattern segment matches any number of path segments.
- If `**` appears in any other position within the pattern, `ValueError` is raised.
- `*` and `?` wildcards in other positions don't match path separators.

This change allows us to factor out a lot of complex code in pathlib.
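The rules listed above can be illustrated with a small translator. This is a simplified sketch, not CPython's code: `translate_segments()` is a hypothetical name, `seps` is a stand-in for the new argument, character classes such as `[a-z]` are escaped and matched literally, case folding is ignored, and a trailing `**` is simply mapped to `.*`.

```python
import re


def translate_segments(pattern: str, seps: str = "/") -> str:
    """Hypothetical, simplified path-pattern translator (not glob.translate)."""
    sep = "[" + re.escape(seps) + "]"        # exactly one separator character
    not_sep = "[^" + re.escape(seps) + "]"   # one non-separator character
    segments = re.split(sep, pattern)
    last = len(segments) - 1
    out = []
    for i, seg in enumerate(segments):
        if seg == "**":
            # Zero or more whole segments (each ending in a separator);
            # as the final segment, simply match anything.
            out.append(f"(?:.+{sep})?" if i < last else ".*")
            continue
        if "**" in seg:
            raise ValueError(f"'**' must be a whole path segment: {pattern!r}")
        if seg == "*":
            piece = f"{not_sep}+"            # exactly one non-empty segment
        else:
            piece = ""
            for ch in seg:
                if ch == "*":
                    piece += f"{not_sep}*"   # any run within one segment
                elif ch == "?":
                    piece += not_sep         # any single non-separator
                else:
                    piece += re.escape(ch)   # everything else is literal
        out.append(piece if i == last else piece + sep)
    return "(?s:" + "".join(out) + r")\Z"
```

For example, `re.compile(translate_segments("src/*/*.py"))` matches `src/pkg/mod.py` but not `src/pkg/sub/mod.py`, while `src/**/*.py` matches both `src/mod.py` and `src/a/b/mod.py`.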
~20% globbing speedup:

```
$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/*", follow_symlinks=False))'
2 loops, best of 5: 175 msec per loop  # before
2 loops, best of 5: 146 msec per loop  # after
```
I've moved this to a new `glob.translate()` function. It was easy enough to implement a `recursive` argument, so I did that and made it default to `False`. It's much harder to implement an `include_hidden` argument, so I've left that for now. I don't feel great about it, tbh.
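For context on the `include_hidden` remark: `glob.glob()` already treats dot-files specially, so a regex produced by `translate()` would have to reproduce that rule segment by segment. The sketch below uses a throwaway temporary directory, and the helper `visible_names()` is made up for this example; it only shows the behaviour that such an option would presumably toggle.

```python
import glob
import os
import tempfile


def visible_names(root: str, pattern: str) -> list[str]:
    """Sorted base names that glob.glob() returns for *pattern* under *root*."""
    return sorted(os.path.basename(p) for p in glob.glob(os.path.join(root, pattern)))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("visible.txt", ".hidden.txt"):
        open(os.path.join(tmp, name), "w").close()   # create empty files

    star = visible_names(tmp, "*")        # wildcard skips names starting with '.'
    dot_star = visible_names(tmp, ".*")   # an explicit leading '.' reaches them
```

Since Python 3.11, passing `include_hidden=True` to `glob.glob()` lets `*` match such names as well.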
Right, after some futzing around, I'm going to mark this PR as ready again. (In #109879 I've re-implemented …)
Timings:
So …
A desk review:
I've realised that the docs are a bit skew-whiff. The fix is in a separate PR: #110418.
Is it correct to keep duplicated path separators?

```pycon
>>> glob.translate('a//b')
'(?s:a//b)\\Z'
```
```pycon
>>> os.makedirs('a/b')
>>> glob.glob('a//b')
['a//b']
```

So I reckon yes?
Also, the number of additional slashes is meaningful in some cases, e.g. in Windows UNC paths or POSIX paths starting with two forward slashes. I don't think a pattern like …
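The point about separator runs can be checked with nothing but the standard library. This sketch reuses the regex quoted in the `glob.translate('a//b')` example plus `posixpath` and `ntpath` (both importable on any platform); it does not call `glob.translate()` itself.

```python
import ntpath
import posixpath
import re

# The pattern that glob.translate('a//b') returned in the session quoted earlier.
doubled = r"(?s:a//b)\Z"
doubled_hit = re.match(doubled, "a//b") is not None   # literal '//' matches
single_hit = re.match(doubled, "a/b") is not None     # a single '/' does not

# The standard library does not treat repeated slashes as always redundant.
collapsed = posixpath.normpath("a//b")                # interior run collapses
posix_double = posixpath.normpath("//host/share")     # leading '//' is kept
posix_triple = posixpath.normpath("///host/share")    # three or more become '/'
unc = ntpath.splitdrive("//server/share/dir")         # UNC drive on Windows
```

Because `//host/share` and `/host/share` can name different things, collapsing separators inside a pattern could silently change which paths it selects.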
Hey @encukou, do you think I can merge this, or should I wait for a more complete review from someone?
Oh, I should have been clearer that I wouldn't get to a thorough review in any reasonable time.
Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`.

This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment.

In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code.

Co-authored-by: Jason R. Coombs <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
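To see the difference from `fnmatch.translate()` described in the commit message, compare its output with segment-aware regexes. The `one_segment` and `any_depth` patterns are written by hand (POSIX `/` only) to mirror the stated rules for `*.py` and a recursive `**/*.py`; they are an illustration, not a claim about the exact string `glob.translate()` emits.

```python
import fnmatch
import re

# fnmatch.translate() treats the whole string as a single name, so its '*'
# becomes '.*' and happily spans '/'.
loose = re.compile(fnmatch.translate("*.py"))

# Hand-written, segment-aware equivalents: '*' stays inside one segment,
# and the optional '(?:.+/)?' prefix admits any number of leading directories,
# as a recursive '**/*.py' would.
one_segment = re.compile(r"(?s:[^/]*\.py)\Z")
any_depth = re.compile(r"(?s:(?:.+/)?[^/]*\.py)\Z")

paths = ["mod.py", "pkg/mod.py", "pkg/sub/mod.py"]
loose_hits = [p for p in paths if loose.match(p)]
one_segment_hits = [p for p in paths if one_segment.match(p)]
any_depth_hits = [p for p in paths if any_depth.match(p)]
```

Here `loose` accepts all three paths, `one_segment` only `mod.py`, and `any_depth` all three again. Compiling a pattern once and reusing it across many candidate paths is the natural way for a caller to amortise the translation cost.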