Support for sharing state between pathlib subclasses #100479
Comments
This will be needed to implement Lines 296 to 297 in a68e585
Exactly how big would the performance regression be? Have you tried running some benchmarks?
Not yet; I'll do this before I mark the PR as 'ready' and share the numbers.
My instinct is to say that it would be a shame to make pathlib performance worse, since it's already got a reputation in some quarters for having surprisingly bad performance in some ways compared to
In my opinion, we should be trying to fix this rather than exacerbating the problem :)
Absolutely fair! Do you know of particular cases of (or complaints about) pathlib being slow? I'm aware that recursive globbing in pathlib can be slow, but not much else. We're increasingly calling I'm of course keen for pathlib to be as fast as possible, but I don't want to prevent the future addition of an
I have no idea how many of these complaints, if any, still apply to the current implementation of pathlib, nor how many of these are realistically resolvable :)
Mega, thank you. I'll study these carefully and write benchmarks. Some initial thoughts:
I'll try to take a look soon!
Add `pathlib.PurePath.with_segments()`, which creates a path object from arguments. This method is called whenever a derivative path is created, such as from `pathlib.PurePath.parent`. Subclasses may override this method to share information between path objects. Co-authored-by: Alex Waygood
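As a minimal sketch of how a subclass might use the method described in this commit: the `SessionPath` class and its `session_id` attribute are hypothetical names chosen for illustration, and `with_segments()` is only available in Python 3.12 and later, so the usage part is guarded by a version check.

```python
import sys
from pathlib import PurePosixPath


class SessionPath(PurePosixPath):
    """Hypothetical subclass that carries a session_id on every derived path."""

    def __init__(self, *pathsegments, session_id):
        super().__init__(*pathsegments)
        self.session_id = session_id

    def with_segments(self, *pathsegments):
        # pathlib calls this whenever a derivative path is created
        # (for example by `/`, joinpath() or .parent), so the extra
        # state is forwarded to the new object here.
        return type(self)(*pathsegments, session_id=self.session_id)


# with_segments() only exists in Python 3.12 and later.
if sys.version_info >= (3, 12):
    etc = SessionPath("/etc", session_id=42)
    etc_hosts = etc / "hosts"
    print(etc_hosts.session_id)         # 42
    print(etc_hosts.parent.session_id)  # 42
```

Because every derived path goes through the one overridden method, the subclass does not need to override `parent`, `joinpath()` and the other operations individually.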
Feature or enhancement
This enhancement proposes that we allow state to be shared between related instances of subclasses of `pathlib.PurePath` and `pathlib.Path`.
Pitch
Now that #68320 is resolved, users can subclass `pathlib.PurePath` and `pathlib.Path` directly:
However, some user implementations of classes - such as `TarPath` or `S3Path` - would require underlying `tarfile.TarFile` or `botocore.Resource` objects to be shared between path objects (`etc` and `etc_hosts` in the example above). Such sharing of resources is presently rather difficult, as there's no single instance method used to derive new path objects.
This feature request proposes that we add a new `PurePath.makepath()` method, which is called whenever one path object is derived from another, such as in `joinpath()`, `iterdir()`, etc. The default implementation of this method looks something like:
Users may redefine this method in a subclass, in conjunction with a customized initializer:
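Since the snippets are not reproduced here, the following is only a rough sketch of the idea, assuming the default `makepath()` simply builds a new object of the same type from its arguments. `ToyPath` is a deliberately simplified stand-in and not pathlib's implementation; `ArchivePath` and its `archive` argument are hypothetical names.

```python
import posixpath


class ToyPath:
    """Toy stand-in for PurePath; NOT pathlib's real implementation."""

    def __init__(self, *pathsegments):
        self._path = posixpath.join(*map(str, pathsegments)) if pathsegments else "."

    def __str__(self):
        return self._path

    def makepath(self, *pathsegments):
        # Assumed default: a new object of the same type, with no extra state.
        return type(self)(*pathsegments)

    def joinpath(self, *pathsegments):
        # Every derived path is routed through makepath().
        return self.makepath(self, *pathsegments)

    def __truediv__(self, other):
        return self.joinpath(other)

    @property
    def parent(self):
        return self.makepath(posixpath.dirname(self._path))


class ArchivePath(ToyPath):
    """Hypothetical subclass sharing one resource among all derived paths."""

    def __init__(self, *pathsegments, archive):
        super().__init__(*pathsegments)
        self.archive = archive

    def makepath(self, *pathsegments):
        # Forward the shared resource to every path derived from this one.
        return type(self)(*pathsegments, archive=self.archive)


archive = object()  # stands in for e.g. an open tarfile.TarFile
etc = ArchivePath("/etc", archive=archive)
etc_hosts = etc / "hosts"
print(etc_hosts)                            # /etc/hosts
print(etc_hosts.archive is archive)         # True
print(etc_hosts.parent.archive is archive)  # True
```

In this sketch the subclass overrides a single method, and `joinpath()`, `/` and `parent` all pick up the shared object without further changes.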
I propose the name "makepath" for this method due to its close relationship with the existing "joinpath" method: `a.joinpath(b) == a.makepath(a, b)`.
Performance
edit: this change has been taken care of elsewhere, and so implementing this feature request should no longer have much effect on performance.
This change will affect the performance of some pathlib operations, because it requires us to remove the `_from_parsed_parts()` constructor, which is an internal optimization used in cases where path parsing and normalization can be skipped (for example, in the `parents` sequence). I suggest that, within the standard library, pathlib is not a particularly performance-sensitive module; few folks reach for pathlib for reasons of speed. Within pathlib itself, the savings from optimizing these "pure" methods are usually drowned out by the I/O costs of "impure" methods. With the appeal of this feature in mind, I believe the performance cost is justified.
However, if the performance degradation is considered unacceptable, there's a possible alternative: add a normalize keyword argument to the path initializer and to `makepath()`. This would require some serious internal surgery to make work, and might be difficult to communicate to users, so at this stage it's not my preferred route forward.
Previous discussion
https://discuss.python.org/t/make-pathlib-extensible/3428/47 (and replies)
Linked PRs
`blueprint` argument to `pathlib.PurePath` #100481
`pathlib.PurePath.with_segments()` #103975