-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should UPath.__new__
return pathlib.Path
instances for local paths
#90
Comments
UPath.__new__
return pathlib.Path
instances for local paths
I thought, perhaps mistakenly, that this previous behavior was a good thing: import pathlib
from upath import UPath
path = UPath(".")
assert isinstance(path, pathlib.Path)
assert isinstance(path, UPath) The idea, or at least so I thought, was to enable tools that support pathlib objects to easily support upath objects with little or no code changes. If needed to check for identity of the class, something like this might work: type(path) in (pathlib.WindowsPath, pathlib.PosixPath) I'm sure there were good reasons why this behavior was removed, but I'm not sure why, and would greatly appreciate any clarification for my own understanding. I was never sure that my previous metaclass approach was the right way to do it because the instance check stuff felt hacky, but it did seem to achieve the behavior that I thought I wanted. For reference, here's a simplified version of what that looked like: class UPathMeta(ABCMeta):
def __instancecheck__(cls, instance):
return isinstance(instance, pathlib.Path)
def __subclasscheck__(cls, subclass):
return issubclass(subclass, pathlib.Path)
class UPath(pathlib.Path, metaclass=UPathMeta):
... On a related note, I also think it would be cleaner to put the class resolution logic into I'm really just curious and interested in the pros and cons of various approaches, so I'd love any feedback. |
Hi! Thanks for the repository, we were trying to find a way to work with multiple schemes and use the same API as pathlib, and this seems to be perfect for our usecase. We're trying to use UPath as well, and are encountering the same issue: >>> isinstance(UPath("."), UPath)
False This is causing some issues as we use |
Getting back to this... Would it be possible to get the old behavior back? |
assert isinstance(UPath("s3://..."), Path)
assert isinstance(UPath("."), Path) and not isinstance(UPath("."), UPath) So, I don't think it is great to change the behavior through metaclasses, because that messes with static analysis tooling such as type checkers. Perhaps the type checking issues would be resolved if the signature of https://github.com/fsspec/universal_pathlib/blob/main/upath/core.py#L152 would be changed to return |
I agree. This implicitly rules out restoring the old behavior. @thomasw21 @danielgafni and @brl0, have I understood correctly, that the issues you're facing all boil down to: >>> isinstance(UPath("."), UPath) # for local paths
False If yes, can any of you construct a case in which an intermediate Basically the class hierarchy would then look like this:
I will work on a PR this Sunday, so we can test if it would work for everyone. Cheers, |
I was able to fix my issue by decorrelating constructors and typing. I would say as a general case that for a given class C |
I partially agree: it's confusing the way it's implemented right now, since Though the promotion to specialized subclasses is a common pattern, also used in stdlib pathlib: https://github.com/python/cpython/blob/7279fb64089abc73c03247fb8191082ee42a9671/Lib/pathlib.py#L1154-L1157
Happy to hear that you found a workaround. It'll be interesting to check if it's not needed anymore after this issue is addressed. |
Ah it's not needed anymore (I guess it'd be a good to have, but not blocking), we used @dataclass
class MyRemoteConfig:
path: UPath The original issue is that @dataclass
class MyRemoteConfig:
path: Path (we still use UPath as a constructor so that we can actually pass string with different schemes) |
I have similar issues with validating values in Dagster. Which are supposed to be UPath, but are actually Path for local paths. |
The usage in the standard library allows for My current thinking is that this behavior is a bad thing:
My current preference would be to revert to the old behavior, but that might be because I'm not clear on how that is breaking static type checkers. But since the consensus seems to be to avoid that approach, I would probably lean toward option 2B, while acknowledging the concern that it might run into some bugs in I'd be curious what @barneygale would suggest as the most appropriate approach to inheriting from and being compatible with |
I'm not very familiar with this project so forgive me if I get something wrong! For Python 3.13 (>12 months away) I'm hoping to add a pth: pathlib.AbstractPath = UPath(...)
However,
FWIW, I never liked this. If I were to design pathlib from scratch, I would remove the Taking that approach in s3_pth: universal_pathlib.S3Path = universal_pathlib.S3Path('foo/bar', bucket='blah')
any_pth: pathlib.AbstractPath = universal_pathlib.parse_uri('...') ... but obviously that's quite different! So take it or leave it :) |
Oh sorry, I see that there are already dedicated |
Thanks for the feedback!
Is
we kind of do this via the different implementations for specific filesystems. Regarding `pathlib.AbstractPath`: I would be happy to contribute to this. `universal_pathlib` could be an early adopter to check if the implementation covers all use cases. |
This might be a terrible idea, but another option might be to backport the latest |
Fully agree. It's just quite a bit of work to port the important parts of the test suite. But definitely doable.
We're basically doing this for python |
Unlikely, but not impossible. When users subclass Users may then point out that there are two ways to check if a path uses Windows semantics (for example):
The first of these will not work for subclasses of If this situation proves very confusing, we might add a flavour argument to But it's hard to project that far into the future, and the change would need to be extremely well motivated. Ticket: python/cpython#100502
Fantastic, thank you. My plan at the moment is to add a Ticket: python/cpython#89812 |
Summarizing from #84
When creating a
UPath
instance universal_pathlib returns apathlib.Path
instance for local filesystems, which can cause problems with type checkers:Possible solutions:
(1) user explicitly type annotates
No changes. We could just document it very explicitly.
(2) always return a UPath instance
I think the reason for returning a pathlib.Path instance for local filesystems is to guarantee that there are no changes in behavior when using UPath as a replacement.
(2A) create an intermediate subclass
We could still guarantee pathlib behavior by creating an intermediate UPath class
(2B) we always return the fsspec LocalFileSystem backed UPath instance for local paths
This would be a simple change, but I think we should then port the CPython pathlib test suite to universal_pathlib, to guarantee as good as we can that the behavior is identical. I'm worried that symlink support, the windows UNC path stuff, and other weird edge-cases will probably cause quite a few issues.
(3) apply some magic
We could provide
UPath
as an alias for the UPath class but cast it totype[pathlib.Path]
to trick the static typecheckers. And we could fallback to some metaclass magic to make runtime typechecking work, but this would basically (partially?) revert the cleanup done here: #56The text was updated successfully, but these errors were encountered: