Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird result with as_uri() on s3 files #115

Closed
mberlinger3 opened this issue Jul 11, 2023 · 4 comments · Fixed by #133
Closed

Weird result with as_uri() on s3 files #115

mberlinger3 opened this issue Jul 11, 2023 · 4 comments · Fixed by #133
Labels
bug 🐛 Something isn't working

Comments

@mberlinger3
Copy link

mberlinger3 commented Jul 11, 2023

The as_uri() attribute of an s3path looks like it is returning a file of an encoded url; however, it should really be likely a reimplementation of the .as_posix() attribute which would return the same as the input below.

f = UPath(f's3://{bucket}/{key}') 

f.as_uri()==f"file://s3%3A//{bucket}/{key}"
f.as_posix()==f's3://{bucket}/{key}'

Found this because for some reason polars doesnt like UPaths unless the .as_posix() property is used.

@ap--
Copy link
Collaborator

ap-- commented Jul 12, 2023

Hi @mberlinger3,

thank you for opening an issue! It seems as_uri is broken in general for UPath.
An easy way would be to implement it as:

class UPath(...):
    def as_uri(self):
        return str(self)

would you be interested in adding tests and a patch in a PR?

Cheers,
Andreas 😃

@mberlinger3
Copy link
Author

mberlinger3 commented Jul 12, 2023

Hi @mberlinger3,

thank you for opening an issue! It seems as_uri is broken in general for UPath. An easy way would be to implement it as:

class UPath(...):
    def as_uri(self):
        return str(self)

would you be interested in adding tests and a patch in a PR?

Cheers, Andreas smiley

I think this would make more sense on the non-local (cloud?) implementations because it looks like the as_uri() method is currently inherited from the standard pathlib implementation which is correct for local files. Perhaps a specific wrapper in the UPath class to explicitly call the pathlib implementation and separate methods for the differing schemes.

@ap--
Copy link
Collaborator

ap-- commented Jul 13, 2023

Local paths that aren't provided via a file uri are currently returned as their corresponding pathlib.PosixPath and pathlib.WindowsPath instances. (which is going to change in the near future, see: #90)

As it stands right now, as_uri seems to be broken in general for all UPath subclasses:

>>> import upath
>>> upath.UPath("file://a/b/c.dat").as_uri()
'file://file%3A//a/b/c.dat'

So a general fix in upath.UPath.as_uri seems appropriate. If we add a test to upath/tests/cases.py we can ensure that we do the right thing for the implemented subclasses and address potential issues there with custom implementations.

@mberlinger3
Copy link
Author

mberlinger3 commented Jul 13, 2023

I see what is issue is here. If you instantiate UPath using a uri as above it breaks; however, if you use a file path then as_uri works as expected.

>>> from upath import UPath
>>> print( UPath("/a/b/c.dat").as_uri() )
file:///a/b/c.dat

This could likely be solved by a simple conditional to check if a scheme exists and pass it to pathlib if it doesn't. We could probably implement the scheme check as a simple property.

An alternative solution would be something like what s3path does. They have a method S3Path().from_uri(uri) that takes in a uri while the base init assumes a path syntax.

@ap-- ap-- added the bug 🐛 Something isn't working label Aug 2, 2023
@ap-- ap-- closed this as completed in #133 Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants