-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support filesystem abstraction with fsspec or epath. #8336
Comments
I think this will be a great feature to integrate in PyG. I'm not familiar with epath, but I've used fsspec in PyTorch Lightning and it works really nice. Do you have an idea on whether we should go with fsspec or other libraries by chance? |
My personal preference is epath, but it is because I enjoy the pathlib API. The fsspec library seems to be more popular based on stars and also would be more of a 1:1 change based on the current use of built-in FS usage we have in PyG at the moment. E.g.,
vs.
|
@rusty1s Any thoughts? |
As far as I understand, |
For what it's worth (as someone who would make heavy use of remote filesystems if this was implemented), I would be strongly in favor of fsspec, mostly because as far as I can tell, fsspec is what is currently used by base PyTorch and would minimize the number of dependencies needed for a base PyG install. |
@rachitk I didn't know that PyTorch 2.1.0 added @asherbondy I think |
I actually didn't realize this was a new addition in 2.1.0 myself - I thought it was a much older addition but I see now that it wasn't actually added as a dependency until March of this year. |
Thanks for the discussion! I was also leaning towards fsspec as that seems to be slightly more popular in PyTorch ecosystem and would be easier refactor as mentioned. Now that it seems there is consensus I'm happy to make a CL. |
**Changes made:** - Add fs_utils.py for API that offers 1:1 replacement for current filesystem usage. - Update usages of open, glob, makedirs, normpath to use new API. - Update test/datasets/* to use in-memory filesystem, simplifying test and testing fsspec API.
@rusty1s & @akihironitta: I had no experience with fsspec before so I needed to wrap my head around it a bit and run into some practical cases. The PR I've uploaded now is enough details of the API that I'm thinking of and some refactored code and tests for that. Any test previously using the get_dataset test fixture will not be working anymore as it now expects the test to write into the in-memory filesystem. So please take a review of the API and changes to see if we can get maintainer approval to continue the refactor. #8379 |
🛠 Proposed Refactor
I propose that we replace usage of core python I/O e.g., open, exists, is_file, makedirs, with equivalent filesystem abstraction that allows dispatch to appropriate filesystem backends. This would allow users of your library to easily work with remote filesystems as they work with local filesystems.
E.g.,
Suggest a potential alternative/fix
Either of two popular libraries can be used, fsspec or epath could be used to replace the existing calls to built-in functions that assume local filesystem.
I am happy to make a PR for this if I know the preference of filesystem abstraction library and any other concerns.
The text was updated successfully, but these errors were encountered: