-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solve issue #282: Update FSSpecFileLister, IoPathFileLister #383
Conversation
Hi @ykabusalah! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at [email protected]. Thanks! |
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
@NivekT one test failed for macos build but I see that it has failed on previous pull requests as well. Let me know if there is anything I need to do for this. Thanks. |
Don't worry about that one. I will have a look at the rest of your PR soon. Thanks! |
Awesome. Sounds good. Any chance you'd be able to take a look at it by tomorrow? |
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this patch.
yield fs.protocol + "://" + abs_path | ||
for root in self.datapipe: | ||
fs, path = fsspec.core.url_to_fs(root) | ||
is_local = fs.protocol == "file" or not root.startswith(fs.protocol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind solving #348 as well?
You can basically check if fs.protocol
is a list. If it is, we can run any(root.startswith(protocol) for protocol in fs.protocol)
continue | ||
|
||
# ensure the file name has the full fsspec protocol path | ||
if file_name.startswith(fs.protocol): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
if root.startswith(fs.protocol): | ||
yield fs.protocol + "://" + abs_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here might need to be changed as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what way should this be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe @ejguan is referring to his other comment here:
#383 (comment)
It will be great if you can solve #348 as well in this PR, and when you do, the logic here will have to change as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if root.startswith(fs.protocol): | |
yield fs.protocol + "://" + abs_path | |
if isinstance(fs.protocol, list): | |
for protocol in fs.protocol: | |
if root.startswith(protocol): | |
yield protocol + "://" + abs_path | |
break | |
elif root.startswith(fs.protocol): | |
yield fs.protocol + "://" + abs_path |
test/test_local_io.py
Outdated
def test_io_path_file_lister_iterdatapipe_with_list(self): | ||
datapipe = IoPathFileLister(root=[self.temp_sub_dir.name, self.temp_sub_dir_2.name]) | ||
|
||
# check all file paths within sub_folder are listed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: similar to my other comment about comparing sorted list instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for submitting the PR! Just a few small comments.
This fixes issue #348 and updates Testing Logic Summary
Changes
|
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look good to me in general. Two more comments below.
else: | ||
protocol_list = fs.protocol | ||
|
||
is_local = fs.protocol == "file" or not any(root.startswith(protocol) for protocol in fs.protocol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean protocol_list
, right?
… in a for loop to stop yielding values when one we wanted is found
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for working on this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing these issues! LGTM with two nit comments below.
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Fixes #282
Summary
Updated FSSpecFileLister and IoPathFileLister to optionally take a list of strings as root paths. As per the contributors guideline, added tests to the testsuite to cover changes and ran the pre-commit format checker.
Changes
Updated FSSpecFileLister to take either a str or sequence of strings as root path argument (in _init_ arguments)
Updated _init_ of FSSpecFileLister to check if root argument is string or list and make into IterDataPipe
Updated _iter_ of FSSPecFileLister to look through all root paths when yielding files
Added unit test in test_fsspec.py to test for multiple root path input to FSSpecFileLister
Updated IoPathFileLister to take either a str or sequence of strings as root path argument (in _init_ arguments)
Updated _init_ of IoPathFileLister to check if root argument is string or list and make into IterDataPipe
Updated _iter_ of IoPathFileLister to look through all root paths when yielding files
Added unit test in test_local_io.py to test for multiple root path input to IoPathFileLister