-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
torchdata datapipes drop addition #725
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM. Thank you for adding this DataPipe. Could you please fix the lint Error?
For formatting, you can follow the instruction here: https://github.com/pytorch/data/blob/main/CONTRIBUTING.md#code-style. And, run pre-commit run --all-files
to execute auto formatting.
For mypy, you can execute mypy --config-file mypy.ini
to see the Error.
if isinstance(old_item, tuple): | ||
new_item = tuple(x for i, x in enumerate(old_item) if i not in self.indices) | ||
elif isinstance(old_item, list): | ||
new_item = [x for i, x in enumerate(old_item) if i not in self.indices] | ||
elif isinstance(old_item, dict): | ||
new_item = {k: v for (k, v) in old_item.items() if k not in self.indices} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's say we have input datapipe that yields tuple of size 3. But, we provide 3 to indices
. Should we raise Error in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can raise a warning by doing something like this and running a quick check over the indices:
#filter indices all present check
try:
for i in self.indices:
old_item[i]
except (IndexError, KeyError):
warnings.warn(
"At least one index in the filter is not present in the item being returned,"
" please be aware that expected columns/keys may be missing."
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just one small step to add this DataPipe to the documentation.
Add Dropper
into docs/source/torchdata.datapipes.iter.rst
. I think the category "Selecting" is the most fitting, you can put it right before "Filter".
@dbish has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@dbish has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Please read through our contribution guide prior to
creating your pull request.
Fixes #656
Changes
part 1 of the feature requests to manipulate datapipe columns here: #656
this adds a drop functionality to iter datapipes