Adding parameterized dataset pickling tests #1732
Conversation
@NivekT @ejguan I'm seeing some failures when trying to pickle the IWSLT datasets that look like the following. I believe

@Nayef211 The API
Gotcha, thanks for the heads-up. I was only seeing these errors locally, but all the tests are passing after I installed the latest
Thanks @Nayef211 for adding these tests. Actually I had something much simpler in mind (pseudo code below, without putting it in a proper test suite). Let me know if this makes sense?

import pickle

from torchtext.datasets import DATASETS

def test_pickable(dp):
    # A dataset constructor may return one datapipe or a tuple of splits.
    if type(dp) == tuple:
        for dp_split in dp:
            pickle.dump(dp_split, open("temp.pkl", "wb"))  # ensure that no exception is raised here
    else:
        pickle.dump(dp, open("temp.pkl", "wb"))  # ensure that no exception is raised here

for f in DATASETS.values():
    dp = f()
    test_pickable(dp)
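A self-contained sketch of that check, with a stand-in class in place of a real torchtext datapipe (the `FakeDataPipe` name and `assert_pickable` helper are purely illustrative, not part of the PR) and a context manager so the file handle is closed:

```python
import os
import pickle
import tempfile

class FakeDataPipe:
    """Hypothetical stand-in for a torchtext datapipe; real code would call DATASETS[name]()."""
    def __init__(self, samples):
        self.samples = samples
    def __iter__(self):
        return iter(self.samples)

def assert_pickable(dp, path):
    # A dataset constructor may return one datapipe or a tuple of splits.
    splits = dp if isinstance(dp, tuple) else (dp,)
    for dp_split in splits:
        with open(path, "wb") as f:   # context manager closes the file handle
            pickle.dump(dp_split, f)  # must not raise

path = os.path.join(tempfile.mkdtemp(), "temp.pkl")
assert_pickable(FakeDataPipe(["a", "b"]), path)
assert_pickable((FakeDataPipe(["train"]), FakeDataPipe(["test"])), path)
```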
Thanks for the suggestion here @parmeet. This implementation does look a lot simpler. I wonder if we want to test reloading the datapipe from the pickle file and verify that the order of the dataset makes sense? Or do we not think that's relevant to this test?
Not sure if I understand what you meant by order?
Hmm, I don't know if this is required. If the pickle dump works, loading it should give back the exact same object, so I wonder if it is necessary to check the sanity of loading the pickled datapipe. @ejguan, @NivekT do you have some suggestions here, or do you think that just ensuring the pickling works is sufficient for our use case?
Sorry, I meant to say the contents of the datapipe, which are tested using something like the following:

for sample, expected_sample in zip_equal(samples, expected_samples):
    self.assertEqual(sample, expected_sample)

I guess if we do go with this approach, testing the loading is just one additional line and will allow us to verify that both the saving and loading mechanisms are working as intended. But happy to hear suggestions from the torchdata folks.
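A minimal self-contained sketch of that save-then-reload check (a plain list stands in for the datapipe samples, `itertools.zip_longest` approximates the `zip_equal` helper mentioned above, and the sample data is made up):

```python
import pickle
from itertools import zip_longest

def assert_roundtrip(samples, path):
    # Save the samples...
    with open(path, "wb") as f:
        pickle.dump(samples, f)
    # ...then reload and compare element-wise; zip_longest also catches length mismatches.
    with open(path, "rb") as f:
        reloaded = pickle.load(f)
    for sample, expected_sample in zip_longest(reloaded, samples):
        assert sample == expected_sample

assert_roundtrip([("pos", "great movie"), ("neg", "boring plot")], "temp.pkl")
```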
For the sake of simplicity I think I will go with the approach @parmeet suggested. Maybe we can follow up with a test on a single dataset that verifies the contents of the datapipes are correct.
LGTM! Thanks @Nayef211 for adding this test :)
There seem to be some test failures for the MNLI dataset? cc: @VirgileHlav
I had a small bug in the implementation that happened when making the functions in MNLI global. Should be fixed now.
@NivekT @ejguan @VitalyFedyunin I'm seeing some test failures that look related to the recent changes made to
Description
Testing
pytest test/datasets/common.py
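The common test file itself isn't shown in this thread, but a stand-alone sketch of a parameterized pickling test in this spirit (fake dataset factories and names stand in for torchtext.datasets.DATASETS; unittest's subTest gives one sub-case per dataset) could look like:

```python
import pickle
import unittest

# Hypothetical fake factories; the real test iterates over torchtext.datasets.DATASETS.
DATASETS = {
    "FakeA": lambda: ["a1", "a2"],               # single "datapipe"
    "FakeB": lambda: (["b-train"], ["b-test"]),  # tuple of splits
}

class TestDatasetPickling(unittest.TestCase):
    def test_pickling(self):
        for name, factory in DATASETS.items():
            with self.subTest(dataset=name):
                dp = factory()
                splits = dp if isinstance(dp, tuple) else (dp,)
                for split in splits:
                    data = pickle.dumps(split)  # must not raise
                    # sanity-check the reload as discussed above
                    self.assertEqual(pickle.loads(data), split)
```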