Caching results of the filter will result in inconsistent cache state #1735
Comments
@VitalyFedyunin, just wondering: could this be the cause of the test failures I'm seeing in my PR (see the comment in #1732)?
Yes, @VitalyFedyunin already told me about this. One thing I am wondering is why it causes failures only in STSB and wikitext103. We filter in between on_disk_cache and end_caching for all the compressed datasets. Should we change the order in the other datasets too?
Yes, please change all datasets. I think it fails when you try to
This PR fixes issue #1737. It seems that empty or non-existent files/paths are causing the problem, and it is not really necessary to put the filter after end_caching. In fact, this issue exposed bugs in our mock testing for the failing datasets :). The reason we keep the filter in between is that we do not want to dump all the files to disk, only the ones necessary to build the dataset.
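To make the disk-space trade-off in the comment above concrete, here is a toy sketch in plain Python. It is not torchdata code: the archive contents, file names and helpers (`extract`, `keep_split`, `write_to_disk`) are hypothetical stand-ins for the extraction, filter and end_caching stages. Each stage is a generator, and the sketch counts which members reach the write stage under each ordering.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical archive contents (member path -> bytes); stands in for a tar's members.
ARCHIVE: Dict[str, bytes] = {
    "data/train.csv": b"train rows",
    "data/dev.csv": b"dev rows",
    "data/test.csv": b"test rows",
    "data/readme.txt": b"notes",
}

Item = Tuple[str, bytes]


def extract(archive: Dict[str, bytes]) -> Iterator[Item]:
    # Stand-in for the extraction stage: yield every member of the archive.
    yield from archive.items()


def keep_split(split: str) -> Callable[[Item], bool]:
    # Hypothetical predicate: keep only the member that belongs to one split.
    return lambda item: item[0].endswith(f"/{split}.csv")


def write_to_disk(items: Iterable[Item], disk: Dict[str, bytes]) -> Iterator[Item]:
    # Stand-in for the end-caching stage: persist each item that reaches it, then pass it on.
    for path, data in items:
        disk[path] = data
        yield path, data


def filter_then_cache(split: str) -> Tuple[List[Item], Dict[str, bytes]]:
    disk: Dict[str, bytes] = {}
    out = list(write_to_disk(filter(keep_split(split), extract(ARCHIVE)), disk))
    return out, disk


def cache_then_filter(split: str) -> Tuple[List[Item], Dict[str, bytes]]:
    disk: Dict[str, bytes] = {}
    out = list(filter(keep_split(split), write_to_disk(extract(ARCHIVE), disk)))
    return out, disk


out_a, disk_a = filter_then_cache("train")
out_b, disk_b = cache_then_filter("train")
print(out_a == out_b)            # True: downstream sees the same items either way
print(len(disk_a), len(disk_b))  # 1 4: only the reordered pipeline writes every member
```

In this toy model both orderings feed the same items downstream; the difference is only in what gets written, which is why keeping the filter in between looks attractive from a disk-usage point of view.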
I am just afraid that this
Currently it blocks, but we just got lucky:
text/torchtext/datasets/stsb.py, line 85 in caaa8e3
Please change the order of filter and end_caching.
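The consistency problem named in the issue title can be illustrated with a similar toy model. This is an assumption-laden sketch, not torchdata's caching implementation: it assumes the caching stage records a "promise" for every file it expects the upstream archive to yield, and that the write stage clears a promise only for files it actually receives. `ToyCache`, `wanted` and `run` are hypothetical names. Under that assumption, a filter placed between the two stages leaves promises for the dropped files unfulfilled, one plausible way a later run could end up waiting on files that never arrive.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Item = Tuple[str, bytes]

# Hypothetical archive members; the toy cache is assumed to promise all of them.
ARCHIVE: Dict[str, bytes] = {
    "data/train.csv": b"train rows",
    "data/dev.csv": b"dev rows",
    "data/test.csv": b"test rows",
}


class ToyCache:
    """Hypothetical cache with promise markers; an illustration, not torchdata's code."""

    def __init__(self) -> None:
        self.disk: Dict[str, bytes] = {}
        self.promises: Set[str] = set()

    def start(self, expected: Iterable[str]) -> None:
        # Stand-in for on_disk_cache: promise every expected path not already on disk.
        self.promises.update(p for p in expected if p not in self.disk)

    def end(self, items: Iterable[Item]) -> Iterator[Item]:
        # Stand-in for end_caching: write each arriving item and clear its promise.
        for path, data in items:
            self.disk[path] = data
            self.promises.discard(path)
            yield path, data

    def is_consistent(self) -> bool:
        # Consistent only if every promised file was actually written.
        return not self.promises


def wanted(item: Item) -> bool:
    # Hypothetical filter predicate: keep only the training split.
    return item[0] == "data/train.csv"


def run(filter_before_end: bool) -> ToyCache:
    cache = ToyCache()
    cache.start(ARCHIVE.keys())
    items = iter(ARCHIVE.items())
    if filter_before_end:
        list(cache.end(filter(wanted, items)))  # original order: filter, then end_caching
    else:
        list(filter(wanted, cache.end(items)))  # requested order: end_caching, then filter
    return cache


print(run(True).is_consistent(), sorted(run(True).promises))
# False ['data/dev.csv', 'data/test.csv']
print(run(False).is_consistent())
# True
```

In this model the requested order writes every member (more disk usage) but leaves no dangling promise, which is the trade-off being discussed in the thread.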