IWSLT2017 Download Issue in torchtext v0.11.0 #1620

Closed
A-Pandey20 opened this issue Feb 18, 2022 · 11 comments

Comments

@A-Pandey20

Downloading IWSLT2017 raises an internal error. This is the error message:

RuntimeError: Internal error: confirm_token was not found in Google drive link.

I tried to use both the function and module version, but both raise the same error:

import torchtext
train_iter, valid_iter, test_iter = torchtext.datasets.IWSLT2017()  # function version
train_iter, valid_iter, test_iter = torchtext.datasets.iwslt2017.IWSLT2017()  # module version

Please have a look and fix it as soon as possible.

@erip
Contributor

erip commented Feb 18, 2022

Duplicate of #1359.

@austinvhuang

I'm running into the same issue with IWSLT2016 on torchtext 0.11.2. I wasn't having an issue when I ran the code a few weeks ago; I've only noticed it in the last few days.

@A-Pandey20
Author

Yes, same for me. Neither of the IWSLT datasets is working. They worked for me as recently as last week. It must be some issue with the cloud storage where the data is hosted.

@erip
Contributor

erip commented Feb 21, 2022

I think it is basically caused by the torchtext code being somewhat flimsy wrt expected responses from gdrive. There's a good chance you're getting throttled by upstream and this is being misreported. We've recently migrated away from homegrown code for GDrive handling to the more robust torchdata approach, but that doesn't help for users stuck on 0.11.x...
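To illustrate the failure mode described above, here is a minimal sketch of a cookie-based confirm-token lookup. It is not the torchtext implementation. The helper names `find_confirm_token` and `describe_failure` are hypothetical, and both the `download_warning` cookie prefix and the "Quota exceeded" marker are assumptions about what Google Drive has returned for large files.

```python
from typing import Mapping, Optional


def find_confirm_token(cookies: Mapping[str, str]) -> Optional[str]:
    """Return the value of the first cookie whose name starts with
    'download_warning' (assumed prefix), or None if there is none."""
    for name, value in cookies.items():
        if name.startswith("download_warning"):
            return value
    return None


def describe_failure(cookies: Mapping[str, str], body: str) -> Optional[str]:
    """Return None when a token is present, otherwise a diagnostic message."""
    if find_confirm_token(cookies) is not None:
        return None
    # Assumed marker text; a throttled response may look different in practice.
    if "Quota exceeded" in body:
        return "Google Drive download quota exceeded; try again later."
    return "confirm_token not found; the response was not in the expected format."


# Three illustrative responses: token present, quota page, unexpected page.
print(describe_failure({"download_warning_123": "abcd"}, ""))
print(describe_failure({}, "<html>Quota exceeded</html>"))
print(describe_failure({}, "<html>Something else</html>"))
```

A client that only looks for the cookie and treats every other response as an internal error would report throttling, a changed response format, and a genuine bug with the same message, which is consistent with the misreporting suspected here.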

@A-Pandey20
Author

Oh yes, that makes sense. This might be the actual problem.

@austinvhuang

@erip any idea if the throttling is happening across all downloads or somehow specific to a client? IWSLT2016 data loading hasn't worked for the last two weeks for me now - wondering if gdrive has just cut off access from paperspace hosts at this point.

Any resolution suggestions? Should we be using the master branch instead of the 0.11.2 release?

@erip
Contributor

erip commented Feb 27, 2022

@austinvhuang I suspect it's IP based, but that is a total guess. You could definitely try the main branch, but that's not going to change gdrive server responses, just what your client reports for diagnostics.

@austinvhuang

@erip I seem to be able to download the datasets manually by following the URLs in https://github.com/pytorch/text/blob/c31a400990513d180fdbd4ae5d597781b6b7063a/test/asset/raw_datasets.jsonl, so it seems this is not necessarily Google Drive throttling downloads by IP, but only the confirm_token missing from the response.

I'm not sure what confirm_token is for, but could it just be ignored, as @parmeet suggests in #1359?

@NivekT
Contributor

NivekT commented Mar 17, 2022

If I understand the issue correctly, updating to version 0.12.0 should resolve the issue (though that requires installing torchdata). See this comment.

Let me know if it doesn't work with 0.12.0.
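Before retrying, it may help to confirm which releases are actually installed. The following is a rough sketch using only the standard library. The helper names `release_tuple`, `needs_upgrade`, and `installed_version` are hypothetical, and the 0.12.0 threshold is taken from the suggestion above.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.11.2+cu113' -> (0, 11, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version: {version_string!r}")
    parts = [int(part) for part in match.group().split(".")]
    # Pad short versions such as '0.12' to three components.
    return tuple(parts + [0] * (3 - len(parts)))


def needs_upgrade(installed: str, minimum: Tuple[int, ...] = (0, 12, 0)) -> bool:
    """True when the installed release is older than the minimum."""
    return release_tuple(installed) < minimum


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for package in ("torchtext", "torchdata"):
    found = installed_version(package)
    print(package, found if found is not None else "not installed")

torchtext_version = installed_version("torchtext")
if torchtext_version is not None and needs_upgrade(torchtext_version):
    print("torchtext is older than 0.12.0; consider upgrading")
```

The check ignores pre-release and local suffixes, so treat it as a quick sanity check and not a full version comparison.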

@Nayef211
Contributor

@austinvhuang were you able to verify if updating to version 0.12.0 resolved your issue?

@austinvhuang

@Nayef211 yes looks like it's resolved now, thanks.
