Downloading MNIST dataset with torchvision gives HTTP Error 403 #1938
Comments
Thanks for reporting! I can reproduce the issue locally, and downloading from the browser works. I don't yet know what the root cause is, though.
I think we might need to pass … (see vision/torchvision/datasets/utils.py, lines 59 to 100 in c3e2b01)
cc @cpuhrsch @vincentqb @zhangguanheng66 for awareness
This is because the download links for MNIST at https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py#L33-L36 are hosted on yann.lecun.com, and that server has moved under Cloudflare protection. @fmassa we may need to mirror the files and change the URLs, perhaps to the PyTorch S3 bucket or something similar.
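If the block depends on the request headers, comparing the status code Python gets with its default `Python-urllib/3.x` User-Agent against a browser-like one should make that visible. Below is a minimal standard-library sketch. `status_for` is a hypothetical helper, and `"Mozilla/5.0"` is an arbitrary stand-in for a browser User-Agent. Whether a given server actually filters on the User-Agent is an assumption you would still need to verify.

```python
import urllib.error
import urllib.request
from typing import Optional


def status_for(url: str, user_agent: Optional[str] = None, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET on `url`, optionally with a custom User-Agent."""
    # Without an explicit header, urllib sends its default "Python-urllib/3.x" User-Agent
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers (e.g. 403 Forbidden) are raised as HTTPError; report the code instead
        return err.code
```

For example, if `status_for(url)` returns 403 and `status_for(url, "Mozilla/5.0")` returns 200 for the same MNIST file URL, the server is rejecting the default client header.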
So could we make a hot-fix somehow?
@Borda I haven't tried the hotfix I mentioned yet, but I think it might work. Would you be able to try it and send a PR? Otherwise I'll look into it early next week (I'm working towards the ECCV deadline tomorrow). I would also rather avoid hosting the datasets ourselves, as this would set a precedent for us storing datasets.
Is there any way to get a quick fix without using master?
@eduardo4jesus You can explicitly add headers as stated above, something like:
opener = urllib.request.URLopener()
opener.addheader('User-Agent', some_user_agent)
opener.retrieve(
    url, fpath,
    reporthook=gen_bar_updater()
)
(line 81 and onwards in …)
@eduardo4jesus You could patch your model script at the top using:
It will use that user agent for the entire script assuming the opener does not get overwritten somewhere else.
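The original patch snippet did not survive here. The following is a minimal sketch of that kind of script-wide patch. It assumes the standard-library `urllib.request.build_opener`/`install_opener` approach and that the dataset code downloads through `urllib.request.urlretrieve` or `urlopen`, as the snippet earlier in this thread suggests. `install_user_agent` is a hypothetical helper name, and `"Mozilla/5.0"` is a placeholder User-Agent.

```python
import urllib.request

# Placeholder browser-like User-Agent (an assumption; any browser string should do)
USER_AGENT = "Mozilla/5.0"


def install_user_agent(user_agent: str = USER_AGENT) -> urllib.request.OpenerDirector:
    """Install a global opener so later urlopen/urlretrieve calls send `user_agent`."""
    opener = urllib.request.build_opener()
    # Replaces the default [("User-agent", "Python-urllib/3.x")] header list
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener


# Run once at the top of the script, before anything triggers a download
opener = install_user_agent()
```

As noted above, this only holds as long as nothing else calls `install_opener` later in the same process.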
To make it work for Python 2 as well:
So for Python 3 I now use the following snippet:
You would need to modify the …
I've just hit the same problem. Waiting for a fix that doesn't require changing code... (ROOKIE ALERT)
Clone this to your working dir:
The problem is that Yann LeCun's site changed hosts, if I got it right, and the new host checks whether the HTTP headers are set.
I currently work around it with the following code:
from torchvision import datasets
import torchvision.transforms as transforms
import os
import urllib.request

num_workers = 0
batch_size = 20
# Assumed location: for root='data', torchvision looks for the raw files in data/MNIST/raw
basepath = 'data/MNIST/raw'
transform = transforms.ToTensor()

def set_header_for(url, filename):
    # Download `url` into basepath while sending a browser-like User-Agent header.
    # Note: URLopener has been deprecated since Python 3.3.
    os.makedirs(basepath, exist_ok=True)
    opener = urllib.request.URLopener()
    opener.addheader('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36')
    opener.retrieve(url, os.path.join(basepath, filename))

set_header_for('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', 'train-images-idx3-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', 'train-labels-idx1-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', 't10k-images-idx3-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', 't10k-labels-idx1-ubyte.gz')

# With the raw files already in place, download=True should find and reuse them
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=False, transform=transform)
You need to change the base path, of course.
On 05.03.2020, at 05:26, Nikita Makarin wrote:
I have the same issue when I try to get the datasets:
import torch
import torchvision
from torchvision import transforms, datasets
train = datasets.MNIST("", train=True, download=True,
transform=transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST("", train=False, download=True,
transform=transforms.Compose([transforms.ToTensor()]))
@nvcastet, thank you so much for the clarification. At that point I had misunderstood and thought I would have to go into …
This should have been fixed now, there is no need to update torchvision. All should be working as before, without any change on the user side. This was fixed on the server hosting the original dataset (thanks @soumith !). As such, I'm closing this issue but let us know if you still face this issue.
Did anyone get …
Didn't work for me either.
Yep, yesterday it was working for me but now it isn't. I don't know what happened. Is there a solution for this?
@BernardoOlisan @ChengguiSun the solution from @mlelarge works great. The following worked for me in a notebook:
!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz
from torchvision.datasets import MNIST
from torchvision import transforms
mnist_train = MNIST('./', download=False,
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                    ]), train=True)
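Where `wget` or `tar` is not available (for example on a plain Windows setup), the same two notebook steps can be done from Python. Below is a minimal standard-library sketch. `fetch_and_extract` is a hypothetical helper, and the mirror URL is the one quoted above, which may move or go offline.

```python
import tarfile
import urllib.parse
import urllib.request
from pathlib import Path

# Mirror URL from the comment above (a personal host; availability is not guaranteed)
MNIST_TARBALL_URL = "https://www.di.ens.fr/~lelarge/MNIST.tar.gz"


def fetch_and_extract(url: str, dest: str = ".") -> Path:
    """Download a .tar.gz archive into `dest` and unpack it there (like `wget` + `tar -zxvf`)."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last component of the URL path, e.g. "MNIST.tar.gz"
    archive = dest_path / Path(urllib.parse.urlparse(url).path).name
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python releases can reject unsafe members (absolute paths, "..", devices)
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return dest_path
```

Calling `fetch_and_extract(MNIST_TARBALL_URL, ".")` and then constructing `MNIST('./', download=False, ...)` should mirror the notebook cells above, assuming the archive still unpacks to an `MNIST/` folder.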
Thank you, @mlelarge and @alisterburt!
Nope, I still get the exact same error.
@BernardoOlisan do you get the same error when trying to download from that link in a browser? Just tried again locally and all worked fine...
You have to have a folder called MNIST (containing processed and raw) inside the directory you pass as root, e.g. next to your script if your code is
torchvision.datasets.MNIST(root='./', download=False,
                           transform=transforms.Compose([
                               transforms.ToTensor(),
                           ]), train=True)
If your code is
torchvision.datasets.MNIST(root='./data', download=False,
                           transform=transforms.Compose([
                               transforms.ToTensor(),
                           ]), train=True)
then the folder has to be structured as:
├── data
│   └── MNIST
│       ├── processed
│       └── raw
└── Python script
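Before constructing the dataset, you can check the layout described here with a quick script. The sketch below is minimal: `missing_mnist_dirs` is a hypothetical helper, and it only checks for the `MNIST/raw` and `MNIST/processed` folders named in this comment, not for the files inside them.

```python
from pathlib import Path
from typing import List


def missing_mnist_dirs(root: str) -> List[Path]:
    """List the folders from the layout above that do not exist under `root`."""
    base = Path(root) / "MNIST"
    expected = [base / "raw", base / "processed"]
    return [folder for folder in expected if not folder.is_dir()]
```

For example, `missing_mnist_dirs("./data")` returning an empty list means both folders are where the `root='./data'` call above expects them.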
The solution proposed by @mlelarge works fine for me, but in my case I have:
then, I do:
and everything works well.
This should be fixed (again) in the next torchvision nightly, and the fix will be present in the next minor release of torchvision, which should be out soon. See #3544 for more details.
Perfect, thank you @fmassa !
@alisterburt, thank you. I used that solution too. Sometimes it works, sometimes it doesn't. It works for me tonight. It's just very slow to download the dataset.
Try this:
Because …
@alisterburt I encountered the following problems after trying that:
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Download\GnuWin32-wget/etc/wgetrc
--2021-08-04 19:11:37-- http://www.di.ens.fr/~lelarge/MNIST.tar.gz
Resolving www.di.ens.fr... 129.199.99.14
Connecting to www.di.ens.fr|129.199.99.14|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.di.ens.fr/~lelarge/MNIST.tar.gz [following]
--2021-08-04 19:11:37-- https://www.di.ens.fr/~lelarge/MNIST.tar.gz
Connecting to www.di.ens.fr|129.199.99.14|:443... connected.
Unable to establish SSL connection.
tar: Error opening archive: Failed to open 'MNIST.tar.gz'
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-2-48b8be9a8697> in <module>
8 transform=transforms.Compose([
9 transforms.ToTensor(),
---> 10 ]), train=True)
c:\users\peng\anaconda3\envs\pytorch_1.8\lib\site-packages\torchvision\datasets\mnist.py in __init__(self, root, train, transform, target_transform, download)
80
81 if not self._check_exists():
---> 82 raise RuntimeError('Dataset not found.' +
83 ' You can use download=True to download it')
84
RuntimeError: Dataset not found. You can use download=True to download it
Have you tried to decompress the file first?
No, I just tried it in IPython (Anaconda, PyTorch 1.8) on Windows. But it worked on my MacBook. I don't know why.
@Borda, it seems like this issue has resurfaced.
I tried
and I get 403s.
Still getting this error with torchvision==0.16.2:
import torchvision

trainset = torchvision.datasets.MNIST(
    "./downloads/mnist",
    download=True,
    train=True,
)
This is still happening intermittently when reading the files from the original location. A workaround is to point at a different host:
< datasets_url = 'http://yann.lecun.com/exdb/mnist/'
> datasets_url = 'https://storage.googleapis.com/cvdf-datasets/mnist/'
The same files are present there and, in fact, in many other places, for example https://github.com/golbin/TensorFlow-MNIST/raw/master/mnist/data/. I think it's a sad case of the original site being either overrun or broken.
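Instead of swapping one URL for another, a download can try several hosts in order and stop at the first one that works. Below is a minimal standard-library sketch. `download_from_mirrors`, `md5_of` and the `MIRRORS` tuple are hypothetical names, and the two base URLs are the ones quoted in this comment. Either host may go offline or start returning 403. Passing the expected MD5, if you have it, guards against a host serving a different file.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional, Sequence

# Hosts quoted in this comment; neither is guaranteed to stay available
MIRRORS = (
    "https://storage.googleapis.com/cvdf-datasets/mnist/",
    "http://yann.lecun.com/exdb/mnist/",
)


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_from_mirrors(filename: str, dest_dir, mirrors: Sequence[str] = MIRRORS,
                          md5: Optional[str] = None) -> Path:
    """Try each mirror in turn until `filename` downloads (and matches `md5`, if given)."""
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    errors = []
    for base in mirrors:
        url = base + filename
        try:
            urllib.request.urlretrieve(url, dest)
        except OSError as exc:  # URLError and HTTPError (e.g. a 403) are OSError subclasses
            errors.append(f"{url}: {exc}")
            continue
        if md5 is None or md5_of(dest) == md5:
            return dest
        errors.append(f"{url}: checksum mismatch")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

For example, `download_from_mirrors("train-images-idx3-ubyte.gz", "data/MNIST/raw")` would place the file in the raw folder that the layout described earlier in this thread uses for `root='data'`.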
Still seeing this issue on macOS. None of the methods mentioned above worked. Code:
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, transform=transform)
Fixed using this Stack Overflow question:
Yes, I'm finding it impossible to bypass the 403 error. I've tried the custom headers mentioned above, but no luck.
I can bypass the 403 error on Colab, but then hit an HTTPError 302, which as I understand it means I am being redirected but the redirect fails. That may still be due to cookies not being set correctly.
🐛 Bug
I'm getting a 403 error when I try to download the MNIST dataset with torchvision 0.4.2.
To Reproduce
Environment
Additional context
https://app.circleci.com/jobs/github/PyTorchLightning/pytorch-lightning/6877