This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Flaky OnnxBackendRealModelTest.test_bvlc_alexnet_cpu - connection timeout #12049

Closed
marcoabreu opened this issue Aug 6, 2018 · 6 comments · Fixed by #12633

Comments

@marcoabreu
Contributor

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1330/pipeline

________________ OnnxBackendRealModelTest.test_bvlc_alexnet_cpu ________________

test_self = <mxnet_backend_test.OnnxBackendRealModelTest testMethod=test_bvlc_alexnet_cpu>
device = 'CPU'

    def run(test_self, device):  # type: (Any, Text) -> None
        if model_test.model_dir is None:
>           model_dir = self._prepare_model_data(model_test)

/usr/local/lib/python3.5/dist-packages/onnx/backend/test/runner/__init__.py:239:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.5/dist-packages/onnx/backend/test/runner/__init__.py:182: in _prepare_model_data
    urlretrieve(model_test.url, download_file.name)
/usr/lib/python3.5/urllib/request.py:217: in urlretrieve
    block = fp.read(bs)
/usr/lib/python3.5/http/client.py:448: in read
    n = self.readinto(b)
/usr/lib/python3.5/http/client.py:488: in readinto
    n = self.fp.readinto(b)
/usr/lib/python3.5/socket.py:575: in readinto
    return self._sock.recv_into(b)
/usr/lib/python3.5/ssl.py:929: in recv_into
    return self.read(nbytes, buffer)
/usr/lib/python3.5/ssl.py:791: in read
    return self._sslobj.read(len, buffer)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <ssl.SSLObject object at 0x7fc9b0117908>, len = 4490
buffer = <memory at 0x7fc9b26a3288>

    def read(self, len=1024, buffer=None):
        """Read up to 'len' bytes from the SSL object and return them.

        If 'buffer' is provided, read into this buffer and return the number of
        bytes read.
        """
        if buffer is not None:
>           v = self._sslobj.read(len, buffer)
E           ConnectionResetError: [Errno 104] Connection reset by peer

/usr/lib/python3.5/ssl.py:575: ConnectionResetError
----------------------------- Captured stdout call -----------------------------
Start downloading model bvlc_alexnet from https://s3.amazonaws.com/download.onnx/models/opset_7/bvlc_alexnet.tar.gz
Failed to prepare data for model bvlc_alexnet: [Errno 104] Connection reset by peer
============= 1 failed, 189 passed, 656 skipped in 500.47 seconds ==============

@lanking520
Member

Hi @marcoabreu, which PR triggered this? I cannot find the corresponding code, and this failure looks like a network connection problem.

@Roshrini
Member

Roshrini commented Aug 7, 2018

@lanking520 This test comes from the ONNX backend test suite, so the model download code lives in the ONNX repo:
https://github.com/onnx/onnx/blob/master/onnx/backend/test/runner/__init__.py#L178
We may need to change this code to fix it.
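For context, the traceback shows the runner calling `urlretrieve(model_test.url, download_file.name)` inside `_prepare_model_data`, so any network hiccup during the download fails the test before it ever touches MXNet. Below is a minimal sketch of that download-and-extract step, assuming the model ships as a `.tar.gz` archive as in the captured stdout. `prepare_model_dir` is a hypothetical name, and this is an approximation for illustration, not the ONNX source.

```python
import os
import tarfile
import tempfile
from urllib.request import urlretrieve


def prepare_model_dir(url, models_root):
    """Hypothetical helper: download a model tarball and extract it.

    Approximates the step seen in the traceback (urlretrieve into a
    temporary file). Any network error, such as ConnectionResetError,
    propagates to the caller, which is how the flaky test fails.
    """
    os.makedirs(models_root, exist_ok=True)
    # Reserve a temporary file name; the file is closed so urlretrieve can write to it.
    with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
        tmp_path = tmp.name
    try:
        urlretrieve(url, tmp_path)
        # Unpack the gzipped tarball into the models directory.
        with tarfile.open(tmp_path, "r:gz") as tar:
            tar.extractall(models_root)
    finally:
        # Clean up the partial or complete download either way.
        os.remove(tmp_path)
    return models_root
```

Because nothing here retries, a single reset connection during `urlretrieve` is enough to fail the whole test.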

@Roshrini
Member

Roshrini commented Aug 7, 2018

In CI, we are using ONNX 1.2.1, since that is the version we currently support. It seems a retry mechanism was added on the master branch, but it is not available in 1.2.1:
https://github.com/onnx/onnx/blob/rel-1.2.1/onnx/backend/test/runner/__init__.py#L177
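As a rough illustration of the retry idea mentioned above, the sketch below retries a download with exponential backoff when it hits a transient network error such as the `ConnectionResetError` in the log. It is not the ONNX implementation; `download_with_retry`, `retries`, `base_delay`, and the injectable `fetch` parameter are hypothetical names chosen for this example.

```python
import socket
import time
import urllib.error
from urllib.request import urlretrieve

# Errors treated as transient. ConnectionResetError ([Errno 104]) is a
# ConnectionError; socket.timeout is listed separately because on older
# Pythons (e.g. 3.5 in this CI) it is not a TimeoutError subclass.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError, socket.timeout, urllib.error.URLError)


def download_with_retry(url, dest, retries=3, base_delay=1.0, fetch=urlretrieve):
    """Hypothetical helper: call fetch(url, dest), retrying transient failures.

    Sleeps base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once all attempts are used up. Any other
    exception propagates immediately without a retry.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url, dest)
        except TRANSIENT_ERRORS:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Here `retries=3` means up to four attempts in total. Because `urllib.error.HTTPError` subclasses `URLError`, a permanent HTTP error such as a 404 is also retried; a stricter version could re-raise it immediately.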

@lupesko
Contributor

lupesko commented Aug 9, 2018

@Roshrini Is the retry logic in ONNX 1.2.2? Should we update MXNet master to use ONNX 1.2.2?

@Roshrini
Member

Roshrini commented Aug 9, 2018

@lupesko Upgrading to ONNX 1.2.2 might require changes to operators. I haven't checked how many operator definitions or other things have changed, but I can look into whether the upgrade is easy.

@Roshrini
Member

Opened PR #12633 to upgrade the ONNX version in CI.

This issue will be fixed by that PR.
