[CI] Bug report #656

Closed
withsmilo opened this issue Mar 25, 2019 · 4 comments


@withsmilo
Collaborator

withsmilo commented Mar 25, 2019

@simon-mo, I've listed the CI bugs here.

  1. https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1818/ -> PR [CI] Decrease test samples for test_stop_models case #657 -> merged
[integration_py2_admin_unit_test] 19-03-25:03:04:31 INFO     [clipper_admin.py:697] [admin-test-cluster-297] Successfully registered model jpj:iv 
 [integration_py2_admin_unit_test] 19-03-25:03:04:31 INFO     [clipper_admin.py:615] [admin-test-cluster-297] Done deploying model jpj:iv. 
 [integration_py2_admin_unit_test] 19-03-25:03:04:31 INFO     [docker_container_manager.py:353] [admin-test-cluster-297] Found 0 replicas for johnbohnam:i. Adding 1 
 [integration_py2_admin_unit_test] 19-03-25:03:04:49 INFO     [clipper_admin.py:697] [admin-test-cluster-297] Successfully registered model johnbohnam:i 
 [integration_py2_admin_unit_test] 19-03-25:03:04:49 INFO     [clipper_admin.py:615] [admin-test-cluster-297] Done deploying model johnbohnam:i. 
 [integration_py2_admin_unit_test] 19-03-25:03:04:49 INFO     [docker_container_manager.py:353] [admin-test-cluster-297] Found 0 replicas for johnbohnam:ii. Adding 1
[integration_py2_admin_unit_test] ERROR 
 [integration_py2_admin_unit_test] 19-03-25:03:05:50 INFO     [test_utils.py:75] Creating DockerContainerManager 
 [integration_py2_admin_unit_test] 19-03-25:03:05:50 INFO     [test_utils.py:80] Cleaning up Docker cluster admin-test-cluster-297 
  1. https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1824/ -> PR [CI] Add retry-routine to 'docker push' in clipper_docker.cfg.py #655 -> merged
    This error occurred because docker push failed with a TLS handshake timeout while contacting the registry.
 [publish_tf36-container] Head https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/dd/dde37b0b47901bfc6be0c2857a4e9a757175ad82246ed224bd405a02f7f0e020/data?verify=1553522760-379DR9XVMPr%2FkUMzi9C2M5iBO6I%3D: net/http: TLS handshake timeout 
CI_build.Makefile:626: recipe for target 'publish_tf36-container' failed
make: *** [publish_tf36-container] Error 1
make: *** Waiting for unfinished jobs....
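The retry-routine idea behind #655 can be sketched in Python as follows. This is a hypothetical helper, not the contents of clipper_docker.cfg.py: it reruns a command such as `docker push <image>` up to `max_attempts` times with exponential backoff, which is usually enough to ride out a transient failure like the TLS handshake timeout above. The parameter names and default values are assumptions.

```python
import subprocess
import time


def run_with_retry(cmd, max_attempts=3, base_delay=5.0, sleep=time.sleep):
    """Run `cmd` (a list of arguments), retrying on a non-zero exit code.

    Hypothetical helper for illustration; `max_attempts` and `base_delay`
    are assumed defaults, not values from the Clipper CI configuration.
    Returns the number of attempts used, or raises CalledProcessError
    if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt == max_attempts:
            # Out of attempts: surface the last exit code to the caller.
            raise subprocess.CalledProcessError(result.returncode, cmd)
        # Back off base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

For example, `run_with_retry(["docker", "push", image])` would retry the push twice before failing the make target. The `sleep` parameter exists only so the backoff can be skipped in tests.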
  1. https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1827/ -> PR [CI] Increase sleep time from 500us to 1000us in timers_test to avoid unintended errors #661 -> merged
 [unittest_libclipper] [ RUN      ] TimerSystemTests.SingleTimerExpire 
 [unittest_libclipper] /clipper/src/libclipper/test/timers_test.cpp:29: Failure 
 [unittest_libclipper] Value of: timer_future.isReady() 
 [unittest_libclipper] Actual: false 
 [unittest_libclipper] Expected: true 
 [unittest_libclipper] Uh oh 
 [unittest_libclipper] [  FAILED  ] TimerSystemTests.SingleTimerExpire (1 ms) 

 [unittest_libclipper] [ RUN      ] TimerSystemTests.OutOfOrderTimerExpire 
 [unittest_libclipper] /clipper/src/libclipper/test/timers_test.cpp:61: Failure 
 [unittest_libclipper] Value of: t2.isReady() 
 [unittest_libclipper] Actual: false 
 [unittest_libclipper] Expected: true 
 [unittest_libclipper] [  FAILED  ] TimerSystemTests.OutOfOrderTimerExpire (0 ms) 
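Both failures come from asserting that a timer has fired after a fixed sleep. On a loaded CI machine the timer thread may simply not have run yet, so #661 widens the margin from 500us to 1000us. A common alternative, shown here as a Python sketch rather than the C++ test code, is to poll the condition until a deadline: the test waits only as long as needed but still tolerates a slow scheduler. `wait_until` and its default timeout and interval are hypothetical.

```python
import threading
import time


def wait_until(predicate, timeout=1.0, interval=0.0005):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Returns True if the condition became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    # One last check so a condition that flips right at the deadline counts.
    return predicate()


# Example: a timer scheduled to fire after about 1 ms. Instead of sleeping
# a fixed amount and asserting, wait for it with a generous upper bound.
fired = threading.Event()
threading.Timer(0.001, fired.set).start()
assert wait_until(fired.is_set, timeout=1.0)
```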
  1. https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1829/ -> PR Change the RPCServicePort constant to the global configuration #660 -> merged
    I found a collision between run_rpc_container_tests and run_frontend_tests: the RPCService of the TaskExecutor in run_frontend_tests connected to the RPCService of the RPCTestContainer in run_rpc_container_tests, because both used the same RPCServicePort constant.
[unittest_frontend] [15:20:50.440][error]      [REDIS] Error with command "GET CURRENT_MODEL_VERSION:m": 
 [unittest_frontend] [15:20:50.441][error]      [REDIS] No versions found for model m 
 [unittest_frontend] [15:20:50.441][error] [QUERYFR...] Found model m with missing current version. 
 [unittest_frontend] [15:20:50.515][info]         [RPC] Found message to receive 
 [unittest_frontend] [15:20:50.515][info]         [RPC] Found message to receive 
 [unittest_frontend] [15:20:50.515][info]         [RPC] New container connected 
 [unittest_frontend] [15:20:50.515][info]         [RPC] Container added 
 [unittest_frontend] [15:20:50.516][info]       [REDIS] Successfully issued command "SELECT 3" 
 [unittest_frontend] [15:20:50.517][info]       [REDIS] MESSAGE: hset 
 [unittest_frontend] [15:20:50.517][info]       [REDIS] Successfully issued command "HMSET rpctest_py,1,0 model_id rpctest_py:1 model_name rpctest_py model_version 1 model_replica_id 0 zmq_connection_id 0 batch_size 1 input_type doubles" 
 [unittest_frontend] [15:20:50.517][info]  [THREADPOOL] Work queue created for model rpctest_py:1, replica 0 
 [unittest_frontend] [15:20:50.517][info]       [REDIS] Successfully issued command "SELECT 3" 
 [unittest_frontend] [15:20:50.517][info]       [REDIS] Successfully issued command "HGETALL rpctest_py,1,0" 
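Once the RPC port is a configuration value instead of a constant (the change in #660), a test harness still has to choose values that do not clash. One way to do that, sketched in Python and not taken from the Clipper code base, is to let the OS assign an unused port by binding to port 0. `find_free_port` is a hypothetical helper.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host`.

    Binding to port 0 makes the kernel pick a free ephemeral port. The
    socket is closed before returning, so another process could in
    principle take the port first; test harnesses usually accept that race.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```

Each suite (run_frontend_tests, run_rpc_container_tests) would then be configured with its own result of `find_free_port()`, so a TaskExecutor from one suite cannot reach the RPCTestContainer of the other.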
  1. https://amplab.cs.berkeley.edu/jenkins/job/Clipper-PRB/1830/ -> PR [CI] Remove the dependency about Debian jessie backport #658 -> merged
    The Debian jessie-backports repository appears to be deprecated: its Release file is missing and the package indexes return 404.
 [pyspark-container] W: The repository 'http://http.debian.net/debian jessie-backports Release' does not have a Release file. 
 [pyspark-container] E: Failed to fetch http://http.debian.net/debian/dists/jessie-backports/main/binary-amd64/Packages  404  Not Found 
 [pyspark-container] E: Failed to fetch http://cdn-fastly.deb.debian.org/debian/dists/jessie-backports/main/binary-all/Packages  404  Not Found 
 [pyspark-container] E: Some index files failed to download. They have been ignored, or old ones used instead. 
  1. https://amplab.cs.berkeley.edu/jenkins/job/Clipper-PRB/1837 -> PR [CI] Update the apt package list before installing openjdk #659 -> merged
 [pyspark-container] Fetched 111 MB in 16s (6667 kB/s) 
 [pyspark-container] E: Failed to fetch http://security.debian.org/debian-security/pool/updates/main/t/tiff/libtiff5_4.0.8-2+deb9u2_amd64.deb  404  Not Found 
 [pyspark-container] E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? 
 [pyspark-container] The command '/bin/sh -c mkdir -p /usr/share/man/man1 &&     apt-get install openjdk-8-jre openjdk-8-jdk-headless -y' returned a non-zero code: 100 
CI_build.Makefile:646: recipe for target 'pyspark-container' failed
make: *** [pyspark-container] Error 100
make: *** Waiting for unfinished jobs....
@withsmilo
Collaborator Author

All PRs have finally been merged! Thank you, @simon-mo and @RehanSD.

@withsmilo
Collaborator Author

withsmilo commented Mar 28, 2019

  1. https://amplab.cs.berkeley.edu/jenkins/job/Clipper-PRB/1852 -> Retry when docker push fails #662 -> merged
    This error occurred because docker push failed with a read timeout (ReadTimeoutError).
[integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [deployer_utils.py:41] Saving function to /tmp/tmpjm_LX8clipper 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [deployer_utils.py:51] Serialized and supplied predict function 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [tensorflow.py:196] TensorFlow model saved at: /tmp/tmpjm_LX8clipper/tfmodel/model.ckpt 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [tensorflow.py:270] Using Python 2 base image 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:472] [tf-3268] Building model Docker image with model data from /tmp/tmpjm_LX8clipper 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:477] [tf-3268] Step 1/2 : FROM clippertesting/tf-container:440f2a24f5 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:477] [tf-3268]  ---> eab63c5e3116 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:477] [tf-3268] Step 2/2 : COPY /tmp/tmpjm_LX8clipper /model/ 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:477] [tf-3268]  ---> fd2068f92693 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:477] [tf-3268] Successfully built fd2068f92693 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:477] [tf-3268] Successfully tagged tf-3268-tensorflow-model:1 
 [integration_py2_tensorflow] 19-03-28:09:51:58 INFO     [clipper_admin.py:479] [tf-3268] Pushing model Docker image to tf-3268-tensorflow-model:1 
.....
 [integration_py2_tensorflow] 19-03-28:09:52:59 ERROR    [deploy_tensorflow_models.py:228] Exception 
 [integration_py2_tensorflow] Traceback (most recent call last): 
 [integration_py2_tensorflow] File "/clipper/integration-tests/deploy_tensorflow_models.py", line 188, in <module> 
 [integration_py2_tensorflow] clipper_conn, sess, version, "integers", link_model=True) 
 [integration_py2_tensorflow] File "/clipper/integration-tests/deploy_tensorflow_models.py", line 75, in deploy_and_test_model 
 [integration_py2_tensorflow] predict_fn, sess) 
 [integration_py2_tensorflow] File "/clipper/clipper_admin/clipper_admin/deployers/tensorflow.py", line 293, in deploy_tensorflow_model 
 [integration_py2_tensorflow] registry, num_replicas, batch_size, pkgs_to_install) 
 [integration_py2_tensorflow] File "/clipper/clipper_admin/clipper_admin/clipper_admin.py", line 355, in build_and_deploy_model 
 [integration_py2_tensorflow] container_registry, pkgs_to_install) 
 [integration_py2_tensorflow] File "/clipper/clipper_admin/clipper_admin/clipper_admin.py", line 480, in build_model 
 [integration_py2_tensorflow] for line in docker_client.images.push(repository=image, stream=True): 
 [integration_py2_tensorflow] File "/usr/local/lib/python2.7/dist-packages/docker/api/client.py", line 307, in _stream_helper 
 [integration_py2_tensorflow] data = reader.read(1) 
 [integration_py2_tensorflow] File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line 401, in read 
 [integration_py2_tensorflow] raise IncompleteRead(self._fp_bytes_read, self.length_remaining) 
 [integration_py2_tensorflow] File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__ 
 [integration_py2_tensorflow] self.gen.throw(type, value, traceback) 
 [integration_py2_tensorflow] File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line 307, in _error_catcher 
 [integration_py2_tensorflow] raise ReadTimeoutError(self._pool, None, 'Read timed out.') 
 [integration_py2_tensorflow] ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. 
 [integration_py2_tensorflow] 19-03-28:09:52:59 INFO     [test_utils.py:75] Creating DockerContainerManager 
 [integration_py2_tensorflow] 19-03-28:09:52:59 INFO     [test_utils.py:80] Cleaning up Docker cluster tf-3268 
.....
 [integration_py2_tensorflow] 19-03-28:09:53:03 INFO     [clipper_admin.py:1283] [tf-3268] Stopped all Clipper cluster and all model containers 
 [integration_py2_tensorflow] Cost , Accuracy 
 [integration_py2_tensorflow] [6659.0625, 0.63461536] 
 [integration_py2_tensorflow] Cost , Accuracy 
 [integration_py2_tensorflow] [0.0, 1.0] 
 [integration_py2_tensorflow] Cost , Accuracy 
 [integration_py2_tensorflow] [0.0, 1.0] 
 [integration_py2_tensorflow] Cost , Accuracy 
 [integration_py2_tensorflow] [0.0, 1.0] 
 [integration_py2_tensorflow] Cost , Accuracy 
 [integration_py2_tensorflow] [0.0, 1.0] 
 [integration_py2_tensorflow] Starting Trial 0 with timeout 2400.0 seconds 
 [integration_py2_tensorflow] Starting Trial 1 with timeout 2400.0 seconds 
 [integration_py2_tensorflow] All retry failed. 
CI_test.Makefile:134: recipe for target 'integration_py2_tensorflow' failed
make: *** [integration_py2_tensorflow] Error 1

@withsmilo
Collaborator Author

withsmilo commented Mar 31, 2019

  1. https://amplab.cs.berkeley.edu/jenkins/job/Clipper-PRB/1870 -> Catch more errors when pushing a Docker image #663 -> merged
    Pushing a Docker image raised urllib3's ReadTimeoutError, which the retry decorator (f_retry) did not catch.
 [integration_py3_pyspark] Traceback (most recent call last): 
 [integration_py3_pyspark] File "/clipper/integration-tests/deploy_pyspark_models.py", line 171, in <module> 
 [integration_py3_pyspark] deploy_and_test_model(sc, clipper_conn, svm_model, version) 
 [integration_py3_pyspark] File "/clipper/integration-tests/deploy_pyspark_models.py", line 64, in deploy_and_test_model 
 [integration_py3_pyspark] predict_fn, model, sc) 
 [integration_py3_pyspark] File "/clipper/clipper_admin/clipper_admin/deployers/pyspark.py", line 264, in deploy_pyspark_model 
 [integration_py3_pyspark] registry, num_replicas, batch_size, pkgs_to_install) 
 [integration_py3_pyspark] File "/clipper/clipper_admin/clipper_admin/clipper_admin.py", line 411, in build_and_deploy_model 
 [integration_py3_pyspark] container_registry, pkgs_to_install) 
 [integration_py3_pyspark] File "/clipper/clipper_admin/clipper_admin/clipper_admin.py", line 541, in build_model 
 [integration_py3_pyspark] _push_model() 
 [integration_py3_pyspark] File "/clipper/clipper_admin/clipper_admin/decorators.py", line 31, in f_retry 
 [integration_py3_pyspark] return f(*args, **kwargs) 
 [integration_py3_pyspark] File "/clipper/clipper_admin/clipper_admin/clipper_admin.py", line 539, in _push_model 
 [integration_py3_pyspark] for line in docker_client.images.push(repository=image, stream=True): 
 [integration_py3_pyspark] File "/usr/local/lib/python3.5/dist-packages/docker/api/client.py", line 307, in _stream_helper 
 [integration_py3_pyspark] data = reader.read(1) 
 [integration_py3_pyspark] File "/usr/local/lib/python3.5/dist-packages/urllib3/response.py", line 459, in read 
 [integration_py3_pyspark] raise IncompleteRead(self._fp_bytes_read, self.length_remaining) 
 [integration_py3_pyspark] File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__ 
 [integration_py3_pyspark] self.gen.throw(type, value, traceback) 
 [integration_py3_pyspark] File "/usr/local/lib/python3.5/dist-packages/urllib3/response.py", line 365, in _error_catcher 
 [integration_py3_pyspark] raise ReadTimeoutError(self._pool, None, 'Read timed out.') 
 [integration_py3_pyspark] urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. 
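The traceback shows the push already wrapped in a retry decorator (`f_retry` in decorators.py), yet the ReadTimeoutError still escaped, which suggests the decorator was not catching that exception type. The Python sketch below illustrates the general pattern and is not Clipper's decorators.py: a retry decorator retries only the exception types it is given, so a urllib3 timeout must be in that tuple. `ReadTimeoutError` here is a local stand-in for `urllib3.exceptions.ReadTimeoutError` so the example runs without urllib3, and `push_model` is hypothetical.

```python
import functools
import time


class ReadTimeoutError(Exception):
    """Stand-in for urllib3.exceptions.ReadTimeoutError (assumption)."""


def retry(exceptions, tries=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry the wrapped call when it raises one of `exceptions`.

    Any other exception propagates immediately. After `tries` failed
    attempts, the last exception is re-raised.
    """
    def decorator(f):
        @functools.wraps(f)
        def f_retry(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return f(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise
                    sleep(wait)
                    wait *= backoff
        return f_retry
    return decorator


calls = []


@retry((ReadTimeoutError,), tries=3, sleep=lambda s: None)
def push_model():
    # Hypothetical push that times out twice, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise ReadTimeoutError("Read timed out.")
    return "pushed"


print(push_model())  # prints "pushed" after two retried timeouts
```

If `ReadTimeoutError` were left out of the tuple, the first timeout would propagate unchanged, which matches the traceback above.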

@withsmilo
Collaborator Author

withsmilo commented Apr 1, 2019

  1. https://amplab.cs.berkeley.edu/jenkins/job/Clipper-PRB/1874 -> Fix some bugs of PySpark integration test & Disable Spark UI #665
 [integration_py2_pysparkml] Traceback (most recent call last): 
 [integration_py2_pysparkml] File "/clipper/integration-tests/deploy_pyspark_sparkml_models.py", line 166, in <module> 
 [integration_py2_pysparkml] log_docker(clipper_conn) 
 [integration_py2_pysparkml] NameError: name 'clipper_conn' is not defined 
 [integration_py2_pysparkml] Unhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Thread-1, stopped daemon 139646541420288)>> 
 [integration_py2_pysparkml] Traceback (most recent call last): 
 [integration_py2_pysparkml] File "/usr/lib/python2.7/threading.py", line 774, in __bootstrap 
 [integration_py2_pysparkml] self.__bootstrap_inner() 
 [integration_py2_pysparkml] File "/usr/lib/python2.7/threading.py", line 814, in __bootstrap_inner 
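The NameError comes from `log_docker(clipper_conn)` running when `clipper_conn` was never assigned. If that call sits in an error handler that can be reached before the connection is created, the NameError masks the original failure. A common Python pattern, sketched here under that assumption, is to bind the name to `None` up front and guard the diagnostic call. `create_connection` and `log_docker` below are hypothetical stand-ins, not the integration test's code.

```python
import logging

logger = logging.getLogger(__name__)


def create_connection():
    # Hypothetical stand-in that fails before any connection exists.
    raise RuntimeError("setup failed")


def log_docker(conn):
    # Hypothetical stand-in for dumping the Docker container logs.
    logger.info("dumping docker logs for %s", conn)


def main():
    clipper_conn = None  # bound up front so the handler can always see it
    try:
        clipper_conn = create_connection()
        # ... deploy and test models ...
    except Exception:
        logger.exception("integration test failed")
        if clipper_conn is not None:
            log_docker(clipper_conn)
        raise  # re-raise the original error instead of a NameError
```

Calling `main()` now fails with the underlying RuntimeError, so the real cause shows up in the CI log.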

Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant