TensorFlow job in Bazel CI is red but only in Ubuntu 14.04 #4928

Closed
lfpino opened this issue Mar 28, 2018 · 15 comments
Assignees
Labels
breakage P1 I'll work on this now. (Assignee required)

Comments

@lfpino
Contributor

lfpino commented Mar 28, 2018

The TensorFlow job is failing with:

ERROR: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/817efe17b83eb78b343080fc9971c47d/external/local_config_python/BUILD:166:1: Couldn't build file external/local_config_python/numpy_include/numpy/__multiarray_api.h: Executing genrule @local_config_python//:numpy_include failed: Output bazel-out/k8-opt/genfiles/external/local_config_python/numpy_include/numpy/__multiarray_api.h is a symbolic link. Only regular files and directories may be uploaded to a remote cache. Change the file type or add the "no-cache" tag/execution requirement.

Looks like a remote execution issue, assigning to @buchgr to diagnose.

Full log in: https://buildkite.com/bazel/bazel-with-downstream-projects-bazel/builds/179#415760a7-1156-4b75-897e-c2ed29db585a

@lfpino lfpino added P1 I'll work on this now. (Assignee required) breakage labels Mar 28, 2018
@buchgr
Contributor

buchgr commented Mar 28, 2018

This is due to a recent change to remote caching: we can no longer cache actions that generate symlinks as outputs. I'll take a look.
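
To see which outputs would trip this check, one option is to scan an output tree for symbolic links. The following is a minimal sketch, not Bazel's implementation: `find_symlink_outputs` is a hypothetical helper, and the directory passed to it is an assumption about the local setup.

```python
import os


def find_symlink_outputs(root):
    """Return sorted paths (relative to root) that are symbolic links.

    Hypothetical diagnostic helper; it is not part of Bazel.
    """
    found = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in dirnames, so they are checked here too.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append(os.path.relpath(path, root))
    return sorted(found)
```

For the failure above, a plausible call would be `find_symlink_outputs("bazel-out/k8-opt/genfiles/external/local_config_python")`, run from the execroot after a local build.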

@buchgr
Contributor

buchgr commented Mar 28, 2018

cc @meteorcloudy

@meteorcloudy
Member

It has only been failing since Wednesday, so it should be caused by a change in the commit range (b8765a6, 73088a8].
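
In git revision syntax, the half-open range (b8765a6, 73088a8] is written `b8765a6..73088a8`, meaning commits reachable from 73088a8 but not from b8765a6. A small sketch for listing the candidate commits, assuming it runs inside a Bazel checkout with `git` on the PATH; both helper names are hypothetical.

```python
import subprocess


def range_log_command(excluded, included):
    """Build the git command that lists commits in (excluded, included]."""
    return ["git", "log", "--oneline", "%s..%s" % (excluded, included)]


def list_candidates(excluded, included, repo_dir="."):
    """Run the command in repo_dir and return one line per candidate commit."""
    out = subprocess.check_output(
        range_log_command(excluded, included), cwd=repo_dir)
    return out.decode("utf-8").splitlines()
```

Calling `list_candidates("b8765a6", "73088a8")` would return the commits to inspect or bisect over.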

@meteorcloudy
Member

This is a regression, so it will be a release blocker for 0.13.0, but not for 0.12.0, because that release was cut before the breaking change.

@lfpino
Contributor Author

lfpino commented Mar 29, 2018

TF is back to green today (https://buildkite.com/bazel/bazel-with-downstream-projects-bazel/builds/180) but we still need to investigate what's going on here.

@vladmos
Member

vladmos commented Apr 6, 2018

@jin
Member

jin commented Apr 13, 2018

TF job is red for both 14.04 and 16.04 now, with a different error.

ERROR: /var/lib/buildkite-agent/builds/buildkite-ubuntu1604-xddx-1/bazel-downstream-projects/tensorflow/tensorflow/tools/api/generator/BUILD:27:1: Couldn't build file tensorflow/tools/api/generator/api/__init__.py: Executing genrule //tensorflow/tools/api/generator:python_api_gen failed (Exit 1)
Traceback (most recent call last):
  File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/6b4c5d5813aa5928a7e6e5dae494f9c5/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/tools/api/generator/create_python_api.runfiles/org_tensorflow/tensorflow/tools/api/generator/create_python_api.py", line 26, in <module>
    from tensorflow.python.util import tf_decorator
  File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/6b4c5d5813aa5928a7e6e5dae494f9c5/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/tools/api/generator/create_python_api.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 63, in <module>
    from tensorflow.python.framework.framework_lib import *  # pylint: disable=redefined-builtin
  File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/6b4c5d5813aa5928a7e6e5dae494f9c5/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/tools/api/generator/create_python_api.runfiles/org_tensorflow/tensorflow/python/framework/framework_lib.py", line 104, in <module>
    from tensorflow.python.framework.importer import import_graph_def
  File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/6b4c5d5813aa5928a7e6e5dae494f9c5/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/tools/api/generator/create_python_api.runfiles/org_tensorflow/tensorflow/python/framework/importer.py", line 32, in <module>
    from tensorflow.python.framework import function
  File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/6b4c5d5813aa5928a7e6e5dae494f9c5/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/tools/api/generator/create_python_api.runfiles/org_tensorflow/tensorflow/python/framework/function.py", line 37, in <module>
    from tensorflow.python.ops import variable_scope as vs
  File "/var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/6b4c5d5813aa5928a7e6e5dae494f9c5/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/tools/api/generator/create_python_api.runfiles/org_tensorflow/tensorflow/python/ops/variable_scope.py", line 24, in <module>
    import enum  # pylint: disable=g-bad-import-order
ImportError: No module named enum

14.04: https://buildkite.com/bazel/bazel-with-downstream-projects-bazel/builds/197#32044fa3-fa5f-49ac-b157-6e59b8e8df04
16.04: https://buildkite.com/bazel/bazel-with-downstream-projects-bazel/builds/197#7aaa1cc0-0dd9-4a42-baf1-6479e8cc93d4
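
The unquoted `No module named enum` wording is the form Python 2 uses for this error. `enum` joined the standard library in Python 3.4, and on Python 2.7 it is only available through a backport such as the `enum34` package. A quick check of a given interpreter, as a sketch: `has_module` is a name made up for illustration, and that the CI workers run the genrule under Python 2 is an assumption drawn from the message format.

```python
import importlib
import sys


def has_module(name):
    """Return True if this interpreter can import the named module."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Report the interpreter version next to the result, since the answer
    # depends on which Python the build actually invokes.
    print("python %d.%d, enum importable: %s"
          % (sys.version_info[0], sys.version_info[1], has_module("enum")))
```

Running this with the same interpreter the build uses shows whether the import in `variable_scope.py` can succeed there.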

@jin
Member

jin commented Apr 13, 2018

TF_serving has updated their TF commit hash, but now they're running into the no-cache issue:

ERROR: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/83a97a49514b3c6bb07710cea33f38f9/external/local_config_python/BUILD:166:1: Couldn't build file external/local_config_python/numpy_include/numpy/__multiarray_api.h: Executing genrule @local_config_python//:numpy_include failed: Output bazel-out/k8-fastbuild/genfiles/external/local_config_python/numpy_include/numpy/__multiarray_api.h is a symbolic link. Only regular files and directories may be uploaded to a remote cache. Change the file type or add the "no-cache" tag/execution requirement.
ERROR: /var/lib/buildkite-agent/.cache/bazel/_bazel_buildkite-agent/83a97a49514b3c6bb07710cea33f38f9/external/local_config_python/BUILD:39:1: Couldn't build file external/local_config_python/python_include/Python-ast.h: Executing genrule @local_config_python//:python_include failed: Output bazel-out/k8-fastbuild/genfiles/external/local_config_python/python_include/Python-ast.h is a symbolic link. Only regular files and directories may be uploaded to a remote cache. Change the file type or add the "no-cache" tag/execution requirement.

14.04: https://buildkite.com/bazel/bazel-with-downstream-projects-bazel/builds/197#5a04d42a-f730-4670-9852-7e763d9ef12e
16.04: https://buildkite.com/bazel/bazel-with-downstream-projects-bazel/builds/197#5c639bd6-1df7-4f42-816b-638a208f0d41

@meteorcloudy
Member

Ping @buchgr

@meteorcloudy
Member

Any progress on this? @buchgr
Are we making the "no-cache" error for symlinks a warning instead of an error?

@buchgr
Contributor

buchgr commented Apr 19, 2018

I rolled back fa36d2f48965b127e8fd397348d16e991135bfb6 yesterday. Can you cherry-pick this commit into the release?

@meteorcloudy
Member

Thanks, I'll cherry-pick it!

@arthur309

@jin Is there any way to solve the build problem of "python_api_gen failed" on 14.04?

@jin
Member

jin commented Apr 28, 2018

@arthur309 sorry, I don't know what you're referring to. Is that error related to the remote execution issue in this thread?

@meteorcloudy
Member

This issue has been resolved by fa36d2f


6 participants