
Python import errors when loading multiple libraries under same namespace #145

Closed
scele opened this issue Sep 20, 2021 · 12 comments
Labels: bug, resolved-next-release

Comments

@scele

scele commented Sep 20, 2021

Issue Description

If two libraries share a common root import path, they cannot both be imported at the same time.

This seems to be because the python_proto_library puts the generated _pb2.py files under a deeply nested path:

bazel-bin/foo.runfiles/__main__/protos/__init__.py
bazel-bin/foo.runfiles/__main__/protos/foo_py_proto_pb/__init__.py
bazel-bin/foo.runfiles/__main__/protos/foo_py_proto_pb/protos/__init__.py
bazel-bin/foo.runfiles/__main__/protos/foo_py_proto_pb/protos/foo_pb2.py
bazel-bin/foo.runfiles/__main__/protos/bar_py_proto_pb/__init__.py
bazel-bin/foo.runfiles/__main__/protos/bar_py_proto_pb/protos/__init__.py
bazel-bin/foo.runfiles/__main__/protos/bar_py_proto_pb/protos/bar_pb2.py

python_proto_library then adds both __main__/protos/foo_py_proto_pb and __main__/protos/bar_py_proto_pb to PYTHONPATH.

However, if I try to do:

from protos import foo_pb2
from protos import bar_pb2

The first import works, because the protos package resolves under __main__/protos/foo_py_proto_pb, which does contain foo_pb2.py. However, the second import fails, because protos still resolves under __main__/protos/foo_py_proto_pb (instead of __main__/protos/bar_py_proto_pb), which does not contain bar_pb2.py.

Repro script:

#!/bin/bash
set -exo pipefail

rm -rf bazel_test
mkdir bazel_test
cd bazel_test
cat > WORKSPACE <<EOF
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_proto_grpc",
    sha256 = "7954abbb6898830cd10ac9714fbcacf092299fda00ed2baf781172f545120419",
    strip_prefix = "rules_proto_grpc-3.1.1",
    urls = ["https://github.com/rules-proto-grpc/rules_proto_grpc/archive/3.1.1.tar.gz"],
)

load("@rules_proto_grpc//:repositories.bzl", "rules_proto_grpc_toolchains", "rules_proto_grpc_repos")
rules_proto_grpc_toolchains()
rules_proto_grpc_repos()

load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")
rules_proto_dependencies()
rules_proto_toolchains()
EOF
cat > foo.py <<EOF
from protos import foo_pb2
from protos import bar_pb2
EOF
cat > BUILD <<EOF
py_test(
    name = "foo",
    srcs = [
        "foo.py",
    ],
    deps = [
        "//protos:foo_py_proto",
        "//protos:bar_py_proto",
    ],
)
EOF
mkdir protos
cat > protos/foo.proto <<EOF
syntax = "proto3";

package protos.foo;

message Foo {
  int32 field = 1;
}
EOF
cat > protos/bar.proto <<EOF
syntax = "proto3";

package protos.bar;

message Bar {
  int32 field = 1;
}
EOF

cat > protos/BUILD <<EOF
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_proto_grpc//python:python_proto_library.bzl", "python_proto_library")

proto_library(
    name = "foo_proto",
    srcs = ["foo.proto"],
)

python_proto_library(
    name = "foo_py_proto",
    protos = [":foo_proto"],
    visibility = ["//visibility:public"],
)

proto_library(
    name = "bar_proto",
    srcs = ["bar.proto"],
)

python_proto_library(
    name = "bar_py_proto",
    protos = [":bar_proto"],
    visibility = ["//visibility:public"],
)
EOF

bazel version
bazel clean
bazel test :foo --test_output=all

Log Output

No response

rules_proto_grpc Version

3.1.1

Bazel Version

4.2.1

OS

Ubuntu 18.04.3 LTS

Link to Demo Repo

No response

WORKSPACE Content

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_proto_grpc",
    sha256 = "7954abbb6898830cd10ac9714fbcacf092299fda00ed2baf781172f545120419",
    strip_prefix = "rules_proto_grpc-3.1.1",
    urls = ["https://github.com/rules-proto-grpc/rules_proto_grpc/archive/3.1.1.tar.gz"],
)

load("@rules_proto_grpc//:repositories.bzl", "rules_proto_grpc_toolchains", "rules_proto_grpc_repos")
rules_proto_grpc_toolchains()
rules_proto_grpc_repos()

load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")
rules_proto_dependencies()
rules_proto_toolchains()

BUILD Content

# BUILD 
py_test(
    name = "foo",
    srcs = [
        "foo.py",
    ],
    deps = [
        "//protos:foo_py_proto",
        "//protos:bar_py_proto",
    ],
)


# protos/BUILD 
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_proto_grpc//python:python_proto_library.bzl", "python_proto_library")

proto_library(
    name = "foo_proto",
    srcs = ["foo.proto"],
)

python_proto_library(
    name = "foo_py_proto",
    protos = [":foo_proto"],
    visibility = ["//visibility:public"],
)

proto_library(
    name = "bar_proto",
    srcs = ["bar.proto"],
)

python_proto_library(
    name = "bar_py_proto",
    protos = [":bar_proto"],
    visibility = ["//visibility:public"],
)

Proto Content

No response

Any Other Content

# foo.py
from protos import foo_pb2
from protos import bar_pb2

scele added the bug label Sep 20, 2021
@scele
Author

scele commented Sep 20, 2021

This may be the same issue as #124 which was closed due to inactivity.

@aaliddell
Member

aaliddell commented Sep 20, 2021

Have you tried with legacy_create_init=False on the py_test?
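
For the repro above, that would look roughly like this (a sketch of the same py_test with only the attribute added):

py_test(
    name = "foo",
    srcs = ["foo.py"],
    legacy_create_init = False,  # don't auto-generate __init__.py files in the runfiles tree
    deps = [
        "//protos:foo_py_proto",
        "//protos:bar_py_proto",
    ],
)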

@scele
Author

scele commented Sep 20, 2021

Oh, that fixes it! That's great, thank you! @aaliddell, is it a deliberate design decision of rules_proto_grpc that it requires legacy_create_init=False? If so, I guess this ticket can be closed.

Hmm... Let me evaluate how hard it would be to set --incompatible_default_to_explicit_init_py for our code base.

For future reference, I had initially been hacking things to work with a patch like this:

--- ./rules_proto_grpc/internal/compile.bzl       2021-04-06 02:50:34.000000000 +0300
+++ ./rules_proto_grpc/internal/compile.bzl     2021-09-20 19:09:13.569543935 +0300
@@ -454,13 +454,14 @@
         # Otherwise, if we only have output files, build the output tree by
         # aggregating files created by aspect into one directory

-        output_root = get_package_root(ctx) + "/" + ctx.label.name
+        output_root = get_package_root(ctx)

         for output_files_dict in output_files_dicts:
             for root, files in output_files_dict.items():
                 for file in files.to_list():
                     # Strip root from file path
                     path = strip_path_prefix(file.path, root)
+                    path = strip_path_prefix(path, ctx.label.package)

                     # Prepend prefix path if given
                     if prefix_path:
@@ -470,7 +471,7 @@  
                     final_output_files_list.append(copy_file(
                         ctx,
                         file,
-                        "{}/{}".format(ctx.label.name, path),
+                        path,
                     ))

         final_output_files[output_root] = depset(direct = final_output_files_list)

@aaliddell
Member

No problem, thanks for the detailed report, it helps track these things quickly 👍

We don't require it to be set either way in general, but for cases like yours where paths overlap it has to be disabled for modules to import as you expect. It's a rather frustrating behaviour of Bazel that catches a lot of people out when using Python, myself included. Why it is still the default I do not know; there's a bug report open to flip it that has never been applied. Personally, I always set it to False for all Python targets or flip that flag.

I think I'll add something to the docs to comment about this as a resolution, since it has come up a couple of times.

aaliddell added the resolved-next-release label Sep 20, 2021
@scele
Author

scele commented Sep 20, 2021

Hmm, based on a quick try, looks like there are some corner cases where legacy_create_init=False is not effective. For example, I have something like this:

py_binary(
    name = "foo",
    legacy_create_init = False,
)
some_rule(
    name = "bar",
    dep = ":foo",
)

def _some_rule_impl(ctx):
    runfiles = ctx.runfiles().merge(ctx.attr.dep[DefaultInfo].default_runfiles)
    return [
        DefaultInfo(runfiles = runfiles, executable = ...),
    ]

Here it seems that the effect of legacy_create_init = False is not propagated to ctx.attr.dep[DefaultInfo].default_runfiles, so the bar.runfiles generated by some_rule still contain autogenerated __init__.py files; this could be a Bazel bug. They only vanish if I set --incompatible_default_to_explicit_init_py, and flipping that flag seems like a big effort for a large existing code base like ours.

Given that --incompatible_default_to_explicit_init_py is still not enabled by default in Bazel, it would be convenient if rules_proto_grpc could also keep supporting the legacy behavior. I assume there are good reasons for the deep nesting of the generated files, and it's not just some old inherited behavior? If we could flatten the file structure (for all languages or just for Python), it seems we could automatically support the legacy init behavior as well?

@aaliddell
Member

Interesting, at first glance that looks like a Bazel bug. It could be related to bazelbuild/bazel#12238, which is another case of __init__.py weirdness with these flags that has just been pointed out to me. I've also prodded to see what's happening with flipping the default for Bazel 5.

flipping that flag seems like a big effort for big existing code bases like ours.

A migration strategy would be to flip the flag at the same time as setting legacy_create_init = True on all the py_binary and py_test targets, then slowly move over each target to False. But I appreciate that's still painful and likely to take a while.

I assume there are good reasons for using deep nesting for the generated files, and it's not just some old inherited behavior

This is done to prevent different targets attempting to write to the same file, which Bazel disallows. For example, if your foo_py_proto and bar_py_proto both included foo.proto, they would both attempt to write foo_pb2.py if the target name were not used to disambiguate. Also, some protoc plugins require us to use an output directory because their outputs are not deterministic.
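
For concreteness, a hypothetical sketch of that overlapping case; with no per-target output directory, both of these targets would try to write protos/foo_pb2.py:

python_proto_library(
    name = "foo_py_proto",
    protos = [":foo_proto"],
)

python_proto_library(
    name = "other_foo_py_proto",  # hypothetical second target compiling the same foo.proto
    protos = [":foo_proto"],
)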

So in general, I would prefer to keep the targets outputting to separate trees, since it's more painful not to. However, it should be possible to have a subsequent rule that merges these trees into a single one. There are a few ways this could be done:

  • Use a single python_proto_library:

    python_proto_library(
      name = "py_proto",
      protos = [":foo_proto", ":bar_proto"],
      visibility = ["//visibility:public"],
    )

    This will get the result you need since it'll output to one root, but may not be suitable in your situation.

  • Use python_proto_compile instead, merge the results and then produce a py_library:

    python_proto_compile(
      name = "foo_py_proto",
      protos = [":foo_proto"],
      visibility = ["//visibility:public"],
    )
    
    python_proto_compile(
      name = "bar_py_proto",
      protos = [":bar_proto"],
      visibility = ["//visibility:public"],
    )
    
    merge_trees(
      name = "py_proto_merged",
      deps = [":foo_py_proto", ":bar_py_proto"],
    )
    
    py_library(
      name = "py_proto",
      srcs = ":py_proto_merged",
      deps = [<TBD>],
    )

    Here merge_trees uses the ProtoCompileInfo provider to find the files and roots, builds a new merged tree, and emits a new DefaultInfo provider that you can use as srcs for py_library (see the sketch after this list). This is generic enough that it would probably be useful to have in a utils directory in this repo.

  • Merge the Python libraries:

    <content from your protos/BUILD at top>
    
    merge_python_libraries(
      name = "py_proto",
      deps = [":foo_py_proto", ":foo_py_proto"],
    )

    This one might not work as expected due to the presence of the protobuf deps. This would use the PyInfo provider to build a merged tree and emit a new PyInfo provider.
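
As a starting point for the second option, here is a minimal, untested sketch of what merge_trees could look like. It assumes that ProtoCompileInfo exposes a dict mapping output roots to depsets of generated files (the field name output_files is guessed from the compile.bzl patch earlier in this thread) and that the provider loads from @rules_proto_grpc//:defs.bzl; treat both as assumptions to verify.

# merge_trees.bzl (sketch, not part of rules_proto_grpc)
load("@rules_proto_grpc//:defs.bzl", "ProtoCompileInfo")  # assumed load path

def _relative_to_root(path, root):
    # Keep only the part of `path` that follows the output root.
    i = path.find(root)
    return path[i + len(root):].lstrip("/") if i != -1 else path

def _merge_trees_impl(ctx):
    outputs = []
    for dep in ctx.attr.deps:
        info = dep[ProtoCompileInfo]
        for root, files in info.output_files.items():  # assumed field name
            for f in files.to_list():
                # Re-declare each generated file relative to this package, so the
                # per-target <name>_pb directories collapse into a single tree.
                out = ctx.actions.declare_file(_relative_to_root(f.path, root))
                ctx.actions.symlink(output = out, target_file = f)
                outputs.append(out)
    return [DefaultInfo(files = depset(outputs))]

merge_trees = rule(
    implementation = _merge_trees_impl,
    attrs = {
        "deps": attr.label_list(providers = [ProtoCompileInfo]),
    },
)

Bazel will fail the build if two deps produce the same relative path, which is exactly the conflicting-write situation described earlier; the consuming py_library may also need an imports attribute depending on where the merged package ends up.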

@scele
Author

scele commented Sep 20, 2021

Awesome, thank you for these detailed pointers @aaliddell - I'll give them a shot!

@aaliddell
Member

aaliddell commented Sep 20, 2021

Option 4 would be to add an attr to the compile/library rules that would make the files be generated without the target-name directory. This would be a 'sharp edge' tool though, since you'll get Bazel errors about conflicting writes if any of the targets overlap. Technically you may be able to use this to write the generated files back to the source tree for check-in, but that's somewhat working around the Bazel sandbox. For example:

  • write_mode = 'PREFIXED': write to a directory named by the target
  • write_mode = 'NO_PREFIX': write directly to the package directory, as you want above
  • write_mode = 'SOURCE': write to the source directory, such that the .py files end up next to the .proto files and can be checked in

@aaliddell
Member

Version 4 has an output_mode attr that lets you write without the prefix if you set output_mode = "NO_PREFIX", as described above.
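
Applied to the repro's protos/BUILD, that would look roughly like:

python_proto_library(
    name = "foo_py_proto",
    protos = [":foo_proto"],
    output_mode = "NO_PREFIX",  # drop the per-target directory from the generated paths
    visibility = ["//visibility:public"],
)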

@CareF

CareF commented Nov 4, 2022

Running into the same problem when a gRPC proto imports another proto file.
See an example:
https://github.com/CareF/py_grpc_test

bazel run //test:test_grpc results in ImportError: cannot import name 'info_pb2' from 'proto'

Both output_mode = "NO_PREFIX" and legacy_create_init=False solve the problem, as does the global flag --incompatible_default_to_explicit_init_py.

Is there any suggestion on what the recommended practice would be?


So for this issue, there are two rather confusing and different-looking failure modes:

  • A py_binary Bazel target can fail to find a Python module when nothing more than an additional python_grpc_library is added to its direct or indirect dependencies
  • A python_grpc_library may fail to import when its source proto imports another proto file that shares the same namespace

@aaliddell
Member

Using legacy_create_init = False or the corresponding CLI flag is generally recommended for all Python targets, for exactly this reason that namespace packages get broken. You can put the flag in your bazelrc file to avoid having to enter it every time. At some point hopefully Bazel will flip the default.
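
For reference, the bazelrc form of that is a single line:

# .bazelrc: require explicit __init__.py files for every Python target
build --incompatible_default_to_explicit_init_py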

@CareF

CareF commented Nov 9, 2022

Using legacy_create_init = False or the corresponding CLI flag is generally recommended for all Python targets, for exactly this reason that namespace packages get broken. You can put the flag in your bazelrc file to avoid having to enter it every time. At some point hopefully Bazel will flip the default.

There is an issue with subpar's par_binary when putting the flag in bazelrc, which seems to be due to subpar having been developed for an older Python version.
