pip_import builds source packages on workspace load #96
Something like the below repository rule, which takes a single package name and version, downloads the wheel, and calls whltool.par with the whl file. Users can provide their own requirements.bzl file with the requirement dict.

```python
def _pip_repository_impl(rctx):
    if not rctx.which("python"):
        fail("python not found")
    if not rctx.which("pip"):
        fail("pip not found")
    exec_result = rctx.execute(
        [
            "pip", "wheel",
            "%s==%s" % (rctx.attr.package, rctx.attr.version),
            "--no-deps",
        ] + rctx.attr.wheel_args,
        environment = rctx.attr.wheel_env,
    )
    if exec_result.return_code:
        fail("Failed to obtain package wheel: \n%s\n%s" % (exec_result.stdout, exec_result.stderr))
    if exec_result.stderr:
        print(exec_result.stderr)
    exec_result = rctx.execute(["find", ".", "-type", "f", "-name", "*.whl"])
    if exec_result.return_code:
        fail("Could not find package wheel: \n%s" % exec_result.stderr)
    whl_file = exec_result.stdout.rstrip()
    args = [
        "python",
        rctx.path(rctx.attr._script),
        "--whl", whl_file,
        "--requirements", rctx.attr.requirements,
    ]
    if rctx.attr.extras:
        args += [
            "--extras=%s" % extra
            for extra in rctx.attr.extras
        ]
    result = rctx.execute(args)
    if result.return_code:
        fail("whl_library failed: %s (%s)" % (result.stdout, result.stderr))
    rctx.execute(["rm", whl_file])
    return

pip_repository = repository_rule(
    attrs = {
        "package": attr.string(mandatory = True),
        "version": attr.string(mandatory = True),
        "requirements": attr.string(mandatory = True),
        "extras": attr.string_list(),
        "wheel_args": attr.string_list(),
        "wheel_env": attr.string_dict(),
        "_script": attr.label(
            executable = True,
            default = Label("@io_bazel_rules_python//tools:whltool.par"),
            cfg = "host",
        ),
    },
    local = False,
    implementation = _pip_repository_impl,
)
```
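For reference, a WORKSPACE invocation of this rule might look like the sketch below. The load path, repository name, and the package/version pin are all illustrative assumptions, not part of the rule above.

```python
# WORKSPACE (sketch; the load path and pins are hypothetical)
load("//:pip_repository.bzl", "pip_repository")

# One external repository per pinned package.
pip_repository(
    name = "py_deps_mock",
    package = "mock",
    version = "3.0.5",  # illustrative pin
    requirements = "@py_deps//:requirements.bzl",
)
```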
@siddharthab That is something we are quite interested in as well. Are you using that right now? If so, do you have an example of how it is being used?
Just looking further into it: IMO loading all requirements is not so much the issue (though I agree it would be nice to have a separate repository for each requirement); the main problem seems to be the need for the But maybe a half-way solution does not make too much sense, and one might as well go all the way. Or I wonder if there is a way to generate multiple external repositories from one repository_rule, similar to how it is done currently but without actually downloading/building the deps, and then one would have to depend on the external repository directly.
Yes. We have been using the scheme as I mentioned above. It has been working fine for us. |
@siddharthab But then you have to specify pip dependencies as well as their sub-dependencies by hand, right? I just made a quick-and-dirty hack based on the existing rules_python that takes an entire requirements.txt file. Here is that code:

```python
def _pip_requirements_repository_impl(rctx):
    if not rctx.which("python"):
        fail("python not found")
    if not rctx.which("pip"):
        fail("pip not found")
    rctx.file("BUILD", "")
    req = '''
def requirement(name):
    name_key = name.replace("-", "_").lower()
    return "@%s//" + name_key + ":pkg"
''' % (rctx.attr.name)
    rctx.file("requirements.bzl", req)
    exec_result = rctx.execute(
        [
            "pip", "wheel",
            "-r%s" % rctx.path(rctx.attr.requirements),
        ] + rctx.attr.wheel_args,
        environment = rctx.attr.wheel_env,
    )
    if exec_result.return_code:
        fail("Failed to obtain package wheel: \n%s\n%s" % (exec_result.stdout, exec_result.stderr))
    if exec_result.stderr:
        print(exec_result.stderr)
    exec_result = rctx.execute(["find", ".", "-type", "f", "-name", "*.whl"])
    if exec_result.return_code:
        fail("Could not find package wheel: \n%s" % exec_result.stderr)
    whl_files = exec_result.stdout.rstrip().split("\n")
    for whl_file in whl_files:
        whl_dir = str(whl_file).lower().split("-")[0]
        result = rctx.execute(["mkdir", whl_dir])
        if result.return_code:
            fail("mkdir failed: %s (%s)" % (result.stdout, result.stderr))
        args = [
            "python",
            rctx.path(rctx.attr._script),
            "--whl", whl_file,
            "--requirements", "@%s//:requirements.bzl" % rctx.attr.name,
            "--directory", whl_dir,
        ]
        if rctx.attr.extras:
            args += [
                "--extras=%s" % extra
                for extra in rctx.attr.extras
            ]
        result = rctx.execute(args)
        if result.return_code:
            fail("whl_library failed: %s (%s)" % (result.stdout, result.stderr))
        rctx.execute(["rm", whl_file])
    return

pip_requirements_repository = repository_rule(
    attrs = {
        "requirements": attr.label(
            allow_files = True,
            mandatory = True,
            single_file = True,
        ),
        "extras": attr.string_list(),
        "wheel_args": attr.string_list(),
        "wheel_env": attr.string_dict(),
        "_script": attr.label(
            executable = True,
            default = Label("@io_bazel_rules_python//tools:whltool.par"),
            cfg = "host",
        ),
    },
    local = False,
    implementation = _pip_requirements_repository_impl,
)
```

Usage:

```python
# WORKSPACE file
pip_requirements_repository(
    name = "py_deps",
    requirements = "//:requirements.txt",
)
```

```python
# BUILD file
load("@py_deps//:requirements.bzl", "requirement")

py_library(
    name = "lib",
    srcs = ["lib.py"],
    deps = [
        requirement("mock"),
    ],
)
```

So the usage stays exactly the same; it has not been extensively tested.
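One fragile spot in the hack above is `whl_dir = str(whl_file).lower().split("-")[0]`: since `find` prints paths such as `./mock-3.0.5-py2.py3-none-any.whl`, the leading `./` survives into the directory name. A safer derivation, sketched in plain Python (the helper name is mine, not part of the rule), strips the directory part first; PEP 427 wheel filenames use `-` only as a field separator, so the first field is the distribution name:

```python
import os

def whl_dir_for(whl_path):
    """Derive a per-package directory name from a wheel file path.

    PEP 427 wheel filenames look like
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    and the distribution component itself never contains "-".
    """
    filename = os.path.basename(whl_path)  # drop any "./" or directory prefix
    distribution = filename.split("-")[0]  # first "-"-separated field
    return distribution.lower()

print(whl_dir_for("./Mock-3.0.5-py2.py3-none-any.whl"))  # → mock
print(whl_dir_for("six-1.12.0-py2.py3-none-any.whl"))    # → six
```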
Yes, I think specifying every dependency manually is the Bazel way. Without it, the versions of all your dependencies are not fixed for every user, and a person running the repository rule at a later date may get different versions for some of the unstated dependencies. To list all the dependencies, I created a small script for my users that installs the package in a virtualenv and gets a requirements.txt file out of the virtualenv. It's not pretty, but that's what I could think of.
You are right, it is also what I see as the Bazel way. Interestingly, though, rules_nodejs has gone a different way by supporting package.json / lock files rather than specifying each dependency individually. Either way, I am interested in the pip_repository approach you created. Could you by any chance share the script you put together to get the requirements.txt out of the virtualenv? It would be very useful.
```shell
#!/bin/bash
# Utility script to generate a requirements file for the transitive closure of
# dependencies of a package.
# The arguments to this script are passed through to `pipenv install`.
set -euo pipefail

readonly tmp_dir="$(mktemp -d)"

function cleanup {
  rm -rf "${tmp_dir}"
}
trap cleanup INT HUP QUIT TERM EXIT

cd "${tmp_dir}"

if ! command -v pipenv > /dev/null; then
  echo "pipenv not found; you can install it by running:"
  echo "  pip install --user pipenv"
  exit 1
fi

pipenv install "$@" > /dev/null

echo ""
echo "Your package requirements are:"
pipenv run pip freeze
```
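The `pip freeze` output the script ends with is plain `name==version` lines, which is exactly the transitive-closure input the one-repo-per-package scheme wants. A hypothetical helper (not part of the script above) that turns such output into a pin dict might look like this; the package names in the sample are made up:

```python
def parse_freeze(freeze_output):
    """Parse `pip freeze`-style output into a {name: version} dict.

    Only plain "name==version" lines are handled; editable installs
    ("-e ...") and anything without "==" are skipped.
    """
    pins = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith("-e ") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        pins[name.lower()] = version
    return pins

sample = """\
mock==3.0.5
six==1.12.0
-e git+https://example.com/repo.git#egg=local-pkg
"""
print(parse_freeze(sample))  # {'mock': '3.0.5', 'six': '1.12.0'}
```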
Awesome, thanks for sharing @siddharthab!
rules_python 0.1.0 has been released, which upstreams the rules_python_external repo. Please switch from
@alexeagle Could you please clarify what was fixed in version 0.1.0? We're using this version, and it still downloads and installs each and every dependency from source.
Sorry, you're totally right. I misunderstood the implementation when I wrote that, and forgot to update here. The correct resolution is
Original issue description: `pip_import` currently installs all the wheels in a single repository, the same repository as the requirements.bzl file. That implies loading the requirements.bzl file in WORKSPACE also builds source packages for which wheels are not available (e.g. pysam). This makes the initial load slow, and requires all users to have the right environment set up to build any source packages into wheels, even if they don't do any Python in their work.

It would be nice to have a "no-deps" mode where the requirements.txt file is expected to be the transitive closure (e.g. from `pip freeze`), and we have a separate repo for each package (relying on the `--no-deps` argument to `pip wheel`), such that packages are downloaded and potentially built on demand, rather than on first load.

This also allows users to skip a dependency from pip altogether and provide their own substitute by appending to the requirement dict.