Ability to lock down dependencies for reproducibility? #140

Closed
dgrnbrg opened this issue Nov 29, 2018 · 19 comments
Labels
Can Close? Will close in 30 days if there is no new activity


@dgrnbrg

dgrnbrg commented Nov 29, 2018

Hello, it seems that every time I have a clean build, these rules use pip to install the requirements.txt again. Because of this, it's possible for builds to get different dependency versions over time. Is there a way to check in the output of the pip import, so that I can ensure my build is completely repeatable? Ideally, this would support a model like https://github.com/johnynek/bazel-deps, in which I can run some command on my requirements.txt to generate .bzl files and other files (I'd be happy to check in the binaries or check in a virtual BUILD hierarchy that validates sha256sums of the deps). This repeatability is a key feature of bazel, and a worry for me using rules_python.

Is there a path to accomplish this? Would you accept a PR to add this?
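Here is a minimal sketch of the "validate sha256sums of the deps" idea. It assumes a checked-in manifest that maps each downloaded file name to its expected SHA-256 digest. The manifest format and the `verify_artifacts` helper are hypothetical and are not part of rules_python.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(directory: Path, manifest: dict[str, str]) -> list[str]:
    """Check files in `directory` against a {file name: expected sha256} manifest.

    Returns a list of problems ordered by file name; an empty list means every
    expected file exists and matches its recorded digest.
    """
    problems = []
    for name, expected in sorted(manifest.items()):
        path = directory / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"hash mismatch: {name}")
    return problems
```

A build step could call this and fail whenever the list is non-empty. Note that this sketch does not flag files that are present on disk but missing from the manifest.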

@dgrnbrg
Author

dgrnbrg commented Mar 8, 2019

Hello, I have dug into the code more, and I see that behind the scenes, this system produces the exact @$PIP_IMPORT_NAME//:requirements.bzl that I'd like to commit to my outer repository. Currently, I can only find this file by manually trawling through the Bazel cache, since I'm not aware of any way to access artifacts from repository workspaces.

How should this be handled?

@aaliddell
Contributor

aaliddell commented Mar 8, 2019

Two more options I can think of:

@dgrnbrg
Author

dgrnbrg commented Mar 11, 2019

I just tried using the resolved files, which seem like what my company is looking for. However, when I run bazel sync, bazel seems to get into a bad state where it can't resolve dependencies, and it won't build anything again until I do bazel clean --expunge. The sync seems to fail with this error:

Collecting matplotlib==3.0.2 (from -r /home/dgrnbrg/3rdparty/py_research_requirements.txt (line 8))
  Could not find a version that satisfies the requirement matplotlib==3.0.2 (from -r /home/dgrnbrg/3rdparty/py_research_requirements.txt (line 8)) (from versions: 0.86, 0.86.1, 0.86.2, 0.91.0, 0.91.1, 1.0.1, 1.1.0, 1.1.1, 1.2.0, 1.2.1, 1.3.0, 1.3.1, 1.4.0, 1.4.1rc1, 1.4.1, 1.4.2, 1.4.3, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0b1, 2.0.0b2, 2.0.0b3, 2.0.0b4, 2.0.0rc1, 2.0.0rc2, 2.0.0, 2.0.1, 2.0.2, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.2.0rc1, 2.2.0, 2.2.2, 2.2.3, 2.2.4)

Of course, matplotlib version 3.0.2 exists in real life, so I'm a bit confused. I think this might be related to the fact that I need https://github.com/darrengarvey/rules_python for Python 3 compatibility, which is critical for our use cases.

Do you have any suggestions or ideas on how to proceed?

@aaliddell
Contributor

How are you running build and sync (i.e., what args and env vars do you use for each), and what's in your .bazelrc? IIRC, matplotlib 3.0 and later don't support Python 2, so it's possible that sync is falling back to Python 2, which would explain why you aren't seeing any of the 3.0.x versions.

@dgrnbrg
Author

dgrnbrg commented Mar 11, 2019

Yes, I think that's the case. I use the rules_python fork linked above, which supports overriding the Python version used for builds so that we can have a working Python 3 environment. My .bazelrc includes:

build --python_path=/usr/bin/python3.6
test --python_path=/usr/bin/python3.6
run --python_path=/usr/bin/python3.6

build --action_env=BAZEL_PYTHON=/usr/bin/python3.6
test --action_env=BAZEL_PYTHON=/usr/bin/python3.6
run --action_env=BAZEL_PYTHON=/usr/bin/python3.6

I noticed that the max version of matplotlib discovered above is the max version supported by Python 2, so I think you're correct that sync is falling back to Python 2.

@aaliddell
Contributor

Ah, so when you're running sync, the BAZEL_PYTHON env var is perhaps not set, so it falls back to the default python interpreter because of the changes in that fork.
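To make this kind of fallback visible, a small guard can report which interpreter would be picked and refuse to continue if it is Python 2. This is a hypothetical sketch: the "BAZEL_PYTHON, else python" lookup mirrors the fallback described in this comment and is not code taken from the fork.

```python
import os
import subprocess
import sys


def resolve_interpreter(env) -> str:
    """Use BAZEL_PYTHON if it is set and non-empty, else fall back to 'python' (assumed behaviour)."""
    return env.get("BAZEL_PYTHON") or "python"


def major_version(interpreter: str) -> int:
    """Ask the given interpreter for its major version number."""
    out = subprocess.run(
        [interpreter, "-c", "import sys; print(sys.version_info[0])"],
        check=True, capture_output=True, text=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    interpreter = resolve_interpreter(os.environ)
    major = major_version(interpreter)
    print(f"{interpreter} -> Python {major}")
    if major < 3:
        sys.exit("pip would resolve packages under Python 2; set BAZEL_PYTHON")
```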

Eventually rules_python should have proper pip_import support for Python 3, although various PRs have been proposing this for almost a year now: #158 #82.

Also, on another topic: in .bazelrc, any args set for build are inherited by test and run, so you shouldn't need to write them three times.

@dgrnbrg
Author

dgrnbrg commented Mar 11, 2019

Interesting, I'll try dropping the args set for test and run. I wasn't able to use the --python_path or --action_env settings for sync in .bazelrc, since they don't appear to be supported for that command.

This is frustrating, since the net result of all this is that I cannot lock down my pip dependencies. (locking versions in requirements is something that we've tried to do, but it seems like sometimes deps-of-deps bump versions in a way that pip thinks is compatible but breaks our code)

@limdor

limdor commented Mar 19, 2019

I just wanted to add a comment regarding freezing versions in requirements.txt. You also need to specify the transitive dependencies in requirements.txt; otherwise it gives the false impression that they are frozen.
Then, when you update your direct dependencies, you will also want to update the transitive ones: some may appear, some may disappear, and some may change version.
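To illustrate with a made-up dependency graph: if you pin only the direct dependency, everything it pulls in stays unpinned. The package names and graph below are hypothetical.

```python
from __future__ import annotations


def transitive_closure(roots: list[str], graph: dict[str, list[str]]) -> set[str]:
    """Return every package reachable from `roots` by following dependency edges."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    return seen


def unpinned(roots: list[str], graph: dict[str, list[str]], pins: dict[str, str]) -> set[str]:
    """Packages that will be installed but have no pinned version."""
    return transitive_closure(roots, graph) - set(pins)


# Hypothetical graph: app-lib depends on dep-a and dep-b; dep-b depends on dep-c.
GRAPH = {"app-lib": ["dep-a", "dep-b"], "dep-b": ["dep-c"]}
print(sorted(unpinned(["app-lib"], GRAPH, {"app-lib": "1.0"})))  # ['dep-a', 'dep-b', 'dep-c']
```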

@willstott101

willstott101 commented Oct 8, 2020

I'd like to mention pipenv and its Pipfile with Pipfile.lock, which I've been using extensively to great effect in my Python projects. This solves the problem of requirements.txt not distinguishing between transitive and top-level dependencies.

I haven't yet tried to evaluate pipenv or even Pipfile support in Bazel. However, many of the problems you're describing have good solutions with a Pipfile, which can specify the Python version, separates dependencies from dev dependencies, and has a known lock format.

Perhaps language-specific lock files aren't ideal in bazel and I'd understand that. But worth mentioning I think.

Edit, found:
#72
#171
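Because Pipfile.lock is plain JSON, with pinned packages under "default" and dev-only ones under "develop", one section can be flattened into a pinned, hashed requirements list. Here is a rough sketch. It assumes each entry carries a "version" such as "==2.25.1" and a list of "sha256:..." hashes. VCS, path, and editable entries have no "version" and are skipped. `lock_to_requirements` is a hypothetical name.

```python
import json


def lock_to_requirements(lock_text: str, section: str = "default") -> list:
    """Flatten one section of a Pipfile.lock into requirements.txt-style lines."""
    lock = json.loads(lock_text)
    lines = []
    for name, entry in sorted(lock.get(section, {}).items()):
        version = entry.get("version")
        if not version:
            # VCS, path, or editable entries have no "version"; skipped in this sketch.
            continue
        hashes = " ".join(f"--hash={h}" for h in entry.get("hashes", []))
        lines.append(f"{name}{version} {hashes}".rstrip())
    return lines
```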

@groodt
Collaborator

groodt commented Oct 8, 2020

Another lightweight, reliable way to "resolve" a top-level requirements.in file into a transitively closed requirements.txt file is to use pip-tools compile.
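A compiled requirements.txt only works as a lock if every entry is pinned exactly. Here is a minimal lint for that. It assumes simple one-requirement-per-line files of the kind pip-compile emits. Comments, blank lines, option lines starting with "-", and backslash continuations for --hash lines are skipped. URLs and other exotic forms are not handled. The function name is hypothetical.

```python
import re

# A requirement pinned with "==", optionally with extras, e.g. "requests[socks]==2.25.1".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^\s;]+")


def unpinned_requirements(text: str) -> list:
    """Return requirement lines from a requirements.txt that are not pinned with ==."""
    problems = []
    for raw in text.splitlines():
        # Drop trailing comments and the line-continuation backslash.
        line = raw.split(" #", 1)[0].strip().rstrip("\\").strip()
        if not line or line.startswith(("#", "-")):
            continue  # blank, comment, or option/--hash continuation line
        if not PINNED.match(line):
            problems.append(line)
    return problems
```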

@whilp

whilp commented Oct 8, 2020

I wrote a little glue to make a pip-compile workflow easy-ish in a bazel workspace:

https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements.in#L1
https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements.txt#L1
https://github.com/whilp/world/blob/b4b01e019b0dc1888cc3c81e9ab5a242e9820717/requirements/compile.py#L8

Here, I add deps (pinned to versions) in requirements.in, then run bazel requirements:compile as needed to update requirements.txt.

This kinda works (at least, doesn't break) with automated bumps from renovatebot:

whilp/world#434

(But I haven't gotten those bumps to update requirements.txt as well, as they should.)
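For readers who want something similar, here is a rough sketch of a wrapper script that could be exposed as a py_binary and launched with bazel run. It is not whilp's compile.py. The function name and file names are assumptions, and it requires pip-tools to be installed for the chosen interpreter.

```python
import os
import subprocess
import sys


def compile_command(src: str, out: str, generate_hashes: bool = True,
                    python: str = sys.executable) -> list:
    """Build a pip-tools invocation that resolves `src` into a fully pinned `out`."""
    # `python -m piptools compile` runs pip-compile under a specific interpreter,
    # so the lock is resolved for the same Python version the build uses.
    cmd = [python, "-m", "piptools", "compile", "--output-file", out]
    if generate_hashes:
        cmd.append("--generate-hashes")  # emit --hash=sha256:... for every pin
    cmd.append(src)
    return cmd


if __name__ == "__main__":
    # Under `bazel run`, BUILD_WORKSPACE_DIRECTORY points at the source tree;
    # fall back to the current directory when run outside Bazel.
    os.chdir(os.environ.get("BUILD_WORKSPACE_DIRECTORY", "."))
    subprocess.run(compile_command("requirements.in", "requirements.txt"), check=True)
```

Pinning the interpreter this way ties the lock to one Python version, which matters given the Python 2 fallback discussed earlier in this thread.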

@thundergolfer

@willstott101 we originally used pipenv to produce our transitively closed and pinned requirements.txt file, but had problems and then moved to using pip-tools compile, as @groodt mentions above.

The problem with pipenv was that we wanted to be able to 're-lock' a lock-file deterministically to check platform-consistency (OSX, Linux) and validate that one of our repo users hadn't made a mistake when changing deps, but unfortunately pipenv couldn't deterministically re-lock.

These pipenv issues seem directly related to the issues we experienced:

pip-tools compile has for us proved better at consistently producing transitively-closed lock files.
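The re-lock consistency check described above comes down to two steps: re-run the lock tool into a scratch file, then fail CI if the result differs from the committed lock. Here is a minimal sketch of the comparison step using the standard library's difflib. How the fresh lock is produced is left to whichever tool you use, and `lock_drift` is a hypothetical name.

```python
import difflib


def lock_drift(committed: str, relocked: str, name: str = "requirements.txt") -> str:
    """Return a unified diff between the committed lock and a fresh re-lock ('' if identical)."""
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        relocked.splitlines(keepends=True),
        fromfile=f"{name} (committed)",
        tofile=f"{name} (re-locked)",
    )
    return "".join(diff)
```

A CI job would print the diff and exit non-zero whenever it is non-empty. Showing the diff tells the contributor exactly which pins disagree across platforms.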


@whilp, thanks for posting your solution. @alexeagle at Robinhood has a small Starlark-based pip-tools compile integration, which I've seen and which is much nicer than my company's bash script. He's planning to commit it here.

@github-actions

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

@github-actions github-actions bot added the Can Close? Will close in 30 days if there is no new activity label Apr 14, 2021
@leoluk

leoluk commented Apr 14, 2021

Still relevant

@github-actions github-actions bot removed the Can Close? Will close in 30 days if there is no new activity label Apr 15, 2021
@thundergolfer

For those interested in this issue, could you check out

def compile_pip_requirements(

and provide feedback on whether it adequately addresses the issue? It's not Bazel native, but it does easily produce a 'compiled' transitively closed list of dependencies with hashes of the package files.
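For readers unfamiliar with the hashed output: each pinned requirement lists one --hash=sha256:... option per acceptable distribution file (wheel or sdist). pip's hash-checking mode then rejects any download whose digest is not listed. The sketch below builds such an entry from local files. `hash_option` and `pinned_line` are illustrative names and are not part of rules_python or pip-tools.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def hash_option(path: Path) -> str:
    """Format a file's SHA-256 digest as pip expects: --hash=sha256:<hex>."""
    return "--hash=sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()


def pinned_line(name: str, version: str, files: list[Path]) -> str:
    """Build a hash-checked requirement entry covering every listed distribution file."""
    options = " \\\n    ".join(hash_option(p) for p in sorted(files))
    return f"{name}=={version} \\\n    {options}"
```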

@thekyz

thekyz commented Jun 14, 2021

I'm curious about compile_pip_requirements, @thundergolfer: how does it ensure that 2 people running the rule will end up with the same packages (in particular when dealing with transitive dependencies that don't specify the version completely)?

@thundergolfer

2 people running the rule

The pip-compile program is definitely vulnerable to differences in the machine that runs it (e.g., OS, Python version). In practice, we run a CI check to ensure that local and CI results agree, and we work to 'lock down' the development environment so that it's the same across machines.

in particular when dealing with transitive dependencies that don't specify the version completely

I believe pip-tools looks at the existing transitively locked requirements.txt file and will use the versions specified within if they are compatible. It does not rely on deps to specify things strictly with ==. The resolver will arrive at the same transitive set.
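Here is a toy model of the pin-preference behaviour described above. It is not pip-tools' actual resolver, which works over the whole dependency graph with full version specifiers. Given the versions allowed by a dependent's constraint, the model keeps the version already in requirements.txt if it still fits, and otherwise chooses the newest allowed one. The `choose` function and its half-open `[minimum, below)` constraint are assumptions made for illustration.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2); this toy model ignores pre-releases and local tags."""
    return tuple(int(part) for part in version.split("."))


def choose(candidates: list[str], minimum: str, below: str,
           existing_pin: str | None = None) -> str:
    """Pick a version in [minimum, below), keeping an existing pin when it still fits."""
    lo, hi = parse(minimum), parse(below)
    allowed = [v for v in candidates if lo <= parse(v) < hi]
    if not allowed:
        raise ValueError("no candidate satisfies the constraint")
    if existing_pin in allowed:
        return existing_pin  # stable: re-locking keeps what was already chosen
    return max(allowed, key=parse)  # otherwise take the newest allowed version
```

In real pip-compile, the --upgrade and --upgrade-package options exist to deliberately override this preference for existing pins.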

@github-actions
Copy link

github-actions bot commented Jan 2, 2022

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days.
Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

@github-actions github-actions bot added the Can Close? Will close in 30 days if there is no new activity label Jan 2, 2022
@github-actions

github-actions bot commented Feb 2, 2022

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

@github-actions github-actions bot closed this as completed Feb 2, 2022
Projects
None yet
Development

No branches or pull requests

9 participants