
Improve python dependency upgrade management for Tails workstations #1800

Open

msheiny opened this issue Jun 13, 2017 · 11 comments
Comments

@msheiny
Contributor

msheiny commented Jun 13, 2017

Feature request

Description

Once the PR for #1146 lands, we want to re-evaluate the current strategy for upgrading and managing the virtualenv on the Tails workstations. This was brought up in #1781 as part of the PR discussion.

There are two proposed solutions so far to improve this:

  • it has been suggested that the pip-sync tool might be a better fit here, to ensure we don't have dangling dependencies and that the latest Python dependencies are in place. The pip-sync command is part of pip-tools, which will have to be installed in the virtualenv first. There would have to be some conditional logic to utilize pip install the first round and then utilize pip-sync for subsequent runs. I haven't tested this, but we would also want to ensure that pip-tools plays nice with the --require-hashes option currently being utilized.

  • another possible solution to investigate would be to tar up the relevant virtualenv and host that ourselves in a way that is pulled down in an abstracted way via the securedrop-admin script. This strategy provides a number of improvements: we can drop the apt installation of a slew of compilation tools (required during pip install), and as developers we can remove the logic concerning pip installation (which means not worrying that different dependencies get out of sync). Overall, this means less time waiting for the setup portion of the installation on the Tails workstations. With this strategy, we also do not have to maintain the separate pip requirements.txt with the sha256 hashes, which currently breaks the workflow of using pip-compile as intended.
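For reference, the first option would roughly follow the standard pip-tools workflow. This is a hedged sketch only: the file names (requirements.in, requirements.txt) follow the pip-tools convention and are not confirmed paths in this repo.

```shell
# Sketch of the pip-tools workflow (first bullet above). File names are
# the pip-tools convention, assumed here for illustration.

compile_requirements() {
    # Pin and hash every transitive dependency from a loose requirements.in,
    # producing output compatible with pip's --require-hashes mode.
    pip-compile --generate-hashes --output-file requirements.txt requirements.in
}

sync_virtualenv() {
    # Make the active virtualenv exactly match requirements.txt:
    # installs anything missing and uninstalls dangling packages.
    pip-sync requirements.txt
}
```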

User Stories

As a user, I want my SecureDrop workstation dependencies to stay in sync and be straightforward to upgrade.

@ghost

ghost commented Jun 14, 2017

What about populating a wheel directory with all the dependencies? Maybe this has been discussed before :-) But that's what came to mind as an alternative to tarring the virtualenv. Another advantage is that it can hold manylinux wheels when/if they are not provided upstream on PyPI, so that there is never a need for compilation. There is a script within Ceph that populates a wheel directory for the purpose of creating packages on a machine that has no network access, and another script within python-crush that builds binary wheels.
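The wheel-directory approach could be sketched like this (a minimal illustration, assuming a ./wheelhouse directory and a pinned requirements.txt; both names are hypothetical):

```shell
# Sketch of the wheel-directory approach. Directory and file names are
# illustrative assumptions, not confirmed repo paths.

build_wheelhouse() {
    # On a machine with network access, build (or fetch prebuilt manylinux)
    # wheels for every pinned dependency into a local directory.
    pip wheel --wheel-dir ./wheelhouse -r requirements.txt
}

install_from_wheelhouse() {
    # On the target machine, install entirely from the local wheels:
    # no index access and no compilation needed.
    pip install --no-index --find-links ./wheelhouse -r requirements.txt
}
```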

@msheiny
Contributor Author

msheiny commented Jun 14, 2017

What about populating a wheel directory with all the dependencies? Maybe this has been discussed before :-)

Thanks for bringing that up @dachary! This hasn't been specifically discussed yet -- managing a virtualenv installation is a new problem in the project (we were previously relying on Tails/Debian's version of Ansible from apt). Any potential downsides you've experienced from slinging around wheels?

@ghost

ghost commented Jun 14, 2017

I've been happy with wheels and manylinux so far.

@psivesely
Contributor

There would have to be some conditional logic to utilize pip install the first round and then utilize pip-sync for subsequent runs.

Why wouldn't pip-sync work the first time?

we would also want to ensure that pip-tools plays nice with the --require-hashes option currently being utilized.

It does. I mostly did everything listed in #1617 at one point (work now lost 😿), so I can confirm that.

I'm not in favor of the tarball idea--that seems really hacky and not very transparent. Not to mention the overhead for us to maintain yet another software distribution mechanism (we already have apt repos, git tags, Docker images, etc.).

With this strategy, we also do not have to maintain the separate pip requirements.txt with the sha256 hashes, which currently breaks the workflow of using pip-compile as intended.

I'm not following here. Can you explain further?

The wheel directory seems better, and is what we use on the app server, but there we have the problem of deprecated dependencies: #856. Maybe you have a way of dealing with this @dachary. Working out the persistence story to make sure our wheel directory is prepended to our PYTHONPATH shouldn't be difficult.

I think the easiest and most surefire way is to rebuild the virtualenv from scratch each time the admin needs to run a playbook and the Python dependencies have changed since the last time they did. While this is not the most time- or bandwidth-efficient solution (the slowdown would be minimal if we could take advantage of pip's default caching, but alas, we're working with an amnesiac OS), we can at least agree that it should be reliable.

One thing to remember about keeping the same virtualenv long-term is that python, pip, setuptools, etc. are copied into the virtualenv and then do not get updated.
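The "rebuild only when the dependencies have changed" check could be sketched as follows. This is a minimal sketch under stated assumptions: the stamp-file path is hypothetical, and the actual virtualenv/pip install calls are elided.

```shell
# Sketch: decide whether the virtualenv needs rebuilding by recording a
# hash of requirements.txt after each successful install and comparing
# against it on the next run. Paths are hypothetical.

venv_stale() {
    reqs="$1"    # pinned requirements file
    stamp="$2"   # hash recorded after the last successful install
    [ -f "$stamp" ] || return 0                  # never installed: stale
    current=$(sha256sum "$reqs" | cut -d' ' -f1)
    [ "$(cat "$stamp")" != "$current" ]          # changed since then: stale
}

record_install() {
    # Call after a successful `pip install --require-hashes -r $1`.
    sha256sum "$1" | cut -d' ' -f1 > "$2"
}
```

If `venv_stale` succeeds, the caller would delete and recreate the virtualenv, reinstall from the hashed requirements file, then call `record_install`.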

@ghost

ghost commented Jun 14, 2017

we have the problem of deprecated dependencies: #856. Maybe you have a way of dealing with this @dachary.

I did not run into such a problem. I kind of assumed nothing is deprecated in PyPI? I'm very interested to hear if that's not the case ;-) Or maybe you're referring to something else?

@psivesely
Contributor

Or maybe you're referring to something else?

Yes, deprecated as in deprecated for use by the SD project. Say we don't use a top-level dependency anymore, or one of our top-level dependencies no longer uses one of its dependencies. It's not urgent, but at the same time we'd rather remove that no-longer-used dependency.... Maybe deprecated is not the best word choice.

@msheiny
Contributor Author

msheiny commented Jun 14, 2017

Why wouldn't pip-sync work the first time?

Well, so you gotta install pip-tools first to use pip-sync. Obviously we could do that globally if no virtualenv already exists. Yeah, okay - I guess it's not really an issue.

With this strategy, we also do not have to maintain the separate pip requirements.txt with the sha256 hashes, which currently breaks the workflow of using pip-compile as intended. ... I'm not following here. Can you explain further?

Oh 💩 - I totally didn't realize there was a --generate-hashes flag for pip-compile.... hahaha

I think the easiest and most surefire way is to rebuild the virtualenv from scratch each time the admin needs to run a playbook and the Python dependencies have changed since the last time they did. While this is not the most time- or bandwidth-efficient solution (the slowdown would be minimal if we could take advantage of pip's default caching, but alas, we're working with an amnesiac OS), we can at least agree that it should be reliable.

If you were able to confirm that pip-sync plays well with hashes in the past 🎉 AND we rebuild the troublesome Python dependencies as wheels --- that sounds like it should solve all the bullet points, and we won't need to blow away the virtualenv each time.

@garrettr
Contributor

another possible solution to investigate would be to tar up the relevant virtualenv and host that ourselves in a way that is pulled down in an abstracted way via the securedrop-admin script.

I am less inclined towards this proposal because it requires us to maintain additional infrastructure (yes, even if it's "just an S3 bucket", it's still additional infrastructure).

That said, "tar up the relevant virtualenv" might be more complicated than you think. I am slightly concerned there could be potential issues with cross-compiling dependencies that have binary components. I think, as @dachary suggested, wheels are probably your best bet here, since they were designed with this kind of use case in mind.

All that said, is there any reason we wouldn't just use a pip wheel archive, like we already do for securedrop?

@psivesely
Contributor

Well, so you gotta install pip-tools first to use pip-sync. Obviously we could do that globally if no virtualenv already exists. Yeah, okay - I guess it's not really an issue.

Oh yeah, duh. I'm amnesiac about Tails amnesia sometimes lol.... It is an issue though, right? How do we keep pip-tools installed?

@psivesely
Contributor

All that said, is there any reason we wouldn't just use a pip wheel archive, like we already do for securedrop?

See my concerns in #1800 (comment) re #856.

@ghost ghost added the goals: packaging label Dec 4, 2017
@zenmonkeykstop zenmonkeykstop changed the title Python dependency upgrade story Improve python dependency upgrade management for Tails workstations Sep 8, 2022
@zenmonkeykstop
Contributor

This is still valid, and may see some attention as part of packaging changes (moving from deployment from github to deployment as a Debian package).
