-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft of wheel packaging. #382
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments to start. Still need to get through a lot of this.
chalice/deploy/packager.py
Outdated
assert os.path.isdir(deps_dir) | ||
# Now we need to create a zip file and add in the site-packages | ||
# dir first, followed by the app_dir contents next. | ||
if not os.path.isfile(package_filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably would want this built into osutils as a file_exists()
to be consistent on using the OSUtils abstraction
chalice/deploy/packager.py
Outdated
requirements_file = os.path.join(project_dir, 'requirements.txt') | ||
# Now we need to create a zip file and add in the site-packages | ||
# dir first, followed by the app_dir contents next. | ||
if dependency_builder is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to put this in the __init__
that way the if they do not specify a DependencyBuilder
the OSUtils
will at least get plumbed into that if they specified that as well. Or if not we want to be making sure you specify self._osutils when instantiating this.
chalice/deploy/packager.py
Outdated
def _get_requirements_file(self, project_dir): | ||
# type: (str) -> str | ||
# Gets the path to a requirements.txt file out of a project dir path | ||
return os.path.join(project_dir, 'requirements.txt') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to include this functionality in the OSUtils. In general any call to the os
library should probably live in the OSUtils
class.
chalice/deploy/packager.py
Outdated
# type: (str) -> str | ||
requirements_file = os.path.join(project_dir, 'requirements.txt') | ||
deps_dir = self.site_package_dir(project_dir) | ||
print(deps_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a left over print statement
chalice/deploy/packager.py
Outdated
return False | ||
|
||
@contextlib.contextmanager | ||
def _tempdir(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make sense to put this in OSUtils as well
chalice/deploy/packager.py
Outdated
final_command = [command] | ||
final_command.extend(args) | ||
# Call pip and hide stdout/stderr | ||
with self._consume_stdout_and_stderr(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what does the output look like when you don't capture it? If it does not look good, it is worth not including it but we may want to at least log it in case anything important happened for debugging reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its pretty chatty, progress bars and warnings as pip installs things. I was thinking of capturing it and we can elevate it to the user if they have --debug
enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might try adding varying levels of -q
to the pip call to see if that makes it better enough that it changes your mind on how you want to handle this. Though I'm fine with not exposing that by default too, just mentioning that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I hadn't considered that.
chalice/deploy/packager.py
Outdated
|
||
# Now that `directory` has all the manylinux1 compatible binary | ||
# wheels we could get, and all the source distributions. We need to | ||
# find all the source dists that do not have a mathcing wheel file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mathcing -> matching
chalice/deploy/packager.py
Outdated
finally: | ||
shutil.rmtree(tempdir) | ||
|
||
def _download_dependencies(self, directory, requirements_file): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth breaking each of these individual steps into separate internal methods so the method is not super long and it makes it a bit easier to read.
chalice/deploy/packager.py
Outdated
# PyPi for source distributions. | ||
for ext in self.PYPI_SDIST_EXTS: | ||
if filename.endswith(ext): | ||
name, version = filename[:-len(ext)].split('-') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to make you sad, but sdist filenames cannot be deterministically parsed to a single name/version without prior knowledge of what name you're looking for. For example: foo-1-1.tar.gz
could be either ("foo", "1-1")
or ("foo-1", "1")
.
You can unpack the tarball and ask it though, doing somehting as simple as python setup.py --name
and python setup.py --version
should print the name/version to stdout... although that can fail if someone writes their setup.py
so that it also prints to stdout. You can do it more robustly, but uh, it's really gross to do that. You get to monkeypatch the setup.py
like pip does and then run python setup.py egg_info
and parse the egg-info file. I can point you to code in pip that does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
chalice/deploy/packager.py
Outdated
def download_all_dependencies(self, requirements_file, directory): | ||
# type: (str, str) -> int | ||
"""This command downloads all dependencies as sdists.""" | ||
# None of these should fail so we can request non-binary versions of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I told you that none of these should fail so you should download the sdists, but it just occurs to me that this can still fail, because people can publish wheels to PyPI without publishing a sdist at all. This is pretty rare though, so it's possible we could just say that we don't support that edge case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I can get rid of the --no-binary
part and adjust the rest of the steps minorly to account for that. Should cover all the cases.
chalice/deploy/packager.py
Outdated
"""This command downloads all dependencies as sdists.""" | ||
# None of these should fail so we can request non-binary versions of | ||
# all the packages in one call to `pip download` | ||
arguments = ['--no-binary=:all:', '-r', requirements_file, '--dest', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now people can influence where pip pulls its dependencies from using environment variables and config files. Do we want to allow that still or should chalice control where pip pulls its deps from and isolate itself from the host system? Either way, I think it would be nice to let people pass in their own repositories for people who don't want their deployment to depend on PyPI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm interesting. I guess technically pip is an implementation detail of the requirements builder. Though I think that would be useful to allow people who are getting requirements that are not on pypi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep it an implementation detail, I suggest we run pip in isolated mode, and surface our own settings for passing in a repository or find-links style location and pass that into pip ourselves. That way we still control the interface and if we rip out pip then we can just translate those options to whatever the new thing is.
a223829
to
1f37998
Compare
I haven't had a chance to look at the code yet, but just playing around with the PR, it seems to work well for most of the packages I tried. I noticed that when I try out Thoughts? |
@jamesls I think that is probably going to be solved for the case of Scipy when @stealthycoin removes the |
Yep was just about to comment the same thing. I think this will be solved when I update the download logic to not force sdists initially. |
By the way scipy if you download it straight from PyPi is |
That reminds me of pypa/pip#3938, for pip I decided not to do anything about it because space is generally not a concern for us and stripping binaries isn't universally safe. However for Lambda space is a concern, so maybe it'd make sense to offer binary stripping as part of the deployment process? |
Codecov Report
@@ Coverage Diff @@
## master #382 +/- ##
==========================================
+ Coverage 92.17% 92.92% +0.75%
==========================================
Files 18 18
Lines 2440 2672 +232
Branches 320 352 +32
==========================================
+ Hits 2249 2483 +234
+ Misses 142 140 -2
Partials 49 49
Continue to review full report at Codecov.
|
Downside of that is that we would need to introduce options to strip particular dependencies, and within those its possible that there are multiple binaries that can be safe/unsafe to strip which seems like a granularity that is better suited for just letting them handle it themselves and put what they want in the vendor directory. We would also either have to take a dependency on a pure python ELF parser or do best effort based on what tools are available on the operating system. Not sure it's worth it. |
Yea, another option may be just the ability to run a user provided script as part of the deploy process that allows them to script that themselves-- although that's not super relevant to this particular PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just had some additional comments. I like how you broke out each of the individual steps in the installation process into internal methods.
chalice/constants.py
Outdated
the chalice vendor folder. | ||
|
||
Your deployment will continue but may not work correctly | ||
if missing dependencies are not present For more information: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing period between present and For
requirements.txt
Outdated
@@ -0,0 +1 @@ | |||
pip>=1.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to go into the setup.py
and setup.cfg
file instead. While you are at it, can we remove virtualenv
as a dependency?
# which will serve as the master list of dependencies needed to deploy | ||
# successfully. | ||
self._pip.download_all_dependencies(requirements_file, directory) | ||
deps = {Package(directory, filename) for filename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there will ever any edge cases with making this a Set and say pip downloaded the same package more than once? I have not thought it through if this is an issue but this line of code stood out to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that pip should only ever be downloading a single package per project here. Unless I'm missing something from how we're calling pip here, if it did so that would be a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok cool. I made the assumption that the set was being used for duplicates, but after talking offline with @stealthycoin it was just to be able to take advantage of set operations.
chalice/deploy/packager.py
Outdated
return '%s==%s' % (name, version) | ||
|
||
|
||
class SdistMetadataFetcher(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... this code is pretty complicated. From the sounds of it there is not much we can do to simplify it. It may be worth looking to add logging statements if things go wrong or add a todo for that to help us if things go wrong in the logic.
tests/unit/deploy/test_packager.py
Outdated
|
||
def test_same_pkg_sdist_and_whl_collide(self, osutils): | ||
with osutils.tempdir() as tempdir: | ||
self._write_fake_sdist(tempdir, 'foobar-1.0') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to parameterize this for name and version. So we do not have to stick with foobar and 1.0 all over the place.
tests/unit/deploy/test_packager.py
Outdated
|
||
def test_setup_tar_gz(self, osutils, sdist_reader): | ||
setup_py = self._SETUP_PY % ( | ||
self._SETUPTOOLS, 'foo-bar', '1.0-2b' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we are using '1.0-2b'
and 'foo-bar'
because those won't work with the naive approach? It might be worth making it clear by having an explicit test name for it. And also have a test for a normal package name and version.
chalice/deploy/packager.py
Outdated
|
||
class SdistMetadataFetcher(object): | ||
"""This is the "correct" way to get name and version from an sdist.""" | ||
# https://github.com/pypa/pip/blob/master/pip/utils/setuptools_build.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be a reasonable idea to embed a commit hash in this URL since this file can be renamed, deleted, etc in pip's master
branch which would invalidate this link.
6a126fe
to
123452a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I just had some general comments on how to clean up the tests. One question I do have is do we need to be making any assertions on the pip calls being made? Looks like right now we just assert the packages are in the site packages directory. It may be worth to on top of that make sure that the calls lead to adding the package as right now we are just relying on side effects.
'typing==3.5.3.0', | ||
'six>=1.10.0,<2.0.0', | ||
'pip>=9' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused now. Why pip 9 instead of 1.4 that you had before in the requirements.txt
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The --platform
arguments to pip download
he's using are new in pip 9, the original version specifier allowed versions that wouldn't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh makes sense
tests/functional/test_package.py
Outdated
@@ -23,6 +31,257 @@ def index(): | |||
return app | |||
|
|||
|
|||
@contextlib.contextmanager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we use the builtin capsys
fixture? https://docs.pytest.org/en/latest/capture.html#accessing-captured-output-from-a-test-function
That looks similar to what you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool I hadn't seen that.
tests/functional/test_package.py
Outdated
]) | ||
|
||
appdir = str(_create_app_structure(tmpdir)) | ||
self._write_requirements_txt('\n'.join(reqs), appdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I wonder if we can make a higher level helper function that creates the app structure but does so by providing the tempdir and a list of requirements (instead of joining them with a newline and passing a string)? It will make it easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In python 2 create_app_structure returns a LocalPath object.
tests/functional/test_package.py
Outdated
def test_can_get_whls_mixed_compat(self, tmpdir, osutils, pip_runner): | ||
reqs = ['foo', 'bar', 'baz'] | ||
pip, runner = pip_runner | ||
pip.add_side_effect('download', [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like download is really the only command used for side effects. Would it be possible to simplify to something like this:
pip.packages_to_download([
'foo-1.0-cp36-none-any.whl',
'bar-1.2-cp36-cp36m-manylinux1_x86_64.whl',
'baz-1.5-cp36-cp36m-linux_x86_64.whl'
])
Ideally I would like to not have to worry about the PipSideEffect
object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wheel also uses it so I do need a way to provide that argument.
tests/functional/test_package.py
Outdated
site_packages = builder.build_site_packages(appdir) | ||
installed_packages = os.listdir(site_packages) | ||
assert len(installed_packages) == 0 | ||
assert 'Could not install dependencies:\nbaz==1.5' in out.getvalue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think this should be in standard error if it is stating it cannot download a dependency
tests/functional/test_package.py
Outdated
site_packages = builder.build_site_packages(appdir) | ||
installed_packages = os.listdir(site_packages) | ||
assert len(installed_packages) == 0 | ||
assert 'Could not install dependencies:\nbaz==1.5' in out.getvalue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this again. I wonder if there is a better way to represent the packages that failed to install than relying on standard error? I wonder if we should be raising an error or returning the list of failed packages (along with successfully installed packages)?
tests/functional/test_package.py
Outdated
]) | ||
# Foo is built from and is pure python so it yields a compatible | ||
# wheel file. | ||
pip.add_side_effect('wheel', [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah then for wheels we would have a similar:
pip.packages_to_build_wheels_for(['foo-1.2-cp36-none-any.whl'])
Capturing offline discussion. The |
Broke function down into smaller components.
639b2c8
to
bfeb58a
Compare
The download algorithm was changed to download all packages using whatever it can find rather than forcing --no-binary=:all: which allows it to download whl files on the first pass. This results in fewer overlal downloads since now only the whl files that are incompatible with the lambda runtime, and sdists need to be redownloaded. Next it will try to download lamba-compatible whl files for all sdists and lambda-incompatible whl files. Once this step is completed we have all the whl files are are going to be able to get from PyPi that are compatible with lambda, any sdists left over without a corresponding compatible whl file will now be built to a whl. Now that we have done our best to get a whl file for each dependency, we do a final pass over the directory and install all compatible whl files to the lambda package. Any left over dependencies that do not have a compatible whl file are reported to the user, as they will need to be manually added to the vendor directory before deployment. Updated tests. And fixed pip requirement.
Virtualenv is no longer used so the compat functionality for it is no longer needed. Added more tests and renamed/better commented the core _download_dependencies logic in the DependencyBuilder.
Updated functional tests for packaging. Fix tests to work with new interface.
Added tests for installing platlib and purelib from the data directory of a wheel file's .data directory. Pip uses a subprocess instead of importing pip and using that for the advantage of not polluting stdout, stderr with progress bars. More granular error detection for pip usage was also added. Specifically errors for: timeout, new connection failed, and not having permission to write to disk. Tests were updated accordingly.
67d1e00
to
b19b71b
Compare
Pushing this up a little early so that folks can try it out. Testing is not fleshed out, nor are all the error cases/prompts during deployment.
This branch won't pass automated linting/testing yet.
cc @kyleknap @jamesls @dstufft