
Draft of wheel packaging. #382

Closed
wants to merge 6 commits into from

Conversation

stealthycoin
Contributor

Pushing this up a little early so that folks can try it out. Testing is not fleshed out, nor are all the error cases/prompts during deployment.

This branch won't pass automated linting/testing yet.

cc @kyleknap @jamesls @dstufft

Contributor

@kyleknap kyleknap left a comment

Some initial comments to start. Still need to get through a lot of this.

assert os.path.isdir(deps_dir)
# Now we need to create a zip file and add in the site-packages
# dir first, followed by the app_dir contents next.
if not os.path.isfile(package_filename):
Contributor

Probably would want this built into OSUtils as a file_exists() to be consistent about using the OSUtils abstraction
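For illustration, a minimal sketch of what that OSUtils method could look like (the method names here are hypothetical, not taken from the PR):

```python
import os


class OSUtils(object):
    """Thin wrapper over the `os` module so filesystem access
    can be stubbed out in tests instead of touching real disk."""

    def file_exists(self, filename):
        # type: (str) -> bool
        return os.path.isfile(filename)

    def directory_exists(self, path):
        # type: (str) -> bool
        return os.path.isdir(path)
```

The call site would then become `if not self._osutils.file_exists(package_filename):`, keeping all `os` calls behind one seam.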

requirements_file = os.path.join(project_dir, 'requirements.txt')
# Now we need to create a zip file and add in the site-packages
# dir first, followed by the app_dir contents next.
if dependency_builder is None:
Contributor

I think it would make sense to put this in the __init__; that way, if they do not specify a DependencyBuilder, the OSUtils will at least get plumbed into it if one was specified. Or if not, we want to be making sure self._osutils is specified when instantiating this.

def _get_requirements_file(self, project_dir):
# type: (str) -> str
# Gets the path to a requirements.txt file out of a project dir path
return os.path.join(project_dir, 'requirements.txt')
Contributor

We probably want to include this functionality in the OSUtils. In general any call to the os library should probably live in the OSUtils class.

# type: (str) -> str
requirements_file = os.path.join(project_dir, 'requirements.txt')
deps_dir = self.site_package_dir(project_dir)
print(deps_dir)
Contributor

Looks like a leftover print statement

return False

@contextlib.contextmanager
def _tempdir(self):
Contributor

Might make sense to put this in OSUtils as well
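A sketch of what moving that context manager into OSUtils could look like (a hypothetical shape, assuming the same cleanup-on-exit behavior as the PR's `_tempdir`):

```python
import contextlib
import shutil
import tempfile


class OSUtils(object):
    @contextlib.contextmanager
    def tempdir(self):
        # Create a temp directory, hand it to the caller, and always
        # remove it afterwards, even if the body raises an exception.
        tempdir = tempfile.mkdtemp()
        try:
            yield tempdir
        finally:
            shutil.rmtree(tempdir)
```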

final_command = [command]
final_command.extend(args)
# Call pip and hide stdout/stderr
with self._consume_stdout_and_stderr():
Contributor

Out of curiosity, what does the output look like when you don't capture it? If it does not look good, it is worth not including it, but we may want to at least log it in case anything important happened, for debugging reasons.

Contributor Author

It's pretty chatty: progress bars and warnings as pip installs things. I was thinking of capturing it, and we can elevate it to the user if they have --debug enabled.
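The capture-then-log-at-debug idea could be sketched like this (a hypothetical helper, not the PR's actual implementation):

```python
import logging
import subprocess

LOG = logging.getLogger(__name__)


def run_quietly(cmd_args):
    """Run a subprocess, swallowing its stdout/stderr into the debug
    log instead of the terminal, so chatty output (e.g. pip's progress
    bars) only shows up when --debug enables DEBUG-level logging."""
    p = subprocess.Popen(cmd_args, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    out, err = p.communicate()
    LOG.debug('stdout: %s', out)
    LOG.debug('stderr: %s', err)
    return p.returncode, out, err
```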


You might try adding varying levels of -q to the pip call to see if that makes it better enough that it changes your mind on how you want to handle this. Though I'm fine with not exposing that by default too, just mentioning that.

Contributor Author

Good point, I hadn't considered that.


# Now that `directory` has all the manylinux1 compatible binary
# wheels we could get, and all the source distributions. We need to
# find all the source dists that do not have a mathcing wheel file,
Contributor

mathcing -> matching

finally:
shutil.rmtree(tempdir)

def _download_dependencies(self, directory, requirements_file):
Contributor

It might be worth breaking each of these individual steps into separate internal methods so the method is not super long and it makes it a bit easier to read.

# PyPi for source distributions.
for ext in self.PYPI_SDIST_EXTS:
if filename.endswith(ext):
name, version = filename[:-len(ext)].split('-')

This is going to make you sad, but sdist filenames cannot be deterministically parsed to a single name/version without prior knowledge of what name you're looking for. For example: foo-1-1.tar.gz could be either ("foo", "1-1") or ("foo-1", "1").

You can unpack the tarball and ask it though; doing something as simple as python setup.py --name and python setup.py --version should print the name/version to stdout... although that can fail if someone writes their setup.py so that it also prints to stdout. You can do it more robustly, but uh, it's really gross to do that. You get to monkeypatch the setup.py like pip does and then run python setup.py egg_info and parse the egg-info file. I can point you to code in pip that does that.
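The ambiguity is easy to demonstrate: a naive split cannot decide where the name ends and the version begins, because every `-` in the stem is a candidate boundary (hypothetical helper, for illustration only):

```python
def possible_name_version_pairs(sdist_filename):
    """Enumerate every (name, version) interpretation of an sdist
    filename. More than one result means the split is ambiguous."""
    for ext in ('.tar.gz', '.tar.bz2', '.zip'):
        if sdist_filename.endswith(ext):
            stem = sdist_filename[:-len(ext)]
            break
    else:
        raise ValueError('Not a recognized sdist: %s' % sdist_filename)
    # Each '-' could be the separator between name and version.
    pairs = []
    for i, ch in enumerate(stem):
        if ch == '-':
            pairs.append((stem[:i], stem[i + 1:]))
    return pairs
```

For `foo-1-1.tar.gz` this yields both `('foo', '1-1')` and `('foo-1', '1')`, which is exactly why the metadata has to come from the sdist's own setup.py rather than the filename.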

Contributor Author

updated.

def download_all_dependencies(self, requirements_file, directory):
# type: (str, str) -> int
"""This command downloads all dependencies as sdists."""
# None of these should fail so we can request non-binary versions of

I told you that none of these should fail so you should download the sdists, but it just occurred to me that this can still fail, because people can publish wheels to PyPI without publishing an sdist at all. This is pretty rare though, so it's possible we could just say that we don't support that edge case?

Contributor Author

Ok, I can get rid of the --no-binary part and make minor adjustments to the rest of the steps to account for that. That should cover all the cases.

"""This command downloads all dependencies as sdists."""
# None of these should fail so we can request non-binary versions of
# all the packages in one call to `pip download`
arguments = ['--no-binary=:all:', '-r', requirements_file, '--dest',

Right now people can influence where pip pulls its dependencies from using environment variables and config files. Do we want to allow that still or should chalice control where pip pulls its deps from and isolate itself from the host system? Either way, I think it would be nice to let people pass in their own repositories for people who don't want their deployment to depend on PyPI.

Contributor Author

Hmmm, interesting. I guess technically pip is an implementation detail of the requirements builder. Though I think it would be useful to allow that for people who have requirements that are not on PyPI.


If we want to keep it an implementation detail, I suggest we run pip in isolated mode, and surface our own settings for passing in a repository or find-links style location and pass that into pip ourselves. That way we still control the interface and if we rip out pip then we can just translate those options to whatever the new thing is.
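As a sketch, that translation layer could look something like this (`--isolated`, `--index-url`, and `--find-links` are real pip options; the function itself and its parameters are hypothetical):

```python
def build_pip_download_args(requirements_file, dest_dir,
                            index_url=None, find_links=None):
    """Build an isolated `pip download` argument list so host-level
    pip config files and environment variables cannot influence
    resolution; repository locations come only from our own settings."""
    args = ['download', '--isolated', '-r', requirements_file,
            '--dest', dest_dir]
    if index_url is not None:
        args.extend(['--index-url', index_url])
    if find_links is not None:
        args.extend(['--find-links', find_links])
    return args
```

If pip were ever swapped out, only this function would need to translate the chalice-level settings into the new tool's flags.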

@jamesls
Member

jamesls commented Jun 19, 2017

I haven't had a chance to look at the code yet, but just playing around with the PR, it seems to work well for most of the packages I tried.

I noticed that when I try out scipy, the download of the sdist takes forever. It looks like it's actually compiling the package as part of the download. As someone not familiar with the internals, that's a bit surprising to me, as I would expect (regardless of pip's behavior) chalice not to compile anything, especially when there appears to be a lambda-compatible wheel file (https://pypi.python.org/pypi/scipy).

Thoughts?

@dstufft

dstufft commented Jun 19, 2017

@jamesls I think that is probably going to be solved for the case of Scipy when @stealthycoin removes the --no-binary flag from the initial download. More generically though, there's not much that can be done about it if a project is compiling something when running python setup.py egg_info (even though they should not be doing that), but in this specific case the wheels will solve it.

@stealthycoin
Contributor Author

Yep, was just about to comment the same thing. I think this will be solved when I update the download logic to not force sdists initially.

@stealthycoin
Contributor Author

By the way, scipy, if you download it straight from PyPI, is 65MB after zipping, so it will fail to upload anyway. It will work if you vendor it yourself after manually stripping the binaries, though, since that gets it down to around 40MB zipped. Then of course you don't have debug symbols.

@dstufft

dstufft commented Jun 20, 2017

That reminds me of pypa/pip#3938, for pip I decided not to do anything about it because space is generally not a concern for us and stripping binaries isn't universally safe. However for Lambda space is a concern, so maybe it'd make sense to offer binary stripping as part of the deployment process?

@codecov-io

codecov-io commented Jun 20, 2017

Codecov Report

Merging #382 into master will increase coverage by 0.75%.
The diff coverage is 98.96%.


@@            Coverage Diff             @@
##           master     #382      +/-   ##
==========================================
+ Coverage   92.17%   92.92%   +0.75%     
==========================================
  Files          18       18              
  Lines        2440     2672     +232     
  Branches      320      352      +32     
==========================================
+ Hits         2249     2483     +234     
+ Misses        142      140       -2     
  Partials       49       49
Impacted Files Coverage Δ
chalice/utils.py 94.62% <100%> (+3.09%) ⬆️
chalice/deploy/deployer.py 89.52% <100%> (+0.22%) ⬆️
chalice/constants.py 100% <100%> (ø) ⬆️
chalice/compat.py 100% <100%> (+28%) ⬆️
chalice/deploy/packager.py 95.03% <98.75%> (+4.19%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0fb8399...73add2c.

@stealthycoin
Contributor Author

Downside of that is that we would need to introduce options to strip particular dependencies, and within those it's possible that there are multiple binaries that can be safe/unsafe to strip, which seems like a granularity better suited to letting users handle it themselves and put what they want in the vendor directory. We would also either have to take a dependency on a pure-Python ELF parser or do best effort based on what tools are available on the operating system. Not sure it's worth it.

@dstufft

dstufft commented Jun 20, 2017

Yea, another option may be just the ability to run a user-provided script as part of the deploy process that allows them to script that themselves, although that's not super relevant to this particular PR.

@stealthycoin stealthycoin changed the title First draft of wheel packaging. Pushing up so it can be tried out. Draft of wheel packaging. Jun 20, 2017
Contributor

@kyleknap kyleknap left a comment

Looks good. Just had some additional comments. I like how you broke out each of the individual steps in the installation process into internal methods.

the chalice vendor folder.

Your deployment will continue but may not work correctly
if missing dependencies are not present For more information:
Contributor

Missing period between present and For

requirements.txt Outdated
@@ -0,0 +1 @@
pip>=1.4
Contributor

This needs to go into the setup.py and setup.cfg file instead. While you are at it, can we remove virtualenv as a dependency?

# which will serve as the master list of dependencies needed to deploy
# successfully.
self._pip.download_all_dependencies(requirements_file, directory)
deps = {Package(directory, filename) for filename
Contributor

I wonder if there will ever be any edge cases with making this a set, say if pip downloaded the same package more than once? I have not thought through whether this is an issue, but this line of code stood out to me.


I'm pretty sure that pip should only ever be downloading a single package per project here. Unless I'm missing something about how we're calling pip here, if it did so that would be a bug.

Contributor

Ok cool. I made the assumption that the set was being used for duplicates, but after talking offline with @stealthycoin it was just to be able to take advantage of set operations.

return '%s==%s' % (name, version)


class SdistMetadataFetcher(object):
Contributor

Yeah... this code is pretty complicated. From the sounds of it there is not much we can do to simplify it. It may be worth adding logging statements for when things go wrong, or adding a TODO for that, to help us if things go wrong in the logic.


def test_same_pkg_sdist_and_whl_collide(self, osutils):
with osutils.tempdir() as tempdir:
self._write_fake_sdist(tempdir, 'foobar-1.0')
Contributor

Might be good to parameterize this for name and version. So we do not have to stick with foobar and 1.0 all over the place.


def test_setup_tar_gz(self, osutils, sdist_reader):
setup_py = self._SETUP_PY % (
self._SETUPTOOLS, 'foo-bar', '1.0-2b'
Contributor

And we are using '1.0-2b' and 'foo-bar' because those won't work with the naive approach? It might be worth making that clear by giving the test an explicit name for it. And also have a test for a normal package name and version.


class SdistMetadataFetcher(object):
"""This is the "correct" way to get name and version from an sdist."""
# https://github.com/pypa/pip/blob/master/pip/utils/setuptools_build.py

It may be a reasonable idea to embed a commit hash in this URL since this file can be renamed, deleted, etc in pip's master branch which would invalidate this link.

@stealthycoin stealthycoin force-pushed the whl-packaging branch 3 times, most recently from 6a126fe to 123452a Compare June 21, 2017 17:51
Contributor

@kyleknap kyleknap left a comment

Looks good. I just had some general comments on how to clean up the tests. One question I do have: do we need to be making any assertions on the pip calls being made? Looks like right now we just assert the packages are in the site-packages directory. It may be worth, on top of that, making sure that the calls lead to adding the package, as right now we are just relying on side effects.

'typing==3.5.3.0',
'six>=1.10.0,<2.0.0',
'pip>=9'
Contributor

I'm confused now. Why pip 9 instead of 1.4 that you had before in the requirements.txt?


The --platform arguments to pip download he's using are new in pip 9, the original version specifier allowed versions that wouldn't work.

Contributor

Ahh makes sense

@@ -23,6 +31,257 @@ def index():
return app


@contextlib.contextmanager
Contributor

Couldn't we use the builtin capsys fixture? https://docs.pytest.org/en/latest/capture.html#accessing-captured-output-from-a-test-function

That looks similar to what you need.

Contributor Author

Ah cool I hadn't seen that.

])

appdir = str(_create_app_structure(tmpdir))
self._write_requirements_txt('\n'.join(reqs), appdir)
Contributor

So I wonder if we can make a higher-level helper function that creates the app structure but does so by taking the tempdir and a list of requirements (instead of joining them with a newline and passing a string)? It would make this easier to read.

Contributor Author

In Python 2, create_app_structure returns a LocalPath object.

def test_can_get_whls_mixed_compat(self, tmpdir, osutils, pip_runner):
reqs = ['foo', 'bar', 'baz']
pip, runner = pip_runner
pip.add_side_effect('download', [
Contributor

So it looks like download is really the only command used for side effects. Would it be possible to simplify to something like this:

pip.packages_to_download([
    'foo-1.0-cp36-none-any.whl',
    'bar-1.2-cp36-cp36m-manylinux1_x86_64.whl',
    'baz-1.5-cp36-cp36m-linux_x86_64.whl'
])

Ideally I would like to not have to worry about the PipSideEffect object

Contributor Author

Wheel also uses it, so I do need a way to provide that argument.

site_packages = builder.build_site_packages(appdir)
installed_packages = os.listdir(site_packages)
assert len(installed_packages) == 0
assert 'Could not install dependencies:\nbaz==1.5' in out.getvalue()
Contributor

I would think this should be in standard error if it is stating it cannot download a dependency

site_packages = builder.build_site_packages(appdir)
installed_packages = os.listdir(site_packages)
assert len(installed_packages) == 0
assert 'Could not install dependencies:\nbaz==1.5' in out.getvalue()
Contributor

Thinking about this again: I wonder if there is a better way to represent the packages that failed to install than relying on standard error? I wonder if we should be raising an error or returning the list of failed packages (along with the successfully installed packages)?
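One possible shape for that API, sketched as a toy partition over wheel filenames (the function and its tag handling are hypothetical, assuming the standard `{name}-{version}-{python}-{abi}-{platform}.whl` naming):

```python
def partition_by_compatibility(wheel_filenames, compatible_tags):
    """Split wheel filenames into (installable, failed) so the caller
    receives failures as data instead of scraping them off stderr."""
    installable, failed = [], []
    for filename in wheel_filenames:
        # The last three dash-separated fields of a wheel filename
        # form its python-abi-platform compatibility tag.
        tag = '-'.join(filename[:-len('.whl')].split('-')[-3:])
        if tag in compatible_tags:
            installable.append(filename)
        else:
            failed.append(filename)
    return installable, failed
```

A build_site_packages-style method returning such a pair would let the CLI layer decide whether to print to stderr, raise, or both.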

])
# Foo is built from and is pure python so it yields a compatible
# wheel file.
pip.add_side_effect('wheel', [
Contributor

Yeah then for wheels we would have a similar:

pip.packages_to_build_wheels_for(['foo-1.2-cp36-none-any.whl'])

@stealthycoin
Contributor Author

stealthycoin commented Jun 21, 2017

Capturing offline discussion: the PipSideEffect class will not be invoked directly; instead, a function on FakePip such as packages_to_build_wheels_for will be called to register a set of files to be created based on the pip function invoked. There will also be a way to check that the input parameters match some expected set before writing the files.

@stealthycoin stealthycoin force-pushed the whl-packaging branch 2 times, most recently from 639b2c8 to bfeb58a Compare June 23, 2017 19:16
jcarlyl added 4 commits June 26, 2017 10:41
The download algorithm was changed to download all packages using
whatever it can find rather than forcing --no-binary=:all:, which allows
it to download whl files on the first pass. This results in fewer
overall downloads, since now only the whl files that are incompatible
with the lambda runtime, and the sdists, need to be redownloaded.

Next it will try to download lambda-compatible whl files for all sdists
and lambda-incompatible whl files. Once this step is completed we
have all the whl files we are going to be able to get from PyPI that
are compatible with lambda; any sdists left over without a corresponding
compatible whl file will now be built into a whl.

Now that we have done our best to get a whl file for each dependency, we
do a final pass over the directory and install all compatible whl files
to the lambda package. Any left over dependencies that do not have a
compatible whl file are reported to the user, as they will need to be
manually added to the vendor directory before deployment.
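The three passes described in these commit messages can be sketched with the pip calls stubbed out as injected callables (all names here are hypothetical, for illustration):

```python
def gather_lambda_wheels(download_all, download_lambda_whl, build_wheel,
                         is_lambda_compatible, directory_contents):
    """Three-pass strategy: take whatever pip offers first, then ask
    for lambda-compatible wheels, then build from sdist as a last
    resort. Returns whatever is still incompatible (to be vendored)."""
    download_all()  # pass 1: whls + sdists, no filters
    missing = [pkg for pkg in directory_contents()
               if not is_lambda_compatible(pkg)]
    for pkg in missing:  # pass 2: fetch a lambda-compatible whl
        download_lambda_whl(pkg)
    still_missing = [pkg for pkg in directory_contents()
                     if not is_lambda_compatible(pkg)]
    for pkg in still_missing:  # pass 3: build a whl from the sdist
        build_wheel(pkg)
    return [pkg for pkg in directory_contents()
            if not is_lambda_compatible(pkg)]
```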

Updated tests. And fixed pip requirement.
Virtualenv is no longer used so the compat functionality for it is no
longer needed. Added more tests and renamed/better commented the core
_download_dependencies logic in the DependencyBuilder.
Updated functional tests for packaging. Fix tests to work with new
interface.
Added tests for installing platlib and purelib from a wheel file's
.data directory. Pip is run in a subprocess instead of being imported
and used in-process, which has the advantage of not polluting stdout/
stderr with progress bars. More granular error detection for pip usage
was also added, specifically errors for: timeout, new connection failed,
and not having permission to write to disk. Tests were updated accordingly.