Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated files must be present before invocation to be included in package_data #1064

Open
JonathonReinhart opened this issue Jun 15, 2017 · 4 comments
Labels
Needs Triage Issues that need to be evaluated for severity and status.

Comments

@JonathonReinhart
Copy link

JonathonReinhart commented Jun 15, 2017

I've created the following repository to demonstrate the problem:
https://github.com/JonathonReinhart/setuptools-package_data-issue

Copying what I've shown there:

This repository is an MCVE for the following issues:

Problem statement

When python setup.py sdist bdist_wheel is invoked from a clean starting point, data files (identified in package_data), which are generated during setup.py hooks, will not be included in the wheel.

  • From a clean starting point (git clean -fdx) (primarily removes mkpkg/generated_data):
    • python setup.py bdist_wheel - OK
    • python setup.py sdist then python setup.py bdist_wheel - OK
    • python setup.py sdist bdist_wheel - BROKEN

Steps to reproduce

First:

  • Run git clean -fdx -- remove any generated files
  • Run python setup.py sdist bdist_wheel
  • Observe only the following line in the output:
    • copying build/lib/mypkg/__init__.py -> build/bdist.linux-x86_64/wheel/mypkg
  • Run unzip -l dist/mypkg-1.2.3-py2-none-any.whl
    • Observe that mypkg/generated_data is missing from the wheel

Now that mypkg/generated_data is present:

  • Run python setup.py sdist bdist_wheel again
  • Observe the following lines in the output:
    • copying build/lib/mypkg/generated_data -> build/bdist.linux-x86_64/wheel/mypkg
    • copying build/lib/mypkg/__init__.py -> build/bdist.linux-x86_64/wheel/mypkg
  • Run unzip -l dist/mypkg-1.2.3-py2-none-any.whl
    • Observe that mypkg/generated_data is now present in the wheel

Clean the directory, and run other variations of sdist and bdist_wheel (as mentioned above) and observe that the problem does not manifest.

Workarounds that do not work

  • Hooking build_py instead of build
@JonathonReinhart
Copy link
Author

Sigh, it turns out setting include_package_data to true "fixes" the problem.

(again copied from my example project)

Solution

Setting include_package_data = True caused setup.py to behave as intended.

Note that the setuptools documentation
is not particularly clear on why this works:

include_package_data
If set to True, this tells setuptools to automatically include any data files
it finds inside your package directories that are specified by your
MANIFEST.in file. For more information, see the section below on Including
Data Files.

package_data
A dictionary mapping package names to lists of glob patterns. For a complete
description and examples, see the section below on Including Data Files. You
do not need to use this option if you are using include_package_data,
unless you need to add e.g. files that are generated by your setup script and
build process. (And are therefore not in source control or are files that you
don’t want to include in your source distribution.)

...

In summary, ...

include_package_data
Accept all data files and directories matched by MANIFEST.in.

package_data
Specify additional patterns to match files and directories that may or may
not be matched by MANIFEST.in or found in source control.

Nothing in this documentation describes the behavior I was seeing, however.
In my case, it seemed like the files discovered by sdist (using MANIFEST.in)
somehow overrode those that bdist_wheel would normally find.

For whatever reason, specifying include_package_data seems to have two,
somewhat inverted purposes.

@JonathonReinhart
Copy link
Author

command/build_py.py has the following ("positive") reference to include_package_data:

    def analyze_manifest(self):
        self.manifest_files = mf = {}
        if not self.distribution.include_package_data:
            return
        src_dirs = {}
        for package in self.packages or (): 
            # Locate package source directory
            src_dirs[assert_relative(self.get_package_dir(package))] = package

        self.run_command('egg_info')
        ei_cmd = self.get_finalized_command('egg_info')
        for path in ei_cmd.filelist.files:
           # ...

It seems that even though this is named analyze_manifest, invoking egg_info must be actually looking at more than just the manifest. It must also find files present in the packages "which may or may not be because they're listed in package_data.

Ugh. This is so complicated.

JonathonReinhart added a commit to JonathonReinhart/scuba that referenced this issue Jun 16, 2017
In 422546f, we worked around issue #77 where scubainit was not being
included in bdist_wheels. The actual fix however, was to simply set
include_package_data = True.

See pypa/setuptools#1064
JonathonReinhart added a commit to JonathonReinhart/scuba that referenced this issue Jun 16, 2017
In 422546f, we worked around issue #77 where scubainit was not being
included in bdist_wheels. The actual fix however, was to simply set
include_package_data = True.

See pypa/setuptools#1064
dhimmel added a commit to dhimmel/manubot that referenced this issue Aug 17, 2018
@pganssle pganssle added the Needs Triage Issues that need to be evaluated for severity and status. label Oct 19, 2018
@alexshpilkin
Copy link

There are no less than three places where a generated file gets kicked out of SOURCES.txt (not the sdist itself, but still required): FileList.append refuses to add nonexistent files, then manifest_maker.prune_file_list weeds them out, then manifest_maker.write_manifest weeds them out again.

All ultimately chain up to FileList._safe_path.

@alexshpilkin
Copy link

alexshpilkin commented May 1, 2019

Huh. Actually, SOURCES.txt is what matters — if a file gets there, it’ll get into the sdist if sdist.make_release_tree generates it.

The _safe_path magic sauce in FileList looks like it needs to be ripped out and replaced with bytes paths, to be honest, now that CPython 3.5 is EOL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Triage Issues that need to be evaluated for severity and status.
Projects
None yet
Development

No branches or pull requests

3 participants