Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relocated_resources target type #11559

Closed

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Feb 12, 2021

As re-found in #11551, we intentionally do not include files() targets when packaging a PEX because the files cannot be easily accessed via the traditional filesystem APIs. Even when using unzip and venv modes, you must still relativize the open() to the __file__, rather than using a path relative to the build root. When you have source roots, __file__ behaves differently between ./pants run and ./pants package, as the latter strips source roots.

When dealing with pex_binary, almost always you should either use resources or use archive. But a user needs the flexibility of relocated_files - they have a file common_assets/common.css, along with several projects in the format <project>/assets/*.css. They need common.css to be in that final path <project>/assets/*.css when the PEX is built, but they don't want to copy that file in several places; symlinks work but are not first-class. They want to use the equivalent of relocated_files, but for it to work with their PEX.

relocated_resources are pretty advanced and will likely confuse some users with getting source roots correct. But this isn't necessarily a reason to avoid the feature; for the right user, this advanced feature will be necessary to their workflow. Beyond good docs, we leave it to the user to get the path manipulation working properly. Most users can ignore this.

[ci skip-rust]
[ci skip-build-wheels]

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano changed the title WIP: Add relocated_resources target type Add relocated_resources target type Feb 12, 2021
@Eric-Arellano Eric-Arellano marked this pull request as ready for review February 12, 2021 21:17
@jsirois
Copy link
Contributor

jsirois commented Feb 12, 2021

As re-found in #11551, we intentionally do not include files() targets when packaging a PEX because the files cannot be easily accessed via the traditional filesystem APIs. Even when using unzip and venv modes, you must still relativize the open() to the file, rather than using a path relative to the build root. When you have source roots, file behaves differently between ./pants run and ./pants package, as the latter strips source roots.

A PEX is a deployable application. You cannot locate files in a deployed application by relying on any build root or any specific CWD for that matter. This however should have 0 bearing on whether we allow folks to package files in a PEX. They can completely legitimately use __file__ to properly locate other files in the app in a consistent way.

So I stopped on that paragraph since its not correct. I can review further if that's reasoning that can be left out of the description of this PR and there is a better trumping reason.

@Eric-Arellano
Copy link
Contributor Author

They can completely legitimately use file to properly locate other files in the app

Agreed.

in a consistent way.

Nack. This is the issue identified in #11551 (comment): if you have source roots, __file__ means something different depending on ./pants run vs. ./pants package && dist/app.pex, specifically because we're stripping source roots in the PEX.

If we can figure out a solution for that inconsistency, I'll all in for allowing files() to be used with a pex_binary and ./pants package. For example, perhaps we could somehow stop stripping source roots in the PEX and hardcode that PEX_EXTRA_SYS_PATH should be used in the PEX at runtime? I don't think the answer is to start stripping source roots with ./pants run.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@jsirois
Copy link
Contributor

jsirois commented Feb 13, 2021

I'm thoroughly confused. Lets leave aside how things work now for a moment.

If I have a pex_binary that uses relocated_files, then pants package and pants run should both package / arrange things the same now matter how its done under the hood. I.E.: If package doesn't strip source roots for relocated files, neither should run. If so, then access via file should work the same in both.

Where is that wrong?

(and note this focus on pants run and pants package providing consistent layouts has nothing to do with python - we should do this for every language)

@Eric-Arellano
Copy link
Contributor Author

then pants package and pants run should both package / arrange things the same now matter how its done under the hood. I.E.: If package doesn't strip source roots for relocated files, neither should run.

Yes to never stripping relocated files. Those are never stripped in Pants.

The inconsistency is how we handle resources and Python files. Originally, we always stripped source roots - I introduced the original sin in V2 land in 2019 with #7696. But Benjy and I slowly chipped away at replacing this, as stripping has several downsides, including tool output not displaying the full path, and issues with namespace packages having conflicting content, like if src/python/pants/__init__.py and tests/python/pants/__init__.py had different content.

Now, the sole place we strip is when running ./pants package on a pex_binary because I don't know of a way to hardcode the PYTHONPATH into the pex. Would that be a feature we could add?

I agree it's broken that ./pants run and ./pants package treat __file__ differently. But I think ./pants run is correct, and it's ./pants package we'd want to fix to stop stripping source roots.

@jsirois
Copy link
Contributor

jsirois commented Feb 13, 2021

But Benjy and I slowly chipped away at replacing this, as stripping has several downsides, including tool output not displaying the full path, and issues with namespace packages having conflicting content,

These two examples are both examples of bad not good. At deploy run time conflicting namespace packages will cause a problem, they should do the same at ./pants run time. At deploy run time, printed paths will have source roots stripped - so at pants run time that gives a preview of what you'll be reading when debugging production runtime. Are these two examples the only justification for making run layout differently than package?

I want to again highlight the non-Pythonness of this issue / decision. The way we go here sets a precedent for all language handling.

@jsirois
Copy link
Contributor

jsirois commented Feb 13, 2021

Now, the sole place we strip is when running ./pants package on a pex_binary because I don't know of a way to hardcode the PYTHONPATH into the pex. Would that be a feature we could add?

The existing pex-tool/pex#987 would cover this.

I agree it's broken that ./pants run and ./pants package treat file differently. But I think ./pants run is correct, and it's ./pants package we'd want to fix to stop stripping source roots.

This seems broken to me. So, take for example --venv mode. If we package a PEX file that supports source roots via PYTHONPATH ammendments, then creating a venv would look like this:

  1. For each PYTHONPATH entry that has a prefix of the PEX file path itself, consider that an embedded source root and for all those files, copy their relative paths into venv//lib/pythonX.Y/site-packages/.
  2. When files in the various source roots conflict, then ... overwrite? Die?

@stuhood
Copy link
Member

stuhood commented Feb 17, 2021

I want to again highlight the non-Pythonness of this issue / decision. The way we go here sets a precedent for all language handling.

Yea, this. A lot of languages have a "resource" concept -- embedded in my app and loaded from a "path" in a language-specific way. And when things are loaded from a path like that, it makes sense for them to be shaped like path entries (ie, the language's modules). An overview of Java's, for example: https://docs.oracle.com/javase/8/docs/technotes/guides/lang/resources.html

The advantage of resources is supposed to be that they dodge the question of "where does the file actually live, and is it actually a file, or is it just a stream of bytes?": and that's handy precisely because it dodges the the kinds of problems that we have with the files target type: consistently knowing where they will live relative to your application, and knowing that they are actually "real" files on disk is hard.

files are hard to get right, but they're obviously necessary+correct for archive and test. Adding support for consuming them in other goals is good, but it remains important for them to be consistent.


I don't have a good answer for the "I'd like to load the same resource from multiple locations" usecase (it's a bit like wanting to load the same Python module from multiple places). In the usecases that I've seen, resources have been loaded from a consistent path by whichever code module they are associated with.

It's possible that this usecase is weird enough (unsure) that we should actually recommend using files for it, even if we also ban using files with zipapp+zipsafe=True (the reasoning being that the expected semantics of files just cannot be achieved with zipsafe=True because they will not be "real files on disk").

Base automatically changed from master to main March 19, 2021 19:20
@Eric-Arellano Eric-Arellano deleted the relocated-resources branch July 19, 2021 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants