Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add back python_app / bundle, but improved #10733

Closed
Eric-Arellano opened this issue Sep 4, 2020 · 7 comments · Fixed by #10895
Closed

Add back python_app / bundle, but improved #10733

Eric-Arellano opened this issue Sep 4, 2020 · 7 comments · Fixed by #10895
Assignees
Milestone

Comments

@Eric-Arellano
Copy link
Contributor

A user explained that they have use for this goal. They very intentionally want the loose files to be kept outside the .pex file.

However, we should make this much simpler than before and redesign from first principles.

@Eric-Arellano Eric-Arellano added this to the 2.0.x milestone Sep 9, 2020
@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Sep 22, 2020

Requirements

  • Be able to have loose files not in the binary, e.g. not in the .pex file.
  • Be able to set the rel_path for files so that they have a common prefix.
  • Be able to choose the archive format.

Nice to haves

  • Get rid of bundles; it was confusing.
  • Consolidate the bundle goal into binary. It was confusing to everyone what the difference was.
  • Generalize this pattern; rather than python_app, jvm_app, etc, have something like archive.
  • Allow >1 binary to appear in the final archive.

Proposal

Add a new archive target, with the fields:

  • binaries -> path to binary targets, i.e. targets that ./pants binary knows how to operate on, even if not traditional python_binary targets.
  • files -> path to files targets (or subclasses)
  • format, e.g. .zip vs. .tar.gz

To accommodate rel_path, files gains a new field path_prefix: Optional[str]. This path_prefix will get used everywhere the files target is used, including when running tests. It's unclear how to guarantee this happens, e.g. if we want to special case the hydrate_sources rule. What should ./pants filedeps do? This step should be broken out as a precursor.

archive(
  binaries=[":bin1", ":bin2"],
  files=["//:files1"],
  format=".zip"
)
files(
  name="files1",
  sources=["*.json"],
  path_prefix="json_files/"
)

@stuhood
Copy link
Member

stuhood commented Sep 22, 2020

Thanks!

And as discussed in the comments on #10756, it would be good to be consistent about whether there are separate files and binaries lists, or whether because both of them should be treated as "untyped files", they should go in a single list for which we assume all entries can be converted into "untyped files".

To accommodate rel_path, files gains a new field path_prefix: Optional[str]. This path_prefix will get used everywhere the files target is used, including when running tests. It's unclear how to guarantee this happens, e.g. if we want to special case the hydrate_sources rule. What should ./pants filedeps do? This step should be broken out as a precursor.

This will be important.

One related challenge that occurred to me is that configuring the rel_path on files targets might mean that we end up needing a parallel option on (python_)binary targets, because currently you can only specify their name. Given that, some other prior art that @tdyas has mentioned is the maven assembly plugin (examples), which is a bit more imperative. An imperative strategy might involve declaring filtering/relocating/etc imperative statements that you could declare on a target that depended on the files... i.e., on an archive or binary, or on some intermediate relocated_files target or something.

EDIT: And another thing that we should likely fix here as well is that files targets are currently relative to the buildroot... but it's possible that they should be relative to a sourceroot or to their definition location.

@stuhood
Copy link
Member

stuhood commented Sep 24, 2020

We've marked this as a blocker for 2.0.0rc0, but mostly because of this question:

And another thing that we should likely fix here as well is that files targets are currently relative to the buildroot... but it's possible that they should be relative to a sourceroot or to their definition location.

If it turns out that we don't need to change files, then this probably won't be a blocker for rc0.

@Eric-Arellano
Copy link
Contributor Author

I was typing out some thoughts about how technically we can allow any target type in the files field, and a more apt name might be loose_sources. This could be convenient to put a python_library target in the field and have it included as loose files.

But while typing it out, I realized that in order to get the files to show up without their source roots in both archive and test, you would need to use a files() dependency for test, anyways. If you depend on a python_library in a python_tests target, we will strip the source root.

In other words, I think that always using a files() target to refer to loose source files is good. And +1 to the relative path feature living on a files target.

--

There's one downside that Pants generally doesn't work as well if >1 target refers to the same file. The main place this breaks things is dep inference. But, fortunately, our dep inference code only considers Python targets, so it's legal to have a files owner and a python_library target at the same time.

@Eric-Arellano
Copy link
Contributor Author

Eric-Arellano commented Sep 28, 2020

I'm working on the files target changes for now, and came up with these use cases for the rel_path feature:

  1. Remove a prefix, e.g. src/resources/f.json becomes f.json
  2. Add a prefix, e.g. f.json becomes json/f.json.
  3. Remove a prefix and add a prefix at the same time, e.g. src/resources/f.json becomes json/f.json.

Some other requirements:

  • filedeps should use the original paths
  • test and archive must use the modified paths
  • It's unclear what other goals like fmt and run should do. Likely, use the modified paths for consistency.

I'm thinking the most flexible design will be a path_prefix_mapper field, similar to the mapper field for bundle:

# 1 remove 'src/python'
files(
  path_prefix_mapper={"src/python": ""},
)

# 2 add 'thirdparty'
files(
  path_prefix_mapper={"": "thirdparty"},
)

# 3 remove and then add
files(
  path_prefix_mapper={"src/python": "thirdparty"},
)

This field would validate that the dict does not have >1 entry. Why? This manipulation should apply to every single file in the files target; you should not have branching like {"parent/dir1": "foo", "parent/dir2": "bar"}. If you need that granularity, create more granular files targets.

I think I'd prefer to avoid multiple different fields that are all related, like we had with bundle: https://v1.pantsbuild.org/build_dictionary.html. I found it a bit difficult to know when to use what, and how they related. We also don't have a good Target API story for multiple fields being related and validating their interaction, and it goes against the Field-Driven nature of the API: #9561.

@stuhood
Copy link
Member

stuhood commented Sep 29, 2020

I'm working on the files target changes for now, and came up with these use cases for the rel_path feature:
..
Some other requirements:

This all looks great!

It's unclear what other goals like fmt and run should do. Likely, use the modified paths for consistency.

fmt is not running the code, so it shouldn't need to apply source root rewriting (I'm guessing it already doesn't?). It would also make it much more challenging to write the manipulated outputs back into the filesystem.

And re: run: I think that with the formulation on this ticket, a *_binary target would not depend on the loose files: rather, an archive would. That would mean that run would not need to worry about loose files when passed a *_binary target. On the other hand, I do wonder what the behavior of run for an archive would be... but could punt on that.

This field would validate that the dict does not have >1 entry. Why? This manipulation should apply to every single file in the files target; you should not have branching like {"parent/dir1": "foo", "parent/dir2": "bar"}. If you need that granularity, create more granular files targets.

The single-entry dict is a bit odd. Maybe a type path_prefix_mapping=Mapping(from=, to=)?

@stuhood
Copy link
Member

stuhood commented Sep 29, 2020

I think that determining the connection between the "mapping" and source roots is the main thing to nail down with regard to unblocking rc0. Basically, determining whether:

  1. source roots should be stripped from files by default, with the mapping as an "overridden source root" for the file
  2. the default should continue to be that files are absolute (ie, their default source root is the buildroot)

Eric-Arellano added a commit that referenced this issue Oct 1, 2020
### Problem

As explained in #10733 and #10876, we are adding back an improved version of the `bundle` goal, and we need a way for users to have Pants relocate loose source files, which they previously did via `bundle(rel_path=)`.

Further, we want to allow users to relocate in other contexts, too, such as for `tests`.

### Solution

Add a new intermediate target type called `relocated_files`, which has fields for the original `files()` targets and for the desired mapping. This will then use codegen to apply the mapping and generate `FilesSources`.

This solution is chosen instead of two alternative designs:

1) Define the mapping on the `files()` target, as done in #10876. This implementation was messy and leaked in lots of places. We also think the risk was too high that a user would change a `files()` target and accidentally break multiple consumers.
2) Have every consumer define its own mapping. This does not scale well as it's not composable.

Instead, we find a happy medium by using this intermediate type. There is still a risk of #1, where changing the target will break every consumer. But we think this risk is much lower given the explicitness of having a `relocated_files` target type.

You can safely define multiple `relocated_files` targets over the same original `files` target if you want multiple distinct mappings for the same file.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Oct 1, 2020
…al (#10881)

We now have several goals that build an asset:

* `binary`
* `awslambda`
* `setup-py`

Soon, we will be adding support for an `archive()` target type as part of #10733. An `archive` is not really a `binary`, nor any of the other two. We don't want to add a _fourth_ goal `archive`. It gets confusing when there are lots of goals; and it also makes things like `runtime_binary_dependencies` less flexible, e.g. not working with `python_awslambda`.

Semantically, the `package` goal is for transforming your code into a single exportable asset that goes into `--distdir`. This is different than, say, a `compile` goal; running on a library target should no-op.

This only deprecates `awslambda` and `binary` for now; `setup-py` will be a followup, as that doesn't port as easily due to passthrough args.

See https://docs.google.com/document/d/1q77XvcIU9dCzuzA2iFRQQ9AGIfV9jpR269415HdEt0E/edit# for a design doc diving more into this.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Oct 2, 2020
Closes #10733.

Users liked being able to use `bundle` to be able to create archives which included loose files + binaries. This adds back that feature, but in a simpler way:

* `archive` is used regardless of language. No more `python_app` vs. `jvm_app`.
* Depending on loose files is down like a normal dependency now, no more `bundle`. To relocate files, you can use the `relocated_files` target.
* Your `archive` can contain other `archive`s, or anything built via `./pants package`.

For example:

```python
archive(
    files=["src/resources:relocated_json"],
    packages=["src/python:app"],
    format="tgz",
)
```

```bash
> ./pants package '//:app_archive'
```

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants