Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: design support for multiple go.mod in repository #13114

Closed
Eric-Arellano opened this issue Oct 5, 2021 · 5 comments · Fixed by #16386
Closed

Go: design support for multiple go.mod in repository #13114

Eric-Arellano opened this issue Oct 5, 2021 · 5 comments · Fixed by #16386
Labels
backend: Go Go backend-related issues

Comments

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Oct 5, 2021

A key feature of Pants is that you can easily import your code from anywhere, even if it exists in a separate "project" (e.g. python_distribution). Traditionally, inter-repository dependencies happen at the project-level, that <root>/projectA depends on <root>/projectB, whereas Pants empowers you to depend on the most granular level possible, e.g. files for Python.

What should this look like in Go?

Background

Import paths are the way "packages" (the atoms of Go) are referenced, e.g. in the import directive. For example, example.com/projA/strutil. We care about import paths for dependency inference.

We have these targets:

  • go_mod. This is roughly a "Go project". The user sets the module directive like example.com/projA there, which is the foundation of the import path.
  • go_package. A first-party package, corresponding to a directory.
  • go_third_party_package. Generated by inspecting the go.mod to see which external modules are used.

Proposed semantics

Can import any go_package target

If you have two go_mods with the import path example.com/projA and example.com/projB, then I believe it should be possible for them to import each others' packages, e.g. import "example.com/projA/strutil".

When building Go packages, we convert the package (internal and external) to get a __pkg__.a file. We set up the file importcfg to map that .a file with its import path. So it should be easy to build dependencies from another go.mod - we don't need to change anything to support this!

Every go_package and go_third_party_package require an owning go_mod

Go tooling requires that we have a go.mod for things like go list to work (required for metadata analysis). While we could hypothetically synthesize these, our lives will be much easier if we always assume there is an owning go.mod (go_mod target) for every go_package and _go_external_package.

Every first-party package has a unique import path

This is an expectation of Go: your modules must have unique names. Because the package import path starts with the module's path, it's safe for the individual package's name to be reused, e.g. example.com/projA/strutil vs example.com/projB/strutil.

It's unclear if Pants should eagerly error for this case. I think it's probably enough for dependency inference to not warn of ambiguity.

NB: even if you set package main, the import path is not overridden to main. See #13113.

Open questions

The third-party dependencies "universe"

In Python, we have a global universe of third-party dependencies, as explained at https://www.pantsbuild.org/docs/python-third-party-dependencies. If you have >1 target for the same requirement, then we can't infer which you want to use and you must explicitly add to dependencies.

What should we do with Go? It would be an error for the transitive closure to use >1 version of the same external package, as it would result in two __pkg__.a files under the same import path in the importcfg file. That restriction is the same as Python not allowing conflicting versions of the same dependency. Almost certainly, we should not infer a dependency on the module when there is ambiguity.

Do we allow a go_package to depend on a go_third_party_package from another go.mod? That is a little weird because we decided in #13050 to punt on Go performing dependency management for Go. Instead, we will expect users to run go mod tidy, which results in Go updating go.mod and go.sum by inspecting what imports files have. If projB uses an external package defined in projA/go.mod, then go mod tidy will add that to projB/go.mod. Then you get ambiguity, and you have to explicitly add the dep.

We could instead implement that you can only depend on a go_third_party_package from your own go.mod. If we do this, users will need to be vigilant that the transitive closure always uses the same version of the external package. That is, it would be legal to have projA/go.mod and projB/go.mod both have their own go_third_party_package targets representing the same external package, so long as the version was kept the same. Our code would build to the same __pkg__.a, so no merge conflict. (Although, I think transitive deps of that external package also need to be in sync.) We could maybe add eager validation to detect that?

@Eric-Arellano Eric-Arellano added the backend: Go Go backend-related issues label Oct 5, 2021
Eric-Arellano added a commit that referenced this issue Oct 6, 2021
Closes #13049. This greatly reduces boilerplate and also allows us to make some fields required like `import_path` and `subpath`, so that we don't have to calculate those in consuming rules like `build_go_pkg.py`.

## The address format

The generated address looks like `project#./dir`. @tdyas offered that this is intuitive for Go developers because they have to say `go build ./dir` already with the leading `./`. 

This solves how to represent when the `_go_internal_package` is in the same dir as the `go_mod`: `project#./`.

This also makes very clear the difference from external packages like `project#rsc.io/quote` vs. internal packages like `project#./dir`.

## Improves dependency inference

Now that the `import_path` field is required for both `_go_internal_package` and `_go_external_package`, we can create a global mapping of `import_path -> pkg_target`. This is necessary for #13114.

This also improves performance. We don't need to call `ResolvedGoPackage` on all the candidate targets a package might depend on to calculate their import paths. We do still need it when resolving the deps of a particular `_go_internal_package`, but we can be lazier when we call that codepath in not evaluating all candidate targets.

### `dependencies` benchmark

As expected, there is no difference because we are finding the dependencies of everything, so we still have to call `ResolvedGoPackage`. The perf gains are only in things sometimes being less eager, which isn't the case here.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-pantsd --no-process-execution-local-cache dependencies ::'
  Time (mean ± σ):     26.501 s ±  1.537 s    [User: 29.554 s, System: 24.115 s]
  Range (min … max):   24.928 s … 28.763 s    5 runs
```

After:
```
❯ hyperfine -r 5 './pants_from_sources --no-pantsd --no-process-execution-local-cache dependencies ::'
  Time (mean ± σ):     26.359 s ±  0.526 s    [User: 29.600 s, System: 23.769 s]
  Range (min … max):   25.625 s … 26.993 s    5 runs
```

### `package` benchmark

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-pantsd --no-process-execution-local-cache package ::'
  Time (mean ± σ):     33.777 s ±  0.248 s    [User: 39.221 s, System: 39.389 s]
  Range (min … max):   33.517 s … 34.062 s    5 runs
```

After:

```
❯ hyperfine -r 5 './pants_from_sources --no-pantsd --no-process-execution-local-cache package ::'
Benchmark #1: ./pants_from_sources --no-pantsd --no-process-execution-local-cache package ::
  Time (mean ± σ):     31.049 s ±  0.702 s    [User: 40.606 s, System: 40.537 s]
  Range (min … max):   30.512 s … 32.273 s    5 runs
```

## TODO: fix `go_binary` inference of `main` field

#13117 added inference of the `main` field for `go_binary`, that it defaults to the `go_package` defined in that directory.

But target generation no longer generates targets actually in each directory. All generated targets are "located" in the BUILD file of the `go_mod`, i.e. their `spec_path` is set to that. So it no longer looks to `AddressSpecs` like there are any targets in each subdirectory, and there are >1 `_go_internal_package` targets in the `go_mod` dir.

Instead, we should use the `subpath` field to determine what directory the targets correspond to.

[ci skip-rust]
[ci skip-build-wheels]
@Eric-Arellano
Copy link
Contributor Author

@tdyas I don't think this is too tricky to implement. I'm thinking we fix our dependency inference mapping to be have a distinct mapping per go_mod. The mapping will only include third-party packages from that go.mod, but it will include all first party packages.

This means that we sidestep third-party package ambiguity, and we keep the ergonomics that you can automatically depend on another first-party module. I don't think we even need to consume the replace directive for this to work.

Wdyt?

@tdyas
Copy link
Contributor

tdyas commented Nov 24, 2021

One issue is 2+ first-party modules referring to different versions of the same third-party dependency. It is not clear what version of that dependency to use. We could support multiple go.mod's but not allow themselves to use each other directly in the repo. I.e., punt on solving the dependency triangle problem.

@wfscheper
Copy link
Contributor

I'm not entirely sure I understand the concern here. What version to use in this situation isn't ambiguous to go, so it seems that it shouldn't be ambiguous to pants either. Could someone elaborate on why this is a problem for pants?

@tdyas
Copy link
Contributor

tdyas commented Jun 9, 2022

I'm not entirely sure I understand the concern here. What version to use in this situation isn't ambiguous to go, so it seems that it shouldn't be ambiguous to pants either. Could someone elaborate on why this is a problem for pants?

It is more a limitation of Pants from us deferring work to get the initial version of this backend released. We deferred work to track the mapping of import path to package separately by go.mod file, which means the mapping is global to the Pants repository. See https://github.com/pantsbuild/pants/blob/8b91de44c31a6b64cfe8c4de7fdfc4f51cc5709b/src/python/pants/backend/go/target_type_rules.py for the applicable code.

@Afourcat
Copy link

Hello,

We are currently experimenting with pants, and, reading the golang plugin documentation, I understood that multiple go.mod per project are not yet (or at least partially?) supported.

Did you guys considered using the new go.work and go.work.sum files available in go 1.18?
It seems that you can rely on it in order to handle dependencies between multiple local go.mod files.

I may be missing the point entirely but if it seems reasonable to rely on this new golang feature, we may be able to find someone who can make a PR if it's not to complecated to implement.

Thank you,

Alexandre,

tdyas pushed a commit that referenced this issue Sep 8, 2022
…#16386)

The package mapping from import path to addresses (`ImportPathToPackages`) was global to the entire repository and not split by Go module. This prevented using multiple Go modules in a single repository. This PR solves the issue by introducing `GoImportPathMappingRequest` which allows the package mapping to be requested per module. 

That per-module mapping relies on a repository-wide mapping available as `AllGoModuleImportPathsMappings`. The `GoModuleImportPathsMappingsHook` union allows plugins to provide their own import path mappings. For example, this support is now used by the protobuf/go codegen backend to supply import paths from generated protobuf code, meaning that the Go backend is able to infer dependencies between Go code and protobuf code automatically.

Fixes #13114.

[ci skip-rust]
tdyas pushed a commit to tdyas/pants that referenced this issue Sep 8, 2022
…pantsbuild#16386)

The package mapping from import path to addresses (`ImportPathToPackages`) was global to the entire repository and not split by Go module. This prevented using multiple Go modules in a single repository. This PR solves the issue by introducing `GoImportPathMappingRequest` which allows the package mapping to be requested per module.

That per-module mapping relies on a repository-wide mapping available as `AllGoModuleImportPathsMappings`. The `GoModuleImportPathsMappingsHook` union allows plugins to provide their own import path mappings. For example, this support is now used by the protobuf/go codegen backend to supply import paths from generated protobuf code, meaning that the Go backend is able to infer dependencies between Go code and protobuf code automatically.

Fixes pantsbuild#13114.

[ci skip-rust]
tdyas pushed a commit that referenced this issue Sep 10, 2022
…s (Cherry pick of #16386) (#16799)

The package mapping from import path to addresses (`ImportPathToPackages`) was global to the entire repository and not split by Go module. This prevented using multiple Go modules in a single repository. This PR solves the issue by introducing `GoImportPathMappingRequest` which allows the package mapping to be requested per module.

That per-module mapping relies on a repository-wide mapping available as `AllGoModuleImportPathsMappings`. The `GoModuleImportPathsMappingsHook` union allows plugins to provide their own import path mappings. For example, this support is now used by the protobuf/go codegen backend to supply import paths from generated protobuf code, meaning that the Go backend is able to infer dependencies between Go code and protobuf code automatically.

Fixes #13114.

[ci skip-rust]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Go Go backend-related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants