Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submodules step 2/2: draw the rest of the owl #5609

Merged
merged 9 commits into from
Jan 23, 2024

Conversation

Groxx
Copy link
Member

@Groxx Groxx commented Jan 19, 2024

What

Step 1 was in PR #5597

This second commit essentially:

  1. splits the gcloud-only dependencies into a submodule which imports the main github.com/uber/cadence module, and not the reverse
  2. also splits the "main" server build (cmd/server) into a submodule which imports the gcloud plugin and the main module (so that binary is unchanged)
  3. removes cloud.google.com/* dependencies from the main go.mod (requires both 1 and 2)
  4. adjusts things to work like they did before

Ultimately reducing our "main" (truly required) dependencies, while not making major changes to using or developing Cadence.

Why

Part on principle, and part due to internal version conflicts.

In principle, "plugins" are "optional" and we should not be forcing all optional dependencies on all users of any of Cadence.
Splitting dependencies into choose-your-own-adventure submodules is simply good library design for the ecosystem, and it's something we should be doing more of.

In practice, this is essentially being forced because the gcloud archiver plugin pulls in some cloud.google.com/* stuff, which we upgraded, which forces breaking google.golang.org/genproto upgrades, which conflicts with other things in our internal monorepo.
Removing this requirement (we do not use the gcloud archiver internally) allows us to continue using these incompatible libraries elsewhere, unblocking us while we wait for them to upgrade.
This may also be true for [waves in the direction of The Internet] all of You, so rejoice! Our pain is your gain. We now no longer force you to use these libraries.

We will likely be doing this with more submodules in the future, this is just the first proof-of-concept and establishes a pattern to follow for the others.
The client library will very likely also be doing this (or possibly in other repos) to separate out some of its more problematic dependencies.

How

I intend to write a more detailed doc soon, but as a tl;dr this "create a new submodule to separate optional dependencies" can be recreated for other future submodules by following this sequence:

1: make sure all imports "into" the to-be-a-submodule folder are reversed, and instead go "out" from the submodule to the "main" module.

E.g. PR #5597 converts a hardcoded switch statement into a registration system, and then this commit uses that to reverse the direction of the code's imports.

This is generally best done up front, because very nearly all of it can be done without any dependency changes, which leaves you with only a small window where your IDE does not quite work (while tidying the modules the first time).
This will also let you make a separate commit with most or all of your code changes in isolation, which is easier to review and verify as non-changing.

Technically you can put this off until later, but you may have to modify your code while you have no working refactoring / autocomplete / navigation tools, and that's just unnecessary pain.

2: create a go.mod in the to-be-a-submodule folder

That must look something like:

module github.com/uber/cadence/some/folder

// don't depend on a released library, depend on the current code:
replace github.com/uber/cadence => ../../

// and the thrift replace, any other replaces that may be necessary in other `go.mod`s, etc

3: go mod tidy in the repo root / the main go.mod

And confirm it removes the dependencies that only the submodule uses.

In particular, this MUST NOT add the new submodule to the dependencies, or you've got a main -> submodule import somewhere. Grep for it if so, it should be very easy to find, and reverse that dependency somehow.

4: go mod tidy in the new submodule.

If this pulls in incompatible versions, I've had good luck "seeding" the new go.mod file with the contents of the main go.mod file. It seems to keep versions more stable than replaces + go mod tidy alone.
(these incompatible versions are probably problematic somehow and may cause issues in the future, but we can at least use this trick or replace as necessary to ignore them for now)

5: add your new module to the root go.work so gopls and some other CLI tools continue to work.

Plus any makefile / CI / etc changes necessary. Hopefully you can just mimic existing multi-module stuff.

And that's essentially it!

In summary:

  • adjust your code
  • make a new go.mod
  • tidy everything
  • adjust tooling as needed

Bundle as much of ^ this as you care into a single commit, and it should all work as long as anyone using the submodule also pulls the same version of all other modules (because that's what you developed against, with replace ... => ../). Only same-version will be checked in CI, so only same-version should be expected to work, though other combinations may work in practice.

How others can use these new submodules

Because we require replace directives, a "simple" go get github.com/uber/cadence[/or/submodules] does not work and essentially never has and never will.
That's fine, they just have to add our replaces in a custom go.mod:

module their/project

// replace thrift (and any other replacements at version X, check our go.mod)
replace github.com/apache/thrift => github.com/apache/thrift v0.0.0-20161221203622-b2a4d4ae21c7

And use a main.go to include the submodules:

// optionally make it not buildable by default, makes some tools happier
//go:build just_for_gomod_purposes
package main
import (
  _ "github.com/uber/cadence/some/submodule"
  _ "github.com/uber/cadence/other/submodules/etc"
  _ "github.com/uber/cadence/cmd/server" // the main module, if you want it
)
func main() {} // does not need to be used / will not be run

And then go get github.com/uber/cadence/{some/submodule,other/submodules/etc,cmd/server,}@version to get all involved modules at the same version (as one command is most likely to work) which will merge all dependencies, update your go.mod, and you can now go run github.com/uber/cadence/cmd/server and it should work. Or start using the various modules to make a new custom main.go or whatever is needed.

If necessary, you can also replace all github.com/uber/cadence-hosted modules with the same version / SHA, which should definitely force it to work.

Updating works similarly - get them all at the same version, make sure your replaces are up to date (as go get will not adjust those), and it should Just Work™ like it did with a single module.


This was originally planned to be landed over multiple commits to allow submodules to refer to the previously-merged commit(s) to set up dependencies, but relative replace directives simplify this a LOT, and bring a MUCH better development experience.

The main pros/cons of using a multi-commit + "refer to other-module@HEAD^ rather than relative paths" approach is:

  • pro: people can go get the/submodule@version and they could be guaranteed to get a working set of dependencies
    • ... at some SHAs anyway. in principle it could be e.g. at every released version or @latest, but it cannot be guaranteed at all SHAs.
    • our thrift replace means this currently does not work anyway, and has not for a long time
  • con: submodules can only refer to the previous commit of other modules, as it must be a pushed SHA
    • this applies during development too, essentially requiring you to push a fork and replace cadence => fork version to do normal development
    • this "requires" multiple commits on master/somewhere permanent to truly release something stable. or they can go get the same version of all modules.
  • con: more complex dev and CI as things always refer to the previous state at best, not the current
    • this can be worked around by manually go getting / replacing in CI as needed to ensure the same versions, but that can become quite complex.

Filesystem-relative requires allow ^ all this in a single commit, immediately, without pushing, with the primary tradeoff being "all submodule-users must manually ensure all repo-hosted modules are at the same version" rather than just some (most of the time).

That's pretty easy to automate, and go get doesn't work out-of-the-box anyway due to thrift, so we're not really losing anything by requiring that.

@coveralls
Copy link

coveralls commented Jan 20, 2024

Pull Request Test Coverage Report for Build 018d3344-8ca6-4c64-b202-a384a93a204d

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.02%) to 62.646%

Totals Coverage Status
Change from base Build 018d32c6-7925-4250-8c00-aa4275f02b20: -0.02%
Covered Lines: 91813
Relevant Lines: 146558

💛 - Coveralls

@Groxx Groxx force-pushed the submod_final_step2 branch from 82f88f0 to 7d71e08 Compare January 20, 2024 00:11
@Groxx Groxx force-pushed the submod_final_step2 branch from 9108bec to 8aa7c8a Compare January 22, 2024 20:47
@Groxx Groxx marked this pull request as ready for review January 22, 2024 20:47
Comment on lines 486 to +488
$Q go build ./...
$Q cd common/archiver/gcloud; go build ./...
$Q cd cmd/server; go build ./...
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if/when we get a ton of these, this is pretty easy to template. not worth it for 3 though IMO, make is pretty obtuse

# all tests other than end-to-end integration test fall into the pkg_test category
PKG_TEST_DIRS := $(filter-out $(INTEG_TEST_ROOT)% $(OPT_OUT_TEST), $(TEST_DIRS))
# all tests other than end-to-end integration test fall into the pkg_test category.
# ?= allows passing specific (space-separated) dirs for faster testing
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

came in handy for my own testing, figured I might as well leave it in

Comment on lines +6 to +7
# go.work and `go list -modfile=...` seem to interact badly, and complain about duplicates.
# the good news is that you can just drop that and `cd` to the folder and it works.
Copy link
Member Author

@Groxx Groxx Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they way they "interact badly" varies by Go version :\
none of the go.work-supporting ones (currently) seem to like it though. they all break or misbehave or complain.

Copy link
Member

@Shaddoll Shaddoll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a lot of hard core work. Thanks for doing this.

cmd/server/go.mod Outdated Show resolved Hide resolved
@Groxx Groxx merged commit 08d5994 into cadence-workflow:master Jan 23, 2024
16 checks passed
@Groxx Groxx deleted the submod_final_step2 branch January 23, 2024 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants