-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: slow compilation of a package with a very large API #53092
Comments
To add some context for people coming across this issue: Go's build cache works at the package level, so modifying a package in any way will generally mean rebuilding the entire package. See https://pkg.go.dev/cmd/go#hdr-Build_and_test_caching. |
Adding @randall77 @griesemer @matloob @rsc @bcmills @golang/pkgsite for visibility here. Sorry about the spam, but we're trying to get traction. |
Please don't do that. The majority of the people you pinged don't even work on the compiler regularly. |
Apologies. This is not something I usually do. What's the right process? |
A good starting point might be to make the issue more actionable by tracing the command to figure out where the bottleneck is. That would at least help identify the appropriate maintainers to route this issue to, or perhaps even an opportunity for you to contribute a fix. |
Thanks for the guidance
Which results in
Attaching the files. |
Looks like the trace can be opened by the viewer hosted at https://ui.perfetto.dev/. |
Thanks.
It looks to me that we could save approx half the time if the former was running on multiple thread instead of single threaded. I don't know much about the internals though, is there a specific reason why that's running on a single thread at the moment? For build dependencies, most of the steps "take no time" which I'm assuming means they are being retrieved from cache. However some do take a little bit of time although they haven't changed, could it indicate a cache miss? For the later, could/should the link result be cached to some extent? |
The threading behavior in the trace doesn't match the actual threading behavior of the program — it's kind of an impedance mismatch between Go's threading model and the model expected by the trace viewer. However, the indexing changes coming up in Go 1.19 are intended to speed up that loading phase, especially on platforms with unusually slow filesystems (such as Windows 😅).
Possibly, yes. And we do expect to see at least one cache miss for
Unfortunately, linking a Go program is not incremental: if any of the inputs changes, the entire link step must be redone. It might be worth profiling and/or tracing the linker to see what can be done to reduce the latency of that step, but I know a lot of work has already gone into making the linker faster. For now, the best way to speed up linking is to present the linker with less code: that is, to break up APIs into smaller packages so that the linker doesn't have to process (and discard) a lot of irrelevant code. |
Great, looking forward to that version! Are you looking for feedback on those aspects? (i.e. would you like me to run the trace with go 1.19 and see if we get different results?)
Any action I should take to investigate that further?
Understood, thanks for the context. Is that work going to be part of Go1.19 as well?
We already identified a couple of actions we'll take or the API users can take to reduce the packages sizes. I was encouraged by the community using our SDK to reach out here before they are implemented because even if our code-base is large, some of the behaviour that was observed seemed odd. |
Another aspect we didn't get into so far is the documentation for the models sub package is failing to display https://pkg.go.dev/github.com/microsoftgraph/[email protected]/models |
That is a truly enormous number of source files. I would suggest splitting up the package into smaller, self-contained units rather than trying to get the various bits of toolchain to deal gracefully with such a monolith. |
We unfortunately don't have that option in our case, this package is representing and API that is a graph (no relation with GraphQL). All the types and endpoints are intricated, we've tried to make educated guesses to ship sub-packages on arbitrary divisions in the past but it's an ever-going debate of "you didn't split the cake the way I wanted it". We could potentially reduce the number of types by half IF we can get rid of the interfaces. The main reason we brought interfaces was because it seemed like the only way to support "downcasting" in Go at the moment. myCar := myVehicule.(Carable) // where carable is an interface that reflects all the accessors present on the struct But this doesn't work myCar := myVehicule.(*Car) And the same problem arises for upcasts. |
Just to give a bit of perspective: Go tooling generally assumes that Go packages are reasonably sized. Tools like |
what counts against a package size? types, functions? sub packages?
|
To a first approximation: the number and size of the declarations in the package. (Types, functions, even constants to some extent.) Subpackages within the same module are not entirely free, but much lower-cost: individual packages are more amenable to caching (the “compilation unit” for Go is a package, not a source file), and unused packages don't end up causing a lot of wasted work for the compiler and linker. |
Thanks, so the models package is probably the most painful one at this point. Do you have any advice on my downcast question? If we could solve for this, we could divide the number of types by 2. |
Interfaces are usually the most appropriate way to do what you're describing in Go. |
Hi everyone, What's been successful:
However, when rolled out generics to reduce further the amount of code (mostly type asserting values or collections), we've noticed significant memory increase. This is OKish on local dev machines as they usually have lots of memory, but it becomes a concern for CI since most agents are memory constrained. That results in the build process being killed by agent. I'll be updating the repro soon to reflect the changes. In the meantime, what traces would you like me to collect to investigate this memory impact of generics? |
Hi everyone, We rolled back the use of generics, the compiler uses much more memory with generics which leads it to trash the CPU on memory constrained agents (anything under 16GB of ram). We also flattened most of the packages for the fluent API surface, having a few large packages works much better than having lots of very small packages for the caching system. (e.g. everything that was under /users/id/something is now right away under /users). The only downside of that last change is that pkg.go.dev is now failing for the fluent API surface as well. This is something can live with as we have a public docs platform that not only contains the snippets but also the documentation for the APIs. The only trick left up our sleave would be to map repeating nodes of the fluent API surface instead of duplicating them, we're not yet committed to that change (this is a decision larger than just the Go client that'd impact the accuracy of our clients with regards to the REST API). We anticipate that last change could yield an additional 15% build time improvement. I hope those additional details help people running into similar issues and the broader Go community to improve support for large packages in the future. Thanks everyone here and on the issues in our repos for the help getting to the bottom of this! ❤️ Closing. |
What version of Go are you using (
go version
)?go1.18.2 windows/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (
go env
)?What did you do?
Now replace the following lines
Then run
go build
again, which takes about 30 seconds, not expected, should take a couple of seconds thanks to caching.What did you expect to see?
Subsequent builds should be in the sub seconds time as the packages should be cached.
What did you see instead?
Takes about the original build time.
Additional context
We're maintaining a large package that's generated from an API description. The API itself is fairly large. The initial ask from our SDK users was about the build time increasing significantly when adding our package and caching seeming like it's not fully working from one go build to another.
With the release of 1.18 and generics, we're already identified opportunities for improvements to generate less code, but haven't acted on it yet. @breml @josharian @dominikh were really helpful with framing the issue and suggested we posted here before making those improvements.
Additionally, the go doc for our package is not working either.
Appendix
main.go content
The text was updated successfully, but these errors were encountered: