FYI - Build speed optimization - ccache massively speeds up build time #5018

Open
zackees opened this issue Nov 12, 2024 · 5 comments

@zackees

zackees commented Nov 12, 2024

Hi there, I'm building an Emscripten Dockerfile for PlatformIO to support a web compile of FastLED.

I've been grinding on build speed for the last week. I want to report that one of the biggest wins was slapping on ccache in front of the compiler.

As many have mentioned before, PlatformIO tends to rebuild everything whenever the order of deps/defines changes, which makes sense. ccache is a great way to mitigate that effect.

Today I discovered the auto-clean feature and tried disabling it, which makes builds even faster. However, I'm not sure what the failure conditions are with auto-clean disabled.

This compiler project is kind of a special case: users pretty much just compile against FastLED. So my goal is to pre-warm the ccache as much as possible in release/debug/quick modes and stash that as a Docker image layer. Then, when a file is sent, it just gets compiled and linked against FastLED. Right now I'm still linking against a bunch of object files. I've tried to combine them into one giant lib, but no luck so far.

@zackees
Author

zackees commented Nov 12, 2024

By the way, a super fast cloud compiler for PlatformIO would be amazing. There is so much caching that can be done between all these different projects. There is also the benefit that the user doesn't have to install a giant framework to get something to compile.

In my case, I have a tool called fastled-wasm that just zips up a sketch (which is tiny); the backend compiles it all in 4 seconds and sends the result back. That 4-second compile runs on a super weak cloud server that costs $7 a month.

If PlatformIO did this, it could install just the tools needed to upload the hex/bin to the actual device and skip the 1.6GB download of framework dependencies for the RP2350 or the ESP32S3.

Anyway, this is probably outside the scope of what PlatformIO wants to do, so I just wanted to share the benefits I'm seeing.

@robertlipe

PlatformIO has a build cache, but for small files I'm not convinced that it's any faster than just recompiling the objects anyway, which it does far too often.

I'm intrigued by "slapping on ccache". Did you modify SCons directly to do this? I once tried making a front-end wrapper for g++ and friends that called ccache with argv[], but pio build outsmarts that: it doesn't run, e.g., xtensa-esp32-elf-g++ from the $PATH (it has to be clever about calling xtensa-esp32-, xtensa-esp32s3-, or riscv-) and instead reaches directly into /.platformio/packages/toolchain- and calls the tools from there. I wasn't thrilled about replacing those tools with front-ends to ccache.

Since this thing wants to check dependencies in Python, it's often slower than just building the stupid code.

Have you successfully built a maintainable configuration that calls distcc and/or ccache?

As you learned when building nightdriverled, PlatformIO fetches and builds 39 nearly identical copies of most objects, so those hour-long builds (which almost nobody but us maintainers do) are worth some pain to improve.

@ivankravets
Member

@Jason2866
Contributor

Jason2866 commented Dec 4, 2024

@robertlipe I tried ccache with PlatformIO a while ago. There is a speed gain even when PlatformIO's cache function is active. Since unexplainable compiler and/or linker errors occurred now and then, I scrapped the plan to integrate this approach into my fork.
You can try the (no longer maintained) branch https://github.com/Jason2866/platform-espressif32/tree/ccache

The commit where ccache is enabled: Jason2866/platform-espressif32@8cdf1e3

@zackees
Author

zackees commented Dec 5, 2024

> PlatformIO has a build cache, but for small files I'm not convinced that it's any faster than just recompiling the objects anyway, which it does far too often.
>
> I'm intrigued by "slapping on ccache". Did you modify SCons directly to do this? I once tried making a front-end wrapper for g++ and friends that called ccache with argv[], but pio build outsmarts that: it doesn't run, e.g., xtensa-esp32-elf-g++ from the $PATH (it has to be clever about calling xtensa-esp32-, xtensa-esp32s3-, or riscv-) and instead reaches directly into /.platformio/packages/toolchain- and calls the tools from there. I wasn't thrilled about replacing those tools with front-ends to ccache.
>
> Since this thing wants to check dependencies in Python, it's often slower than just building the stupid code.
>
> Have you successfully built a maintainable configuration that calls distcc and/or ccache?
>
> As you learned when building nightdriverled, PlatformIO fetches and builds 39 nearly identical copies of most objects, so those hour-long builds (which almost nobody but us maintainers do) are worth some pain to improve.

By "slap it on" I mean that I swap the "CC", "CXX", and "LINK" environment variables in SCons so that instead of "compiler" the command is "ccache compiler". I am doing this for the Emscripten compiler rather than something like avr-gcc, but I think it works the same way.
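A minimal sketch of that swap, for readers who want to try it. This is not the contents of the FastLED build script; the helper name `with_ccache` and the sample tool names are hypothetical, and it assumes each tool variable is a plain command string.

```python
def with_ccache(tools, keys=("CC", "CXX", "LINK"), launcher="ccache"):
    """Return a copy of `tools` with `launcher` prefixed to the chosen commands.

    `tools` is any mapping of tool variable name -> command string.
    Commands that already start with the launcher are left alone, so the
    helper is safe to apply more than once.
    """
    wrapped = dict(tools)
    for key in keys:
        cmd = wrapped.get(key)
        if cmd and cmd.split()[0] != launcher:
            wrapped[key] = f"{launcher} {cmd}"
    return wrapped


# Hypothetical Emscripten-style tool names, for illustration only.
tools = {"CC": "emcc", "CXX": "em++", "LINK": "em++", "AR": "emar"}
wrapped = with_ccache(tools)

# In a PlatformIO extra script (assumption: `env` is the SCons environment
# obtained via Import("env")), the result could be applied with something like:
#     env.Replace(**with_ccache({k: env[k] for k in ("CC", "CXX", "LINK")}))
```

Only the three variables named in the comment above are touched; other tools such as the archiver stay as they are.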

This is for a web compiler, so it's MUCH more sensitive to CPU speed than a build on a home computer. The web compiler was taking 8 seconds for a simple re-compile of the 8 or so object files for FastLED; this dropped to less than a second for all 8, which I'm assuming is because it was hitting the ccache.

@ivankravets I am overriding the build directory, but what I'm doing is deploying client code into a specific directory and trying to reuse the previous compilation of the shared code. Something about SCons is triggering a rebuild. Turning off the auto-clean feature massively sped things up and solved the build-time problem, but then I started getting weird linker errors, so I think I would need an in-depth manual clean step to use this feature.

I haven't determined whether putting ccache in front of avr-gcc would work, but I strongly suspect it will, because ccache is essentially just a key-value database mapping preprocessed C/C++ code to object blobs.
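To make the key-value picture concrete, here is a toy in-memory model of the idea. It is a sketch, not how ccache is actually implemented: the class name `ToyCompileCache` is hypothetical, the "compiler" is a caller-supplied function, and the key is assumed to be a hash of the compiler arguments plus the preprocessed source.

```python
import hashlib


class ToyCompileCache:
    """Toy model: hash(compiler args + preprocessed source) -> object blob."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # called only on a cache miss
        self._store = {}               # key (hex digest) -> object blob
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(preprocessed_source, compiler_args):
        h = hashlib.sha256()
        h.update("\0".join(compiler_args).encode())
        h.update(b"\0\0")  # separator between args and source
        h.update(preprocessed_source.encode())
        return h.hexdigest()

    def compile(self, preprocessed_source, compiler_args):
        key = self._key(preprocessed_source, compiler_args)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        blob = self._compile_fn(preprocessed_source, compiler_args)
        self._store[key] = blob
        return blob


# Stand-in "compiler" that records how often it is really invoked.
calls = []

def fake_compile(source, args):
    calls.append(source)
    return b"obj:" + source.encode()

cache = ToyCompileCache(fake_compile)
first = cache.compile("int x;", ["-O2"])
second = cache.compile("int x;", ["-O2"])  # same input: served from the cache
third = cache.compile("int x;", ["-O0"])   # different flags: compiled again
```

Because the key depends only on the inputs, the same lookup would work whatever compiler sits behind it, which is the intuition behind the avr-gcc guess above.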

The use case for this project is pretty optimal, since the code I'm compiling is a user sketch against FastLED. ccache is great here because it solves about 90% of the problem. SCons is still over-calculating what needs to be compiled, but ccache makes this a non-issue. The fast path for a cached file is 30ms, versus 1000ms per object file without ccache. FastLED does have some nasty header issues and tends to pull in the whole world for each translation unit, so our library is very sensitive to re-compilations, and that's not going to change any time soon.
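As a back-of-the-envelope check on those figures, the per-object timings can be turned into a rough estimate. The function name and the hit-rate parameter are illustrative assumptions; only the 30ms, 1000ms, and 8-object numbers come from this thread, and the model ignores link time and SCons overhead.

```python
def estimate_compile_ms(n_objects, hit_rate, hit_ms=30, miss_ms=1000):
    """Rough compile time in ms for n_objects at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hits = n_objects * hit_rate
    misses = n_objects - hits
    return hits * hit_ms + misses * miss_ms


cold = estimate_compile_ms(8, 0.0)  # 8000.0 ms: every object recompiled
warm = estimate_compile_ms(8, 1.0)  # 240.0 ms: every object served from cache
```

The two extremes line up with the roughly 8 seconds and under 1 second reported above for the 8 or so FastLED object files.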

To see how I did it, take a look at our custom build flags file for PlatformIO:

https://github.com/FastLED/FastLED/blob/master/src/platforms/wasm/compiler/wasm_compiler_flags.py

Keep in mind we are doing an Emscripten build for the upcoming web compile feature for FastLED. It turns out that PlatformIO is extremely compatible with this, the only downside being that PlatformIO seems to have internal locks that prevent concurrent builds. I've mitigated this so far by using GCC syntax checking to fast-fail invalid sketches, so only valid C++ makes it to the critical section. Later, I may rip out the PlatformIO build system altogether and go with a CMake build system instead to unlock concurrent builds.
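The fast-fail gate could look roughly like the sketch below. It is an illustration under assumptions, not the FastLED server code: the function names are hypothetical, the compiler is assumed to be `g++` on the PATH, and a process-local lock stands in for whatever serializes the real build. `-fsyntax-only` is the GCC flag that checks the code without generating output.

```python
import subprocess
import threading

_build_lock = threading.Lock()  # stand-in for the serialized build section


def syntax_check_command(source_path, compiler="g++", extra_flags=()):
    """Build the argv for a syntax-only pass (no object file is produced)."""
    return [compiler, "-fsyntax-only", *extra_flags, source_path]


def passes_syntax_check(source_path, **kwargs):
    """Run the syntax-only pass; returns (ok, compiler stderr)."""
    result = subprocess.run(
        syntax_check_command(source_path, **kwargs),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr


def build_if_valid(source_path, build_fn, check_fn=passes_syntax_check):
    """Reject invalid sources before taking the lock; build only valid ones."""
    ok, errors = check_fn(source_path)
    if not ok:
        return False, errors
    with _build_lock:  # only sources that passed the check wait here
        return True, build_fn(source_path)
```

The check runs outside the lock, so a broken sketch costs one cheap compiler pass and never queues behind a real build. A sketch file that does not end in a C++ extension would likely need something like `extra_flags=("-x", "c++")`.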

What's also surprising is how easy it is to use the Emscripten compiler. All you have to do is swap out CC/CXX/LINK in the env variables and that's pretty much it. Obviously a lot of platform-specific code, especially ASM, won't compile, but pretty much all of the logic code does, since clang behaves nearly identically to the familiar gcc toolchain.
