FYI - Build speed optimization - ccache massively speeds up build time #5018

zackees · 2024-11-12T05:00:52Z

Hi there, I'm making an Emscripten Dockerfile for PlatformIO for a web compile of FastLED.

I've been grinding on build speed for the last week. I want to report that one of the biggest wins was slapping on ccache in front of the compiler.

As many have mentioned before, PlatformIO has a tendency to rebuild everything whenever the order of deps/defines changes, which makes sense. ccache is a great way to mitigate that effect.

Today I discovered the auto-clean feature and how to disable it. This makes builds even faster. However, I'm not sure what the failure condition is going to be with auto-clean disabled.

This compiler project is kind of a special case. Users pretty much just compile against FastLED. So my use case is: how much can I pre-warm the ccache in release/debug/quick modes and stash that as a Docker image layer? Then, when a file is sent, it just compiles and links against FastLED. Right now I'm still dealing with linking against a bunch of object files. I've tried to combine these into one giant lib, but no luck so far.
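A minimal sketch of the pre-warming idea, not the actual FastLED setup. It assumes the three modes map to PlatformIO environments named `release`, `debug`, and `quick` (hypothetical names) and that the cache location is selected through ccache's `CCACHE_DIR` variable. By default it only returns the commands it would run.

```python
import os
import subprocess

# Hypothetical environment names; the post only says "release/debug/quick modes".
BUILD_MODES = ("release", "debug", "quick")


def prewarm_commands(modes=BUILD_MODES):
    """Return one `pio run -e <mode>` invocation per build mode."""
    return [["pio", "run", "-e", mode] for mode in modes]


def prewarm(cache_dir, modes=BUILD_MODES, dry_run=True):
    """Build every mode once so that later builds find a populated ccache."""
    # CCACHE_DIR tells ccache where to keep its cache.
    env = dict(os.environ, CCACHE_DIR=cache_dir)
    commands = prewarm_commands(modes)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, env=env, check=True)
    return commands


# "/ccache" is an assumed path inside the image.
for cmd in prewarm("/ccache", dry_run=True):
    print(" ".join(cmd))
```

Run with `dry_run=False` from a Dockerfile build step, the populated cache directory would presumably end up in that image layer.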

zackees · 2024-11-12T05:08:09Z

By the way, a super fast cloud compiler for platformio would be amazing. There is so much caching that can be done between all these different projects. There is also the benefit that the user doesn't have to install a giant framework to get something to compile.

In my case, I have a tool called fastled-wasm that just zips up a sketch (which is tiny); the backend compiles it all in 4 seconds and then sends it back. My 4-second compile runs on a super weak cloud server at $7 a month.

Platformio, if it did this, could just install the necessary tools to upload the hex/bin to the actual device, and not do the 1.6GB download for RP2350 or the ESP32S3 for the framework dependencies.

Anyway, probably outside the scope of what PlatformIO wants to do. So I just wanted to share the result of the benefits that I'm seeing.

robertlipe · 2024-11-30T22:55:30Z

Platformio has a build cache, but for small files I'm not convinced that it's any faster than just recompiling the objects anyway, which it does far too often.

I'm intrigued by "slapping on ccache". Did you modify scons directly to do this? I once tried the trick of making a front-end wrapper for g++ and friends that called ccache with argv[] but pio build outsmarts that by not running, e.g. xtensa-esp32-elf-g++ from the $PATH (as it has to be clever about calling xtensa-esp32- or xtensa-esp32s3- or riscv-) and instead reaches directly into /.platformio/packages/toolchain- and calls them from there. I wasn't thrilled about replacing those tools with front-ends to ccache.

Since this thing wants to check dependencies in Python, it's often slower than just building the stupid code.

Have you successfully built a maintainable configuration that calls distcc and/or ccache?

As you learned when building nightdriverled, platformio fetches and builds 39 nearly identical copies of most objects when building, so those hour-long builds (which almost nobody but us maintainers do) are worth some pain to help.

ivankravets · 2024-12-04T11:10:04Z

So, what is wrong with https://docs.platformio.org/en/latest/projectconf/sections/platformio/options/directory/build_cache_dir.html ?

Jason2866 · 2024-12-04T13:02:10Z

@robertlipe I tried ccache with PlatformIO a while ago. There is a speed gain even when PlatformIO's cache function is active. Since strange, unexplainable compiler and/or linker errors occurred now and then, I abandoned the plan to integrate this approach into my fork.
You can try the (no longer maintained) branch https://github.com/Jason2866/platform-espressif32/tree/ccache

The commit where ccache is enabled: Jason2866/platform-espressif32@8cdf1e3

zackees · 2024-12-05T01:06:47Z

> Platformio has a build cache but for small files, I'm not convinced that it's any faster than just recompiling the objects anyway - which it does completely too much.
>
> I'm intrigued by "slapping on ccache". Did you modify scons directly to do this? I once tried the trick of making a front-end wrapper for g++ and friends that called ccache with argv[] but pio build outsmarts that by not running, e.g. xtensa-esp32-elf-g++ from the $PATH (as it has to be clever about calling xtensa-esp32- or xtensa-esp32s3- or riscv-) and instead reaches directly into /.platformio/packages/toolchain- and calls them from there. I wasn't thrilled about replacing those tools with front-ends to ccache.
>
> Since this thing wants to check dependencies in Python, it's often slower than just building the stupid code.
>
> Have you successfully built a maintainable configuration that calls distcc and/or ccache?
>
> As you learned when building nightdriverled, platformio fetches and builds 39 nearly identical copies of most object when building, so those hour-long builds (which almost nobody but us maintainers do) are worth some pain to help.

By "slapping it on" I mean that I swap the "CC", "CXX", and "LINK" environment variables in scons so that instead of "compiler" the command is "ccache compiler". I AM doing this for the emscripten compiler rather than something like avr-gcc, but I think it works the same way.
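The swap can be sketched roughly as follows. This is an illustration, not the contents of the FastLED build script: `wrap_with_ccache` is a hypothetical helper, and a plain dict stands in for the scons environment.

```python
def wrap_with_ccache(env, keys=("CC", "CXX", "LINK"), launcher="ccache"):
    """Prefix each tool command in a dict-like env with the launcher."""
    for key in keys:
        tool = env.get(key)
        if not tool:
            continue  # variable not set, nothing to wrap
        tool = str(tool)
        # Skip tools that are already wrapped so the call is idempotent.
        if tool.split()[:1] != [launcher]:
            env[key] = f"{launcher} {tool}"
    return env


# A plain dict stands in for the build environment here.
demo_env = {"CC": "emcc", "CXX": "em++", "LINK": "em++"}
wrap_with_ccache(demo_env)
print(demo_env)
# {'CC': 'ccache emcc', 'CXX': 'ccache em++', 'LINK': 'ccache em++'}
```

In a PlatformIO extra script, the same helper could presumably be applied to the object obtained via `Import("env")`, assuming it allows dict-style access to these keys. As far as I know ccache only caches compilation, so the `LINK` prefix most likely just passes the call through to the real linker driver.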

This is for a web compiler, so it's MUCH more sensitive to CPU speed than a home computer. The web compiler was taking 8 seconds for a simple re-compile of the 8 or so object files for FastLED, and this dropped to less than a second for all 8, I assume because it was hitting the ccache.

@ivankravets I am overriding the build directory, but what I'm doing is deploying client code into a specific directory and trying to reuse the previous compilation of the shared code. Something about scons is triggering a rebuild. I turned off the auto-clean feature, which massively sped things up and solved the build time problem, but then I started getting weird linker errors, so I think I would have to add an in-depth manual clean step to be able to use this feature.

I haven't determined whether putting ccache in front of avr-gcc would work, but I strongly suspect it will because ccache is essentially just a kv database mapping pre-processed C/C++ code to object blobs.
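To make the "kv database" picture concrete, here is a toy sketch. It is a deliberate simplification and not how ccache is actually implemented; the class name and the fake compiler are made up for illustration. The key is a hash over the compiler flags and the preprocessed source, and the value is the object blob.

```python
import hashlib


class ToyCompileCache:
    """Toy key-value cache: hash(flags + preprocessed source) -> object blob."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(preprocessed_source, flags):
        digest = hashlib.sha256()
        for flag in flags:
            digest.update(flag.encode("utf-8") + b"\0")
        digest.update(b"\0")  # separator between flags and source
        digest.update(preprocessed_source.encode("utf-8"))
        return digest.hexdigest()

    def compile(self, preprocessed_source, flags):
        key = self._key(preprocessed_source, flags)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._compile_fn(preprocessed_source, flags)
        return self._store[key]


def fake_compiler(source, flags):
    # Stand-in for a real compiler call; returns a fake "object blob".
    return f"obj({len(source)} bytes, {' '.join(flags)})".encode("utf-8")


cache = ToyCompileCache(fake_compiler)
cache.compile("int main() { return 0; }", ["-O2"])  # miss: first time seen
cache.compile("int main() { return 0; }", ["-O2"])  # hit: same source and flags
cache.compile("int main() { return 0; }", ["-O0"])  # miss: flags differ
print(cache.hits, cache.misses)  # 1 2
```

Nothing in this picture depends on which compiler produces the blob, which is the reason for expecting it to carry over to avr-gcc.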

The use case for this project is pretty optimal, since the code I'm compiling is a user sketch against FastLED. The ccache is great here because it solves about 90% of the problem. Scons is still over-calculating what needs to be compiled, but ccache makes this a non-issue. The fast path for a cached file is 30ms per object file, vs 1000ms without ccache. FastLED does have some nasty header issues and tends to pull in the whole world for each translation unit. So our library is very sensitive to re-compilations and that's not going to change any time soon.
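A quick back-of-the-envelope check of those numbers, assuming 8 object files compiled one after another (the "8 or so" mentioned earlier) and the 30ms / 1000ms per-object figures. The function name is made up for illustration.

```python
CACHED_MS = 30      # per-object fast path quoted in this thread
UNCACHED_MS = 1000  # per-object cost without ccache quoted in this thread


def estimated_build_ms(num_objects, hit_rate):
    """Estimate serial compile time for a given fraction of cache hits."""
    hits = num_objects * hit_rate
    misses = num_objects - hits
    return hits * CACHED_MS + misses * UNCACHED_MS


for rate in (0.0, 0.5, 1.0):
    print(f"hit rate {rate:.0%}: {estimated_build_ms(8, rate):.0f} ms")
# hit rate 0%: 8000 ms
# hit rate 50%: 4120 ms
# hit rate 100%: 240 ms
```

The two extremes line up with the observed drop from about 8 seconds to under a second.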

To see how I did it, see our custom build flags file for platformio:

https://github.com/FastLED/FastLED/blob/master/src/platforms/wasm/compiler/wasm_compiler_flags.py

Keep in mind we are doing an emscripten build for the upcoming web compile feature for FastLED. It turns out that platformio is extremely compatible with this, the only downside being that it seems to have internal locks that prevent concurrent builds. I've mitigated this so far by using GCC syntax checking to fast-fail invalid sketches, so only valid C++ makes it to the critical section. Later, I may rip out the platformio build system altogether and go with a CMake build system instead to unlock concurrent builds.

What's also surprising is how easy it is to use the emscripten compiler. All you have to do is swap out CC/CXX/LINK in the env variables and that's pretty much it. Obviously a lot of platform-specific code, especially anything with ASM, won't compile, but pretty much all of the logic code does, since clang behaves nearly identically to the familiar gcc toolchain.

ivankravets added the help wanted label Dec 4, 2024

ivankravets added feature build system and removed help wanted labels Dec 9, 2024
