Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow overriding sitl_gazebo build_cores with an environment variable #18047

Merged
merged 3 commits into from
Nov 7, 2021

Conversation

guilhermelawless
Copy link
Contributor

@guilhermelawless guilhermelawless commented Aug 10, 2021

Closes #17691

This is far from a perfect solution, it simply tries to offer a minimum of flexibility when deploying automated builds.

This allows the possibility of overriding the parallelism when building sitl_gazebo, which is memory intensive and can result in oom ( see #17691 , other issues have been closed in the past but not actually solved: #14747 #14430 ). This is especially the case as new CPUs have more and more logical cores, but systems still generally have 16GB or less of RAM. In my 16GB system, master currently uses 14 parallel jobs which results in oom and with no way of resolving it since it is hardcoded to N-2 jobs in ExternalProject_Add.

Example of usage: BUILD_CORES=8 DONT_RUN=1 make px4_sitl_default gazebo

An important caveat with this solution is that it only works during configuration, not build time. Unless the CMake file is reloaded, only the first time the example above is run will affect the number of parallel jobs. Trying to set a different BUILD_CORES environment variable afterwards does not work, because ExternalProject_Add is already defined

I was unable to find a workaround or a better solution. I was hoping the BUILD_COMMAND could be given in a way that it reads an environment variable during build time, but couldn't. The problem overall seems to come from cmake + make and not passing parallelism over to external projects.

Any alternative suggestions?

…ILD_CORES

This is far from a perfect solution, it simply tries to offer a minimum of flexibility when deploying automated builds.

This allows the possibility of overriding the parallelism when building sitl_gazebo, which is memory intensive and can result in oom ( see PX4#17691 ). This is especially the case as new CPUs have more and more logical cores, but systems still generally have 16GB or less of RAM. In my 16GB system, `master` currently uses 14 parallel jobs which results in oom and with no way of resolving it since it is hardcoded to N-2 jobs in `ExternalProject_Add`.

Example of usage: `BUILD_CORES=8 DONT_RUN=1 make px4_sitl_default gazebo`

Important caveat with this solution. Unless the CMake file is reloaded, only the first time the example above is run will affect the number of parallel jobs. Trying to set a different `BUILD_CORES` environment variable afterwards does not work, because `ExternalProject_Add` is already defined. I was unable to find a workaround or a better solution.
@dagar
Copy link
Member

dagar commented Aug 10, 2021

I believe what started happening is the builds were falling over on the CI machines with somewhat limited memory. Then combined with the possibility of a parallel build within a parallel build (ninja) things can get ugly.

@dagar
Copy link
Member

dagar commented Aug 10, 2021

The other possible option to try is --max-load or the load (-l) option for ninja build.

https://www.gnu.org/software/make/manual/make.html#index-_002d_002dmax_002dload-1

@dagar
Copy link
Member

dagar commented Aug 10, 2021

Yet another option is to try and set it form the cmake host information.

https://cmake.org/cmake/help/latest/command/cmake_host_system_information.html

We could make a rough guess based on TOTAL_PHYSICAL_MEMORY or AVAILABLE_PHYSICAL_MEMORY (at configure time). https://cmake.org/cmake/help/latest/command/cmake_host_system_information.html#mebibytes

…mate the parallel jobs

Using cmake_host_system_information, grabs AVAILABLE_PHYSICAL_MEMORY and adds another job for every 1.5GB of available memory.

This is tested on a single system with 16 logical cores and 16GB RAM (~11.5GB available, reported correctly by cmake).
@guilhermelawless
Copy link
Contributor Author

@dagar
I don't know enough about the whole PX4 build system to tinker with the other suggestion, but made an alternative solution based on your cmake_host_system_information suggestion, which meets the minimum supported cmake.

It falls back to the master N-2 solution if AVAILABLE_PHYSICAL_MEMORY does not report a value. Otherwise it adds 1 job for every 1.5GB available, based on empirical evidence.

endif()

message(STATUS "${NUMBER_OF_LOGICAL_CORES} logical cores detected and ${AVAILABLE_PHYSICAL_MEMORY} megabytes of memory available.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a debug message or get rid of it. Since it only applies to the simulation builds I think it's misleading.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to DEBUG and specified it's related to gazebo and ignition

@dagar dagar merged commit 47a1914 into PX4:master Nov 7, 2021
@guilhermelawless guilhermelawless deleted the fix/gazebo-build-oom branch April 11, 2022 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

px4_sitl gazebo ran out of RAM
2 participants