1105 Speed up and reduce memory for builds #1106

Merged: 10 commits, Oct 10, 2020

@lifflander (Collaborator) commented Oct 9, 2020

Fixes #1105

  • Switch to the gold linker by default where applicable (ELF targets); it uses less memory and links faster
  • Update all docker CI images to use CMake >=3.16 to take advantage of Unity/Jumbo builds
  • Fix ODR violations/header bugs discovered from Unity builds
  • Combine tests by directory into a single executable per type (basic, extended, nompi)
  • With combined tests, Unity builds now invoke one build/link command per test directory per type of test
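
As a rough illustration of the kind of header/ODR issue a Unity build can surface (the file and function names below are hypothetical, not taken from the vt sources): a Unity build compiles several .cc files as one translation unit, so two file-local helpers with the same name and signature stop being independent and become a redefinition. One common fix, sketched here under that assumption, is to give each source file's helpers their own named namespace.

```cpp
// Sketch of a unity ("jumbo") translation unit: the build system generates one
// file that #includes several .cc files, so they all share a single scope.
// All names below are hypothetical and only illustrate the pattern.

// ---- contents of test_a.cc (hypothetical) ----
namespace test_a_detail {
// Originally this might have been `static int expectedCount()` at file scope.
// A second file-scope definition with the same signature in another .cc file
// is fine in a normal build, but is a redefinition once the files are merged.
inline int expectedCount() { return 4; }
}  // namespace test_a_detail

int runTestA() { return test_a_detail::expectedCount(); }

// ---- contents of test_b.cc (hypothetical) ----
namespace test_b_detail {
inline int expectedCount() { return 8; }
}  // namespace test_b_detail

int runTestB() { return test_b_detail::expectedCount(); }
```

With the helpers namespaced per file, `runTestA()` and `runTestB()` keep resolving to their own helper whether the two files are compiled separately or merged into one unit.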

With these changes, builds are up to 50% faster across the board. The faster CI targets now take 15 minutes completely out of cache. We also use half as much disk space as before!

Still waiting for all my images to build/push to see how it affects nvcc/intel performance.

The only downside of Unity builds is that more will need to be rebuilt if you modify a file. By default, Unity is not enabled unless running in CI (which we could change?)

@codecov

codecov bot commented Oct 9, 2020

Codecov Report

Merging #1106 into develop will decrease coverage by 1.69%.
The diff coverage is 81.25%.

@@             Coverage Diff             @@
##           develop    #1106      +/-   ##
===========================================
- Coverage    78.13%   76.43%   -1.70%     
===========================================
  Files          671      698      +27     
  Lines        25960    26475     +515     
===========================================
- Hits         20283    20237      -46     
- Misses        5677     6238     +561     
| Impacted Files | Coverage Δ |
|---|---|
| src/vt/pipe/signal/signal_holder.h | 100.00% <ø> (ø) |
| src/vt/vrt/collection/balance/model/norm.h | 100.00% <ø> (ø) |
| tests/unit/active/test_active_bcast_put.cc | 100.00% <ø> (ø) |
| tests/unit/active/test_active_broadcast.cc | 100.00% <ø> (ø) |
| tests/unit/active/test_active_send.cc | 100.00% <ø> (ø) |
| tests/unit/active/test_active_send_put.cc | 100.00% <ø> (ø) |
| tests/unit/collection/test_broadcast.cc | 100.00% <ø> (ø) |
| tests/unit/collection/test_broadcast.extended.cc | 100.00% <ø> (ø) |
| tests/unit/collection/test_broadcast.h | 7.14% <ø> (ø) |
| .../unit/collection/test_construct_no_idx.extended.cc | 22.22% <ø> (ø) |

... and 149 more

@lifflander lifflander self-assigned this Oct 9, 2020
@lifflander lifflander marked this pull request as ready for review October 9, 2020 03:37
@lifflander
Collaborator Author

Looks like a test is failing consistently on gcc-9 with these changes. Have to investigate that.

@pnstickne
Contributor

Not sure if the GCC test failure is related (doesn't immediately appear to be). Re-running.

@pnstickne
Contributor

Hmm, it's a consistent failure:

vt: Runtime Initializing: interop=true: mode: single-thread per rank
vt: Program: termination_basic (./termination_basic)
vt: Running on: 2 nodes
vt: Machine Hostname: 1a1f35f1ebc8
vt: MPI Version: 3.1
vt: MPI Max tag: 268435455
vt: Build SHA: 9ebed97295b994d4af8063f57ef5fc0241283d9c
vt: Build Ref:
vt: Description: remotes/pull/1106/merge-0-g9ebed97295
vt: Compile-time Features Enabled:
vt: 	C++ Trait Detector
vt: 	Load Balancing for Collections
vt: 	OpenMP Threading
vt: 	Production Build
vt: 	Message priorities
vt: 	Memory Pooling
vt: 	MPI access guards
vt: 	Zoltan for load balancing
vt: Runtime Configuration:
vt: 	Option: flag --vt_sched_num_progress on: Running MPI progress 2 times each invocation
vt: 	Option: flag --vt_sched_progress_han on: Running MPI progress function at least every 0 handler(s) executed
vt: 	Option: flag --vt_print_no_progress on: Printing warnings when progress is stalls
vt: 	Default: Printing verbose epoch graphs when hang detected, use --vt_epoch_graph_terse to disable
vt: 	Option: flag --vt_epoch_graph_on_hang on: Epoch graph output enabled if hang detected
vt: 	Default: Termination hang detection enabled by default, use --vt_no_detect_hang to disable
vt: 	Option: flag --vt_hang_detect on: Printing stall warning every 1024 tree traversals
vt: 	Default: SIGINT signal handling enabled by default, use --vt_no_SIGINT to disable
vt: 	Default: SIGSEGV signal handling enabled by default, use --vt_no_SIGSEGV to disable
vt: 	Default: std::terminate signal handling enabled by default, use --vt_no_terminate to disable
vt: 	Default: Color output enabled, use --vt_no_color to disable
vt: 	Default: Stack dumps enabled by default, use --vt_no_stack to disable
vt: 	Option: flag --vt_memory_reporters on: Memory usage checker precedence: mstats,machinfo,selfstat,selfstatm,sbrk,mallinfo,getrusage,ps
vt: 	Working memory reporters: selfstatm,sbrk,mallinfo,getrusage
vt: 	Initial memory usage: selfstatm=14.2891 sbrk=0.128906 mallinfo=0.564301 getrusage=14.2891 (MiB)
vt: 	Default: Memory usage printing disabled, use --vt_print_memory_each_phase to enable
vt: Pass-through Arguments:
vt: 	None. All arguments handled.
vt: [0] termination: Termination counts constant (no progress) for: traversals=1024 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=2048 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=3072 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=4096 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=5120 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=6144 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=7168 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=8192 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=9216 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
vt: [0] termination: Termination counts constant (no progress) for: traversals=10240 epoch=fffffffffffffc18 produced=39 consumed=31 rooted=true, ds=false
0:hangCheckHandler
1:hangCheckHandler
vt: [0] termination: Detected hang: write graph to file=true
file=epoch_graph.0.dot
file=epoch_graph.1.dot
file desc=0x56258b395590
file desc=0x55c63e2eb5a0
file=epoch_graph.global.dot
file desc=0x55c63e2eb5a0
vt: [0] ------------------------------------------------------------------------------------------------------------------------
vt: [0] ------------------------------------------- Runtime Error: System Aborting! --------------------------------------------
vt: [0] ------------------------------------------------ Fatal Error on Node 0 -------------------------------------------------
vt: [0] ------------------------------------------------------------------------------------------------------------------------
vt: [0]
vt: [0] Message: FATAL ERROR: Detected hang indicating no further progress is possible
vt: [0]
vt: [0] ------------------------------------------------------------------------------------------------------------------------
vt: [0] -------------------------------------------- Dump Stack Backtrace on Node 0 --------------------------------------------
vt: [0] ------------------------------------------------------------------------------------------------------------------------
vt: [0] 0   18  0x55c63d207ac5 vt::debug::stack::dumpStack[abi:cxx11](int) + 69
vt: [0] 1   18  0x55c63d19ec32 vt::runtime::Runtime::output(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, bool, bool, bool) + 2738
vt: [0] 2   18  0x55c63d19eeee vt::runtime::Runtime::abort(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) + 158
vt: [0] 3   18  0x55c63d21170c vt::CollectiveAnyOps<(vt::runtime::eRuntimeInstance)0>::abort(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) + 92
vt: [0] 4   18  0x55c63d1fd276 vt::abort(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) + 70
vt: [0] 5   18  0x55c63d1c9f73 vt::term::TerminationDetector::propagateEpoch(vt::term::TermState&) + 2595
vt: [0] 6   18  0x55c63cfde183 vt::runnable::Runnable<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> >::run(long, void (*)(vt::messaging::BaseMsg*), vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope>*, short, int) + 67
vt: [0] 7   18  0x55c63d0b8865 vt::messaging::ActiveMessenger::deliverActiveMsg(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> > const&, short const&, bool, std::function<void ()>) + 1269
vt: [0] 8   18  0x55c63d0bb3d0 vt::messaging::ActiveMessenger::processActiveMsg(vt::messaging::MsgSharedPtr<vt::messaging::ActiveMsg<vt::messaging::ActiveEnvelope> > const&, short const&, int const&, bool, std::function<void ()>) + 272
vt: [0] 9   18  0x55c63d0bb49f ./termination_basic(+0x35549f) [0x55c63d0bb49f] + 0
vt: [0] 10  18  0x55c63d0946b2 vt::sched::Scheduler::runWorkUnit(vt::sched::PriorityUnit&) + 50
vt: [0] 11  18  0x55c63d0977f3 vt::sched::Scheduler::scheduler(bool) + 259
vt: [0] 12  18  0x55c63cfb80b7 vt::tests::unit::epochs::TestTermCollect_test_term_detect_collect_epoch_Test::TestBody() + 1143
vt: [0] 13  18  0x55c63d07e041 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 81
vt: [0] 14  18  0x55c63d0719ba ./termination_basic(+0x30b9ba) [0x55c63d0719ba] + 0
vt: [0] 15  18  0x55c63d071e52 ./termination_basic(+0x30be52) [0x55c63d071e52] + 0
vt: [0] 16  18  0x55c63d07205e ./termination_basic(+0x30c05e) [0x55c63d07205e] + 0
vt: [0] 17  18  0x55c63d07311d testing::internal::UnitTestImpl::RunAllTests() + 3485
vt: [0] 18  18  0x55c63d073688 testing::UnitTest::Run() + 152
vt: [0] 19  18  0x55c63cf86378 main + 72
vt: [0] 20  18  0x7fcfa56ea0b3 __libc_start_main + 243
vt: [0] 21  18  0x55c63cf9125e _start + 46

@pnstickne pnstickne self-requested a review October 10, 2020 07:36
@lifflander lifflander force-pushed the 1105-speed-up-builds branch from 3e75f54 to 5c62334 Compare October 10, 2020 17:56
Collaborator Author

Codacy Here is an overview of what got changed by this pull request:

Clones added
============
- tests/unit/termination/test_termination_action_callable.extended.cc  1
- tests/unit/termination/test_termination_action_common.cc  1
         

Clones removed
==============
+ tests/unit/pipe/test_callback_send.cc  -1
+ tests/unit/pipe/test_callback_bcast.cc  -1
+ tests/unit/pipe/test_callback_func_ctx.cc  -1
+ tests/unit/pipe/test_callback_func.cc  -1
         

See the complete overview on Codacy

@lifflander
Copy link
Collaborator Author

@pnstickne I've fixed it. Static linkage problem with the unity build with header includes.
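
The comment above does not spell out the exact bug, so the following is only a guess at the general mechanism, with hypothetical names throughout. A `static` variable defined in a header has internal linkage: in a normal build every .cc file that includes the header gets its own copy, while in a Unity build the merged translation unit includes the header once and all former files share a single copy. The sketch below simulates both layouts in one file, using namespaces to stand in for separate translation units.

```cpp
// Hypothetical stand-in for the contents of a header that defines a
// file-static variable. The macro mimics textual #include of that header.
#define COUNTER_HEADER_CONTENTS static int counter = 0;

// --- Normal build: each .cc file is its own translation unit -------------
// Namespaces stand in for separate translation units; each one gets a
// private copy of `counter`, as it would with internal linkage.
namespace tu_a {
COUNTER_HEADER_CONTENTS
int bump() { return ++counter; }
}  // namespace tu_a

namespace tu_b {
COUNTER_HEADER_CONTENTS
int bump() { return ++counter; }
}  // namespace tu_b

// --- Unity build: both .cc files are merged into one translation unit ----
// The header's include guard means its contents appear once, so code from
// both former files now reads and writes the same `counter`.
namespace unity {
COUNTER_HEADER_CONTENTS
int bumpFromA() { return ++counter; }
int bumpFromB() { return ++counter; }
}  // namespace unity
```

Called once each on fresh state, `tu_a::bump()` and `tu_b::bump()` both return 1 because the counters are independent, whereas `unity::bumpFromA()` followed by `unity::bumpFromB()` returns 1 and then 2. Code that silently relied on per-file copies can therefore behave differently once Unity builds are enabled.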

@lifflander lifflander merged commit 3d06389 into develop Oct 10, 2020
Development

Successfully merging this pull request may close these issues.

Overhaul build system to reduce memory usage and speed up builds
2 participants