Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1435: Print time units and adjust accordingly for LB times #1436

Merged
merged 26 commits into from
Jan 13, 2022

Conversation

cz4rs
Copy link
Contributor

@cz4rs cz4rs commented May 12, 2021

fixes #1435

@cz4rs cz4rs force-pushed the 1435-print-time-units-for-lb-times branch from 2d5e157 to 2836a7f Compare May 12, 2021 18:34
@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (gcc-5, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (gcc-6, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (clang-3.9, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (clang-5.0, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (clang-10, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (clang-9, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (intel 19, ubuntu, mpich)

Build for 5ec7cb6

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (gcc-10, ubuntu, openmpi, no LB)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (nvidia cuda 11.0, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (gcc-7, ubuntu, mpich, trace runtime, LB)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@cz4rs cz4rs marked this pull request as draft May 12, 2021 19:06
@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (intel 18.03, ubuntu, mpich)

Build for 5ec7cb6

Compilation - successful

Testing - passed

Build log

@cz4rs
Copy link
Contributor Author

cz4rs commented May 12, 2021

converting to a draft as there is more work in BaseLB to be done:

vt_print(
lb,
"BaseLB: Statistic={}: "
" max={:.2f}, min={:.2f}, sum={:.2f}, avg={:.2f}, var={:.2f},"
" stdev={:.2f}, nproc={}, cardinality={} skewness={:.2f}, kurtosis={:.2f},"
" npr={}, imb={:.2f}, num_stats={}\n",
lb_stat_name_[the_stat],
max, min, sum, avg, var, stdv, npr, car, skew, krte, npr, imb,
stats.size()
);

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (gcc-9, ubuntu, mpich, zoltan)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (gcc-8, ubuntu, mpich, address sanitizer)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (clang-10, alpine, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@github-actions
Copy link

github-actions bot commented May 12, 2021

PR tests (nvidia cuda 10.1, ubuntu, mpich)

Build for 65af4ee

Compilation - successful

Testing - passed

Build log

@cz4rs cz4rs force-pushed the 1435-print-time-units-for-lb-times branch from ecd89eb to 587d81e Compare May 12, 2021 21:52
@codecov
Copy link

codecov bot commented May 12, 2021

Codecov Report

Merging #1436 (65af4ee) into develop (f88c3ba) will increase coverage by 0.01%.
The diff coverage is 80.72%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1436      +/-   ##
===========================================
+ Coverage    83.51%   83.52%   +0.01%     
===========================================
  Files          794      797       +3     
  Lines        30784    30816      +32     
===========================================
+ Hits         25708    25739      +31     
- Misses        5076     5077       +1     
Impacted Files Coverage Δ
src/vt/configs/debug/debug_fmt.h 0.00% <ø> (ø)
src/vt/configs/error/assert_out.impl.h 38.46% <ø> (ø)
src/vt/configs/error/assert_out_info.impl.h 0.00% <ø> (ø)
src/vt/configs/error/error.impl.h 100.00% <ø> (ø)
src/vt/configs/error/keyval_printer.impl.h 0.00% <ø> (ø)
src/vt/configs/error/pretty_print_message.cc 96.66% <ø> (ø)
src/vt/configs/error/soft_error.h 90.00% <ø> (ø)
src/vt/configs/error/stack_out.cc 97.05% <ø> (ø)
src/vt/elm/elm_stats.h 100.00% <ø> (ø)
src/vt/event/event.cc 60.26% <0.00%> (ø)
... and 28 more

@cz4rs
Copy link
Contributor Author

cz4rs commented May 13, 2021

Before:

vt: [0] (t) lb: BaseLB: Statistic=O_l:  max=0.00, min=0.00, sum=0.00, avg=0.00, var=0.00, stdev=0.00, nproc=64, cardinality=64 skewness=0.00, kurtosis=-1.25, npr=64, imb=0.88, num_stats=2

After:

vt: [0] (t) lb: BaseLB: Statistic=O_l:  max=103.5 ns, min=12.9 ns, sum=3.3 μs, avg=50.9 ns, var=0.0 ns, stdev=22.6 ns, nproc=64, cardinality=64 skewness=77.3 μs, kurtosis=-1060434.6 ns, npr=64, imb=1.0 ms, num_stats=2

FIXME: negative numbers are not handled correctly

kurtosis=-1060434.6 ns

edit:
Another thing to tackle is to stop converting statistics to milliseconds and use seconds (original unit) consistently across all LBs.

edit2:
An example after switching to engineering notation:

cz4rs   1435-print-time-units-for-lb-times  …  build  examples  collection  ./lb_iter 
vt: Runtime Initializing: mode: single-thread per rank
(...)
lb_iter: elms=64, weight=1, num_iter=8
0: iterWork: idx=idx(0)
iteration: iter=0,time=0.22737550735473633
0: iterWork: idx=idx(0)
iteration: iter=1,time=0.22463583946228027
vt: [0] (t) phase: PhaseManager::printSummary, phase=1, time=229e-3 s
0: iterWork: idx=idx(0)
iteration: iter=2,time=0.22517824172973633
vt: [0] (t) phase: PhaseManager::printSummary, phase=2, time=229e-3 s
0: iterWork: idx=idx(0)
iteration: iter=3,time=0.22557663917541504
vt: [0] (t) phase: PhaseManager::printSummary, phase=3, time=230e-3 s
(...)
vt: Runtime Finalizing

@lifflander
Copy link
Collaborator

@cz4rs @nlslatt I'm wondering if we should have a print out of how long each phase takes by default. If we are printing finishedLB anyway, we could also print a timer that could be very convenient for some apps.

@PhilMiller
Copy link
Member

I believe kurtosis and probably some of the other statistics are not in dimensions of time, and so shouldn't be printed that way. Imbalance is dimensionless

@cz4rs cz4rs force-pushed the 1435-print-time-units-for-lb-times branch from ad6995f to f89084a Compare January 12, 2022 08:10
@cz4rs
Copy link
Contributor Author

cz4rs commented Jan 12, 2022

Note: the codecov/patch failure doesn't seem concerning - this PR does a lot of minor changes to untested code, but the actual added functionality is covered with unit tests.

@cz4rs cz4rs force-pushed the 1435-print-time-units-for-lb-times branch from f89084a to 65af4ee Compare January 13, 2022 14:11
@PhilMiller PhilMiller merged commit 176190c into develop Jan 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Print time units and adjust accordingly for LB times
5 participants