1469 Output LB statistics as JSON #1475

Merged: 62 commits, Jun 28, 2021
3f72cef
#1469: lib: add nlohmann/json library (v3.9.1)
lifflander Jun 12, 2021
b250f7b
#1469: lib: add brotli library (v1.0.9)
lifflander Jun 12, 2021
78b6c04
#1469: cmake: add brotli and json library to bundled build
lifflander Jun 12, 2021
6caf883
#1469: utils: implement streaming compressor using brotli interface
lifflander Jun 12, 2021
6630911
#1469: utils: implement output adaptor for compression json
lifflander Jun 12, 2021
4edac72
#1469: utils: implement incremental json appender with compression
lifflander Jun 12, 2021
87a8da5
#1469: cmake: add new directories to build
lifflander Jun 12, 2021
a2879ac
#1469: utils: implement base appender to reduce header deps
lifflander Jun 13, 2021
2fa8259
#1469: lb: implement JSON writer using streaming append
lifflander Jun 13, 2021
1b8ccf1
#1469: lb: fix a small bug
lifflander Jun 13, 2021
8f8d01b
#1469: lb: remove old code, use proper name for file
lifflander Jun 13, 2021
b96f9a0
#1469: utils: just call down the write_characters
lifflander Jun 14, 2021
1f1d37d
#1469: utils: add missing base appender (forgot to add)
lifflander Jun 14, 2021
774106f
#1469: utils: remove redundant assertions
lifflander Jun 14, 2021
4db70b0
#1469: utils: add missing virtual destructor
lifflander Jun 15, 2021
6392f57
#1469: lb: make StatData a seperate data structure
lifflander Jun 15, 2021
3115e6b
#1469: runtime: add flush on fatalError
lifflander Jun 16, 2021
4f75baf
#1469: utils: implement JSON reader for stat files, switch restart re…
lifflander Jun 16, 2021
fb690fc
#1469: utils: implement streaming decompressor
lifflander Jun 16, 2021
d888dbf
#1469: utils: implement decompression input container
lifflander Jun 16, 2021
2dbd218
#1469: lb: add index, write from/to JSON methods
lifflander Jun 16, 2021
0a949b5
#1469: util: automatically determine if compressed or not
lifflander Jun 16, 2021
d7998fe
#1469: args: add flag for disabling stat file compression
lifflander Jun 16, 2021
ba0aae4
#1469: args: use %p in file name for the rank
lifflander Jun 17, 2021
353ceff
#1469: util: generalize decompression over any stream-like T
lifflander Jun 17, 2021
0259864
#1469: util: improve error message when file is invalid
lifflander Jun 17, 2021
758b240
#1469: lb: fix missing tests when stat data is empty
lifflander Jun 17, 2021
d5bc1d0
#1469: util: remove/comment out debugging code
lifflander Jun 17, 2021
e10bfe6
#1469: util: add finish method to extract stream if desired
lifflander Jun 17, 2021
93e092d
#1469: lb: fix assumptions about JSON entites that exist
lifflander Jun 17, 2021
e5383e6
#1469: lb: fix stats restart reader to take data from a stream
lifflander Jun 17, 2021
4740a7b
#1469: test: rewrite restart test to do it all in-memory
lifflander Jun 17, 2021
5cdef77
#1469: test: rewrite node stats dumper test
lifflander Jun 17, 2021
96c75a7
#1469: lib: remove pkg_config causing failure on CI
lifflander Jun 17, 2021
b6591c8
#1469: lib: brotli cmake fixes for Intel and cmake
lifflander Jun 17, 2021
5534db6
#1469: lib: brotli add version to project command
lifflander Jun 17, 2021
58be599
#1469: lib: brotli explicitly set policy as NEW
lifflander Jun 17, 2021
b684943
#1469: lib: json library fix whitespace causing CI error
lifflander Jun 17, 2021
a26bb6a
#1469: lib: json library work around Intel warning
lifflander Jun 17, 2021
73f8fb1
#1469: util: fix warning (-1) for std::size_t
lifflander Jun 17, 2021
b613e13
#1469: lib: json fix Hedley TPL to force attributes off not properly …
lifflander Jun 17, 2021
a5387b4
#1469: lib: json fix warning in nvcc 11
lifflander Jun 17, 2021
4134874
#1469: docker: add support for nvidia nvcc 10.2 (useful in the future)
lifflander Jun 18, 2021
3c5b95d
#1469: lib: json work around nvcc 10.1 bug after identifying it
lifflander Jun 22, 2021
7a2f1da
#1469: license: fix headers with new template generated
lifflander Jun 22, 2021
dc479f4
#1469: args: fix duplicated code
lifflander Jun 22, 2021
6b740c5
#1469: utils: fix accidentally added whitespace
lifflander Jun 22, 2021
2c19c2e
#1469: lib: build brotli in portable mode to avoid undefined behavior
lifflander Jun 22, 2021
8f7d74f
#1469: lib: remove new option from brotli to avoid policy problems
lifflander Jun 22, 2021
4eebc6f
#1469: docs: update documentation on stat file output along with some…
lifflander Jun 22, 2021
92127ac
#1469: lb: read node from file instead of using this_node
lifflander Jun 24, 2021
be62a55
#1469: docs: fix typo about communication
lifflander Jun 24, 2021
c7d5099
#1469: util: improve error messages from Brotli
lifflander Jun 24, 2021
b771338
#1469: lb: optimize restart reader with emplace
lifflander Jun 24, 2021
77ab6ec
#1469: tests: simplify expression for equality
lifflander Jun 24, 2021
6d0ea70
#1469: util: use class variable for consistency
lifflander Jun 24, 2021
96110b3
#1469: util: abstract into variable for clarity
lifflander Jun 24, 2021
c16659b
#1469: util: change visibility to private
lifflander Jun 24, 2021
bf2fa1e
#1469: util: abstract isCompressed into a function in JSON reader
lifflander Jun 24, 2021
121711c
#1469: util: change type of buffer to uint8_t to reduce casting
lifflander Jun 25, 2021
89926e8
#1469: lb: use automatic conversion for vector
lifflander Jun 25, 2021
1852358
#1469: util: use const ref instead of std::unique_ptr when possible
cz4rs Jun 25, 2021
14 changes: 13 additions & 1 deletion ci/docker/ubuntu-18.04-nvidia-cpp.dockerfile
@@ -42,7 +42,19 @@ RUN if test ${compiler} = "nvcc-10"; then \
rm -rf /var/lib/apt/lists/* && \
rm -rf cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb && \
ln -s /usr/local/cuda-10.1 /usr/local/cuda-versioned; \
else \
elif test ${compiler} = "nvcc-10.2"; then \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin && \
mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb && \
dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb && \
apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub && \
apt-get update && \
apt-get -y install cuda && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb && \
ln -s /usr/local/cuda-10.2 /usr/local/cuda-versioned; \
else \
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin && \
mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
wget http://developer.download.nvidia.com/compute/cuda/11.0.1/local_installers/cuda-repo-ubuntu1804-11-0-local_11.0.1-450.36.06-1_amd64.deb && \
14 changes: 14 additions & 0 deletions cmake/link_vt.cmake
@@ -32,6 +32,8 @@ function(link_target_with_vt)
LINK_DL
LINK_ZOLTAN
LINK_FORT
LINK_JSON
LINK_BROTLI
)
set(
multiValueArg
@@ -49,6 +51,18 @@ function(link_target_with_vt)
message(STATUS "link_target_with_vt: default link=${ARG_DEFAULT_LINK_SET}")
endif()

if (NOT DEFINED ARG_LINK_JSON AND ${ARG_DEFAULT_LINK_SET} OR ARG_LINK_JSON)
target_link_libraries(
${ARG_TARGET} PRIVATE ${ARG_BUILD_TYPE} ${JSON_LIBRARY}
)
endif()

if (NOT DEFINED ARG_LINK_BROTLI AND ${ARG_DEFAULT_LINK_SET} OR ARG_LINK_BROTLI)
target_link_libraries(
${ARG_TARGET} PRIVATE ${ARG_BUILD_TYPE} ${BROTLI_LIBRARY}
)
endif()

if (NOT DEFINED ARG_LINK_FORT AND ${ARG_DEFAULT_LINK_SET} OR ARG_LINK_FORT)
if (vt_libfort_enabled)
target_link_libraries(
14 changes: 14 additions & 0 deletions cmake/load_bundled_libraries.cmake
@@ -23,6 +23,20 @@ add_subdirectory(${PROJECT_LIB_DIR}/CLI)
set(FMT_LIBRARY fmt)
add_subdirectory(${PROJECT_LIB_DIR}/fmt)

# json library always included in the build
set(JSON_BuildTests OFF)
set(JSON_MultipleHeaders ON)
set(JSON_LIBRARY nlohmann_json)
add_subdirectory(${PROJECT_LIB_DIR}/json)

# brotli library always included in the build
set(BROTLI_DISABLE_TESTS ON)
# we need to disable bundled mode so it will install properly
set(BROTLI_BUNDLED_MODE OFF)
set(BROTLI_BUILD_PORTABLE ON)
set(BROTLI_LIBRARY brotlicommon brotlienc brotlidec)
add_subdirectory(${PROJECT_LIB_DIR}/brotli)

# Optionally include mimalloc (alternative memory allocator)
if (vt_mimalloc_enabled)
add_subdirectory(${PROJECT_LIB_DIR}/mimalloc)
4 changes: 3 additions & 1 deletion docker-compose.yml
@@ -14,7 +14,7 @@
# clang-3.9, clang-4.0, clang-5.0, clang-6.0, clang-7, clang-8,
# clang-9, clang-10,
# icc-18, icc-19,
# nvcc-10, nvcc-11}
# nvcc-10, nvcc-10.2, nvcc-11}
# REPO=lifflander1/vt
# UBUNTU={18.04, 20.04}
# ULIMIT_CORE=0
@@ -87,6 +87,7 @@ volumes:
amd64-ubuntu-18.04-icc-19-cache:
amd64-ubuntu-18.04-icc-20-cache:
amd64-ubuntu-18.04-nvcc-10-cache:
amd64-ubuntu-18.04-nvcc-10.2-cache:
amd64-ubuntu-18.04-nvcc-11-cache:
amd64-alpine-clang-3.9-cache:
amd64-alpine-clang-4.0-cache:
@@ -106,6 +107,7 @@ volumes:
amd64-alpine-icc-19-cache:
amd64-alpine-icc-20-cache:
amd64-alpine-nvcc-10-cache:
amd64-alpine-nvcc-10.2-cache:
amd64-alpine-nvcc-11-cache:
arm64v8-ubuntu-18.04-gcc-7-cache:
arm64v8-alpine-gcc-7-cache:
158 changes: 148 additions & 10 deletions docs/md/node-stats.md
@@ -29,23 +29,161 @@ the statistics and mapping.

\subsection stats-file-format File Format

Each line in the file will one of two formats. The first line is a computation
time line for each phase, that breaks time down into subphases:
The VOM files are output in JSON format, either compressed with brotli
compression (default on) or pure JSON if the argument `--vt_lb_stats_compress`
is set to `false`.
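
Because a stats file may be either brotli-compressed or plain JSON, a consumer has to handle both encodings. The following is a minimal Python sketch of one way to do that, not a description of how \vt detects compression. The function name `load_stats`, the "starts with `{`" heuristic, and the use of the third-party `brotli` Python package are all assumptions made for illustration.

```python
import json


def load_stats(raw: bytes) -> dict:
    """Parse a stats payload that is plain JSON or brotli-compressed JSON.

    Heuristic (an assumption, not vt's own logic): a plain JSON stats file
    starts with '{' after optional whitespace; anything else is treated as
    brotli-compressed.
    """
    if raw.lstrip()[:1] == b"{":
        return json.loads(raw.decode("utf-8"))

    # Imported lazily so plain JSON files can be read without the package.
    import brotli  # third-party package (assumed installed: pip install brotli)

    return json.loads(brotli.decompress(raw).decode("utf-8"))


# Usage with an uncompressed payload, as written when compression is disabled.
plain = b'{"phases": [{"id": 0, "tasks": []}]}'
stats = load_stats(plain)
print(stats["phases"][0]["id"])
```

The heuristic is deliberately simple; a compressed stream could in principle begin with the same byte, so a robust reader might instead attempt decompression first and fall back to plain parsing on failure.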

\code
<phase>, <object-id>, <time-in-seconds> <#-of-subphases> '[' [<subphase-time-1>] ... [<subphase-time-N>] ']'
The JSON files contain an array of `phases` that have been captured by \vt and
output to the file. Each phase has an `id` indicating which phase it was while
the application was running. Each phase also has an array of `tasks` that
represent work that was done during that phase. Each task has a `time`,
`resource`, `node`, `entity`, and optionally a list of `subphases`. The `entity`
contains information about the task that performed this work. If that `entity`
is a virtual collection object, it will specify the unique `id` for the object,
and optionally the `index`, `home`, and `collection_id` for that object.

\code{.json}
{
"phases": [
{
"id": 0,
"tasks": [
{
"entity": {
"collection_id": 7,
"home": 0,
"id": 12884901888,
"index": [
3
],
"type": "object"
},
"node": 0,
"resource": "cpu",
"subphases": [
{
"id": 0,
"time": 0.014743804931640625
}
],
"time": 0.014743804931640625
},
{
"entity": {
"collection_id": 7,
"home": 0,
"id": 4294967296,
"index": [
1
],
"type": "object"
},
"node": 0,
"resource": "cpu",
"subphases": [
{
"id": 0,
"time": 0.013672113418579102
}
],
"time": 0.013672113418579102
}
]
},
{
"id": 1,
"tasks": [
{
"entity": {
"collection_id": 7,
"home": 0,
"id": 12884901888,
"index": [
3
],
"type": "object"
},
"node": 0,
"resource": "cpu",
"subphases": [
{
"id": 0,
"time": 0.014104127883911133
}
],
"time": 0.014104127883911133
}
]
}
]
}
\endcode
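
As a hedged illustration of how the `phases` and `tasks` layout described above might be consumed, the Python sketch below sums task time per phase and per entity. The helper names `time_per_phase` and `time_per_entity` are hypothetical, and the input is a trimmed copy of the example with the times replaced by round values so the sums are easy to follow.

```python
import json
from collections import defaultdict

# Trimmed stand-in for the example file; times are illustrative round values.
doc = json.loads("""
{"phases": [
  {"id": 0, "tasks": [
    {"entity": {"id": 12884901888, "type": "object"}, "node": 0,
     "resource": "cpu", "time": 0.5,
     "subphases": [{"id": 0, "time": 0.5}]},
    {"entity": {"id": 4294967296, "type": "object"}, "node": 0,
     "resource": "cpu", "time": 0.25}
  ]},
  {"id": 1, "tasks": [
    {"entity": {"id": 12884901888, "type": "object"}, "node": 0,
     "resource": "cpu", "time": 0.5}
  ]}
]}
""")


def time_per_phase(doc: dict) -> dict:
    """Map each phase id to the total time of its tasks."""
    totals = {}
    for phase in doc.get("phases", []):
        # A phase may carry no tasks at all, so default to an empty list.
        totals[phase["id"]] = sum(t["time"] for t in phase.get("tasks", []))
    return totals


def time_per_entity(doc: dict) -> dict:
    """Map each entity id to its total time across all phases."""
    totals = defaultdict(float)
    for phase in doc.get("phases", []):
        for task in phase.get("tasks", []):
            totals[task["entity"]["id"]] += task["time"]
    return dict(totals)


print(time_per_phase(doc))   # {0: 0.75, 1: 0.5}
print(time_per_entity(doc))  # {12884901888: 1.0, 4294967296: 0.25}
```

Only `id` and `time` are read here. The optional fields (`subphases`, `index`, `home`, `collection_id`) are ignored, so the sketch works whether or not they are present.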

The second line format is a communication line:
Each phase in the file may also have a `communications` array that specifies any
communication between tasks that occurred during the phase. Each communication
has a `type`, which is described in the table below. Additionally, it
specifies the `bytes`, the number of `messages`, and the two entities that were
involved in the operation as `to` and `from`. The entities may be of different
types, such as `object` or `node`, depending on the type of communication.

\code
<phase>, <object-id1-to/recv>, <object-id2-from/send>, <num-bytes>, <comm-type={1..6}>
\code{.json}
{
"phases": [
{
"communications": [
{
"bytes": 262.0,
"from": {
"home": 1,
"id": 1,
"type": "object"
},
"messages": 1,
"to": {
"home": 0,
"id": 4294967296,
"type": "object"
},
"type": "SendRecv"
},
{
"bytes": 96.0,
"from": {
"home": 0,
"id": 4294967296,
"type": "object"
},
"messages": 1,
"to": {
"id": 1,
"type": "node"
},
"type": "CollectionToNode"
},
{
"bytes": 259.0,
"from": {
"id": 0,
"type": "node"
},
"messages": 1,
"to": {
"home": 0,
"id": 0,
"type": "object"
},
"type": "NodeToCollection"
}
],
"id": 0
}
]
}
\endcode
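
The `communications` layout described above can be explored with a short Python sketch that totals bytes by communication type and by (`from`, `to`) pair. This is an illustrative reading of the example only: the helper names `endpoint`, `bytes_by_type`, and `bytes_by_edge` are hypothetical, and identifying an entity by its (`type`, `id`) pair is an assumption made here, not something the format is stated to guarantee.

```python
from collections import defaultdict

# One phase, copied from the communications example.
phase = {
    "id": 0,
    "communications": [
        {"bytes": 262.0, "messages": 1, "type": "SendRecv",
         "from": {"home": 1, "id": 1, "type": "object"},
         "to": {"home": 0, "id": 4294967296, "type": "object"}},
        {"bytes": 96.0, "messages": 1, "type": "CollectionToNode",
         "from": {"home": 0, "id": 4294967296, "type": "object"},
         "to": {"id": 1, "type": "node"}},
        {"bytes": 259.0, "messages": 1, "type": "NodeToCollection",
         "from": {"id": 0, "type": "node"},
         "to": {"home": 0, "id": 0, "type": "object"}},
    ],
}


def endpoint(entity: dict) -> tuple:
    """Key an entity by (type, id); object 1 and node 1 stay distinct."""
    return (entity["type"], entity["id"])


def bytes_by_type(phase: dict) -> dict:
    """Total bytes for each communication type in one phase."""
    totals = defaultdict(float)
    for comm in phase.get("communications", []):
        totals[comm["type"]] += comm["bytes"]
    return dict(totals)


def bytes_by_edge(phase: dict) -> dict:
    """Total bytes for each (from, to) endpoint pair in one phase."""
    totals = defaultdict(float)
    for comm in phase.get("communications", []):
        totals[(endpoint(comm["from"]), endpoint(comm["to"]))] += comm["bytes"]
    return dict(totals)


print(bytes_by_type(phase))
print(sum(bytes_by_type(phase).values()))  # 617.0
```

Keying on both `type` and `id` matters in this example because the `id` value 1 appears once as an `object` and once as a `node`.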


Where `<comm-type>` is the type of communication occurred. The type of
communication lines up the enum `vt::vrt::collection::balance::CommCategory` in
the code.
The type of communication lines up with the enum
`vt::vrt::collection::balance::CommCategory` in the code.

| Value | Enum entry | Description |
| ----- | ---------- | ----------- |