Update branding in docs #2

Merged
merged 33 commits into from
Oct 17, 2024
Commits
5e965ff
Update branding in docs
dgaliffiAMD Sep 29, 2024
3d1fb76
Rename image used in documentation
dgaliffiAMD Sep 29, 2024
caa5fb8
Update names of code samples.
dgaliffiAMD Sep 29, 2024
48e7b0a
Update ASCII art
dgaliffiAMD Sep 29, 2024
8dc8206
update Doxyfile strip_from_path
peterjunpark Oct 4, 2024
9d2be64
Add a "Formerly known as" message.
dgaliffiAMD Oct 7, 2024
cf77eb9
Fixed typo in product name
dgaliffiAMD Oct 7, 2024
d7b60cc
Add "Omnitrace" back to the metadata keywords
dgaliffiAMD Oct 7, 2024
06ca9e7
Update "install via package manager" section
dgaliffiAMD Oct 7, 2024
05d6d55
Update paths to user API files
dgaliffiAMD Oct 7, 2024
3d229c1
Rename configuration and environment settings
dgaliffiAMD Oct 7, 2024
ca55f19
Update Doxyfiles
dgaliffiAMD Oct 7, 2024
ba518f9
Update docs/what-is-rocprof-sys.rst
dgaliffiAMD Oct 7, 2024
152fe7e
Update docs/conceptual/data-collection-modes.rst
dgaliffiAMD Oct 7, 2024
15994e9
Update docs/tutorials/video-tutorials.rst
dgaliffiAMD Oct 7, 2024
0008e4f
Update docs/conceptual/rocprof-sys-feature-set.rst
dgaliffiAMD Oct 7, 2024
f1d3cfb
Update docs/how-to/configuring-runtime-options.rst
dgaliffiAMD Oct 7, 2024
d9452be
Update docs/how-to/configuring-validating-environment.rst
dgaliffiAMD Oct 7, 2024
b8b59e5
Update docs/how-to/general-tips-using-rocprof-sys.rst
dgaliffiAMD Oct 7, 2024
a8e43ef
Update docs/reference/rocprof-sys-glossary.rst
dgaliffiAMD Oct 7, 2024
fd6aaea
Update docs/reference/development-guide.rst
dgaliffiAMD Oct 7, 2024
a997b4c
Update docs/how-to/instrumenting-rewriting-binary-application.rst
dgaliffiAMD Oct 7, 2024
fec705b
Update docs/install/quick-start.rst
dgaliffiAMD Oct 7, 2024
5f82d22
Note that videos were recorded using the "Omnitrace" name.
dgaliffiAMD Oct 8, 2024
3562b2d
Rebase and update some file paths
dgaliffiAMD Oct 16, 2024
32cd63c
Update paths to doc images
dgaliffiAMD Oct 17, 2024
c196cc4
Update Omnitrace references in code snippets
dgaliffiAMD Oct 17, 2024
aed1428
Rename examples still using the "omni" prefix.
dgaliffiAMD Oct 17, 2024
3a51968
Update docs/how-to/performing-causal-profiling.rst
dgaliffiAMD Oct 17, 2024
7e71feb
Update docs/how-to/profiling-python-scripts.rst
dgaliffiAMD Oct 17, 2024
f118dc4
Update docs/how-to/sampling-call-stack.rst
dgaliffiAMD Oct 17, 2024
7a6e3a1
Update docs/how-to/understanding-rocprof-sys-output.rst
dgaliffiAMD Oct 17, 2024
9fb0fac
Update docs/install/install.rst
dgaliffiAMD Oct 17, 2024
81 changes: 41 additions & 40 deletions docs/conceptual/data-collection-modes.rst
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
.. meta::
:description: Omnitrace documentation and reference
:keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
:description: ROCm Systems Profiler data collection modes documentation
:keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, data collection, tracking, visualization, tool, Instinct, accelerator, AMD

**********************
Data collection modes
**********************

Omnitrace supports several modes of recording trace and profiling data for your application.
ROCm Systems Profiler supports several modes of recording trace and profiling data for your application.

.. note::
For an explanation of the terms used in this topic, see
the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`.

For an explanation of the terms used in this topic, see
the :doc:`ROCm Systems Profiler glossary <../reference/rocprof-sys-glossary>`.

+-----------------------------+---------------------------------------------------------+
| Mode | Description |
@@ -23,61 +23,62 @@ Omnitrace supports several modes of recording trace and profiling data for your
| | and records various metrics for the given call stack |
+-----------------------------+---------------------------------------------------------+
| Callback APIs | Parallelism frameworks such as ROCm, OpenMP, and Kokkos |
| | make callbacks into Omnitrace to provide information |
| | about the work the API is performing |
| | make callbacks into ROCm Systems Profiler to provide |
| | information about the work the API is performing |
+-----------------------------+---------------------------------------------------------+
| Dynamic Symbol Interception | Wrap function symbols defined in a position independent |
| | dynamic library/executable, like ``pthread_mutex_lock`` |
| | in ``libpthread.so`` or ``MPI_Init`` in the MPI library |
+-----------------------------+---------------------------------------------------------+
| User API | User-defined regions and controls for Omnitrace |
| User API | User-defined regions and controls for ROCm Systems |
| | Profiler |
+-----------------------------+---------------------------------------------------------+

The two most generic and important modes are binary instrumentation and statistical sampling.
The two most generic and important modes are binary instrumentation and statistical sampling.
It is important to understand their advantages and disadvantages.
Binary instrumentation and statistical sampling can be performed with the ``omnitrace-instrument``
Binary instrumentation and statistical sampling can be performed with the ``rocprof-sys-instrument``
executable. For statistical sampling, it's highly recommended to use the
``omnitrace-sample`` executable instead if binary instrumentation isn't required or needed.
``rocprof-sys-sample`` executable instead if binary instrumentation isn't required or needed.
Callback APIs and dynamic symbol interception can be utilized with either tool.

Binary instrumentation
-----------------------------------

Binary instrumentation lets you record deterministic measurements for
Binary instrumentation lets you record deterministic measurements for
every single invocation of a given function.
Binary instrumentation effectively adds instructions to the target application to
collect the required information. It therefore has the potential to cause performance
changes which might, in some cases, lead to inaccurate results. The effect depends on
the information being collected and which features are activated in Omnitrace.
Binary instrumentation effectively adds instructions to the target application to
collect the required information. It therefore has the potential to cause performance
changes which might, in some cases, lead to inaccurate results. The effect depends on
the information being collected and which features are activated in ROCm Systems Profiler.
For example, collecting only the wall-clock timing data
has less of an effect than collecting the wall-clock timing, CPU-clock timing,
memory usage, cache-misses, and number of instructions that were run. Similarly,
collecting a flat profile has less overhead than a hierarchical profile
and collecting a trace OR a profile has less overhead than collecting a
has less of an effect than collecting the wall-clock timing, CPU-clock timing,
memory usage, cache-misses, and number of instructions that were run. Similarly,
collecting a flat profile has less overhead than a hierarchical profile
and collecting a trace OR a profile has less overhead than collecting a
trace AND a profile.

In Omnitrace, the primary heuristic for controlling the overhead with binary
instrumentation is the minimum number of instructions for selecting functions
In ROCm Systems Profiler, the primary heuristic for controlling the overhead with binary
instrumentation is the minimum number of instructions for selecting functions
for instrumentation.

Statistical sampling
-----------------------------------

Statistical call-stack sampling periodically interrupts the application at
Statistical call-stack sampling periodically interrupts the application at
regular intervals using operating system interrupts.
Sampling is typically less numerically accurate and specific, but the
Sampling is typically less numerically accurate and specific, but the
target program runs at nearly full speed.
In contrast to the data derived from binary instrumentation, the resulting
In contrast to the data derived from binary instrumentation, the resulting
data is not exact but is instead a statistical approximation.
However, sampling often provides a more accurate picture of the application
However, sampling often provides a more accurate picture of the application
execution because it is less intrusive to the target application and has fewer
side effects on memory caches or instruction decoding pipelines. Furthermore,
side effects on memory caches or instruction decoding pipelines. Furthermore,
because sampling does not affect the execution speed as much, it is
relatively immune to over-evaluating the cost of small, frequently called
relatively immune to over-evaluating the cost of small, frequently called
functions or "tight" loops.

In Omnitrace, the overhead for statistical sampling depends on the
sampling rate and whether the samples are taken with respect to the CPU time
In ROCm Systems Profiler, the overhead for statistical sampling depends on the
sampling rate and whether the samples are taken with respect to the CPU time
and/or real time.
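
The trade-off described above can be illustrated with a small simulation. This is a minimal sketch, not ROCm Systems Profiler code: the function names ``compute`` and ``tiny``, the segment durations, and the 70-microsecond sampling period are all hypothetical values chosen for illustration. It samples a synthetic timeline at a fixed period and shows that the per-function time share is recovered only approximately, while the exact call counts are not recovered at all.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def sample_timeline(segments, period_us):
    """Count which function is running at each periodic sample point.

    segments: list of (name, duration_us) tuples executed back to back.
    period_us: hypothetical sampling period in microseconds.
    """
    names = [name for name, _ in segments]
    # Cumulative end time of every segment; segment i covers [ends[i-1], ends[i]).
    ends = list(accumulate(duration for _, duration in segments))
    counts = Counter()
    for t in range(0, ends[-1], period_us):
        counts[names[bisect_right(ends, t)]] += 1
    return counts


# Synthetic workload: 50 iterations of a 900 us function and a 100 us function.
timeline = [("compute", 900), ("tiny", 100)] * 50

samples = sample_timeline(timeline, period_us=70)
total_samples = sum(samples.values())
estimated_share = samples["compute"] / total_samples

# What instrumentation would report: one exact record per invocation.
exact_calls = Counter(name for name, _ in timeline)

print(f"samples taken: {total_samples}")
print(f"estimated time share of compute: {estimated_share:.3f} (true share: 0.900)")
print(f"exact call counts from instrumentation: {dict(exact_calls)}")
```

In this toy model, a period that divides the loop length evenly (for example 1000 microseconds) would sample the same offset on every iteration and skew the estimate, which is one reason the choice of sampling rate matters.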

Binary instrumentation vs. statistical sampling example
@@ -112,24 +113,24 @@ Consider the following code:
return 0;
}

Binary instrumentation of the ``fib`` function will record **every single invocation**
Binary instrumentation of the ``fib`` function will record **every single invocation**
of the function. For a very small function
such as ``fib``, this results in **significant** overhead since this simple function
such as ``fib``, this results in **significant** overhead since this simple function
takes about 20 instructions, whereas the entry and
exit snippets are ~1024 instructions. Therefore, you generally want to avoid
exit snippets are ~1024 instructions. Therefore, you generally want to avoid
instrumenting functions where the instrumented function has significantly fewer
instructions than entry and exit instrumentation. (Note that many of the
instructions than entry and exit instrumentation. (Note that many of the
instructions in entry and exit functions are either logging functions or
depend on the runtime settings and thus might never run). However,
depend on the runtime settings and thus might never run). However,
due to the number of potential instructions in the entry and exit snippets,
the default behavior of ``omnitrace-instrument`` is to only instrument functions
the default behavior of ``rocprof-sys-instrument`` is to only instrument functions
which contain at least 1024 instructions.

However, recording every single invocation of the function can be extremely
However, recording every single invocation of the function can be extremely
useful for detecting anomalies, such as profiles that show minimum or maximum values much smaller or larger
than the average or a high standard deviation. In this case, the traces help you
than the average or a high standard deviation. In this case, the traces help you
identify exactly when and where those instances deviated from the norm.
Compare the level of detail in the following traces. In the top image,
Compare the level of detail in the following traces. In the top image,
every instance of the ``fib`` function is instrumented, while in the bottom image,
the ``fib`` call-stack is derived via sampling.
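
The instruction-count reasoning above can be worked through numerically. The following is a rough sketch under stated assumptions: the helper names ``overhead_ratio`` and ``should_instrument`` are hypothetical and are not part of ``rocprof-sys-instrument``; only the figures of about 20 instructions for ``fib``, roughly 1024 instructions for the entry and exit snippets, and the default threshold of 1024 instructions come from the text.

```python
# Figures taken from the text above; treat them as approximate.
ENTRY_EXIT_INSTRUCTIONS = 1024   # approximate size of the entry and exit snippets
DEFAULT_MIN_INSTRUCTIONS = 1024  # default threshold described for rocprof-sys-instrument


def overhead_ratio(function_instructions: int,
                   snippet_instructions: int = ENTRY_EXIT_INSTRUCTIONS) -> float:
    """Upper bound on instrumentation instructions per function instruction.

    Hypothetical helper. It is an upper bound because many snippet
    instructions depend on runtime settings and might never run.
    """
    if function_instructions <= 0:
        raise ValueError("function_instructions must be positive")
    return snippet_instructions / function_instructions


def should_instrument(function_instructions: int,
                      min_instructions: int = DEFAULT_MIN_INSTRUCTIONS) -> bool:
    """Hypothetical model of the minimum-instruction selection heuristic."""
    return function_instructions >= min_instructions


fib_ratio = overhead_ratio(20)
print(f"fib: up to {fib_ratio:.1f} instrumentation instructions per fib instruction")
print("fib selected by default:", should_instrument(20))
print("4096-instruction function selected by default:", should_instrument(4096))
```

Under these numbers, a small function such as ``fib`` falls well below the default threshold, so it is instrumented only if the threshold is lowered explicitly.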

Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
.. meta::
:description: Omnitrace documentation and reference
:keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
:description: ROCm Systems Profiler feature set documentation and reference
:keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, feature set, use cases, tracking, visualization, tool, Instinct, accelerator, AMD

***************************************
The Omnitrace feature set and use cases
The ROCm Systems Profiler feature set and use cases
***************************************

`Omnitrace <https://github.com/ROCm/omnitrace>`_ is designed to be highly extensible.
Internally, it leverages the `Timemory performance analysis toolkit <https://github.com/NERSC/timemory>`_
to manage extensions, resources, data, and other items. It supports the following features,
`ROCm Systems Profiler <https://github.com/ROCm/rocprofiler-systems>`_ is designed to be highly extensible.
Internally, it leverages the `Timemory performance analysis toolkit <https://github.com/NERSC/timemory>`_
to manage extensions, resources, data, and other items. It supports the following features,
modes, metrics, and APIs.

Data collection modes
@@ -22,11 +22,6 @@ Data collection modes
* Statistical sampling: Periodic software interrupts per-thread
* Process-level sampling: A background thread records process-, system- and device-level metrics while the application runs
* Causal profiling: Quantifies the potential impact of optimizations in parallel code

.. note::

Critical trace support was removed in Omnitrace v1.11.0.
It was replaced by the causal profiling feature.

Data analysis
========================================
@@ -98,40 +93,40 @@ Third-party API support
* NVTX
* ROCTX

Omnitrace use cases
ROCm Systems Profiler use cases
========================================

When analyzing the performance of an application, do NOT
When analyzing the performance of an application, do NOT
assume you know where the performance bottlenecks are
and why they are happening. Omnitrace is a tool for analyzing the entire
and why they are happening. ROCm Systems Profiler is a tool for analyzing the entire
application and its performance. It is
ideal for characterizing where optimization would have the greatest impact
ideal for characterizing where optimization would have the greatest impact
on an end-to-end run of the application and for
viewing what else is happening on the system during a performance bottleneck.

When GPUs are involved, there is a tendency to assume that
When GPUs are involved, there is a tendency to assume that
the quickest path to performance improvement is minimizing
the runtime of the GPU kernels. This is a highly flawed assumption.
the runtime of the GPU kernels. This is a highly flawed assumption.
If you optimize the runtime of a kernel from one millisecond
to 1 microsecond (1000x speed-up) but the original application never
to 1 microsecond (1000x speed-up) but the original application never
spent time waiting for kernels to complete,
there would be no statistically significant reduction in the end-to-end
there would be no statistically significant reduction in the end-to-end
runtime of your application. In other words, it does not matter
how fast or slow the code on the GPU is if the application does not have a
how fast or slow the code on the GPU is if the application does not have a
bottleneck on waiting on the GPU.
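
The kernel speed-up argument can be checked with simple arithmetic. This is a toy model, assuming a single asynchronous kernel launch that overlaps with independent CPU work before a synchronization point. The function name ``end_to_end_ms`` and the CPU work durations are hypothetical; only the 1 millisecond and 1 microsecond kernel times come from the text.

```python
def end_to_end_ms(cpu_work_ms: float, kernel_ms: float) -> float:
    """Toy model (an assumption, not a measurement).

    The kernel is launched asynchronously, the CPU performs cpu_work_ms of
    independent work, and then the CPU synchronizes with the GPU. The
    end-to-end time is whichever of the two finishes last.
    """
    return max(cpu_work_ms, kernel_ms)


# Case 1: the application never waits on the kernel (50 ms of CPU work is hypothetical).
overlapped_before = end_to_end_ms(cpu_work_ms=50.0, kernel_ms=1.0)
overlapped_after = end_to_end_ms(cpu_work_ms=50.0, kernel_ms=0.001)

# Case 2: the application spends most of its time waiting on the kernel.
waiting_before = end_to_end_ms(cpu_work_ms=0.1, kernel_ms=1.0)
waiting_after = end_to_end_ms(cpu_work_ms=0.1, kernel_ms=0.001)

print(f"no waiting:  {overlapped_before} ms -> {overlapped_after} ms")
print(f"GPU-bound:   {waiting_before} ms -> {waiting_after} ms")
```

In the first case the 1000x kernel speed-up leaves the end-to-end time unchanged, while in the second case the same speed-up reduces it by a factor of ten, at which point the CPU work becomes the limit.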

Use Omnitrace to obtain a high-level view of the entire application. Use it
Use ROCm Systems Profiler to obtain a high-level view of the entire application. Use it
to determine where the performance bottlenecks are and
obtain clues to why these bottlenecks are happening. Rather than worrying about kernel
performance, start your investigation with Omnitrace, which characterizes the
performance, start your investigation with ROCm Systems Profiler, which characterizes the
broad picture.

.. note::

For insight into the execution of individual kernels on the GPU,
use `Omniperf <https://github.com/rocm/omniperf>`_.
For insight into the execution of individual kernels on the GPU,
use `ROCm Compute Profiler <https://github.com/rocm/rocprofiler-compute>`_.

In terms of CPU analysis, Omnitrace does not target any specific vendor.
In terms of CPU analysis, ROCm Systems Profiler does not target any specific vendor.
It works just as well on AMD and non-AMD CPUs.
With regard to the GPU, Omnitrace is currently restricted to HIP and HSA APIs
With regard to the GPU, ROCm Systems Profiler is currently restricted to HIP and HSA APIs
and kernels running on AMD GPUs.
6 changes: 3 additions & 3 deletions docs/conf.py
Original file line number Diff line number Diff line change
@@ -36,14 +36,14 @@
raise ValueError("VERSION not found!")
version_number = match[1]

external_projects_current_project = "omnitrace"
external_projects_current_project = "rocprofiler-systems"

project = "omnitrace"
project = "rocprofiler-systems"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
release = version_number
html_title = f"Omnitrace {version} documentation"
html_title = f"ROCm Systems Profiler {version} documentation"

external_toc_path = "./sphinx/_toc.yml"

Binary file removed docs/data/omnitrace-perfetto.png
Binary file not shown.
Binary file removed docs/data/omnitrace-rocm-flow.png
Binary file not shown.
Binary file removed docs/data/omnitrace-rocm.png
Binary file not shown.
Binary file removed docs/data/omnitrace-user-api.png
Binary file not shown.
24 changes: 12 additions & 12 deletions docs/doxygen/Doxyfile
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@
# Project related configuration options
#---------------------------------------------------------------------------
DOXYFILE_ENCODING = UTF-8
PROJECT_NAME = omnitrace
PROJECT_NAME = rocprofiler-systems
PROJECT_NUMBER = 1.11.3
PROJECT_BRIEF = "High-level and comprehensive application tracing and profiling on both the CPU and GPU"
PROJECT_LOGO =
@@ -19,8 +19,8 @@ ABBREVIATE_BRIEF =
ALWAYS_DETAILED_SEC = YES
INLINE_INHERITED_MEMB = YES
FULL_PATH_NAMES = YES
STRIP_FROM_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-omnitrace/checkouts/
STRIP_FROM_INC_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-omnitrace/checkouts/
STRIP_FROM_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprofiler-systems/checkouts/
STRIP_FROM_INC_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprofiler-systems/checkouts/
SHORT_NAMES = NO
JAVADOC_AUTOBRIEF = NO
JAVADOC_BANNER = NO
@@ -114,10 +114,10 @@ WARN_LOGFILE = doc/warnings.log
# Configuration options related to the input files
#---------------------------------------------------------------------------
INPUT = ../../README.md \
../../source/lib/omnitrace-user/omnitrace/types.h \
../../source/lib/omnitrace-user/omnitrace/categories.h \
../../source/lib/omnitrace-user/omnitrace/user.h \
../../source/lib/omnitrace-user/omnitrace/causal.h
../../source/lib/rocprof-sys-user/rocprofiler-systems/types.h \
../../source/lib/rocprof-sys-user/rocprofiler-systems/categories.h \
../../source/lib/rocprof-sys-user/rocprofiler-systems/user.h \
../../source/lib/rocprof-sys-user/rocprofiler-systems/causal.h
INPUT_ENCODING = UTF-8
FILE_PATTERNS = *.h \
*.hh \
@@ -198,9 +198,9 @@ HTML_DYNAMIC_SECTIONS = YES
HTML_INDEX_NUM_ENTRIES = 1000
GENERATE_DOCSET = NO
DOCSET_FEEDNAME = "Doxygen generated docs"
DOCSET_BUNDLE_ID = org.doxygen.omnitrace
DOCSET_PUBLISHER_ID = org.doxygen.amdresearch
DOCSET_PUBLISHER_NAME = "Audacious Software Group"
DOCSET_BUNDLE_ID = org.doxygen.rocprofiler-systems
DOCSET_PUBLISHER_ID = org.doxygen.amd
DOCSET_PUBLISHER_NAME = "Advanced Micro Devices, Inc."
GENERATE_HTMLHELP = NO
CHM_FILE =
HHC_LOCATION =
@@ -217,7 +217,7 @@ QHP_CUST_FILTER_ATTRS =
QHP_SECT_FILTER_ATTRS =
QHG_LOCATION =
GENERATE_ECLIPSEHELP = NO
ECLIPSE_DOC_ID = org.doxygen.omnitrace
ECLIPSE_DOC_ID = org.doxygen.rocprofiler-systems
DISABLE_INDEX = NO
GENERATE_TREEVIEW = NO
ENUM_VALUES_PER_LINE = 1
@@ -311,7 +311,7 @@ ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = NO
SEARCH_INCLUDES = YES
INCLUDE_PATH = ../../source/lib/omnitrace-user
INCLUDE_PATH = ../../source/lib/rocprof-sys-user
INCLUDE_FILE_PATTERNS = *.h \
*.hpp
PREDEFINED = ROCPROFSYS_PUBLIC_API= \