From 5e965ff191ae7c3ed226c0eeff59c55ee1fe5867 Mon Sep 17 00:00:00 2001 From: David Galiffi Date: Sat, 28 Sep 2024 23:54:24 -0400 Subject: [PATCH 01/33] Update branding in docs --- docs/conceptual/data-collection-modes.rst | 81 +- ...re-set.rst => rocprof-sys-feature-set.rst} | 47 +- docs/conf.py | 6 +- docs/doxygen/Doxyfile | 12 +- docs/how-to/configuring-runtime-options.rst | 154 +- .../configuring-validating-environment.rst | 48 +- ...rst => general-tips-using-rocprof-sys.rst} | 34 +- ...rumenting-rewriting-binary-application.rst | 174 +- docs/how-to/performing-causal-profiling.rst | 194 +- docs/how-to/profiling-python-scripts.rst | 94 +- docs/how-to/sampling-call-stack.rst | 334 +- ...t => understanding-rocprof-sys-output.rst} | 74 +- ...race-api.rst => using-rocprof-sys-api.rst} | 111 +- docs/index.rst | 31 +- docs/install/install.rst | 143 +- docs/install/quick-start.rst | 21 +- docs/reference/development-guide.rst | 184 +- ...-glossary.rst => rocprof-sys-glossary.rst} | 78 +- docs/sphinx/_toc.yml.in | 38 +- docs/tutorials/video-tutorials.rst | 6 +- ...-omnitrace.rst => what-is-rocprof-sys.rst} | 14 +- source/docs/.gitignore | 5 - source/docs/.nojekyll | 0 source/docs/Makefile | 20 - source/docs/about.md | 53 - source/docs/causal_profiling.md | 535 --- source/docs/conf.py | 169 - source/docs/critical_trace.md | 10 - source/docs/development.md | 307 -- source/docs/environment.yml | 196 -- source/docs/features.md | 86 - source/docs/generate-doxyfile.cmake | 19 - source/docs/getting_started.md | 189 -- source/docs/images/causal-foobar.png | Bin 27358 -> 0 bytes source/docs/images/fibonacci-instrumented.png | Bin 108994 -> 0 bytes source/docs/images/fibonacci-sampling.png | Bin 417752 -> 0 bytes source/docs/images/rocprof-sys-perfetto.png | Bin 320757 -> 0 bytes source/docs/images/rocprof-sys-rocm-flow.png | Bin 199986 -> 0 bytes source/docs/images/rocprof-sys-rocm.png | Bin 235660 -> 0 bytes source/docs/images/rocprof-sys-user-api.png | Bin 283929 -> 0 bytes 
source/docs/index.md | 24 - source/docs/installation.md | 281 -- source/docs/instrumenting.md | 835 ----- source/docs/make.bat | 35 - source/docs/output.md | 888 ----- source/docs/python.md | 297 -- source/docs/rocprof-sys.dox.in | 2967 ----------------- source/docs/runtime.md | 1309 -------- source/docs/sampling.md | 353 -- source/docs/setup.md | 49 - source/docs/update-docs.sh | 36 - source/docs/update-doxygen.sh | 9 - source/docs/user_api.md | 270 -- source/docs/youtube.md | 23 - 54 files changed, 938 insertions(+), 9905 deletions(-) rename docs/conceptual/{omnitrace-feature-set.rst => rocprof-sys-feature-set.rst} (73%) rename docs/how-to/{general-tips-using-omnitrace.rst => general-tips-using-rocprof-sys.rst} (72%) rename docs/how-to/{understanding-omnitrace-output.rst => understanding-rocprof-sys-output.rst} (95%) rename docs/how-to/{using-omnitrace-api.rst => using-rocprof-sys-api.rst} (78%) rename docs/reference/{omnitrace-glossary.rst => rocprof-sys-glossary.rst} (75%) rename docs/{what-is-omnitrace.rst => what-is-rocprof-sys.rst} (72%) delete mode 100644 source/docs/.gitignore delete mode 100644 source/docs/.nojekyll delete mode 100644 source/docs/Makefile delete mode 100644 source/docs/about.md delete mode 100644 source/docs/causal_profiling.md delete mode 100644 source/docs/conf.py delete mode 100644 source/docs/critical_trace.md delete mode 100644 source/docs/development.md delete mode 100644 source/docs/environment.yml delete mode 100644 source/docs/features.md delete mode 100644 source/docs/generate-doxyfile.cmake delete mode 100644 source/docs/getting_started.md delete mode 100644 source/docs/images/causal-foobar.png delete mode 100644 source/docs/images/fibonacci-instrumented.png delete mode 100644 source/docs/images/fibonacci-sampling.png delete mode 100644 source/docs/images/rocprof-sys-perfetto.png delete mode 100644 source/docs/images/rocprof-sys-rocm-flow.png delete mode 100644 source/docs/images/rocprof-sys-rocm.png delete mode 100644 
source/docs/images/rocprof-sys-user-api.png delete mode 100644 source/docs/index.md delete mode 100644 source/docs/installation.md delete mode 100644 source/docs/instrumenting.md delete mode 100644 source/docs/make.bat delete mode 100644 source/docs/output.md delete mode 100644 source/docs/python.md delete mode 100644 source/docs/rocprof-sys.dox.in delete mode 100644 source/docs/runtime.md delete mode 100644 source/docs/sampling.md delete mode 100644 source/docs/setup.md delete mode 100755 source/docs/update-docs.sh delete mode 100755 source/docs/update-doxygen.sh delete mode 100644 source/docs/user_api.md delete mode 100644 source/docs/youtube.md diff --git a/docs/conceptual/data-collection-modes.rst b/docs/conceptual/data-collection-modes.rst index 3c28b7b3..93fe2d17 100644 --- a/docs/conceptual/data-collection-modes.rst +++ b/docs/conceptual/data-collection-modes.rst @@ -1,17 +1,17 @@ .. meta:: - :description: Omnitrace documentation and reference - :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + :description: ROCm Profiler Systems documentation and reference + :keywords: rocprof-sys, rocprofiler-systems, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD ********************** Data collection modes ********************** -Omnitrace supports several modes of recording trace and profiling data for your application. +ROCm Systems Profiler supports several modes of recording trace and profiling data for your application. .. note:: - - For an explanation of the terms used in this topic, see - the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`. + + For an explanation of the terms used in this topic, see + the :doc:`ROCm Profiler Systems glossary <../reference/rocprof-sys-glossary>`. 
+-----------------------------+---------------------------------------------------------+ | Mode | Description | @@ -23,61 +23,62 @@ Omnitrace supports several modes of recording trace and profiling data for your | | and records various metrics for the given call stack | +-----------------------------+---------------------------------------------------------+ | Callback APIs | Parallelism frameworks such as ROCm, OpenMP, and Kokkos | -| | make callbacks into Omnitrace to provide information | -| | about the work the API is performing | +| | make callbacks into ROCm Systems Profiler to provide | +| | information about the work the API is performing | +-----------------------------+---------------------------------------------------------+ | Dynamic Symbol Interception | Wrap function symbols defined in a position independent | | | dynamic library/executable, like ``pthread_mutex_lock`` | | | in ``libpthread.so`` or ``MPI_Init`` in the MPI library | +-----------------------------+---------------------------------------------------------+ -| User API | User-defined regions and controls for Omnitrace | +| User API | User-defined regions and controls for ROCm Systems | +| | Profiler | +-----------------------------+---------------------------------------------------------+ -The two most generic and important modes are binary instrumentation and statistical sampling. +The two most generic and important modes are binary instrumentation and statistical sampling. It is important to understand their advantages and disadvantages. -Binary instrumentation and statistical sampling can be performed with the ``omnitrace-instrument`` +Binary instrumentation and statistical sampling can be performed with the ``rocprof-sys-instrument`` executable. For statistical sampling, it's highly recommended to use the -``omnitrace-sample`` executable instead if binary instrumentation isn't required or needed. 
+``rocprof-sys-sample`` executable instead if binary instrumentation isn't required or needed. Callback APIs and dynamic symbol interception can be utilized with either tool. Binary instrumentation ----------------------------------- -Binary instrumentation lets you record deterministic measurements for +Binary instrumentation lets you record deterministic measurements for every single invocation of a given function. -Binary instrumentation effectively adds instructions to the target application to -collect the required information. It therefore has the potential to cause performance -changes which might, in some cases, lead to inaccurate results. The effect depends on -the information being collected and which features are activated in Omnitrace. +Binary instrumentation effectively adds instructions to the target application to +collect the required information. It therefore has the potential to cause performance +changes which might, in some cases, lead to inaccurate results. The effect depends on +the information being collected and which features are activated in ROCm Systems Profiler. For example, collecting only the wall-clock timing data -has less of an effect than collecting the wall-clock timing, CPU-clock timing, -memory usage, cache-misses, and number of instructions that were run. Similarly, -collecting a flat profile has less overhead than a hierarchical profile -and collecting a trace OR a profile has less overhead than collecting a +has less of an effect than collecting the wall-clock timing, CPU-clock timing, +memory usage, cache-misses, and number of instructions that were run. Similarly, +collecting a flat profile has less overhead than a hierarchical profile +and collecting a trace OR a profile has less overhead than collecting a trace AND a profile. 
-In Omnitrace, the primary heuristic for controlling the overhead with binary -instrumentation is the minimum number of instructions for selecting functions +In ROCm Systems Profiler, the primary heuristic for controlling the overhead with binary +instrumentation is the minimum number of instructions for selecting functions for instrumentation. Statistical sampling ----------------------------------- -Statistical call-stack sampling periodically interrupts the application at +Statistical call-stack sampling periodically interrupts the application at regular intervals using operating system interrupts. -Sampling is typically less numerically accurate and specific, but the +Sampling is typically less numerically accurate and specific, but the target program runs at nearly full speed. -In contrast to the data derived from binary instrumentation, the resulting +In contrast to the data derived from binary instrumentation, the resulting data is not exact but is instead a statistical approximation. -However, sampling often provides a more accurate picture of the application +However, sampling often provides a more accurate picture of the application execution because it is less intrusive to the target application and has fewer -side effects on memory caches or instruction decoding pipelines. Furthermore, +side effects on memory caches or instruction decoding pipelines. Furthermore, because sampling does not affect the execution speed as much, is it -relatively immune to over-evaluating the cost of small, frequently called +relatively immune to over-evaluating the cost of small, frequently called functions or "tight" loops. -In Omnitrace, the overhead for statistical sampling depends on the -sampling rate and whether the samples are taken with respect to the CPU time +In ROCm Systems Profiler, the overhead for statistical sampling depends on the +sampling rate and whether the samples are taken with respect to the CPU time and/or real time. Binary instrumentation vs. 
statistical sampling example @@ -112,24 +113,24 @@ Consider the following code: return 0; } -Binary instrumentation of the ``fib`` function will record **every single invocation** +Binary instrumentation of the ``fib`` function will record **every single invocation** of the function. For a very small function -such as ``fib``, this results in **significant** overhead since this simple function +such as ``fib``, this results in **significant** overhead since this simple function takes about 20 instructions, whereas the entry and -exit snippets are ~1024 instructions. Therefore, you generally want to avoid +exit snippets are ~1024 instructions. Therefore, you generally want to avoid instrumenting functions where the instrumented function has significantly fewer -instructions than entry and exit instrumentation. (Note that many of the +instructions than entry and exit instrumentation. (Note that many of the instructions in entry and exit functions are either logging functions or -depend on the runtime settings and thus might never run). However, +depend on the runtime settings and thus might never run). However, due to the number of potential instructions in the entry and exit snippets, -the default behavior of ``omnitrace-instrument`` is to only instrument functions +the default behavior of ``rocprof-sys-instrument`` is to only instrument functions which contain at least 1024 instructions. -However, recording every single invocation of the function can be extremely +However, recording every single invocation of the function can be extremely useful for detecting anomalies, such as profiles that show minimum or maximum values much smaller or larger -than the average or a high standard deviation. In this case, the traces help you +than the average or a high standard deviation. In this case, the traces help you identify exactly when and where those instances deviated from the norm. -Compare the level of detail in the following traces. 
In the top image, +Compare the level of detail in the following traces. In the top image, every instance of the ``fib`` function is instrumented, while in the bottom image, the ``fib`` call-stack is derived via sampling. diff --git a/docs/conceptual/omnitrace-feature-set.rst b/docs/conceptual/rocprof-sys-feature-set.rst similarity index 73% rename from docs/conceptual/omnitrace-feature-set.rst rename to docs/conceptual/rocprof-sys-feature-set.rst index 4a8aceaf..39c6c1e7 100644 --- a/docs/conceptual/omnitrace-feature-set.rst +++ b/docs/conceptual/rocprof-sys-feature-set.rst @@ -1,14 +1,14 @@ .. meta:: - :description: Omnitrace documentation and reference - :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + :description: ROCm Systems Profiler documentation and reference + :keywords: rocprof-sys, rocprofiler-systems, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD *************************************** -The Omnitrace feature set and use cases +The ROCm Systems Profiler feature set and use cases *************************************** -`Omnitrace `_ is designed to be highly extensible. -Internally, it leverages the `Timemory performance analysis toolkit `_ -to manage extensions, resources, data, and other items. It supports the following features, +`ROCm Systems Profiler `_ is designed to be highly extensible. +Internally, it leverages the `Timemory performance analysis toolkit `_ +to manage extensions, resources, data, and other items. It supports the following features, modes, metrics, and APIs. Data collection modes @@ -22,11 +22,6 @@ Data collection modes * Statistical sampling: Periodic software interrupts per-thread * Process-level sampling: A background thread records process-, system- and device-level metrics while the application runs * Causal profiling: Quantifies the potential impact of optimizations in parallel code - -.. 
note:: - - Critical trace support was removed in Omnitrace v1.11.0. - It was replaced by the causal profiling feature. Data analysis ======================================== @@ -98,40 +93,40 @@ Third-party API support * NVTX * ROCTX -Omnitrace use cases +ROCm Systems Profiler use cases ======================================== -When analyzing the performance of an application, do NOT +When analyzing the performance of an application, do NOT assume you know where the performance bottlenecks are -and why they are happening. Omnitrace is a tool for analyzing the entire +and why they are happening. ROCm Systems Profiler is a tool for analyzing the entire application and its performance. It is -ideal for characterizing where optimization would have the greatest impact +ideal for characterizing where optimization would have the greatest impact on an end-to-end run of the application and for viewing what else is happening on the system during a performance bottleneck. -When GPUs are involved, there is a tendency to assume that +When GPUs are involved, there is a tendency to assume that the quickest path to performance improvement is minimizing -the runtime of the GPU kernels. This is a highly flawed assumption. +the runtime of the GPU kernels. This is a highly flawed assumption. If you optimize the runtime of a kernel from one millisecond -to 1 microsecond (1000x speed-up) but the original application never +to 1 microsecond (1000x speed-up) but the original application never spent time waiting for kernels to complete, -there would be no statistically significant reduction in the end-to-end +there would be no statistically significant reduction in the end-to-end runtime of your application. In other words, it does not matter -how fast or slow the code on GPU is if the application has a +how fast or slow the code on GPU is if the application has a bottleneck on waiting on the GPU. -Use Omnitrace to obtain a high-level view of the entire application. 
Use it +Use ROCm Systems Profiler to obtain a high-level view of the entire application. Use it to determine where the performance bottlenecks are and obtain clues to why these bottlenecks are happening. Rather than worrying about kernel -performance, start your investigation with Omnitrace, which characterizes the +performance, start your investigation with ROCm Systems Profiler, which characterizes the broad picture. .. note:: - For insight into the execution of individual kernels on the GPU, - use `Omniperf `_. + For insight into the execution of individual kernels on the GPU, + use `ROCm Compute Profiler `_. -In terms of CPU analysis, Omnitrace does not target any specific vendor. +In terms of CPU analysis, ROCm Systems Profiler does not target any specific vendor. It works just as well on AMD and non-AMD CPUs. -With regard to the GPU, Omnitrace is currently restricted to HIP and HSA APIs +With regard to the GPU, ROCm Systems Profiler is currently restricted to HIP and HSA APIs and kernels running on AMD GPUs. \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index 718797ac..d9730f8f 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -36,14 +36,14 @@ raise ValueError("VERSION not found!") version_number = match[1] -external_projects_current_project = "omnitrace" +external_projects_current_project = "rocprofiler-systems" -project = "omnitrace" +project = "rocprofiler-systems" author = "Advanced Micro Devices, Inc." copyright = "Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved." 
version = version_number release = version_number -html_title = f"Omnitrace {version} documentation" +html_title = f"ROCm Systems Profiler {version} documentation" external_toc_path = "./sphinx/_toc.yml" diff --git a/docs/doxygen/Doxyfile b/docs/doxygen/Doxyfile index 19ede70a..92177189 100644 --- a/docs/doxygen/Doxyfile +++ b/docs/doxygen/Doxyfile @@ -4,7 +4,7 @@ # Project related configuration options #--------------------------------------------------------------------------- DOXYFILE_ENCODING = UTF-8 -PROJECT_NAME = omnitrace +PROJECT_NAME = rocprofiler-systems PROJECT_NUMBER = 1.11.3 PROJECT_BRIEF = "High-level and comprehensive application tracing and profiling on both the CPU and GPU" PROJECT_LOGO = @@ -19,8 +19,8 @@ ABBREVIATE_BRIEF = ALWAYS_DETAILED_SEC = YES INLINE_INHERITED_MEMB = YES FULL_PATH_NAMES = YES -STRIP_FROM_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-omnitrace/checkouts/ -STRIP_FROM_INC_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-omnitrace/checkouts/ +STRIP_FROM_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprof-sys/checkouts/ +STRIP_FROM_INC_PATH = /home/docs/checkouts/readthedocs.org/user_builds/advanced-micro-devices-rocprof-sys/checkouts/ SHORT_NAMES = NO JAVADOC_AUTOBRIEF = NO JAVADOC_BANNER = NO @@ -198,8 +198,8 @@ HTML_DYNAMIC_SECTIONS = YES HTML_INDEX_NUM_ENTRIES = 1000 GENERATE_DOCSET = NO DOCSET_FEEDNAME = "Doxygen generated docs" -DOCSET_BUNDLE_ID = org.doxygen.omnitrace -DOCSET_PUBLISHER_ID = org.doxygen.amdresearch +DOCSET_BUNDLE_ID = org.doxygen.rocprof-sys +DOCSET_PUBLISHER_ID = org.doxygen.rocm DOCSET_PUBLISHER_NAME = "Audacious Software Group" GENERATE_HTMLHELP = NO CHM_FILE = @@ -217,7 +217,7 @@ QHP_CUST_FILTER_ATTRS = QHP_SECT_FILTER_ATTRS = QHG_LOCATION = GENERATE_ECLIPSEHELP = NO -ECLIPSE_DOC_ID = org.doxygen.omnitrace +ECLIPSE_DOC_ID = org.doxygen.rocprof-sys DISABLE_INDEX = NO GENERATE_TREEVIEW = NO 
ENUM_VALUES_PER_LINE = 1 diff --git a/docs/how-to/configuring-runtime-options.rst b/docs/how-to/configuring-runtime-options.rst index 16767087..7fc4646e 100644 --- a/docs/how-to/configuring-runtime-options.rst +++ b/docs/how-to/configuring-runtime-options.rst @@ -1,31 +1,33 @@ .. meta:: - :description: Omnitrace documentation and reference - :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + :description: ROCm Systems Profiler documentation and reference + :keywords: rocprof-sys, rocprofiler-systems, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD **************************************************** Configuring runtime options **************************************************** -The ``omnitrace.cfg`` file maintains a list of the `Omnitrace `_ runtime options. To create this configuration -file and view the current runtime options, use the ``omnitrace-avail`` executable. +The ``rocprof-sys.cfg`` file maintains a list of the +`ROCm Systems Profiler `_ runtime +options. To create this configuration +file and view the current runtime options, use the ``rocprof-sys-avail`` executable. -The omnitrace-avail executable +The rocprof-sys-avail executable ======================================== -The ``omnitrace-avail`` executable provides information about the runtime settings, +The ``rocprof-sys-avail`` executable provides information about the runtime settings, data collection capabilities, and, when built with PAPI support, the available hardware counters. The executable is effectively -self-updating. As new capabilities and settings are added to the Omnitrace source code, they are -propagated to ``omnitrace-avail``. ``omnitrace-avail`` should be viewed as the ultimate authority +self-updating. As new capabilities and settings are added to the ROCm Systems Profiler source code, they are +propagated to ``rocprof-sys-avail``. 
``rocprof-sys-avail`` should be viewed as the ultimate authority in the event of any conflicts with this documentation. -It is recommended that you create a default configuration file in -``${HOME}/.omnitrace.cfg``. This can be done by -running the command ``omnitrace-avail -G ~/.omnitrace.cfg``. Alternatively, -use the ``omnitrace-avail -G ~/.omnitrace.cfg --all`` option +It is recommended that you create a default configuration file in +``${HOME}/.rocprof-sys.cfg``. This can be done by +running the command ``rocprof-sys-avail -G ~/.rocprof-sys.cfg``. Alternatively, +use the ``rocprof-sys-avail -G ~/.rocprof-sys.cfg --all`` option for a verbose configuration file with descriptions, categories, and additional information. -Modify ``${HOME}/.omnitrace.cfg`` as required. For example, enable `Perfetto `_, +Modify ``${HOME}/.rocprof-sys.cfg`` as required. For example, enable `Perfetto `_, `Timemory `_, sampling, and process-level sampling by default and tweak the default sampling values. @@ -44,34 +46,34 @@ and tweak the default sampling values. Exploring runtime settings ----------------------------------- -Use the following command to view the list of the available runtime settings, their current values, and descriptions +Use the following command to view the list of the available runtime settings, their current values, and descriptions for each setting: .. code-block:: shell - omnitrace-avail --description + rocprof-sys-avail --description .. note:: Use ``--brief`` to suppress printing the current value and/or ``-c 0`` to suppress truncation of the descriptions. -Any Boolean setting (``omnitrace-avail --settings --value --brief --filter bool``) -accepts a case insensitive match for nearly all common Boolean logic expressions: +Any Boolean setting (``rocprof-sys-avail --settings --value --brief --filter bool``) +accepts a case insensitive match for nearly all common Boolean logic expressions: ``ON``, ``OFF``, ``YES``, ``NO``, ``TRUE``, ``FALSE``, ``0``, ``1``, etc. 
Exploring components ----------------------------------- -Omnitrace uses `Timemory `_ extensively to provide +ROCm Systems Profiler uses `Timemory `_ extensively to provide various capabilities and manage -data and resources. By default, with ``OMNITRACE_PROFILE=ON``, Omnitrace only collects wall-clock -timing values. However, by modifying the ``OMNITRACE_TIMEMORY_COMPONENTS`` setting, -Omnitrace can be configured to +data and resources. By default, with ``OMNITRACE_PROFILE=ON``, ROCm Systems Profiler only collects wall-clock +timing values. However, by modifying the ``OMNITRACE_TIMEMORY_COMPONENTS`` setting, +ROCm Systems Profiler can be configured to collect hardware counters, CPU-clock timers, memory usage, context switches, page faults, network statistics, -and much more. Omnitrace can even be used as a dynamic instrumentation vehicle +and much more. ROCm Systems Profiler can even be used as a dynamic instrumentation vehicle for other third-party profiling APIs such as `Caliper `_ and `LIKWID `_. -To leverage this capability, build Omnitrace from source with the CMake +To leverage this capability, build ROCm Systems Profiler from source with the CMake options ``TIMEMORY_USE_CALIPER=ON`` or ``TIMEMORY_USE_LIKWID=ON`` and then add ``caliper_marker``, ``likwid_marker``, or both to ``OMNITRACE_TIMEMORY_COMPONENTS``. @@ -79,30 +81,30 @@ To view all possible components and their descriptions: .. code-block:: shell - omnitrace-avail --components --description + rocprof-sys-avail --components --description To restrict the output to available components and view the string identifiers for ``OMNITRACE_TIMEMORY_COMPONENTS``: .. code-block:: shell - omnitrace-avail --components --available --string --brief + rocprof-sys-avail --components --available --string --brief Exploring hardware counters ----------------------------------- -Omnitrace supports hardware counter collection via PAPI and ROCm. +ROCm Systems Profiler supports hardware counter collection via PAPI and ROCm. 
Generally, PAPI is used to collect CPU-based hardware counters and ROCm is used to collect GPU-based hardware -counters. Although it is possible to install PAPI with ROCm support and use it to -collect GPU-based hardware counters, this is not recommended because PAPI +counters. Although it is possible to install PAPI with ROCm support and use it to +collect GPU-based hardware counters, this is not recommended because PAPI cannot simultaneously collect CPU and GPU hardware counters. To view all possible hardware counters and their descriptions, use the following command: .. code-block:: shell - omnitrace-avail --hw-counters --description + rocprof-sys-avail --hw-counters --description -Appending the ``-c CPU`` option restricts the list of hardware counters to +Appending the ``-c CPU`` option restricts the list of hardware counters to those available through PAPI, while ``-c GPU`` limits the list to those available from ROCm. Enabling hardware counters @@ -123,7 +125,7 @@ Here is a sample configuration for hardware counters: # using perf identifiers OMNITRACE_PAPI_EVENTS = perf::INSTRUCTIONS perf::CACHE-REFERENCES perf::CACHE-MISSES -.. _omnitrace_papi_events: +.. _rocprof-sys_papi_events: OMNITRACE_PAPI_EVENTS ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -135,18 +137,18 @@ has a value <= 2. If you have ``sudo`` access, use the following command to modi echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid -However this value is not retained upon reboot. +However this value is not retained upon reboot. Use the following command to preserve this setting after a reboot: .. code-block:: shell echo 'kernel.perf_event_paranoid=0' | sudo tee -a /etc/sysctl.conf -PAPI events use a concept similar to a namespace. All specified hardware +PAPI events use a concept similar to a namespace. All specified hardware counters must be from the same namespace. 
-For hardware counters starting with the ``PAPI_`` prefix, these are high-level +For hardware counters starting with the ``PAPI_`` prefix, these are high-level aggregates of multiple hardware counters. -Otherwise, most events use two or three colons (``::`` or ``:::``) between the +Otherwise, most events use two or three colons (``::`` or ``:::``) between the component name and the counter name, for example, ``amd64_rapl::RAPL_ENERGY_PKG`` and ``perf::PERF_COUNT_HW_CPU_CYCLES``. @@ -165,22 +167,22 @@ PAPI components from different namespaces: .. note:: - If Omnitrace was configured with the default ``OMNITRACE_BUILD_PAPI=ON`` setting, + If ROCm Systems Profiler was configured with the default ``OMNITRACE_BUILD_PAPI=ON`` setting, standard PAPI command-line tools such as - ``papi_avail`` and ``papi_event_chooser`` are not able to provide information - about the PAPI library used by Omnitrace - (because Omnitrace statically links to ``libpapi``). However, all of these tools are - installed with the prefix ``omnitrace-`` with - underscores replaced with hypens, for example ``papi_avail`` becomes ``omnitrace-papi-avail``. + ``papi_avail`` and ``papi_event_chooser`` are not able to provide information + about the PAPI library used by ROCm Systems Profiler + (because ROCm Systems Profiler statically links to ``libpapi``). However, all of these tools are + installed with the prefix ``rocprof-sys-`` with + underscores replaced with hypens, for example ``papi_avail`` becomes ``rocprof-sys-papi-avail``. OMNITRACE_ROCM_EVENTS ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Omnitrace reads the ROCm events from the ``${ROCM_PATH}/lib/rocprofiler/metrics.xml`` +ROCm Systems Profiler reads the ROCm events from the ``${ROCM_PATH}/lib/rocprofiler/metrics.xml`` file. 
Use the ``ROCP_METRICS`` environment -variable to point Omnitrace to a different XML metrics file, for example, +variable to point ROCm Systems Profiler to a different XML metrics file, for example, ``export ROCP_METRICS=${PWD}/custom_metrics.xml``. -``omnitrace-avail -H -c GPU`` shows event names with a suffix of ``:device=N`` +``rocprof-sys-avail -H -c GPU`` shows event names with a suffix of ``:device=N`` where ``N`` is the device number. For example, if you have two devices, the output is: @@ -190,7 +192,7 @@ For example, if you have two devices, the output is: ... | Wavefronts:device=1 | Derived counter: SQ_WAVES | -To collect the event on all devices, specify the event, +To collect the event on all devices, specify the event, such as ``Wavefronts``, without the ``:device=`` suffix. To collect the event only on specific devices, use the ``:device=`` suffix. @@ -204,10 +206,10 @@ The following example: OMNITRACE_ROCM_EVENTS = GPUBusy SQ_WAVES:device=0 SQ_INSTS_VALU:device=1 -omnitrace-avail examples +rocprof-sys-avail examples ----------------------------------- -The following examples demonstrate how to use ``omnitrace-avail`` to perform several common +The following examples demonstrate how to use ``rocprof-sys-avail`` to perform several common configuration tasks. Generating a default configuration file @@ -215,10 +217,10 @@ Generating a default configuration file .. code-block:: shell - $ omnitrace-avail -G ~/.omnitrace.cfg - [omnitrace-avail] Outputting text configuration file '/home/user/.omnitrace.cfg'... - $ cat ~/.omnitrace.cfg - # auto-generated by omnitrace-avail (version 1.2.0) on 2022-06-27 @ 19:15 + $ rocprof-sys-avail -G ~/.rocprof-sys.cfg + [rocprof-sys-avail] Outputting text configuration file '/home/user/.rocprof-sys.cfg'... 
+ $ cat ~/.rocprof-sys.cfg + # auto-generated by rocprof-sys-avail (version 1.2.0) on 2022-06-27 @ 19:15 OMNITRACE_CONFIG_FILE = OMNITRACE_MODE = trace @@ -231,7 +233,7 @@ Generating a default configuration file OMNITRACE_USE_KOKKOSP = false OMNITRACE_USE_CODE_COVERAGE = false OMNITRACE_USE_PID = true - OMNITRACE_OUTPUT_PATH = omnitrace-%tag%-output + OMNITRACE_OUTPUT_PATH = rocprof-sys-%tag%-output OMNITRACE_OUTPUT_PREFIX = OMNITRACE_CI = false OMNITRACE_THREAD_POOL_SIZE = 8 @@ -301,10 +303,10 @@ Generating a default configuration file When creating a new configuration file, the following recommendations apply: * Use the ``--all`` option to view all descriptions, choices, and other information in the configuration file. -* To create a new configuration without inheriting from an existing ``${HOME}/.omnitrace.cfg`` file, +* To create a new configuration without inheriting from an existing ``${HOME}/.rocprof-sys.cfg`` file, set ``OMNITRACE_SUPPRESS_CONFIG=ON`` in the environment beforehand. * To create a new configuration that makes minor changes to an existing configuration, - set ``OMNITRACE_CONFIG_FILE=/path/to/existing/file`` and define the changes as environment + set ``OMNITRACE_CONFIG_FILE=/path/to/existing/file`` and define the changes as environment variables before generating it. Viewing the setting descriptions @@ -312,7 +314,7 @@ Viewing the setting descriptions .. code-block:: shell - $ omnitrace-avail -S -bd + $ rocprof-sys-avail -S -bd |-----------------------------------------|-----------------------------------------| | ENVIRONMENT VARIABLE | DESCRIPTION | |-----------------------------------------|-----------------------------------------| @@ -320,13 +322,13 @@ Viewing the setting descriptions | OMNITRACE_ADD_SECONDARY | Enable/disable components adding sec... | | OMNITRACE_COLLAPSE_PROCESSES | Enable/disable combining process-spe... | | OMNITRACE_COLLAPSE_THREADS | Enable/disable combining thread-spec... 
| - | OMNITRACE_CONFIG_FILE | Configuration file for omnitrace | + | OMNITRACE_CONFIG_FILE | Configuration file for rocprof-sys | | OMNITRACE_COUT_OUTPUT | Write output to stdout | | OMNITRACE_CPU_AFFINITY | Enable pinning threads to CPUs (Linu... | | OMNITRACE_THREAD_POOL_SIZE | Number of threads to use when genera... | | OMNITRACE_DEBUG | Enable debug output | | OMNITRACE_DIFF_OUTPUT | Generate a difference output vs. a p... | - | OMNITRACE_DL_VERBOSE | Verbosity within the omnitrace-dl li... | + | OMNITRACE_DL_VERBOSE | Verbosity within the rocprof-sys-dl ... | | OMNITRACE_ENABLED | Activation state of timemory | | OMNITRACE_ENABLE_SIGNAL_HANDLER | Enable signals in timemory_init | | OMNITRACE_FILE_OUTPUT | Write output to files | @@ -402,7 +404,7 @@ Viewing components .. code-block:: shell - $ omnitrace-avail -C -bd + $ rocprof-sys-avail -C -bd |-----------------------------------|----------------------------------------------| | COMPONENT | DESCRIPTION | |-----------------------------------|----------------------------------------------| @@ -460,7 +462,7 @@ Viewing components | wall_clock | Real-clock timer (i.e. wall-clock timer). | | written_bytes | Number of bytes sent to the storage layer. | | written_char | Number of bytes which this task has cause... | - | omnitrace | Invokes instrumentation functions omnitr... | + | rocprof-sys | Invokes instrumentation functions omnitr... | | roctracer | High-precision ROCm API and kernel tracing. | | sampling_wall_clock | Wall-clock timing. Derived from statistic... | | sampling_cpu_clock | CPU-clock timing. Derived from statistica... | @@ -476,7 +478,7 @@ Viewing hardware counters .. 
code-block:: shell - $ omnitrace-avail -H -bd + $ rocprof-sys-avail -H -bd |---------------------------------------|---------------------------------------| | HARDWARE COUNTER | DESCRIPTION | |---------------------------------------|---------------------------------------| @@ -1197,17 +1199,17 @@ Viewing hardware counters Creating a configuration file ======================================== -Omnitrace supports three configuration file formats: JSON, XML, and plain text. -Use ``omnitrace-avail -G -F txt json xml`` to generate default +ROCm Systems Profiler supports three configuration file formats: JSON, XML, and plain text. +Use ``rocprof-sys-avail -G -F txt json xml`` to generate default configuration files in each format. Optionally include the ``--all`` flag to include full descriptions and other information. Configuration files are specified by the ``OMNITRACE_CONFIG_FILE`` environment variable -which by default looks for ``${HOME}/.omnitrace.cfg`` and ``${HOME}/.omnitrace.json``. +which by default looks for ``${HOME}/.rocprof-sys.cfg`` and ``${HOME}/.rocprof-sys.json``. Multiple configuration files can be concatenated using the ``:`` symbol, for example: .. code-block:: shell - export OMNITRACE_CONFIG_FILE=~/.config/omnitrace.cfg:~/.config/omnitrace.json + export OMNITRACE_CONFIG_FILE=~/.config/rocprof-sys.cfg:~/.config/rocprof-sys.json If a configuration variable is specified in both a configuration file and in the environment, the environment variable takes precedence. @@ -1220,7 +1222,7 @@ Variables are created when an lvalue starts with a ``$`` and are de-referenced when they appear as rvalues. Entries in the text configuration file which do not match a known setting -in ``omnitrace-avail`` but are prefixed with ``OMNITRACE_`` are interpreted as +in ``rocprof-sys-avail`` but are prefixed with ``OMNITRACE_`` are interpreted as environment variables. They are exported via ``setenv`` but do not override an existing value for the environment variable. 
@@ -1241,7 +1243,7 @@ but do not override an existing value for the environment variable. OMNITRACE_VERBOSE = 1 # output fields - OMNITRACE_OUTPUT_PATH = omnitrace-output + OMNITRACE_OUTPUT_PATH = rocprof-sys-output OMNITRACE_OUTPUT_PREFIX = %tag%/ OMNITRACE_TIME_OUTPUT = OFF OMNITRACE_USE_PID = OFF @@ -1269,7 +1271,7 @@ The full JSON specification for a configuration value contains a lot of informat .. code-block:: json { - "omnitrace": { + "rocprof-sys": { "settings": { "OMNITRACE_ADD_SECONDARY": { "count": -1, @@ -1279,7 +1281,7 @@ The full JSON specification for a configuration value contains a lot of informat "value": true, "max_count": 1, "cmdline": [ - "--omnitrace-add-secondary" + "--rocprof-sys-add-secondary" ], "environ": "OMNITRACE_ADD_SECONDARY", "cereal_class_version": 1, @@ -1294,13 +1296,13 @@ The full JSON specification for a configuration value contains a lot of informat } } -However when writing an JSON configuration file, the following example is minimally acceptable +However, when writing a JSON configuration file, the following example is minimally acceptable for ``OMNITRACE_ADD_SECONDARY``: .. code-block:: json { - "omnitrace": { + "rocprof-sys": { "settings": { "OMNITRACE_ADD_SECONDARY": { "value": true @@ -1318,7 +1320,7 @@ The full XML specification for a configuration value contains the same informati - + 2 @@ -1330,7 +1332,7 @@ The full XML specification for a configuration value contains the same informati -1 1 - --omnitrace-add-secondary + --rocprof-sys-add-secondary component @@ -1343,21 +1345,21 @@ The full XML specification for a configuration value contains the same informati - + -However, when writing an XML configuration file, it is minimally acceptable +However, when writing an XML configuration file, it is minimally acceptable to set ``OMNITRACE_ADD_SECONDARY=false``: .. 
code-block:: xml - + false - + diff --git a/docs/how-to/configuring-validating-environment.rst b/docs/how-to/configuring-validating-environment.rst index 80097634..bb14d024 100644 --- a/docs/how-to/configuring-validating-environment.rst +++ b/docs/how-to/configuring-validating-environment.rst @@ -1,47 +1,47 @@ .. meta:: - :description: Omnitrace documentation and reference - :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + :description: ROCm Systems Profiler documentation and reference + :keywords: rocprof-sys, rocprofiler-systems, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD **************************************************** Configuring and validating the environment **************************************************** -After installing `Omnitrace `_, additional steps are required to set up +After installing `ROCm Systems Profiler `_, additional steps are required to set up and validate the environment. .. note:: - The following instructions use the installation path ``/opt/omnitrace``. If - Omnitrace is installed elsewhere, substitute the actual installation path. + The following instructions use the installation path ``/opt/rocprof-sys``. If + ROCm Systems Profiler is installed elsewhere, substitute the actual installation path. Configuring the environment ======================================== -After Omnitrace is installed, source the ``setup-env.sh`` script to prefix the +After ROCm Systems Profiler is installed, source the ``setup-env.sh`` script to prefix the ``PATH``, ``LD_LIBRARY_PATH``, and other environment variables: .. code-block:: shell - source /opt/omnitrace/share/omnitrace/setup-env.sh + source /opt/rocprof-sys/share/rocprof-sys/setup-env.sh Alternatively, if environment modules are supported, add the ``/share/modulefiles`` directory to ``MODULEPATH``: .. 
code-block:: shell - module use /opt/omnitrace/share/modulefiles + module use /opt/rocprof-sys/share/modulefiles .. note:: - + As an alternative, the above line can be added to the ``${HOME}/.modulerc`` file. -After Omnitrace has been added to the ``MODULEPATH``, it can be loaded -using ``module load omnitrace/`` and unloaded using ``module unload omnitrace/``. +After ROCm Systems Profiler has been added to the ``MODULEPATH``, it can be loaded +using ``module load rocprof-sys/`` and unloaded using ``module unload rocprof-sys/``. .. code-block:: shell - module load omnitrace/1.0.0 - module unload omnitrace/1.0.0 + module load rocprof-sys/1.0.0 + module unload rocprof-sys/1.0.0 .. note:: @@ -51,21 +51,21 @@ using ``module load omnitrace/`` and unloaded using ``module unload omn Validating the environment configuration ======================================== -If the following commands all run successfully with the expected output, -then you are ready to use Omnitrace: +If the following commands all run successfully with the expected output, +then you are ready to use ROCm Systems Profiler: .. code-block:: shell - which omnitrace - which omnitrace-avail - which omnitrace-sample - omnitrace-instrument --help - omnitrace-avail --all - omnitrace-sample --help + which rocprof-sys + which rocprof-sys-avail + which rocprof-sys-sample + rocprof-sys-instrument --help + rocprof-sys-avail --all + rocprof-sys-sample --help -If Omnitrace was built with Python support, validate these additional commands: +If ROCm Systems Profiler was built with Python support, validate these additional commands: .. 
code-block:: shell - which omnitrace-python - omnitrace-python --help + which rocprof-sys-python + rocprof-sys-python --help diff --git a/docs/how-to/general-tips-using-omnitrace.rst b/docs/how-to/general-tips-using-rocprof-sys.rst similarity index 72% rename from docs/how-to/general-tips-using-omnitrace.rst rename to docs/how-to/general-tips-using-rocprof-sys.rst index da4c5be0..7c8c7340 100644 --- a/docs/how-to/general-tips-using-omnitrace.rst +++ b/docs/how-to/general-tips-using-rocprof-sys.rst @@ -1,19 +1,19 @@ .. meta:: - :description: Omnitrace documentation and reference - :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + :description: ROCm Systems Profiler documentation and reference + :keywords: rocprof-sys, rocprofiler-systems, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD ********************************** -General tips for using Omnitrace +General tips for using ROCm Systems Profiler ********************************** -Follow these general guidelines when using Omnitrace. For an explanation of the terms used in this topic, see -the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`. +Follow these general guidelines when using ROCm Systems Profiler. For an explanation of the terms used in this topic, see +the :doc:`ROCm Systems Profiler glossary <../reference/rocprof-sys-glossary>`. 
-* Use ``omnitrace-avail`` to look up configuration settings, hardware counters, and data collection components +* Use ``rocprof-sys-avail`` to look up configuration settings, hardware counters, and data collection components * Use the ``-d`` flag for descriptions -* Generate a default configuration with ``omnitrace-avail -G ${HOME}/.omnitrace.cfg`` and adjust it +* Generate a default configuration with ``rocprof-sys-avail -G ${HOME}/.rocprof-sys.cfg`` and adjust it to the desired default behavior * **Decide whether binary instrumentation, statistical sampling, or both** provides the desired performance data (for non-Python applications) * Compile code with optimization enabled (``-O2`` or higher), disable asserts (i.e. ``-DNDEBUG``), and include debug info (for instance, ``-g1`` at a minimum) @@ -24,26 +24,26 @@ the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`. * **Use binary instrumentation for characterizing the performance of every invocation of specific functions** * **Use statistical sampling to characterize the performance of the entire application while minimizing overhead** * Enable statistical sampling after binary instrumentation to help "fill in the gaps" between instrumented regions -* Use the user API to create custom regions and enable/disable Omnitrace for specific processes, threads, and regions +* Use the user API to create custom regions and enable/disable ROCm Systems Profiler for specific processes, threads, and regions * Dynamic symbol interception, callback APIs, and the user API are always available with binary instrumentation and sampling - * Dynamic symbol interception and callback APIs are (generally) controlled through ``OMNITRACE_USE_`` - options, for example, ``OMNITRACE_USE_KOKKOSP`` and ``OMNITRACE_USE_OMPT`` enable Kokkos-Tools and OpenMP-Tools + * Dynamic symbol interception and callback APIs are (generally) controlled through ``OMNITRACE_USE_`` + options, for example, ``OMNITRACE_USE_KOKKOSP`` and 
``OMNITRACE_USE_OMPT`` enable Kokkos-Tools and OpenMP-Tools callbacks, respectively * When generically seeking regions for performance improvement: * **Start off by collecting a flat profile** * Look for functions with high call counts, large cumulative runtimes/values, or large standard deviations - + * When call counts are high, improving the performance of this function or "inlining" the function can result in quick and easy performance improvements - * When the standard deviation is high, collect a hierarchical profile and see if the high variation can be attributable to the calling context. + * When the standard deviation is high, collect a hierarchical profile and see if the high variation can be attributable to the calling context. In this scenario, consider creating a specialized version of the function for the longer-running contexts - * **Collect a hierarchical profile** and verify the functions that are part of the "critical path" of your + * **Collect a hierarchical profile** and verify the functions that are part of the "critical path" of your application, as indicated in the flat profile - * For example, functions with high call counts but which are part of a "setup" or "post-processing" + * For example, functions with high call counts but which are part of a "setup" or "post-processing" phase that does not consume much time relative to the overall time are generally a lower priority for optimization * **Use the information from the profiles when analyzing detailed traces** @@ -54,7 +54,7 @@ the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`. 
* When using binary instrumentation with MPI, avoid runtime instrumentation * Runtime instrumentation requires a fork and a ``ptrace``, which is generally incompatible with how MPI applications spawn processes - * Perform a binary rewrite of the executable (and optionally, libraries used by the executable) using MPI and run - the generated instrumented executable using ``omnitrace-run`` instead of the original. - For example, instead of ``mpirun -n 2 ./myexe``, use ``mpirun -n 2 omnitrace-run -- ./myexe.inst``, where + * Perform a binary rewrite of the executable (and optionally, libraries used by the executable) using MPI and run + the generated instrumented executable using ``rocprof-sys-run`` instead of the original. + For example, instead of ``mpirun -n 2 ./myexe``, use ``mpirun -n 2 rocprof-sys-run -- ./myexe.inst``, where ``myexe.inst`` is the instrumented ``myexe`` executable that was generated. diff --git a/docs/how-to/instrumenting-rewriting-binary-application.rst b/docs/how-to/instrumenting-rewriting-binary-application.rst index c3c3083c..739d5b64 100644 --- a/docs/how-to/instrumenting-rewriting-binary-application.rst +++ b/docs/how-to/instrumenting-rewriting-binary-application.rst @@ -1,12 +1,12 @@ .. 
meta:: - :description: Omnitrace documentation and reference - :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + :description: ROCm Systems Profiler documentation and reference + :keywords: rocprof-sys, rocprofiler-systems, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD **************************************************** Instrumenting and rewriting a binary application **************************************************** -There are three ways to perform instrumentation with the ``omnitrace-instrument`` executable: +There are three ways to perform instrumentation with the ``rocprof-sys-instrument`` executable: * Runtime instrumentation * Attaching to an already running process @@ -14,11 +14,11 @@ There are three ways to perform instrumentation with the ``omnitrace-instrument` Here is a comparison of the three modes: -* Runtime instrumentation of the application using the ``omnitrace-instrument`` executable +* Runtime instrumentation of the application using the ``rocprof-sys-instrument`` executable (analogous to ``gdb --args ``) * This mode is the default if neither the ``-p`` nor ``-o`` command-line options are used - * Runtime instrumentation supports instrumenting not only the target executable but also + * Runtime instrumentation supports instrumenting not only the target executable but also the shared libraries loaded by the target executable. Consequently, this mode consumes more memory, takes longer to perform the instrumentation, and tends to add more significant overhead to the runtime of the application. 
@@ -26,7 +26,7 @@ Here is a comparison of the three modes: libraries but also the performance of the library dependencies * Attaching to a process that is currently running (analogous to ``gdb -p ``) - + * This mode is activated using ``-p `` * The same caveats from the first example apply with respect to memory and overhead @@ -39,25 +39,25 @@ Here is a comparison of the three modes: * This mode is activated through the ``-o `` option * Binary rewriting is limited to the text section of the target executable or library. It does not instrument - the dynamically-linked libraries. Consequently, this mode performs the + the dynamically-linked libraries. Consequently, this mode performs the instrumentation significantly faster and has a much lower overhead when running the instrumented executable and libraries. - * Binary rewriting is the recommended mode when the target executable uses + * Binary rewriting is the recommended mode when the target executable uses process-level parallelism (for example, MPI) - * If the target executable has a minimal ``main`` routine and the bulk of your + * If the target executable has a minimal ``main`` routine and the bulk of your application is in one specific dynamic library, see :ref:`binary-rewriting-library-label` for help -The omnitrace-instrument executable +The rocprof-sys-instrument executable ======================================== -Instrumentation is performed with the ``omnitrace-instrument`` executable. For more details, use the ``-h`` or ``--help`` option to +Instrumentation is performed with the ``rocprof-sys-instrument`` executable. For more details, use the ``-h`` or ``--help`` option to view the help menu. .. 
code-block:: shell - $ omnitrace-instrument --help - [omnitrace-instrument] Usage: omnitrace-instrument [ --help (count: 0, dtype: bool) + $ rocprof-sys-instrument --help + [rocprof-sys-instrument] Usage: rocprof-sys-instrument [ --help (count: 0, dtype: bool) --version (count: 0, dtype: bool) --verbose (max: 1, dtype: bool) --error (max: 1, dtype: boolean) @@ -161,8 +161,8 @@ view the help menu. [MODE OPTIONS] -o, --output Enable generation of a new executable (binary-rewrite). If a filename is not provided, - omnitrace will use the basename and output to the cwd, unless the target binary is in the - cwd. In the latter case, omnitrace will either use ${PWD}/.inst (non-libraries) + rocprof-sys will use the basename and output to the cwd, unless the target binary is in the + cwd. In the latter case, rocprof-sys will either use ${PWD}/.inst (non-libraries) or ${PWD}/instrumented/ (libraries) -p, --pid Connect to running process -M, --mode [ coverage | sampling | trace ] @@ -177,7 +177,7 @@ view the help menu. [LIBRARY OPTIONS] --prefer [ shared | static ] Prefer this library types when available - -L, --library Libraries with instrumentation routines (default: "libomnitrace-dl") + -L, --library Libraries with instrumentation routines (default: "librocprof-sys-dl") -m, --main-function The primary function to instrument around, e.g. \'main\' --load Supplemental instrumentation library names w/o extension (e.g. \'libinstr\' for \'libinstr.so\' or \'libinstr.a\') @@ -200,16 +200,16 @@ view the help menu. -ME, --module-exclude Regex(es) for excluding modules/files/libraries (always applied) -MR, --module-restrict Regex(es) for restricting modules/files/libraries only to those that match the provided regular-expressions - --internal-function-include Regex(es) for including functions which are (likely) utilized by omnitrace itself. Use + --internal-function-include Regex(es) for including functions which are (likely) utilized by rocprof-sys itself. 
Use this option with care. - --internal-module-include Regex(es) for including modules/libraries which are (likely) utilized by omnitrace + --internal-module-include Regex(es) for including modules/libraries which are (likely) utilized by rocprof-sys itself. Use this option with care. --instruction-exclude Regex(es) for excluding functions containing certain instructions --internal-library-deps Treat the libraries linked to the internal libraries as internal libraries. This increase the internal library processing time and consume more memory (so use with care) but may be useful when the application uses Boost libraries and Dyninst is dynamically linked against the same boost libraries - --internal-library-append Append to the list of libraries which omnitrace treats as being used internally, e.g. + --internal-library-append Append to the list of libraries which rocprof-sys treats as being used internally, e.g. OmniTrace will find all the symbols in this library and prevent them from being instrumented. --internal-library-remove [ ld-linux-x86-64.so.2 @@ -287,11 +287,11 @@ view the help menu. options to gain more information about the function signature or location of the functions -C, --config Read in a configuration file and encode these values as the defaults in the executable - -d, --default-components Default components to instrument (only useful when timemory is enabled in omnitrace + -d, --default-components Default components to instrument (only useful when timemory is enabled in rocprof-sys library) --env Environment variables to add to the runtime in form VARIABLE=VALUE. E.g. use \'--env OMNITRACE_PROFILE=ON\' to default to using timemory instead of perfetto - --mpi Enable MPI support (requires omnitrace built w/ full or partial MPI support). NOTE: this + --mpi Enable MPI support (requires rocprof-sys built w/ full or partial MPI support). 
NOTE: this will automatically be activated if MPI_Init, MPI_Init_thread, MPI_Finalize, MPI_Comm_rank, or MPI_Comm_size are found in the symbol table of target @@ -322,8 +322,8 @@ view the help menu. --allow-overlapping Allow dyninst to instrument either multiple functions which overlap (share part of same function body) or single functions with multiple entry points. For more info, see Section 2 of the DyninstAPI documentation. - --parse-all-modules By default, omnitrace simply requests Dyninst to provide all the procedures in the - application image. If this option is enabled, omnitrace will iterate over all the modules + --parse-all-modules By default, rocprof-sys simply requests Dyninst to provide all the procedures in the + application image. If this option is enabled, rocprof-sys will iterate over all the modules and extract the functions. Theoretically, it should be the same but the data is slightly different, possibly due to weak binding scopes. In general, enabling option will probably have no visible effect @@ -344,17 +344,17 @@ view the help menu. TypeChecking ] Advanced dyninst options: BPatch::set