Expand NVTX usage #18683

ptrendx · 2020-07-09T21:14:12Z

Description

This PR improves the experience when profiling MXNet under nvprof or Nsight Systems, by expanding NVTX functionality.
It was used successfully for some time in the NVIDIA NGC container.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Added C API for NVTX ranges push/pop
Added C API to start/stop profiling via nvprof/NSight Systems
Added Python API corresponding to the C API
Added automatic ranges to MXNet operators

mxnet-bot · 2020-07-09T21:14:16Z

Hey @ptrendx , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-cpu, centos-cpu, sanity, website, clang, unix-gpu, centos-gpu, miscellaneous, edge, windows-gpu, unix-cpu]

Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

szha · 2020-08-14T05:46:36Z

@mxnet-bot run ci [centos-cpu, miscellaneous]

mxnet-bot · 2020-08-14T05:46:44Z

Jenkins CI successfully triggered : [centos-cpu, miscellaneous]

szha · 2020-08-20T23:26:37Z

@ptrendx still WIP?

ptrendx · 2020-08-20T23:40:51Z

Yes, I will return back to it sometime next week probably (@DickJC123 is currently merging 1.7 into our codebase and cherry picking stuff after that merge will be easier).

szha · 2021-02-21T21:40:39Z

What's the status for this change?

ptrendx · 2021-02-22T19:32:17Z

It is currently on hold - I got sidetracked to other things and am not sure exactly when I will have time to finish it - I still intend to do that, though.

ptrendx · 2021-09-07T19:47:12Z

@mxnet-bot run ci [unix-cpu]

mxnet-bot · 2021-09-07T19:47:16Z

Jenkins CI successfully triggered : [unix-cpu]

DickJC123 · 2021-09-11T05:37:50Z

python/mxnet/cuda_utils.py

+"""Provides python interface to cuda-related functions of the mxnet library."""
+from .base import _LIB, mx_uint, c_str, check_call
+
+def nvtx_range_push(name, color=13323030):


I don't enjoy suggesting a code repetition, but isn't the following an improvement over letting python users have to make their own colors:

class Nvtx: # Palette of colors RED = 0xFF0000 GREEN = 0x00FF00 BLUE = 0x0000FF YELLOW = 0xB58900 ORANGE = 0xCB4B16 RED = 0xDC322F MAGENTA = 0xD33682 VIOLET = 0x6C71C4 BLUE1 = 0x268BD2 CYAN = 0x2AA198 GREEN = 0x859900 def nvtx_range_push(name, color=Nvtx.ORANGE):

If you're not fond of the code repetition, at least the following smaller edit explains the magic number:

def nvtx_range_push(name, color=0xCB4B16): # default color is orange

I am refactoring this to be nicer and more Pythonic. Will send an updated version tomorrow, once I test it.

Couple of new comments. First, I somehow miscopied the enums in my suggestion above. The second 'RED' should be 'RED1' to mirror what's in nvtx.h. Second, the colors set up in the backend are currently only used to assign colors by default based on a hash of the range name. Perhaps you'd consider exposing this the same way to Python? So for example,

def nvtx_range_push(name, color=None): # if color is None, then assign a color based on a hash of name

The problem with this is mainly that the standard hash in Python is not deterministic, which would make colors change on subsequent runs. Using a real hashing function like the SHA from Python's hashlib on the other hand seems to be quite a heavy hammer (and most probably introduce additional overheads, which we especially do not want to do when profiling the CPU part of the training).

What I tried to suggest was that the 'None' color would be conveyed to the backend (via -1 or alternate API with no color arg), and the backend would assign the color with the existing backend nameToColor() routine.

DickJC123 · 2021-09-11T05:50:20Z

src/common/nvtx.h

+  static const uint32_t kCyan    = 0x2AA198;
+  static const uint32_t kGreen1  = 0x859900;
+
+  static void gpuRangeStart(const uint32_t rgb, const std::string& range_name) {


Do we know if the nvtxRangePushEx(&att) makes a copy of att.message.ascii? I suspect not, in which case I'm worried about the lifetime of range_name.c_str().

gpuRangeStart() is called with a const char *, so won't that require making a short-lived temporary string (with a copy of the char array) in the caller's environment?

Not sure I your concern - nvtxRangePush is not an async call. It just puts the name and a timestamp into the database and that's it (and yes, it copies the string in the process).

So if nvtxRangePush copies the att.message.ascii string (as opposed to the pointer), that answers my primary concern here.

I'm left with one nit then, why don't you have const char * range_name above, which avoids the needless string copy and keeps the type consistent between the callers (in c_api.cc and threaded_engine_perdevice.cc) and what att.message.ascii wants?

It is a reference to a string, so there is no copying here (c_str() does not copy either). I believe in the threaded engine the nvtx_name is a string anyway, so that is also not doing any copies.

In threaded engine, the following does a copy that could be avoided if everything was handled as a const char *:

auto nvtx_name = opr_block->opr->opr_name != "" ? opr_block->opr->opr_name : "Op";

so perhaps like:

auto nvtx_name = opr_block->opr->opr_name.size() ? opr_block->opr->opr_name.c_str() : "Op";

It's a nit as I say, so I'm OK leaving it as: const std::string& range_name. Given that though, the two overloaded nameToColor() signatures should be consistent in their use of std::string (see an additional suggestion).

src/common/nvtx.h

ptrendx · 2021-09-21T22:22:00Z

@mxnet-bot run ci [unix-cpu, website]

mxnet-bot · 2021-09-21T22:22:06Z

Jenkins CI successfully triggered : [unix-cpu, website]

ptrendx · 2021-09-22T20:58:48Z

@mxnet-bot run ci [unix-cpu, website]

mxnet-bot · 2021-09-22T20:58:55Z

Jenkins CI successfully triggered : [unix-cpu, website]

ptrendx · 2021-09-23T17:02:47Z

@mxnet-bot run ci [unix-cpu]

mxnet-bot · 2021-09-23T17:02:54Z

Jenkins CI successfully triggered : [unix-cpu]

DickJC123

Thanks for responding to my comments. LGTM.

ptrendx requested a review from DickJC123 July 9, 2020 21:14

ptrendx requested a review from szha as a code owner July 9, 2020 21:14

ptrendx force-pushed the pr_nvtx branch from 266314c to f642b4b Compare August 10, 2020 23:56

ptrendx requested a review from eric-haibin-lin as a code owner August 10, 2020 23:56

lanking520 added the pr-work-in-progress PR is still work in progress label Nov 2, 2020

ptrendx force-pushed the pr_nvtx branch from 1b4297f to 6057d85 Compare August 31, 2021 17:46

ptrendx changed the title ~~[WIP] Expand NVTX usage~~ Expand NVTX usage Sep 7, 2021

mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 7, 2021

DickJC123 reviewed Sep 11, 2021

View reviewed changes

ptrendx added 3 commits September 14, 2021 11:04

Unifying NVTX support files

e33d2d7

Add C API

0732f4b

NVTX around ops in the engine

1e3eba8

ptrendx force-pushed the pr_nvtx branch from 6da66cd to 1e3eba8 Compare September 14, 2021 18:11

mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-review PR is waiting for code review labels Sep 14, 2021

Moving nvtx.h to common/cuda

a793f2f

mseth10 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 14, 2021

mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-review PR is waiting for code review pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 21, 2021

Fix lint

873da6f

mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 21, 2021

mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 22, 2021

mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 23, 2021

DickJC123 approved these changes Sep 23, 2021

View reviewed changes

DickJC123 merged commit c1e06aa into apache:master Sep 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand NVTX usage #18683

Expand NVTX usage #18683

ptrendx commented Jul 9, 2020 •

edited

Loading

mxnet-bot commented Jul 9, 2020

szha commented Aug 14, 2020

mxnet-bot commented Aug 14, 2020

szha commented Aug 20, 2020

ptrendx commented Aug 20, 2020

szha commented Feb 21, 2021

ptrendx commented Feb 22, 2021

ptrendx commented Sep 7, 2021

mxnet-bot commented Sep 7, 2021

DickJC123 Sep 11, 2021 •

edited

Loading

DickJC123 Sep 15, 2021

ptrendx Sep 16, 2021

DickJC123 Sep 21, 2021

ptrendx Sep 21, 2021

DickJC123 Sep 21, 2021

DickJC123 Sep 11, 2021

ptrendx Sep 13, 2021

DickJC123 Sep 13, 2021

ptrendx Sep 13, 2021

DickJC123 Sep 15, 2021

ptrendx commented Sep 21, 2021

mxnet-bot commented Sep 21, 2021

ptrendx commented Sep 22, 2021

mxnet-bot commented Sep 22, 2021

ptrendx commented Sep 23, 2021

mxnet-bot commented Sep 23, 2021

DickJC123 left a comment

Expand NVTX usage #18683

Expand NVTX usage #18683

Conversation

ptrendx commented Jul 9, 2020 • edited Loading

Description

Checklist

Essentials

Changes

mxnet-bot commented Jul 9, 2020

szha commented Aug 14, 2020

mxnet-bot commented Aug 14, 2020

szha commented Aug 20, 2020

ptrendx commented Aug 20, 2020

szha commented Feb 21, 2021

ptrendx commented Feb 22, 2021

ptrendx commented Sep 7, 2021

mxnet-bot commented Sep 7, 2021

DickJC123 Sep 11, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ptrendx commented Sep 21, 2021

mxnet-bot commented Sep 21, 2021

ptrendx commented Sep 22, 2021

mxnet-bot commented Sep 22, 2021

ptrendx commented Sep 23, 2021

mxnet-bot commented Sep 23, 2021

DickJC123 left a comment

Choose a reason for hiding this comment

ptrendx commented Jul 9, 2020 •

edited

Loading

DickJC123 Sep 11, 2021 •

edited

Loading