-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Conversation
Hey @ptrendx , Thanks for submitting the PR
CI supported jobs: [windows-cpu, centos-cpu, sanity, website, clang, unix-gpu, centos-gpu, miscellaneous, edge, windows-gpu, unix-cpu] Note: |
@mxnet-bot run ci [centos-cpu, miscellaneous] |
Jenkins CI successfully triggered : [centos-cpu, miscellaneous] |
@ptrendx still WIP? |
Yes, I will return back to it sometime next week probably (@DickJC123 is currently merging 1.7 into our codebase and cherry picking stuff after that merge will be easier). |
What's the status for this change? |
It is currently on hold - I got sidetracked to other things and am not sure exactly when I will have time to finish it - I still intend to do that, though. |
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
python/mxnet/cuda_utils.py
Outdated
"""Provides python interface to cuda-related functions of the mxnet library.""" | ||
from .base import _LIB, mx_uint, c_str, check_call | ||
|
||
def nvtx_range_push(name, color=13323030): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't enjoy suggesting a code repetition, but isn't the following an improvement over letting python users have to make their own colors:
class Nvtx:
# Palette of colors
RED = 0xFF0000
GREEN = 0x00FF00
BLUE = 0x0000FF
YELLOW = 0xB58900
ORANGE = 0xCB4B16
RED = 0xDC322F
MAGENTA = 0xD33682
VIOLET = 0x6C71C4
BLUE1 = 0x268BD2
CYAN = 0x2AA198
GREEN = 0x859900
def nvtx_range_push(name, color=Nvtx.ORANGE):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're not fond of the code repetition, at least the following smaller edit explains the magic number:
def nvtx_range_push(name, color=0xCB4B16): # default color is orange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am refactoring this to be nicer and more Pythonic. Will send an updated version tomorrow, once I test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of new comments. First, I somehow miscopied the enums in my suggestion above. The second 'RED' should be 'RED1' to mirror what's in nvtx.h. Second, the colors set up in the backend are currently only used to assign colors by default based on a hash of the range name. Perhaps you'd consider exposing this the same way to Python? So for example,
def nvtx_range_push(name, color=None): # if color is None, then assign a color based on a hash of name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this is mainly that the standard hash in Python is not deterministic, which would make colors change on subsequent runs. Using a real hashing function like the SHA from Python's hashlib on the other hand seems to be quite a heavy hammer (and most probably introduce additional overheads, which we especially do not want to do when profiling the CPU part of the training).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I tried to suggest was that the 'None' color would be conveyed to the backend (via -1 or alternate API with no color arg), and the backend would assign the color with the existing backend nameToColor() routine.
src/common/nvtx.h
Outdated
static const uint32_t kCyan = 0x2AA198; | ||
static const uint32_t kGreen1 = 0x859900; | ||
|
||
static void gpuRangeStart(const uint32_t rgb, const std::string& range_name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know if the nvtxRangePushEx(&att) makes a copy of att.message.ascii? I suspect not, in which case I'm worried about the lifetime of range_name.c_str().
gpuRangeStart() is called with a const char *, so won't that require making a short-lived temporary string (with a copy of the char array) in the caller's environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I your concern - nvtxRangePush is not an async call. It just puts the name and a timestamp into the database and that's it (and yes, it copies the string in the process).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if nvtxRangePush copies the att.message.ascii string (as opposed to the pointer), that answers my primary concern here.
I'm left with one nit then, why don't you have const char * range_name
above, which avoids the needless string copy and keeps the type consistent between the callers (in c_api.cc and threaded_engine_perdevice.cc) and what att.message.ascii wants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a reference to a string, so there is no copying here (c_str()
does not copy either). I believe in the threaded engine the nvtx_name is a string anyway, so that is also not doing any copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In threaded engine, the following does a copy that could be avoided if everything was handled as a const char *
:
auto nvtx_name = opr_block->opr->opr_name != "" ? opr_block->opr->opr_name : "Op";
so perhaps like:
auto nvtx_name = opr_block->opr->opr_name.size() ? opr_block->opr->opr_name.c_str() : "Op";
It's a nit as I say, so I'm OK leaving it as: const std::string& range_name
. Given that though, the two overloaded nameToColor() signatures should be consistent in their use of std::string (see an additional suggestion).
@mxnet-bot run ci [unix-cpu, website] |
Jenkins CI successfully triggered : [unix-cpu, website] |
@mxnet-bot run ci [unix-cpu, website] |
Jenkins CI successfully triggered : [unix-cpu, website] |
@mxnet-bot run ci [unix-cpu] |
Jenkins CI successfully triggered : [unix-cpu] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for responding to my comments. LGTM.
Description
This PR improves the experience when profiling MXNet under nvprof or Nsight Systems, by expanding NVTX functionality.
It was used successfully for some time in the NVIDIA NGC container.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes