Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Expand NVTX usage #18683

Merged
merged 7 commits into from
Sep 23, 2021
Merged

Expand NVTX usage #18683

merged 7 commits into from
Sep 23, 2021

Conversation

ptrendx
Copy link
Member

@ptrendx ptrendx commented Jul 9, 2020

Description

This PR improves the experience when profiling MXNet under nvprof or Nsight Systems, by expanding NVTX functionality.
It was used successfully for some time in the NVIDIA NGC container.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Added C API for NVTX ranges push/pop
  • Added C API to start/stop profiling via nvprof/NSight Systems
  • Added Python API corresponding to the C API
  • Added automatic ranges to MXNet operators

@ptrendx ptrendx requested a review from DickJC123 July 9, 2020 21:14
@ptrendx ptrendx requested a review from szha as a code owner July 9, 2020 21:14
@mxnet-bot
Copy link

Hey @ptrendx , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-cpu, centos-cpu, sanity, website, clang, unix-gpu, centos-gpu, miscellaneous, edge, windows-gpu, unix-cpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@szha
Copy link
Member

szha commented Aug 14, 2020

@mxnet-bot run ci [centos-cpu, miscellaneous]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [centos-cpu, miscellaneous]

@szha
Copy link
Member

szha commented Aug 20, 2020

@ptrendx still WIP?

@ptrendx
Copy link
Member Author

ptrendx commented Aug 20, 2020

Yes, I will return back to it sometime next week probably (@DickJC123 is currently merging 1.7 into our codebase and cherry picking stuff after that merge will be easier).

@lanking520 lanking520 added the pr-work-in-progress PR is still work in progress label Nov 2, 2020
@szha
Copy link
Member

szha commented Feb 21, 2021

What's the status for this change?

@ptrendx
Copy link
Member Author

ptrendx commented Feb 22, 2021

It is currently on hold - I got sidetracked to other things and am not sure exactly when I will have time to finish it - I still intend to do that, though.

@ptrendx ptrendx changed the title [WIP] Expand NVTX usage Expand NVTX usage Sep 7, 2021
@ptrendx
Copy link
Member Author

ptrendx commented Sep 7, 2021

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 7, 2021
"""Provides python interface to cuda-related functions of the mxnet library."""
from .base import _LIB, mx_uint, c_str, check_call

def nvtx_range_push(name, color=13323030):
Copy link
Contributor

@DickJC123 DickJC123 Sep 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't enjoy suggesting a code repetition, but isn't the following an improvement over letting python users have to make their own colors:

class Nvtx:
    # Palette of colors
    RED = 0xFF0000
    GREEN = 0x00FF00
    BLUE = 0x0000FF
    YELLOW = 0xB58900
    ORANGE = 0xCB4B16
    RED = 0xDC322F
    MAGENTA = 0xD33682
    VIOLET = 0x6C71C4
    BLUE1 = 0x268BD2
    CYAN = 0x2AA198
    GREEN = 0x859900

def nvtx_range_push(name, color=Nvtx.ORANGE):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're not fond of the code repetition, at least the following smaller edit explains the magic number:

def nvtx_range_push(name, color=0xCB4B16):    # default color is orange

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am refactoring this to be nicer and more Pythonic. Will send an updated version tomorrow, once I test it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of new comments. First, I somehow miscopied the enums in my suggestion above. The second 'RED' should be 'RED1' to mirror what's in nvtx.h. Second, the colors set up in the backend are currently only used to assign colors by default based on a hash of the range name. Perhaps you'd consider exposing this the same way to Python? So for example,

def nvtx_range_push(name, color=None):    # if color is None, then assign a color based on a hash of name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is mainly that the standard hash in Python is not deterministic, which would make colors change on subsequent runs. Using a real hashing function like the SHA from Python's hashlib on the other hand seems to be quite a heavy hammer (and most probably introduce additional overheads, which we especially do not want to do when profiling the CPU part of the training).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I tried to suggest was that the 'None' color would be conveyed to the backend (via -1 or alternate API with no color arg), and the backend would assign the color with the existing backend nameToColor() routine.

static const uint32_t kCyan = 0x2AA198;
static const uint32_t kGreen1 = 0x859900;

static void gpuRangeStart(const uint32_t rgb, const std::string& range_name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if the nvtxRangePushEx(&att) makes a copy of att.message.ascii? I suspect not, in which case I'm worried about the lifetime of range_name.c_str().

gpuRangeStart() is called with a const char *, so won't that require making a short-lived temporary string (with a copy of the char array) in the caller's environment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I your concern - nvtxRangePush is not an async call. It just puts the name and a timestamp into the database and that's it (and yes, it copies the string in the process).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if nvtxRangePush copies the att.message.ascii string (as opposed to the pointer), that answers my primary concern here.

I'm left with one nit then, why don't you have const char * range_name above, which avoids the needless string copy and keeps the type consistent between the callers (in c_api.cc and threaded_engine_perdevice.cc) and what att.message.ascii wants?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a reference to a string, so there is no copying here (c_str() does not copy either). I believe in the threaded engine the nvtx_name is a string anyway, so that is also not doing any copies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In threaded engine, the following does a copy that could be avoided if everything was handled as a const char *:

auto nvtx_name = opr_block->opr->opr_name != "" ? opr_block->opr->opr_name : "Op";

so perhaps like:

auto nvtx_name = opr_block->opr->opr_name.size() ? opr_block->opr->opr_name.c_str() : "Op";

It's a nit as I say, so I'm OK leaving it as: const std::string& range_name. Given that though, the two overloaded nameToColor() signatures should be consistent in their use of std::string (see an additional suggestion).

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-review PR is waiting for code review labels Sep 14, 2021
@mseth10 mseth10 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 14, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-review PR is waiting for code review pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 21, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 21, 2021
@ptrendx
Copy link
Member Author

ptrendx commented Sep 21, 2021

@mxnet-bot run ci [unix-cpu, website]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu, website]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 21, 2021
@ptrendx
Copy link
Member Author

ptrendx commented Sep 22, 2021

@mxnet-bot run ci [unix-cpu, website]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu, website]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 22, 2021
@ptrendx
Copy link
Member Author

ptrendx commented Sep 23, 2021

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 23, 2021
Copy link
Contributor

@DickJC123 DickJC123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for responding to my comments. LGTM.

@DickJC123 DickJC123 merged commit c1e06aa into apache:master Sep 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants