-
Notifications
You must be signed in to change notification settings - Fork 862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NCCL_PARAM per ncclComm #225
Comments
|
@pietern , can you just disable the caching in NCCL_PARAM for your testing? If not, perhaps have a global counter... when it increases then all the NCCL_PARAMs can know to reload from cache. Something like: extern uint32_t globalReset;
#define NCCL_PARAM(name, env, deftVal) \
int64_t ncclParam##name() { \
constexpr int64_t uninitialized = INT64_MIN; \
static_assert(deftVal != uninitialized, "default value cannot be the uninitialized value."); \
static int64_t cache = uninitialized; \
static uint32_t resetCount = 0; \
if (resetCount < globalReset) cache = uninitialized;
if (__builtin_expect(__atomic_load_n(&cache, __ATOMIC_RELAXED) == uninitialized, false)) { \
ncclLoadParam("NCCL_" env, deftVal, uninitialized, &cache); \
} \
return cache; \
}
void resetParams() {
++globalReset;
} Then in src/misc/param.cc: uint32_t globalReset = 0; Note that the ncclLoadParam function also checks cache, so if you just disable it then you'll need to disable it there, too. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Currently the NCCL_PARAM macro uses a static variable to store the value and initializes it only once. This means that changing them at runtime is not possible (e.g. if you want run a sweep to find optimal settings). It would be great to change these on the fly. Since creating a new communicator isn't in the fast path it should be possible to make it non-static and load it every time a new communicator is created.
The text was updated successfully, but these errors were encountered: