Ensure NVLS and NVLSTree chunksizes are matched

This prevents failures when the NVLS chunk size is smaller than the
NVLSTree chunk size, or a silent fallback to a lower value, should NCCL
change its defaults.

Signed-off-by: Raghu Raja <[email protected]>
rajachan committed Apr 4, 2024
1 parent c5e9b22 commit 3defe9e
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions src/platform-aws.c
@@ -447,6 +447,9 @@ int platform_init(const char **provider_filter)
* Setting this unconditionally without relying on ncclGetVersion symbol
* being available, since the parameter did not exist in versions prior
* to v2.20.
*
* The NVLSTree chunk size cannot be larger than the NVLS chunk size,
* so we ensure both are set to 512KiB.
*/
NCCL_OFI_INFO(NCCL_INIT | NCCL_NET, "Setting NCCL_NVLSTREE_MAX_CHUNKSIZE to 512KiB");
ret = setenv("NCCL_NVLSTREE_MAX_CHUNKSIZE", "524288", 0);
@@ -456,6 +459,13 @@ int platform_init(const char **provider_filter)
goto exit;
}

NCCL_OFI_INFO(NCCL_INIT | NCCL_NET, "Setting NCCL_NVLS_CHUNKSIZE to 512KiB");
ret = setenv("NCCL_NVLS_CHUNKSIZE", "524288", 0);
if (ret != 0) {
NCCL_OFI_WARN("Unable to set NCCL_NVLS_CHUNKSIZE");
ret = -errno;
goto exit;
}
#endif

/*
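Note that both `setenv()` calls in this change pass `0` as the overwrite argument, so a value already exported by the user is left untouched and the two variables can still end up mismatched. The following is a minimal sketch of that behavior, not code from the plugin. The helper names `set_default_env` and `nvls_chunksizes_consistent` are hypothetical, and parsing the values with `strtoll` is an assumption about how they are read. The sketch also saves `errno` before logging, since a logging call may modify it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: set `name` to `value` only if the variable is not
 * already present in the environment (overwrite == 0).
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_default_env(const char *name, const char *value)
{
	assert(name != NULL && value != NULL);

	if (setenv(name, value, 0) != 0) {
		int err = errno; /* save before logging can change it */
		fprintf(stderr, "Unable to set %s\n", name);
		return -err;
	}
	return 0;
}

/*
 * Hypothetical check: returns 1 when both variables are set and the
 * NVLSTree max chunk size does not exceed the NVLS chunk size, else 0.
 * Assumption: both values are plain integers that strtoll can parse.
 */
static int nvls_chunksizes_consistent(void)
{
	const char *tree = getenv("NCCL_NVLSTREE_MAX_CHUNKSIZE");
	const char *nvls = getenv("NCCL_NVLS_CHUNKSIZE");

	if (tree == NULL || nvls == NULL)
		return 0;

	return strtoll(tree, NULL, 0) <= strtoll(nvls, NULL, 0);
}
```

Calling `set_default_env()` for both variables with `"524288"` in a clean environment leaves them matched. If the user exported `NCCL_NVLS_CHUNKSIZE=131072` beforehand, that value is preserved and `nvls_chunksizes_consistent()` returns 0, which a caller could use to emit a warning.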