Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue]: [UB] memory_peers_paths is missing null terminator causing out of bound reads #1469

Closed
LunNova opened this issue Dec 18, 2024 · 0 comments · Fixed by #1470
Closed

Comments

@LunNova
Copy link
Contributor

LunNova commented Dec 18, 2024

ncclIbGdrSupport reads out of bounds if none of the files in memory_peers_paths exist because it relies on a null entry at the end which is missing. This can cause a confusing crash on systems on a mainline Linux kernel.

const char* memory_peers_paths[] = {"/sys/kernel/mm/memory_peers/amdkfd/version",
"/sys/kernel/memory_peers/amdkfd/version",
"/sys/memory_peers/amdkfd/version"};
int i = 0;
while (memory_peers_paths[i]) {
if (access(memory_peers_paths[i], F_OK) == 0) {
moduleLoaded = 1;
INFO(NCCL_INIT,"Found %s", memory_peers_paths[i]);
break;
} else {
moduleLoaded = 0;
}
++i;
}

/build/source/build/hipify/src/transport/net_ib.cc:592:12: runtime error: index 3 out of bounds for type 'const char *[3]'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /build/source/build/hipify/src/transport/net_ib.cc:592:12 
=================================================================
==4070438==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ff82df5b038 at pc 0x7fff7c15a3b5 bp 0x7ff8312e28f0 sp 0x7ff8312e28e8
READ of size 8 at 0x7ff82df5b038 thread T6
    #0 0x7fff7c15a3b4 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:592:12
    #1 0x7fff7c15adfe in ncclIbGetProperties(int, ncclNetProperties_v8_t*) /build/source/build/hipify/src/transport/net_ib.cc:688:7
    #2 0x7fff7bf38198 in ncclNetCheckDeviceVersion(ncclComm*, ncclNet_v8_t*, int) /build/source/build/hipify/src/net.cc:501:3
    #3 0x7fff7bf388d5 in ncclNetInit(ncclComm*) /build/source/build/hipify/src/net.cc:562:24
    #4 0x7fff7bf0b0ad in commAlloc(ncclComm*, ncclComm*, int, int) /build/source/build/hipify/src/init.cc:533:3
    #5 0x7fff7bf00395 in ncclCommInitRankFunc(ncclAsyncJob*) /build/source/build/hipify/src/init.cc:2002:5
    #6 0x7fff7bee824e in ncclAsyncJobMain(void*) /build/source/build/hipify/src/group.cc:67:17
    #7 0x7ffff749f0d4 in asan_thread_start(void*) (/nix/store/7r6z6nb443psc1ghiyjlqmhwkll7wiia-clr-6.3.0/llvm/lib/linux/libclang_rt.asan-x86_64.so+0x9f0d4)
    #8 0x7ffff69b0d01 in start_thread (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x90d01) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e)
    #9 0x7ffff6a303ab in __GI___clone3 (/nix/store/pacbfvpzqz2mksby36awvbcn051zcji3-glibc-2.40-36/lib/libc.so.6+0x1103ab) (BuildId: 2de6548b3bd2f2857c3c1d5f85e5e817ce2c4a7e)

Address 0x7ff82df5b038 is located in stack of thread T6 at offset 56 in frame
    #0 0x7fff7c159ae7 in ncclIbGdrSupport() /build/source/build/hipify/src/transport/net_ib.cc:568

  This frame has 3 object(s):
    [32, 56) 'memory_peers_paths' (line 587) <== Memory access at offset 56 overflows this variable
    [96, 351) 'strValue' (line 603)
    [416, 672) 'buf' (line 613)

Related to #1454

@LunNova LunNova changed the title [UB] memory_peers_paths is missing null terminator causing out of bound reads [Issue]: [UB] memory_peers_paths is missing null terminator causing out of bound reads Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant