Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix][Target] Added null check to fix segfault at ->defined() in cpu.cc DetectSystemTriple() #16766

Merged
merged 8 commits into from
Apr 11, 2024

Conversation

OttoWRas
Copy link
Contributor

I ran into a problem running a very simple ONNX compile, i would get a segfault at a FoldConstantExpr() call from TVMC. This only happens if the compile flag set(USE_LLVM OFF) is OFF.

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007fffc94ac78c in tvm::runtime::ObjectPtr<tvm::runtime::Object>::operator!=(decltype(nullptr)) const (this=0x0, null=<error reading variable: Attempt to dereference a generic pointer.>) at /home/otto/tvm/include/tvm/runtime/object.h:470
470       bool operator!=(std::nullptr_t null) const { return data_ != null; }

I had compiled TVM Using GCC:

COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 

This was caused by a call to defined() from DetectSystemTriple() in cpu.cc that was added in #16513. When the previous call

auto pf = tvm::runtime::Registry::Get("target.llvm_get_system_triple");

would fail, and return null. The consecutive call to defined() would segfault after trying to dereference the null value. This commit adds a check to see if the function pointer is null. This might not be the best solution, but it worked for me, so it might also help someone else struggling with this. Please suggest a better solution, if you know one.

@lhutton1
Copy link
Contributor

Thanks for fixing @OttoWRas, would it be possible to add a test somewhere around here for when LLVM is not available?

@OttoWRas
Copy link
Contributor Author

I just tried to compile the make cpptest without LLVM and it even fails compiling, with a missing tvm::codegen::LLVMTargetInfo and tvm::codegen::LLVMInstance. So I added some compile guarding around the test including LLVM. This also seemed like a better solution for the initial problem, so i changed that fix to a compile guard.

I'll look into adding a test here

@lhutton1
Copy link
Contributor

lhutton1 commented Apr 8, 2024

Hi @OttoWRas, did you get chance to look at adding a test? I should have some time to take a look in the next couple of days if not

@OttoWRas
Copy link
Contributor Author

OttoWRas commented Apr 8, 2024

Hi @lhutton1, I just fixed the test, it actually seemed to already fail with out LLVM, but i just wasn't good enough at using the test system to realize. I added an ASSERT, instead of the previous check to make the test actually test something.

But looking back on this issue, it seems to me like, defined() was either used wrong here, or it should be changed to be able to handle null and not only null_pointer's.

I hope this was what you meant, otherwise feel free to make your own changes.

Copy link
Contributor

@lhutton1 lhutton1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for persisting with this fix @OttoWRas! Overall your changes make sense to me - apologies for introducing the issues in the first place!

I had a closer look at the changes and left some comments which should help get it to pass CI. Let me know if they don't make sense, I'm happy to expand further on them.

src/target/parsers/cpu.cc Outdated Show resolved Hide resolved
src/target/parsers/cpu.cc Outdated Show resolved Hide resolved
tests/cpp/target_test.cc Show resolved Hide resolved
tests/cpp/target_test.cc Outdated Show resolved Hide resolved
src/target/parsers/cpu.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@lhutton1 lhutton1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @OttoWRas, LGTM!

@OttoWRas
Copy link
Contributor Author

Thanks for being indulgent @lhutton1, even though it got a bit messy. I think it's all there now

@lhutton1 lhutton1 merged commit c67a055 into apache:main Apr 11, 2024
20 checks passed
@lhutton1
Copy link
Contributor

Thanks @OttoWRas!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants