cling PR 284 #4616
…ices
- change CUDA to CUDAHost and add CUDADevice to the InvocationOptions
- in PTX mode, some ASTTransformers that are only useful for x86 mode are not used
This ASTTransformer adds an inline attribute to any CUDA __device__ function that does not already have the attribute. Inlining solves a problem caused by the incremental compilation of PTX code. In a normal compiler, all definitions of __global__ and __device__ kernels are in the same translation unit. In the incremental compiler, each kernel has its own translation unit. If a __global__ kernel uses a __device__ function, this design causes an error: instead of generating the PTX code of the __device__ function in the same file as the __global__ kernel, the compiler emits only an external declaration of the __device__ function. However, normal PTX code does not support external function declarations. The transformer is only active if the target device is nvptx.
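The rule described above can be sketched as a small, self-contained model. This is only an illustration, not the transformer from this PR: `FunctionInfo`, `isNvptxTarget()` and `inlineDeviceFunctions()` are hypothetical names, and the real code works on Clang AST nodes rather than on a plain struct. The sketch also assumes the target can be recognized by a triple that starts with `nvptx`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a function declaration.
// NOT the Clang API; it only models the attributes relevant here.
struct FunctionInfo {
    std::string name;
    bool isDevice;  // declared with __device__
    bool isGlobal;  // declared with __global__
    bool isInline;  // already carries an inline attribute
};

// Assumption: an NVPTX device target has a triple starting with "nvptx".
bool isNvptxTarget(const std::string& triple) {
    return triple.compare(0, 5, "nvptx") == 0;
}

// Marks every __device__ function that lacks the attribute as inline,
// but only when compiling for an NVPTX target.
// Returns the number of declarations that were changed.
int inlineDeviceFunctions(std::vector<FunctionInfo>& decls,
                          const std::string& triple) {
    if (!isNvptxTarget(triple)) {
        return 0;  // host (e.g. x86) mode: leave everything untouched
    }
    int changed = 0;
    for (FunctionInfo& decl : decls) {
        if (decl.isDevice && !decl.isGlobal && !decl.isInline) {
            decl.isInline = true;
            ++changed;
        }
    }
    return changed;
}
```

With the attribute set, the body of the __device__ function can be emitted into the translation unit of the calling __global__ kernel, so no external declaration is left in the PTX output.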
Replaced the old version of the PTX compiler, which used external tools and llvm::ExecuteAndWait, with an internal implementation. The new incremental PTX compiler uses a modified cling::Interpreter instance. The instance can process the PTX built-ins and generates LLVM IR. The LLVM IR is compiled to PTX via an additional NVPTX backend implemented in the IncrementalCUDADeviceCompiler.
The new implementation has several advantages:
- it is much faster than the old version
- it is less error-prone, because the ASTPrinter and some unnecessary cling transformations are avoided
- it reduces problems with finding external tools (which can be very complicated on HPC systems)
The IncrementalCUDADeviceCompiler is moved from the cling::IncrementalParser to the cling::Interpreter, because the second interpreter needs the input without wrappers or transformations.
- instead of using the NVIDIA tool fatbin, the fatbin is now generated directly in cling
- clean up the IncrementalCUDADeviceCompiler class to match the new fatbin implementation (e.g. remove findToolChain())
- one last I/O operation is still required: writing the fatbin code to a file for the CodeGen
- it is more similar to the interface of cling::Interpreter
- replace function compileDeviceCode() with process()
- add declare() and parse() functions
- the functions take only the input argument, because the remaining arguments (e.g. Transaction) require modifications to the transaction system
- it also fixes a bug in the I/O system of the xeus-cling kernel
- a really weak solution, which should be replaced by a generic solution
- add Author to CUDA test cases
- optimize DeviceKernelInliner
- improve some comments
- remove deprecated opt level variables
- change interface of IncrementalCUDADeviceCompiler::process() and IncrementalCUDADeviceCompiler::declare()
Starting build on
Build failed on ROOT-ubuntu16/rtcxxmod. Errors:
Build failed on ROOT-fedora30/cxx14. Errors:
Build failed on ROOT-fedora27/noimt. Errors:
- the buffer is needed to send the fatbinary code from the device JIT to the host JIT without file I/O
- the modification was needed because the backend uses the function llvm::MemoryBuffer::getFileOrSTDIN(), which does not support a virtual file system
- behavior: if the buffer is valid, use the buffer; otherwise, load the fatbinary code from the file
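The buffer-or-file behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `loadFatbinary()` is a hypothetical helper, it uses plain standard-library streams instead of llvm::MemoryBuffer, and it assumes that "valid" means a non-null, non-empty buffer.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: prefers the in-memory buffer handed over by the
// device JIT and falls back to reading the fatbinary from a file.
// Assumption: a buffer is "valid" if it is non-null and non-empty.
// Returns true and fills `out` on success, false if neither source works.
bool loadFatbinary(const std::string* buffer,
                   const std::string& fallbackPath,
                   std::string& out) {
    if (buffer != nullptr && !buffer->empty()) {
        out = *buffer;  // no file I/O needed
        return true;
    }
    std::ifstream in(fallbackPath, std::ios::binary);
    if (!in) {
        return false;  // no valid buffer and the file cannot be opened
    }
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}
```

Passing the buffer as a pointer keeps the "no buffer available" case explicit, so the file path is only touched when the in-memory route is not usable.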
Starting build on
@SimeonEhrig you said (in root-project/cling#284 (comment))
Yet the errors above seem to come from a missing llvm patch. Could you clarify?
Build failed on ROOT-fedora30/cxx14. Errors:
Build failed on ROOT-performance-centos7-multicore/default. Errors:
Build failed on ROOT-ubuntu16/rtcxxmod. Errors:
Build failed on ROOT-fedora27/noimt. Errors:
Build failed on mac1014/cxx17. Errors:
Build failed on ROOT-ubuntu18.04-i386/cxx14. Errors:
Warnings:
Build failed on windows10/cxx14. Errors:
Build failed on ROOT-fedora29/python3. Errors:
I've removed every part that requires changes to the Clang base. The error occurs because the NVPTX backend is not enabled. I'm not familiar with the ROOT build process, so I need a little time to check it.
I see there is a mistake in the PR: the commit fe12679 is no longer needed.
I have developed a solution to enable the NVPTX backend in ROOT by default. I have also implemented a check in the CMakeLists.txt of Cling that throws an error if the NVPTX backend is not enabled. Unfortunately, I don't know how to push commits to this PR, so I forked your repo and modified the PR-284 branch: https://github.com/SimeonEhrig/root/tree/PR-284 Attention: I have removed the last commit of your branch.
@phsft-bot build!
Starting build on
Build failed on ROOT-fedora30/cxx14. Errors:
Build failed on ROOT-ubuntu16/rtcxxmod. Errors:
Build failed on ROOT-fedora27/noimt. Errors:
Build failed on mac1014/cxx17. Errors:
Build failed on ROOT-fedora29/python3. Errors:
Build failed on ROOT-performance-centos7-multicore/default. Errors:
Build failed on windows10/cxx14. Errors:
Build failed on ROOT-ubuntu18.04-i386/cxx14. Errors:
Warnings:
Superseded by #4618
See root-project/cling#284