There are several problems with the current PyTorch CodeGen.
First, the emitted `.pytorch.h` file incorrectly exposes functions with internal linkage. This is inconsistent with the C codegen and leads to compile errors when the header file is actually compiled. I added a check for internal linkage to fix this.

Next are the problems when using CUDA. Currently, when using the generated `.pytorch.h` file, the linker complains about an undefined reference to a symbol related to `halide_cuda_device_interface` (the name is mangled when reported by the linker). This is because the forward declaration was not in an `extern "C"` block.

Moreover, there is a really subtle problem when the generated pipeline is invoked by PyTorch (which I have described on Matrix): the Halide pipeline runs in a different CUDA stream from the PyTorch kernels. I later discovered that `HalidePyTorchCudaHelpers.h` was not included, unlike what `apps/HelloPyTorch/setup.py` did, so the weakly linked CUDA handles were not overridden. I fix this by simply including this header in the codegen.
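
To illustrate the `extern "C"` issue, here is a minimal, self-contained sketch (not the actual generated code). The struct body and the function definition are stand-ins I made up so the example links on its own; in a real build both come from the Halide runtime, and the `cuda_interface_resolves` helper is purely hypothetical. The point is only that the forward declaration must sit inside an `extern "C"` block so that it refers to the unmangled symbol.

```cpp
#include <cstring>

// Stand-in for the Halide runtime struct (assumption: the real one has
// different members and is defined in the Halide runtime headers).
struct halide_device_interface_t {
    const char *name;
};

// Forward declaration with C linkage. Without the extern "C" block, a C++
// compiler would look for a mangled symbol and the link step would fail.
extern "C" {
const struct halide_device_interface_t *halide_cuda_device_interface();
}

// Stand-in definition so this sketch links by itself. In a real build this
// symbol is provided by the Halide runtime, not by user code.
extern "C" const struct halide_device_interface_t *halide_cuda_device_interface() {
    static const halide_device_interface_t iface = {"cuda"};
    return &iface;
}

// Hypothetical helper: checks that the declaration resolves to the definition.
bool cuda_interface_resolves() {
    const halide_device_interface_t *iface = halide_cuda_device_interface();
    return iface != nullptr && std::strcmp(iface->name, "cuda") == 0;
}
```

If the `extern "C"` block around the declaration is removed while the definition keeps C linkage, the compiler rejects the conflicting linkage in a single file; across separate files the same mismatch shows up as the undefined reference described above.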