[CodeGenC] Handle GlobalVar callee as internal function call #15103
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Force-pushed from 3860422 to 2a9ddc3
Prior to this commit, the tests in `test_tir_transform_inject_ptx_async_copy.py` registered the `"tvm_callback_cuda_postproc"` function at pytest collection time, and used a global variable to disable its functionality outside of the tests in this file. This had two major issues. First, if any other test also installed a postproc function, the postproc function required by the NVPTX tests would be overwritten. Second, if one of the NVPTX tests failed, the global variable controlling the postproc function would not be reset, causing any subsequent CUDA-related tests to also fail. This commit updates these NVPTX tests to conditionally install the postproc function, to de-register it after the test instead of disabling its functionality, and to de-register it regardless of the test result. This issue was initially found when debugging apache#15103, when a failure in `test_tir_transform_inject_ptx_async_copy.py::test_cp_async_in_if_then_else` caused failures in 32 unrelated tests ([CI link](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-15103/7/tests)).
Force-pushed from 4fd4497 to 82f0a26
Force-pushed from 4852ca2 to 60da49a
- Call `DeclareFunction` for each `PrimFunc`, prior to any `AddFunction` calls
- Provide both `GlobalVar` and `PrimFunc` to `AddFunction` calls
C's implicit pointer conversions (e.g. `void*` to `int*`) mean that using a call's arguments to infer the function signature may produce an incorrect declaration. If a `call_extern` refers to a function within the same module, output only a single forward declaration based on the PrimFunc's parameters, not on the CallNode's arguments.
These `Call` instances can return a `PointerType(PrimType(pointee_dtype))` rather than a `PrimType(DataType::Handle())`.
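As a minimal illustration of the forward-declaration conflict described above (the function name, parameters, and body are hypothetical, not actual TVM codegen output), C treats a declaration inferred from `void*` call-site arguments as incompatible with the real definition:

```c
/* Hypothetical forward declaration, inferred from a call site whose
 * arguments are an opaque void* handle and an int. */
void compute_kernel(void* data, int n);

/* Actual definition.  Although C would implicitly convert a void* argument
 * to int* at the call site, the two declarations have conflicting types,
 * so the compiler rejects the translation unit. */
void compute_kernel(int* data, int n) {
    for (int i = 0; i < n; ++i) {
        data[i] = i;
    }
}
```

Emitting the single forward declaration from the PrimFunc's own parameters, rather than from each call site, avoids this mismatch.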
Force-pushed from 33bc277 to 02e9ddf
Previously, the micro kernels for gemm, avg_pool, max_pool, and tensordot relied on C's implicit type conversions for their arguments when the caller's argument types differed from the signature's parameter types. This works, except when the codegen has auto-generated a forward declaration based on the caller's argument types, such as during AOT, which then causes a conflicting definition. Since the codegen cannot determine the function names from the `"pragma_import_c"` in order to suppress these forward declarations, this conflict is more easily resolved by updating the micro kernel signatures. The three types of mismatches are:

- Use of `int` or `long` parameters, whose width may vary by compiler, instead of fixed-width types.
- TIR expecting the data array's integer type to also be used as the error code's return type, rather than the micro kernels' `int32_t` error code.
- Pointer conversion done during argument conversion.

Type conversions are done at the start of each micro kernel, to avoid changing types that are used within the computational sections of each micro kernel.
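A hedged sketch of the signature change described above (the kernel name, parameters, and loop are illustrative, not the actual gemm/avg_pool/max_pool/tensordot sources): fixed-width parameter and return types match the auto-generated forward declaration, and any narrowing happens once at the top of the kernel.

```c
#include <stdint.h>

/* Illustrative micro-kernel signature: int32_t parameters and an int32_t
 * error-code return type, matching what the codegen would forward-declare. */
int32_t example_max_pool(int32_t output_size, int8_t* output, int8_t value) {
    /* Convert once here, so the types used inside the computational section
     * below do not need to change. */
    int size = (int)output_size;

    for (int i = 0; i < size; ++i) {
        if (output[i] < value) {
            output[i] = value;
        }
    }
    return 0; /* error code */
}
```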
Force-pushed from 02e9ddf to 2bf2313
Required for internal functions after PR apache#15214
Approved with only a few questions and nitpicks.
[TEST] Run tests in tests/python/relay/aot in `ci-cortexm`
The tests in `tests/python/relay/aot` were run as part of task_python_integration.sh, which is run for `ci-cpu` and `ci-arm`, but some of the tests in that folder require CMSIS-NN or FVP, so in practice they were not running anywhere in the upstream CI. Since all the tests in that folder can run on `ci-cortexm`, change the scripts to run them there. Also fix some tests that started failing as a result of #15103 (but didn't manifest in upstream CI) and `test_device_api_hooks_unpacked_api`, which clearly hasn't been running in any CI for a while.
Hello @Lunderberg, unfortunately this PR/commit completely breaks our Metal codegen. You can reproduce this issue by following the steps below on a Mac:
Then you will see the following error:
It says that the generated Metal code cannot be parsed. By reverting this commit, I am able to get this test to run (though it will fail on numerical comparison due to a precision issue, which is a non-issue here). Could you make a fix for this issue? I would appreciate it if the problem could be resolved shortly. Thank you!
We might need some basic testing for Metal to prevent this issue from happening... For now, to quickly ensure that MLC LLM is not impacted, @MasterJH5574 shall we submit a PR to temporarily revert this commit? Thanks a lot!
…pache#15103)" This reverts commit 9ff71f4.
Prior to this commit, the CI compiled TVM with `USE_METAL=ON` on OSX, as defined in `conda/recipe/build.sh`, but did not validate the execution of any generated metal kernels. As a result, breakage could occur without being caught by the CI, as found following apache#15103. This commit adds the execution of a single metal kernel as a minimal functionality test of the metal backend.
* [Unittest][Metal] Add minimal metal functionality test to CI
* CI testing, attempt a compile-only test case
* CI testing, moved intentional failure from test-case to contrib.xcode
* Move intentional failure point into codegen
* ci bump
* Removing the intentional failure during metallib compilation
This reverts commit [`e88d0d`](apache#15725), which itself reverted [`9ff71f`](apache#15103) for breakages on the metal backend. Now that the CI contains compile-time testing of the metal codegen, the original breakage should be identifiable.
…15835)
* [CodeGenC][Redo] Handle GlobalVar callee as internal function call
* Added codegen metal CI debug print
* Print function decl to the argument stream
* Remove the codegen metal CI debug print-outs
Analogous to #14901, treat `GlobalVar` callees as internal function calls in `CodeGenC`. This specific PR doesn't provide new end-to-end functionality, as the `target="c"` backend isn't compiled. It does lead into allowing subroutines in any target whose codegen derives from `CodeGenC`, which will depend on the single-module lowering flow in #14985.
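As a rough sketch of the kind of C output this enables (the function names and signatures below are illustrative, not actual CodeGenC output), a `GlobalVar` callee becomes an ordinary function defined in the same module and called directly:

```c
#include <stdint.h>

/* Illustrative subroutine emitted in the same translation unit as its caller,
 * rather than being declared as an external symbol. */
static int32_t scale_buffer(float* buf, int32_t n, float factor) {
    for (int32_t i = 0; i < n; ++i) {
        buf[i] *= factor;
    }
    return 0;
}

/* The caller invokes the subroutine with an ordinary C call. */
int32_t main_func(float* buf, int32_t n) {
    return scale_buffer(buf, n, 2.0f);
}
```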