[CodeGenC] Handle GlobalVar callee as internal function call #15103
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
Force-pushed from 3860422 to 2a9ddc3
Prior to this commit, the tests in `test_tir_transform_inject_ptx_async_copy.py` registered the `"tvm_callback_cuda_postproc"` function at pytest collection time, and used a global variable to disable its functionality outside of the tests in this file. This had two major issues. First, if any other test also installed a postproc function, the postproc function required by the NVPTX tests would be overwritten. Second, if one of the NVPTX tests failed, the global variable controlling the postproc function would not be reset, causing any subsequent CUDA-related tests to also fail. This commit updates these NVPTX tests to conditionally install the postproc function, to de-register it after the test instead of disabling its functionality, and to de-register it regardless of the test result. This issue was initially found when debugging apache#15103, when a failure in `test_tir_transform_inject_ptx_async_copy.py::test_cp_async_in_if_then_else` caused failures in 32 unrelated tests ([CI link](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-15103/7/tests)).
Force-pushed from 4fd4497 to 82f0a26
Force-pushed from 4852ca2 to 60da49a
- Call `DeclareFunction` for each `PrimFunc`, prior to any `AddFunction` calls
- Provide both `GlobalVar` and `PrimFunc` to `AddFunction` calls
C's implicit pointer conversions (e.g. `void*` to `int*`) mean that using a call's arguments to infer the function signature may produce an incorrect declaration. If a `call_extern` refers to a function within the same module, output only a single forward declaration based on the PrimFunc's parameters, not on the CallNode's arguments.
These `Call` instances can return a `PointerType(PrimType(pointee_dtype))` rather than a `PrimType(DataType::Handle())`.
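As a minimal illustration of the forward-declaration conflict described above (the function name, parameters, and body are hypothetical, not actual TVM codegen output), C treats a declaration inferred from `void*` call-site arguments as incompatible with the real definition:

```c
/* Hypothetical forward declaration, inferred from a call site whose
 * arguments are an opaque void* handle and an int. */
void compute_kernel(void* data, int n);

/* Actual definition.  Although C would implicitly convert a void* argument
 * to int* at the call site, the two declarations have conflicting types,
 * so the compiler rejects the translation unit. */
void compute_kernel(int* data, int n) {
    for (int i = 0; i < n; ++i) {
        data[i] = i;
    }
}
```

Emitting the single forward declaration from the PrimFunc's own parameters, rather than from each call site, avoids this mismatch.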
Force-pushed from 33bc277 to 02e9ddf
Previously, the micro kernels for gemm, avg_pool, max_pool, and tensordot relied on C's implicit type conversions for their arguments when the caller's argument types differed from the signature's parameter types. This works, except when the codegen has auto-generated a forward declaration based on the caller's argument types, such as during AOT, which then causes a conflicting definition. Since the codegen cannot determine the function names from the `"pragma_import_c"` in order to suppress these forward declarations, this conflict is more easily resolved by updating the micro kernel signatures. The three types of mismatches are:

- Use of `int` or `long` parameters, whose width may vary by compiler, instead of fixed-width types.
- TIR expecting the data array's integer type to also be used as the error code's return type, rather than the micro kernels' `int32_t` error code.
- Pointer conversion done during argument conversion.

Type conversions are done at the start of each micro kernel, to avoid changing types that are used within the computational sections of each micro kernel.
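A hedged sketch of the signature change described above (the kernel name, parameters, and loop are illustrative, not the actual gemm/avg_pool/max_pool/tensordot sources): fixed-width parameter and return types match the auto-generated forward declaration, and any narrowing happens once at the top of the kernel.

```c
#include <stdint.h>

/* Illustrative micro-kernel signature: int32_t parameters and an int32_t
 * error-code return type, matching what the codegen would forward-declare. */
int32_t example_max_pool(int32_t output_size, int8_t* output, int8_t value) {
    /* Convert once here, so the types used inside the computational section
     * below do not need to change. */
    int size = (int)output_size;

    for (int i = 0; i < size; ++i) {
        if (output[i] < value) {
            output[i] = value;
        }
    }
    return 0; /* error code */
}
```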
Force-pushed from 02e9ddf to 2bf2313
Required for internal functions after PR apache#15214
Approved with only a few questions and nitpicks.
[TEST] Run tests in tests/python/relay/aot in `ci-cortexm`
The tests in `tests/python/relay/aot` were run as part of task_python_integration.sh, which is run for `ci-cpu` and `ci-arm`, but some of the tests in that folder require CMSIS-NN or FVP, so in practice they were not running anywhere in the upstream CI. Since all the tests in that folder can run on `ci-cortexm`, change the scripts to run them there. Also fix some tests that started failing as a result of #15103 (but didn't manifest in upstream CI) and `test_device_api_hooks_unpacked_api`, which clearly hasn't been running in any CI for a while.
Hello @Lunderberg, unfortunately this PR/commit completely breaks our Metal codegen. You can reproduce this issue by following the steps below on a Mac:
Then you will see the following error:
It says that the generated Metal code cannot be parsed. By reverting this commit, I am able to get this test to run (though it will fail on numerical comparison due to a precision issue, which is a non-issue here). Could you make a fix for this issue? I would appreciate it if the problem could be resolved shortly. Thank you!
We might need some basic testing for Metal to prevent this issue from happening... For now, to quickly ensure that MLC LLM is not impacted, @MasterJH5574 shall we submit a PR to temporarily revert this commit? Thanks a lot!
…pache#15103)" This reverts commit 9ff71f4.
Prior to this commit, the CI compiled TVM with `USE_METAL=ON` on OSX, as defined in `conda/recipe/build.sh`, but did not validate the execution of any generated metal kernels. As a result, breakage could occur without being caught by the CI, as found following apache#15103. This commit adds the execution of a single metal kernel as a minimal functionality test of the metal backend.
* [Unittest][Metal] Add minimal metal functionality test to CI
* CI testing, attempt a compile-only test case
* CI testing, moved intentional failure from test-case to contrib.xcode
* Move intentional failure point into codegen
* ci bump
* Removing the intentional failure during metallib compilation
This reverts commit [`e88d0d`](apache#15725), which itself reverted [`9ff71f`](apache#15103) for breakages on the metal backend. Now that the CI contains compile-time testing of the metal codegen, the original breakage should be identifiable.
…15835)
* [CodeGenC][Redo] Handle GlobalVar callee as internal function call
* Added codegen metal CI debug print
* Print function decl to the argument stream
* Remove the codegen metal CI debug print-outs
Analogous to #14901, treat `GlobalVar` callees as internal function calls in `CodeGenC`. This specific PR doesn't provide new end-to-end functionality, as the `target="c"` backend isn't compiled. It does lead into allowing subroutines in any target whose codegen derives from `CodeGenC`, which will depend on the single-module lowering flow in #14985.
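As a rough sketch of the kind of C output this enables (the function names and signatures below are illustrative, not actual CodeGenC output), a `GlobalVar` callee becomes an ordinary function defined in the same module and called directly:

```c
#include <stdint.h>

/* Illustrative subroutine emitted in the same translation unit as its caller,
 * rather than being declared as an external symbol. */
static int32_t scale_buffer(float* buf, int32_t n, float factor) {
    for (int32_t i = 0; i < n; ++i) {
        buf[i] *= factor;
    }
    return 0;
}

/* The caller invokes the subroutine with an ordinary C call. */
int32_t main_func(float* buf, int32_t n) {
    return scale_buffer(buf, n, 2.0f);
}
```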