Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Driver] Single-module lowering flow in driver_api.cc #14985

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Lunderberg
Copy link
Contributor

Prior to this commit, a build that used multiple targets needed to provide tvm::build with a Map<Target, IRModule> specifying which target should be used to compile each IRModule. As a result, lowering passes could not introduce new targets based on a PrimFunc's content (e.g. a with T.target() frame to delegate out to another device), nor simplify based on cross-device subroutines (e.g. simplify a host-side conditional based on the known output of a device-side internal subroutine).

This commit makes the tvm::attr::kTarget attribute ("target") be the single source of truth for where a PrimFunc will be executed. Other existing methods for specifying the target (the target parameter for tvm.build, the keys in a Map<Target,IRModule>, the parameter to the pass tir::transform::BindTarget) are still accepted as inputs, and may provide a default value for tvm::attr::kTarget if the attribute is missing, but may not overwrite the target attribute.

This is part of a series of commits to simplify the handling of multi-target builds.

@tvm-bot
Copy link
Collaborator

tvm-bot commented May 30, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: driver See #10317 for details

Generated by tvm-bot

@Lunderberg
Copy link
Contributor Author

This PR is a subset of the changes introduced in #14862, split out for ease of testing and reviewing. It is currently a draft PR, as it depends on changes made in #14944, and will be rebased onto main after #14944 lands.

@Lunderberg Lunderberg force-pushed the single_module_during_build branch from e459552 to e7bfa2d Compare May 30, 2023 17:53
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request May 30, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from af1628f to 870ba06 Compare June 3, 2023 19:49
@Lunderberg Lunderberg marked this pull request as ready for review June 3, 2023 19:49
@Lunderberg
Copy link
Contributor Author

Rebased onto main now that #14944 has landed, marked as ready for review.

@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 870ba06 to 5286508 Compare June 5, 2023 03:03
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 5286508 to 3a3329f Compare June 6, 2023 14:22
@Lunderberg Lunderberg marked this pull request as draft June 6, 2023 14:23
@Lunderberg
Copy link
Contributor Author

Converting to draft, as the fixes for the last few CI failures depend on changes in #14986, and so this PR should wait until #14986 has landed.

@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 3a3329f to 4c2dd50 Compare June 10, 2023 23:22
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jun 10, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
@Lunderberg Lunderberg force-pushed the single_module_during_build branch 5 times, most recently from 99e7073 to 03c39ef Compare June 13, 2023 20:46
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 1d8454e to cc7d384 Compare June 16, 2023 19:40
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jun 16, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jun 16, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
@Lunderberg Lunderberg marked this pull request as ready for review June 16, 2023 21:20
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jun 21, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 2c850d6 to 93fde73 Compare June 21, 2023 22:09
@Lunderberg Lunderberg force-pushed the single_module_during_build branch 2 times, most recently from 27153ac to f20f78c Compare July 3, 2023 16:28
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jul 3, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from f20f78c to b50ccd5 Compare July 4, 2023 14:31
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from af2964d to 2296be8 Compare July 6, 2023 13:11
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jul 6, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jul 7, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 2b7c0af to 3e493d5 Compare July 12, 2023 14:05
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Jul 12, 2023
Prior to this commit, if device initialization is required, the AOT
main function produced a `call_extern()` that included the device
context as input.  This commit updates the AOT main function to
provide the device context only if the function being called accepts a
device context as input.

If an extra device context argument is included at the call site, the
C codegen would produce a function signature that includes the device
context for the caller's compilation unit, but a signature without the
device context for the callee's compilation unit.  While this can
compile and run in some cases, it is undefined behavior for the
signature to vary between compilation units, and should be avoided.

This was initially discovered while debugging
apache#14985, in which changes to the
lowering flow resulted in the caller and callee being within the same
compilation unit.
tqchen pushed a commit that referenced this pull request Jul 18, 2023
Prior to this commit, if device initialization is required, the AOT
main function produced a `call_extern()` that included the device
context as input.  This commit updates the AOT main function to
provide the device context only if the function being called accepts a
device context as input.

If an extra device context argument is included at the call site, the
C codegen would produce a function signature that includes the device
context for the caller's compilation unit, but a signature without the
device context for the callee's compilation unit.  While this can
compile and run in some cases, it is undefined behavior for the
signature to vary between compilation units, and should be avoided.

This was initially discovered while debugging
#14985, in which changes to the
lowering flow resulted in the caller and callee being within the same
compilation unit.
junrushao pushed a commit to junrushao/tvm that referenced this pull request Jul 24, 2023
Prior to this commit, if device initialization is required, the AOT
main function produced a `call_extern()` that included the device
context as input.  This commit updates the AOT main function to
provide the device context only if the function being called accepts a
device context as input.

If an extra device context argument is included at the call site, the
C codegen would produce a function signature that includes the device
context for the caller's compilation unit, but a signature without the
device context for the callee's compilation unit.  While this can
compile and run in some cases, it is undefined behavior for the
signature to vary between compilation units, and should be avoided.

This was initially discovered while debugging
apache#14985, in which changes to the
lowering flow resulted in the caller and callee being within the same
compilation unit.
junrushao pushed a commit to junrushao/tvm that referenced this pull request Jul 27, 2023
Prior to this commit, if device initialization is required, the AOT
main function produced a `call_extern()` that included the device
context as input.  This commit updates the AOT main function to
provide the device context only if the function being called accepts a
device context as input.

If an extra device context argument is included at the call site, the
C codegen would produce a function signature that includes the device
context for the caller's compilation unit, but a signature without the
device context for the callee's compilation unit.  While this can
compile and run in some cases, it is undefined behavior for the
signature to vary between compilation units, and should be avoided.

This was initially discovered while debugging
apache#14985, in which changes to the
lowering flow resulted in the caller and callee being within the same
compilation unit.
junrushao pushed a commit to junrushao/tvm that referenced this pull request Jul 30, 2023
Prior to this commit, if device initialization is required, the AOT
main function produced a `call_extern()` that included the device
context as input.  This commit updates the AOT main function to
provide the device context only if the function being called accepts a
device context as input.

If an extra device context argument is included at the call site, the
C codegen would produce a function signature that includes the device
context for the caller's compilation unit, but a signature without the
device context for the callee's compilation unit.  While this can
compile and run in some cases, it is undefined behavior for the
signature to vary between compilation units, and should be avoided.

This was initially discovered while debugging
apache#14985, in which changes to the
lowering flow resulted in the caller and callee being within the same
compilation unit.
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 3e493d5 to 5424a49 Compare July 31, 2023 18:17
csullivan pushed a commit that referenced this pull request Aug 8, 2023
Analogous to #14901, treat GlobalVar callees as internal function calls in CodeGenC. This specific PR doesn't provide new end-to-end functionality, as the target="c" backend isn't compiled. It does lead into allowing subroutines in any target whose codegen derives from CodeGenC, which will depend on the single-module lowering flow in #14985.

* [CodeGenC] Added unit tests for desired behavior

* [CodeGenC] Handle GlobalVar callee as internal function call

* Update CodeGenC subclasses for updated interface

- Call `DeclareFunction` for each `PrimFunc`, prior to any
  `AddFunction` calls

- Provide both `GlobalVar` and `PrimFunc` to `AddFunction` calls.

* Updated CRT test to expect forward declaration

* Provide forward declarations for call_extern in cmsis

* Avoid duplicate forward declaration

C's automatic pointer cast (e.g. `void*` to `int*`) means that use of
the arguments to infer the function signature may be incorrect.  If a
`call_extern` refers to a function within the same module, only output
a single forward declaration based on the PrimFunc's parameters, not
based on the CallNode's arguments.

* Updated expected ptx cuda

* Cast the AOT pools to the arg type

* Improved tvm::GetType for tvm_access_ptr and address_of

These `Call` instances can return a
`PointerType(PrimType(pointee_dtype))` rather than a
`PrimType(DataType::Handle())`.

* [ARM][Topi] Update micro kernels to use same argument type as caller

Previously, the micro kernels for gemm, avg_pool, max_pool, and
tensordot relied on C's implicit type conversions for the arguments,
when the caller's argument types differ from the signature's parameter
types.  This works, except when the codegen has auto-generated a
forward declaration based on the caller's argument types, such as
during AOT, which then causes a conflicting definition.

Since the codegen cannot determine the functions names from the
`"pragma_import_c"` in order to suppress these forward declarations,
this conflict can be more easily resolved by updating the micro kernel
signatures.  The three types of mismatches are below.

- Use of `int` or `long` parameters, whose width may vary by compiler,
  instead of fixed-width types.

- TIR expecting the data array's integer type to also be used as an
  error code's return type, rather than the micro kernels' `int32_t`
  error code.

- Pointer conversion done during argument conversion.

Type conversions are done at the start of each micro kernel, to avoid
changing types that are used within the computational sections of each
micro kernel.

* Updated unit tests with private=True

Required for internal functions after PR #15214

* Docstring updates from review
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 5424a49 to 85b1d5e Compare August 8, 2023 21:28
Lunderberg added a commit to Lunderberg/tvm that referenced this pull request Aug 8, 2023
The functionality tested in this commit was added across several
recent PRs, each of which tested their features in isolation.  This PR
adds unit tests to validate the end-to-end behavior of TIR subroutine
calls.

PRs building up to this point:

- TVMScript
  - apache#14889
  - apache#14915
  - apache#14919
  - apache#14941

- Functionality improvements of existing TIR passes
  - apache#14913
  - apache#14914
  - apache#14918
  - apache#14951

- Changes to the TIR lowering flow
  - apache#14942
  - apache#14985

- Codegen updates
  - apache#14958
  - apache#14901

- Compatibility updates/fixes
  - apache#14892
  - apache#14950
  - apache#14943
  - apache#14944
  - apache#14945
  - apache#14952
  - apache#14982
  - apache#14949
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 85b1d5e to 1124239 Compare January 3, 2024 17:31
@Lunderberg
Copy link
Contributor Author

With #16184 landed, this PR should (hopefully) be able to land without requiring all low-level codegen to directly support function calls. For codegens that do not yet support function calls, device-side kernels can have subroutines inlined to avoid a recurrence of #16033.

Currently, this PR is rebased on top of main, but does not use of InlinePrivateFunctions. I expect it to fail in CI for Metal codegen, and to have the same ethos-u failure modes as previously. These should be resolvable by applying InlinePrivateFunctions, conditional on the target being used.

@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 1124239 to 38e3ac9 Compare February 16, 2024 03:08
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 38e3ac9 to 485bd3e Compare March 25, 2024 20:00
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 485bd3e to 136a459 Compare April 5, 2024 00:11
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from b3ff7c4 to 04ccfef Compare June 10, 2024 19:13
Prior to this commit, a build that used multiple targets needed to
provide `tvm::build` with a `Map<Target, IRModule>` specifying which
target should be used to compile each `IRModule`.  As a result,
lowering passes could not introduce new targets based on a PrimFunc's
content (e.g. a `with T.target()` frame to delegate out to another
device), nor simplify based on cross-device subroutines (e.g. simplify
a host-side conditional based on the known output of a device-side
internal subroutine).

This commit makes the `tvm::attr::kTarget` attribute (`"target"`) be
the single source of truth for where a `PrimFunc` will be executed.
Other existing methods for specifying the target (the `target`
parameter for `tvm.build`, the keys in a `Map<Target,IRModule>`, the
parameter to the pass `tir::transform::BindTarget`) are still accepted
as inputs, and may provide a default value for `tvm::attr::kTarget` if
the attribute is missing, but may not overwrite the target attribute.

This is part of a series of commits to simplify the handling of
multi-target builds.
@Lunderberg Lunderberg force-pushed the single_module_during_build branch from 04ccfef to e7bebaf Compare September 11, 2024 15:37
@Lunderberg
Copy link
Contributor Author

The failing unit tests here are blocked by #15757. The InlinePrivateFunctions transform requires all buffers to be declared before use. However, the DeclBuffer nodes are currently stripped out during FlattenBuffer, and are no longer available when required after MakePackedAPI.

If a flattened buffer is produced for use in `BufferLoad` and
`BufferStore` statements, generate a `DeclBuffer`.

This is a subset of the changes made in
apache#14778, broken out for ease of
testing and review.
When producing a flattened buffer for use in `BufferLoad` and
`BufferStore` nodes, generate a `DeclBuffer` for the flattened buffer.

This is a subset of the changes made in
apache#14778, broken out for ease of
testing and review.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants