[RFC] Unified device/target/memory scope planning #38

Closed
`.gitignore` (1 addition, 0 deletions):

.idea/

`rfcs/00xx-improved-multi-target-handling.md` (176 additions, 0 deletions):

- Feature Name: improved-multi-target-handling
- Start Date: 2021-09-20
- RFC PR: [apache/tvm-rfcs#0000](https://github.com/apache/tvm-rfcs/pull/0000)
- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)

# Summary
[summary]: #summary

TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated on more than
one device (GPU, CPU, accelerator, etc). For the non-BYOC flow this works as follows:
1. Relay programs may contain "on_device" annotations which specify that a sub-expression's result should
reside on a device with a given `DLDeviceType` (kDLCPU, kDLCUDA, etc).
2. The device planning pass uses those annotations to decide on the unique device for every Relay sub-expression,
including every primitive operator call. Sub-expressions which are unconstrained are assigned to the 'default'
device. The pass then inserts "device_copy" operators whenever tensors need to cross device boundaries.
3. The user/driver must also supply a list of `Target` objects. The compiler uses that list to build a `TargetMap`
from each `Target`'s `DLDeviceType` to that `Target`.
4. Each call to a primitive operator for a particular `DLDeviceType` signals that we need to compile ('lower') that
primitive for that device. The `Target` to use for that compilation is found from the `TargetMap`.
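Steps 3 and 4 can be made concrete with a minimal, self-contained C++ sketch of the current scheme. It does not use TVM's real classes. `DeviceType` stands in for `DLDeviceType` and reuses dlpack's values for `kDLCPU` and `kDLCUDA`. Targets are plain strings, each paired with its device type. `TargetMap` here is only a simplified model named after the real one. `BuildTargetMap` and `TargetForCall` are hypothetical helpers. Because the map is keyed by device type alone, a second CPU target silently replaces the first, which is problem 1 below.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for dlpack's DLDeviceType; values match kDLCPU and kDLCUDA.
enum DeviceType { kCPU = 1, kCUDA = 2 };

// Hypothetical model of today's TargetMap: at most one target per device type.
using TargetMap = std::map<DeviceType, std::string>;

// Step 3: build the map from the user-supplied targets, each paired with its
// device type. A later target for the same device type overwrites an earlier one.
TargetMap BuildTargetMap(
    const std::vector<std::pair<DeviceType, std::string>>& targets) {
  TargetMap map;
  for (const auto& entry : targets) {
    map[entry.first] = entry.second;
  }
  return map;
}

// Step 4: a primitive call that device planning placed on `device_type` is
// lowered with whichever target the map holds for that device type.
std::string TargetForCall(const TargetMap& map, DeviceType device_type) {
  auto it = map.find(device_type);
  if (it == map.end()) {
    throw std::out_of_range("no target supplied for device type");
  }
  return it->second;
}
```

Suppose both a big-core and a little-core `llvm` target are supplied. Only the last one survives in the map, so every CPU primitive is lowered for the little core.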

This approach has 5 problems:
1. TVM is being targeted at environments with multiple CPUs (eg Arm 'big.LITTLE') and multiple tensor-friendly
devices (eg a GPU as well as an accelerator such as Arm 'Ethos-U'). This means a `DLDeviceType` no longer
uniquely determines a `Target`.
2. Though TVM's `Device` abstraction (an alias for `dlpack`'s `DLDevice`) is a pair of a `DLDeviceType` and an
arbitrary 'device id', TVM does not consistently plumb the device id through annotations, passes and operators.
Thus currently we cannot use 'device id' to distinguish, eg, two CPUs in the same system.
3. The codebase still uses an older `target` and `target_host` convention for distinguishing the main `Target` for
primitive operators from the `Target` for residual tensor computation, shape computation, and (for AOT) the
overall Relay control-flow. There's a fair bit of 'target normalization' scattered throughout the codebase to
deal with these different conventions.
4. `Target`s are often manufactured on-the-fly (eg to represent the default 'CPU' target on which shape computations
should be hosted). However there's no guarantee those default `Target`s will match up with the user-supplied
`Target`s, so it's possible to end up with `"llvm"` and `"llvm -m ..."` `Target`s coexisting. Now that
`IRModule` uses `Target` objects themselves to distinguish which `PrimFunc`s are intended for which targets,
it is particularly important to ensure there's a single source of truth for available `Target`s.
5. TVM also supports a 'BYOC' extension mechanism. This allows `"target.<target name>"` annotations to be placed on
primitive operations to indicate they should possibly be compiled with the matching BYOC toolchain. A target
annotation pass uses those annotations to decide on a target name for every Relay sub-expression. A partition graph
pass then inserts function call boundaries whenever execution needs to cross target boundaries. However this
machinery is separate from and incompatible with the "on_device" mechanism, and 'target names' are a separate
concept from `Target` objects.

In this RFC we tackle problems 1-4. We won't directly take on 5 since it involves more moving parts, but our hope
is for this RFC to clear the way to taking on 5 in the future.

Our proposal is:
1. Extend `Target` to have a `DLDeviceType` attribute.
2. Allow `Target` objects to be registered under a globally unique target label. Registration may be 'static' (ie
built into the TVM compiler via another REGISTER macro) or 'dynamic' (ie injected for a particular run of the
compiler, eg as part of `tvmc` command line processing). (This machinery should be reconciled with the existing
CUDA-specific target registration map.)
3. Change the "on_device" call attributes to use a string instead of an integer (ie a `DLDeviceType`). The string
can be of the form `<target label>` or `<target label>:<device id>`. The former simply implies a device id of 0.
4. Rework device planning to use a pair of `Target` and 'device id' instead of `DLDeviceType`:
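   ```c++
   #include <tvm/target/target.h>

   // Pairs a Target with the id of the device it runs on.
   class TargetDevice {
    public:
     tvm::Target target;
     int device_id = 0;  // a "<target label>" with no ":<device id>" implies 0
   };
   ```
   (We could also use a `Device` and accept the redundant `DLDeviceType` specification, since item 1 already gives
   every `Target` a `DLDeviceType`.) It is trivial to go from an "on_device" label to a `TargetDevice` and back using
   the global `Target` registry.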
5. Remove all uses of `TargetMap`. For example, in `LowerTEPass` we simply use the `TargetDevice` associated with
every primitive operator call already found by device planning.

   > **Contributor:** Do you propose any replacement in case we do need a map-like struct? `Map<target_label, Target>`?
   >
   > **Contributor Author:** I've removed this from the RFC. In #9313 you'll see I kept `TargetMap` but introduced a
   > helper class to hide it. Over time I think we can replace `TargetMap` with just `Array`, but I feel it's not
   > worth getting specific about that in an RFC and is more just a cleanup task. It may well come out of @Mousius'
   > work on `tvmc` target specification cleanup.

6. Bind two `TargetDevice`s as attributes on every `IRModule`:
   - The default for primitive operators not otherwise constrained by "on_device" annotations.
   - The default for non-primitive operators, such as Relay control flow and shape computation.
7. Remove the various copies of target/target_host reconciliation, `TargetMap` construction and
'default/fallback' device calculation from the codebase.
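Items 2 to 4 can be illustrated with a self-contained C++ sketch. It makes several assumptions and is not TVM code. `TargetStub` stands in for `Target` and carries the item 1 `DLDeviceType` attribute as a plain int. `TargetRegistry`, `ParseOnDeviceLabel` and `ToOnDeviceLabel` are hypothetical names. Labels are assumed to contain no `:`. Only the `<target label>` and `<target label>:<device id>` forms come from the proposal.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Target, extended with a device type (item 1).
struct TargetStub {
  std::string spec;  // e.g. "llvm -mcpu=cortex-a55"
  int device_type;   // DLDeviceType value, e.g. 1 for kDLCPU
};

// Pairs a registered target with a device id (item 4).
struct TargetDevice {
  std::string label;  // the registry key the target was found under
  TargetStub target;
  int device_id = 0;
};

// Hypothetical global registry of targets keyed by a unique label (item 2).
class TargetRegistry {
 public:
  void Register(const std::string& label, const TargetStub& target) {
    if (!targets_.emplace(label, target).second) {
      throw std::invalid_argument("duplicate target label: " + label);
    }
  }

  // Unregistered labels are an error: no Target is manufactured on the fly.
  const TargetStub& Lookup(const std::string& label) const {
    auto it = targets_.find(label);
    if (it == targets_.end()) {
      throw std::invalid_argument("unregistered target label: " + label);
    }
    return it->second;
  }

 private:
  std::map<std::string, TargetStub> targets_;
};

// Parses an "on_device" string "<label>" or "<label>:<device id>" (item 3).
TargetDevice ParseOnDeviceLabel(const TargetRegistry& registry,
                                const std::string& text) {
  TargetDevice result;
  std::string::size_type colon = text.rfind(':');
  result.label = colon == std::string::npos ? text : text.substr(0, colon);
  result.device_id =
      colon == std::string::npos ? 0 : std::stoi(text.substr(colon + 1));
  result.target = registry.Lookup(result.label);
  return result;
}

// Maps a TargetDevice back to its label; device id 0 is left implicit.
std::string ToOnDeviceLabel(const TargetDevice& device) {
  return device.device_id == 0
             ? device.label
             : device.label + ":" + std::to_string(device.device_id);
}
```

In this sketch the label is the registry key, so converting between labels and `TargetDevice`s always resolves to an already-registered target. A Big.LITTLE system could register, say, `cpu_big` and `cpu_little` as two distinct labels even though both share the CPU device type.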

This proposal tackles the original problems:
1. There's now no ambiguity about `Target`s since we propagate them from the global registry directly.
2. We support device ids, which are carried through "on_device" annotations and device planning.
3. We always know the `Target` for every sub-expression and don't need to pass around the `target` and
`target_host` separately.
4. `Target`s are never created on the fly; they are first registered and then propagated.
5. The global registration implied by the existing BYOC target names is now more similar to how the mainline
`Target`s are handled.
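Proposal items 5 and 6 are why problems 3 and 4 go away. A hypothetical sketch of that follows: a module carries its two default `TargetDevice`s, and every sub-expression resolves to exactly one `TargetDevice`. `ModuleDefaults`, `Expr` and `PlanDevice` are invented here and are far simpler than Relay's real device planning. Targets are shown as spec strings.

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for the proposal's TargetDevice.
struct TargetDevice {
  std::string target;  // a registered target, shown here by its spec string
  int device_id;
};

// The two defaults item 6 proposes binding on every IRModule.
struct ModuleDefaults {
  TargetDevice primitive;      // primitive operators with no "on_device" constraint
  TargetDevice non_primitive;  // e.g. Relay control flow and shape computation
};

// A toy sub-expression: whether it is a primitive call, plus any constraint.
struct Expr {
  bool is_primitive;
  const TargetDevice* on_device;  // nullptr when unconstrained
};

// An explicit "on_device" constraint wins; otherwise the matching module
// default applies. Either way the result is a single, known TargetDevice.
TargetDevice PlanDevice(const ModuleDefaults& defaults, const Expr& expr) {
  if (expr.on_device != nullptr) {
    return *expr.on_device;
  }
  return expr.is_primitive ? defaults.primitive : defaults.non_primitive;
}
```

Lowering a primitive call can then read `PlanDevice(...).target` directly. No `TargetMap` sits in between, and no separate `target`/`target_host` pair is needed.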


-------- rest still in template form --------

# Motivation
[motivation]: #motivation

Why are we doing this? What use cases does it support? What is the expected outcome?

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Explain the proposal as if it was already included in the language and you were teaching it to a TVM user.

That generally means:

- Introducing new named concepts.
- Explaining what the feature enables (hint: think in terms of examples).
- If applicable, provide sample error messages, deprecation warnings, or migration guidance.

For internal RFCs (e.g. for compiler internals), this section should focus on how core contributors should
think about the change, and give examples of its concrete impact.

For policy RFCs, this section should provide an example-driven introduction to the policy,
and explain its impact in concrete terms.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

This is the technical portion of the RFC. Explain the design in sufficient detail that:

- Its interaction with other features is clear.
- It is reasonably clear how the feature would be implemented.
- Corner cases are dissected by example.

The section should return to the examples given in the previous section,
and explain more fully how the detailed proposal makes those examples work.

# Drawbacks
[drawbacks]: #drawbacks

Why should we *not* do this?

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- Why is this design the best in the space of possible designs?
- What other designs have been considered and what is the rationale for not choosing them?
- What is the impact of not doing this?

# Prior art
[prior-art]: #prior-art

Discuss prior art, both the good and the bad, in relation to this proposal.
A few examples of what this can include are:

- Does this feature exist in other ML compilers or languages, and what experience has their community had with it?
- For community proposals: Is this done by some other community and what were their experiences with it?
- For other teams: What lessons can we learn from what other communities have done here?
- Papers: Are there any published papers or great posts that discuss this?
If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.

If there is no prior art, that is fine - your ideas are interesting to us whether they are
brand new or if it is an adaptation from other languages.

Note that while precedent set by other languages is some motivation, it does not on its own motivate an RFC.
Please also take into consideration that TVM intentionally diverges from other compilers.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- What parts of the design do you expect to resolve through the RFC process before this gets merged?
- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
- What related issues do you consider out of scope for this RFC that could be addressed in the future
independently of the solution that comes out of this RFC?

# Future possibilities
[future-possibilities]: #future-possibilities

Think about what the natural extension and evolution of your proposal would
be and how it would affect the language and project as a whole in a holistic
way. Try to use this section as a tool to more fully consider all possible
interactions with the project and language in your proposal.
Also consider how this all fits into the roadmap for the project
and of the relevant sub-team.

This is also a good place to "dump ideas", if they are out of scope for the
RFC you are writing but otherwise related.

If you have tried and cannot think of any future possibilities,
you may simply state that you cannot think of anything.

Note that having something written down in the future-possibilities section
is not a reason to accept the current or a future RFC; such notes should be
in the section on motivation or rationale in this or subsequent RFCs.
The section merely provides additional information.