[RFC] Unified device/target/memory scope planning #38
Hi @mbs-octoml, I may have put a related comment here: apache/tvm#8892 (comment). Is this RFC intended to cover all of these?
Thanks @manupa-arm for the reminder; there were some good comments on #8892. I see a work stream:
Here I want to focus just on 1; everything beyond that really needs face-to-face discussion. Everything from 2 onward obviously overlaps your USMP. My vague thought was that we can work from opposite ends and reconcile at 5. I.e., 2-4 sets us up to work in a combined Relay+TIR world, then 5 is where everything we've learned from USMP could perhaps be replayed. Anyway, that's just a vague thought, so I'd love to talk more about it.
Note to self: The With convention should probably also be removed by this work, but I've not audited the code to see how pervasive it is. Target already has a
@mbs-octoml thanks for the draft RFC! added some thoughts
(We could also use a `Device` and accept the redundant `DLDeviceType` specification.) It is trivial
to go from an "on_device" label to a `TargetDevice` and back using the global `Target` registry.
5. Remove all uses of `TargetMap`. For example, in `LowerTEPass` we simply use the `TargetDevice` associated with
Do you propose any replacement in case we do need a map-like struct? Map<target_label, Target>?
I've removed this from the RFC. In #9313 you'll see I kept TargetMap but introduced a helper class to hide it. Over time I think we can replace TargetMap with just Array, but I feel it's not worth getting specific about that in an RFC; it is more of a cleanup task. It may well come out of @Mousius's work on tvmc target specification cleanup.
More notes to self:
…g in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices and targets together when doing device planning. I've gone ahead and also included a memory scope in this object since we will also need to propagate memory scopes across Relay expressions once this basic preparation is in place. In the meantime that field will be left as "".

Once device planning works in units of SEScopes it will be possible to directly read off the device and target for any Relay sub-expression without the need for TargetMaps or the construction of default Targets.

SEScopes also support 'Join' and 'Default' operations needed when constraint solving in the device planner. You can see those in use in my scratchpad branch: https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes

This PR also brings some duplicated and ad-hoc 'default target' handling logic together into a CompilationConfig class. (Again, see the scratchpad branch for how that will end up being used.) I've placed that next to SEScope since its main purpose is to
a) establish the default SEScope for primitive ops
b) establish the SEScope for the 'host'
c) feed a definitive vector of Targets into device planning so it can resolve all "on_device" and "device_copy" device references to their full SEScope form.
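To make the 'Join' and 'Default' operations a little more concrete, here is a minimal Python sketch of how they might behave on a scope with three fields. Everything in it is an assumption for illustration: the `Scope` class, the `join` and `default` helpers, the use of `None` or `""` to mean "unconstrained", and the use of a plain string in place of a Target are all hypothetical and are not TVM's actual SEScope API.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for the SEScope described above; not TVM's class.
@dataclass(frozen=True)
class Scope:
    device_type: Optional[int] = None  # None is assumed to mean "unconstrained"
    target: Optional[str] = None       # a name stands in for a Target object
    memory_scope: str = ""             # "" is assumed to mean "unconstrained"


def _join_field(a, b, unconstrained):
    """Return the agreed value, or raise if both sides are constrained and differ."""
    if a == unconstrained:
        return b
    if b == unconstrained or a == b:
        return a
    raise ValueError(f"conflict: {a!r} vs {b!r}")


def join(lhs: Scope, rhs: Scope) -> Optional[Scope]:
    """Merge two partially constrained scopes; None signals a conflict."""
    try:
        return Scope(
            _join_field(lhs.device_type, rhs.device_type, None),
            _join_field(lhs.target, rhs.target, None),
            _join_field(lhs.memory_scope, rhs.memory_scope, ""),
        )
    except ValueError:
        return None


def default(scope: Scope, fallback: Scope) -> Scope:
    """Fill every unconstrained field of `scope` from `fallback`."""
    return Scope(
        scope.device_type if scope.device_type is not None else fallback.device_type,
        scope.target if scope.target is not None else fallback.target,
        scope.memory_scope or fallback.memory_scope,
    )
```

In this sketch a constraint solver would call `join` whenever two sub-expressions must share a scope, and `default` once solving is done to fill in whatever is still unconstrained.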
Thanks for the comments @areusch and @manupa-arm. Now that I've started working on this (with an emphasis on handling memory scopes) I've decided to shift focus a bit. In particular, Manupa, I'm now much more motivated to tackle the BYOC/device planning overlap aspect, which I think you're particularly interested in. PTAL.
For the proposed BYOC flow (i.e., |
[Target] Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning. (#9313)

* Reworked to avoid global SEScopeCache. Realized while working through unit tests in the sequel that it's reasonable for folks to call build multiple times with distinct Target objects, in which case the global cache would grow without bound. So instead placed the cache in the CompilationConfig class. Since that class now has everything the device planner needs to do its job, promoted it to be an FFI-able Object, which is now in compilation_config.{h,cc}. I think we can do much better with CompilationConfig, but for now I am keeping it to the minimum I needed to prepare for device planning from all the executor compilation codepaths.
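The reasoning for moving the cache out of global state can be illustrated with a small Python sketch. It is only a model of the idea, under stated assumptions: `Config`, `ScopeRecord`, and `canonical_scope` are hypothetical names, and nothing here reflects the real CompilationConfig interface. The point is that a cache owned by the config object hands out one canonical scope per distinct key, and is released together with the config instead of growing across repeated builds.

```python
from typing import Dict, List, Tuple


class ScopeRecord:
    """Hypothetical canonical scope object; identity is what the cache preserves."""

    def __init__(self, device_type: int, target: str, memory_scope: str):
        self.device_type = device_type
        self.target = target
        self.memory_scope = memory_scope


class Config:
    """Hypothetical stand-in for CompilationConfig that owns its own scope cache."""

    def __init__(self, targets: List[str]):
        self.targets = list(targets)
        # The cache lives on the instance, so it is freed when the config is.
        self._cache: Dict[Tuple[int, str, str], ScopeRecord] = {}

    def canonical_scope(self, device_type: int, target: str,
                        memory_scope: str = "") -> ScopeRecord:
        """Return the single shared record for this (device, target, memory) key."""
        key = (device_type, target, memory_scope)
        if key not in self._cache:
            self._cache[key] = ScopeRecord(device_type, target, memory_scope)
        return self._cache[key]
```

Two builds with distinct configs then get independent caches, so nothing accumulates between them, while lookups within one build still resolve to the same object.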
[checkpoint] bad rebase [checkpoint] pretty printing fixes [checkpoint] Don't dup devices in executable, more unit tests [checkpoint] woops, left target str debug in Added Target::ToDebugString() so I can see the hosts since they were giving me a lot of trouble. [checkpoint] more pretty printing hackery, interpreter respects host devices Also try harder to integrate the existing target->host mechanism into CompilationConfig. [checkpoint] Almost working again - Unit test setup distinguishes CPU for prims from CPU for host. - Get pretty printing to use the SEScopeNode ReprPrinter. - Allow host and primitive to have same device types. test_dynamic_input failing [checkpoint] rebase [checkpoint] fix merge [checkpoint] lint [checkpoint] rebase [checkpoint] Fixed stray use of kDLCPU in vm/profiler/vm.cc [checkpoint] lint trivia [checkpoint] fix unit tests [checkpoint] device planner unit tests passing again [checkpoint] Switch over to new CompilerOptions [checkpoint] include [checkpoint] Almost working again Need to move the SEScopeCache into CompilationConfig and pass that into DeviceDomains instead of just the Vector<Target>. Then the host_se_scope can be memoized so that direct uses of that scope downstream will match up with se_scopes already established by PlanDevices. Sigh. [checkpoint] Use cache in device domains. [checkpoint] more moves [checkpoint] lints [checkpoint] Fix merge with VM profiling changes. [checkpoint] trivial [checkpoint] rebase fix [checkpoint] More unit tests. Getting ready to fork out SEScope changes alone. [checkpoint] lints [checkpoint] All plan devices unit tests pass [checkpoint] First unit test passes [checkpoint] Another go at target management This at least centralizes all the hackery. Compiles. [commit] Start to rollback resolving to target in planner. Better is to do it as stand alone pass I think. Besides it doesn't work with the structural test for expected output. [checkpoint] Almost have first unit test going. 
About to merge Michalis' changes. target_host is still a mess. Starting to eliminate target_map. [checkpoint] Cleanup VM device matching [checkpoint] Compiles [checkpoint] First sweep replacing DLDeviceType with SEScope VM still not done. [checkpoint] Expose CompilationConfig ctor in py [checkpoint] CompilationConfig is nullable for default ctor [checkpoint] Don't use target:: namespace [checkpoint] Promote CompilationConfig to be FFI-friendly Object Also rework to never mix the host_target into the 'primitive' targets. [checkpoint] ResolveSEScope on CompilationConfig [checkpoint] hash_reduce using target's data ptr [checkpoint] Share FullyUnconstrained [checkpoint] Backtrack on using global memoization for SEScope Realized while working through unit tests in the sequel that it's reasonable for folks to call build multiple times with distinct Target objects, in which case the global cache would grow without bound. I'll instead tackle memoization of SEScopes directly in device_domains.cc. [checkpoint] Improve back compat for homogeneous case If no host target is given but we have a unique target of kDLCPU device type then also use that for the host. Reworked to avoid global SEScopeCache. Realized while working through unit tests in the sequel that it's reasonable for folks to call build multiple times with distinct Target objects, in which case the global cache would grow without bound. So instead placed the cache in the CompilationConfig class. Since that class now has everything the device planner needs to do its job, promoted it to be an FFI-able Object, which is now in compilation_config.{h,cc}. I think we can do much better with CompilationConfig, but for now keeping it to the minimum I needed to prepare for device planning from all the executor compilation codepaths. 
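The two design points in these commit notes (fall back to a unique `kDLCPU` target when no host is given, and keep the scope cache inside `CompilationConfig` rather than in a global) can be sketched in plain Python. This is only an illustrative model under stated assumptions: the `Target` and `CompilationConfig` classes below are hypothetical stand-ins, not TVM's classes, and `canonical_scope` is an invented helper name. The only fact taken from outside the PR is that `kDLCPU` has the value 1 in DLPack.

```python
from typing import Dict, List, Optional, Tuple

K_DL_CPU = 1  # value of kDLCPU in DLPack's DLDeviceType enum


class Target:
    """Hypothetical minimal Target: a kind name plus its device type."""

    def __init__(self, kind: str, device_type: int):
        self.kind = kind
        self.device_type = device_type


class CompilationConfig:
    """Hypothetical model: owns the targets and a per-instance scope cache."""

    def __init__(self, primitive_targets: List[Target],
                 host_target: Optional[Target] = None):
        if not primitive_targets:
            raise ValueError("at least one primitive target is required")
        self.primitive_targets = list(primitive_targets)
        if host_target is None:
            # Back-compat rule from the commit note: a unique CPU target
            # doubles as the host. Otherwise the host stays unset here.
            cpu_targets = [t for t in self.primitive_targets
                           if t.device_type == K_DL_CPU]
            if len(cpu_targets) == 1:
                host_target = cpu_targets[0]
        self.host_target = host_target
        # The cache lives and dies with this config, so repeated builds with
        # distinct Target objects cannot grow a global table without bound.
        self._scope_cache: Dict[Tuple, Tuple] = {}

    def canonical_scope(self, device_type: int, virtual_device_id: int,
                        target: Target, memory_scope: str = "") -> Tuple:
        # Key on the target's identity, echoing "hash_reduce using target's
        # data ptr". The cached tuple keeps the target alive, so the id
        # stays valid for the lifetime of the cache entry.
        key = (device_type, virtual_device_id, id(target), memory_scope)
        scope = (device_type, virtual_device_id, target, memory_scope)
        return self._scope_cache.setdefault(key, scope)
```

Because equal requests return the same cached object, scopes established by the planner and scopes looked up later through the same config compare identical, which is the property the "Sigh" note above is after.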
Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices and targets together when doing device planning. I've gone ahead and also included a memory scope in this object since we will also need to propagate memory scopes across Relay expressions once this basic preparation is in place. In the meantime that field will be left as "".

Once device planning works in units of SEScopes it will be possible to directly read off the device and target for any Relay sub-expression without the need for TargetMaps or the construction of default Targets. SEScopes also support 'Join' and 'Default' operations needed when constraint solving in the device planner. You can see those in use in my scratchpad branch: https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes

This PR also brings some duplicated and ad-hoc 'default target' handling logic together into a CompilationConfig class. (Again, see the scratchpad branch for how that will end up being used.) I've placed that next to SEScope since its main purpose is to a) establish the default SEScope for primitive ops, b) establish the SEScope for the 'host', and c) feed a definitive vector of Targets into device planning so it can resolve all "on_device" and "device_copy" device references to their full SEScope form.
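To make the 'Join' and 'Default' operations concrete, here is a minimal Python sketch. It assumes (this is a reading of the description, not the PR's C++ code) that each field of a scope is either constrained or unconstrained, that joining two scopes fails when both constrain a field differently, and that defaulting fills unconstrained fields from a fallback scope. The `SEScope` dataclass and the `join` and `default` functions are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SEScope:
    """Hypothetical stand-in; None (or "" for memory_scope) means unconstrained."""
    device_type: Optional[int] = None
    virtual_device_id: Optional[int] = None
    target: Optional[str] = None   # a Target object in TVM; a string here
    memory_scope: str = ""


# Each field paired with the value that marks it as unconstrained.
_FIELDS = (
    ("device_type", None),
    ("virtual_device_id", None),
    ("target", None),
    ("memory_scope", ""),
)


def join(lhs: SEScope, rhs: SEScope) -> Optional[SEScope]:
    """Combine the constraints of both scopes; None signals a conflict."""
    merged = {}
    for name, unconstrained in _FIELDS:
        a, b = getattr(lhs, name), getattr(rhs, name)
        if a == unconstrained:
            merged[name] = b
        elif b == unconstrained or a == b:
            merged[name] = a
        else:
            return None  # both sides constrain the field, and they disagree
    return SEScope(**merged)


def default(scope: SEScope, fallback: SEScope) -> SEScope:
    """Fill every unconstrained field of `scope` from `fallback`."""
    merged = {}
    for name, unconstrained in _FIELDS:
        value = getattr(scope, name)
        merged[name] = getattr(fallback, name) if value == unconstrained else value
    return SEScope(**merged)
```

A planner built on these two operations would `join` scopes wherever two sub-expressions must agree, and `default` whatever remains unconstrained once all constraints have been collected.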
@mbs-octoml some comments/clarifications, can you also remove/address the boilerplate?
# Summary
[summary]: #summary

TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated
nit: sequentially is a bit misleading--maybe suggest
TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated
TVM supports 'heterogeneous' execution, whereby primitive operators may be evaluated (in topological order)
should reside on a device with a given `DLDeviceType` (`kDLCPU`, `kDLCUDA`, etc).
2. The `PlanDevices` pass uses those annotations to decide the unique device for every Relay
sub-expression, including every primitive operator call. Sub-expressions which are unconstrained
are assigned to the 'default' device. The pass then inserts `device_copy` operators whenever data
"default" also is called "fallback," right?
sub-expression, including every primitive operator call. Sub-expressions which are unconstrained
are assigned to the 'default' device. The pass then inserts `device_copy` operators whenever data
needs to cross device boundaries.
3. The user must also supply a list of `Target` objects. The compiler uses that list to build
would be good to clarify as they are also required at runtime to the executor ctor
3. The user must also supply a list of `Target` objects. The compiler uses that list to build
3. The user must also supply a list of `Target` objects to `tvm.relay.build`. The compiler uses that list to build
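The behaviour described for `PlanDevices` in the quoted summary (unconstrained sub-expressions go to the default device, and `device_copy` is inserted wherever data crosses a device boundary) can be illustrated with a deliberately simplified Python sketch. It skips the constraint solving the real pass performs and just walks a toy expression tree. The `Expr` class and `plan_devices` function are hypothetical and are not TVM APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Expr:
    """Toy expression node; `device` is set only where annotated."""
    op: str
    args: List["Expr"] = field(default_factory=list)
    device: Optional[str] = None


def plan_devices(expr: Expr, default_device: str) -> Expr:
    """Assign a device to every node and insert copies at device boundaries."""
    planned_args = [plan_devices(arg, default_device) for arg in expr.args]
    # Unannotated (unconstrained) nodes fall back to the default device.
    device = expr.device or default_device
    new_args = []
    for arg in planned_args:
        if arg.device != device:
            # The argument lives elsewhere: copy it onto this node's device.
            arg = Expr("device_copy", [arg], device)
        new_args.append(arg)
    return Expr(expr.op, new_args, device)


# conv2d is pinned to "cuda"; everything else is unconstrained.
program = Expr("add", [Expr("conv2d", [Expr("x")], device="cuda"), Expr("y")])
planned = plan_devices(program, default_device="cpu")
```

In this toy version an annotation pins the node itself, which is coarser than the result-only constraint that `on_device` expresses in the RFC text.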
TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated
on more than one device (GPU, CPU, accelerator, etc). For the non-BYOC flow this works as follows:
1. Relay programs may contain `on_device` annotations which specify that a sub-expression's result
so is this constraining only the output of a particular subgraph (e.g. the subgraph can be actually implemented on a different device so long as a memory copy is done?)
in-tree.)
3. The `AnnotateTarget` pass looks for the annotations from (1) and (2) to decide the unique
toolchain name for every Relay sub-expression which should go via a BYOC path. The transitions in
to and out of those sub-expressions are marked with `compiler_begin` and `compiler_end`
just curious, because i've seen compiler_begin and compiler_end before but not many examples in complex programs: are these essentially a source-level annotation e.g. marking all Relay expressions between the two annotations as offloaded to a particular compiler? why shouldn't these be hierarchical e.g. CompilerBlock which contains the subgraph as a tree?
```
class SEScope {
  DLDeviceType device_type;
  int virtual_device_id;
```
i think this should be a String name which makes sense to the user. Doing this is helpful for a couple other reasons besides the compilation UI:
- In generated source code, it's possible to refer to the device by name. In particular, the embedded C API would like to have this for the conglomerate tvm_device_t struct.
- In systems with multiple e.g. CPUs, using an index here then implies some ordering (e.g. littlest CPU to biggest). It's better to make the assignment of ID to CPU capability more explicit
Finally, using a name would simplify the heterogeneous Target.
However, this is a bit of a lift. I do feel strongly we should get to this world. If it's not something that makes sense to do now, we could also revisit after or concurrent with USMP.
`PlanDevices`. In particular, any `SEScope` encountered during device planning is 'canonicalized' to fill
in a `Target` by the same lookup as we do today. This means we continue to support the easy shorthand of
referring to devices by the `DLDeviceType` alone. However, advanced users can supply a `SEScope` to these
operators which contains the exact `Target` to use.
what would be roughly the deprecation plan here? eventually we ban all the inputs to the compiler which could refer to SEScope in terms of DLDeviceType and then tighten the typing requirements here? this would be a backwards-incompatible Relay change. cc @jroesch
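For readers following this thread, the 'canonicalization' step quoted above can be sketched as follows. The sketch assumes the current one-`Target`-per-`DLDeviceType` lookup; `Scope`, `canonicalize`, and the `targets_by_device_type` mapping are hypothetical names used only for illustration, and targets are modelled as plain strings.

```python
from typing import Dict, NamedTuple, Optional


class Scope(NamedTuple):
    """Hypothetical stand-in for SEScope; target None means not yet resolved."""
    device_type: int
    virtual_device_id: int = 0
    target: Optional[str] = None
    memory_scope: str = ""


def canonicalize(scope: Scope, targets_by_device_type: Dict[int, str]) -> Scope:
    """Fill in the target from the device type unless one was given explicitly."""
    if scope.target is not None:
        # Advanced usage: the annotation already names the exact target.
        return scope
    if scope.device_type not in targets_by_device_type:
        raise ValueError(f"no target for device type {scope.device_type}")
    return scope._replace(target=targets_by_device_type[scope.device_type])


# Device type 1 is kDLCPU and 2 is kDLCUDA in DLPack.
targets = {1: "llvm", 2: "cuda"}
shorthand = canonicalize(Scope(device_type=2), targets)
explicit = canonicalize(Scope(device_type=2, target="cuda -arch=sm_80"), targets)
```

Under this reading, the deprecation question amounts to whether the first branch eventually becomes the only accepted form.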
tir::PrimFunc.buffer_map -> tir::Buffer.data -> tir::Var.type_annotation -> PointerType.storage_scope -> String
```

to discover the memory scope for each Relay argument. That scope will enter `SEScope`s and flow through the
what do you mean by "enter `SEScope`s"?
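The lookup chain in the quoted hunk (`buffer_map`, then `data`, then `type_annotation`, then `storage_scope`) can be mirrored with a few Python dataclasses. These are hypothetical stand-ins that copy only the attribute names from the chain. In TVM the `buffer_map` is keyed by parameter variables rather than by name, so the string keys here are a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PointerType:
    storage_scope: str


@dataclass(frozen=True)
class Var:
    name: str
    type_annotation: PointerType


@dataclass(frozen=True)
class Buffer:
    data: Var


@dataclass
class PrimFunc:
    params: List[str]               # parameter names (simplified)
    buffer_map: Dict[str, Buffer]   # parameter name -> bound buffer


def param_memory_scopes(func: PrimFunc) -> List[str]:
    """Follow buffer_map -> data -> type_annotation -> storage_scope per parameter."""
    scopes = []
    for param in func.params:
        buffer = func.buffer_map.get(param)
        # Parameters without a buffer (e.g. scalars) stay unconstrained: "".
        scopes.append(buffer.data.type_annotation.storage_scope if buffer else "")
    return scopes
```

One reading of "enter `SEScope`s" is that each string recovered this way becomes the `memory_scope` field of the scope for the corresponding Relay argument.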
6. We rework `PartitionGraph` to `PartitionBySEScope` to work on `SEScope` annotations instead of
`compiler_begin` and `compiler_end` annotations. Algorithmically it's not a big change -- maximal
sub-expressions which share the same `SEScope` (or a projection thereof, eg just the `target`) are hoisted
into global `Function`s. The function's `"result_se_scope"` attribute describes both the scope holding the
so then here, this sort of implements the "grouping adjacent expressions onto the same device" as a side-effect?
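The grouping effect asked about here can be seen in a small Python sketch. It assumes a linear, already topologically ordered chain of operators, each tagged with its scope, and hoists each maximal run with the same scope into one function record. Real partitioning works on a dataflow graph, so this is only an approximation. `partition_by_scope` and the dictionary layout are hypothetical; only the `"result_se_scope"` attribute name comes from the quoted text.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def partition_by_scope(ops: List[Tuple[str, str]]) -> List[Dict]:
    """Group consecutive (op, scope) pairs that share a scope into functions."""
    functions = []
    runs = groupby(ops, key=lambda pair: pair[1])
    for index, (scope, run) in enumerate(runs):
        functions.append({
            "name": f"partition_{index}",       # hypothetical naming scheme
            "result_se_scope": scope,
            "body": [op for op, _ in run],
        })
    return functions


ops = [("conv2d", "npu"), ("relu", "npu"), ("softmax", "cpu"), ("add", "npu")]
partitions = partition_by_scope(ops)
```

Adjacent operators on the same scope end up in one function, while the trailing `add` forms its own partition because a `cpu` operator separates it from the first run.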
7. We allow `MergeComposite` to be used to insert `on_device` annotations, call it `MergeAndAnnotate`.

8. (?) We rework `AnnotateTarget` to just look for `FTVMAnnotateTarget` operator attributes, call it
`AnnotateSEScopes`. When the function fires an `on_device` annotation is inserted. However since
clarifying my understanding:
`AnnotateSEScopes`. When the function fires an `on_device` annotation is inserted. However since
`AnnotateSEScopes`. When `FTVMAnnotateSEScopes` returns true, an `on_device` annotation is inserted. However since
CAUTION: Breaking VM executable serialization change. I needed a new 'virtual devices' array in the executable so that instructions can continue to refer to devices by a simple index yet the VM can respect both the device type and id for runtime devices.

Continuing from apache#9313, and as part of apache/tvm-rfcs#38, we switch PlanDevices to plan with respect to SEScopes instead of just DLDeviceTypes. Our ultimate goal is to be able to flow memory scopes between PrimFuncs by re-running PlanDevices after the LowerTE pass. This PR at least gets us to being able to flow the memory scopes, but the actual changes to PlanDevices to look inside PrimFuncs are still two PRs in the future.

However, we get two nice side effects right away:
- Since SEScopes contain Targets we can isolate all the device-to-target resolution machinery within PlanDevices (with the help of CompilationConfig). After PlanDevices has run we can retrieve the Target for any sub-expression directly from that sub-expression's SEScope. For now we retain the one-Target-per-DLDeviceType constraint since it is baked into the public 'TargetMap' API, but the path to breaking that constraint is clearer.
- Device ids are now respected all the way from annotation to executor. Previously, though we had a bit of plumbing using Devices, the device_id therein was ignored or defaulted to zero.

The Python "on_device" annotation helpers still work w.r.t. devices. Thus, though they now respect device ids, they do not allow the user to specify a Target or memory scope as supported by the underlying SEScope.
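The 'virtual devices' array can be pictured with a short Python sketch: each distinct (device type, device id) pair gets one slot in a table, and instructions carry only the slot index. This is an illustrative model, not the VM's serialization format. `build_virtual_devices` is a hypothetical helper, and the deduplication follows the earlier "Don't dup devices in executable" checkpoint.

```python
from typing import Dict, List, Tuple

Device = Tuple[int, int]  # (device_type, device_id)


def build_virtual_devices(instr_devices: List[Device]) -> Tuple[List[Device], List[int]]:
    """Return the deduplicated device table and one table index per instruction."""
    table: List[Device] = []
    index_of: Dict[Device, int] = {}
    indexes: List[int] = []
    for device in instr_devices:
        if device not in index_of:
            # First time this (type, id) pair is seen: give it the next slot.
            index_of[device] = len(table)
            table.append(device)
        indexes.append(index_of[device])
    return table, indexes


# Device type 1 is kDLCPU and 2 is kDLCUDA in DLPack.
table, indexes = build_virtual_devices([(1, 0), (2, 0), (1, 0), (2, 1)])
```

Keeping the index small and stable is what lets existing instruction encodings survive while the table itself carries the extra device id.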
…s. (#9326)

* Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes.

  CAUTION: Breaking VM executable serialization change. I needed a new 'virtual devices' array in the executable so that instructions can continue to refer to devices by a simple index yet the VM can respect both the device type and id for runtime devices.

  Continuing from #9313, and as part of apache/tvm-rfcs#38, we switch PlanDevices to plan with respect to SEScopes instead of just DLDeviceTypes. Our ultimate goal is to be able to flow memory scopes between PrimFuncs by re-running PlanDevices after the LowerTE pass. This PR at least gets us to being able to flow the memory scopes, but the actual changes to PlanDevices to look inside PrimFuncs are still two PRs in the future.

  However, we get two nice side effects right away:
  - Since SEScopes contain Targets we can isolate all the device-to-target resolution machinery within PlanDevices (with the help of CompilationConfig). After PlanDevices has run we can retrieve the Target for any sub-expression directly from that sub-expression's SEScope. For now we retain the one-Target-per-DLDeviceType constraint since it is baked into the public 'TargetMap' API, but the path to breaking that constraint is clearer.
  - Device ids are now respected all the way from annotation to executor. Previously, though we had a bit of plumbing using Devices, the device_id therein was ignored or defaulted to zero.

  The Python "on_device" annotation helpers still work w.r.t. devices. Thus, though they now respect device ids, they do not allow the user to specify a Target or memory scope as supported by the underlying SEScope.

* [checkpoint] Revert emitter.py, must have run 'black .' by mistake.
* [checkpoint] Address PR comments. Also add back SplitArgs pass in build_module.cc which somehow got lost in the shuffle. (try again -- flaky test_crt.py test_autotune?)
* [checkpoint] Fix after rebase on CallLowered.
…g in 'device' planning. (apache#9313)

[Target] Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices and targets together when doing device planning. I've gone ahead and also included a memory scope in this object since we will also need to propagate memory scopes across Relay expressions once this basic preparation is in place. In the meantime that field will be left as "".

Once device planning works in units of SEScopes it will be possible to directly read off the device and target for any Relay sub-expression without the need for TargetMaps or the construction of default Targets. SEScopes also support 'Join' and 'Default' operations needed when constraint solving in the device planner. You can see those in use in my scratchpad branch: https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes

This PR also brings some duplicated and ad-hoc 'default target' handling logic together into a CompilationConfig class. (Again, see the scratchpad branch for how that will end up being used.) I've placed that next to SEScope since its main purpose is to a) establish the default SEScope for primitive ops, b) establish the SEScope for the 'host', and c) feed a definitive vector of Targets into device planning so it can resolve all "on_device" and "device_copy" device references to their full SEScope form.

* Reworked to avoid global SEScopeCache. Realized while working through unit tests in the sequel that it's reasonable for folks to call build multiple times with distinct Target objects, in which case the global cache would grow without bound. So instead placed the cache in the CompilationConfig class. Since that class now has everything the device planner needs to do its job, promoted it to be an FFI-able Object, which is now in compilation_config.{h,cc}.

I think we can do much better with CompilationConfig, but for now keeping it to the minimum I needed to prepare for device planning from all the executor compilation codepaths.
…s. (apache#9326) * Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes. CAUTION: Breaking VM executable serialization change. I needed a new 'virtual devices' array in the executable so that instructions can continue to refer to devices by a simple index yet the VM can respect both the device type and id for runtime devices. Continuing from apache#9313, and as part of apache/tvm-rfcs#38, we switch PlanDevices to plan with respect to SEScopes instead of just DLDeviceTypes. Our ultimate goal is to be able to flow memory scopes between PrimFuncs by re-running PlanDevices after the LowerTE pass. This PR at least gets us to being able to flow the memory scopes, but the actual changes to PlanDevices to look inside PrimFuncs is still two PR's in the future. However, we get two nice side effects right away: - Since SEScopes contain Targets we can isolate all the device-to-target resolution machinery within PlanDevices (with the help of CompilationConfig). After PlanDevices has run we can retrieve the Target for any sub-expression directly from that sub-expression's SEScope. For now we retain the one-Target-per-DLDeviceType constraint since it baked into the public 'TargetMap' API, but the path to breaking that constraint is clearer. - Device ids are now respected all the way from annotation to executor. Previously though we had a bit of plumbing using Devices the device_id therein was ignored or defaulted to zero. The Python "on_device" annotation helpers still work w.r.t. devices. Thus though they now respect device ids, they do not allow the user to specify a Target or memory scope as supported by the underlying SEScope. * [checkpoint] Revert emitter.py, must have run 'black .' by mistake. * [checkpoint] Address PR comments Also add back SplitArgs pass in build_module.cc which somehow got lost in the shuffle. (try again -- flaky test_crt.py test_autotune?) * [checkpoint] Fix after rebase on CallLowered.
…g in 'device' planning. (apache#9313)

[Target] Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices and targets together when doing device planning. I've gone ahead and also included a memory scope in this object since we will also need to propagate memory scopes across Relay expressions once this basic preparation is in place. In the meantime that field will be left as "".

Once device planning works in units of SEScopes it will be possible to directly read off the device and target for any Relay sub-expression without the need for TargetMaps or the construction of default Targets.

SEScopes also support 'Join' and 'Default' operations needed when constraint solving in the device planner. You can see those in use in my scratchpad branch: https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes

This PR also brings some duplicated, ad-hoc 'default target' handling logic together into a CompilationConfig class. (Again, see the scratchpad branch for how that will end up being used). I've placed that next to SEScope since its main purpose is to
a) establish the default SEScope for primitive ops
b) establish the SEScope for the 'host'
c) feed a definitive vector of Targets into device planning so it can resolve all "on_device" and "device_copy" device references to their full SEScope form.

* Reworked to avoid global SEScopeCache. Realized while working through unit tests in the sequel that it's reasonable for folks to call build multiple times with distinct Target objects, in which case the global cache would grow without bound. So instead placed the cache in the CompilationConfig class. Since that class now has everything the device planner needs to do its job, promoted it to be an FFI-able Object, which is now in compilation_config.{h,cc}.

I think we can do much better with CompilationConfig, but for now I am keeping it to the minimum I needed to prepare for device planning from all the executor compilation codepaths.
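The 'Join' and 'Default' operations and the per-config cache described in the commit message above can be sketched roughly as follows. This is plain Python, not the TVM implementation: `Scope`, `join`, `default`, `CompilationConfigSketch` and `canonical` are hypothetical names, and the semantics (an unset field is `None`, or `""` for the memory scope; join fails on any conflicting field) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical stand-in for an SEScope; unset fields are unconstrained."""
    device_type: Optional[int] = None
    device_id: Optional[int] = None
    target: Optional[str] = None
    memory_scope: str = ""


# The value that marks each field as unconstrained.
_UNSET = {"device_type": None, "device_id": None, "target": None, "memory_scope": ""}


def join(lhs: Scope, rhs: Scope) -> Optional[Scope]:
    """Merge two scopes field by field; return None if any field conflicts."""
    merged = {}
    for name, unset in _UNSET.items():
        a, b = getattr(lhs, name), getattr(rhs, name)
        if a == unset:
            merged[name] = b
        elif b == unset or a == b:
            merged[name] = a
        else:
            return None
    return Scope(**merged)


def default(scope: Scope, fallback: Scope) -> Scope:
    """Fill every unconstrained field of `scope` from `fallback`."""
    filled = {}
    for name, unset in _UNSET.items():
        a = getattr(scope, name)
        filled[name] = getattr(fallback, name) if a == unset else a
    return Scope(**filled)


class CompilationConfigSketch:
    """Owns the scope cache, so it lives only as long as one compilation."""

    def __init__(self, default_primitive_scope: Scope, host_scope: Scope) -> None:
        self.default_primitive_scope = default_primitive_scope
        self.host_scope = host_scope
        self._cache: Dict[Scope, Scope] = {}

    def canonical(self, scope: Scope) -> Scope:
        """Return the one shared instance for structurally equal scopes."""
        return self._cache.setdefault(scope, scope)
```

Keeping the cache on the config object rather than in a global means repeated builds with distinct Targets each get a fresh cache that is released with the config, which is the motivation the commit gives for dropping the global SEScopeCache.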
Closing as obsolete, since most of this is either already done or has been subsumed by the Collage proposal. |
In rendered form:
https://github.com/mbs-octoml/mbs-tvm-rfcs/blob/mbs-target-and-device-planning/rfcs/0038-unified-device-target-and-memory-scope-planning.md
Some earlier discussion:
apache/tvm#8892
Tracking issue: apache/tvm#9327
(CORE-95 in the OctoML JIRA).