-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Relay] Prepare for merging context_analysis.cc and device_annotation.cc #9077
Conversation
- Improve construction and deconstruction of "on_device" and "device_copy" calls since they will be center stage. - Move "device_copy" support out of memory.h into own module to mirror "on_device". - Clearing out some DLOG -> VLOG changes I found helped me debug. - Clearing out some whitespace-only changes I accumulated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @mbs-octoml, thanks for splitting this out from the other PR - overall I think I understand where this is going, but I've left a few questions and suggestions to clarify 😸
include/tvm/relay/attrs/annotation.h
Outdated
|
||
/*! \brief Device type on which each of the function's arguments already resides. */ | ||
Array<Integer> param_device_types; | ||
// TODO(mbs): Replace device types with TargetDevice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these TODO
s, I'm assuming you're also tracking this work elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to be able to freely write "TODO(issues/1234)" so everything is cross linked. That would help anyone picking up the issue get a head start on where to look in the code, as well as just remind us all where things are likely to shift. That got a bit tangled up with questions about how we should reconcile GitHub and our internal OctoML Jira issues, and the granularity of GitHub issues, but I think we've decided to just switch to GitHub. I'm guessing you have the same dilemma at Arm? Anyway these "TODO(mbs)" is just me leaving enough cookie crumbs so I can go back and do it properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be the start of quite a long discussion, I've seen quite a lot of abandoned TODO
comments so the best approach (in my opinion) would be either GitHub issues to track all the work or just tracking internally. In this case there may be contention as to whether we use TargetDevice
or something else, so if we leave this for a prolonged period people coming to this code may be confused. One of my favourite examples of this is bugs which get fixed in an entirely different area leaving TODO(Mousius) - Fix bug
which we will never get rid of.
Having said all of that, I think these will get cleaned up pretty quickly and it's probably just my thinking - so let's not block anything on it 😸
src/relay/op/memory/device_copy.h
Outdated
*/ | ||
|
||
/*! | ||
* \file relay/attrs/device_copy.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* \file relay/attrs/device_copy.h | |
* \file relay/op/memory/device_copy.h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
begs the question why we have these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also for annotation.h)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same reason we have type hints and types in our Python comments? 😸
include/tvm/relay/attrs/annotation.h
Outdated
* function parameters and result. | ||
*/ | ||
struct FunctionOnDeviceAttrs : public tvm::AttrsNode<FunctionOnDeviceAttrs> { | ||
constexpr static const char* kFunctionAttrsKey = "on_device"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically these would go into include/tvm/ir/function.h
as the list of attributes on a function, is there a reason for them being here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just ignorance of the convention! Done, thx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, except I'd like to keep this scoped to relay::, so I moved it into it's own include/tvm/attrs/function.h since ir/function.h is at the BaseFunc 'common to both Relay and TIR' scope. Though that's yet another duplicated basename which I don't like much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have an immediate solution, but this looks at least neater 😸
// as a singleton target map indexed by the invalid DLDeviceType '0'. | ||
constexpr DLDeviceType kNullDeviceType = static_cast<DLDeviceType>(0); | ||
|
||
// An 'invalid' device type, does not correspond to any DLDeviceType enum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these not be added as valid enum
variants to dlpack.h
? There's a risk here that dlpack will change and it'll become incompatible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. Note that these special values are endemic in the code (often stored in naked ints) so I'm at least not introducing a new convention. I think the argument for this approach is the enum values should always be valid at runtime and these invalid-but-distinguished values are just a compile time concept.
I this as a stepping stone to planning with hybrid targets/devices rather than just device or device types, in which case these distinguished values can go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roger roger.
return _make.on_device(data, _device_to_int(device), is_fixed) | ||
|
||
|
||
# for testing only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is for testing, can it not be moved to the relevant test file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, I changed my mind on this and think it's a legit user-visible annotation, but forgot to update the comment. Done.
return device.device_type | ||
if isinstance(device, str): | ||
return _nd.device(device).device_type | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is just refactored from the below, but do we have a test for ValueError
being raised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't find any existing unit tests for these so added some.
if (expr->IsInstance<OpNode>() || expr->IsInstance<GlobalVarNode>() || | ||
expr->IsInstance<VarNode>() || expr->IsInstance<ConstructorNode>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be worth documenting this condition in code:
if (expr->IsInstance<OpNode>() || expr->IsInstance<GlobalVarNode>() || | |
expr->IsInstance<VarNode>() || expr->IsInstance<ConstructorNode>()) { | |
bool is_device_polymorphic = expr->IsInstance<OpNode>() || expr->IsInstance<ConstructorNode>(); | |
bool has_implied_device = expr->IsInstance<VarNode>() || expr->IsInstance<GlobalVarNode>(); | |
if (is_device_polymorphic || has_implied_device) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, though I used comments and early returns rather than bool assignment since I personally find that clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll avoid the clean code vs comments holy war here, it looks good now 😸
Note the CI failure is unrelated and fixed by #9076 |
@mbs-octoml I already did a pass on the other PR, LGTM, I think let's quickly address (on this PR) @Mousius suggestions and keep it rolling for next PR. |
Some stray py formatting changes snuck in since I just run black . at the root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, follow ups will address many of the concerns so we should move to them 😸
….cc (apache#9077) * [Relay] Prepare for merging context_analysis.cc and device_annotation.cc - Improve construction and deconstruction of "on_device" and "device_copy" calls since they will be center stage. - Move "device_copy" support out of memory.h into own module to mirror "on_device". - Clearing out some DLOG -> VLOG changes I found helped me debug. - Clearing out some whitespace-only changes I accumulated. * [checkpoint] Address Christopher's comments. Some stray py formatting changes snuck in since I just run black . at the root.
….cc (apache#9077) * [Relay] Prepare for merging context_analysis.cc and device_annotation.cc - Improve construction and deconstruction of "on_device" and "device_copy" calls since they will be center stage. - Move "device_copy" support out of memory.h into own module to mirror "on_device". - Clearing out some DLOG -> VLOG changes I found helped me debug. - Clearing out some whitespace-only changes I accumulated. * [checkpoint] Address Christopher's comments. Some stray py formatting changes snuck in since I just run black . at the root.
In preparation for #9038: