Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay] Prepare for merging context_analysis.cc and device_annotation.cc #9077

Merged
merged 2 commits into from
Sep 24, 2021

Conversation

mbs-octoml
Copy link
Contributor

@mbs-octoml mbs-octoml commented Sep 22, 2021

In preparation for #9038:

  • Improve construction and deconstruction of "on_device" and "device_copy" calls since they will be center stage.
  • Move "device_copy" support out of memory.h into own module to mirror "on_device".
  • Clearing out some DLOG -> VLOG changes I found helped me debug.
  • Clearing out some whitespace-only changes I accumulated.
  • Cleanup Device aliases and make sure we have some global common non-standard DLDeviceTypes.

- Improve construction and deconstruction of "on_device" and "device_copy" calls since they will be center stage.
- Move "device_copy" support out of memory.h into own module to mirror "on_device".
- Clearing out some DLOG -> VLOG changes I found helped me debug.
- Clearing out some whitespace-only changes I accumulated.
@mbs-octoml mbs-octoml changed the title [Relay] Prepare for merging context_analysis.cc and device_annotation.cc in #9038 [Relay] Prepare for merging context_analysis.cc and device_annotation.cc Sep 22, 2021
Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @mbs-octoml, thanks for splitting this out from the other PR - overall I think I understand where this is going, but I've left a few questions and suggestions to clarify 😸


/*! \brief Device type on which each of the function's arguments already resides. */
Array<Integer> param_device_types;
// TODO(mbs): Replace device types with TargetDevice.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need these TODOs, I'm assuming you're also tracking this work elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be able to freely write "TODO(issues/1234)" so everything is cross linked. That would help anyone picking up the issue get a head start on where to look in the code, as well as just remind us all where things are likely to shift. That got a bit tangled up with questions about how we should reconcile GitHub and our internal OctoML Jira issues, and the granularity of GitHub issues, but I think we've decided to just switch to GitHub. I'm guessing you have the same dilemma at Arm? Anyway these "TODO(mbs)" is just me leaving enough cookie crumbs so I can go back and do it properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be the start of quite a long discussion, I've seen quite a lot of abandoned TODO comments so the best approach (in my opinion) would be either GitHub issues to track all the work or just tracking internally. In this case there may be contention as to whether we use TargetDevice or something else, so if we leave this for a prolonged period people coming to this code may be confused. One of my favourite examples of this is bugs which get fixed in an entirely different area leaving TODO(Mousius) - Fix bug which we will never get rid of.

Having said all of that, I think these will get cleaned up pretty quickly and it's probably just my thinking - so let's not block anything on it 😸

*/

/*!
* \file relay/attrs/device_copy.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* \file relay/attrs/device_copy.h
* \file relay/op/memory/device_copy.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.
begs the question why we have these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also for annotation.h)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same reason we have type hints and types in our Python comments? 😸

* function parameters and result.
*/
struct FunctionOnDeviceAttrs : public tvm::AttrsNode<FunctionOnDeviceAttrs> {
constexpr static const char* kFunctionAttrsKey = "on_device";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically these would go into include/tvm/ir/function.h as the list of attributes on a function, is there a reason for them being here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ignorance of the convention! Done, thx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, except I'd like to keep this scoped to relay::, so I moved it into it's own include/tvm/attrs/function.h since ir/function.h is at the BaseFunc 'common to both Relay and TIR' scope. Though that's yet another duplicated basename which I don't like much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an immediate solution, but this looks at least neater 😸

// as a singleton target map indexed by the invalid DLDeviceType '0'.
constexpr DLDeviceType kNullDeviceType = static_cast<DLDeviceType>(0);

// An 'invalid' device type, does not correspond to any DLDeviceType enum.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these not be added as valid enum variants to dlpack.h ? There's a risk here that dlpack will change and it'll become incompatible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. Note that these special values are endemic in the code (often stored in naked ints) so I'm at least not introducing a new convention. I think the argument for this approach is the enum values should always be valid at runtime and these invalid-but-distinguished values are just a compile time concept.

I this as a stepping stone to planning with hybrid targets/devices rather than just device or device types, in which case these distinguished values can go away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger roger.

return _make.on_device(data, _device_to_int(device), is_fixed)


# for testing only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is for testing, can it not be moved to the relevant test file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops, I changed my mind on this and think it's a legit user-visible annotation, but forgot to update the comment. Done.

return device.device_type
if isinstance(device, str):
return _nd.device(device).device_type
raise ValueError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is just refactored from the below, but do we have a test for ValueError being raised?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't find any existing unit tests for these so added some.

Comment on lines 60 to 61
if (expr->IsInstance<OpNode>() || expr->IsInstance<GlobalVarNode>() ||
expr->IsInstance<VarNode>() || expr->IsInstance<ConstructorNode>()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth documenting this condition in code:

Suggested change
if (expr->IsInstance<OpNode>() || expr->IsInstance<GlobalVarNode>() ||
expr->IsInstance<VarNode>() || expr->IsInstance<ConstructorNode>()) {
bool is_device_polymorphic = expr->IsInstance<OpNode>() || expr->IsInstance<ConstructorNode>();
bool has_implied_device = expr->IsInstance<VarNode>() || expr->IsInstance<GlobalVarNode>();
if (is_device_polymorphic || has_implied_device) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, though I used comments and early returns rather than bool assignment since I personally find that clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll avoid the clean code vs comments holy war here, it looks good now 😸

@mbs-octoml
Copy link
Contributor Author

Note the CI failure is unrelated and fixed by #9076

@jroesch
Copy link
Member

jroesch commented Sep 23, 2021

@mbs-octoml I already did a pass on the other PR, LGTM, I think let's quickly address (on this PR) @Mousius suggestions and keep it rolling for next PR.

Some stray py formatting changes snuck in since I just run black . at the root.
@mbs-octoml
Copy link
Contributor Author

Thank you @Mousius & @jroesch.

Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, follow ups will address many of the concerns so we should move to them 😸

@masahi masahi merged commit 5871608 into apache:main Sep 24, 2021
@mbs-octoml mbs-octoml deleted the mbs-prepare-device-planning branch September 24, 2021 21:19
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
….cc (apache#9077)

* [Relay] Prepare for merging context_analysis.cc and device_annotation.cc

- Improve construction and deconstruction of "on_device" and "device_copy" calls since they will be center stage.
- Move "device_copy" support out of memory.h into own module to mirror "on_device".
- Clearing out some DLOG -> VLOG changes I found helped me debug.
- Clearing out some whitespace-only changes I accumulated.

* [checkpoint] Address Christopher's comments.

Some stray py formatting changes snuck in since I just run black . at the root.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
….cc (apache#9077)

* [Relay] Prepare for merging context_analysis.cc and device_annotation.cc

- Improve construction and deconstruction of "on_device" and "device_copy" calls since they will be center stage.
- Move "device_copy" support out of memory.h into own module to mirror "on_device".
- Clearing out some DLOG -> VLOG changes I found helped me debug.
- Clearing out some whitespace-only changes I accumulated.

* [checkpoint] Address Christopher's comments.

Some stray py formatting changes snuck in since I just run black . at the root.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants