Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Target] Make key=arm_cpu --> key=arm_cpu,cpu on AArch64 #13775

Merged
merged 3 commits into from
Jan 13, 2023

Conversation

AndrewZhaoLuo
Copy link
Contributor

@AndrewZhaoLuo AndrewZhaoLuo commented Jan 12, 2023

The key in a target is used in dispatching to different strategies.

In most of the code base arm_cpu implies also having cpu. E.g. in python/tvm/target/target.py:

def arm_cpu(...):
    opts = ["-keys=arm_cpu,cpu", "-device=arm_cpu"] + pre_defined_opt
    opts = _merge_opts(opts, options)
    return Target(" ".join(["llvm"] + opts))

In src/target/parsers/mprofile.cc:

static Array<String> MergeKeys(Optional<Array<String>> existing_keys) {
  const Array<String> kExtraKeys = {"arm_cpu", "cpu"};

  if (!existing_keys) {
    return kExtraKeys;
  }

  Array<String> keys = existing_keys.value();
  for (String key : kExtraKeys) {
    if (std::find(keys.begin(), keys.end(), key) == keys.end()) {
      keys.push_back(key);
    }
  }
  return keys;
}

However A-series targets did not have both arm_cpu and cpu keys. Rather they only had arm_cpu keys. This PR makes it so it includes both, matching everywhere else in codebase.

This also may lead to increased performance speedups as previously A-series targets would dispatch to generic strategy instead of cpu optimized strategy. With this, they should properly dispatch to arm_cpu strategy, and fallback to cpu strategy.

@tvm-bot
Copy link
Collaborator

tvm-bot commented Jan 12, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: target See #10317 for details

Generated by tvm-bot

Copy link
Member

@Mousius Mousius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, this was an unintended side effect of #12474, thanks for fixing this @AndrewZhaoLuo!

@Mousius Mousius changed the title [Target] Make key=arm_cpu --> key=arm_cpu,cpu [Target] Make key=arm_cpu --> key=arm_cpu,cpu on AArch64 Jan 12, 2023
@AndrewZhaoLuo AndrewZhaoLuo merged commit 5878f60 into apache:main Jan 13, 2023
ashutosh-arm added a commit to ashutosh-arm/tvm that referenced this pull request Mar 10, 2023
…ncatenate

NPU used generic implementation for concatenate
before cpu schedules were made the default fallback
schedules. This leads to performance degradation
as this blocks fusion with nearby ops. This commit
adds Relay op strategy for arm_cpu implementation
which makes it use arm_cpu schedule before cpu one.
Reference: apache#13775

Co-authored-by: Luke Hutton <[email protected]>
Change-Id: If6e65d74309702daf1f837a24e7b19c2912d0d9c
ashutosh-arm added a commit to ashutosh-arm/tvm that referenced this pull request Mar 10, 2023
…atenate

NPU used generic implementation for concatenate
before cpu schedules were made the default fallback
schedules. This leads to performance degradation
as this blocks fusion with nearby ops. This commit
adds Relay op strategy for arm_cpu implementation
which makes it use arm_cpu schedule before cpu one.
Reference: apache#13775

Co-authored-by: Luke Hutton <[email protected]>
Change-Id: If6e65d74309702daf1f837a24e7b19c2912d0d9c
ashutosh-arm added a commit to ashutosh-arm/tvm that referenced this pull request Mar 14, 2023
…atenate

NPU used generic implementation for concatenate
before cpu schedules were made the default fallback
schedules. This leads to performance degradation
as this blocks fusion with nearby ops. This commit
adds Relay op strategy for arm_cpu implementation
which makes it use arm_cpu schedule before cpu one.
Reference: apache#13775

Co-authored-by: Luke Hutton <[email protected]>
lhutton1 added a commit that referenced this pull request Mar 14, 2023
…oncat (#14270)

Previously used generic implementation for concatenate
before cpu schedules were made the default fallback
schedules. This leads to performance degradation
as this blocks fusion with nearby ops. This commit
adds Relay op strategy for arm_cpu implementation
which makes it use arm_cpu schedule before cpu one.
Reference: #13775

Co-authored-by: Luke Hutton <[email protected]>
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
lhutton1 added a commit to lhutton1/tvm that referenced this pull request Aug 3, 2023
Currently the fallback used when compiling a dense operation with
targets such as `llvm -device=arm_cpu` is `dense.generic`. This results
very poor performance. Although apache#13775
meant that x86 schedules are used in cases where no strategy is provided
by arm_cpu, the dense strategy is registered due to the existance of
specialized schedules for arm_cpu e.g. a schedule for embedded devices.
This commit ensures x86 schedules are used inplace of a generic
schedule which yeilds much better performance.

The commit also follows the same approach for the `dense.generic`
schedule as the x86 strategy. This will only be used when autoscheduler
is enabled.

A test has been added to check the intended schedules are picked when
compiling with `arm_cpu`.

Change-Id: I8697f630d4acfab71a9626cf9e0dc3086987f163
lhutton1 added a commit to lhutton1/tvm that referenced this pull request Aug 4, 2023
Currently the fallback used when compiling a dense operation with
targets such as `llvm -device=arm_cpu` is `dense.generic`. This results
very poor performance. Although apache#13775
meant that x86 schedules are used in cases where no strategy is provided
by arm_cpu, the dense strategy is registered due to the existance of
specialized schedules for arm_cpu e.g. a schedule for embedded devices.
This commit ensures x86 schedules are used inplace of a generic
schedule which yeilds much better performance.

The commit also follows the same approach for the `dense.generic`
schedule as the x86 strategy. This will only be used when autoscheduler
is enabled.

A test has been added to check the intended schedules are picked when
compiling with `arm_cpu`.

Change-Id: I8697f630d4acfab71a9626cf9e0dc3086987f163
leandron pushed a commit that referenced this pull request Aug 7, 2023
Currently the fallback used when compiling a dense operation with
targets such as `llvm -device=arm_cpu` is `dense.generic`. This results
very poor performance. Although #13775
meant that x86 schedules are used in cases where no strategy is provided
by arm_cpu, the dense strategy is registered due to the existance of
specialized schedules for arm_cpu e.g. a schedule for embedded devices.
This commit ensures x86 schedules are used inplace of a generic
schedule which yeilds much better performance.

The commit also follows the same approach for the `dense.generic`
schedule as the x86 strategy. This will only be used when autoscheduler
is enabled.

A test has been added to check the intended schedules are picked when
compiling with `arm_cpu`.

Change-Id: I8697f630d4acfab71a9626cf9e0dc3086987f163
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants