
[TEST] Remove llvm -device=arm_cpu and cuda -libs=cudnn from default enabled list #9905

Merged: 1 commit merged into apache:main on Jan 14, 2022

Conversation

@masahi (Member) commented Jan 11, 2022

Right now tvm.testing.enabled_targets() returns llvm -device=arm_cpu and cuda -libs=cudnn (if cudnn is enabled, which is the case in the CI environment) in addition to llvm and cuda. So all tests that loop over enabled targets via tvm.testing.enabled_targets() are essentially running their x86 and cuda cases twice.
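
For context, here is a minimal sketch (not taken from this PR) of the looping pattern in question, assuming the usual TVM testing setup; the test body is a placeholder:

```python
import tvm
import tvm.testing

def test_something():
    # enabled_targets() yields (target_string, device) pairs; each extra entry
    # re-runs the entire test body, which is why the arm_cpu/cudnn entries
    # effectively double the x86 and cuda runs.
    for target, dev in tvm.testing.enabled_targets():
        print("running placeholder workload on", target)
```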

I don't see a reason to run llvm -device=arm_cpu tests in addition to llvm. For cudnn, we have separate unit tests dedicated to it; we don't need to run the full integration tests with it.

I did this change for PyTorch in #9781 and it sped up local testing time significantly. It should make the CI situation a lot better.

@areusch @leandron @Lunderberg @driazati

@areusch (Contributor) commented Jan 11, 2022

cc @u99127 @Mousius @manupa-arm: this is an idea that came up internally, but it'd be great to get your feedback regarding test coverage. We could leave this enabled, e.g., on ci-arm.

@masahi (Member, Author) commented Jan 11, 2022

Does anyone know what exactly llvm -device=arm_cpu does on an x86 host? For proper ARM codegen, I thought there was a different flag that affects LLVM.

@Lunderberg (Contributor)

For history, llvm -device=arm_cpu was added in #6331, though I can't trace it back beyond there.

I added the cudnn target in #8383, as part of parametrizing tests for better visibility. Some unit tests that explicitly exercised a cudnn path could pass on a machine that had cudnn disabled, but fail when cudnn was enabled. Adding cudnn as a general target ensured that no test coverage was dropped as a result.
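
As an illustration, a minimal sketch of how a test can be pinned to the cudnn path explicitly rather than relying on the default enabled list, assuming the tvm.testing.parametrize_targets decorator; the test body is a placeholder:

```python
import tvm
import tvm.testing

# Parametrize this test over the cudnn-backed CUDA target only; on builds
# where cudnn is not enabled, the target is skipped and reported rather than
# silently dropped, which keeps the missing coverage visible.
@tvm.testing.parametrize_targets("cuda -libs=cudnn")
def test_conv2d_cudnn_path(target, dev):
    print("placeholder workload on", target)
```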

@masahi (Member, Author) commented Jan 11, 2022

Thanks @Lunderberg, I think testing external libs is outside the scope of the default enabled list. For complete coverage we would also need to exercise cublas, mkl, etc. with full integration tests, but that would be impractical.

@driazati (Member)

cudnn (and the others mentioned) sound like good targets for a main-only / nightly test run (where run time doesn't really matter) rather than something that runs on every PR.

@Lunderberg (Contributor) left a comment

LGTM

@u99127 (Contributor) commented Jan 12, 2022

> cc @u99127 @Mousius @manupa-arm this is an idea that came up internally but it'd be great to get your feedback regarding test coverage. we could leave this enabled e.g. on ci-arm, too.

I think we would need to leave it in place for testing on ci_arm. I am also not immediately sure what test coverage is lost by removing this - perhaps look at the difference in pytest-cov reports before taking this off, so that we make an informed decision.

regards
Ramana

@masahi (Member, Author) commented Jan 12, 2022

@u99127 @areusch I looked into what -device=arm_cpu does. It's only used at

`device = tgt.attrs.get("device", "")`

where it instructs TVM to fetch a pre-tuned topi schedule configuration for ARM targets. It only affects tile sizes etc. for conv2d, so it is perfectly valid to use it on an x86 host, although there is no point in doing so.

Anyway, running both llvm and llvm -device=arm_cpu has nothing to do with test coverage and it only increases test time for no good reason.
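
For illustration, a minimal sketch of inspecting that attribute (the print output is just for demonstration; this is not TVM's dispatch code):

```python
import tvm

for target_str in ["llvm", "llvm -device=arm_cpu"]:
    tgt = tvm.target.Target(target_str)
    # -device=arm_cpu is what populates the "device" attribute; plain "llvm"
    # leaves it empty, so the generic x86 schedules are picked.
    device = tgt.attrs.get("device", "")
    print(target_str, "->", device or "(no device attr)")
```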

@areusch (Contributor) commented Jan 12, 2022

@masahi I think it also enables schedules with the ".arm_cpu" suffix, so I don't think it's quite that simple. I think @u99127 is right that we may not need to run these tests anywhere other than ci-arm, but I think we need to diff the tests being run first before coming to that conclusion.

@masahi (Member, Author) commented Jan 12, 2022

OK, I missed that -device=arm_cpu also affects compute/schedule/strategy dispatch. But it still stands that having both

DEFAULT_TEST_TARGETS = [
    "llvm",
    "llvm -device=arm_cpu",

is odd, since plain llvm is not meant to be used on ci-arm. So it has to be the case that DEFAULT_TEST_TARGETS is never used on ci-arm, and that it is safe to remove llvm -device=arm_cpu from it. Tests that are intended to run on ARM hosts should always use tvm.target.arm_cpu() or manually specify the -mcpu flag.
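
For illustration, a minimal sketch of those two options; the -mtriple/-mcpu values below are placeholders for whatever board is actually being targeted:

```python
import tvm

# Preset ARM CPU target helper.
arm_target = tvm.target.arm_cpu()

# Or spell out the LLVM flags explicitly (values here are examples only).
explicit_target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mcpu=cortex-a72")

print(arm_target)
print(explicit_target)
```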

@masahi (Member, Author) commented Jan 13, 2022

I'm going to merge this unless there are further comments.

@u99127 (Contributor) commented Jan 14, 2022

I'd be happier with a comparison test run of, say, the topi tests with both llvm and llvm -device=arm_cpu on a ci_cpu machine to see what the difference is.

I've not had the chance to look at the difference, if any, in topi test results on aarch64 with llvm vs llvm -device=arm_cpu in a ci_arm environment. That's not on by default yet, but I was working on it, a bit slowly.

regards
Ramana

@masahi (Member, Author) commented Jan 14, 2022

Yeah, CI job time has been very unstable recently (some runs finish in 4 hours while others take more than 8), so it's difficult to assess how much this helps.

Note that "llvm" means x86 specifically, so you would never use the llvm target (without any options) on aarch64. But that's what is in DEFAULT_TEST_TARGETS, so, like I said, ci-arm shouldn't be using this list. So there would be no loss of distinct test coverage.

@masahi merged commit 4419241 into apache:main on Jan 14, 2022
crazydemo pushed a commit to crazydemo/tvm that referenced this pull request Jan 27, 2022
ylc pushed a commit to ylc/tvm that referenced this pull request Feb 16, 2022
junrushao pushed a commit that referenced this pull request Mar 6, 2022
…fault test target list (#10500)

After recent improvements in the GPU frontend tests, I found that `topi: GPU` has become a bottleneck. From the log https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10391/14/pipeline/319, it is clear that topi tests are running on the targets `llvm -device=arm_cpu` and `cuda -libs=cudnn`, which I claim is completely redundant since we already run on the `llvm` and `cuda` targets.

In #9905, I already removed them from `DEFAULT_TEST_TARGETS`, but `topi: GPU` uses its own list of targets, which still includes `llvm -device=arm_cpu` and `cuda -libs=cudnn`. I propose to remove them from the topi test targets, which will hopefully cut topi GPU test time by half.
ziqiangxu8457 pushed a commit to ziqiangxu8457/tvm that referenced this pull request Mar 6, 2022
pfk-beta pushed a commit to pfk-beta/tvm that referenced this pull request Apr 11, 2022