
[Topi][Unittests] Parametrized tests in test_topi_dense.py, split out gpu-independent implementations #8336

Merged
merged 3 commits into apache:main from Lunderberg:topi_dense_parametrize on Jun 30, 2021

Conversation

Lunderberg
Contributor

[Topi][UnitTests] Parametrized tests in test_topi_dense.py

Tests now run for multiple data types and can be extended with additional data types.
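A minimal sketch of the parametrization pattern (illustrative only, not the PR's actual test code; shapes and names are made up): each value passed to `tvm.testing.parameter` becomes its own test case, and the `target`/`dev` fixture arguments make the test run once per enabled target.

```python
import numpy as np
import tvm
import tvm.testing

# Each listed dtype becomes a separate test case.
in_dtype = tvm.testing.parameter("float16", "float32")


@tvm.testing.fixture
def dense_ref_data(in_dtype):
    # Reference data is regenerated once per dtype.
    a_np = np.random.uniform(size=(4, 16)).astype(in_dtype)
    b_np = np.random.uniform(size=(8, 16)).astype(in_dtype)
    c_np = a_np.astype("float32") @ b_np.astype("float32").T
    return a_np, b_np, c_np


def test_dense(target, dev, in_dtype, dense_ref_data):
    a_np, b_np, c_np = dense_ref_data
    # ... build the dense op for `target`, run it on `dev`, and compare to c_np
```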

[Topi] Separated generic-gpu nn.dense implementations into topi.gpu.dense

As a follow-up to the renaming of "gpu" to "cuda", this separates implementations that require CUDA (e.g. dense_cublas.cuda) from implementations that require any GPU, but not necessarily a CUDA GPU (e.g. dense_small_batch.gpu).

My intent is to pair this migration with the extension of unit tests to cover additional GPU runtimes, migrating only implementations that run correctly on non-CUDA GPU devices.
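
A rough sketch of what the resulting split could look like at the strategy level (names follow the PR description; the exact registration code in TVM's strategy files may differ):

```python
from tvm import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import wrap_compute_dense, wrap_topi_schedule


def dense_strategy_gpu_sketch(attrs, inputs, out_type, target):
    """Sketch of a dense strategy once generic-GPU schedules live in topi.gpu."""
    strategy = _op.OpStrategy()

    # Works on any GPU runtime (cuda, rocm, vulkan, ...), so it can be
    # registered for all GPU-like targets.
    strategy.add_implementation(
        wrap_compute_dense(topi.gpu.dense_small_batch),
        wrap_topi_schedule(topi.gpu.schedule_dense_small_batch),
        name="dense_small_batch.gpu",
    )

    # Requires cuBLAS, so it remains CUDA-only under topi.cuda.
    if target.kind.name == "cuda" and "cublas" in target.libs:
        strategy.add_implementation(
            wrap_compute_dense(topi.cuda.dense_cublas),
            wrap_topi_schedule(topi.cuda.schedule_dense_cublas),
            name="dense_cublas.cuda",
            plevel=25,
        )
    return strategy
```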

@Lunderberg Lunderberg force-pushed the topi_dense_parametrize branch 2 times, most recently from 0dde14d to 98a145b on June 24, 2021 23:04
python/tvm/topi/gpu/dense.py: two review comments (outdated, resolved)
return (a_np, b_np, c_np, d_np)


def test_dense(
Contributor
I'm wondering whether this can test different targets as you expected.

Shouldn't this function be decorated with "@tvm.testing.parametrize_targets"?

Contributor Author
As of #8010, the @tvm.testing.parametrize_targets decorator is only needed if the test should override the default targets. Otherwise, having the target and/or dev fixture arguments is sufficient for the test to run on all enabled targets.
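
A minimal illustration of the two styles (assumes pytest with TVM's testing plugin enabled; not code from this PR):

```python
import tvm.testing


# With target/dev fixture arguments alone, the test runs once per target
# enabled for the test session (behavior from #8010).
def test_runs_on_all_enabled_targets(target, dev):
    assert dev.exist


# The decorator is only needed when the test must override the defaults
# with an explicit target list.
@tvm.testing.parametrize_targets("llvm", "cuda")
def test_runs_on_explicit_targets(target, dev):
    assert dev.exist
```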

Contributor
Oh, my bad. That's a really helpful feature.

Now, tests run for multiple data types, can be extended with
additional datatypes.
@Lunderberg Lunderberg force-pushed the topi_dense_parametrize branch from 98a145b to 4786a21 on June 29, 2021 17:17
@Lunderberg
Contributor Author

Lunderberg commented Jun 29, 2021

Added a few fixes in the codegen needed for the Vulkan tests to run correctly on newer NVIDIA GPUs. (@masahi)

[Topi] Separated generic-gpu nn.dense implementations into topi.gpu.dense

As a follow-up to the renaming of "gpu" to "cuda", separating
implementations that require CUDA (e.g. dense_cublas.cuda) from
implementations that require any GPU, but not necessarily a CUDA GPU
(e.g. dense_small_batch.gpu).

My intent is to pair this migration with the extension of unit tests
to cover additional GPU runtimes, migrating only implementations that
run correctly on non-CUDA GPU devices.
[Vulkan][Codegen] Updated storage sync to avoid incorrect matmul results on some GPUs

- In ThreadAllreduceBuilder, separate out load/store so that they can
  have a memory barrier in-between.

- In Vulkan codegen, added Workgroup memory sync for subgroup thread
  sync, since the different subgroup threads can still access
  workgroup memory.  Longer-term, may need tir enhancements to
  separate out sync of control/memory.
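
For reference, a small sketch (not part of this PR) of a schedule that reaches this code path: binding a reduce axis to a thread axis makes the lowered TIR emit tvm_thread_allreduce, which ThreadAllreduceBuilder expands into the shared-memory loads/stores where the barrier placement above matters.

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
k = te.reduce_axis((0, n), name="k")
B = te.compute((1,), lambda i: te.sum(A[k], axis=k), name="B")

s = te.create_schedule(B.op)
s[B].bind(B.op.axis[0], te.thread_axis("blockIdx.x"))
s[B].bind(B.op.reduce_axis[0], te.thread_axis("threadIdx.x"))

# The lowered TIR contains a tvm_thread_allreduce call; building for a GPU
# target (e.g. vulkan) then runs the storage-sync passes touched by this commit.
print(tvm.lower(s, [A, B], simple_mode=True))
```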
@Lunderberg Lunderberg force-pushed the topi_dense_parametrize branch from 4786a21 to adb2431 on June 29, 2021 17:25
@jcf94 jcf94 (Contributor) left a comment

Thanks! Looks good to me.
I have a PR, #8234, which also makes some modifications to the topi & cuda strategy. It will likely conflict with this one.
I can rebase it after this has been merged.

@jcf94 jcf94 merged commit ae58f2c into apache:main Jun 30, 2021
@Lunderberg Lunderberg deleted the topi_dense_parametrize branch June 30, 2021 13:31
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
[Topi][Unittests] Parametrized tests in test_topi_dense.py, split out gpu-independent implementations (apache#8336)

Co-authored-by: Eric Lunderberg <[email protected]>
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Mar 4, 2022
[Topi][Unittests] Parametrized tests in test_topi_dense.py, split out gpu-independent implementations (apache#8336)

Co-authored-by: Eric Lunderberg <[email protected]>