
Add support for mixed precision in contraction scale and bilinear #973

Merged — 4 commits merged into develop on Nov 2, 2023

Conversation

@bwroblew (Contributor) commented on Oct 6, 2023

This PR restores support for mixed precision in contraction scale and bilinear, after it was reverted because of a conflict between CK and hipTensor.

It introduces support for mixed-precision datatypes in the contraction operation by exposing a new template argument, `ComputeDataType`, which is used to cast the data before computation.

The combinations supported after the change:

| Input/Output datatype | Compute datatype |
| --- | --- |
| fp32 | fp32 |
| fp64 | fp64 |
| fp64 | fp32 |
| fp32 | fp16 |
| fp32 | bf16 |
| fp16 | fp32 |
| bf16 | fp32 |
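To illustrate the idea behind the new template argument, here is a minimal sketch (hypothetical names, not CK's actual API): operands are cast to the compute type before the multiply-accumulate, and the accumulator is cast back to the I/O datatype at the end, with the compute type defaulting to the I/O type.

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch only: a dot product with an optional compute type,
// mirroring how a ComputeDataType template parameter can decouple the
// accumulation precision from the input/output precision.
template <typename DataT, typename ComputeT = DataT>
DataT dot(const std::vector<DataT>& a, const std::vector<DataT>& b)
{
    ComputeT acc = static_cast<ComputeT>(0);
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        // Cast each operand to the compute type before multiplying.
        acc += static_cast<ComputeT>(a[i]) * static_cast<ComputeT>(b[i]);
    }
    // Cast the accumulator back to the destination datatype.
    return static_cast<DataT>(acc);
}
```

For example, `dot<double, float>` corresponds to the fp64-data/fp32-compute row of the table above.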

This change also fixes several bugs/issues I found in the reference contraction, profiler, and tests:

  • incorrect use of type_convert with const datatypes in the reference contraction;
  • missing cast to the destination datatype in the reference contraction;
  • incorrect order of the dimensions of matrix B passed to the reference contraction;
  • incorrect order of the strides of matrix B passed to the contraction kernel;
  • incorrectly calculated tolerance passed to check_err for the BF16 and F16 datatypes;
  • tests ignoring the memory layouts set in the test cases and exercising only the two layouts hardcoded in the strides;
  • the profiler passing the input datatype to the reference contraction instead of the accumulation datatype.
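On the tolerance point: a sketch of how a verification threshold for reduced-precision types might be assembled, assuming (this is an illustration, not CK's actual check_err formula) one error term that grows with the number of accumulated elements in the compute precision, plus one term for the DataType-to-ComputeDataType casting error mentioned in the commit list.

```cpp
#include <cstddef>

// Hypothetical tolerance model for verifying a contraction result:
// accumulation error scales with the reduction length in the compute
// type's precision, and casting the input adds one extra rounding step
// in the I/O type's precision.
double contraction_tolerance(double compute_eps,        // machine epsilon of ComputeDataType
                             double data_eps,           // machine epsilon of the I/O DataType
                             std::size_t reduce_length) // number of accumulated terms
{
    // Worst-case rounding error of a naive sum grows with its length.
    double accumulation_err = compute_eps * static_cast<double>(reduce_length);
    // One rounding step when casting the input to the compute type.
    double casting_err = data_eps;
    return accumulation_err + casting_err;
}
```

With this model, halving the epsilon (e.g. fp16 data with fp32 compute) shrinks only the casting term, while a longer reduction dimension grows the accumulation term linearly.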

Normally I would prepare a separate PR for every fix, but since GitHub does not allow dependent PRs, it would take ages to finally merge this change. That's why I decided to put it all in this single huge change (it's large mostly because I'm adding a lot of new instances; the fixes themselves are mostly 1-5 lines each).

* Extract common functionality to separate files

* Reference contraction: Remove incorrect consts from type_converts

* Reference contraction: Add missing type_convert for dst value

* Reference contraction: Fix incorrect order of B matrix dimensions

* Add support for mixed precision in contraction scale and bilinear

* Move using statements from instances to a common file

* Move using statements from examples to a common file

* Fix the order of B matrix dimensions across examples and profiler

* Fix the computation of error threshold

* Make ComputeDataType an optional argument

* Include possible DataType -> ComputeDataType casting error in the threshold

* Remove commented code
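As a hypothetical illustration of the B-dimension-order class of bug fixed above (not CK's actual reference code): for C[m][n] += A[m][k] * B[k][n], a reference loop that indexes B with its two dimensions swapped silently computes against B transposed, so a naive test can still pass for symmetric inputs while failing for general ones.

```cpp
#include <cstddef>
#include <vector>

// Row-major reference matmul: C (M x N) = A (M x K) * B (K x N).
std::vector<double> matmul(const std::vector<double>& a, // M x K
                           const std::vector<double>& b, // K x N
                           std::size_t M, std::size_t K, std::size_t N)
{
    std::vector<double> c(M * N, 0.0);
    for(std::size_t m = 0; m < M; ++m)
        for(std::size_t n = 0; n < N; ++n)
            for(std::size_t k = 0; k < K; ++k)
                // Correct indexing of B is [k][n], i.e. b[k * N + n].
                // The bug class above amounts to b[n * K + k] instead,
                // which reads B as if transposed.
                c[m * N + n] += a[m * K + k] * b[k * N + n];
    return c;
}
```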
@bwroblew bwroblew marked this pull request as ready for review October 18, 2023 08:11
@aosewski aosewski self-requested a review October 18, 2023 08:16
@saadrahim (Member) left a comment


Raised and agreed to track the long-term concerns about documenting numerical precision.

@bwroblew (Contributor, Author) commented:

Please do not merge this PR. This change must be merged together with ROCm/hipTensor#141

@illsilin illsilin merged commit 4ef704d into develop Nov 2, 2023
@illsilin illsilin deleted the bwroblew/contraction_mixed_prec_again branch November 15, 2023 18:57