[UT] Fix failed cases for tl.dot_scaled #2968

Open
LiyangLingIntel opened this issue Dec 9, 2024 · 4 comments
Labels: bug (Something isn't working), tests: ut

Comments

@LiyangLingIntel
Contributor

LiyangLingIntel commented Dec 9, 2024

These are the skipped cases for tl.dot_scaled.

  • test_scaled_dot[32-64-128-False-False-True-e5m2-bf16-4-16-1]
  • test_scaled_dot[64-32-128-False-False-True-e4m3-bf16-4-16-1]

They fail with "Error during Intel loadBinary: Triton Error [ZE]: 0x78000011", which is ZE_RESULT_ERROR_INVALID_KERNEL_NAME: the kernel name is not found in the module.
These cases were tested on agama 1032.
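
For triaging similar logs, here is a minimal sketch that pulls the Level Zero result code out of an error string like the one above and maps it to a name. The helper `decode_ze_error` and the table `ZE_RESULT_NAMES` are hypothetical names for illustration, not part of Triton or Level Zero, and the table only contains the one code quoted in this issue.

```python
import re

# Hypothetical lookup table: only the code quoted in this issue is listed.
# Further entries would have to be copied from the Level Zero headers.
ZE_RESULT_NAMES = {
    0x78000011: "ZE_RESULT_ERROR_INVALID_KERNEL_NAME",
}


def decode_ze_error(message):
    """Return (code, name) for the first '[ZE]: 0x...' code in message, or None."""
    match = re.search(r"\[ZE\]:\s*(0x[0-9a-fA-F]+)", message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # Codes missing from the table are reported as "unknown" instead of raising.
    return code, ZE_RESULT_NAMES.get(code, "unknown")


msg = "Error during Intel loadBinary: Triton Error [ZE]: 0x78000011"
print(decode_ze_error(msg))
```

Messages without a `[ZE]` code return `None`, so the helper can be run over a whole test log without special-casing other failures.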

@LiyangLingIntel LiyangLingIntel added the bug Something isn't working label Dec 9, 2024
@LiyangLingIntel LiyangLingIntel self-assigned this Dec 9, 2024
@LiyangLingIntel
Contributor Author

This issue is not critical. I am moving on to #2961 and will check whether the failure still occurs with the dot layout implementation.

@LiyangLingIntel LiyangLingIntel changed the title from "[UT] Fix test_scaled_dot[128-128-64-False-False-True-e4m3-bf16-4-16-1]" to "[UT] Fix failed cases for tl.dot_scaled" on Jan 3, 2025
whitneywhtsang pushed a commit that referenced this issue Jan 8, 2025
This pull request supports dot layout codegen for the upcast_mxfp operation,
which could be more efficient than the previous blocked layout
implementation.

The 2 skipped tests fail with an L0 runtime error; they will be
addressed in a separate PR
#2968.
@LiyangLingIntel
Contributor Author

The failures occur because, after the LLVM optimization passes, IGC cannot recognize the kernel name in the output module.
The issue has been reported to the IGC team.

whitneywhtsang added a commit that referenced this issue Jan 9, 2025
This is the last PR to improve the UT pass rate for the PVC rolling driver.
There are 21 remaining failures, which are tracked in
#2755 and
#2968.

Before:
```
debug: passed: 28, failed: 0, skipped: 20, xfailed: 0, total: 48, fixme: 0, pass rate (w/o xfailed): 58.33%
interpreter: passed: 6364, failed: 0, skipped: 1, xfailed: 697, total: 7062, fixme: 0, pass rate (w/o xfailed): 99.98%
all: passed: 18671, failed: 0, skipped: 23, xfailed: 1309, total: 20003, fixme: 48, pass rate (w/o xfailed): 99.88%
```
After:
```
debug: passed: 28, failed: 0, skipped: 19, xfailed: 0, total: 47, fixme: 0, pass rate (w/o xfailed): 59.57%
interpreter: passed: 6365, failed: 0, skipped: 0, xfailed: 697, total: 7062, fixme: 0, pass rate (w/o xfailed): 100.0%
all: passed: 18672, failed: 0, skipped: 21, xfailed: 1309, total: 20002, fixme: 48, pass rate (w/o xfailed): 99.89%
```

Signed-off-by: Whitney Tsang <[email protected]>
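
The "pass rate (w/o xfailed)" figures in the commit message above appear to be computed as passed / (total - xfailed). That formula is an inference from the numbers, not something the commit states; the sketch below (with a hypothetical helper `pass_rate`) reproduces the quoted values under that assumption.

```python
def pass_rate(passed, total, xfailed):
    """Pass rate in percent, excluding xfailed tests from the denominator.

    Assumed formula: it matches the figures quoted in the commit message,
    but the actual reporting script is not shown in this thread.
    """
    return round(100.0 * passed / (total - xfailed), 2)


# "Before" and "After" rows for the "all" suite from the commit message.
print(pass_rate(18671, 20003, 1309))
print(pass_rate(18672, 20002, 1309))
```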
@LiyangLingIntel
Contributor Author

On agama 1057, the failing cases changed to these two:

  • test_scaled_dot[32-64-128-False-False-True-e5m2-bf16-4-16-1]
  • test_scaled_dot[128-64-128-False-True-False-e4m3-bf16-4-16-1]

@LiyangLingIntel
Contributor Author

This issue is pending feedback from the IGC team.
