[Issue]: Failing Unit Tests #84

Closed
pvelesko opened this issue May 31, 2024 · 3 comments

Comments

@pvelesko

Problem Description

The following tests FAILED:
         24 - Unit_deviceAllocation_Malloc_PerThread_PrimitiveDataType (Failed)
         25 - Unit_deviceAllocation_New_PerThread_PrimitiveDataType (Failed)
         26 - Unit_deviceAllocation_Malloc_PerThread_StructDataType (Failed)
         27 - Unit_deviceAllocation_New_PerThread_StructDataType (Failed)
         28 - Unit_deviceAllocation_InOneThread_AccessInAllThreads (Failed)
         29 - Unit_deviceAllocation_Malloc_AcrossKernels (Failed)
         30 - Unit_deviceAllocation_New_AcrossKernels (Failed)
         31 - Unit_deviceAllocation_Malloc_ComplexDataType (Failed)
         32 - Unit_deviceAllocation_New_ComplexDataType (Failed)
         33 - Unit_deviceAllocation_Malloc_UnionType (Failed)
         34 - Unit_deviceAllocation_New_UnionType (Failed)
         35 - Unit_deviceAllocation_Malloc_SingleCodeObj (Failed)
         36 - Unit_deviceAllocation_New_SingleCodeObj (Failed)
         37 - Unit_deviceAllocation_Malloc_PerThread_Graph (Failed)
         38 - Unit_deviceAllocation_New_PerThread_Graph (Failed)
         39 - Unit_deviceAllocation_Malloc_DeviceFunc (Failed)
         40 - Unit_deviceAllocation_New_DeviceFunc (Failed)
         41 - Unit_deviceAllocation_VirtualFunction (Failed)
         42 - Unit_deviceAllocation_Malloc_MulKernels_MulThreads (Failed)
         43 - Unit_deviceAllocation_New_MulKernels_MulThreads (Failed)
         44 - Unit_deviceAllocation_Malloc_SingKernels_MulThreads (Failed)
         45 - Unit_deviceAllocation_New_SingKernels_MulThreads (Failed)
         46 - Unit_deviceAllocation_Malloc_MulCodeObj (Failed)
         47 - Unit_deviceAllocation_New_MulCodeObj (Failed)
        631 - Unit_hipMemPrefetchAsync_NonPageSz (Failed)
        946 - Unit_hipMemPrefetchAsync_Basic (Failed)
        1044 - Unit_hipHostMalloc_CoherentTst (Bus error)
        1045 - Unit_hipMallocManaged_CoherentTst (Bus error)
        1322 - Unit_printf_flags (Failed)
        1323 - Unit_printf_specifier (Failed)
        1615 - Unit_hipStreamPerThread_MangdMem (Failed)
        1659 - Unit_hipCGMultiGridGroupType (Bus error)
        1660 - Unit_hipCGMultiGridGroupType_BaseType (Bus error)
        1661 - Unit_hipCGMultiGridGroupType_PublicApi (Bus error)
        1662 - Unit_coalesced_groups_shfl_down (Failed)
        1663 - Unit_coalesced_groups_shfl_up (Failed)
        1664 - Unit_coalesced_groups (Failed)
        1711 - Unit_hipHostMalloc_WthEnv1 (Failed)
        1712 - Unit_hipHostMalloc_WthEnv1Flg1 (Failed)
        1713 - Unit_hipHostMalloc_WthEnv1Flg2 (Failed)
        1714 - Unit_hipHostMalloc_WthEnv1Flg3 (Failed)
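To make the failure pattern easier to see, the summary can be tallied by status and by test group. The following is a minimal sketch, not part of the original report; the function names `parse_failures` and `tally` are hypothetical, and the embedded text is only a four-line excerpt of the list above.

```python
import re
from collections import Counter

# Excerpt of the CTest summary above (assumption: paste the full
# "The following tests FAILED:" block here to tally everything).
CTEST_SUMMARY = """\
         24 - Unit_deviceAllocation_Malloc_PerThread_PrimitiveDataType (Failed)
        1044 - Unit_hipHostMalloc_CoherentTst (Bus error)
        1322 - Unit_printf_flags (Failed)
        1659 - Unit_hipCGMultiGridGroupType (Bus error)
"""

# Matches lines of the form "<number> - <test name> (<status>)".
LINE_RE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(([^)]+)\)\s*$")


def parse_failures(text):
    """Return (test_number, test_name, status) tuples from a CTest failure summary."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            rows.append((int(match.group(1)), match.group(2), match.group(3)))
    return rows


def tally(rows):
    """Count failures by status and by the name component after 'Unit_'."""
    by_status = Counter(status for _, _, status in rows)
    by_group = Counter()
    for _, name, _ in rows:
        parts = name.split("_")
        by_group[parts[1] if len(parts) > 1 else name] += 1
    return by_status, by_group


if __name__ == "__main__":
    by_status, by_group = tally(parse_failures(CTEST_SUMMARY))
    print(dict(by_status))
    print(dict(by_group))
```

Applied to the full list above, the same parsing should report 36 `Failed` and 5 `Bus error` entries, with the 24 `deviceAllocation` tests forming the largest group.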

Operating System

35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

CPU

13th Gen Intel(R) Core(TM) i9-13900K

GPU

AMD Radeon VII

ROCm Version

ROCm 5.7.1

ROCm Component

HIP

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

I opened an issue on hip-tests but got no reply: ROCm/hip-tests#462
There, I tested with ROCm 6.1.

Since the last official release for gfx906 was ROCm 5.7, I downgraded the installation and ran the tests again.

╭─pvelesko@cupcake ~/HIPAMD/hip-tests ‹rocm-5.7.x›
╰─$ dpkg -l | grep rocm
ii  rocm-clang-ocl                             0.5.0.50700-63~20.04                                            amd64        OpenCL compilation with clang compiler.
ii  rocm-cmake                                 0.10.0.50700-63~20.04                                           amd64        rocm-cmake built using CMake
ii  rocm-core                                  5.7.0.50700-63~20.04                                            amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-dbgapi                                0.70.1.50700-63~20.04                                           amd64        Library to provide AMD GPU debugger API
ii  rocm-debug-agent                           2.0.3.50700-63~20.04                                            amd64        Radeon Open Compute Debug Agent (ROCdebug-agent)
ii  rocm-dev                                   5.7.0.50700-63~20.04                                            amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-device-libs                           1.0.0.50700-63~20.04                                            amd64        Radeon Open Compute - device libraries
ii  rocm-dkms                                  5.7.0.50700-63~20.04                                            amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocm-gdb                                   13.2.50700-63~20.04                                             amd64        ROCgdb
ii  rocm-llvm                                  17.0.0.23352.50700-63~20.04                                     amd64        ROCm compiler
ii  rocm-ocl-icd                               2.0.0.50700-63~20.04                                            amd64        clr built using CMake
ii  rocm-opencl                                2.0.0.50700-63~20.04                                            amd64        clr built using CMake
ii  rocm-opencl-dev                            2.0.0.50700-63~20.04                                            amd64        clr built using CMake
ii  rocm-smi-lib                               5.0.0.50700-63~20.04                                            amd64        AMD System Management libraries
ii  rocm-utils                                 5.7.0.50700-63~20.04                                            amd64        Radeon Open Compute (ROCm) Runtime software stack
ii  rocminfo                                   1.0.0.50700-63~20.04                                            amd64        Radeon Open Compute (ROCm) Runtime rocminfo tool
@cjatin
Contributor

cjatin commented Jun 12, 2024

Looks like a PCIe atomics issue.

Can you share a few more details? Which PCIe generation are you on? I think the Radeon VII supports 3.0.

Also, is Large BAR enabled? It will be listed as "4G decode" or something similar in your motherboard BIOS menu.
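One way to collect these details is to read them from `lspci -vv` output for the GPU (typically run with `sudo`, since capability fields are hidden from unprivileged users). The following is a rough sketch, not a maintainer-provided tool: `summarize_lspci` is a hypothetical helper, and `SAMPLE` is made-up illustrative text rather than output from the reporter's machine.

```python
import re

# Nominal PCIe link speeds in GT/s mapped to the PCIe generation.
SPEED_TO_GEN = {"2.5": 1, "5": 2, "8": 3, "16": 4, "32": 5}
UNIT = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

# Hypothetical sample in the style of `lspci -vv`; not real output from this issue.
SAMPLE = """\
03:00.0 VGA compatible controller: (hypothetical sample device)
    Region 0: Memory at 4000000000 (64-bit, prefetchable) [size=16G]
    Region 2: Memory at 4400000000 (64-bit, prefetchable) [size=2M]
    LnkSta: Speed 8GT/s (ok), Width x16 (ok)
    DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
         AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+
         AtomicOpsCtl: ReqEn+
"""


def summarize_lspci(text):
    """Extract atomics flags, link generation and largest BAR from one device's `lspci -vv` text."""
    caps = re.search(r"AtomicOpsCap:(.*)", text)
    ctl = re.search(r"AtomicOpsCtl:.*?ReqEn([+-])", text)
    speed = re.search(r"LnkSta:\s*Speed\s+([\d.]+)GT/s", text)
    bars = [int(n) * UNIT[u] for n, u in re.findall(r"\[size=(\d+)([KMGT])\]", text)]
    return {
        # "64bit+" means the device advertises 64-bit AtomicOp support.
        "atomics_64bit_capable": bool(caps and "64bit+" in caps.group(1)),
        # "ReqEn+" means the device is allowed to issue AtomicOp requests.
        "atomic_requests_enabled": bool(ctl and ctl.group(1) == "+"),
        "link_gen": SPEED_TO_GEN.get(speed.group(1)) if speed else None,
        # A BAR of several GiB suggests Large BAR is active; a 256M BAR suggests it is not.
        "largest_bar_bytes": max(bars, default=0),
    }


if __name__ == "__main__":
    print(summarize_lspci(SAMPLE))
```

Running the same check on the entry for the upstream port the card is attached to may also be informative, since that entry carries its own `AtomicOpsCap` line.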

@pvelesko
Author

> Can you share a few more details? Which PCIe generation are you on? I think the Radeon VII supports 3.0.

https://www.gigabyte.com/Motherboard/B660-DS3H-AC-DDR4-rev-10-12#kf

PCIe 4.0

> Also, is Large BAR enabled?

Yes.

@harkgill-amd

Hi @pvelesko, let's use ROCm/hip-tests#462 to continue investigating this issue.
