[SYCL][E2E] GroupAlgorithm/root_group.cpp timed out on OCL AMD CPU #13032

Closed
uditagarwal97 opened this issue Mar 14, 2024 · 4 comments
Labels
bug (Something isn't working), confirmed

Comments

@uditagarwal97
Contributor

Describe the bug

Example:
https://github.com/intel/llvm/actions/runs/8274840200/job/22641210917
https://github.com/intel/llvm/actions/runs/8274840200/job/22670316284

********************
Slowest Tests:
--------------------------------------------------------------------------
600.01s: SYCL :: GroupAlgorithm/root_group.cpp
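
A timing report like the one above can be scanned for tests that hit the time limit. The following is a minimal sketch, assuming the `<seconds>s: <suite> :: <test>` line format shown in the log and assuming a 600-second limit (inferred from the 600.01s entry, not confirmed against the CI configuration). `find_timeouts` is a hypothetical helper name, not part of lit.

```python
import re

# Matches lines such as "600.01s: SYCL :: GroupAlgorithm/root_group.cpp".
_TIMING_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s:\s+(\S+)\s+::\s+(\S+)\s*$")


def find_timeouts(log_text, limit_seconds=600.0):
    """Return (seconds, suite, test) for every entry at or above the limit.

    limit_seconds is an assumed value; adjust it to the real CI timeout.
    """
    hits = []
    for line in log_text.splitlines():
        match = _TIMING_LINE.match(line)
        if match and float(match.group(1)) >= limit_seconds:
            hits.append((float(match.group(1)), match.group(2), match.group(3)))
    return hits


sample = """Slowest Tests:
--------------------------------------------------------------------------
600.01s: SYCL :: GroupAlgorithm/root_group.cpp
"""
print(find_timeouts(sample))
```

Lines that do not match the timing format, such as the header and the separator, are skipped rather than treated as errors.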

To reproduce

DPC++ commit id: bca61db

Environment

sycl-ls --verbose output:

[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD Ryzen 7 5800X 8-Core Processor              OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
[opencl:fpga][opencl:1] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.10.0.17_160000]

Platforms: 2
Platform [#1]:
    Version  : OpenCL 3.0 LINUX
    Name     : Intel(R) OpenCL
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#0]:
        Type       : cpu
        Version    : OpenCL 3.0 (Build 0)
        Name       : AMD Ryzen 7 5800X 8-Core Processor             
        Vendor     : Intel(R) Corporation
        Driver     : 2023.16.10.0.17_160000
        Aspects    : cpu fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group
        info::device::sub_group_sizes: 4 8 16 64
Platform [#2]:
    Version  : OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Name     : Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Vendor   : Intel(R) Corporation
    Devices  : 1
        Device [#1]:
        Type       : fpga
        Version    : OpenCL 1.2 
        Name       : Intel(R) FPGA Emulation Device
        Vendor     : Intel(R) Corporation
        Driver     : 2023.16.10.0.17_160000
        Aspects    : accelerator fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_atomic_host_allocations usm_atomic_shared_allocations ext_oneapi_srgb ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_intel_fpga_task_sequence
        info::device::sub_group_sizes: 4 8 16 32 64
default_selector()      : cpu, Intel(R) OpenCL, AMD Ryzen 7 5800X 8-Core Processor              OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
accelerator_selector()  : fpga, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.10.0.17_160000]
cpu_selector()          : cpu, Intel(R) OpenCL, AMD Ryzen 7 5800X 8-Core Processor              OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
gpu_selector()          : No device of requested type available. -1 (PI_ERRO...
custom_selector(gpu)    : No device of requested type available. -1 (PI_ERRO...
custom_selector(cpu)    : cpu, Intel(R) OpenCL, AMD Ryzen 7 5800X 8-Core Processor              OpenCL 3.0 (Build 0) [2023.16.10.0.17_160000]
custom_selector(acc)    : fpga, Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.10.0.17_160000]

Additional context

No response

@uditagarwal97 added the bug (Something isn't working) label on Mar 14, 2024
@uditagarwal97 changed the title from "[E2E] GroupAlgorithm/root_group.cpp timed out on OCL AMD CPU" to "[SYCL][E2E] GroupAlgorithm/root_group.cpp timed out on OCL AMD CPU" on Mar 14, 2024
@JackAKirk
Contributor

JackAKirk commented Mar 18, 2024

I think there are some issues in the implementation, e.g. 66d35e2#r139907569

The CUDA implementation will also not behave as expected.

@JackAKirk
Contributor

> I think there are some issues in the implementation: e.g.: 66d35e2#r139907569
>
> The cuda impl also will not be behaving as expected.

I can't reproduce the timeout on gfx90a using 5.7.1.
I looked into the CUDA docs for grid_group synchronization. The CUDA version calls barrier.sync, whereas the DPC++ implementation calls bar.sync. I'm not sure the difference matters in this case, so I guess what DPC++ does is also OK, which probably explains why the test passes.
For HIP I am not sure yet what is correct, but what DPC++ does, which is just to call s_barrier with a fence, might also be OK. (I also think the way we implement group_barrier is not strictly correct, and there are some patches for this: #12872 etc.)
Since I can't reproduce the failure on gfx90a, I think it is quite likely that this is just another symptom of using an unsupported device:
#12997 (comment)

@JackAKirk
Contributor

Wait, I read this completely wrong. I saw AMD and thought AMD GPU. It is an OpenCL issue and doesn't relate to HIP at all.

@uditagarwal97
Contributor Author

It turned out to be a CI-infrastructure-related issue.
