
GPU support #192

Open
bhack opened this issue Mar 20, 2017 · 29 comments

@bhack

bhack commented Mar 20, 2017

I want to start this topic just to discuss how GPU support could be introduced in the library.

  • SYCL (lacks full open-source compiler support, but offers great C++ integration; low-level optimization limits for kernels?)
  • CLCudaAPI (header-only helper)
  • OCCA
  • Boost.Compute (lacks CUDA support; see the sketch below)

/cc @Randl @edgarriba
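
For reference, here is a minimal Boost.Compute sketch of the A + B case on an OpenCL device, just to illustrate what that candidate looks like (this is plain Boost.Compute, not anything xtensor provides):

```cpp
// Minimal Boost.Compute illustration: copy two host vectors to an OpenCL
// device, add them there, and copy the result back.
#include <vector>
#include <boost/compute.hpp>

namespace compute = boost::compute;

int main() {
    compute::device dev = compute::system::default_device();
    compute::context ctx(dev);
    compute::command_queue queue(ctx, dev);

    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);

    compute::vector<float> da(a.begin(), a.end(), queue);
    compute::vector<float> db(b.begin(), b.end(), queue);
    compute::vector<float> dc(1024, ctx);

    compute::transform(da.begin(), da.end(), db.begin(), dc.begin(),
                       compute::plus<float>(), queue);
    compute::copy(dc.begin(), dc.end(), c.begin(), queue);
}
```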

@bhack
Author

bhack commented Apr 13, 2017

@feliwir

feliwir commented Nov 2, 2017

Would be interesting, indeed

@SylvainCorlay
Member

Note that we now have strong SIMD support for broadcasting and a number of other use cases, based on the xsimd project.
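
As a point of reference, explicit xsimd usage looks roughly like this (a sketch against a recent xsimd API; inside xtensor the dispatch is transparent):

```cpp
// Sketch of explicit xsimd usage; xtensor applies this transparently
// when assigning expressions.
#include <cstddef>
#include <xsimd/xsimd.hpp>

int main() {
    alignas(64) float a[16], b[16], c[16];
    for (std::size_t i = 0; i < 16; ++i) { a[i] = float(i); b[i] = 2.0f; }

    using batch = xsimd::batch<float>;  // width chosen for the target arch
    for (std::size_t i = 0; i < 16; i += batch::size) {
        batch va = batch::load_aligned(&a[i]);
        batch vb = batch::load_aligned(&b[i]);
        (va + vb).store_aligned(&c[i]);
    }
}
```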

@feliwir

feliwir commented Nov 2, 2017

@SylvainCorlay SIMD is good and all, but the performance is not comparable to GPU acceleration. Especially for deep learning applications, this kind of acceleration makes a lot of sense. Since this library uses NumPy as its API reference, I recommend looking at PyTorch, which has similar goals but uses GPU acceleration.

@SylvainCorlay
Member

GPU is in scope and on the roadmap. I meant that the work done for SIMD actually paved the way, since a lot of the required logic in xtensor is the same.

Note that frameworks like PyTorch don't implement compile-time loop unfolding like xtensor does, which can make xtensor faster for complex expressions.
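
For illustration, this is the kind of expression being referred to (standard xtensor usage):

```cpp
#include <xtensor/xarray.hpp>

int main() {
    xt::xarray<double> a = {1.0, 2.0, 3.0};
    xt::xarray<double> b = {4.0, 5.0, 6.0};
    // a + 2.0 * b builds an expression template; nothing is computed and no
    // temporary is allocated until the assignment, which evaluates the whole
    // expression in a single fused loop.
    xt::xarray<double> c = a + 2.0 * b;
}
```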

@feliwir

feliwir commented Nov 2, 2017

Awesome, thanks for letting me know. How are you planning to implement it? Are contributions possible?

@AuroraDysis

I recommend using MAGMA as the backend to support both GPU and CPU.

@ktnyt
Contributor

ktnyt commented Jan 24, 2019

Just wondering if there are any updates on the topic. I'd love to make contributions where possible!

@wolfv
Member

wolfv commented Jan 24, 2019

We have GPU support on our roadmap for 2019. However, we're not yet sure how to do it concretely! So any input is highly appreciated.
And of course, contributions are very welcome!

The thing we'd probably like to start out with is mapping a container to the GPU, and evaluating a simple binary function, such as A + B.
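
As a strawman for that first step, here is a self-contained SYCL sketch of mapping buffers to a device and evaluating A + B (nothing here is xtensor API, just an illustration of the shape of the problem):

```cpp
// SYCL sketch: map two host arrays to the device, compute c = a + b there,
// and copy the result back when the buffers go out of scope.
#include <cstddef>
#include <vector>
#include <sycl/sycl.hpp>

int main() {
    constexpr std::size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    sycl::queue q;  // default device selection
    {
        sycl::buffer<float> ba(a.data(), sycl::range<1>(n));
        sycl::buffer<float> bb(b.data(), sycl::range<1>(n));
        sycl::buffer<float> bc(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler& h) {
            sycl::accessor ra(ba, h, sycl::read_only);
            sycl::accessor rb(bb, h, sycl::read_only);
            sycl::accessor wc(bc, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                wc[i] = ra[i] + rb[i];
            });
        });
    }  // destruction of bc writes the result back into c
}
```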

@ktnyt
Contributor

ktnyt commented Jan 24, 2019

Thanks for the prompt reply!

I haven't been able to take a deep dive into the code yet, but I was thinking that the implementation strategy of a recently released library called ChainerX might be of help. It basically provides device-agnostic, NumPy-like multi-dimensional arrays for C++.
AFAIK they provide a Device abstract class that handles memory management and hides the hardware-specific implementations of a core set of routines.
This is just an idea, but the Device specialization for CPU-specific code could be developed in parallel to xtensor and, when it is mature enough, swapped in for the portions of code calling the synonymous routines.
The GPU specialization can be filled in later, and WIP routines can throw runtime or compile-time errors.

I'm not too familiar with the internals of xtensor, so this might be an infeasible approach, though.
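
To make the idea concrete, a hypothetical sketch of such a Device abstraction (all names are made up for illustration, not ChainerX's or xtensor's API):

```cpp
// Hypothetical device abstraction in the ChainerX spirit: the CPU backend is
// implemented first; the GPU backend throws until a real implementation lands.
#include <cstddef>
#include <stdexcept>

struct device {
    virtual ~device() = default;
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
    // one routine of the "core set": elementwise addition
    virtual void add(const float* a, const float* b, float* out,
                     std::size_t n) = 0;
};

struct cpu_device final : device {
    void* allocate(std::size_t bytes) override { return ::operator new(bytes); }
    void deallocate(void* p) override { ::operator delete(p); }
    void add(const float* a, const float* b, float* out,
             std::size_t n) override {
        for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
    }
};

struct gpu_device final : device {
    void* allocate(std::size_t) override {
        throw std::runtime_error("gpu_device::allocate: not implemented");
    }
    void deallocate(void*) override {}
    void add(const float*, const float*, float*, std::size_t) override {
        throw std::runtime_error("gpu_device::add: not implemented");
    }
};
```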

@wolfv
Member

wolfv commented Jan 24, 2019 via email

@miketoastmacneil

I'm also interested in this feature, although I'm not sure how to do it! As a starting point, I'd point out the ArrayFire package: https://github.com/arrayfire/arrayfire. My loose understanding is that instead of loop fusion, they perform kernel fusion.
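
For illustration, kernel fusion in ArrayFire looks like this (standard ArrayFire usage, unrelated to xtensor's API):

```cpp
// ArrayFire sketch: array expressions are recorded by a JIT, and the whole
// expression is fused into a single kernel at evaluation time.
#include <arrayfire.h>

int main() {
    af::array a = af::randu(1024);
    af::array b = af::randu(1024);
    af::array c = a + 2.0f * b;  // no kernel launched yet
    c.eval();                    // JIT fuses the expression into one kernel
}
```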

@bhack bhack closed this as completed Feb 11, 2020
@wolfv wolfv reopened this Feb 11, 2020
@wolfv
Member

wolfv commented Feb 11, 2020

I think we should leave this still open as it isn't solved.

@fschlimb

This would be very interesting. Any progress/news?

@bhack
Author

bhack commented Mar 23, 2020

I don't know if you'd be interested in https://llvm.discourse.group/t/numpy-scipy-op-set/

@JohanMabille
Member

> This would be very interesting. Any progress/news?

Not yet. We are trying to get some funding to start it.

@fschlimb

> > This would be very interesting. Any progress/news?
>
> Not yet. We are trying to get some funding to start it.

Any news on this subject?

@fschlimb

Is there any description of how you'd envision supporting GPUs, in particular through SYCL?

@JohanMabille
Member

Unfortunately, no. We don't have any funding for implementing this.

@bhack bhack closed this as completed Apr 12, 2022
@antoniojkim

Why was this issue closed? xtensor does not yet have GPU support, does it?

@JohanMabille
Member

I don't know why it was closed, but it should definitely stay open until we can implement it.

@JohanMabille JohanMabille reopened this Jun 24, 2022
@antoniojkim

Are there any updates on a timeline for when xtensor might have GPU support?

@Physicworld

Hey, I'm a quant open to working on this; I'll have to research more about how the library works and how to map your containers to the GPU.

What framework is best for this? I mean, CUDA might not be it, because you want the best performance on any GPU.

Maybe OpenCL could work, or another framework.

Also, have you checked NVIDIA's implementation of std::execution::par and the integrations with it? That would make everything easier, but I'm not sure if it will work with your library.

If you have std::vectors as the backing storage, I'm pretty sure it will work.
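
For what it's worth, the stdpar approach mentioned above looks like this (plain parallel STL; assuming NVIDIA's nvc++ with -stdpar=gpu, the loop is offloaded to the GPU):

```cpp
// Parallel STL sketch: under nvc++ -stdpar=gpu this std::transform over
// std::vector storage is offloaded to the GPU; elsewhere it runs on CPU threads.
#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    std::transform(std::execution::par, a.begin(), a.end(), b.begin(),
                   c.begin(), [](float x, float y) { return x + y; });
}
```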

@feliwir

feliwir commented Jun 28, 2022

@Physicworld I think the easiest / most portable solution would be to use SYCL.

@spectre-ns
Contributor

@Physicworld SYCL can be used with multiple backends, with full or experimental support for NVIDIA, AMD, and Intel. I think SYCL (and CUDA) have partial if not complete GPU implementations of the std:: algorithms, so that might be some low-hanging fruit.
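
On the CUDA side, the closest analogue of the std:: algorithms is Thrust, e.g.:

```cpp
// Thrust sketch: device_vector storage plus an STL-style transform that runs
// as a CUDA kernel.
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

int main() {
    thrust::device_vector<float> a(1024, 1.0f);
    thrust::device_vector<float> b(1024, 2.0f);
    thrust::device_vector<float> c(1024);
    thrust::transform(a.begin(), a.end(), b.begin(), c.begin(),
                      thrust::plus<float>());
}
```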

@spectre-ns
Contributor

Given that SYCL can run on its host backend, it would be ideal: all the xtensor calls could be refactored into SYCL, and then a single implementation would work on either the host or the GPU side with only a runtime toggle.
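
A sketch of that runtime toggle with SYCL 2020 selectors (note that SYCL 2020 dropped the dedicated host device, so the CPU backend plays that role here):

```cpp
// Sketch of a runtime host/GPU toggle: the same kernels run wherever the
// queue's device lives.
#include <sycl/sycl.hpp>

sycl::queue make_queue(bool use_gpu) {
    return use_gpu ? sycl::queue(sycl::gpu_selector_v)
                   : sycl::queue(sycl::cpu_selector_v);
}
```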

@spectre-ns
Contributor

@ksvbka

ksvbka commented Sep 20, 2023

Thanks for the great lib. Any progress/news?

@JohanMabille
Member

Nope, we are still searching for funding to implement new features in xtensor.
