
GPU support #192

Open
bhack opened this issue Mar 20, 2017 · 29 comments

@bhack

bhack commented Mar 20, 2017

I want to start this topic just to discuss how GPU support could be introduced in the library.

  • SYCL (lacks full open-source compiler support, but offers great C++ integration; low-level optimization limits for kernels?)
  • CLCudaAPI (header-only helper)
  • OCCA
  • Boost.Compute (lacks CUDA support; see the sketch below)

/cc @Randl @edgarriba
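
For reference, here is a minimal Boost.Compute sketch of the A + B case on an OpenCL device, just to illustrate what that candidate looks like (this is plain Boost.Compute, not anything xtensor provides):

```cpp
// Minimal Boost.Compute illustration: copy two host vectors to an OpenCL
// device, add them there, and copy the result back.
#include <vector>
#include <boost/compute.hpp>

namespace compute = boost::compute;

int main() {
    compute::device dev = compute::system::default_device();
    compute::context ctx(dev);
    compute::command_queue queue(ctx, dev);

    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);

    compute::vector<float> da(a.begin(), a.end(), queue);
    compute::vector<float> db(b.begin(), b.end(), queue);
    compute::vector<float> dc(1024, ctx);

    compute::transform(da.begin(), da.end(), db.begin(), dc.begin(),
                       compute::plus<float>(), queue);
    compute::copy(dc.begin(), dc.end(), c.begin(), queue);
}
```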

@bhack
Author

bhack commented Apr 13, 2017

@feliwir

feliwir commented Nov 2, 2017

Would be interesting, indeed

@SylvainCorlay
Member

Note that we now have strong SIMD support for broadcasting and a number of other use cases, based on the xsimd project.
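
As a point of reference, explicit xsimd usage looks roughly like this (a sketch against a recent xsimd API; inside xtensor the dispatch is transparent):

```cpp
// Sketch of explicit xsimd usage; xtensor applies this transparently
// when assigning expressions.
#include <cstddef>
#include <xsimd/xsimd.hpp>

int main() {
    alignas(64) float a[16], b[16], c[16];
    for (std::size_t i = 0; i < 16; ++i) { a[i] = float(i); b[i] = 2.0f; }

    using batch = xsimd::batch<float>;  // width chosen for the target arch
    for (std::size_t i = 0; i < 16; i += batch::size) {
        batch va = batch::load_aligned(&a[i]);
        batch vb = batch::load_aligned(&b[i]);
        (va + vb).store_aligned(&c[i]);
    }
}
```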

@feliwir

feliwir commented Nov 2, 2017

@SylvainCorlay SIMD is good and all, but the performance is not comparable to GPU acceleration. Especially for deep learning applications, this kind of acceleration makes a lot of sense. Since this library uses NumPy as its API reference, I recommend looking at PyTorch, which has similar goals but uses GPU acceleration.

@SylvainCorlay
Member

GPU is in scope and on the roadmap. I meant that the work done for SIMD actually paved the way, since a lot of the required logic in xtensor is the same.

Note that frameworks like PyTorch don't implement compile-time loop unfolding like xtensor does, which can make xtensor faster for complex expressions.
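
For illustration, this is the kind of expression being referred to (standard xtensor usage):

```cpp
#include <xtensor/xarray.hpp>

int main() {
    xt::xarray<double> a = {1.0, 2.0, 3.0};
    xt::xarray<double> b = {4.0, 5.0, 6.0};
    // a + 2.0 * b builds an expression template; nothing is computed and no
    // temporary is allocated until the assignment, which evaluates the whole
    // expression in a single fused loop.
    xt::xarray<double> c = a + 2.0 * b;
}
```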

@feliwir

feliwir commented Nov 2, 2017

Awesome, thanks for letting me know. How are you planning to implement it? Are contributions possible?

@AuroraDysis

I recommend using MAGMA as the backend to support both GPU and CPU.

@ktnyt
Contributor

ktnyt commented Jan 24, 2019

Just wondering if there are any updates on the topic. I'd love to make contributions where possible!

@wolfv
Member

wolfv commented Jan 24, 2019

We have GPU support on our roadmap for 2019. However, we're not yet sure how to do it concretely! So any input is highly appreciated.
And of course, contributions are very welcome!

The thing we'd probably like to start out with is mapping a container to the GPU, and evaluating a simple binary function, such as A + B.
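
As a strawman for that first step, here is a self-contained SYCL sketch of mapping buffers to a device and evaluating A + B (nothing here is xtensor API, just an illustration of the shape of the problem):

```cpp
// SYCL sketch: map two host arrays to the device, compute c = a + b there,
// and copy the result back when the buffers go out of scope.
#include <cstddef>
#include <vector>
#include <sycl/sycl.hpp>

int main() {
    constexpr std::size_t n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    sycl::queue q;  // default device selection
    {
        sycl::buffer<float> ba(a.data(), sycl::range<1>(n));
        sycl::buffer<float> bb(b.data(), sycl::range<1>(n));
        sycl::buffer<float> bc(c.data(), sycl::range<1>(n));

        q.submit([&](sycl::handler& h) {
            sycl::accessor ra(ba, h, sycl::read_only);
            sycl::accessor rb(bb, h, sycl::read_only);
            sycl::accessor wc(bc, h, sycl::write_only, sycl::no_init);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                wc[i] = ra[i] + rb[i];
            });
        });
    }  // destruction of bc writes the result back into c
}
```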

@ktnyt
Contributor

ktnyt commented Jan 24, 2019

Thanks for the prompt reply!

I haven't been able to take a deep dive into the code yet, but I was thinking that the implementation strategy of a recently released library called ChainerX might be of help. It basically provides device-agnostic, NumPy-like multi-dimensional arrays for C++.
AFAIK they provide a Device abstract class that handles memory management and hides the hardware-specific implementations of a core set of routines.
This is just an idea, but the Device specialization for CPU-specific code could be developed in parallel to xtensor and, when it is mature enough, swapped in for the portions of code calling the synonymous routines.
The GPU specialization can be filled in later, and WIP routines can throw runtime or compile-time errors.

I'm not too familiar with the internals of xtensor, so this might be an infeasible approach, though.
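
To make the idea concrete, a hypothetical sketch of such a Device abstraction (all names are made up for illustration, not ChainerX's or xtensor's API):

```cpp
// Hypothetical device abstraction in the ChainerX spirit: the CPU backend is
// implemented first; the GPU backend throws until a real implementation lands.
#include <cstddef>
#include <stdexcept>

struct device {
    virtual ~device() = default;
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
    // one routine of the "core set": elementwise addition
    virtual void add(const float* a, const float* b, float* out,
                     std::size_t n) = 0;
};

struct cpu_device final : device {
    void* allocate(std::size_t bytes) override { return ::operator new(bytes); }
    void deallocate(void* p) override { ::operator delete(p); }
    void add(const float* a, const float* b, float* out,
             std::size_t n) override {
        for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
    }
};

struct gpu_device final : device {
    void* allocate(std::size_t) override {
        throw std::runtime_error("gpu_device::allocate: not implemented");
    }
    void deallocate(void*) override {}
    void add(const float*, const float*, float*, std::size_t) override {
        throw std::runtime_error("gpu_device::add: not implemented");
    }
};
```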

@wolfv
Member

wolfv commented Jan 24, 2019 via email

@miketoastmacneil

I'm also interested in this feature, although I'm not sure how to do it! As a starting point, I'd point out the ArrayFire package: https://github.com/arrayfire/arrayfire. My loose understanding is that instead of loop fusion, they perform kernel fusion.
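
For illustration, kernel fusion in ArrayFire looks like this (standard ArrayFire usage, unrelated to xtensor's API):

```cpp
// ArrayFire sketch: array expressions are recorded by a JIT, and the whole
// expression is fused into a single kernel at evaluation time.
#include <arrayfire.h>

int main() {
    af::array a = af::randu(1024);
    af::array b = af::randu(1024);
    af::array c = a + 2.0f * b;  // no kernel launched yet
    c.eval();                    // JIT fuses the expression into one kernel
}
```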

@bhack bhack closed this as completed Feb 11, 2020
@wolfv wolfv reopened this Feb 11, 2020
@wolfv
Member

wolfv commented Feb 11, 2020

I think we should leave this still open as it isn't solved.

@fschlimb

This would be very interesting. Any progress/news?

@bhack
Author

bhack commented Mar 23, 2020

I don't know if you'd be interested in https://llvm.discourse.group/t/numpy-scipy-op-set/

@JohanMabille
Member

> This would be very interesting. Any progress/news?

Not yet. We are trying to get some funding to start it.

@fschlimb

> > This would be very interesting. Any progress/news?
>
> Not yet. We are trying to get some funding to start it.

Any news on this subject?

@fschlimb

Is there any description of how you'd envision supporting GPUs, in particular through SYCL?

@JohanMabille
Member

Unfortunately, no. We don't have any funding for implementing this.

@bhack bhack closed this as completed Apr 12, 2022
@antoniojkim

Why was this issue closed? xtensor does not yet have GPU support, does it?

@JohanMabille
Member

I don't know why it was closed, but it should definitely stay open until we can implement it.

@JohanMabille JohanMabille reopened this Jun 24, 2022
@antoniojkim

Are there any updates on a timeline for when xtensor might have GPU support?

@Physicworld

Hey, I'm a quant open to working on this; I'll have to research more about how the library works and how to map your containers to the GPU.

What framework is best for this? I mean, CUDA might not be it, because you want the best performance on any GPU.

Maybe OpenCL could work, or another framework.

Also, have you checked NVIDIA's implementation of std::execution::par and the integrations with it? That would make everything easier, but I'm not sure if it will work with your library.

If you have std::vectors as the backing storage, I'm pretty sure it will work.
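
For what it's worth, the stdpar approach mentioned above looks like this (plain parallel STL; assuming NVIDIA's nvc++ with -stdpar=gpu, the loop is offloaded to the GPU):

```cpp
// Parallel STL sketch: under nvc++ -stdpar=gpu this std::transform over
// std::vector storage is offloaded to the GPU; elsewhere it runs on CPU threads.
#include <algorithm>
#include <execution>
#include <vector>

int main() {
    std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
    std::transform(std::execution::par, a.begin(), a.end(), b.begin(),
                   c.begin(), [](float x, float y) { return x + y; });
}
```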

@feliwir

feliwir commented Jun 28, 2022

@Physicworld I think the easiest / most portable solution would be to use SYCL.

@spectre-ns
Contributor

@Physicworld SYCL can be used with multiple backends, with full or experimental support for NVIDIA, AMD, and Intel. I think SYCL (and CUDA) have partial if not complete GPU implementations of the std:: algorithms, so that might be some low-hanging fruit.
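
On the CUDA side, the closest analogue of the std:: algorithms is Thrust, e.g.:

```cpp
// Thrust sketch: device_vector storage plus an STL-style transform that runs
// as a CUDA kernel.
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

int main() {
    thrust::device_vector<float> a(1024, 1.0f);
    thrust::device_vector<float> b(1024, 2.0f);
    thrust::device_vector<float> c(1024);
    thrust::transform(a.begin(), a.end(), b.begin(), c.begin(),
                      thrust::plus<float>());
}
```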

@spectre-ns
Contributor

Given that SYCL can run on its host backend, it would be ideal: all the xtensor calls could be refactored into SYCL, and then a single implementation would work on either the host or the GPU side with only a runtime toggle.
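
A sketch of that runtime toggle with SYCL 2020 selectors (note that SYCL 2020 dropped the dedicated host device, so the CPU backend plays that role here):

```cpp
// Sketch of a runtime host/GPU toggle: the same kernels run wherever the
// queue's device lives.
#include <sycl/sycl.hpp>

sycl::queue make_queue(bool use_gpu) {
    return use_gpu ? sycl::queue(sycl::gpu_selector_v)
                   : sycl::queue(sycl::cpu_selector_v);
}
```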

@spectre-ns
Contributor

@ksvbka

ksvbka commented Sep 20, 2023

Thanks for the great lib. Any progress/news?

@JohanMabille
Member

Nope, we are still searching for funding to implement new features in xtensor.
