-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Native Convolution Support #907
Comments
It would be good to allow backends to implement convolutions without im2col, but if you make this change as you are suggesting you are going to break conv2d support in every backend. Do you intend to fix the implementations in the backends as well? |
yes, I will split it into several PRs, the first PR will update the |
Sure, that would be good, but it may require more changes than you expect. For instance, the Metal backend does not have an internal memory pool to allocate memory for the im2col, and the CUDA backend may require significant changes to adapt |
Okay it takes me a weekend to try it on the Metal backend and find it really a lot of changes. Can I only dispatch on the SYCL backend? It will introduce additional macros in the common files |
Until we have a better way to do this, you could add a new function and let the applications choose which one to use depending on the backend they are using. This cannot be done in the common ggml files since ggml does not know what backend will be used at the time |
hi @ggerganov @slaren, current
ggml_conv_2d
is implemented only in img2col algorithms, which is quite slow for most of the cases and will introduce extra memory consumption** source: https://github.com/leejet/stable-diffusion.cpp/blob/4a6e36edc586779918535e12b4fbe0583044ee6f/README.md#L57
Shall I made the the following changes and leave the convolution implementation to the each backend which I believe there will be much more efficient implementation themselves:
@luoyu-intel cc our performance expert for awareness
The text was updated successfully, but these errors were encountered: