How could I use cuda block shared memory in the self-defined operator? #59
Comments
I'm pretty sure you need to specify a size for the shared array. You should probably define these numbers at the beginning of your file so that the size is set at compile time, but you still have some flexibility. See an example here: https://github.com/ClementPinard/Pytorch-Correlation-extension/blob/master/Correlation_Module/correlation_cuda_kernel.cu#L47
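A statically sized shared array of the kind suggested here can be sketched as follows (the kernel and constant names are illustrative, not taken from the linked file):

```cuda
#include <cuda_runtime.h>

// Size fixed at compile time, defined near the top of the file
// so it is easy to tweak before rebuilding.
constexpr int BLOCK_SIZE = 256;

template <typename scalar_t>
__global__ void copy_via_smem(const scalar_t* in, scalar_t* out, int n) {
    // Static shared array: its size must be a compile-time constant.
    __shared__ scalar_t sdata[BLOCK_SIZE];

    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[threadIdx.x] = (i < n) ? in[i] : scalar_t(0);
    __syncthreads();

    if (i < n)
        out[i] = sdata[threadIdx.x];
}
```

With this approach no third launch-configuration argument is needed, since the size is baked into the kernel at compile time.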
@ClementPinard Thanks for replying!! As far as I know, CUDA seems to support dynamically allocated shared memory within a block, which is defined like this:
Ah, actually you are right, you can use dynamic shared arrays.
But I did call the kernel function with the shared memory size assigned. I called it like this: `TestForward<scalar_t><<<grid, block, 4096, at::cuda::getCurrentCUDAStream()>>>();` I assigned 4 KB of shared memory to each block this way. Would you please tell me why this does not work?
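For reference, the dynamic variant pairs an `extern __shared__` declaration inside the kernel with the shared-memory size passed as the third launch parameter, as in the call above. This standalone kernel is only a sketch, not the poster's actual code:

```cuda
#include <cuda_runtime.h>

__global__ void dynamic_smem_copy(const float* in, float* out, int n) {
    // Dynamic shared array: no size in the declaration; the number of
    // bytes comes from the third <<<grid, block, bytes>>> argument.
    extern __shared__ float sdata[];

    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[threadIdx.x] = (i < n) ? in[i] : 0.f;
    __syncthreads();

    if (i < n)
        out[i] = sdata[threadIdx.x];
}

// Example launch: 4096 bytes of shared memory per block.
// dynamic_smem_copy<<<grid, block, 4096>>>(in, out, n);
```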
OK, sorry about misleading you, your code is mostly fine. I tried it, and the problem seems to come from the template and the fact that you use three different specializations of it (float, double and half): CUDA doesn't allow two differently typed dynamic shared arrays with the same name ¯\_(ツ)_/¯ See here for more info: https://stackoverflow.com/questions/27570552/templated-cuda-kernel-with-dynamic-shared-memory In the end you need to replace the line

```cuda
extern __shared__ scalar_t sdata[];
```

with the two lines

```cuda
extern __shared__ __align__(sizeof(scalar_t)) unsigned char sdata_uchar[];
scalar_t *sdata = reinterpret_cast<scalar_t *>(sdata_uchar);
```
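Putting that fix into a complete templated kernel gives something like the following sketch (the kernel body is illustrative; only the two declaration lines come from the comment above):

```cuda
#include <cuda_runtime.h>

template <typename scalar_t>
__global__ void TestForward(const scalar_t* in, scalar_t* out, int n) {
    // `extern __shared__ scalar_t sdata[];` would give the same symbol
    // three conflicting types once the template is instantiated for
    // float, double and half. Declaring the buffer once as raw bytes
    // and casting per specialization sidesteps the conflict.
    extern __shared__ __align__(sizeof(scalar_t)) unsigned char sdata_uchar[];
    scalar_t* sdata = reinterpret_cast<scalar_t*>(sdata_uchar);

    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        sdata[threadIdx.x] = in[i];
        out[i] = sdata[threadIdx.x];
    }
}
```

The kernel is still launched with the byte count as before, e.g. `TestForward<scalar_t><<<grid, block, 4096, at::cuda::getCurrentCUDAStream()>>>(...)`.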
Thanks!!! It works now, but I have two more warnings:
Will my code be fine with these two warnings?
I am working on Ubuntu 16.04 with PyTorch 1.3 installed from conda.
My CUDA version is 10.1.243 and my cuDNN version is 7.
I have 8 T4 GPUs on my server, and the gcc version is the default 5.4.
When submitting a bug report, please include the following information (where relevant):
The simplified version of my code, `main.cu`, is like this:

and the `setup.py` is like this:

I compiled it with the command `python setup.py install`. And the error message is like this:

What is the cause of this, and how can I cope with this problem, please?
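For reference, a minimal `setup.py` for building a CUDA extension with PyTorch usually follows this pattern; the module and source names below are placeholders, not the poster's actual files:

```python
# Hypothetical build script for a PyTorch CUDA extension; the module
# name and source list are assumptions, not the poster's real code.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="test_op",
    ext_modules=[
        CUDAExtension(
            name="test_op",
            sources=["main.cu"],
        ),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```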