Early tests indicate that using local memory might get us a 10% speedup on 3x3 stencils. The expectation is that larger stencils will see more speedup, because each cell is accessed more times (once for each neighborhood it forms part of).
(An earlier estimate of larger gains was wrong, because I was comparing different stencil sizes. That is actually a useful insight in itself: the speed difference between a 5-point Laplacian and a 9-point Laplacian is significant.)
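To make the reuse argument concrete, here is a rough back-of-the-envelope sketch that counts nominal global-memory reads per work group, with and without loading a shared tile first. The function names and the 16x16 work group size are hypothetical, chosen only for illustration, and the count ignores hardware caching, so the measured speedup can be far smaller than these ratios suggest.

```python
def box_points(radius, dims=2):
    """Cells in a full box stencil of the given radius, e.g. 9 for a 3x3 stencil."""
    return (2 * radius + 1) ** dims


def naive_reads(group, points):
    """Global reads per work group if every work item fetches its own neighborhood."""
    items = 1
    for size in group:
        items *= size
    return items * points


def tiled_reads(group, radius):
    """Global reads per work group if the tile plus its halo is loaded only once."""
    cells = 1
    for size in group:
        cells *= size + 2 * radius
    return cells


group = (16, 16)  # hypothetical work group size, not taken from the issue
for radius in (1, 2, 3):
    points = box_points(radius, dims=len(group))
    ratio = naive_reads(group, points) / tiled_reads(group, radius)
    print(f"radius {radius}: {points}-point stencil, {ratio:.1f}x fewer nominal global reads")
```

The ratio grows with the radius, which is the reason to expect larger stencils to benefit more. The same functions show why stencil size matters on its own: a 5-point Laplacian needs `naive_reads(group, 5)` reads where a 9-point one needs `naive_reads(group, 9)`.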
A hand-written kernel:
Here XR, YR, ZR are the radii around each cell that we need to retrieve for the stencils (here 1). LX, LY, LZ are the work group size, and must be chosen such that we don't exceed `CL_DEVICE_LOCAL_MEM_SIZE` or `CL_DEVICE_MAX_WORK_GROUP_SIZE`.
We can use local memory for formula image rules.
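As a minimal sketch of the sizing constraint, the check below assumes the local buffer is the work group tile padded by the radius on every side, with one 4-byte float per cell. Both are assumptions, not details taken from the kernel, and the function names and device limits are hypothetical. On a real device the two limits would come from `clGetDeviceInfo`.

```python
def local_tile_bytes(lx, ly, lz, xr, yr, zr, bytes_per_cell=4):
    """Bytes of local memory for one tile plus halo (assumed layout)."""
    return (lx + 2 * xr) * (ly + 2 * yr) * (lz + 2 * zr) * bytes_per_cell


def fits_device(lx, ly, lz, xr, yr, zr,
                local_mem_size, max_work_group_size, bytes_per_cell=4):
    """True if the work group respects both device limits."""
    work_items = lx * ly * lz
    tile_bytes = local_tile_bytes(lx, ly, lz, xr, yr, zr, bytes_per_cell)
    return work_items <= max_work_group_size and tile_bytes <= local_mem_size


# Hypothetical limits standing in for CL_DEVICE_LOCAL_MEM_SIZE and
# CL_DEVICE_MAX_WORK_GROUP_SIZE; query the actual device for real values.
LOCAL_MEM_SIZE = 32 * 1024
MAX_WORK_GROUP_SIZE = 256

for lx, ly, lz in [(8, 8, 4), (16, 16, 1), (16, 16, 4)]:
    ok = fits_device(lx, ly, lz, 1, 1, 1, LOCAL_MEM_SIZE, MAX_WORK_GROUP_SIZE)
    print((lx, ly, lz), local_tile_bytes(lx, ly, lz, 1, 1, 1), "bytes", "ok" if ok else "too large")
```

With more than one value stored per cell, `bytes_per_cell` scales accordingly, so the usable work group size shrinks.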