Optimize run_spin_excitation! for GPU #462

rkierulf · 2024-08-19T13:42:28Z

This pull request adds a GPU-optimized implementation of run_spin_excitation! to BlochGPU.jl. Unlike the function in BlochSimple.jl, it computes everything that can be calculated ahead of time for all time points up front and stores the results in the preallocated matrices Bz, B, φ, ΔT1, and ΔT2. The remaining sequential calculations run inside a kernel, apply_excitation!, which uses shared memory for all repeated memory accesses. The kernel also performs the real/imaginary number math from Magnetization.jl directly, so that the shared memory arrays store successive 32-bit values, which I think is ideal for avoiding bank conflicts (https://developer.nvidia.com/blog/using-shared-memory-cuda-cc/).

The tests pass for Metal on my computer, and I've seen the excitation-heavy MRI Lab benchmark speed up by ~5x. Hopefully the same will be true for the other backends!
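
For anyone unfamiliar with the split real/imaginary arithmetic mentioned above, here is a minimal CPU-only sketch in plain Julia. It is not the apply_excitation! kernel from this PR: the function name rotate_split! and the sample values are hypothetical, and no shared memory is involved. It only illustrates that a complex multiplication can be carried out on two separate Float32 arrays, so every stored element is a single 32-bit value, and that the result matches the ComplexF32 version.

```julia
# Hypothetical helper (not part of KomaMRI): rotate transverse magnetization
# by multiplying (x + iy) with (c + is), using separate Float32 arrays for
# the real and imaginary parts instead of one ComplexF32 array.
function rotate_split!(Mx::Vector{Float32}, My::Vector{Float32}, c::Float32, s::Float32)
    @inbounds for i in eachindex(Mx, My)
        x, y = Mx[i], My[i]
        Mx[i] = c * x - s * y   # real part of (x + iy)(c + is)
        My[i] = s * x + c * y   # imaginary part of (x + iy)(c + is)
    end
    return nothing
end

# Made-up sample data, for illustration only.
φ = 0.3f0
Mxy = ComplexF32[1 + 2im, -0.5 + 0.25im, 0 + 1im]

# Split into two Float32 arrays and rotate them in place.
Mx, My = real.(Mxy), imag.(Mxy)
rotate_split!(Mx, My, cos(φ), sin(φ))

# Reference result using ordinary complex arithmetic.
reference = Mxy .* cis(φ)
println(complex.(Mx, My) ≈ reference)
```

In a GPU kernel the same idea would apply to the shared memory arrays: with separate real and imaginary arrays, consecutive threads read consecutive 32-bit words, which is the access pattern the linked NVIDIA post describes as conflict-free.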

codecov · 2024-08-19T17:59:10Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.94%. Comparing base (1457a4c) to head (18bcaee).
Report is 1 commit behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #462      +/-   ##
==========================================
+ Coverage   90.91%   90.94%   +0.03%     
==========================================
  Files          53       53              
  Lines        2916     2926      +10     
==========================================
+ Hits         2651     2661      +10     
  Misses        265      265

| Flag | Coverage Δ | |
|---|---|---|
| base | `88.20% <ø> (ø)` | |
| core | `92.69% <100.00%> (+0.13%)` | ⬆️ |
| files | `93.55% <ø> (ø)` | |
| komamri | `93.98% <ø> (ø)` | |
| plots | `89.30% <ø> (ø)` | |

| Files | Coverage Δ |
|---|---|
| ...RICore/src/simulation/SimMethods/Bloch/BlochGPU.jl | `100.00% <100.00%> (ø)` |
| KomaMRICore/src/simulation/SimulatorCore.jl | `94.83% <100.00%> (ø)` |

github-actions

KomaMRI Benchmarks

| Benchmark suite | Current: `18bcaee` | Previous: `1457a4c` | Ratio |
|---|---|---|---|
| `MRI Lab/Bloch/CPU/2 thread(s)` | `243027991` ns | `227517325.5` ns | `1.07` |
| `MRI Lab/Bloch/CPU/4 thread(s)` | `135522120` ns | `135033124` ns | `1.00` |
| `MRI Lab/Bloch/CPU/8 thread(s)` | `144774394.5` ns | `171880824` ns | `0.84` |
| `MRI Lab/Bloch/CPU/1 thread(s)` | `408151458` ns | `396561930.5` ns | `1.03` |
| `MRI Lab/Bloch/GPU/CUDA` | `57005066.5` ns | `138134905` ns | `0.41` |
| `MRI Lab/Bloch/GPU/oneAPI` | `527703085` ns | `14155999496.5` ns | `0.037277698768672034` |
| `MRI Lab/Bloch/GPU/Metal` | `543126542` ns | `3171338479` ns | `0.17` |
| `MRI Lab/Bloch/GPU/AMDGPU` | `36831327` ns | `75482754` ns | `0.49` |
| `Slice Selection 3D/Bloch/CPU/2 thread(s)` | `1016083368` ns | `1168211452` ns | `0.87` |
| `Slice Selection 3D/Bloch/CPU/4 thread(s)` | `619287647` ns | `612565463` ns | `1.01` |
| `Slice Selection 3D/Bloch/CPU/8 thread(s)` | `385912553` ns | `495427593` ns | `0.78` |
| `Slice Selection 3D/Bloch/CPU/1 thread(s)` | `2252333331` ns | `2245843835` ns | `1.00` |
| `Slice Selection 3D/Bloch/GPU/CUDA` | `101397562.5` ns | `108701927` ns | `0.93` |
| `Slice Selection 3D/Bloch/GPU/oneAPI` | `662296008` ns | `776956866` ns | `0.85` |
| `Slice Selection 3D/Bloch/GPU/Metal` | `564139375` ns | `769082459` ns | `0.73` |
| `Slice Selection 3D/Bloch/GPU/AMDGPU` | `60677723` ns | `64232156` ns | `0.94` |

This comment was automatically generated by workflow using github-action-benchmark.
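
A note on reading the Ratio column: the values are consistent with current time divided by previous time, so a ratio below 1 means the new code is faster, and the speedup factor is its reciprocal. The short Julia sketch below redoes that arithmetic for two of the MRI Lab GPU rows. The timings are copied from the benchmark table above; the helper names ratio and speedup are just for illustration.

```julia
# Timings in nanoseconds, copied from the MRI Lab GPU rows of the table above.
current  = Dict("CUDA" => 57005066.5,  "Metal" => 543126542.0)
previous = Dict("CUDA" => 138134905.0, "Metal" => 3171338479.0)

# Illustrative helpers: ratio as reported in the table, speedup as its reciprocal.
ratio(backend)   = current[backend] / previous[backend]
speedup(backend) = previous[backend] / current[backend]

for backend in ("CUDA", "Metal")
    println(backend, ": ratio = ", round(ratio(backend); digits=2),
            ", speedup = ", round(speedup(backend); digits=1), "x")
end
```

Read this way, the Metal row is in line with the roughly 5x speedup reported in the PR description.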

rkierulf added 3 commits August 2, 2024 12:26

Commit initial changes

59f60b6

Excitation kernel working for Metal

c7592a1

Add COV_EXCL marker

e7d8643

This was linked to issues Aug 19, 2024

Create new Kernel-based Simulation Method #353

Closed

Use @localmem inside future kernel-based simulation functions to speed up memory access #354

Closed

Move B logical indexing inside kernel

18bcaee

cncastillo self-requested a review August 19, 2024 17:48

cncastillo approved these changes Aug 19, 2024

github-actions bot reviewed Aug 19, 2024

rkierulf merged commit 1b6c5be into master Aug 19, 2024
19 checks passed

rkierulf deleted the gpu-excitation branch August 19, 2024 19:23

rkierulf mentioned this pull request Aug 23, 2024

GSOC: Add GPU Explanation Section to Documentation #470

Merged
