Use single precision math functions for CUDA variants #95

Merged
merged 1 commit into from
Sep 3, 2024

MichaelSt98
Contributor

Use single precision math functions for CUDA variants when compiling for single precision, thus:

  • pow -> powf
  • fabs -> fabsf
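A minimal sketch of how such a precision-dependent selection could look is given below. It is not taken from the PR diff: the `SINGLE_PRECISION` switch, the `real_t` type, the `REAL_POW`/`REAL_ABS` macros and the `abs_pow` helper are all hypothetical names chosen for illustration. The underlying concern is that the double precision names may cause the operation to be evaluated in double precision even when the surrounding data is `float`.

```cpp
#include <cassert>
#include <math.h>

// Hypothetical precision switch; macro and type names are illustrative only
// and are not claimed to match the identifiers used in the actual code base.
#ifdef SINGLE_PRECISION
typedef float real_t;
#define REAL_POW(x, y) powf((x), (y))   // single precision variants
#define REAL_ABS(x)    fabsf((x))
#else
typedef double real_t;
#define REAL_POW(x, y) pow((x), (y))    // double precision variants
#define REAL_ABS(x)    fabs((x))
#endif

// Illustrative helper: |x|^p evaluated in the working precision.
// In a CUDA source file this would additionally be marked __device__.
inline real_t abs_pow(real_t x, real_t p)
{
    assert(p >= (real_t)0);  // keep the sketch away from 0 raised to a negative power
    return REAL_POW(REAL_ABS(x), p);
}
```

Under these assumptions, building with `-DSINGLE_PRECISION` selects `powf`/`fabsf`, and building without it keeps `pow`/`fabs`, so call sites such as `abs_pow(x, p)` stay unchanged across both precisions.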

Collaborator

@reuterbal reuterbal left a comment

Looks straightforward. How big is the expected performance gain?

@MichaelSt98
Contributor Author

> Looks straightforward. How big is the expected performance gain?

bin/dwarf-cloudsc-c-cuda-k-caching 1 262144 128
     NUMOMP=1, NGPTOT=262144, NPROMA=128, NGPBLKS=2048
     NUMOMP    NGPTOT  #GP-cols     #BLKS    NPROMA tid# : Time(msec)  MFlops/s     col/s
          1    262144    262144         0       128    0 :         14   2235400  17908522 @ core#
          1    262144    262144      2048       128   -1 :       1723     18987    152116 TOTAL

vs. with the "correct" (single precision) math functions:

bin/dwarf-cloudsc-c-cuda-k-caching 1 262144 128
     NUMOMP=1, NGPTOT=262144, NPROMA=128, NGPBLKS=2048
     NUMOMP    NGPTOT  #GP-cols     #BLKS    NPROMA tid# : Time(msec)  MFlops/s     col/s
          1    262144    262144         0       128    0 :         10   3146297  25206016 @ core#
          1    262144    262144      2048       128   -1 :       1729     18924    151607 TOTAL
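The gain can be read off the `Time(msec)` column of the two runs. The following sketch only redoes that arithmetic; the `speedup` helper and the constant names are hypothetical, and the numbers are copied from the output above.

```cpp
#include <cassert>

// Ratio of two timings; a value above 1 means the second run was faster.
inline double speedup(double before_ms, double after_ms)
{
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}

// Timings copied from the two runs above, in msec.
const double kernel_before_ms = 14.0;    // "@ core#" row, first run
const double kernel_after_ms  = 10.0;    // "@ core#" row, second run
const double total_before_ms  = 1723.0;  // TOTAL row, first run
const double total_after_ms   = 1729.0;  // TOTAL row, second run
```

With these values, `speedup(kernel_before_ms, kernel_after_ms)` is 1.4 for the `@ core#` row, while the TOTAL time is essentially unchanged (1723 vs. 1729 msec).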

@reuterbal
Collaborator

Very nice!

@reuterbal reuterbal merged commit 57e95e8 into develop Sep 3, 2024
18 checks passed
@reuterbal reuterbal deleted the nams-cuda-sp-math-functions branch September 3, 2024 09:18