This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Potential sigverify latency optimization #81

Open
sakridge opened this issue Sep 15, 2020 · 0 comments

Comments

@sakridge
Contributor

The ed25519 sigverify check computes a * A + b * B in a single thread. This is reasonably efficient on a CPU because it saves instructions, and spilling the stack to L1 is not that expensive there. On a GPU, since there are so many threads, one could instead compute a * A in one kernel launch and b * B in parallel in another, then do the addition at the end, which is pretty cheap. That way the work would take up a larger portion of the GPU, but in low-batch situations I think this is preferable to letting a large part of the GPU go to waste. Each scalar multiply would also have much less register pressure, since it only has half the temporaries to deal with.
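To make the idea concrete, here is a minimal Python sketch of the two shapes of the computation. It is not the actual sigverify kernel: it uses slow, variable-time affine arithmetic on the Ed25519 curve, the function names are made up for illustration, the "fused" variant is a plain bit-interleaved loop standing in for whatever the real single-thread code does, and two worker threads stand in for the two kernel launches.

```python
from concurrent.futures import ThreadPoolExecutor

# Ed25519 parameters: -x^2 + y^2 = 1 + D*x^2*y^2 over GF(P).
P = 2**255 - 19
D = (-121665 * pow(121666, -1, P)) % P
SQRT_M1 = pow(2, (P - 1) // 4, P)  # a square root of -1 mod P
IDENTITY = (0, 1)

def recover_x(y):
    """Recover the even x coordinate for a given y on the curve."""
    xx = (y * y - 1) * pow(D * y * y + 1, -1, P) % P
    x = pow(xx, (P + 3) // 8, P)
    if (x * x - xx) % P != 0:
        x = x * SQRT_M1 % P
    if x % 2 != 0:
        x = P - x
    return x

BASE_Y = 4 * pow(5, -1, P) % P
BASE = (recover_x(BASE_Y), BASE_Y)  # standard base point B

def on_curve(pt):
    x, y = pt
    return (-x * x + y * y - 1 - D * x * x * y * y) % P == 0

def add(p1, p2):
    """Affine twisted Edwards addition (illustration only, not constant time)."""
    x1, y1 = p1
    x2, y2 = p2
    t = D * x1 * x2 * y1 * y2 % P
    x3 = (x1 * y2 + x2 * y1) * pow(1 + t, -1, P) % P
    y3 = (y1 * y2 + x1 * x2) * pow(1 - t, -1, P) % P
    return (x3, y3)

def scalar_mul(k, pt):
    """Plain double-and-add: one independent unit of work (a * A or b * B)."""
    acc = IDENTITY
    while k > 0:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

def fused_double_scalar_mul(a, A, b, B):
    """Single-thread a*A + b*B with one shared doubling chain."""
    AB = add(A, B)
    acc = IDENTITY
    for i in reversed(range(max(a.bit_length(), b.bit_length()))):
        acc = add(acc, acc)
        bit_a, bit_b = (a >> i) & 1, (b >> i) & 1
        if bit_a and bit_b:
            acc = add(acc, AB)
        elif bit_a:
            acc = add(acc, A)
        elif bit_b:
            acc = add(acc, B)
    return acc

def split_double_scalar_mul(a, A, b, B):
    """a*A and b*B as two independent tasks, joined by one cheap addition."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        part_a = pool.submit(scalar_mul, a, A)  # stands in for launch 1
        part_b = pool.submit(scalar_mul, b, B)  # stands in for launch 2
        return add(part_a.result(), part_b.result())
```

Both functions return the same point. The split version does more total doublings, since the two halves no longer share a doubling chain, which is the cost being traded against better use of the GPU on small batches.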

One might even want to keep both options available and pick between them depending on whether the GPU gets a large or a small batch, in case one is more efficient than the other for a given batch size.
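A rough sketch of what such a selection could look like, in Python for brevity. Everything here is hypothetical: the function name, the `gpu_thread_capacity` parameter, and the rule itself (use the split path whenever two threads per signature still fit on the device). The real crossover point would have to come from benchmarking.

```python
def choose_sigverify_strategy(batch_size, gpu_thread_capacity):
    """Pick "split" or "fused" for a batch of signatures.

    Assumption: the fused path uses one thread per signature and the
    split path uses two (one for a * A, one for b * B).
    """
    if 2 * batch_size <= gpu_thread_capacity:
        # Small batch: the GPU would be underfilled, so spread the work out.
        return "split"
    # Large batch: the GPU is already full, so avoid the extra work.
    return "fused"
```

For example, with a hypothetical capacity of 10000 resident threads, a batch of 100 signatures would take the split path and a batch of 10000 would take the fused path.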
