
optimize for ppc64le using VSX intrinsics #784

Merged (5 commits) on May 12, 2024

Conversation

penghongbo (Contributor)

I would like to contribute to ggml by improving performance on ppc64le. Here is the updated code using VSX intrinsics. Please review. Thank you.

penghongbo (Contributor, Author)

@ggerganov would you please review this PR or assign reviewers? I work for IBM, and we would like to have this optimization on ppc64le. Thank you.

penghongbo (Contributor, Author)

@ggerganov is this your preferred place to open PRs for enhancements and optimizations, or shall I open the PR in llama.cpp and have you sync it to this repository?

ggerganov (Member) left a comment

Hey, thanks, and sorry for the delay. This implementation fits the existing pattern, and it seems all the new code is behind the `__POWER9_VECTOR__` define, which is great.

I don't have the hardware to test this - can you share some sample numbers for the speed-up that you observe?

penghongbo (Contributor, Author)

Here are the vec_dot_q float32 throughput speedups relative to the current master branch, measured on RHEL 9.2 (Power10 machine) with `test-quantize-perf -i 10000`. The code was compiled with gcc 12.2.1.

| Type | q4_0 | q4_1 | q5_0 | q5_1 | q8_0 | q2_K | q3_K | q4_K | q5_K | q6_K | iq3_xxs | iq4_nl | iq3_s | iq2_s | iq4_xs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Speedup | 4.99 | 3.39 | 4.27 | 3.17 | 1.37 | 7.70 | 4.56 | 5.26 | 4.78 | 5.14 | 3.57 | 5.97 | 3.61 | 4.91 | 6.99 |

The float32 throughput of q8_0 in the quantize_row_q and quantize_row_q_dot functions also gets a speedup of around 4.81.

I also ran `test-quantize-fns` and verified the code in llama.cpp to check functionality. All tests passed in the environment described above.

Thanks for your review and approval.

@ggerganov ggerganov merged commit 9149580 into ggml-org:master May 12, 2024
4 checks passed