feat: update GemmaRMSNorm #1232

Merged
zhyncs merged 3 commits into sgl-project:main from gemma
Aug 28, 2024

@zhyncs zhyncs commented Aug 27, 2024

Motivation

Awaiting the review and release of FlashInfer.
ref flashinfer-ai/flashinfer#477
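
For context, Gemma-style RMSNorm differs from plain RMSNorm in that the normalized activations are scaled by (1 + weight) rather than by weight alone. The snippet below is a minimal pure-Python sketch of that formula on a single vector. It is not the FlashInfer kernel nor the sglang layer touched by this PR; the function name `gemma_rmsnorm_reference` and the default `eps` are assumptions made for illustration.

```python
import math


def gemma_rmsnorm_reference(x, weight, eps=1e-6):
    """Reference Gemma-style RMSNorm over one vector (hypothetical helper).

    x and weight are equal-length sequences of floats.
    Returns x / sqrt(mean(x^2) + eps) * (1 + weight), element-wise.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the hidden dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Gemma scales by (1 + weight); standard RMSNorm would use weight directly.
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


if __name__ == "__main__":
    print(gemma_rmsnorm_reference([3.0, 4.0], [0.0, 0.0]))
```

With an all-zero weight the output is simply the RMS-normalized input, which makes the (1 + weight) convention easy to check against a fused kernel.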

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs zhyncs self-assigned this Aug 27, 2024
@zhyncs zhyncs marked this pull request as draft August 27, 2024 13:26
@zhyncs zhyncs added the enhancement and wip labels Aug 27, 2024
zhyncs commented Aug 27, 2024

gsm8k benchmark
python3 -m sglang.launch_server --model google/gemma-2-2b --disable-radix-cache
python3 benchmark/gsm8k/bench_sglang.py

current (this PR)

Latency: 15.371
Invalid: 0.005
Accuracy: 0.270

Latency: 15.183
Invalid: 0.000
Accuracy: 0.260

main

Latency: 19.638
Invalid: 0.005
Accuracy: 0.265

Latency: 19.716
Invalid: 0.005
Accuracy: 0.260
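
The two runs per branch above can be summarized with a few lines of arithmetic. This is a sketch using only the latency figures quoted in this comment; the helper name `summarize_latency` is hypothetical, and the latencies are taken in whatever unit the benchmark script reports.

```python
from statistics import mean

# Latency values copied from the two runs reported for each branch.
current_latency = [15.371, 15.183]
main_latency = [19.638, 19.716]


def summarize_latency(current, baseline):
    """Return (mean_current, mean_baseline, speedup, relative_reduction)."""
    cur = mean(current)
    base = mean(baseline)
    speedup = base / cur              # how many times faster than baseline
    reduction = (base - cur) / base   # fraction of baseline latency saved
    return cur, base, speedup, reduction


if __name__ == "__main__":
    cur, base, speedup, reduction = summarize_latency(current_latency, main_latency)
    print(f"current mean latency: {cur:.3f}")
    print(f"main mean latency:    {base:.3f}")
    print(f"speedup:              {speedup:.3f}x")
    print(f"latency reduction:    {reduction:.1%}")
```

On these numbers the mean latency drops from about 19.68 to about 15.28, roughly a 22% reduction, while accuracy stays within the run-to-run spread (0.260 to 0.270).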

@zhyncs zhyncs removed the wip label Aug 28, 2024
@zhyncs zhyncs marked this pull request as ready for review August 28, 2024 12:01
@zhyncs zhyncs requested a review from merrymercy August 28, 2024 12:42
@zhyncs zhyncs merged commit b1a540e into sgl-project:main Aug 28, 2024
8 checks passed
@zhyncs zhyncs deleted the gemma branch August 28, 2024 12:47