Fix the rushed out multi-query kernel #44
Closing based on @tmm1's comment about |
tianyil1 pushed a commit to tianyil1/vllm that referenced this issue on Jun 5, 2024:
* Trimmed metadata - part 1
* [WIP] HPU graphs for decode
* [WIP] Graph allocation algorithm reworked
* Cleanup
* Add graph memory estimations
* Fix multinode synchronization
* Create attn_bias inside HPU graph
* Cleanup after rebase
* Increase default VLLM_GRAPH_RESERVED_MEM to 0.3
* Remove obsolete class
* Tweak default HPU graph parameters
fxmarty pushed a commit to fxmarty/vllm-public that referenced this issue on Jun 12, 2024:
* adding fp8 gemm tuner to gradlib
* formatting
* add instructions
* Linting
* adding fp8 gemm tuner to gradlib formatting add instructions
* Linting fp8 gradlib
* fix merging issue of ROCm_performance.md
* delete fp8_gemm_tuner.py
* Fix linting for triton: unmeld if with constexpr
* update tutorial
* Fix linting again
* fix typo

---------

Co-authored-by: Matthew Wong <[email protected]>
yukavio pushed a commit to yukavio/vllm that referenced this issue on Jul 3, 2024.
Closed