[Kernel] Support running GPTQ 8-bit models in Marlin (vllm-project#4533)
alexm-redhat authored and joerunde committed May 6, 2024
Commit d3ab1c7 (parent 9ff783f)
Showing 7 changed files with 553 additions and 324 deletions.
csrc/ops.h: 3 additions & 1 deletion
@@ -132,6 +132,7 @@ torch::Tensor gptq_marlin_gemm(
   torch::Tensor &g_idx,
   torch::Tensor &perm,
   torch::Tensor &workspace,
+  int64_t num_bits,
   int64_t size_m,
   int64_t size_n,
   int64_t size_k,
@@ -141,7 +142,8 @@ torch::Tensor gptq_marlin_repack(
   torch::Tensor &b_q_weight,
   torch::Tensor &perm,
   int64_t size_k,
-  int64_t size_n);
+  int64_t size_n,
+  int64_t num_bits);
 #endif

 void squeezellm_gemm(
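The new `num_bits` argument lets the Marlin GEMM and repack entry points handle both 4-bit and 8-bit GPTQ weights. The sketch below (hypothetical illustration, not vLLM's actual kernel code) shows why the bit width has to be threaded through: GPTQ stores quantized values packed into 32-bit words, so the number of values per word, and therefore every packed-tensor shape, depends on `num_bits`.

```python
# Hypothetical sketch of GPTQ-style weight packing. The helper names
# (pack_factor, pack_row) are illustrative and not part of vLLM's API.

def pack_factor(num_bits: int) -> int:
    """Number of quantized values stored per 32-bit word."""
    assert 32 % num_bits == 0, "bit width must divide 32"
    return 32 // num_bits

def pack_row(values: list[int], num_bits: int) -> list[int]:
    """Pack quantized values (each < 2**num_bits) into 32-bit words,
    least-significant bits first."""
    factor = pack_factor(num_bits)
    assert len(values) % factor == 0
    mask = (1 << num_bits) - 1
    words = []
    for i in range(0, len(values), factor):
        word = 0
        for j, v in enumerate(values[i:i + factor]):
            word |= (v & mask) << (j * num_bits)
        words.append(word)
    return words

# 4-bit weights pack 8 values per word; 8-bit weights pack only 4,
# so the repack kernel needs num_bits to compute output shapes.
print(pack_factor(4), pack_factor(8))  # 8 4
print(pack_row([1, 2, 3, 4], 8))       # [0x04030201] = [67305985]
```

With only the tensor handles available on the C++ side, the bit width cannot be inferred reliably from shapes alone, which is presumably why both `gptq_marlin_gemm` and `gptq_marlin_repack` take it as an explicit `int64_t` parameter.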
