[FEA] Does it support quantization-matrix-mul? #2044

bianxuxuxu · 2025-01-17T03:53:55Z

The mixed-dtype-gemm example supports upcasting from a narrower (fewer-bit) type to a wider (more-bit) type, but I need a quantized GEMM that converts from a wider type to a narrower one.
For example, for an fp16 x fp8 matmul, we first need to quantize the fp16 operand to fp8 (fp16 / quant_scale, where quant_scale is provided) and then run an fp8 x fp8 GEMM. How can this be done?
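A minimal sketch of the requested arithmetic, written as a plain-Python reference model rather than a GPU kernel. It does not use any CUTLASS API; `quantize_e4m3` and `quantized_gemm` are hypothetical names. Assumptions not stated in the issue: "fp8" means the E4M3 format (3 mantissa bits, exponent bias 7, largest finite value 448), conversion rounds half-to-even and saturates on overflow, the fp16 values are stood in for by Python floats, `scale_a` plays the role of quant_scale, and the B operand is already stored in fp8 with its own scale `scale_b`.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (half-to-even; saturates instead of overflowing)."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), FP8_E4M3_MAX)     # saturation is a choice made for this sketch
    _, e = math.frexp(a)              # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)              # unbiased exponent; subnormals reuse the minimum (-6)
    quantum = 2.0 ** (exp - 3)        # spacing of representable values with 3 mantissa bits
    return sign * round(a / quantum) * quantum  # Python's round() is half-to-even


def quantized_gemm(a_fp16, b_fp8, scale_a, scale_b=1.0):
    """Hypothetical reference for C = A @ B where A is quantized to fp8 on the fly.

    a_fp16: M x K nested lists (stand-in for fp16 values)
    b_fp8:  K x N nested lists whose values are already representable in E4M3
    """
    # Step 1: quantize A as fp8(a / scale_a), i.e. fp16 / quant_scale
    a_q = [[quantize_e4m3(v / scale_a) for v in row] for row in a_fp16]
    m, k, n = len(a_q), len(b_fp8), len(b_fp8[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Step 2: fp8 x fp8 dot product with a wider accumulator
            acc = 0.0
            for p in range(k):
                acc += a_q[i][p] * b_fp8[p][j]
            # Step 3: dequantize the result with the product of both scales
            c[i][j] = acc * scale_a * scale_b
    return c
```

For example, `quantized_gemm([[0.6]], [[1.0]], scale_a=2.0)` quantizes 0.6 / 2.0 = 0.3 to the nearest E4M3 value, 0.3125, so the result is `[[0.625]]` rather than `[[0.6]]`; the difference is the quantization error introduced by the downcast. Applying the scales once after accumulation, instead of per product, keeps the inner loop a pure fp8 x fp8 multiply-accumulate.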

bianxuxuxu added the "? - Needs Triage" and "feature request" labels on Jan 17, 2025
