Add shapes for int8 gemm benchmark #3093

Merged · 1 commit into sgl-project:main on Jan 24, 2025

Conversation

@ispobock (Collaborator) commented on Jan 24, 2025

Motivation

Ref: #3047

Benchmark on A100, measured in GB/s (higher is better):
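
Each cell in the tables below times one int8 scaled matmul at a given (batch_size, N, K) and converts the elapsed time into effective bandwidth. The real harness calls the vLLM and sgl-kernel CUTLASS kernels; the following is only a minimal stand-in sketch that uses torch._int_mm for the int8 accumulation (the kernel choice, scale layout, and GB/s formula here are assumptions, not the exact benchmark code):

```python
import torch
import triton.testing

def int8_scaled_mm_ref(a, b, a_scale, b_scale):
    # int8 x int8 -> int32 accumulate, then dequantize with per-row /
    # per-column scales; stand-in for the vLLM / sgl-kernel CUTLASS kernels
    acc = torch._int_mm(a, b)                        # (M, N) int32
    return (acc.float() * a_scale * b_scale).to(torch.bfloat16)

M, N, K = 1024, 6144, 4096                           # one (batch_size, N, K) cell
a = torch.randint(-128, 127, (M, K), dtype=torch.int8, device="cuda")
b = torch.randint(-128, 127, (K, N), dtype=torch.int8, device="cuda")
a_scale = torch.rand(M, 1, device="cuda")
b_scale = torch.rand(1, N, device="cuda")

ms = triton.testing.do_bench(lambda: int8_scaled_mm_ref(a, b, a_scale, b_scale))
# effective GB/s: int8 operands read + bf16 output written, per second
gb = (a.numel() + b.numel() + 2 * M * N) / 1e9
print(f"{gb / (ms * 1e-3):.3f} GB/s")
```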

meta-llama/Llama-3.1-8B-Instruct, TP=1
meta-llama/Llama-3.1-8B-Instruct N=6144 K=4096: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1413.802533           1545.609483
1        16.0    23354.603480          25250.115416
2        32.0    46644.362586          48414.247353
3        64.0    87045.307953          92390.889209
4       128.0   151576.637692         169551.667399
5       256.0   258540.465461         288407.611741
6       512.0   343582.331272         387500.099402
7      1024.0   390975.862482         435179.547661
8      2048.0   419011.544799         448491.508730
meta-llama/Llama-3.1-8B-Instruct N=4096 K=4096: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1297.878879           1411.268781
1        16.0    20817.527290          23204.066838
2        32.0    40825.624306          43021.110859
3        64.0    74253.010299          87045.305765
4       128.0   134869.994168         145295.083199
5       256.0   201194.532171         221413.421378
6       512.0   242049.450039         268527.361839
7      1024.0   334739.605378         375282.248684
8      2048.0   419420.161154         468540.641939
meta-llama/Llama-3.1-8B-Instruct N=28672 K=4096: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     2289.684739           2609.118893
1        16.0    36073.070930          40960.713695
2        32.0    67218.179266          75142.622888
3        64.0   127720.015925         142198.556796
4       128.0   229180.439470         252695.589121
5       256.0   339082.950178         382510.945211
6       512.0   423920.550160         460661.358667
7      1024.0   443453.064450         489466.743631
8      2048.0   441994.668843         491552.622049
meta-llama/Llama-3.1-8B-Instruct N=4096 K=14336: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1846.792753           1962.278949
1        16.0    29653.097160          30998.830706
2        32.0    58742.781210          60952.304786
3        64.0   105819.030714         117134.170884
4       128.0   204946.477307         217666.647462
5       256.0   262537.580751         294911.995095
6       512.0   305802.681919         320015.165186
7      1024.0   403643.795050         429220.005784
8      2048.0   480790.109638         503284.891697
meta-llama/Llama-3.3-70B-Instruct, TP=4
meta-llama/Llama-3.3-70B-Instruct N=2560 K=8192: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1218.959058           1369.102270
1        16.0    21176.186120          23343.270899
2        32.0    40947.511994          43358.678651
3        64.0    66200.631296          82215.868845
4       128.0   128844.817038         133559.901840
5       256.0   153459.605051         172854.362364
6       512.0   298064.457195         347587.569927
7      1024.0   353627.799559         381773.278942
8      2048.0   466144.366720         395779.683416
meta-llama/Llama-3.3-70B-Instruct N=8192 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1211.281143           1347.938458
1        16.0    19181.610384          21539.400500
2        32.0    36890.946652          41080.028900
3        64.0    73459.703862          80010.806093
4       128.0   135118.648291         141512.274086
5       256.0   176611.783821         197328.700166
6       512.0   263620.331737         305165.924326
7      1024.0   344189.700802         399786.632495
8      2048.0   370101.545647         429876.369434
meta-llama/Llama-3.3-70B-Instruct N=14336 K=8192: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     2457.330258           2600.906439
1        16.0    39042.975418          41365.491322
2        32.0    67144.323575          69517.517576
3        64.0   125053.855257         133072.169616
4       128.0   199481.205225         207860.896632
5       256.0   320052.736065         343435.571808
6       512.0   411311.354076         438198.515047
7      1024.0   461765.680050         486717.092096
8      2048.0   474705.806679         501165.366500
meta-llama/Llama-3.3-70B-Instruct N=8192 K=7168: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1874.850419           1970.403443
1        16.0    29890.798545          31526.455092
2        32.0    58298.921098          61631.161990
3        64.0   115339.184190         120916.278726
4       128.0   203120.538714         213062.525806
5       256.0   253162.349146         270807.897604
6       512.0   365285.538727         397945.186761
7      1024.0   458320.758231         495779.061075
8      2048.0   461894.981000         493453.583726
Qwen/Qwen2.5-7B-Instruct, TP=1
Qwen/Qwen2.5-7B-Instruct N=4608 K=3584: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1266.106569           1313.565443
1        16.0    20727.337754          22172.140122
2        32.0    42248.826239          44403.799754
3        64.0    72945.606930          80390.845181
4       128.0   130625.196517         143986.209094
5       256.0   203888.026490         225231.199845
6       512.0   255327.218603         283195.989213
7      1024.0   354635.395396         408878.584532
8      2048.0   390983.054406         432694.327951
Qwen/Qwen2.5-7B-Instruct N=3584 K=3584: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1189.420074           1254.365071
1        16.0    18222.050457          20952.390242
2        32.0    37127.758650          38575.015636
3        64.0    61553.912323          74255.517300
4       128.0   119394.600515         128808.695349
5       256.0   167755.731873         187978.352765
6       512.0   206300.445730         230499.765322
7      1024.0   284993.118735         319869.906211
8      2048.0   360799.817282         402909.283574
Qwen/Qwen2.5-7B-Instruct N=37888 K=3584: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     2291.087823           2692.835789
1        16.0    36073.998284          41794.398256
2        32.0    71166.392488          80783.471889
3        64.0   135558.409267         154193.846476
4       128.0   257726.361342         282742.156498
5       256.0   355436.719748         403557.786361
6       512.0   431485.945550         481732.020101
7      1024.0   436769.072921         486443.562900
8      2048.0   438110.172523         489686.600867
Qwen/Qwen2.5-7B-Instruct N=3584 K=18944: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1921.542730           2033.870570
1        16.0    30884.496864          32348.181367
2        32.0    60923.978530          63176.749482
3        64.0   105910.344958         121548.111569
4       128.0   204717.435457         217938.262252
5       256.0   239453.551477         272000.037549
6       512.0   275727.007178         288348.187639
7      1024.0   361881.652502         384209.370036
8      2048.0   437323.725467         459600.928627
Qwen/Qwen2.5-72B-Instruct, TP=4
Qwen/Qwen2.5-72B-Instruct N=2560 K=8192: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1213.321037           1369.102270
1        16.0    20574.118314          23265.631935
2        32.0    40087.103868          43269.278632
3        64.0    66096.377540          80327.652358
4       128.0   128647.352220         133241.899664
5       256.0   153319.458220         172943.394390
6       512.0   301003.676656         346869.429207
7      1024.0   352052.010275         383736.673870
8      2048.0   463690.100286         504158.553119
Qwen/Qwen2.5-72B-Instruct N=8192 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1208.496567           1325.841135
1        16.0    19470.222189          21240.243039
2        32.0    37299.937925          41332.362205
3        64.0    72744.959978          80010.806093
4       128.0   134712.883511         141810.510862
5       256.0   173425.490868         197039.783333
6       512.0   263491.294858         304217.413522
7      1024.0   343969.769612         400530.288894
8      2048.0   370930.360530         431167.577871
Qwen/Qwen2.5-72B-Instruct N=14784 K=8192: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     2493.248780           2645.648001
1        16.0    39722.006567          42022.134842
2        32.0    67012.351526          70624.614234
3        64.0   128245.336326         136959.283439
4       128.0   203555.465788         213836.503322
5       256.0   327545.518567         355075.896970
6       512.0   423100.447975         450972.591721
7      1024.0   470034.675317         495323.928802
8      2048.0   468699.761958         501011.066893
Qwen/Qwen2.5-72B-Instruct N=8192 K=7392: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1854.809057           1986.114348
1        16.0    29044.004322          31077.088023
2        32.0    55066.170031          57824.735730
3        64.0   113963.935890         119116.114205
4       128.0   192534.767422         207003.663376
5       256.0   240865.744755         259252.700831
6       512.0   347093.299239         369303.023680
7      1024.0   436955.909747         458631.865034
8      2048.0   443298.727863         483593.600122
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct, TP=1
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=3072 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0      714.260854            779.193659
1        16.0    11448.914604          14337.163524
2        32.0    23538.626566          25591.690196
3        64.0    38524.288103          49868.394150
4       128.0    75662.391125          85104.246393
5       256.0   116282.989698         129901.717750
6       512.0   210497.662623         257483.763518
7      1024.0   263878.774980         322214.309371
8      2048.0   312972.491697         368875.784054
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=4096 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0      920.658522           1033.817048
1        16.0    15663.195109          16492.424064
2        32.0    29054.011115          31037.402540
3        64.0    51920.593671          61395.155325
4       128.0    95310.320148         109591.345199
5       256.0   150872.397692         164722.371092
6       512.0   189147.118999         221163.814510
7      1024.0   266887.806762         311884.535250
8      2048.0   344189.700802         402777.915758
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=2048 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0      541.954643            645.818159
1        16.0     8635.663030          10182.973327
2        32.0    17306.864039          17858.037964
3        64.0    27176.529951          35945.024649
4       128.0    52081.339774          60079.541252
5       256.0    76464.872130          86601.145167
6       512.0   150872.397692         171655.842346
7      1024.0   191570.358911         217764.042029
8      2048.0   264462.146771         307431.593856
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=576 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0      171.920936            199.800003
1        16.0     2652.053866           3188.183272
2        32.0     5256.960090           5672.978377
3        64.0     7820.270966          10490.607541
4       128.0    15386.223732          16807.332758
5       256.0    22583.599635          24901.388672
6       512.0    44037.359955          50265.754914
7      1024.0    89586.066747         100132.572223
8      2048.0   165827.430231         127334.273682
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=21888 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1545.207930           1785.879227
1        16.0    24335.142687          27943.430103
2        32.0    51382.688437          55353.457245
3        64.0    99605.557500         105077.751644
4       128.0   171471.664643         197569.265396
5       256.0   258130.696765         297354.623141
6       512.0   340024.640484         397981.250190
7      1024.0   386796.892863         446124.134782
8      2048.0   390273.517607         450367.293822
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=2048 K=10944: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0     1143.644210           1271.811242
1        16.0    18201.766952          20202.321562
2        32.0    35175.804221          37625.127640
3        64.0    54395.576046          70684.242693
4       128.0   106026.366471         112757.138377
5       256.0   129201.736206         146805.735860
6       512.0   254824.725282         292295.890989
7      1024.0   299118.966311         317445.874988
8      2048.0   392606.444940         425930.522076
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=2816 K=2048: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0      703.143950            840.502354
1        16.0    11383.180510          12653.514573
2        32.0    21987.285635          24925.240758
3        64.0    36949.878969          48289.401979
4       128.0    73082.541585          82905.462406
5       256.0   103841.188190         118013.391613
6       512.0   196856.374870         226770.827641
7      1024.0   247221.099816         298219.160396
8      2048.0   291294.750555         338910.226383
deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct N=2048 K=1408: 
int8 scaled matmul:
 batch_size  vllm int8 gemm  sgl-kernel int8 gemm
0         1.0      400.283179            467.514203
1        16.0     6859.829191           7366.025543
2        32.0    12504.742874          13884.162972
3        64.0    21015.230313          28346.125096
4       128.0    39860.213409          44837.918938
5       256.0    62254.794106          68924.952928
6       512.0   126723.856026         142735.187726
7      1024.0   161949.541317         188666.259607
8      2048.0   223217.201798         269680.165985
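
For reference, the (N, K) pairs above are the per-GPU weight shapes of each model's linear layers (qkv_proj, o_proj, gate_up_proj, down_proj, plus the extra MLA/MoE projections for DeepSeek-Coder-V2-Lite). A sketch of how the Llama-3.1-8B, TP=1 shapes fall out of the published config (variable names are illustrative):

```python
# Llama-3.1-8B config: hidden_size=4096, num_attention_heads=32,
# num_key_value_heads=8, head_dim=128, intermediate_size=14336; TP degree 1
hidden, heads, kv_heads, head_dim, inter, tp = 4096, 32, 8, 128, 14336, 1

shapes = {
    "qkv_proj":     ((heads + 2 * kv_heads) * head_dim // tp, hidden),  # N=6144,  K=4096
    "o_proj":       (hidden, heads * head_dim // tp),                   # N=4096,  K=4096
    "gate_up_proj": (2 * inter // tp, hidden),                          # N=28672, K=4096
    "down_proj":    (hidden, inter // tp),                              # N=4096,  K=14336
}
for name, (n, k) in shapes.items():
    print(f"{name}: N={n} K={k}")
```

The same arithmetic with tp=4 reproduces the Llama-3.3-70B rows, e.g. qkv_proj N = (64/4 + 2·8/4)·128 = 2560 at K = 8192.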

@zhyncs merged commit 7bad7e7 into sgl-project:main on Jan 24, 2025 (4 checks passed).