How to benchmark for speedup and acceptance rate? #12

Open
singularity-s0 opened this issue Apr 22, 2024 · 7 comments

@singularity-s0

Sorry for asking a possibly obvious question, but it would be better if the documentation made this clear.

@cyLi-Tiger

cyLi-Tiger commented Apr 23, 2024

+1. How do we benchmark the speedup? I ran the example code and didn't see obvious acceleration. How can we reproduce the 4.04x speedup of Llama2-7b on an A100?

@dreaming-panda
Contributor

dreaming-panda commented Apr 24, 2024

> +1. How do we benchmark the speedup? I ran the example code and didn't see obvious acceleration. How can we reproduce the 4.04x speedup of Llama2-7b on an A100?

To run Sequoia:
CUDA_VISIBLE_DEVICES=0 python testbed_greedy.py --model JackFram/llama-68m --target meta-llama/Llama-2-7b-hf --T 0.6 --P 1.0 --start 0 --end 200 --M 384 --growmap ../A100_growmaps/68m_7b/growmaps/A100-C4-68m-7b-greedy.pt --Mode greedy --dataset c4
To run baseline:
CUDA_VISIBLE_DEVICES=0 python testbed_greedy.py --model JackFram/llama-68m --target meta-llama/Llama-2-7b-hf --T 0.6 --P 1.0 --start 0 --end 200 --M 384 --growmap ../A100_growmaps/68m_7b/growmaps/A100-C4-68m-7b-greedy.pt --Mode baseline --dataset c4

Since the framework is written on top of Hugging Face, the baseline should run at around 23-25 ms per token, and Sequoia at around 6-7 ms per token.
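
As a rough sanity check (my arithmetic, not output from the repo), those per-token latencies already imply a speedup in the same ballpark as the reported 4.04x:

```python
# Back-of-the-envelope check using the latencies quoted above.
baseline_ms = (23 + 25) / 2  # average baseline latency per token, ms
sequoia_ms = (6 + 7) / 2     # average Sequoia latency per token, ms
print(f"implied speedup: {baseline_ms / sequoia_ms:.2f}x")  # ~3.69x
```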

@singularity-s0
Author

singularity-s0 commented Apr 24, 2024

Thanks for the response. What about the acceptance rate? And what do decoding step and large model step mean in the output?

@dreaming-panda
Contributor

decoding step means how many tokens are generated in total. large model step means how many times the large model runs verification. decoding step / large model step therefore reflects how many tokens Sequoia's tree gets right per verification step.
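
Since every verification pass yields at least one token from the large model, the ratio minus one is the average number of accepted draft tokens. A minimal sketch of the relationship (mine, not from the repo):

```python
# Illustrative only: relating the two counters the testbed prints.
decoding_steps = 400      # hypothetical total number of generated tokens
large_model_steps = 100   # hypothetical number of verification passes

tokens_per_verification = decoding_steps / large_model_steps  # 4.0
# Each verification always produces at least one token, so the surplus is
# the average number of draft tokens accepted from Sequoia's tree.
accepted_per_verification = tokens_per_verification - 1       # 3.0
print(tokens_per_verification, accepted_per_verification)
```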

acceptance rate needs to be independently measured with:

python test_accept.py --model JackFram/llama-68m --target meta-llama/Llama-2-7b-hf --T 0.6 --P 1.0 --start 0 --end 200 --M 288 --W 32 --ALG stochastic --dataset cnn

@singularity-s0
Author

Thank you. This answers all my questions.

@briskerkazoos

After testing both baseline and greedy on the C4 dataset on an A100, I get the following results:

Baseline: total time :110.10318s, latency :0.02298s, decoding step: 4791
Greedy: total time :144.56247s, latency :0.00813s, decoding step: 17778, large model step: 4605, 3.8605863192182412

It seems that more tokens are generated in greedy mode than in baseline mode. While the per-token latencies match expectations, I wonder if it is unfair to compare latency when the two runs generate different numbers of tokens. Would it be better to fix the sequence length and compare total generation time instead?
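
To illustrate, normalizing my numbers above to a fixed token count (a back-of-the-envelope calculation, not output from the repo):

```python
# Normalize the two runs above to the same number of generated tokens.
baseline_latency = 0.02298  # s/token, from the baseline run
greedy_total = 144.56247    # s, from the greedy (Sequoia) run
greedy_tokens = 17778       # tokens generated by the greedy run

baseline_total = baseline_latency * greedy_tokens  # ~408.5 s for 17778 tokens
print(f"fixed-length speedup: {baseline_total / greedy_total:.2f}x")  # ~2.83x
```

This comes out equal to the per-token latency ratio (0.02298 / 0.00813 ≈ 2.83), so the two ways of comparing agree.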

> decoding step / large model step reflects how many tokens Sequoia's tree gets right per verification step.

Just to make sure I understand this correctly: if all draft tokens are wrong, then decoding step / large model step = 1. And if decoding step / large model step = 2, it means that on average the drafting model gets 1 token correct per draft. Is this right?

@dreaming-panda
Contributor

Your understanding is correct (in your run, 17778 / 4605 ≈ 3.86, i.e. about 2.86 accepted draft tokens per verification on average). We only allow the baseline to generate 32 tokens because in some experiments, such as Vicuna-33B, running the baseline can cost a lot of time.

You can change this manually if you want. What you need to modify is the condition inner_decoding_step < 32 in testbed.py.
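
For illustration, the change looks something like this (a hypothetical sketch, not the actual testbed.py code; only the name inner_decoding_step and the bound 32 come from the comment above):

```python
# Hypothetical sketch: the baseline decoding loop stops at a hard-coded bound.
MAX_STEPS = 32  # raise this to let the baseline generate more tokens per prompt

inner_decoding_step = 0
while inner_decoding_step < MAX_STEPS:
    # token = generate_one_token(...)  # placeholder for the real decoding call
    inner_decoding_step += 1
```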
We also plan to update the code in the coming weeks and will address this then.
