Performance Comparison
*hoshi-hiyouga edited this page Dec 23, 2023 · 5 revisions*
LoRA Modules: q_proj,v_proj
Method | Bits | TGS | GRAM | Speed |
---|---|---|---|---|
HF | 16 | 2873 | 18GB | 100% |
HF+FA2 | 16 | 3652 | 17GB | 127% |
Unsloth+FA2 | 16 | 4466 | 16GB | 155% |
HF | 4 | 2746 | 9GB | 96% |
Unsloth+FA2 | 4 | 3280 | 7GB | 114% |
LoRA Modules: q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
Method | Bits | TGS | GRAM | Speed |
---|---|---|---|---|
HF | 16 | 2392 | 18GB | 100% |
HF+FA2 | 16 | 2954 | 17GB | 123% |
Unsloth+FA2 | 16 | 4007 | 16GB | 168% |
HF | 4 | 2415 | 9GB | 101% |
Unsloth+FA2 | 4 | 3726 | 7GB | 160% |
- TGS: tokens per GPU per second
- GRAM: GPU memory usage
- Model: LLaMA2-7B
- GPU: 1× NVIDIA A100
- Batch size: 4
- Gradient accumulation: 2
- LoRA rank: 8
- Max length: 1024
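The Speed column appears to be each configuration's TGS normalized to the HF 16-bit row of the same table (this reading is an assumption; the page does not state it explicitly). A minimal Python check against the `q_proj,v_proj` table:

```python
BASELINE_TGS = 2873  # assumed baseline: HF, 16-bit row of the q_proj,v_proj table

def relative_speed(tgs: int, baseline: int = BASELINE_TGS) -> int:
    """Return throughput as a percentage of the baseline, rounded to an integer."""
    return round(tgs / baseline * 100)

# Rows from the q_proj,v_proj table: (method, bits, TGS)
rows = [
    ("HF",          16, 2873),
    ("HF+FA2",      16, 3652),
    ("Unsloth+FA2", 16, 4466),
    ("HF",           4, 2746),
    ("Unsloth+FA2",  4, 3280),
]

for method, bits, tgs in rows:
    print(f"{method:>12} ({bits}-bit): {relative_speed(tgs)}%")
# → 100%, 127%, 155%, 96%, 114%, matching the Speed column above
```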