Performance Comparison
hoshi-hiyouga edited this page Jan 20, 2024
| Method | Bits | TGS | GRAM | Speed |
|---|---|---|---|---|
| HF | 16 | 2873 | 18GB | 100% |
| HF+FA2 | 16 | 3652 | 17GB | 127% |
| Unsloth+FA2 | 16 | 4466 | 16GB | 155% |
| HF | 4 | 2746 | 9GB | 96% |
| Unsloth+FA2 | 4 | 3280 | 7GB | 114% |

| Method | Bits | TGS | GRAM | Speed |
|---|---|---|---|---|
| HF | 16 | 2392 | 18GB | 100% |
| HF+FA2 | 16 | 2954 | 17GB | 123% |
| Unsloth+FA2 | 16 | 4007 | 16GB | 168% |
| HF | 4 | 2415 | 9GB | 101% |
| Unsloth+FA2 | 4 | 3726 | 7GB | 160% |

| Method | Bits | TGS | GRAM | Speed |
|---|---|---|---|---|
| HF | 16 | 2155 | 29GB | 100% |
| HF+FA2 | 16 | 2556 | 28GB | 119% |
| Unsloth+FA2 | 16 | 3400 | 27GB | 158% |

- TGS: tokens per GPU per second
- GRAM: GPU memory usage
- Speed: TGS relative to the 16-bit HF row of the same table
- Model: LLaMA2-7B
- Batch size: 4
- Gradient accumulation: 2
- LoRA rank: 8
- Max length: 1024
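
As a rough sketch of how these numbers relate: the Speed column appears to be each row's TGS divided by the TGS of the 16-bit HF row in the same table, rounded to a whole percent. The helper names below (`relative_speed`, `max_tokens_per_step`) are hypothetical and not part of any library. The token count per optimizer step is only an upper bound, and it assumes that the batch size is per GPU and that every sequence reaches the maximum length.

```python
def relative_speed(tgs: float, baseline_tgs: float) -> int:
    """Return throughput as a whole-number percentage of the baseline."""
    return round(100 * tgs / baseline_tgs)


def max_tokens_per_step(batch_size: int, grad_accum: int, max_length: int) -> int:
    """Upper bound on tokens per GPU per optimizer step.

    Assumes batch_size is per GPU and all sequences are max_length long.
    """
    return batch_size * grad_accum * max_length


# TGS values copied from the 16-bit rows of the first table.
first_table_tgs = {"HF": 2873, "HF+FA2": 3652, "Unsloth+FA2": 4466}
baseline = first_table_tgs["HF"]

speeds = {method: relative_speed(tgs, baseline)
          for method, tgs in first_table_tgs.items()}
print(speeds)

# Settings from the list above: batch size 4, gradient accumulation 2,
# max length 1024.
print(max_tokens_per_step(4, 2, 1024))
```

The same `relative_speed` call can be applied to the other tables by swapping in their TGS values and the corresponding 16-bit HF baseline.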