Allow loading model with BitsAndBytes 4bit quantization, PEFT LoRA adapters. #203
base: master
Conversation
It might be better to split the QLoRA stuff from the PEFT LoRA adapter support. QLoRA/4-bit requires the latest/git-master versions of transformers, accelerate, and such (and I don't see that listed in the requirements). LoRA adapter support should be possible without bleeding-edge versions of transformers, though, so it'd be great to get that merged in first.
Thanks for the review! I'm very new to working on Python codebases, so I haven't fully got the hang of the dependency-management workflows and gotchas. I'll split them as you suggested and fix the requirements.
Hugging Face finally released QLoRA-supporting versions of transformers and accelerate, which allows us to add basic 4-bit quantization support in #209. Maybe you can simplify this PR to include only the PEFT stuff? Of course, it would also be easier now if you want to add more detailed options for 4-bit quantization, as dependencies are no longer an issue.
Hi, thanks for the feedback. I've updated the PR now. Tested with my very amateur QLoRA model with the following:
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## master #203 +/- ##
==========================================
- Coverage 94.29% 91.69% -2.61%
==========================================
Files 7 7
Lines 333 349 +16
==========================================
+ Hits 314 320 +6
- Misses 19 29 +10
☔ View full report in Codecov by Sentry.
)
from peft import (
    PeftConfig,
    PeftModel
It seems that PeftModel is not being used. Are you sure that PEFT is working correctly? (The GitHub Actions environment does not have a GPU for testing.)
Oops 😅 well, that explains why I didn't seem to be getting any results from my LoRA fine-tunings.
So I added the PeftModel loading. It only works in 4-bit with the dev version of peft, though; loading 4-bit with peft 0.3.0 errors at inference time.
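(For illustration, not code from this PR: a minimal sketch of loading a LoRA adapter on top of a 4-bit base model with the peft API; the adapter path and option values are placeholders.)

```python
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

adapter_path = "path/to/lora-adapter"  # placeholder: PEFT adapter directory or hub id

# The adapter's config records which base model it was trained on.
peft_config = PeftConfig.from_pretrained(adapter_path)

# Load the base model in 4-bit (needs bitsandbytes; a recent peft for 4-bit inference).
base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)

# Wrap the base model with the LoRA adapter weights.
model = PeftModel.from_pretrained(base_model, adapter_path)
```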
load_in_4bit=True,
bnb_4bit_quant_type=quant_type,
bnb_4bit_use_double_quant=double_quant,
bnb_4bit_compute_dtype=torch.bfloat16,
Hardcoding bnb_4bit_compute_dtype to bfloat16 is indeed a reasonable choice. But similarly, can't we use the recommended configuration for bnb_4bit_quant_type and bnb_4bit_use_double_quant in most scenarios? In fact, I prefer to keep only the load_in_4bit option to reduce user confusion. What do you think?
Reference: https://huggingface.co/blog/4bit-transformers-bitsandbytes#advanced-usage
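(For illustration, not code from this PR: a minimal sketch of what keeping only a load_in_4bit toggle might look like, with the other settings hardcoded to the values the linked blog post recommends; the helper name is hypothetical.)

```python
import torch
from transformers import BitsAndBytesConfig


def make_quantization_config(load_in_4bit: bool):
    """Return a BitsAndBytesConfig with the recommended 4-bit settings, or None when disabled."""
    if not load_in_4bit:
        return None
    return BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # QLoRA-recommended quant type
        bnb_4bit_use_double_quant=True,         # saves roughly 0.4 bits per parameter
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 rather than the fp32 default
    )
```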
Yes, I agree it adds a lot of configuration options. I just wasn't sure how much people are playing around with the different options, so I put them all in!
I think hardcoding "nf4" might be reasonable for the 4-bit quant type, as the QLoRA literature recommends it, even though the default for BitsAndBytesConfig is "fp4".
double_quant seems to be suggested only when memory-constrained. Maybe the 0.4 bits it saves are not worth it? If you think so, I can hardcode it to False and remove the optional config.
Finally, with bnb_4bit_compute_dtype I was not very sure of the tradeoffs: while QLoRA supposedly uses bfloat16, the default for BitsAndBytesConfig is float32. The reference seems to suggest bfloat16 is for faster training; I thought the same might hold for inference, but that's not explicitly called out. Maybe it's only a memory-saving benefit for inference? So is this decision a good one? 🤔
Sorry for all the questions, I'm still trying to level up on ML code, and once again I appreciate the feedback!
@@ -12,3 +12,5 @@ safetensors~=0.3.1
 torch>=1.12.1
 transformers[sentencepiece]~=4.30.1
 waitress~=2.1.2
+peft~=0.3.0
+scipy~=1.10.1
The missing dependency is due to a bug in bitsandbytes: bitsandbytes-foundation/bitsandbytes#426
Instead of pinning the version of this indirect dependency, I suggest waiting for bitsandbytes to fix the issue in version 0.39.0.
Thanks for pointing this out. It did feel a bit weird that I had to add this!
Also supports loading PEFT LoRA adapters with MODEL_PEFT=true.
For details on the 4-bit quantization options, see: https://huggingface.co/blog/4bit-transformers-bitsandbytes
Implements #202
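(For illustration, not code from this PR: a rough sketch of how a MODEL_PEFT=true toggle could drive the loading path; the environment-variable handling and the function name are assumptions about the surrounding code.)

```python
import os

from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM


def load_model(name_or_path: str):
    """Load either a plain model or a PEFT adapter, depending on MODEL_PEFT."""
    # Assumed convention: MODEL_PEFT=true means name_or_path points at a PEFT adapter.
    if os.environ.get("MODEL_PEFT", "").lower() == "true":
        peft_config = PeftConfig.from_pretrained(name_or_path)
        base = AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path)
        return PeftModel.from_pretrained(base, name_or_path)
    return AutoModelForCausalLM.from_pretrained(name_or_path)
```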