[InstructBlip] Add accelerate support for instructblip #24488
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thank you :) could you add an integration test?
Hey @NielsRogge!
```python
_no_split_modules = [
    "InstructBlipAttention",
    "T5Block",
    "OPTDecoderLayer",
    "InstructBlipQFormerMultiHeadAttention",
]
```
I think this should be adapted dynamically depending on the models that are actually loaded, to be more future proof.
Sure, makes sense
Done. From what I can see, and as a side note, InstructBlip relies on T5 or Llama only, so it was a mistake to include `OPTDecoderLayer`. This is also fixed.
```python
if config.use_decoder_only_language_model:
    language_model = AutoModelForCausalLM.from_config(config.text_config)
    self._no_split_modules.append("LlamaDecoderLayer")
```
No, this should be `language_model._no_split_modules` here.
I think `language_model._no_split_modules` should already contain `LlamaDecoderLayer`. The purpose of adding it to `self._no_split_modules` is that in `from_pretrained` we never look at the child modules that contain that attribute: https://github.com/huggingface/transformers/blob/195a9e5bdb1faa58cd58b47a23a47734d2b90d8c/src/transformers/modeling_utils.py#L2807C13-L2807C29 and the list then gets passed to accelerate this way:

transformers/src/transformers/modeling_utils.py, line 2814 in 195a9e5:
kwargs = {"no_split_module_classes": no_split_modules}

so only the top-level model's attribute is used, never the one on `language_model`. What do you think?
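For context, here is a minimal sketch of how those no-split classes influence the device map on the accelerate side, using the public accelerate API rather than the internal `from_pretrained` path; the checkpoint name is an assumption for illustration:

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import InstructBlipConfig, InstructBlipForConditionalGeneration

# Checkpoint name assumed for illustration.
config = InstructBlipConfig.from_pretrained("Salesforce/instructblip-flan-t5-xl")

# Build the model skeleton without allocating real weights.
with init_empty_weights():
    model = InstructBlipForConditionalGeneration(config)

# The classes listed in _no_split_modules are forwarded as no_split_module_classes,
# so blocks such as T5Block are never sharded across devices.
device_map = infer_auto_device_map(
    model,
    no_split_module_classes=model._no_split_modules,
)
print(device_map)
```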
I don't understand your comment. The code loads any model from the Hub (granted, it should be Llama models in the actual checkpoints), but you add the specific Llama no-split blocks. This will stop working if someone adds an instructblip-2 model that loads a different language model.
Maybe a cleaner implementation would be to add a utility method in `post_init` that looks at all child modules that define `_no_split_modules` and dynamically appends them to `self._no_split_modules` (see the sketch below).
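As a rough illustration of that idea, here is a minimal sketch of such a utility; the helper name and wiring are hypothetical, not the actual transformers implementation:

```python
import torch.nn as nn

def collect_no_split_modules(model: nn.Module) -> list:
    """Hypothetical helper: merge the _no_split_modules declared by the model
    itself and by every child module (e.g. whichever language model was loaded),
    so the composite model stays correct without hard-coding backbone names."""
    no_split = set(getattr(model, "_no_split_modules", None) or [])
    for submodule in model.modules():
        child = getattr(submodule, "_no_split_modules", None)
        if child:
            no_split.update(child)
    return sorted(no_split)

# Sketch of how it could be wired up, e.g. inside post_init:
#     self._no_split_modules = collect_no_split_modules(self)
```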
If you think that's better, happy to change the PR to add these changes
Ah I see, yes, sorry I misunderstood your comment; will update that now.
```python
else:
    language_model = AutoModelForSeq2SeqLM.from_config(config.text_config)
    self._no_split_modules.append("T5Block")
```
Same here.
```diff
@@ -281,6 +281,8 @@ class InstructBlipPreTrainedModel(PreTrainedModel):
         r"language_model.decoder.embed_tokens.weight",
         r"language_model.lm_head.weight",
     ]
+    _keep_in_fp32_modules = ["wo"]
```
Last comment: this is the same issue here (this comes from T5 if I'm not mistaken). This key should also be built dynamically.
Makes sense, just updated it!
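Along the same lines as the `_no_split_modules` sketch above, here is a hedged sketch of a generic helper that could build such per-backbone attributes dynamically; it is hypothetical and not necessarily what the PR ended up doing:

```python
def collect_child_attribute(model, attribute: str) -> list:
    """Hypothetical generic helper: merge a list-valued attribute such as
    "_no_split_modules" or "_keep_in_fp32_modules" from the model and all of
    its child modules, so values like T5's ["wo"] are picked up automatically
    instead of being hard-coded on InstructBlip."""
    values = set(getattr(model, attribute, None) or [])
    for submodule in model.modules():
        child = getattr(submodule, attribute, None)
        if child:
            values.update(child)
    return sorted(values)
```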
What does this PR do?
As per the title, let's make users benefit from 8bit / 4bit loading of InstructBlip models.
cc @amyeroberts @sgugger @NielsRogge
All `accelerate` tests pass for this model. As a side note, since InstructBlip relies on Flan-T5 as the backbone for some checkpoints, it is important to add `_keep_in_fp32_modules = ["wo"]` (keeping T5's `wo` layers in fp32) to ensure inference stability in fp16 / int8 / fp4.
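For reference, a minimal usage sketch of what this PR enables; the checkpoint name, image URL, and prompt are just illustrative:

```python
import requests
from PIL import Image
from transformers import InstructBlipForConditionalGeneration, InstructBlipProcessor

# Checkpoint name assumed for illustration.
checkpoint = "Salesforce/instructblip-flan-t5-xl"

processor = InstructBlipProcessor.from_pretrained(checkpoint)
# 8-bit loading via bitsandbytes; device_map="auto" lets accelerate place the
# weights while respecting the modules listed in _no_split_modules.
model = InstructBlipForConditionalGeneration.from_pretrained(
    checkpoint,
    load_in_8bit=True,
    device_map="auto",
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, text="What is shown in this image?", return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```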