I'm using transformers==4.47.1.

Command:

Output:
[2025-01-04 17:24:42,096] WARNING - RUN - run.py: main - 165: --reuse is not set, will not reuse previous (before one day) temporary files
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.97it/s]
/home/mengyu/VLMEvalKit/vlmeval/vlm/llava/llava.py:294: UserWarning: Following kwargs received: {'do_sample': False, 'temperature': 0, 'max_new_tokens': 512, 'top_p': None, 'num_beams': 1}, will use as generation config.
warnings.warn(
0%| | 0/2600 [00:00<?, ?it/s]You may have used the wrong order for inputs. `images` should be passed before `text`. The `images` and `text` inputs will be swapped. This behavior will be deprecated in transformers v4.47.
/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
0%| | 0/2600 [00:00<?, ?it/s]
[2025-01-04 17:24:52,047] ERROR - RUN - run.py: main - 411: Model llava_next_interleave_7b x Dataset MUIRBench combination failed: Image features and image tokens do not match: tokens: 5832, features 2916, skipping this combination.
Traceback (most recent call last):
File "/home/mengyu/VLMEvalKit/run.py", line 299, in main
model = infer_data_job(
File "/home/mengyu/VLMEvalKit/vlmeval/inference.py", line 165, in infer_data_job
model = infer_data(
File "/home/mengyu/VLMEvalKit/vlmeval/inference.py", line 130, in infer_data
response = model.generate(message=struct, dataset=dataset_name)
File "/home/mengyu/VLMEvalKit/vlmeval/vlm/base.py", line 115, in generate
return self.generate_inner(message, dataset)
File "/home/mengyu/VLMEvalKit/vlmeval/vlm/llava/llava.py", line 402, in generate_inner
output = self.model.generate(**inputs, **self.kwargs)
File "/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/transformers/generation/utils.py", line 2252, in generate
result = self._sample(
File "/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/transformers/generation/utils.py", line 3251, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 534, in forward
raise ValueError(
ValueError: Image features and image tokens do not match: tokens: 5832, features 2916
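Two things stand out in the output above. First, the token count in the error is exactly twice the feature count (5832 = 2 × 2916), which suggests the `<image>` placeholders are being expanded twice, or for twice as many images as were actually encoded; the deprecation warning in the `vlmutil check` output below ("Expanding inputs for image tokens in LLaVa should be done in processing...") points at the same processing-path change in recent transformers. Second, the `use_flash_attention_2` and `temperature` warnings are deprecation noise rather than the failure itself. Here is a minimal sketch of the loading pattern those warnings ask for — this is not VLMEvalKit's actual loader, and the HF checkpoint id is my assumption of what `llava_next_interleave_7b` resolves to:

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-interleave-qwen-7b-hf"  # assumed checkpoint id
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # replaces deprecated use_flash_attention_2=True
    device_map="cuda",                        # init on GPU, as the FA2 warning requests
)
processor = AutoProcessor.from_pretrained(model_id)

# Per the deprecation warning below: with transformers 4.47 the <image>
# expansion should happen in the processor, which needs these two
# attributes. Pulling them from the model config avoids hardcoding.
processor.patch_size = model.config.vision_config.patch_size
processor.vision_feature_select_strategy = model.config.vision_feature_select_strategy
```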
The model seems to work fine with `vlmutil check llava_next_interleave_7b`:
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.01it/s]
/home/mengyu/VLMEvalKit/vlmeval/vlm/llava/llava.py:294: UserWarning: Following kwargs received: {'do_sample': False, 'temperature': 0, 'max_new_tokens': 512, 'top_p': None, 'num_beams': 1}, will use as generation config.
warnings.warn(
Model: llava_next_interleave_7b
You may have used the wrong order for inputs. `images` should be passed before `text`. The `images` and `text` inputs will be swapped. This behavior will be deprecated in transformers v4.47.
/home/mengyu/miniconda3/envs/vlmevalkit/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:628: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
Expanding inputs for image tokens in LLaVa should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
Test 1: The image shows a red apple with a green leaf attached to its stem. The apple appears to be fresh and shiny, suggesting it is ripe. The background is plain white, which highlights the apple as the main subject of the image.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
Test 2: The image shows a red apple with a green leaf attached to its stem. The apple appears to be fresh and shiny, suggesting it is ripe. The background is plain white, which highlights the apple as the main subject of the image.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
Test 3: There is only one apple in each image.
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
Test 4: There is only one apple in each image.
However, it does not work on MUIRBench (which the program downloads automatically).
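Note that the check's multi-image questions (Tests 3 and 4, "in each image") are answered correctly, so plain interleaved prompts do go through; the mismatch only appears once MUIRBench samples are fed in. For triage, a standalone repro along these lines might help separate a transformers processing bug from VLMEvalKit's prompt construction — the checkpoint id and image files are placeholders, and the raw `<image>` prompt format is an assumption (the real chat template may differ):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-interleave-qwen-7b-hf"  # assumed checkpoint id
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(model_id)

# Hypothetical local images; MUIRBench questions interleave several.
images = [Image.open(p) for p in ("img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg")]
prompt = "<image><image><image><image>\nWhich image shows an apple?"

inputs = processor(images=images, text=prompt, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(processor.decode(out[0], skip_special_tokens=True))
```

If this minimal script also raises the token/feature mismatch with several images, the problem likely lies in the transformers processing path rather than in VLMEvalKit.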