Hello! I am trying to use Qwen-VL to extract unimodal features for a given input image and accompanying text query. How can that be achieved? I am aware that models like BLIP-2 have a direct API (extract_features) that aids in doing this. But how can it be achieved in the context of Qwen-VL?
This is exactly what I was about to ask. How do we get encoder embeddings (extracted features) from Qwen2-VL for text-only input, image-only input, or combined image and text input?