Yes. Since our implementation is based on open_clip, you can run single-image inference the same way as described there, just replacing open_clip with fast_clip:
First, start a Python interpreter; we set PYTHONPATH so that fast_clip can be imported correctly:
PYTHONPATH='./src' python
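If you'd rather not set the environment variable, you can add the source directory to the import path from inside the interpreter instead. This is a minimal sketch using only the standard library; it assumes you start Python from the repository root, so that './src' is the correct relative path:

import sys
sys.path.insert(0, './src')  # make fast_clip importable without PYTHONPATH
import fast_clip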
Then run the inference:
import torch
from PIL import Image
import fast_clip

model, _, preprocess = fast_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_s34b_b79k')
model.eval()  # model in train mode by default, impacts some models with BatchNorm or stochastic depth active
tokenizer = fast_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open("CLIP.png")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # normalize so the dot product below is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
Hello! Thanks for the interesting paper and repo.
Could you please explain how to run inference with your model on a single image in the standard CLIP way? I mean this: