[CLIP] Update clip dependencies, README.md #1203

Merged: 1 commit, Aug 25, 2023

setup.py (1 addition, 1 deletion)
@@ -164,7 +164,7 @@ def _parse_requirements_file(file_path):
_haystack_integration_deps = _parse_requirements_file(_haystack_requirements_file_path)
_clip_deps = [
"open_clip_torch==2.20.0",
- "scipy==1.10.1",
+ "scipy<1.9.2,>=1.8",
f"{'nm-transformers' if is_release else 'nm-transformers-nightly'}",
]
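This hunk replaces the exact `scipy==1.10.1` pin in `_clip_deps` with the range `scipy<1.9.2,>=1.8`, so the previously pinned 1.10.1 no longer satisfies the CLIP dependencies. A minimal stdlib sketch of checking a version against that range follows. The helper names are hypothetical, and it assumes plain numeric release strings (no `rc`/`dev` tags). Versions must be compared numerically, because a naive string comparison would wrongly rank `"1.10.1"` below `"1.9.2"`.

```python
# Hypothetical helpers (not part of deepsparse): check a scipy version string
# against the ">=1.8,<1.9.2" range used for the CLIP dependencies.
# Assumes plain numeric release versions such as "1.10.1".


def parse_release(version: str, width: int = 3) -> tuple:
    """Turn '1.8' into (1, 8, 0) and '1.10.1' into (1, 10, 1)."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (width - len(parts))  # pad so '1.8' == '1.8.0'
    return tuple(parts)


def in_clip_scipy_range(version: str) -> bool:
    """True if `version` satisfies scipy<1.9.2,>=1.8."""
    v = parse_release(version)
    return parse_release("1.8") <= v < parse_release("1.9.2")


for candidate in ["1.8.0", "1.9.1", "1.9.2", "1.10.1"]:
    print(candidate, in_clip_scipy_range(candidate))
```

For real version strings, including pre-releases, `packaging.specifiers.SpecifierSet(">=1.8,<1.9.2").contains(...)` implements the full PEP 440 rules and is the sturdier choice.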

src/deepsparse/clip/README.md (3 additions, 3 deletions)
@@ -16,7 +16,7 @@ Before you start your adventure with the DeepSparse Engine, make sure that your
### Model Format
By default, to deploy CLIP models using the DeepSparse Engine, it is required to supply the model in the ONNX format. This grants the engine the flexibility to serve any model in a framework-agnostic environment. To see examples of pulling CLIP models and exporting them to ONNX, please see the [sparseml documentation](https://github.com/neuralmagic/sparseml/tree/main/integrations/clip).

- For the Zero-shot image classification workflow, two ONNX models are required, a visual model for CLIP's visual branch, and a text model for CLIP's text branch. Both of these models can be produced through the sparseml integration linked above. For caption generation, specific models called CoCa models are required and instructions on how to export CoCa models are also provided in the sparseml documentation above. The CoCa exporting pathway will generate one additional decoder model, along with the text and visual models.
+ For the Zero-shot image classification workflow, two ONNX models are required, a visual model for CLIP's visual branch, and a text model for CLIP's text branch. Both of these models can be produced through the sparseml integration linked above. For caption generation, specific models called CoCa models are required and instructions on how to export CoCa models are also provided in the sparseml documentation. The CoCa exporting pathway will generate one additional decoder model, along with the text and visual models.

### Deployment examples:
The following example uses pipelines to run the CLIP models for inference. For Zero-shot prediction, the pipeline ingests a list of images and a list of possible classes. A class is returned for each of the provided images. For caption generation, only an image file is required.
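The zero-shot workflow returns one class per image because CLIP-style classification is commonly described as follows. Each image and each candidate class is embedded, and the class whose text embedding has the highest cosine similarity with the image embedding wins. The toy sketch below shows only that scoring step. The 2-D vectors are made up and stand in for the visual and text model outputs. This is not the DeepSparse pipeline's implementation, which may additionally use prompt templates, a learned temperature, and a softmax.

```python
import math
from typing import List

# Conceptual sketch of CLIP-style zero-shot scoring. The embeddings are toy
# vectors, not real outputs of the CLIP visual/text ONNX models.


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot(image_embs: List[List[float]],
              text_embs: List[List[float]],
              classes: List[str]) -> List[str]:
    """Return the best-matching class for each image (text_embs[i] <-> classes[i])."""
    texts = [normalize(t) for t in text_embs]
    predictions = []
    for img in image_embs:
        img = normalize(img)
        scores = [sum(a * b for a, b in zip(img, t)) for t in texts]
        best = max(range(len(classes)), key=lambda i: scores[i])
        predictions.append(classes[best])
    return predictions


classes = ["a dog", "a church"]
text_embs = [[1.0, 0.0], [0.0, 1.0]]   # toy text embeddings, one per class
image_embs = [[0.2, 0.9], [0.8, 0.1]]  # toy image embeddings, one per image
print(zero_shot(image_embs, text_embs, classes))  # prints ['a church', 'a dog']
```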
@@ -60,8 +60,8 @@ from deepsparse.clip (
possible_classes = ["ice cream", "an elephant", "a dog", "a building", "a church"]
images = ["basilica.jpg", "buddy.jpeg", "thailand.jpg"]

- model_path_text = "zeroshot_research/text/model.onnx"
- model_path_visual = "zeroshot_research/visual/model.onnx"
+ model_path_text = "zeroshot_research/clip_text.onnx"
+ model_path_visual = "zeroshot_research/clip_visual.onnx"

kwargs = {
"visual_model_path": model_path_visual,
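This README hunk changes the example model paths from `zeroshot_research/text/model.onnx` and `zeroshot_research/visual/model.onnx` to the flat names `zeroshot_research/clip_text.onnx` and `zeroshot_research/clip_visual.onnx`. Those paths are passed straight into the pipeline kwargs, so a script written against the old layout will point at files that no longer exist. The small pre-flight check below catches that before the pipeline is built. It uses only the stdlib, and `MODEL_PATHS` and `missing_models` are illustrative names, not part of DeepSparse.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical pre-flight check: confirm the exported CLIP ONNX files exist
# under the file names used in the updated README before creating a pipeline.
MODEL_PATHS: Dict[str, Path] = {
    "text": Path("zeroshot_research/clip_text.onnx"),
    "visual": Path("zeroshot_research/clip_visual.onnx"),
}


def missing_models(paths: Dict[str, Path]) -> List[str]:
    """Return the names of model files that are not present on disk."""
    return [name for name, path in paths.items() if not path.is_file()]


missing = missing_models(MODEL_PATHS)
if missing:
    print("Missing ONNX exports:", ", ".join(missing))
```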