Add inference support for MacBook silicon chip
Signed-off-by: Benjamin Huo <[email protected]>
benjaminhuo committed Jul 30, 2023
1 parent de5dfe2 commit 4346157
Showing 2 changed files with 4 additions and 0 deletions.
2 changes: 2 additions & 0 deletions inference/README.md
@@ -56,6 +56,8 @@ For the falcon-7b model, you can use the following command:
python3 serve/gorilla_falcon_cli.py --model-path path/to/gorilla-falcon-7b-hf-v0
```

> Add `--device mps` if you're running on a MacBook with an Apple silicon chip.
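
For example (assuming `gorilla_falcon_cli.py` accepts the same `--device` flag as `gorilla_cli.py`), the falcon-7b command above becomes:

```bash
python3 serve/gorilla_falcon_cli.py --model-path path/to/gorilla-falcon-7b-hf-v0 --device mps
```
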
### [Optional] Batch Inference on a Prompt File

After downloading the model, you need to make a jsonl file containing all the questions you want to run through Gorilla. Here is [one example](https://github.com/ShishirPatil/gorilla/blob/main/inference/example_questions/example_questions.jsonl):
2 changes: 2 additions & 0 deletions inference/serve/gorilla_cli.py
@@ -67,6 +67,8 @@ def load_model(
}
else:
kwargs["max_memory"] = {i: max_gpu_memory for i in range(num_gpus)}
elif device == "mps":
kwargs = {"torch_dtype": torch.float16}
else:
raise ValueError(f"Invalid device: {device}")

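For context, here is a minimal sketch of how this device branch could sit inside a `load_model`-style helper. Only the `mps` branch mirrors this diff; the `cpu`/`cuda` branches, the function signature, and the Hugging Face loading calls are illustrative assumptions, not the actual contents of `gorilla_cli.py`.

```python
# Illustrative sketch only; just the `mps` branch is taken from this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model(model_path, device, num_gpus=1, max_gpu_memory=None):
    if device == "cpu":
        kwargs = {"torch_dtype": torch.float32}  # assumed CPU default
    elif device == "cuda":
        kwargs = {"torch_dtype": torch.float16}
        if num_gpus > 1:
            # Shard across GPUs, optionally capping memory per device.
            kwargs["device_map"] = "auto"
            if max_gpu_memory is not None:
                kwargs["max_memory"] = {i: max_gpu_memory for i in range(num_gpus)}
    elif device == "mps":
        # Apple silicon: run in fp16 on the Metal (MPS) backend.
        kwargs = {"torch_dtype": torch.float16}
    else:
        raise ValueError(f"Invalid device: {device}")

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, low_cpu_mem_usage=True, **kwargs
    )
    if device in ("cuda", "mps") and num_gpus == 1:
        model.to(device)
    return model, tokenizer
```

With `--device mps`, PyTorch dispatches tensors to Apple's Metal Performance Shaders backend, which requires PyTorch 1.12 or newer on macOS.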
