OpenVINO support #1037
Wow - this is quite interesting. First time I hear about OpenVINO - will try to get familiar.
Do you think it makes sense to do the same for Core ML so that the implementations follow a similar pattern?
@ggerganov, thanks for taking a look!
I think that makes sense, especially if CoreML exposes parameters to control how inference is performed -- but to be honest I know very little about CoreML.
Minor changes - should be good to merge after that
CMakeLists.txt
@@ -310,6 +321,7 @@ add_library(${TARGET}
${GGML_OPENCL_SOURCES}
whisper.h
whisper.cpp
${OpenVINO_SOURCES}
Use OPENVINO_SOURCES
However, why not make a separate target whisper.openvino, similar to how whisper.coreml works?
Let me try it again. I had originally tried to add it as a separate target and hit some weird issues (something like the corresponding .lib not being generated in the Windows build). I intended to circle back though, so thanks for the reminder.
okay, see latest commit (76c4186)
I added openvino-encoder to a dedicated OBJECT target:
add_library(${TARGET} OBJECT
openvino/whisper-openvino-encoder.h
openvino/whisper-openvino-encoder.cpp
)
And this target is linked to whisper just like coreml:
if (WHISPER_OPENVINO)
target_link_libraries(${TARGET} PRIVATE whisper.openvino)
endif()
I was thinking of making it SHARED, but I think it'd be more of a hassle to have to carry around a separate .dll / .so.
This builds fine, and I did some minimal testing on Windows 11 & Ubuntu.
Co-authored-by: Georgi Gerganov <[email protected]>
…ino_path_cache to non-const func signatures
Great stuff 👍
Hopefully I didn't break something - haven't tested
Hi! And when I run:
All the other instructions were executed successfully.
Distro info
Amazing work, and thanks for sharing.
Hi @Nabaralathep, Looks like I forgot the
Let me know how it goes.
Hi @RyanMetcalfeInt8,
1. When I run
2. When I ran make I received an error. It turns out that I had the Debian ARM version and my computer is x86_64, but when I went to the repository to download the appropriate one, I discovered that all the packages for Debian are ARM, so what now?
In any case, I really appreciate your answer (you don't know how much), since at least it made me understand the problem. Thank you very much, and your work is incredible.
Does this implementation of OpenVINO support the GNA in 10th to 14th generation Intel CPUs? Intel advertises it as follows:
They also later mention it could be used for tasks such as speech-to-text, and I'm curious if/how well whisper would perform on it. Setting the OpenVINO device to "gna" just throws an error with assertion failed
OpenVINO Ubuntu packages are compatible with Debian OS. You can use OpenVINO archives as well as install via apt and Debian packages.
* openvino: use OpenVINO encoder inference
* openvino: add python script for OpenVINO model generation
* whisper: Fix 'unused' warnings when OpenVINO isn't enabled in build
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <[email protected]>

* whisper: Fix compilation error
* whisper: revert whisper_get_openvino_path_encoder & whisper_get_openvino_path_cache to non-const func signatures
* cmake: Add openvino-encoder as separate object target
* whisper : minor style fixes
* minor : indentation fixes

---------

Co-authored-by: Georgi Gerganov <[email protected]>
onnx_path,
input_names=["mel"],
output_names=["output_features"]
)
It's not required to export to ONNX before use in OpenVINO.
You can use convert_model with a PyTorch in-memory object: https://docs.openvino.ai/2023.1/openvino_docs_OV_Converter_UG_prepare_model_convert_model_Convert_Model_From_PyTorch.html
@@ -0,0 +1,2 @@
openvino-dev[pytorch,onnx]
We can use openvino>=2023.1.0, which contains an updated version of convert_model directly in the main openvino pip package, while openvino-dev is actually deprecated.
std::shared_ptr<ov::Model> model = core.read_model(path_model);

// Produce a compiled-model object, given the device ("CPU", "GPU", etc.)
auto compiledModel = core.compile_model(model, device);
You can pass path_model directly to compile_model, which can speed up loading with ov::cache_dir enabled. See https://docs.openvino.ai/2023.1/openvino_docs_OV_UG_Model_caching_overview.html#make-it-even-faster-use-compile-model-modelpath
Any practical speedup from this change?
I'm on OpenVINO 2022.3.1 for a device that has been EOL'ed. I can compile master and run it with the cache:
whisper_openvino_init: path_model = models/ggml-base.en-encoder-openvino.xml, device = MYRIAD, cache_dir = models/ggml-base.en-encoder-openvino-cache
The speed is on par with CPU/GPU OpenVINO, and it lets an RPi run inference on the base model.
Probably some, yes, but the speedup will be during initialization (i.e. the time it takes to pull the model / cached blob from disk and prep the execution device).
@ilya-lavrenov -- good suggestions, looks like OpenVINO made some nice improvements for 2023.1+. Did you want to submit a PR with the updates / fixes?
Running Whisper inference using OpenVINO
This PR extends whisper.cpp to run the Whisper Encoder on OpenVINO-supported devices such as CPU and Intel GPUs (integrated & discrete).
I've tested this on a number of platforms, including
On each platform, the OpenVINO-based encoder gives a great performance boost over the default encoder -- even on CPU -- and the ability to offload to another OpenVINO-supported device simply by specifying a different string at runtime (e.g. "CPU" --> "GPU") is very convenient.
High-level description of changes
The OpenVINO encoder support is modeled very closely on how whisper.cpp uses CoreML (this should be pretty obvious in the change-set). If the project is built with OpenVINO support, an OpenVINO-specific encoder is pulled into the build and instantiated at application startup time.
Also as with CoreML, the models needed to take advantage of the OpenVINO encoder can be generated using a new Python script in the 'models' directory.
Just to point out -- something that does differ between CoreML and the new OpenVINO integration is how/when support is enabled at runtime. CoreML is enabled within the call to whisper_init_*. For OpenVINO, because we want the ability to specify a device string ("CPU", "GPU", etc.), I exposed a new API that is dedicated to initializing OpenVINO, given a ctx (in whisper.h):
I'm happy to rework this if anyone has a better idea of how to enable OpenVINO support at init time.
main.cpp exposes a new parameter for the user to set the OpenVINO encode inference device (default is "CPU"):
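As a rough illustration of the device parameter, here is a minimal, self-contained sketch of how a `-oved` style flag could be parsed. Only the flag name and the "CPU" default come from this PR; `sketch_params` and `sketch_parse_oved` are hypothetical names, and this is not the actual main.cpp code.

```cpp
#include <string>
#include <vector>

// Hypothetical parameter struct for illustration only.
// The "CPU" default mirrors the default described in this PR.
struct sketch_params {
    std::string openvino_encode_device = "CPU";
};

// Scans the arguments for "-oved <DEVICE>" and stores the device string.
// Returns false if "-oved" appears without a following value.
bool sketch_parse_oved(const std::vector<std::string> & args, sketch_params & params) {
    for (size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-oved") {
            if (i + 1 >= args.size()) {
                return false; // flag given without a device name
            }
            params.openvino_encode_device = args[++i];
        }
    }
    return true;
}
```

With no `-oved` argument the device stays "CPU"; passing `-oved GPU` would switch the stored string to "GPU", which is the value that would then be handed to the OpenVINO initialization call.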
And the new whisper_ctx_init_openvino_encoder API is called right after ctx creation:
How to generate models and enable OpenVINO for whisper.cpp builds
Here are the instructions for generating the OpenVINO models for use with OpenVINO-enabled builds of whisper.cpp:
First, set up a Python virtual environment and install the Python dependencies. Python 3.10 is recommended.
Windows:
Linux and macOS:
Generate an OpenVINO encoder model. For example, to generate a base.en model, use:
This will produce ggml-base.en-encoder-openvino.xml/.bin IR model files. It's recommended to relocate these to the same folder as the ggml models, as that is the default location that the OpenVINO extension will search at runtime.
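To make the default lookup location concrete, here is a small sketch of how the encoder IR path and cache directory could be derived from a ggml model path. The suffixes are inferred from the file names mentioned in this PR and from the `whisper_openvino_init` log line quoted in the review thread; the function names are hypothetical and are not the whisper.cpp functions themselves.

```cpp
#include <string>

// Hypothetical helper: strips a trailing ".bin" from the ggml model path, if present.
static std::string sketch_strip_bin(const std::string & path) {
    const std::string ext = ".bin";
    if (path.size() >= ext.size() &&
        path.compare(path.size() - ext.size(), ext.size(), ext) == 0) {
        return path.substr(0, path.size() - ext.size());
    }
    return path;
}

// Assumed naming: "<model>-encoder-openvino.xml" next to the ggml model.
std::string sketch_openvino_encoder_path(const std::string & ggml_path) {
    return sketch_strip_bin(ggml_path) + "-encoder-openvino.xml";
}

// Assumed naming: "<model>-encoder-openvino-cache" next to the ggml model.
std::string sketch_openvino_cache_dir(const std::string & ggml_path) {
    return sketch_strip_bin(ggml_path) + "-encoder-openvino-cache";
}
```

Under these assumptions, a model at models/ggml-base.en.bin maps to models/ggml-base.en-encoder-openvino.xml, which is why keeping the IR files beside the ggml model lets them be found without any extra configuration.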
Build whisper.cpp with OpenVINO support:
Download the OpenVINO package from the release page. The recommended version to use is 2023.0.0.
After downloading & extracting the package onto your development system, set up the required environment by sourcing the setupvars script. For example:
Linux:
source /path/to/l_openvino_toolkit_ubuntu22_2023.0.0.10926.b4452d56304_x86_64/setupvars.sh
Windows (cmd):
And then build the project using cmake:
cd build
cmake -DWHISPER_OPENVINO=1 ..
Run the examples as usual. For example:
The first run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob is cached for the next run.
You can use the -oved [DEVICE] argument to main to specify the OpenVINO device to offload encoder inference to. For example: