
Releases: openvinotoolkit/model_server

OpenVINO™ Model Server 2023.0

01 Jun 15:35
301b794

The 2023.0 version is a major release with numerous improvements and changes.

New Features

  • Added an option to submit inference requests with string data and to read responses as strings. Currently this can be utilized via custom nodes and via OpenVINO models with a CPU extension handling string data:
    • A custom node in a DAG pipeline can perform string tokenization before passing the data to the OpenVINO model. This is beneficial for models without a tokenization layer, as it fully delegates that preprocessing to the model server.
    • A custom node in a DAG pipeline can perform string detokenization of the model response to convert it to string format. This is beneficial for models without a detokenization layer, as it fully delegates that postprocessing to the model server.
    • Both options above are demonstrated in the GPT model text generation demo.
    • For models with a tokenization layer, like universal-sentence-encoder, there is a new CPU extension implementing the sentencepiece_tokenization layer. Users can pass a string to the model and it is automatically converted to the format needed by the CPU extension.
    • The option above is demonstrated in the universal-sentence-encoder model usage demo.
  • Added support for string input and output in the ovmsclient library. The ovmsclient library can be used to send string data to the model server. Check the code snippets and the sketch after this list.
  • Preview version of OVMS with the MediaPipe framework - it is now possible to make calls to OpenVINO Model Server to perform MediaPipe graph processing. There are calculators performing OpenVINO inference via C-API calls from OpenVINO Model Server, as well as calculators converting the OV::Tensor input format to the MediaPipe image format. This creates a foundation for building arbitrary graphs. Check the model server integration with MediaPipe documentation.
  • Extended the C-API interface with ApiVersion and Metadata calls; the C-API version is now 0.3.
  • Added support for the saved_model format. Check how to create the models repository. An example of such a use case is in the universal-sentence-encoder demo.
  • Added an option to build the model server with the NVIDIA plugin on the UBI8 base image.
  • Virtual plugins AUTO, HETERO and MULTI are now supported with the NVIDIA plugin.
  • With the DEBUG log_level, a message is included reporting the actual execution device for each inference request when the target_device is AUTO. Learn more about the AUTO plugin.
  • Support for relative paths to the model files. The paths can now be relative to the config.json location. This simplifies deployments when config.json is distributed together with the models repository.
  • Updated OpenCL drivers for the GPU device to version 23.13 (with the Ubuntu 22.04 base image).
  • Added an option to build OVMS on the Ubuntu 22.04 base OS. This is an addition to the supported base OSes Ubuntu 20.04 and UBI 8.7.
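
Below is a minimal sketch of sending string input through the ovmsclient library, assuming ovmsclient 2023.0 with string support, a server on localhost:9000 and a servable named usem with an input named text (hypothetical names):

from ovmsclient import make_grpc_client

# Assumed address and servable/input names; adjust to your deployment
client = make_grpc_client("localhost:9000")
response = client.predict(inputs={"text": ["OpenVINO Model Server now handles strings"]},
                          model_name="usem")
print(response)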

Breaking changes

  • KServe API unification with the Triton implementation for handling strings and encoded image formats. Now every string or encoded image placed in the binary extension (REST) or in raw_input_contents (gRPC) needs to be preceded by 4 bytes (little endian) containing its size. See the updated code snippets and samples, and the sketch after this list.
  • Changed the default performance hint from THROUGHPUT to LATENCY. With the new default settings, the model server will be tuned for optimal execution and minimal latency at low concurrency. The default setting also minimizes memory consumption. When serving models under high concurrency, it is recommended to adjust NUM_STREAMS or to set the performance hint to THROUGHPUT explicitly. Read more in the performance tuning guide.
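
A minimal sketch of the new length-prefix convention when building the payload by hand; the file name is illustrative:

import struct

with open("image.jpeg", "rb") as f:  # hypothetical encoded image
    payload = f.read()

# Since 2023.0, each string or encoded image must be preceded by 4 bytes
# (little endian) holding its size before it is placed in raw_input_contents
# (gRPC) or appended as binary data in the REST binary extension.
prefixed = struct.pack("<I", len(payload)) + payload

Clients based on the Triton client library apply this prefix automatically when serializing BYTES tensors; the manual packing above is only needed when the request is assembled by hand.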

Bug fixes

  • The AUTO plugin starts serving models on the CPU and switches to the GPU device after the model is compiled – this reduces the model startup time.
  • Fixed an image building error on MacOS and Ubuntu 22.
  • The ovmsclient Python library is now compatible with tensorflow installed in the same environment – ovmsclient was created to avoid requiring the tensorflow package and to keep the Python environment smaller. Now the tensorflow package no longer conflicts with it, so it is fully optional.
  • Improved memory handling after unloading models – the model server now enforces releasing memory after models are unloaded. Memory consumption reported by the model server process will be lower in use cases where models are changed frequently.

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2023.0 - CPU device support with the image based on Ubuntu 20.04
docker pull openvino/model_server:2023.0-gpu - GPU and CPU device support with the image based on Ubuntu 22.04
or use provided binary packages.
The prebuilt image is also available in the Red Hat Ecosystem Catalog.

OpenVINO™ Model Server 2022.3.0.1

27 Feb 14:44
26f95e9

The 2022.3.0.1 version is a patch release for the OpenVINO Model Server. It includes a few bug fixes and enhancements in the C-API.

New Features

  • Added support for DAG pipelines to the OVMS_Inference execution method in the C API. The servableName parameter can be either a model name or a pipeline name.
  • Added a debug log in the AUTO plugin execution to report which physical device is used - the AUTO plugin allocates the best available device for model execution. For troubleshooting purposes, at the debug log level, the model server will report which device is used for each inference execution.
  • Allowed enabling metrics collection via CLI parameters while using the configuration file. Metrics collection can be configured with CLI parameters or in the configuration file. Enabling metrics in the CLI no longer blocks using the configuration file to define multiple models for serving.
  • Added a client sample in Java to demonstrate KServe API usage.
  • Added a client sample in Go to demonstrate KServe API usage.
  • Added client samples demonstrating asynchronous calls via the KServe API (see the sketch after this list).
  • Added a demo showcasing OVMS with the GPT-J-6b model from Hugging Face.
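
A minimal sketch of an asynchronous KServe call with the Triton client library; the model name my_model, input name input, output name output and the shape are assumptions:

import numpy as np
import tritonclient.grpc as grpcclient

def on_result(result, error):
    # Invoked on a worker thread when the asynchronous response arrives
    if error is not None:
        print("Inference failed:", error)
    else:
        print("Output shape:", result.as_numpy("output").shape)

client = grpcclient.InferenceServerClient("localhost:9000")
infer_input = grpcclient.InferInput("input", [1, 10], "FP32")
infer_input.set_data_from_numpy(np.zeros((1, 10), dtype=np.float32))
client.async_infer("my_model", inputs=[infer_input], callback=on_result)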

Bug fixes

  • Fixed model server image building with NVIDIA plugin on a host with NVIDIA Container Toolkit installed.
  • Fixed the KServe API response to include the DAG pipeline name for calls to a DAG – based on the API definition, the response includes the servable name. In the case of DAG processing, the pipeline name is now returned instead of an empty value.
  • The default number of gRPC and REST workers is now calculated correctly based on allocated CPU cores – when the model server is started in a docker container with constrained CPU allocation, the default number of frontend threads will be set more efficiently.
  • Corrected reporting of the number of streams in the metrics while using non-CPU plugins – before this fix, a zero value was returned. This metric suggests the optimal number of active parallel inference calls for the best throughput.
  • Fixed handling model mapping with model reloads.
  • Fixed handling model mapping with dynamic shape/batch size.
  • ovmsclient no longer causes conflicts with the tensorflow-serving-api package installed in the same Python environment.
  • Fixed debug image building.
  • Fixed C-API demo building.
  • Added security fixes.

Other changes:

  • Updated the OpenCV version to 4.7 - OpenCV is an included dependency for image transformation in the custom nodes and for JPEG/PNG input decoding.
  • Lengthened the request waiting timeout during DAG reloads. On slower machines, the timeout was sporadically reached during DAG configuration reload, resulting in unsuccessful requests.
  • ovmsclient has more relaxed requirements related to the numpy version.
  • Improved unit tests stability.
  • Improved documentation.

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2022.3.0.1 
docker pull openvino/model_server:2022.3.0.1-gpu
or use provided binary packages.

OpenVINO™ Model Server 2022.3

21 Dec 19:58
b9737ce

The 2022.3 version is a major release. It includes several new features, enhancements and bug fixes.

New Features

Import TensorFlow Models – preview feature

OpenVINO Model Server can now load TensorFlow models directly from the model repository. Converting to OpenVINO Intermediate Representation (IR) format with model optimizer is not required. This is a preview feature with several limitations. The model must be in a frozen graph format with .pb extension. Loaded models take advantage of all OpenVINO optimizations. Learn more about it and check this demo.

C API interface to the model server internal functions – preview feature

It is now possible to leverage the model management functionality in OpenVINO Model Server for local inference execution within an application. Just dynamically link the OVMS shared library to take advantage of its new C API and use internal model server functions in C/C++ applications. To learn more see the documentation and check this demo.

Extended KServe gRPC API

The KServe gRPC API implemented in OpenVINO Model Server has been extended to support both input and output in the form of tensor data and raw data. The output format is consistent with the input format. This extension enables using the Triton client library with OpenVINO Model Server to send inference requests. The input data can be prepared as vectors or encoded as JPEG/PNG and sent as bytes. Learn more about the current API and check the Python and C++ samples.
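
For illustration, a minimal sketch of sending a JPEG as bytes over the KServe gRPC API with the Triton client library; the model name resnet and the tensor names are assumptions:

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:9000")
with open("image.jpeg", "rb") as f:
    img_bytes = f.read()

# The encoded image is sent as a single BYTES element
infer_input = grpcclient.InferInput("image", [1], "BYTES")
infer_input.set_data_from_numpy(np.array([img_bytes], dtype=np.object_))
result = client.infer("resnet", inputs=[infer_input])
print(result.as_numpy("prob"))  # "prob" is a hypothetical output name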

Extended KServe REST API

The KServe REST API now has additional functionality that improves compatibility with the Triton Inference Server extension. It is now possible to send raw data in an HTTP request outside of the JSON content. Concatenated bytes can be interpreted by the model server depending on the header content. This makes it easy and quick to serialize data from numpy arrays/vectors and to send JPEG/PNG encoded images.
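
A minimal sketch with the Triton HTTP client, which places the raw bytes outside the JSON part of the request automatically; model and tensor names are assumptions:

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")
with open("image.jpeg", "rb") as f:
    img_bytes = f.read()

infer_input = httpclient.InferInput("image", [1], "BYTES")
# binary_data=True appends the bytes after the JSON header of the HTTP request
infer_input.set_data_from_numpy(np.array([img_bytes], dtype=np.object_), binary_data=True)
result = client.infer("resnet", inputs=[infer_input])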

Added Support for Intel® Data Center GPU Flex and Intel® Arc GPU

OpenVINO Model Server now officially supports Intel® Data Center GPU Flex and Intel® Arc GPU cards. Learn more about using discrete GPU devices.

C++ Sample Inference Client Applications using KServe API

New client code samples demonstrate KServe API usage. These samples illustrate typical data formats and scenarios. Check out the samples.

Extended Python Client Samples using KServe API

Python client code samples have been extended to include new API features for both the gRPC and REST interfaces.

Added integration with OpenVINO plugin for NVIDIA GPU

OpenVINO Model Server can now also be used with NVIDIA GPU cards. Follow these steps to build the Model Server from sources, including the NVIDIA plugin from the openvino_contrib repo. Learn more about using the NVIDIA plugin.

Breaking changes

  • The CLI parameter has been renamed to reflect the interval time unit: custom_node_resources_cleaner_interval_seconds. The default value should be optimal for most use cases.
  • There is temporarily no support for the HDDL/NCS plugins. Support for them will come in the next release.

Deprecated functionality

  • Plugin config parameters from OpenVINO API 1.0 – OpenVINO models can be tuned using plugin config parameters. Until now, the parameter names were defined by OpenVINO API 1.0. It is recommended to start using the parameter names defined in OpenVINO API 2.0. In this release, old parameters are automatically translated to their new substitutions. Check the performance tuning guide and more info about the plugin parameters.

Bug fixes

  • Improved performance for DAG pipelines executed on GPU accelerators
  • The default values of performance tuning parameters were not calculated correctly inside docker containers with constrained CPU capacity. Now the optimal number of streams for THROUGHPUT mode is set based on the CPUs bound to the container.
  • Fixed unit tests that raised sporadic false positive errors.

Other changes:

  • Published a binary package of OpenVINO Model Server which can be used in deployments on bare metal hosts without Docker containers. See the instructions for bare metal deployment.
  • Updated software dependencies and container base images

You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2022.3 
docker pull openvino/model_server:2022.3-gpu
or use provided binary packages.

OpenVINO™ Model Server 2022.2

22 Sep 14:06
d3d28c8

The 2022.2 version is a major release with the new OpenVINO backend API (Application Programming Interface).

New features

KServe gRPC API

Besides the TensorFlow Serving API, it is now possible to call OpenVINO Model Server using the KServe API. The following gRPC methods are implemented: ModelInfer, ModelMetadata, ModelReady, ServerLive, ServerReady and ServerMetadata.
Inference execution supports input both in the raw_input_contents format and as InferTensorContents.

The same clients can be used to connect to OpenVINO Model Server as with other KServe compatible model servers. Check the samples using the Triton client library in Python.

KServe REST API – feature preview

Next to the TensorFlow Serving REST API, we have also implemented the KServe REST API. The following endpoints are functional:

v2
v2/health/live
v2/health/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/ready
v2/models/{MODEL_NAME}[/versions/{MODEL_VERSION}]/infer

Besides the standard tensor_data input format, the binary extension compatible with the Triton Inference Server is also implemented.

That way, the data can be sent as arrays in JSON or as JPEG or PNG encoded content.

Check how to connect via the KServe API in the samples using the Triton client library in Python.
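
A minimal sketch of calling the KServe REST infer endpoint with tensor data in the JSON body; the model name my_model, input name and shape are assumptions:

import requests

payload = {
    "inputs": [
        {"name": "input", "shape": [1, 10], "datatype": "FP32", "data": [0.0] * 10}
    ]
}
resp = requests.post("http://localhost:8000/v2/models/my_model/infer", json=payload)
print(resp.json())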

Execution metrics – feature preview

OpenVINO Model Server can now expose metrics compatible with Prometheus format. Metrics can be enabled in the server configuration file or using a command line parameter.
The following metrics are now available:

ovms_streams 
ovms_current_requests 
ovms_requests_success 
ovms_requests_fail 
ovms_request_time_us 
ovms_inference_time_us 
ovms_wait_for_infer_req_time_us 
ovms_infer_req_queue_size 
ovms_infer_req_active 

Metrics can be integrated with Grafana reports or with a horizontal autoscaler.

Learn more about using metrics.
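
A minimal sketch of scraping the metrics, assuming they are enabled and exposed under /metrics on the REST port:

import requests

# Prometheus text format; filter one of the counters listed above
metrics = requests.get("http://localhost:8000/metrics").text
for line in metrics.splitlines():
    if line.startswith("ovms_requests_success"):
        print(line)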

Direct support for PaddlePaddle models

OpenVINO Model Server now includes a PaddlePaddle model importer. It is possible to deploy models trained in the PaddlePaddle framework directly in the models repository.
Check the demo showing how to deploy and use the ocrnet-hrnet-w48-paddle segmentation model in PaddlePaddle format.

Performance improvements in DAG execution

In several scenarios, pipeline execution was improved to reduce data copy operations. This will be perceived as reduced latency and increased overall throughput.

Exemplary custom nodes are included in the OpenVINO Model Server public docker image.

Previously, deploying pipelines based on the exemplary custom nodes required compiling the custom node and mounting it into the container during deployment. Now those libraries are added to the public docker image. Demos including custom nodes now offer an option to use the precompiled version from the image or to build them from source. Check the demo of the horizontal text detection pipeline.

Breaking changes

Changed the sequence of starting REST/gRPC endpoints vs initial loading of the models.

With this version, the model server initiates the gRPC and REST endpoints (if enabled) before the models are loaded. Before this change, an active network interface acted as the readiness indicator. Now, server readiness and model readiness can be checked using the dedicated endpoints according to the KServe API:

v2/health/ready 
v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/ready 

This makes it easier to monitor the state of models during the initialization phase.
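
A minimal sketch of probing the new readiness endpoints; the REST port and model name my_model are assumptions:

import requests

base = "http://localhost:8000"
server_ready = requests.get(f"{base}/v2/health/ready").status_code == 200
model_ready = requests.get(f"{base}/v2/models/my_model/ready").status_code == 200
print("server ready:", server_ready, "model ready:", model_ready)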

Updated the OpenCV version used in the model server to 4.6.0

This impacts custom node compatibility. Any custom nodes using OpenCV for custom image transformations should be recompiled. Check the recommended process for building custom nodes in the docker container in our examples.

Bug Fixes:

  • Minor fixes in logging
  • Fixed configuring warning log level
  • Fixes in documentation
  • Security fixes

You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2022.2 or
docker pull openvino/model_server:2022.2-gpu

OpenVINO™ Model Server 2022.1

25 Mar 07:09
277156f

The 2022.1 version is a major release with the new OpenVINO backend API (Application Programming Interface). It includes several new features and a few breaking changes.

New features

  • Support for dynamic shape in the models
    Allows configuring model inputs to accept a range of input shape dimensions and a variable batch size. This enables sending predict requests with various image resolutions and batch sizes.
  • Model cache for faster loading and initialization
    The cached files make the Model Server initialization faster when performing subsequent model loading. Cache files can be reused within the same Model Server version, target device, hardware, model, model version, model shape and plugin config parameters.
  • Support for double precision
    OVMS now supports two additional precisions: FP64 and I64.
  • Extended API for the Directed Acyclic Graph scheduler custom nodes to include initialization and cleanup steps
    This enables additional use cases where you can initialize resources in the DAG loading step instead of during each predict request. For example, this allows avoiding dynamic allocation during custom node execution.
  • Easier deployment of models with layout from training frameworks
    If a model has information about its layout, this information is preserved in OVMS. OpenVINO Model Optimizer can be instructed to save information about the model layout.
  • Arbitrary layout transpositions
    Added support for handling any layout transformation when loading models. This results in adding a preprocessing step before inference. It is configured using --layout NCHW:NHWC, which informs OVMS that the model natively accepts the NHWC layout and that a preprocessing step with transposition from NCHW should be added so that such inputs are accepted.
  • Support for models with batch size on arbitrary dimension
    The batch size in the layout can now be at any position in the model. Previously, when changing the model batch size, OVMS accepted the batch size only on the first dimension.

Breaking changes

  • Order of reshape and layout change operations during model initialization.
    In previous OVMS versions, the order was: first do the reshape, then apply the layout change.
    In this release, OVMS handles the order of operations for the user, and it is required to specify the expected final shape and the expected transposition to be added.
    If you wanted to change a model with original shape (1,3,200,200) and layout NCHW to handle a different layout and resolution, you previously had to set --shape "(1,3,224,224)" --layout NHWC. Now both parameters should describe the target values, so with 2022.1 it should look like: --shape "(1,224,224,3)" --layout NHWC:NCHW.
  • Layout parameter changes
    Previously, when configuring a model with the --layout parameter, the administrator was not required to know the underlying model layout because OV used NCHW by default. Now the parameter --layout NCHW informs OVMS that the model uses the NCHW layout – both that the model uses NCHW and that it accepts NCHW input.
  • Custom node code must include an implementation of the new API methods. It may be a dummy implementation if not needed. Additionally, all previous API functions must include an additional void* parameter.
  • In the DAG pipelines configuration, demultiplexing with a dynamic number of parallel operations is configurable with the parameter "dynamic_count" set to -1, beside the 0 used so far. This is more consistent with the common conventions used e.g. in model input shapes. Using 0 is now deprecated and support for it will be removed in following releases.

Other changes:

  • Updated demo with question answering use case – BERT model demo with dynamic shape and variable length of the request content
  • Rearranged structure of the demos and client code examples.
  • Python client code examples both with the tensorflow-serving-api and the ovmsclient library.
  • Demos updated to use models with preserved layout and color format
  • Custom nodes updated to use the new API. The initialization step in the model zoo custom node uses memory buffer initialization to speed up execution.

Bug Fixes:

  • Fixed an issue with loading cloud-stored models. Occasionally, a downloaded model would not load properly.
  • Fixes in documentation
  • Security fixes

You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2022.1 or
docker pull openvino/model_server:2022.1-gpu

OpenVINO™ Model Server 2021.4.2

18 Nov 12:21
0ef76fc

The 2021.4.2 version is a hotfix release for the OpenVINO Model Server. It includes a few bug fixes and enhancements in the exemplary clients.

Bug fixes:

  • Fixed an issue with inference execution on the NCS stick which allows loading multiple models at the same time. Now, with the config mode, multiple models can be passed to the NCS device via the parameter --target_device MYRIAD.
  • Documented docker container deployment with NCS stick without the privileged mode.
  • Fixed handling of parameters including nested double quote (") characters in the startup options in the docker container with an nginx proxy. It was impacting parameters like --plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}'
  • Improved handling of OpenVINO plugin config parameters. Previously, a wrong type for a plugin parameter value did not return an error, so it was easy to miss that the parameter was ignored. Now the device plugin configuration will accept numerical values both with and without quotes: --plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}' and --plugin_config '{"CPU_THROUGHPUT_STREAMS":1}' are both valid now. An invalid value format will raise an error.
  • The parameters for changing the layout and shape with multiple inputs/outputs will use the updated model tensor name as defined in the mapping_config.json file. It refers to a format like {"input1":"NHWC","input2":"NHWC"}
  • External contribution to the custom node model_zoo_intel_object_detection - added a labels output in the Directed Acyclic Graph Scheduler custom node. The output now also includes the labels from an object detection model.
  • Security related updates

Exemplary clients' improvements:

You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2021.4.2 or
docker pull openvino/model_server:2021.4.2-gpu

OpenVINO™ Model Server 2021.4.1

13 Sep 11:26
f977dbf

The 2021.4.1 version of Model Server is primarily a hotfix release. It also includes a preview version of the simplified Python client library and a sample client written in C++. We also added a version of the model server docker image based on Ubuntu 20.04. The public docker image on DockerHub now uses the Ubuntu 20.04 base OS. The model server image based on CentOS 7 will be discontinued starting from the next release.

Bug Fixes:

  • Removed a limitation in the DAG configuration which required the pipeline input to be connected to at least one neural network model when using the binary input format. Now the input can also be connected exclusively to a custom node. An example of such a use case is documented in ovms_onnx_example.
  • Removed an invalid error message in the server logs while loading models from Google Cloud Storage.
  • Fixed a very rare race condition preventing detection of updates in the configuration file.
  • Improvements in the error messages reporting an invalid DAG pipeline configuration with unmatched data shape between nodes.
  • Corrected the model state reported in model state queries under the loading error condition. When a model cannot be loaded, it will now report status Loading>Error instead of End>OK.
  • The model server was ignoring incorrect parameters in the configuration file, typically when there was a spelling mistake in a valid parameter. Now an error is raised when an invalid parameter is defined.
  • Corrected issue related to a scenario with demultiplexed output connected both to a custom node and a neural network model (DL node).

Python client library - the lightweight client library provides a simplified mechanism to communicate with OVMS and TensorFlow Serving. Contrary to tensorflow-serving-api, it does not include TensorFlow as a dependency, which reduces its size dramatically. It also has a simplified API which allows sending prediction requests with just a few commands. Currently the gRPC protocol is included; the REST API is to be added. Learn more in the client lib documentation.
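
A minimal sketch of the library usage, assuming a model named resnet served on port 9000 with an input named 0 (all names are illustrative):

import numpy as np
from ovmsclient import make_grpc_client

client = make_grpc_client("localhost:9000")
print(client.get_model_metadata(model_name="resnet"))

# Placeholder input; a real application would pass preprocessed image data
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
output = client.predict(inputs={"0": data}, model_name="resnet")
print(type(output))  # a numpy array for a single output, a dict of arrays otherwise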

C++ client example - a client code example compatible with OVMS and TensorFlow Serving. It can run predict requests both with JPEG/PNG images and with arrays converted to the tensor_content format. It includes a recipe for building it using bazel and a dockerfile. Learn more in the example documentation.

You can use an OpenVINO Model Server public Docker image based on Ubuntu via the following command:
docker pull openvino/model_server:2021.4.1 or
docker pull openvino/model_server:2021.4.1-gpu

OpenVINO™ Model Server 2021.4

30 Jun 14:36
10f4610

The 2021.4 release of OpenVINO™ Model Server includes the following new features and bug fixes:

New Features:

  • Binary input data - ability to send inference requests using data in a compressed format like JPEG or PNG – significantly reducing communication bandwidth. There is a noticeable performance improvement, especially with REST API prediction calls and image data. For more details, see the documentation and the sketch after this list.

  • Dynamic batch size without model reloading – it is now possible to run inference with arbitrary batch sizes using input demultiplexing and splitting execution into parallel streams. This feature enables inference execution with OpenVINO Inference Engine without the side effect of changing the batch size for sequential requests and reloading models at runtime. For more details, see the documentation.

  • Practical examples of custom nodes – new or updated custom nodes: model zoo object detection, Optical Character Recognition and image transformation. These custom nodes can be used in a range of applications like vehicle object detection combined with recognition or OCR pipelines. Learn more about DAG Scheduler and custom nodes in the documentation.

  • Change model input and output layouts at runtime – it is now possible to change the model layout at runtime to NHWC. Source images are typically in HWC layout and such layout is used by image transformation libraries. Using the same layout in the model simplifies linking custom nodes with image transformations and avoids data transposing. It also reduces the load on clients and the overall latency for inference requests. Learn more
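
A minimal sketch of sending a JPEG as binary input over the TensorFlow Serving compatible REST API; the model name resnet and the b64-wrapped instance format follow the TFS REST convention and are assumptions here:

import base64
import requests

with open("image.jpeg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

# TFS-style REST predict call with a base64-encoded binary input
payload = {"instances": [{"b64": encoded}]}
resp = requests.post("http://localhost:8000/v1/models/resnet:predict", json=payload)
print(resp.json())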

Bug Fixes:

  • Access to public S3 buckets without authentication was not functional. Now models in public S3 buckets can be loaded without credentials.

  • Configuration Reload API calls did not update the models when the Model Server was started with a missing model repository.

  • Configuration file validation accepted illegal shape configurations; this is now corrected and a proper error is logged.

  • ONNX models with dynamic shapes could not be loaded even after defining the shape in the configuration file.

  • DAG Scheduler pipelines could not be created with connections between nodes where one had a dynamic and the other a static shape.

  • Custom loader did not detect and apply configuration changes correctly at runtime.

  • Unhandled exception while loading unsupported models on HDDL devices.

OpenVINO™ Toolkit Operator for OpenShift

The OpenVINO™ Toolkit Operator for OpenShift 0.2.0 is included in the 2021.4 release. It has been renamed and has the following enhancements compared to previous OpenVINO™ Model Server Operator 0.1.0 released with 2021.3:

  • The Custom Resource for managing the instances of OpenVINO™ Model Server is renamed from Ovms to ModelServer.

  • ModelServer resources can now manage additional parameters: annotations, batch_size, shape, model_version_policy, file_system_poll_wait_seconds, stateful, node_selector, and layout. For a list of all parameters, see the documentation.

  • The new Operator integrates OpenVINO™ Toolkit with OpenShift Data Science —a managed service for data scientists and AI developers offered by Red Hat. The Operator automatically builds a Notebook image in OpenShift which integrates OpenVINO™ Toolkit's developer tools and tutorials with the JupyterHub spawner.

  • Operator 0.2.0 is currently available for OpenShift only. Updates to the Kubernetes Operator will be included in a future release.

You can use an OpenVINO™ Model Server public Docker image based on CentOS via the following command:
docker pull openvino/model_server:2021.4 or
docker pull openvino/model_server:2021.4-gpu

Deprecation notice
Starting with the 2022.1 release, OpenVINO™ Model Server docker images will be based on Ubuntu instead of CentOS.

OpenVINO Model Server 2021.3

24 Mar 13:06
a054e25

OpenVINO™ Model Server

This is the third release of OVMS in the C++ implementation. It uses as its backend the OpenVINO Inference Engine in the same version - 2021.3.

New capabilities and enhancements

  • Custom Node support for the Directed Acyclic Graph Scheduler. Custom nodes in OpenVINO Model Server simplify linking deep learning models into a complete pipeline even when the inputs and outputs of sequential models do not fit. In many cases, the output of one model cannot be directly passed to another one. The data might need to be analyzed, filtered or converted to a different format. Those operations cannot be easily implemented in AI frameworks or are simply not supported. Custom nodes address this challenge. They allow employing a dynamic library developed in C++ or C to perform arbitrary data transformations.
  • DAG demultiplexing - the Directed Acyclic Graph Scheduler allows creating pipelines with node output demultiplexing into separate sub-outputs and branched pipeline execution. It can improve execution performance and address scenarios where any number of intermediate batches produced by custom nodes can be processed separately and collected at any graph stage.
  • Exemplary custom node for an OCR pipeline - a use case scenario for custom nodes and execution demultiplexing has been demonstrated in an OCR pipeline. It combines the east-resnet50 model with the CRNN model for complete text detection and text recognition. This custom node analyzes the response of the east-resnet50 model. Based on the inference results and the original image, it generates a list of detected boxes for text recognition. Each image in the output is resized to the predefined target size to fit the next inference model in the DAG pipeline (CRNN).
  • Support for stateful models - a stateful model recognizes dependencies between consecutive inference requests. It maintains state between inference requests so that the next inference depends on the results of previous ones. OVMS now allows submitting inference requests in the context of a specific sequence. OVMS stores the model state and returns prediction results based on the history of requests from the client.
  • Control API - extended the REST API to provide functionality for triggering OVMS configuration updates. The config/reload endpoint initiates applying configuration changes and model reloading. It ensures changes in configuration are deployed at a specific time and also gives confirmation about the reload operation status. The /config endpoint reports all served models and their versions. It simplifies usage from the client side and connection troubleshooting. See the sketch after this list.
  • Helm chart enhancements - added multiple configuration options for deployment in new scenarios: new model storage classes, kubernetes resource restrictions, security context. Fixed defects with large scale deployments.
  • Kubernetes Operator - enabled OVMS deployments using the Kubernetes Operator for OVMS. This offering can be used to simplify management of OVMS services at scale in OpenShift and in open source Kubernetes. It is published on OperatorHub.
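
A minimal sketch of the Control API calls on the REST port (the address is an assumption):

import requests

base = "http://localhost:8000"
# Trigger applying configuration changes and model reloading
reload_resp = requests.post(f"{base}/v1/config/reload")
print(reload_resp.status_code, reload_resp.text)

# Report all served models and their versions
print(requests.get(f"{base}/v1/config").json())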

You can use an OpenVINO Model Server public Docker image based on CentOS* via the following command:
docker pull openvino/model_server:2021.3 or
docker pull openvino/model_server:2021.3-gpu

OpenVINO Model Server 2021.2.1

21 Jan 14:30
e098dae

OpenVINO Model Server 2021.2.1 is a hotfix release without any new features or functionality changes. It contains the OpenVINO Inference Engine in version 2021.2.

It addresses the following bugs:

  • Incorrect version management for corrupted or invalid models – when the model files were invalid or incomplete, OVMS could serve an incorrect version or stop serving all model versions. Now, versions with invalid model files will be ignored. The model version policy will apply only to valid models.
  • Sporadic OVMS crash after online update of the configuration file under very heavy load from DAG prediction calls.
  • Incorrect response from GetModelMetadata after an online model configuration change
  • Incorrect parsing of OVMS parameters in quotes in the docker image with nginx reverse proxy for clients' mTLS authorization
  • Allowed configuration of multiple pipelines with identical names – this is now prevented during configuration validation
  • Minor issues in documentation

You can use an OpenVINO Model Server public Docker image based on CentOS* via the following command:
docker pull openvino/model_server:2021.2.1 or
docker pull openvino/model_server:2021.2.1-gpu