Releases: openvinotoolkit/model_server

OpenVINO Model Server 2021.2

15 Dec 16:08
5ab937c

This is the second release of OVMS in the C++ implementation. It includes several new features, enhancements and bug fixes, and it uses the OpenVINO Inference Engine in the matching version, 2021.2, as its backend.

New capabilities and enhancements

  • Directed Acyclic Graph (DAG) scheduler (formerly models ensemble) – this feature was first available as a preview in 2021.1 and is now officially supported. It makes it possible to define inference pipelines composed of multiple interconnected models that respond to a single prediction request. This release adds support for the API calls that the preview did not cover for DAGs, specifically GetModelStatus and GetModelMetadata: GetModelStatus returns the status of the complete pipeline, while GetModelMetadata returns the pipeline input and output parameters. The 2021.2 release also improves DAG config validation; see the sketch after this list.
  • Direct import of ONNX models – it is now possible to serve ONNX models without converting to Intermediate Representation (IR) format. This helps simplify deployments using ONNX models and the PyTorch training framework.
  • Custom loaders and integration with OpenVINO™ Security Add-on – it is now possible to define a custom library to handle model loading operations – including additional steps related to model decryption and license verification. Review the documentation of the Security Add-on component to learn about controlled access to the models.
  • Traffic Encryption – new deployment recipe for client authorization via mTLS certificates and traffic encryption by integrating with NGINX reverse proxy in a Docker container.
  • Remote model caching from cloud storage – models stored in Google Cloud Storage (GCS), Amazon S3 and Azure Blob Storage are no longer downloaded multiple times after configuration changes that require model reloading. Cached models are used during the model reload operation. When a served model changes, only the corresponding new version folder is added to the model storage.
  • Updated versions of several third-party dependencies
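
To illustrate the newly supported pipeline calls, a DAG defined in the server configuration can be queried like a single model over the REST API. A minimal sketch, assuming a pipeline named my_pipeline is defined in config.json (the pipeline name and ports are illustrative):
docker run -d -v $(pwd)/models:/models -v $(pwd)/config.json:/config.json -p 9000:9000 -p 8080:8080 openvino/model_server:2021.2 --config_path /config.json --port 9000 --rest_port 8080
curl http://localhost:8080/v1/models/my_pipeline            # pipeline status (GetModelStatus)
curl http://localhost:8080/v1/models/my_pipeline/metadata   # pipeline inputs and outputs (GetModelMetadata)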

Fixed bugs

  • Sporadic short unavailability of the default version while a model is switching to a newer version
  • REST API not working with rest_workers=1 – a clear error message is now returned for invalid values. By default, the number of REST worker threads is adjusted automatically based on the number of CPUs
  • Prevented a service crash when the shape parameter is out of integer range

Known issues

  • A version upgrade might fail when the new model files are corrupted, while older versions might already be unloaded according to the model version policy
  • OVMS might sporadically fail under very heavy load during DAG execution while the pipeline models configuration is updated online. Predictions for individual models are not impacted.

You can use an OpenVINO Model Server public Docker image based on CentOS* via one of the following commands:
docker pull openvino/model_server:2021.2
docker pull openvino/model_server:2021.2-gpu

OpenVINO Model Server 2021.1

06 Oct 12:42

This is a major release of OpenVINO Model Server: a completely rewritten implementation of the serving component. The upgrade from the Python-based version (2020.4) to the C++ implementation (2021.1) should be mostly transparent, with no changes required on the client side. The exposed API is unchanged, but some configuration settings and deployment methods might need slight adjustments.

Key New Features and Enhancements

  • Much higher scalability in a single service instance. You can now utilize the full capacity of the available hardware and expect near-linear scalability when adding resources, without a bottleneck on the frontend.
  • Lower latency between the client and the server. This is especially noticeable with high-performance accelerators or CPUs.
  • Reduced footprint. By switching to C++ and reducing dependencies, the Docker image shrinks to ~400MB (for CPU, NCS and HDDL support) and ~800MB (for the image that also includes iGPU support).
  • Reduced RAM usage. Thanks to the reduced number of external software dependencies, OpenVINO Model Server allocates less memory on startup.
  • Easier deployment on bare metal or inside a Docker container.
  • Support for online model updates. The server monitors configuration file changes and reloads models as needed without restarting the service.
  • Model ensemble (preview). Connect multiple models to deploy complex processing solutions and reduce the overhead of sending data back and forth.
  • Azure Blob Storage support. You can now host your models in Azure Blob Storage containers; see the example after this list.
  • Updated Helm chart for easy deployment in Kubernetes
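
For example, serving a model directly from Azure Blob Storage could look like the sketch below (the container and model names are illustrative; credentials are passed through the AZURE_STORAGE_CONNECTION_STRING environment variable):
docker run -d -p 9000:9000 -e AZURE_STORAGE_CONNECTION_STRING="${AZURE_STORAGE_CONNECTION_STRING}" openvino/model_server:2021.1 --model_path az://container/my_model --model_name my_model --port 9000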

Changes in version 2021.1

Moving from 2020.4 to 2021.1 introduces a few changes and optimizations which primarily impact the server deployment and configuration process. These changes are documented below.

  • Docker Container Entrypoint
    To simplify deployment with containers, a Docker image entrypoint was added. Container startup now requires only the parameters specific to the Model Server executable:
    Old command:
    docker run -d -v $(pwd)/model:/models/my_model/ -e LOG_LEVEL=DEBUG -p 9000:9000 openvino/model_server /ie-serving-py/start_server.sh ie_serving model --model_path /models/face-detection --model_name my_model --port 9000 --shape auto
    New command:
    docker run -d -v $(pwd)/model:/models/my_model/ -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000 --shape auto --log_level DEBUG
  • Simplified Command Line Parameters
    The model and config subcommands are no longer used. Single-model or multi-model serving mode is determined by whether --model_name or --config_path is defined; the two parameters are mutually exclusive.
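    For example, the same image can start in either mode (paths and names are illustrative):
    docker run -d -v $(pwd)/model:/models/my_model -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000
    docker run -d -v $(pwd)/config.json:/config.json -p 9000:9000 openvino/model_server --config_path /config.json --port 9000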
  • Changed default THROUGHPUT_STREAMS settings for the CPU and GPU device plugin
    In the Python implementation, the default configuration was optimized for minimal latency with a single stream of inference requests. In version 2021.1, the default server concurrency settings CPU_THROUGHPUT_STREAMS and GPU_THROUGHPUT_STREAMS are calculated automatically based on the available resources, which ensures both low latency and efficient parallel processing. If you need to serve a model for only a single client on a high-performance system, set a parameter like below:
    --plugin_config '{"CPU_THROUGHPUT_STREAMS":"1"}'
  • Log Level and Log File Path
    Instead of the environment variables LOG_LEVEL and LOG_PATH, the log level and the log file path are now defined with command line parameters to simplify configuration.
    --log_level DEBUG/INFO(default)/ERROR
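    The log file location can be set in the same way (the path below is illustrative):
    --log_path /var/log/ovms/server.log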
  • grpc_workers Parameter Meaning
    In the Python implementation (2020.4 and below) this parameter defined the number of frontend threads. In the C++ implementation (2021.1 and above) this defines the number of internal gRPC server objects to increase the maximum bandwidth capacity. The default value of 1 should be sufficient for most scenarios. Consider tuning it if you expect very high load from multiple parallel clients.
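    For example, to raise the bandwidth ceiling for many parallel clients (the value is illustrative):
    --grpc_workers 4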
  • Model Data Type Conversion
    In the Python implementation (2020.4 and below), input tensors with a data type different from the one expected by the model were automatically converted to the required data type. In some cases, such conversion impacted the overall performance of the inference request. In version 2021.1, the user input data type must match the model input data type. The client receives an error indicating incorrect input data precision, which gives immediate feedback to correct the format.
  • Proxy Settings
    The no_proxy environment variable is not used with cloud storage for models. The http_proxy and https_proxy settings are common for all remote models deployed in OpenVINO Model Server. If you need to deploy both models stored behind a proxy and models accessed directly, run two instances of the model server.
    Refer to the troubleshooting guide to learn about known issues and workarounds.
  • Default Docker security context
    By default, the OpenVINO Model Server process starts inside the Docker container in the context of the ovms account with uid 5000, instead of the root context used in previous versions. This change enforces the best practice of minimal required permissions. If you need to change the security context, use the --user flag in the docker run command, as shown below.
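    For example, to run the process as root again (not recommended unless strictly required):
    docker run -d --user root -v $(pwd)/model:/models/my_model -p 9000:9000 openvino/model_server --model_path /models/my_model --model_name my_model --port 9000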

Note: The Git history of the C++ development is stored on the main branch (the new default). The Python implementation history is preserved on the master branch.

You can use an OpenVINO Model Server public Docker image based on CentOS* via one of the following commands:
docker pull openvino/model_server:2021.1
docker pull openvino/model_server:2021.1-gpu

OpenVINO Model Server 2020.4

07 Aug 12:56
6fcff34

OpenVINO™ Model Server 2020.4 introduces support for Inference Engine in version 2020.4.
Read the OpenVINO Release Notes to learn more about the changes.

You can use an OVMS public Docker image based on clearlinux via the following command:
docker pull intelaipg/openvino-model-server:2020.4

OpenVINO Model Server 2020.3

06 Aug 22:06
92abde9

OpenVINO™ Model Server 2020.3 introduces support for Inference Engine in version 2020.3.
Refer to the OpenVINO Release Notes to learn more about the enhancements. The most important for the model server scenarios are:

  • Introducing Long-Term Support (LTS), a new release type that provides longer-term maintenance and support with a focus on stability and compatibility
  • Added support for new FP32 and INT8 models to enable more vision and text use cases: 3D U-Net, MobileFace, EAST, OpenPose, RetinaNet, and FaceNet
  • Improved the support of AVX2 and AVX512 instruction sets in the CPU preprocessing module
  • Added support for new model operations
  • Introduced support for bfloat16 (BF16) data type for inferencing
  • Included security, functionality bug fixes, and minor capability changes

OpenVINO Model Server 2020.3 release has the following changes and enhancements:

Bug fixes:

  • Fixed an unnecessary model reload that occurred with multiple versions of the same model
  • Fixed a race condition on simultaneous loading and unloading of the same model version
  • Fixed a bug in the face detection example

You can use an OVMS public Docker image based on clearlinux via the following command:
docker pull intelaipg/openvino-model-server:2020.3

OpenVINO Model Server 2020.1

23 Mar 20:55
1ae76c6

OpenVINO™ Model Server 2020.1 introduces support for Inference Engine in version 2020.1.
Refer to the OpenVINO Release Notes to learn more about the enhancements. The most relevant for the model server use case are:

  • Inference Engine integrated with nGraph
  • Low-precision runtime for INT8
  • Added support for multiple new layers and operations
  • Numerous improvements in the plugin implementation

OpenVINO Model Server 2020.1 release has the following new features and changes:

  • Sped up inference output serialization – up to 40x faster – models with big outputs will have noticeably shorter latency
  • Added an exemplary client sending inference requests from multiple cameras in parallel
  • Added support for TensorFlow 2.0 and Python 3.8 with backward compatibility
  • Updated functional tests to use IR models from the OpenVINO Model Zoo
  • Updated functional tests to use MinIO for S3-compatible model storage

Bug fixes:

  • Fixed model files detection and import for certain name patterns
  • Corrected the Kubernetes demo in GCP

Note: In version 2020.1 the CPU extensions library was removed; the extensions are now included in the CPU plugin.
An extension library is optional and needed only for custom layers.

You can use an OVMS public Docker image based on the OpenVINO runtime image via the following command:

docker pull intelaipg/openvino-model-server:2020.1

OpenVINO Model Server 2019 R3

31 Oct 14:23
0a9b15e

OpenVINO™ Model Server 2019 R3 introduces support for Inference Engine in version 2019 R3.
Refer to the OpenVINO Release Notes to learn more about the enhancements. The most relevant for the model server use case are:

  • Improved performance through network loading optimizations and faster inference thanks to reduced model loading time. This is useful when the shape size changes between inferences.
  • Added support for Ubuntu* 18.04
  • Added support for multiple new layers and operations
  • Numerous improvements in the plugin implementation for all supported devices

OpenVINO Model Server 2019 R3 release has the following new features and changes:

  • Ability to start the server with a multi-worker configuration and parallel inference execution. A new set of parameters is introduced for controlling the number of server threads and parallel inference executions:
    --grpc_workers
    --rest_workers
    --nireq
    Read more about this in the performance tuning guide, and see the example after this list.
    This new feature improves throughput when employing hardware accelerators like Intel® Movidius™ VPU HDDL.
  • The target device for running inference operations is now configurable on the model level by adding the target_device parameter in the command line and in the service configuration file. The DEVICE environment variable is no longer used.
  • Added the option to pass additional configuration to the employed plugins with the plugin_config parameter
  • Included a recommendation to use CPU affinity with multiple replicas in Kubernetes via a CPU manager and a static assignment policy.
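
An illustrative sketch combining the new parameters (all values are examples only, using the start command form of the Python implementation):
docker run -d -v $(pwd)/model:/models/my_model -p 9000:9000 intelaipg/openvino-model-server:2019.3 /ie-serving-py/start_server.sh ie_serving model --model_path /models/my_model --model_name my_model --port 9000 --grpc_workers 4 --rest_workers 2 --nireq 4 --target_device HDDL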

You can use a public Docker image based on clearlinux base image via the following command:

docker pull intelaipg/openvino-model-server:2019.3

OpenVINO Model Server 2019 R2

26 Sep 14:04
d2bab2a

OpenVINO™ Model Server 2019 R2 introduces support for Inference Engine in 2019 R2 version of the Intel® Distribution of OpenVINO toolkit. Refer to the Release Notes to learn more about enhancements. The most relevant enhancements for model server use cases:

  • Added new non-vision topologies: GNMT, BERT, TDNN (NNet3), ESPNet, etc. to enable machine translation, natural language processing and speech use cases
  • Added support for models in FP16 precision in the CPU plugin
  • Performance improvements with CPU execution
  • Added support for multiple new layers and operations

OpenVINO Model Server 2019 R2 release brings additional new capabilities:

  • Added the option to change the model shape at runtime – it is now possible to change the model input data shapes without recreating the model. The model server can also adjust the served model parameters to fit the input data in the received request; see the example after this list
  • The public Docker image is now based on clearlinux, which is expected to improve execution performance
  • Added support for Intel® Movidius™ Myriad™ X VPU HDDL accelerators
  • Added an exemplary face detection Python application to demonstrate automatic model shape reconfiguration
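
For example, automatic shape adjustment can be enabled at startup with the shape parameter (paths and names are illustrative, using the Python-era start command):
docker run -d -v $(pwd)/model:/models/face-detection -p 9000:9000 intelaipg/openvino-model-server:2019.2 /ie-serving-py/start_server.sh ie_serving model --model_path /models/face-detection --model_name face-detection --port 9000 --shape auto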

You can use a public Docker image based on the clearlinux base image via the following command:

docker pull intelaipg/openvino-model-server:2019.2

OpenVINO Model Server 2019 R1.1

17 Jul 20:16

OpenVINO Model Server 2019 R1.1 introduces support for Inference Engine version 2019 R1.1.
Refer to the OpenVINO Release Notes to learn more about the improvements. The most important enhancements are:

  • Alignment with the Intel® Movidius™ Myriad™ X Development Kit R7 release
  • Support for mPCIe and M.2 form factor versions of Intel® Vision Accelerator Design with Intel® Movidius™ VPUs
  • The Myriad plugin is now available in open source

The OpenVINO Model Server 2019 R1.1 release also brings the following new features and changes:

  • Added a RESTful API – all implemented functions can be accessed using gRPC and REST interfaces according to the TensorFlow Serving API. Check the client examples and the Jupyter notebook to learn how to use the new interface, and see the sketch after this list
  • Added exemplary Kubeflow pipelines which demonstrate OpenVINO Model Server deployment in Kubernetes and TensorFlow model optimization using the Model Optimizer from the OpenVINO Toolkit
  • Added an implementation of the GetModelStatus function – it reports the state of served models
  • Model version updates can be disabled by setting FILE_SYSTEM_POLL_WAIT_SECONDS to 0 or a negative value
  • Improved error handling for model loading issues like network problems or access permissions
  • Updated versions of Python dependencies
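
A minimal sketch of the new REST interface, which follows the TensorFlow Serving REST API (model name, port and payload are illustrative; the REST endpoint is enabled with the rest_port parameter):
curl http://localhost:8081/v1/models/my_model                # model status (GetModelStatus)
curl http://localhost:8081/v1/models/my_model/metadata       # model metadata
curl -X POST http://localhost:8081/v1/models/my_model:predict -d '{"instances": [[0.0, 1.0, 2.0]]}'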

You can use a public Docker image based on the Intel Python base image via the following command:

docker pull intelaipg/openvino-model-server:2019.1.1

OpenVINO Model Server 2019 R1

10 May 11:43
d74d495

The naming convention for OpenVINO Model Server versions has changed to be consistent with OpenVINO SDK release names. It is now easier to map which version of OVMS uses which Inference Engine backend.

OpenVINO Model Server 2019 R1 introduces support for Inference Engine version 2019 R1.
Refer to the OpenVINO Release Notes to learn more about the improvements. The most important enhancements are:

  • Added support for many new operations in the ONNX*, TensorFlow* and MXNet* frameworks, enabling topologies like Tiny YOLO v3, full DeepLab v3, bi-direction
  • More than 10 new pre-trained models added, including gaze estimation, action recognition encoder/decoder, text recognition and instance segmentation networks, to expand to newer use cases
  • Improved support for low-precision 8-bit integer inference
  • Upgraded the mkl-dnn version to v0.18
  • Added support for many new layers, activation types and operations

The exemplary gRPC client now has an option to transpose the input data in both directions: NCHW>NHWC and NHWC>NCHW.

Special kudos to @joakimr-axis for his contributions to Dockerfile cleanup and enhancements.

You can use a public Docker image based on the Intel Python base image via the following command:
docker pull intelaipg/openvino-model-server:2019.1

OpenVINO Model Server v0.5

27 Mar 15:56
0f9a054

Release 0.5 adds the following improvements:

  • Added a version policy which controls the filtering conditions for served model versions; see the sketch after this list
  • Automatic updates of served model versions based on file system changes
  • A demonstrative Jupyter notebook showing OVMS deployment and evaluation
  • Added custom MinIO configuration options which support S3-compliant storage providers – PR23
  • Added support for anonymous access to S3 and GCS cloud storage
  • Added support for the Movidius stick
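
A sketch of a version policy entry in the multi-model configuration file (names and paths are illustrative; check the documentation for the exact schema):
cat > config.json <<'EOF'
{
  "model_config_list": [
    {
      "config": {
        "name": "my_model",
        "base_path": "s3://bucket/my_model",
        "model_version_policy": {"latest": {"num_versions": 2}}
      }
    }
  ]
}
EOF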

You can use a public Docker image based on the Intel Python base image via the following command:
docker pull intelaipg/openvino-model-server:0.5