Accelerate ML inference with ONNX Runtime #27458
Comments
A new Issue was created by @hqucms Huilin Qu. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
The ability to use the same inference session objects from multiple threads is extremely useful for CMSSW.
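For illustration, a minimal sketch of the "one session, many threads" pattern using the onnxruntime Python API (CMSSW itself would go through the C++ API, but the thread-safety property is the same); the model path, input shape, and thread count below are placeholders:

```python
import numpy as np
import onnxruntime as ort
from concurrent.futures import ThreadPoolExecutor

# Create the session once; ONNX Runtime documents Run() as thread-safe,
# so the same session object can be shared by all worker threads.
session = ort.InferenceSession("model.onnx")  # placeholder model path
input_name = session.get_inputs()[0].name

def infer(batch):
    # Every worker calls run() on the shared session; no per-thread copy is needed.
    return session.run(None, {input_name: batch})[0]

# Placeholder inputs: eight batches of shape (1, 10).
batches = [np.random.rand(1, 10).astype(np.float32) for _ in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(infer, batches))
```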
assign reconstruction @hqucms
@slava77 Yes -- actually ONNX Runtime was first released on Nov 29, 2018, after the discussion, which I think happened around summer last year.
@slava77, the discussion at the time was about converting DeepAK8 to ONNX and having it read by TF, which turned out to be impossible at the time. @hqucms is ONNX finally also supported by TF officially? If so, that's great news.
FYI, @kandrosov
@hqucms
@slava77
A preliminary implementation was made in #28112.
I also tried this on DeepTau, but with the current configuration (enabling only SSE), ONNXRuntime is actually a bit slower than the current TF backend. However, if we enable AVX, then ONNXRuntime is ~1.5-2x faster (it seems faster than TF with AVX enabled), and ONNXRuntime can fall back to SSE at runtime on CPUs without AVX (TF built with AVX will fail in this case).
On 10/8/19 6:43 PM, Huilin Qu wrote:
> I also tried this on DeepTau, but with the current configuration (enabling only SSE), ONNXRuntime is actually a bit slower than the current TF backend. However, if we enable AVX, then ONNXRuntime is ~1.5-2x faster (it seems faster than TF with AVX enabled), and ONNXRuntime can fall back to SSE at runtime on CPUs without AVX (TF built with AVX will fail in this case).

We previously observed (or alleged) that the dynamic option leads to different results when executed on different architectures. Does this issue still persist?
@slava77
If I'm not mistaken, this issue is essentially addressed, @hqucms
No, I think nothing is missing. I will close the issue.
Currently the inference of DNN models (DeepJet, DeepAK8, DeepTauID, etc.) in CMSSW typically relies on the original training frameworks (TensorFlow or MXNet). However, these frameworks are generally optimized more for (GPU) training than for (CPU) inference, and therefore may not always provide the best inference performance.
ONNX Runtime is a performance-focused inference engine for Open Neural Network Exchange (ONNX) models. It might be interesting to exploit it for ML inference in CMSSW for a few reasons:
The ONNX format supports conversion from many of the mainstream frameworks (TF, Keras, PyTorch, MXNet, etc.) for most of the common/conventional DNN operators (Dense, Conv, RNN, etc.), so ONNX Runtime can cover models from more training frameworks (e.g., PyTorch) than the ones we currently support (TF and MXNet); a minimal conversion sketch is shown below.
ONNX Runtime is optimized for inference (including on CPUs). Some preliminary tests show that it can bring a ~1.5x speed-up compared to MXNet+OpenBLAS for DeepAK8. More interestingly, it seems to bring a ~3-5x speed-up for the AK4 DeepJet model compared to TensorFlow. It might be interesting to see what we get for DeepTauID.
ONNX Runtime is designed to be thread-safe (https://github.com/Microsoft/onnxruntime/blob/master/docs/HighLevelDesign.md#key-design-decisions).
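As an illustration of the conversion path mentioned in the first point above, a hypothetical PyTorch model can be exported to ONNX and then run with ONNX Runtime roughly as follows (the toy model, tensor names, and shapes are placeholders, not any of the actual taggers):

```python
import numpy as np
import onnxruntime as ort
import torch
import torch.nn as nn

# Toy stand-in for a real tagger model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

# Export to ONNX with named inputs/outputs.
dummy = torch.randn(1, 10)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["features"], output_names=["scores"])

# Run the exported model with ONNX Runtime and cross-check against PyTorch.
sess = ort.InferenceSession("model.onnx")
ort_scores = sess.run(["scores"], {"features": dummy.numpy()})[0]
torch_scores = model(dummy).detach().numpy()
print("outputs agree:", np.allclose(ort_scores, torch_scores, atol=1e-5))
```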
Of course, one obvious drawback is that, as new DNN models/operators are constantly being proposed, the ONNX format will not be able to support all of them (or it will take time to do so). In that case the only choice is probably to use the original training framework for inference.
Related issues: #25230, #27119
@slava77 @perrotta @davidlange6 @mverzett @mbluj