
Accelerate ML inference with ONNX Runtime #27458

Closed · hqucms opened this issue Jul 8, 2019 · 16 comments

hqucms commented Jul 8, 2019

Currently the inference of DNN models (DeepJet, DeepAK8, DeepTauID etc.) in CMSSW typically relies on the original training frameworks (TensorFlow or MXNet). However, these frameworks are typically optimized more towards (GPU) training than (CPU) inference, and therefore may not always provide the best performance for inference.

ONNX Runtime is a performance-focused inference engine for Open Neural Network Exchange (ONNX) models. It might be interesting to exploit it for ML inference in CMSSW for a few reasons:

  • The ONNX format supports conversion from many of the mainstream frameworks (TF, Keras, PyTorch, MXNet, etc.) for most of the common/conventional DNN operators (Dense, Conv, RNN, etc.), so ONNX Runtime can cover models from more training frameworks (e.g., PyTorch) than the ones we currently have (TF and MXNet).

  • ONNX Runtime is optimized for inference (including on CPUs). Some preliminary tests show that it can bring ~1.5x speed-up compared to MXNet+OpenBLAS for DeepAK8. More interestingly, it seems to bring ~3-5x speed-up for the AK4 DeepJet model compared to TensorFlow. Might be interesting to see what we will get for DeepTauID.

  • ONNX Runtime is designed to be thread-safe (https://github.com/Microsoft/onnxruntime/blob/master/docs/HighLevelDesign.md#key-design-decisions); see the sketch after this list.
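
A minimal sketch of that shared-session pattern, assuming the current ONNX Runtime C++ API (the 0.5.0-era API discussed here differed in details); the model path, input/output names, and tensor shape are hypothetical placeholders:

```cpp
#include <onnxruntime_cxx_api.h>

#include <array>
#include <cstdint>
#include <thread>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "shared-session");
  Ort::SessionOptions opts;
  // One session, built once, shared by all threads ("model.onnx" is a placeholder).
  Ort::Session session(env, "model.onnx", opts);

  auto worker = [&session]() {
    // Dummy 1x4 float input; names and shape depend on the actual model.
    std::array<float, 4> input{0.f, 1.f, 2.f, 3.f};
    std::array<int64_t, 2> shape{1, 4};
    Ort::MemoryInfo mem =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value tensor = Ort::Value::CreateTensor<float>(
        mem, input.data(), input.size(), shape.data(), shape.size());
    const char* in_names[] = {"input"};
    const char* out_names[] = {"output"};
    // Per the design document, Run() on a shared session may be called concurrently.
    auto outputs = session.Run(Ort::RunOptions{nullptr}, in_names, &tensor, 1,
                               out_names, 1);
    (void)outputs;
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < 4; ++i) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
  return 0;
}
```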

Of course, one obvious drawback is that, as new DNN models/operators are constantly being proposed, the ONNX format will not be able to support all of them (or will take time to). In that case the only choice is probably to use the original training framework for inference.

Related issues: #25230, #27119

@slava77 @perrotta @davidlange6 @mverzett @mbluj

cmsbuild commented Jul 8, 2019

A new Issue was created by @hqucms Huilin Qu.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

Dr15Jones commented Jul 8, 2019

The ability to use the same inference session objects from multiple threads is extremely useful for CMSSW.

slava77 commented Jul 8, 2019

assign reconstruction

@hqucms
I recall that ONNX was part of the discussion about an alternative to MXNet last year, but was not selected due to insufficient support.
Has it evolved since then to provide the needed support?

cmsbuild commented Jul 8, 2019

New categories assigned: reconstruction

@slava77, @perrotta you have been requested to review this Pull request/Issue and eventually sign? Thanks

hqucms commented Jul 8, 2019

> I recall that ONNX was part of the discussion about an alternative to MXNet last year, but was not selected due to insufficient support.
> Has it evolved since then to provide the needed support?

@slava77 Yes -- actually ONNX Runtime was first released on Nov 29, 2018, after the discussion, which I think happened around summer last year.

mverzett commented Jul 8, 2019

> assign reconstruction
>
> @hqucms
> I recall that ONNX was part of the discussion about an alternative to MXNet last year, but was not selected due to insufficient support.
> Has it evolved since then to provide the needed support?

@slava77, the discussion at the time was about converting DeepAK8 to ONNX and having it read by TF, which at the time turned out to be impossible.

@hqucms is ONNX now also officially supported by TF? If so, that's great news.

hqucms commented Jul 8, 2019

> @hqucms is ONNX now also officially supported by TF? If so, that's great news.

@mverzett I am not sure about "officially", but the support indeed seems to have improved quite a bit since then.

mbluj commented Jul 8, 2019

FYI, @kandrosov

slava77 commented Sep 20, 2019

@hqucms
do we have the externals in place already to try using ONNX?
What is the plan for DeepAK8 (since you might know the most about it)?

hqucms commented Sep 20, 2019

@slava77
ONNXRuntime 0.5.0 has been integrated into the externals in cms-sw/cmsdist#5080. I did some tests and in general things look good, but I noticed that the threading model does not fit CMSSW very well: basically, each session manages its own thread pool. However, there have been some recent developments (microsoft/onnxruntime#1609, microsoft/onnxruntime#1647, microsoft/onnxruntime#1841) to refine the threading model in ONNXRuntime, and it looks like it is getting very close to also supporting a no-thread mode, which is probably what we want (a rough sketch of such a configuration follows below). I will give it a try once I find some time.
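
For illustration, a minimal sketch, assuming the present-day ONNX Runtime C++ API, of pinning a session's internal thread pools down to a single thread so the host framework (here, CMSSW) keeps control of threading; "model.onnx" is a placeholder path:

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "single-threaded");

  Ort::SessionOptions opts;
  // Keep ONNX Runtime from spawning its own worker threads:
  // one intra-op thread (no parallelism inside an operator) and
  // one inter-op thread (no parallel execution of independent graph nodes).
  opts.SetIntraOpNumThreads(1);
  opts.SetInterOpNumThreads(1);
  opts.SetExecutionMode(ORT_SEQUENTIAL);

  // "model.onnx" is a placeholder path.
  Ort::Session session(env, "model.onnx", opts);
  return 0;
}
```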

hqucms commented Oct 3, 2019

A preliminary implementation is available in #28112.

hqucms commented Oct 8, 2019

I also tried this on DeepTau. With the current configuration (enabling only SSE), ONNXRuntime is actually a bit slower than the current TF backend. However, if we enable AVX, then ONNXRuntime is ~1.5-2x faster (and seems faster than TF with AVX enabled). In addition, ONNXRuntime can fall back to SSE at runtime on CPUs without AVX, whereas a TF build with AVX will fail in that case (a sketch of the runtime-dispatch idea follows below).
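
For context, the fallback is possible because the library checks the CPU's capabilities at runtime before selecting a kernel, rather than compiling AVX instructions into every code path. A minimal sketch of this dispatch pattern (not ONNX Runtime's actual code; the kernel names are hypothetical):

```cpp
#include <cstdio>

// Hypothetical kernels standing in for AVX- and SSE-specific code paths.
static void gemm_avx() { std::puts("dispatching to the AVX kernel"); }
static void gemm_sse() { std::puts("falling back to the SSE kernel"); }

int main() {
  // GCC/Clang builtin: queries CPUID once at runtime, so the same binary
  // runs everywhere and only uses AVX where the CPU supports it. A binary
  // compiled with -mavx throughout has no such escape hatch and hits an
  // illegal instruction on a non-AVX CPU.
  if (__builtin_cpu_supports("avx"))
    gemm_avx();
  else
    gemm_sse();
  return 0;
}
```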

slava77 commented Oct 8, 2019 via email

hqucms commented Oct 8, 2019

@slava77
Yes, unfortunately that's still the case.

slava77 commented Sep 5, 2020

If I'm not mistaken, this issue is essentially addressed. What is still missing?

@hqucms
please summarize what's left to be done, if any.

hqucms commented Sep 6, 2020

@slava77

No, I think nothing is missing. I will close the issue.

@hqucms hqucms closed this as completed Sep 6, 2020