Accelerate ML inference with ONNX Runtime #27458
Comments
A new Issue was created by @hqucms Huilin Qu. @davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
The ability to use the same inference session objects from multiple threads is extremely useful for CMSSW.
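For illustration, a minimal sketch of the "one session, many threads" pattern using the onnxruntime Python API (CMSSW itself would go through the C++ API, but the thread-safety property is the same); the model path, input shape, and thread count below are placeholders:

```python
import numpy as np
import onnxruntime as ort
from concurrent.futures import ThreadPoolExecutor

# Create the session once; ONNX Runtime documents Run() as thread-safe,
# so the same session object can be shared by all worker threads.
session = ort.InferenceSession("model.onnx")  # placeholder model path
input_name = session.get_inputs()[0].name

def infer(batch):
    # Every worker calls run() on the shared session; no per-thread copy is needed.
    return session.run(None, {input_name: batch})[0]

# Placeholder inputs: eight batches of shape (1, 10).
batches = [np.random.rand(1, 10).astype(np.float32) for _ in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(infer, batches))
```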
assign reconstruction @hqucms
@slava77 Yes -- actually ONNX Runtime was first released on Nov 29, 2018, after the discussion, which I think happened around summer last year.
@slava77, the discussion at the time was about converting DeepAK8 to ONNX and having it read by TF, which turned out to be impossible at the time. @hqucms is ONNX finally also supported by TF officially? If so, that's great news.
FYI, @kandrosov
@hqucms
@slava77
A preliminary implementation was made in #28112.
I also tried this on DeepTau, but with the current configuration (enabling only SSE), ONNXRuntime is actually a bit slower than the current TF backend. However, if we enable AVX, then ONNXRuntime is ~1.5-2x faster (it seems faster than TF with AVX enabled), and ONNXRuntime can fall back to SSE at runtime on CPUs without AVX (TF built with AVX will fail in this case).
On 10/8/19 6:43 PM, Huilin Qu wrote:
> I also tried this on DeepTau, but with the current configuration (enabling only SSE), ONNXRuntime is actually a bit slower than the current TF backend. However, if we enable AVX, then ONNXRuntime is ~1.5-2x faster (it seems faster than TF with AVX enabled), and ONNXRuntime can fall back to SSE at runtime on CPUs without AVX (TF built with AVX will fail in this case).

We previously observed (or alleged) that the dynamic option leads to different results when executed on different architectures. Does this issue still persist?
@slava77
If I'm not mistaken, this issue is essentially addressed, @hqucms
No, I think nothing is missing. I will close the issue.
Currently the inference of DNN models (DeepJet, DeepAK8, DeepTauID, etc.) in CMSSW typically relies on the original training frameworks (TensorFlow or MXNet). However, these frameworks are generally optimized more for (GPU) training than for (CPU) inference, and therefore may not always provide the best inference performance.
ONNX Runtime is a performance-focused inference engine for Open Neural Network Exchange (ONNX) models. It might be interesting to exploit it for ML inference in CMSSW for a few reasons:
The ONNX format supports conversion from many of the mainstream frameworks (TF, Keras, PyTorch, MXNet, etc.) for most of the common/conventional DNN operators (Dense, Conv, RNN, etc.), so ONNX Runtime can cover models from more training frameworks (e.g., PyTorch) than the ones we currently support (TF and MXNet); a minimal conversion sketch is shown below.
ONNX Runtime is optimized for inference (including on CPUs). Some preliminary tests show that it can bring a ~1.5x speed-up compared to MXNet+OpenBLAS for DeepAK8. More interestingly, it seems to bring a ~3-5x speed-up for the AK4 DeepJet model compared to TensorFlow. It might be interesting to see what we get for DeepTauID.
ONNX Runtime is designed to be thread-safe (https://github.com/Microsoft/onnxruntime/blob/master/docs/HighLevelDesign.md#key-design-decisions).
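As an illustration of the conversion path mentioned in the first point above, a hypothetical PyTorch model can be exported to ONNX and then run with ONNX Runtime roughly as follows (the toy model, tensor names, and shapes are placeholders, not any of the actual taggers):

```python
import numpy as np
import onnxruntime as ort
import torch
import torch.nn as nn

# Toy stand-in for a real tagger model.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
model.eval()

# Export to ONNX with named inputs/outputs.
dummy = torch.randn(1, 10)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["features"], output_names=["scores"])

# Run the exported model with ONNX Runtime and cross-check against PyTorch.
sess = ort.InferenceSession("model.onnx")
ort_scores = sess.run(["scores"], {"features": dummy.numpy()})[0]
torch_scores = model(dummy).detach().numpy()
print("outputs agree:", np.allclose(ort_scores, torch_scores, atol=1e-5))
```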
Of course, one obvious drawback is that, as new DNN models/operators are constantly being proposed, the ONNX format will not be able to support all of them (or it will take time to do so). In that case the only choice is probably to use the original training framework for inference.
Related issues: #25230, #27119
@slava77 @perrotta @davidlange6 @mverzett @mbluj