ONNX uses more memory than PyTorch for some models #16264
Labels
ep:CUDA
issues related to the CUDA execution provider
model:transformer
issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.
Describe the issue
cuda 10.2
onnx=1.8
onnxruntime-gpu=1.6
For a sequence labeling task (input: token ids; output: start_pos, end_pos), PyTorch uses 1.8 GB but ONNX uses 1.9 GB (although ONNX inference is faster). --- torch 1.10, BERT-base fine-tuning
For a text classification task, PyTorch uses 2.2 GB while ONNX uses just 0.8 GB. -- torch 1.9.0, roberta_base fine-tuning
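The issue does not say how the memory figures above were obtained. A minimal sketch of measuring the PyTorch side (the helper name `peak_gpu_mem_mb` is hypothetical); note that PyTorch's caching allocator cannot see ONNX Runtime's CUDA arena, so the ORT numbers are better compared via `nvidia-smi`:

```python
import torch

def peak_gpu_mem_mb(fn):
    """Run fn() and return PyTorch's peak allocated GPU memory in MB.

    Only memory allocated through PyTorch's caching allocator is tracked;
    ONNX Runtime's CUDA arena is invisible here, so compare the ORT side
    with nvidia-smi instead.
    """
    if not torch.cuda.is_available():
        fn()
        return None  # no GPU available: nothing to measure
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

# e.g. peak_gpu_mem_mb(lambda: model(input_ids)) after warm-up
```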
To reproduce
I used this script and the sequence labeling dataset, training for just five epochs.
Then I converted the torch model to an ONNX model.
Urgency
No response
Platform
Linux
OS Version
ubuntu 18
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.6
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response