Got segmentation fault error when using 'InferenceSession' API #11964

Open
baoachun opened this issue Jun 23, 2022 · 5 comments
Labels
core runtime (issues related to core runtime)

Comments

@baoachun

Describe the bug
I'm using the onnxruntime Python API for inference, but I get a segmentation fault when creating an 'InferenceSession'.
(screenshot of the crash omitted)

Urgency
emergency

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS
  • ONNX Runtime installed from (source or binary): pypi
  • ONNX Runtime version: 1.11.0
  • Python version: 3.8.6
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: N
  • GPU model and memory: N

To Reproduce

import onnx
import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.alexnet()
model.eval()
input_names = ['input']
output_names = ['output']
x = torch.randn(1, 3, 224, 224, requires_grad=False)
torch.onnx.export(model, x, 'alexnet.onnx', input_names=input_names,
                  output_names=output_names, verbose=True, opset_version=12)

model_onnx = onnx.load('alexnet.onnx')
onnx.checker.check_model(model_onnx)
session = ort.InferenceSession('alexnet.onnx')  # segmentation fault occurs here
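
Since the crash happens inside native code, adding Python's standard-library faulthandler at the top of the script (a diagnostic sketch, not part of the original report) prints at least the Python-level traceback when the process receives SIGSEGV; running the same script under gdb (for example gdb --args python repro.py, assuming the script is saved as repro.py) gives a native backtrace like the one in the screenshot below:

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGSEGV before the process dies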

Expected behavior
The InferenceSession should be created successfully, without a segmentation fault.

Screenshots
gdb backtrace (screenshot not reproduced here)


@faxu
Contributor

faxu commented Jul 8, 2022

CC @pranavsharma

@TTrapper

TTrapper commented Jul 27, 2022

Any updates? I am experiencing the same problem on onnxruntime==1.12.0. When using onnxruntime==1.11.0 it just hangs as described here:

#10166

@pranavsharma
Contributor

I cannot repro the issue. I used the exact same python script you've pasted in this issue. I get no segfault.

(mypython3) [pranav@pranav-dev-centos79 ~]$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
(mypython3) [pranav@pranav-dev-centos79 ~]$ python -V
Python 3.8.13
(mypython3) [pranav@pranav-dev-centos79 ~]$ pip list | grep onnx
onnx 1.12.0
onnxruntime 1.12.0

@pranavsharma added the core runtime label Jul 27, 2022
@TTrapper

CentOS Linux release 7.6.1810 (Core)
Python 3.8.1
onnx 1.12.0
onnxruntime 1.12.0

Here is the full output I am getting from the above script. No segfault here, but it does crash:

Exported graph: graph(%input : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu),
      %features.0.weight : Float(64, 3, 11, 11, strides=[363, 121, 11, 1], requires_grad=1, device=cpu),
      %features.0.bias : Float(64, strides=[1], requires_grad=1, device=cpu),
      %features.3.weight : Float(192, 64, 5, 5, strides=[1600, 25, 5, 1], requires_grad=1, device=cpu),
      %features.3.bias : Float(192, strides=[1], requires_grad=1, device=cpu),
      %features.6.weight : Float(384, 192, 3, 3, strides=[1728, 9, 3, 1], requires_grad=1, device=cpu),
      %features.6.bias : Float(384, strides=[1], requires_grad=1, device=cpu),
      %features.8.weight : Float(256, 384, 3, 3, strides=[3456, 9, 3, 1], requires_grad=1, device=cpu),
      %features.8.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %features.10.weight : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=1, device=cpu),
      %features.10.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %classifier.1.weight : Float(4096, 9216, strides=[9216, 1], requires_grad=1, device=cpu),
      %classifier.1.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.4.weight : Float(4096, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.4.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.6.weight : Float(1000, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.6.bias : Float(1000, strides=[1], requires_grad=1, device=cpu)):
  %input.1 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4], onnx_name="Conv_0"](%input, %features.0.weight, %features.0.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_18 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_1"](%input.1) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.4 : Float(1, 64, 27, 27, strides=[46656, 729, 27, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_2"](%onnx::MaxPool_18) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.8 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[2, 2, 2, 2], strides=[1, 1], onnx_name="Conv_3"](%input.4, %features.3.weight, %features.3.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_21 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_4"](%input.8) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.12 : Float(1, 192, 13, 13, strides=[32448, 169, 13, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_5"](%onnx::MaxPool_21) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.16 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_6"](%input.12, %features.6.weight, %features.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_24 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_7"](%input.16) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.20 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_8"](%onnx::Conv_24, %features.8.weight, %features.8.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_26 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_9"](%input.20) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.24 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_10"](%onnx::Conv_26, %features.10.weight, %features.10.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_28 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_11"](%input.24) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.28 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_12"](%onnx::MaxPool_28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %onnx::Flatten_30 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::AveragePool[kernel_shape=[1, 1], strides=[1, 1], onnx_name="AveragePool_13"](%input.28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1214:0
  %input.32 : Float(1, 9216, strides=[9216, 1], requires_grad=1, device=cpu) = onnx::Flatten[axis=1, onnx_name="Flatten_14"](%onnx::Flatten_30) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torchvision/models/alexnet.py:50:0
  %input.36 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_15"](%input.32, %classifier.1.weight, %classifier.1.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_33 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_16"](%input.36) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.40 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_17"](%onnx::Gemm_33, %classifier.4.weight, %classifier.4.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_35 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_18"](%input.40) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %output : Float(1, 1000, strides=[1000, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_19"](%onnx::Gemm_35, %classifier.6.weight, %classifier.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  return (%output)

Traceback (most recent call last):
  File "example_github.py", line 15, in <module>
    session = ort.InferenceSession('alexnet.onnx')
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 384, in _create_inference_sessio
n
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolI
nterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:
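
The RuntimeError above is raised while ORT sets up its intra-op thread pool and pthread_setaffinity_np fails, which can happen when the process runs with a restricted CPU affinity mask (for example in a constrained container). A minimal sketch of a commonly suggested mitigation, not verified against this exact setup, is to shrink the thread pools through SessionOptions so ORT does not have to pin extra worker threads:

import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 1  # run compute on the calling thread; no worker threads to pin
so.inter_op_num_threads = 1
session = ort.InferenceSession('alexnet.onnx', sess_options=so)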

@Alexander-Mark

Still encountering this issue and can reproduce it with different numpy versions (I pin onnxruntime = "==1.16.3" for CentOS 7 compatibility).

This produces seg fault:

numpy = "==2.0.0"
onnxruntime = "==1.16.3"

This does not:

numpy = "==1.26.4"
onnxruntime = "==1.16.3"

The relevant trace is:

Fatal Python error: Segmentation fault

Current thread 0x000078c3488bb000 (most recent call first):
File ".../python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220 in run

I trigger this error with the following:

import onnxruntime
from transformers import AutoTokenizer

session = onnxruntime.InferenceSession("models/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("models/")
# `texts` is the list of input strings (not shown in the original snippet)
inputs = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_attention_mask=True,
    return_token_type_ids=True,
    return_tensors="np",
)
preds = session.run(None, dict(inputs))[0]

Sorry I don't have time to dig into this issue further for you.
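
The numpy incompatibility is only inferred from this comment, but a small version guard can fail fast with a clear message instead of the native crash in session.run(); a minimal sketch assuming the pins reported above:

import numpy as np

# onnxruntime 1.16.3 predates numpy 2.0; the comment above reports a segfault with
# numpy 2.0.0 and no crash with numpy 1.26.4, so refuse to continue on numpy >= 2.
if int(np.__version__.split(".")[0]) >= 2:
    raise RuntimeError(
        f"numpy {np.__version__} detected; pin numpy==1.26.4 alongside onnxruntime==1.16.3 "
        "to avoid the segmentation fault reported in this thread"
    )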
