Got segmentation fault error when using 'InferenceSession' API #11964

Open
baoachun opened this issue Jun 23, 2022 · 5 comments
Labels
core runtime (issues related to core runtime)

Comments

@baoachun

Describe the bug
I'm using the onnxruntime Python API for inference, but I get a segmentation fault when creating an 'InferenceSession'.
(screenshot of the crash omitted)

Urgency
emergency

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS
  • ONNX Runtime installed from (source or binary): pypi
  • ONNX Runtime version: 1.11.0
  • Python version: 3.8.6
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: N
  • GPU model and memory: N

To Reproduce

import onnx
import onnxruntime as ort
import torch
import torchvision

model = torchvision.models.alexnet()
model.eval()
input_names = ['input']
output_names = ['output']
x = torch.randn(1, 3, 224, 224, requires_grad=False)
torch.onnx.export(model, x, 'alexnet.onnx', input_names=input_names,
                  output_names=output_names, verbose=True, opset_version=12)

model_onnx = onnx.load('alexnet.onnx')
onnx.checker.check_model(model_onnx)
session = ort.InferenceSession('alexnet.onnx')  # segmentation fault occurs here
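
Since the crash happens inside native code, adding Python's standard-library faulthandler at the top of the script (a diagnostic sketch, not part of the original report) prints at least the Python-level traceback when the process receives SIGSEGV; running the same script under gdb (for example gdb --args python repro.py, assuming the script is saved as repro.py) gives a native backtrace like the one in the screenshot below:

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGSEGV before the process dies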

Expected behavior
The InferenceSession should be created successfully, without a segmentation fault.

Screenshots
gdb backtrace (screenshot not reproduced here)


@faxu
Contributor

faxu commented Jul 8, 2022

CC @pranavsharma

@TTrapper

TTrapper commented Jul 27, 2022

Any updates? I am experiencing the same problem on onnxruntime==1.12.0. When using onnxruntime==1.11.0 it just hangs as described here:

#10166

@pranavsharma
Contributor

I cannot repro the issue. I used the exact same python script you've pasted in this issue. I get no segfault.

(mypython3) [pranav@pranav-dev-centos79 ~]$ cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
(mypython3) [pranav@pranav-dev-centos79 ~]$ python -V
Python 3.8.13
(mypython3) [pranav@pranav-dev-centos79 ~]$ pip list | grep onnx
onnx 1.12.0
onnxruntime 1.12.0

@pranavsharma added the core runtime label Jul 27, 2022
@TTrapper

CentOS Linux release 7.6.1810 (Core)
Python 3.8.1
onnx 1.12.0
onnxruntime 1.12.0

Here is the full output I am getting from the above script. No segfault here, but it does crash:

Exported graph: graph(%input : Float(1, 3, 224, 224, strides=[150528, 50176, 224, 1], requires_grad=0, device=cpu),
      %features.0.weight : Float(64, 3, 11, 11, strides=[363, 121, 11, 1], requires_grad=1, device=cpu),
      %features.0.bias : Float(64, strides=[1], requires_grad=1, device=cpu),
      %features.3.weight : Float(192, 64, 5, 5, strides=[1600, 25, 5, 1], requires_grad=1, device=cpu),
      %features.3.bias : Float(192, strides=[1], requires_grad=1, device=cpu),
      %features.6.weight : Float(384, 192, 3, 3, strides=[1728, 9, 3, 1], requires_grad=1, device=cpu),
      %features.6.bias : Float(384, strides=[1], requires_grad=1, device=cpu),
      %features.8.weight : Float(256, 384, 3, 3, strides=[3456, 9, 3, 1], requires_grad=1, device=cpu),
      %features.8.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %features.10.weight : Float(256, 256, 3, 3, strides=[2304, 9, 3, 1], requires_grad=1, device=cpu),
      %features.10.bias : Float(256, strides=[1], requires_grad=1, device=cpu),
      %classifier.1.weight : Float(4096, 9216, strides=[9216, 1], requires_grad=1, device=cpu),
      %classifier.1.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.4.weight : Float(4096, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.4.bias : Float(4096, strides=[1], requires_grad=1, device=cpu),
      %classifier.6.weight : Float(1000, 4096, strides=[4096, 1], requires_grad=1, device=cpu),
      %classifier.6.bias : Float(1000, strides=[1], requires_grad=1, device=cpu)):
  %input.1 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4], onnx_name="Conv_0"](%input, %features.0.weight, %features.0.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_18 : Float(1, 64, 55, 55, strides=[193600, 3025, 55, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_1"](%input.1) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.4 : Float(1, 64, 27, 27, strides=[46656, 729, 27, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_2"](%onnx::MaxPool_18) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.8 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[5, 5], pads=[2, 2, 2, 2], strides=[1, 1], onnx_name="Conv_3"](%input.4, %features.3.weight, %features.3.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_21 : Float(1, 192, 27, 27, strides=[139968, 729, 27, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_4"](%input.8) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.12 : Float(1, 192, 13, 13, strides=[32448, 169, 13, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_5"](%onnx::MaxPool_21) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %input.16 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_6"](%input.12, %features.6.weight, %features.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_24 : Float(1, 384, 13, 13, strides=[64896, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_7"](%input.16) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.20 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_8"](%onnx::Conv_24, %features.8.weight, %features.8.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::Conv_26 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_9"](%input.20) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.24 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=0, device=cpu) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[3, 3], pads=[1, 1, 1, 1], strides=[1, 1], onnx_name="Conv_10"](%onnx::Conv_26, %features.10.weight, %features.10.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/conv.py:453:0
  %onnx::MaxPool_28 : Float(1, 256, 13, 13, strides=[43264, 169, 13, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_11"](%input.24) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.28 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::MaxPool[ceil_mode=0, kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2], onnx_name="MaxPool_12"](%onnx::MaxPool_28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:782:0
  %onnx::Flatten_30 : Float(1, 256, 6, 6, strides=[9216, 36, 6, 1], requires_grad=1, device=cpu) = onnx::AveragePool[kernel_shape=[1, 1], strides=[1, 1], onnx_name="AveragePool_13"](%input.28) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1214:0
  %input.32 : Float(1, 9216, strides=[9216, 1], requires_grad=1, device=cpu) = onnx::Flatten[axis=1, onnx_name="Flatten_14"](%onnx::Flatten_30) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torchvision/models/alexnet.py:50:0
  %input.36 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_15"](%input.32, %classifier.1.weight, %classifier.1.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_33 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_16"](%input.36) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %input.40 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_17"](%onnx::Gemm_33, %classifier.4.weight, %classifier.4.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  %onnx::Gemm_35 : Float(1, 4096, strides=[4096, 1], requires_grad=1, device=cpu) = onnx::Relu[onnx_name="Relu_18"](%input.40) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/functional.py:1455:0
  %output : Float(1, 1000, strides=[1000, 1], requires_grad=1, device=cpu) = onnx::Gemm[alpha=1., beta=1., transB=1, onnx_name="Gemm_19"](%onnx::Gemm_35, %classifier.6.weight, %classifier.6.bias) # /mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/torch/nn/modules/linear.py:114:0
  return (%output)

Traceback (most recent call last):
  File "example_github.py", line 15, in <module>
    session = ort.InferenceSession('alexnet.onnx')
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/nlu/users/michael_traynor/onnxbug/venv_torch_onnx/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 384, in _create_inference_sessio
n
    sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:183 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolI
nterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:
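
The RuntimeError above is raised while ORT sets up its intra-op thread pool and pthread_setaffinity_np fails, which can happen when the process runs with a restricted CPU affinity mask (for example in a constrained container). A minimal sketch of a commonly suggested mitigation, not verified against this exact setup, is to shrink the thread pools through SessionOptions so ORT does not have to pin extra worker threads:

import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 1  # run compute on the calling thread; no worker threads to pin
so.inter_op_num_threads = 1
session = ort.InferenceSession('alexnet.onnx', sess_options=so)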

@Alexander-Mark

Still encountering this issue and can reproduce it with different numpy versions (I pin onnxruntime = "==1.16.3" for CentOS 7 compatibility).

This produces seg fault:

numpy = "==2.0.0"
onnxruntime = "==1.16.3"

This does not:

numpy = "==1.26.4"
onnxruntime = "==1.16.3"

The relevant trace is:

Fatal Python error: Segmentation fault

Current thread 0x000078c3488bb000 (most recent call first):
File ".../python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220 in run

I trigger this error with the following:

import onnxruntime
from transformers import AutoTokenizer

session = onnxruntime.InferenceSession("models/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("models/")
# `texts` is the list of input strings (not shown in the original snippet)
inputs = tokenizer(
    texts,
    padding=True,
    truncation=True,
    return_attention_mask=True,
    return_token_type_ids=True,
    return_tensors="np",
)
preds = session.run(None, dict(inputs))[0]

Sorry I don't have time to dig into this issue further for you.
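
The numpy incompatibility is only inferred from this comment, but a small version guard can fail fast with a clear message instead of the native crash in session.run(); a minimal sketch assuming the pins reported above:

import numpy as np

# onnxruntime 1.16.3 predates numpy 2.0; the comment above reports a segfault with
# numpy 2.0.0 and no crash with numpy 1.26.4, so refuse to continue on numpy >= 2.
if int(np.__version__.split(".")[0]) >= 2:
    raise RuntimeError(
        f"numpy {np.__version__} detected; pin numpy==1.26.4 alongside onnxruntime==1.16.3 "
        "to avoid the segmentation fault reported in this thread"
    )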
