
XGBoost onnx backend inference not working with a single example #676

Closed
jfrery opened this issue Jan 17, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@jfrery
Contributor

jfrery commented Jan 17, 2023

Here is the code to reproduce:

import numpy
from xgboost.sklearn import XGBClassifier
from hummingbird.ml import convert

x, y = numpy.random.randn(1000, 10), numpy.random.randint(0, 2, 1000)
clf = XGBClassifier(n_estimators=20)
clf.fit(x, y)
extra_config = {
    "tree_implementation": "gemm",
    "onnx_target_opset": 14,
}
extra_config["n_features"] = x.shape[1]
onnx_model = convert(
    clf,
    backend="onnx",
    test_input=x[:10],
    extra_config=extra_config,
)
onnx_model.predict(x[:2])  # Works as expected
onnx_model.predict(x[:1])  # Does not work

And here is the error thrown by onnxruntime.

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Transpose node. Name:'/_operators.0/Transpose' Status Message: perm: [ 1 0 ] does not align with rank of input data: 1

Looking at the onnx graph, it seems that there is a squeeze operation with axis=None. With a single example, the input data shape would be (n_trees, 1, n_examples). Without the axis being specified, for a single example, the last 2 dimensions are removed which then doesn't match the transpose operation.
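The squeeze behavior can be illustrated in isolation with numpy's np.squeeze, which mirrors an ONNX Squeeze that has no axes attribute (the shapes below are assumptions matching the (n_trees, 1, n_examples) layout described above):

```python
import numpy as np

n_trees = 20

# Batch of 2 examples: only the middle size-1 dimension is removed.
batch = np.random.randn(n_trees, 1, 2)
print(np.squeeze(batch).shape)   # (20, 2) -- rank 2, a perm of [1, 0] is valid

# Single example: BOTH trailing size-1 dimensions are removed.
single = np.random.randn(n_trees, 1, 1)
print(np.squeeze(single).shape)  # (20,) -- rank 1, a perm of [1, 0] no longer applies

# Pinning the axis keeps the example dimension intact.
print(np.squeeze(single, axis=1).shape)  # (20, 1)
```

So an unqualified squeeze silently changes rank exactly when the batch size is 1, which is consistent with the Transpose error above.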

@stillmatic
Contributor

I can reproduce this bug and am running into it myself as well. I ran the following code (changing the test_input length from 10 to 20 for clarity, so it's not the same as the number of features).

import numpy
from xgboost.sklearn import XGBClassifier
from hummingbird.ml import convert

x, y = numpy.random.randn(1000, 10), numpy.random.randint(0, 2, 1000)
clf = XGBClassifier(n_estimators=20)
clf.fit(x, y)
extra_config = {
    "tree_implementation": "gemm",
    "onnx_target_opset": 14,
}
extra_config["n_features"] = x.shape[1]
onnx_model = convert(
    clf,
    backend="onnx",
    test_input=x[:20],
    extra_config=extra_config,
)

Notably, looking at onnx_model.model.graph.output, we see that while the first output variable (the labels) correctly has a symbolic dimension, the second output variable (the predictions) does not have a symbolic dimension.

[name: "variable"
type {
  tensor_type {
    elem_type: 7
    shape {
      dim {
        dim_param: "sym"
      }
    }
  }
}
, name: "onnx::ArgMax_31"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 20
      }
      dim {
        dim_value: 2
      }
    }
  }
}
]

When running predict, you should also see a warning:

2023-01-17 15:17:37.789612378 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {20} does not match actual shape of {2} for output variable

I believe that this issue is related to #656 -- the output of the model is assumed to be a single output, instead of labels and predictions. Adding some debug logic around

dynamic_axes_cfg = {

to print the input and output names yields:

input: ['input_0']
output: ['variable']

So we see that variable -- the only listed output -- gets correctly modified to have the symbolic dimension, but the second output doesn't. End users could probably fix this in post (by manually editing the exported ONNX graph), but I think it's worth fixing in the core library.

@ksaur ksaur added the bug Something isn't working label Jan 17, 2023
@stillmatic
Contributor

stillmatic commented Jan 17, 2023

hm, hacking a fix into the outputs setup, such that

onnx_model.model.graph.output
Out[10]:
[name: "labels"
type {
  tensor_type {
    elem_type: 7
    shape {
      dim {
        dim_param: "sym"
      }
    }
  }
}
, name: "predictions"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "sym"
      }
      dim {
        dim_value: 2
      }
    }
  }
}
]

still yields the same error as in the OP. This error is also present in v0.4.6 of Hummingbird, so it predates the dynamic_axes change in conversion.

For reference, the mentioned bad Transpose node in the graph

node {
  input: "/_operators.0/Squeeze_output_0"
  output: "/_operators.0/Transpose_output_0"
  name: "/_operators.0/Transpose"
  op_type: "Transpose"
  attribute {
    name: "perm"
    ints: 1
    ints: 0
    type: INTS
  }
}
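The runtime failure can also be reproduced standalone with numpy's transpose, which enforces the same rank check as the ONNX Transpose perm attribute:

```python
import numpy as np

# Rank-1 data, as produced by the unqualified Squeeze on a single example.
squeezed = np.zeros(20)

try:
    np.transpose(squeezed, axes=(1, 0))  # perm [1, 0] against rank-1 data
except ValueError as err:
    print("transpose failed:", err)

# With the example dimension preserved, the same perm works fine.
print(np.transpose(np.zeros((20, 1)), axes=(1, 0)).shape)  # (1, 20)
```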

backend torch works correctly

torch_model = convert(
    clf,
    backend="torch",
    test_input=x[:20],
    extra_config=extra_config,
)
torch_model.predict_proba(x[:2])
torch_model.predict_proba(x[:1])

so my sense is that this is a bug in the torch -> onnx export path, since torch is the intermediate representation for the ONNX conversion anyway.

Narrowing further: if you use tree_trav instead of gemm, the code works, so this is likely a problem in the GEMM -> ONNX conversion.

@interesaaat
Collaborator

This is not the first time we have had trouble with GEMM and ONNX. @jfrery, can you please change the tree implementation from gemm to tree_trav or perf_tree_trav? For example, you can pass extra_config={"tree_implementation": "tree_trav"} at conversion time to switch to tree_trav. Closing this.
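The suggested workaround, sketched against the reproduction script from the OP (this is an untested config fragment; it assumes clf, x, and convert are already defined as above):

```python
# Same conversion call as the repro, with the tree implementation swapped out.
extra_config = {
    "tree_implementation": "tree_trav",  # or "perf_tree_trav"
    "onnx_target_opset": 14,
}
onnx_model = convert(
    clf,
    backend="onnx",
    test_input=x[:20],
    extra_config=extra_config,
)
onnx_model.predict(x[:1])  # reported to work with tree_trav
```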

@jfrery
Contributor Author

jfrery commented Jan 17, 2023

Actually, we need the GEMM implementation. Maybe we can reopen this, and we will try to fix it?

@interesaaat
Collaborator

interesaaat commented Jan 17, 2023

It looks like this is more of a problem with the ONNX export in PyTorch, since the torch converter works. Can you open an issue with them showing that torch works while the ONNX export fails? We can then use this issue to track the downstream one.
