
XGBoost onnx backend inference not working with a single example #676

Closed
jfrery opened this issue Jan 17, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@jfrery
Contributor

jfrery commented Jan 17, 2023

Here is the code to reproduce:

import numpy
from xgboost.sklearn import XGBClassifier
from hummingbird.ml import convert

x, y = numpy.random.randn(1000, 10), numpy.random.randint(0, 2, 1000)
clf = XGBClassifier(n_estimators=20)
clf.fit(x, y)
extra_config = {
    "tree_implementation": "gemm",
    "onnx_target_opset": 14,
}
extra_config["n_features"] = x.shape[1]
onnx_model = convert(
    clf,
    backend="onnx",
    test_input=x[:10],
    extra_config=extra_config,
)
onnx_model.predict(x[:2])  # Works as expected
onnx_model.predict(x[:1])  # Does not work

And here is the error thrown by onnxruntime.

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Transpose node. Name:'/_operators.0/Transpose' Status Message: perm: [ 1 0 ] does not align with rank of input data: 1

Looking at the onnx graph, it seems that there is a squeeze operation with axis=None. With a single example, the input data shape would be (n_trees, 1, n_examples). Without the axis being specified, for a single example, the last 2 dimensions are removed which then doesn't match the transpose operation.
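The squeeze behavior can be illustrated in isolation with numpy's np.squeeze, which mirrors an ONNX Squeeze that has no axes attribute (the shapes below are assumptions matching the (n_trees, 1, n_examples) layout described above):

```python
import numpy as np

n_trees = 20

# Batch of 2 examples: only the middle size-1 dimension is removed.
batch = np.random.randn(n_trees, 1, 2)
print(np.squeeze(batch).shape)   # (20, 2) -- rank 2, a perm of [1, 0] is valid

# Single example: BOTH trailing size-1 dimensions are removed.
single = np.random.randn(n_trees, 1, 1)
print(np.squeeze(single).shape)  # (20,) -- rank 1, a perm of [1, 0] no longer applies

# Pinning the axis keeps the example dimension intact.
print(np.squeeze(single, axis=1).shape)  # (20, 1)
```

So an unqualified squeeze silently changes rank exactly when the batch size is 1, which is consistent with the Transpose error above.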

@stillmatic
Contributor

I can reproduce this bug and am running into it myself as well. I ran the following code (changing the test_input length from 10 to 20 for clarity, so it's not the same as the number of features).

import numpy
from xgboost.sklearn import XGBClassifier
from hummingbird.ml import convert

x, y = numpy.random.randn(1000, 10), numpy.random.randint(0, 2, 1000)
clf = XGBClassifier(n_estimators=20)
clf.fit(x, y)
extra_config = {
    "tree_implementation": "gemm",
    "onnx_target_opset": 14,
}
extra_config["n_features"] = x.shape[1]
onnx_model = convert(
    clf,
    backend="onnx",
    test_input=x[:20],
    extra_config=extra_config,
)

Notably, looking at onnx_model.model.graph.output, we see that while the first output variable (the labels) correctly has a symbolic dimension, the second output variable (the predictions) does not have a symbolic dimension.

[name: "variable"
type {
  tensor_type {
    elem_type: 7
    shape {
      dim {
        dim_param: "sym"
      }
    }
  }
}
, name: "onnx::ArgMax_31"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_value: 20
      }
      dim {
        dim_value: 2
      }
    }
  }
}
]

When running predict, you should also see a warning:

2023-01-17 15:17:37.789612378 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {20} does not match actual shape of {2} for output variable

I believe that this issue is related to #656 -- the output of the model is assumed to be a single output, instead of labels and predictions. Adding some debug logic around

dynamic_axes_cfg = {

to print the input and output names yields:

input: ['input_0']
output: ['variable']

So we see that variable -- the only listed output -- gets correctly modified to have the symbolic dimension, but the second output doesn't. End users could probably fix this in post (by manually editing the exported ONNX graph), but I think it's worth fixing in the core library.

@ksaur ksaur added the bug Something isn't working label Jan 17, 2023
@stillmatic
Contributor

stillmatic commented Jan 17, 2023

hm, hacking a fix into the outputs setup, such that

onnx_model.model.graph.output
Out[10]:
[name: "labels"
type {
  tensor_type {
    elem_type: 7
    shape {
      dim {
        dim_param: "sym"
      }
    }
  }
}
, name: "predictions"
type {
  tensor_type {
    elem_type: 1
    shape {
      dim {
        dim_param: "sym"
      }
      dim {
        dim_value: 2
      }
    }
  }
}
]

still yields the same error as in the OP. This error is also present in v0.4.6 of Hummingbird, so it predates the dynamic_axes change in conversion.

For reference, the mentioned bad Transpose node in the graph

node {
  input: "/_operators.0/Squeeze_output_0"
  output: "/_operators.0/Transpose_output_0"
  name: "/_operators.0/Transpose"
  op_type: "Transpose"
  attribute {
    name: "perm"
    ints: 1
    ints: 0
    type: INTS
  }
}
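The runtime failure can also be reproduced standalone with numpy's transpose, which enforces the same rank check as the ONNX Transpose perm attribute:

```python
import numpy as np

# Rank-1 data, as produced by the unqualified Squeeze on a single example.
squeezed = np.zeros(20)

try:
    np.transpose(squeezed, axes=(1, 0))  # perm [1, 0] against rank-1 data
except ValueError as err:
    print("transpose failed:", err)

# With the example dimension preserved, the same perm works fine.
print(np.transpose(np.zeros((20, 1)), axes=(1, 0)).shape)  # (1, 20)
```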

backend torch works correctly

torch_model = convert(
    clf,
    backend="torch",
    test_input=x[:20],
    extra_config=extra_config,
)
torch_model.predict_proba(x[:2])
torch_model.predict_proba(x[:1])

so my sense is that this is a bug in the torch -> onnx export path, since torch is the intermediate representation for the ONNX conversion anyway.

Narrowing further: if you use tree_trav instead of gemm, the code works, so this is likely a problem in the GEMM -> ONNX conversion.

@interesaaat
Collaborator

This is not the first time we have had trouble with GEMM and ONNX. @jfrery, can you please change the tree implementation from gemm to tree_trav or perf_tree_trav? For example, you can pass extra_config={"tree_implementation": "tree_trav"} at conversion time to switch to tree_trav. Closing this.
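The suggested workaround, sketched against the reproduction script from the OP (this is an untested config fragment; it assumes clf, x, and convert are already defined as above):

```python
# Same conversion call as the repro, with the tree implementation swapped out.
extra_config = {
    "tree_implementation": "tree_trav",  # or "perf_tree_trav"
    "onnx_target_opset": 14,
}
onnx_model = convert(
    clf,
    backend="onnx",
    test_input=x[:20],
    extra_config=extra_config,
)
onnx_model.predict(x[:1])  # reported to work with tree_trav
```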

@jfrery
Contributor Author

jfrery commented Jan 17, 2023

Actually, we need the GEMM implementation. Maybe we can reopen this, and we will try to fix it?

@interesaaat
Collaborator

interesaaat commented Jan 17, 2023

It looks like this is more of a problem with the ONNX export in PyTorch, since the torch converter works. Can you open an issue with them showing that torch works while the ONNX export fails? We can then use this issue to track the downstream one.
