While predicting the model doesn't check if data dtypes have changed #3626

sbushmanov · 2020-12-04T14:31:56Z

Summary

Suppose we train a model on a pandas DataFrame with some of the features defined as categorical. If we then feed it a numpy array at prediction time, the model silently accepts the array but produces wrong (?) results. It would be nice to have:

A check that the input dtypes are the same as at train time.
An error message if the input dtypes have changed.

Train demo:

from seaborn import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from lightgbm import LGBMClassifier, Dataset
from scipy.special import logit, expit, softmax
import shap

titanic = load_dataset("titanic")
X = titanic.drop(["survived", "alive", "adult_male", "who", "deck"], axis=1)
y = titanic["survived"]

features = X.columns
cat_features = []
for cat in X.select_dtypes(exclude="number"):
    cat_features.append(cat)
    X[cat] = X[cat].astype("category").cat.codes.astype("category")

X_train, X_val, y_train, y_val = train_test_split(X,y,train_size=.8, random_state=42)

clf = LGBMClassifier(max_depth=3, n_estimators=1000, objective="binary")
clf.fit(X_train,y_train, eval_set=(X_val,y_val), early_stopping_rounds=100, verbose=100, categorical_feature=cat_features)

Predict on df:

clf.predict_proba(X_train[:1])
# array([[0.81781113, 0.18218887]])

Predict on numpy array (result changes):

clf.predict_proba(X_train[:1].values)
# array([[0.83461009, 0.16538991]])
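
A small pandas-only sketch of what differs between the two inputs: for a categorical column, `.values` exposes the category labels, whereas the integer codes are positions in the sorted list of categories. The series below is a made-up stand-in for one of the demo's columns, not data from the issue, and nothing here calls LightGBM.

```python
import pandas as pd

# Hypothetical column: integer labels stored as a pandas categorical,
# similar in shape to the `.cat.codes.astype("category")` step in the demo.
s = pd.Series([-1, 0, 2, 0, 1]).astype("category")

categories = s.cat.categories.tolist()  # sorted unique labels
codes = s.cat.codes.tolist()            # position of each label in `categories`
raw = s.to_numpy().tolist()             # the labels themselves, as `.values` sees them

print("categories:", categories)
print("codes:     ", codes)
print("raw values:", raw)
```

Whenever the labels are not exactly `0..n-1`, the codes and the raw values disagree, so any consumer that expects one and receives the other will read different category identifiers.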


guolinke · 2020-12-04T14:58:50Z

For categorical features in a pandas DataFrame, a mapping (from categories to integers) is saved in the model. So if you convert the DataFrame to numpy without applying that mapping, it produces the wrong results.
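
A minimal sketch of applying such a mapping by hand before converting to numpy. It assumes the mapping is an ordered list of categories per column and that encoding is by position, as pandas' own `.cat.codes` does; this is not taken from LightGBM's source. `encode_with_mapping`, the `port` column, and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def encode_with_mapping(df, categories_by_column):
    """Hypothetical helper: replace each mapped column by the position of its
    value in the training-time category list, then return a float array."""
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        if col in categories_by_column:
            labels = df[col].to_numpy()
            codes = pd.Categorical(labels, categories=categories_by_column[col]).codes
            # pandas uses -1 for values outside the category list; mark them missing
            out[col] = np.where(codes == -1, np.nan, codes.astype(float))
        else:
            out[col] = df[col].astype(float)
    return out.to_numpy()


train = pd.DataFrame({
    "fare": [7.25, 71.28, 8.05],
    "port": pd.Series([-1, 0, 2]).astype("category"),
})
# Assumed shape of the saved mapping: one ordered category list per column.
mapping = {"port": train["port"].cat.categories.tolist()}

raw_port = train["port"].to_numpy().astype(float)      # labels: -1, 0, 2
encoded_port = encode_with_mapping(train, mapping)[:, 1]  # positions: 0, 1, 2
print(raw_port, encoded_port)
```

Values that were not seen at training time come out as NaN rather than as an arbitrary integer, which at least keeps them from being mistaken for a known category.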

sbushmanov · 2020-12-04T15:33:52Z

Thanks for answering. But this is exactly why I'm suggesting this as a feature, not as a bug: a numpy array is accepted, but it silently produces wrong results.

guolinke · 2020-12-05T00:44:06Z

@sbushmanov I think it is a trade-off. If we only accept the same data type at prediction time, the ways a trained model can be used will be limited.
However, because of the mapping for pandas categorical features, I think we should at least check for that case, to avoid the mapping being silently ignored.

sbushmanov · 2020-12-05T05:10:16Z

I think issuing at least a Warning is warranted. It took me half an hour to troubleshoot this one without a hint.
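
Until such a check exists in the library, a user-side guard along these lines could surface the problem. This is only a sketch: `DtypeGuard` and its `check` method are hypothetical names, not part of LightGBM, and the sample frame is made up.

```python
import warnings

import pandas as pd


class DtypeGuard:
    """Hypothetical helper: remembers training dtypes and checks prediction input."""

    def __init__(self, X_train: pd.DataFrame):
        # Store dtypes as strings, e.g. {"age": "float64", "sex": "category"}
        self.dtypes = X_train.dtypes.astype(str).to_dict()
        self.has_categorical = "category" in self.dtypes.values()

    def check(self, X) -> bool:
        """Return True if X looks consistent with the training data; warn otherwise."""
        if not isinstance(X, pd.DataFrame):
            if self.has_categorical:
                warnings.warn(
                    "Training data had pandas categorical columns, but the "
                    "prediction input is not a DataFrame; the category mapping "
                    "cannot be applied.",
                    UserWarning,
                )
                return False
            return True
        seen = X.dtypes.astype(str).to_dict()
        changed = {
            col: (expected, seen.get(col))
            for col, expected in self.dtypes.items()
            if seen.get(col) != expected
        }
        if changed:
            warnings.warn(f"Column dtypes differ from training: {changed}", UserWarning)
            return False
        return True


train = pd.DataFrame({"age": [22.0, 38.0], "sex": pd.Categorical(["m", "f"])})
guard = DtypeGuard(train)
```

Calling `guard.check(X)` before `predict_proba(X)` would pass silently for `train[:1]` and emit a `UserWarning` for `train[:1].values`. A warning rather than an exception keeps non-DataFrame input usable, in line with the trade-off discussed above.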

StrikerRUS · 2020-12-26T20:52:56Z

Adding this as a sub-issue of the "Check input for prediction" item in the Feature Requests Hub: #2302.

github-actions · 2023-08-23T18:54:03Z

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

sbushmanov changed the title ~~While predicting the model doesn't check if data structure has changed~~ While predicting the model doesn't check if data dtypes has changed Dec 4, 2020

sbushmanov changed the title ~~While predicting the model doesn't check if data dtypes has changed~~ While predicting the model doesn't check if data dtypes have changed Dec 4, 2020

StrikerRUS closed this as completed Dec 26, 2020

StrikerRUS mentioned this issue Dec 26, 2020

Feature Requests & Voting Hub #2302

Open

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
