
Weighting isn't applied when using custom/sklearn eval_metric callback via sklearn API #10040

Closed
jth5472 opened this issue Feb 11, 2024 · 1 comment · Fixed by #10050

jth5472 commented Feb 11, 2024

Weights aren't propagated when specifying custom eval metrics via the sklearn API. Example below:

import pandas as pd
from sklearn.metrics import log_loss
from xgboost import XGBClassifier

def get_data():
    X = pd.DataFrame.from_dict({
        "A": [1, 2, 3] * 100,
        "B": [4, 5, 6] * 100,
    })
    y = pd.Series([1, 0, 1] * 100)
    w = pd.Series([1, 2, 3] * 100)
    return X, y, w

def log_loss_wrap(*args, sample_weight = None, **kwargs):
    print(f"Weights empty: {sample_weight is None}") # True
    return log_loss(*args, sample_weight = sample_weight, **kwargs)

def train():

    X, y, w = get_data()
    xgb = XGBClassifier(
        eval_metric = log_loss_wrap,
        n_estimators = 1
    )

    xgb.fit(
        X,
        y,
        eval_set = [(X, y)],
        sample_weight_eval_set = [w]
    )
if __name__ == "__main__":
    train()

I believe this is also true when providing a custom objective function, although I'm not sure whether the gradient and hessian are weighted internally instead.

I think a simple fix could be as follows (here):

def _metric_decorator(func: Callable) -> Metric:
    """Decorate a metric function from sklearn.

    Converts a metric function that uses the typical sklearn metric signature so that
    it is compatible with :py:func:`train`.

    """

    def inner(y_score: np.ndarray, dmatrix: DMatrix) -> Tuple[str, float]:
        y_true = dmatrix.get_label()
        weights = dmatrix.get_weight()
        return func.__name__, func(
            y_true, y_score, sample_weight=weights if len(weights) else None
        )

    return inner
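
For illustration, here is a minimal sketch of how the decorated metric would receive the evaluation weights when exercised through the low-level xgboost.train API; the toy data mirror get_data() above, and custom_metric is the keyword available in xgboost >= 1.6 (the sketch assumes the _metric_decorator defined just above is in scope):

import numpy as np
import xgboost as xgb
from sklearn.metrics import log_loss

# Toy data mirroring get_data() above.
X = np.tile([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]], (100, 1))
y = np.tile([1, 0, 1], 100)
w = np.tile([1.0, 2.0, 3.0], 100)

deval = xgb.DMatrix(X, label=y, weight=w)
xgb.train(
    {"objective": "binary:logistic"},
    deval,
    num_boost_round=1,
    evals=[(deval, "eval")],
    custom_metric=_metric_decorator(log_loss),  # log_loss now receives sample_weight=w
)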
jth5472 (Author) commented Feb 11, 2024

Yikes, after rechecking the code I linked I realized this was already fixed (#8706). I was running xgboost==1.7.5 in my local environment, which is why I was having issues. Sorry 😬

I guess the question still stands about propagating weights to custom objective functions as well, if that isn't handled internally.
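
As a hedged sketch of one possible workaround (not a description of XGBoost internals): with the low-level API the objective callback receives the DMatrix directly, so the weights can be read from it and applied to the gradient and hessian by hand. The helper name and the manual sigmoid below are illustrative only:

import numpy as np
import xgboost as xgb

def weighted_logistic_obj(predt: np.ndarray, dtrain: xgb.DMatrix):
    """Binary logistic objective that applies sample weights to grad/hess by hand."""
    y = dtrain.get_label()
    w = dtrain.get_weight()
    if len(w) == 0:  # get_weight() returns an empty array when no weights were set
        w = np.ones_like(y)
    p = 1.0 / (1.0 + np.exp(-predt))  # sigmoid of the raw margin
    grad = (p - y) * w
    hess = np.maximum(p * (1.0 - p) * w, 1e-16)
    return grad, hess

# Usage with the toy data from the reproduction above:
# dtrain = xgb.DMatrix(X, label=y, weight=w)
# xgb.train({"base_score": 0.5}, dtrain, num_boost_round=1, obj=weighted_logistic_obj)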
