XGBoost model in Flow doesn't seem to converge #12166

Closed
exalate-issue-sync bot opened this issue May 13, 2023 · 5 comments

Comments

@exalate-issue-sync

Unable to build a working XGBoost classification model in Flow on the Airlines dataset, although the same was possible with GBM.
Steps:

  1. Import AirlinesTest.csv and AirlinesTrain.csv, parse them, and create a GBM model as follows:

     buildModel 'gbm', {"model_id":"gbm-6ac7e1fc-3d36-40c7-ac7c-73c7c54fda4b","training_frame":"AirlinesTrain.hex","validation_frame":"AirlinesTest.hex","nfolds":0,"response_column":"IsDepDelayed","ignored_columns":["IsDepDelayed_REC"],"ignore_const_cols":true,"ntrees":50,"max_depth":5,"min_rows":10,"nbins":20,"seed":-1,"learn_rate":0.1,"sample_rate":1,"col_sample_rate":1,"score_each_iteration":false,"score_tree_interval":0,"balance_classes":false,"nbins_top_level":1024,"nbins_cats":1024,"r2_stopping":1.7976931348623157e+308,"stopping_rounds":0,"stopping_metric":"AUTO","stopping_tolerance":0.001,"max_runtime_secs":0,"learn_rate_annealing":1,"distribution":"AUTO","huber_alpha":0.9,"checkpoint":"","col_sample_rate_per_tree":1,"min_split_improvement":0.00001,"histogram_type":"AUTO","categorical_encoding":"AUTO","custom_metric_func":"","build_tree_one_node":false,"sample_rate_per_class":[],"col_sample_rate_change_per_level":1,"max_abs_leafnode_pred":1.7976931348623157e+308,"pred_noise_bandwidth":0,"calibrate_model":false}

All looks good:
[Screenshot: GBM model results in Flow (image-2018-02-06-14-01-16-745.png)]

  2. Repeat the same steps with an XGBoost model:

     buildModel 'xgboost', {"model_id":"xgboost-502c6c48-a3d5-4f13-82e7-12ff527d1973","training_frame":"AirlinesTrain.hex","validation_frame":"AirlinesTest.hex","nfolds":0,"response_column":"IsDepDelayed","ignored_columns":["IsDepDelayed_REC"],"ignore_const_cols":true,"seed":-1,"ntrees":50,"max_depth":6,"min_rows":1,"min_child_weight":1,"learn_rate":0.3,"eta":0.3,"sample_rate":1,"subsample":1,"col_sample_rate":1,"colsample_bylevel":1,"score_each_iteration":false,"stopping_rounds":0,"stopping_metric":"AUTO","stopping_tolerance":0.001,"max_runtime_secs":0,"distribution":"AUTO","categorical_encoding":"LabelEncoder","col_sample_rate_per_tree":1,"colsample_bytree":1,"score_tree_interval":0,"min_split_improvement":0,"gamma":0,"max_leaves":0,"tree_method":"auto","grow_policy":"depthwise","dmatrix_type":"auto","quiet_mode":true,"max_abs_leafnode_pred":0,"max_delta_step":0,"max_bins":256,"min_sum_hessian_in_leaf":100,"min_data_in_leaf":0,"sample_type":"uniform","normalize_type":"tree","rate_drop":0,"one_drop":false,"skip_drop":0,"booster":"gbtree","reg_lambda":0,"reg_alpha":0,"backend":"auto","gpu_id":0}

Unfortunately, the model appears either not to be trained properly or not to be presented correctly in Flow:
[Screenshot: XGBoost model results in Flow (image-2018-02-06-14-05-15-736.png)]

@exalate-issue-sync

Michal Kurka commented: [~accountid:557058:b36244f2-45c9-4479-9677-d1ccf6f8a61d] please take a look, but don't feel obliged to fix it on your branch!

@exalate-issue-sync

Pavel Pscheidl commented: With the learning rate set to exactly 1, and not lower, the model converges with Stefan's setup.

@exalate-issue-sync

Pavel Pscheidl commented: The booster itself returns a constant prediction of 0.5 across all iterations. This behavior can be reproduced with many datasets, including "prostate". The parameters passed to the booster are as follows:

[Screenshot: booster parameters (Snímek obrazovky pořízený 2018-02-13 22-24-32.png, i.e. "Screenshot taken 2018-02-13 22-24-32")]
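A constant 0.5 is what you would expect if the learning rate reached the booster as 0. Under a binary logistic objective with XGBoost's default `base_score` of 0.5, the starting margin is logit(0.5) = 0. Each boosting round adds `eta` times the new tree's leaf value, so with `eta = 0` the margin never moves and the predicted probability stays at 0.5. The toy sketch below illustrates this under stated assumptions. It is plain Python, not XGBoost. `boost_logistic` and its single median-split stump are hypothetical simplifications. Leaf values use the Newton step -G/H, which is the XGBoost leaf weight -G/(H + lambda) with `reg_lambda` = 0, as in the parameters above.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def boost_logistic(xs, ys, eta, n_rounds=50, base_score=0.5):
    """Toy gradient boosting for binary log-loss (hypothetical, not XGBoost).

    Each round fits one stump that splits x at the median and gives each
    leaf the Newton step -G/H; that step is shrunk by the learning rate eta.
    Returns the predicted probabilities for xs.
    """
    base_margin = math.log(base_score / (1.0 - base_score))  # logit(0.5) == 0
    margins = [base_margin] * len(xs)
    threshold = sorted(xs)[len(xs) // 2]
    for _ in range(n_rounds):
        probs = [sigmoid(m) for m in margins]
        grads = [p - y for p, y in zip(probs, ys)]  # d(logloss)/d(margin)
        hess = [p * (1.0 - p) for p in probs]        # second derivative
        for goes_left in (True, False):
            leaf = [i for i, x in enumerate(xs) if (x < threshold) == goes_left]
            g = sum(grads[i] for i in leaf)
            h = sum(hess[i] for i in leaf)
            weight = -g / h if h > 0 else 0.0
            for i in leaf:
                margins[i] += eta * weight  # eta == 0 makes this a no-op
    return [sigmoid(m) for m in margins]


xs, ys = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
print(boost_logistic(xs, ys, eta=0.0))  # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(boost_logistic(xs, ys, eta=0.3))  # moves towards the labels
```

With `eta=0.3` or `eta=1.0` the same data separates cleanly. Only a zero learning rate reproduces the flat 0.5 output.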

@exalate-issue-sync

Pavel Pscheidl commented: This is a known issue with XGBoost: the library takes the system locale and uses it to determine the decimal separator.

Many countries use a decimal separator other than the US/UK decimal point, most commonly a decimal comma. Japan, for example, shares the US/UK convention (one more reason this works for the US team and for Mateusz). https://en.wikipedia.org/wiki/Decimal_separator
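A small emulation can illustrate the failure mode. The sketch below is hypothetical: `parse_leading_float` is not part of XGBoost or H2O. It only mimics a C `strtod`-style parser running under a locale whose decimal separator is a comma. Such a parser reads the longest valid numeric prefix and silently ignores the rest. Under that assumption, the learning rate of `0.3` from the Flow setup is read as `0`, while `1` passes through unchanged. That would be consistent with the earlier observations that only a learning rate of exactly 1 converges and that predictions otherwise stay at 0.5.

```python
def parse_leading_float(text: str, decimal_sep: str = ".") -> float:
    """Hypothetical emulation of a strtod-like, locale-dependent parser.

    Consumes an optional sign, integer digits and, only if the next character
    is `decimal_sep`, a fractional part. Anything after that prefix is
    ignored, and a string with no usable prefix yields 0.0 (as strtod does).
    Exponent notation is deliberately not handled in this sketch.
    """
    digits = "0123456789"
    i = 0
    if i < len(text) and text[i] in "+-":
        i += 1
    while i < len(text) and text[i] in digits:
        i += 1
    if i < len(text) and text[i] == decimal_sep:
        i += 1
        while i < len(text) and text[i] in digits:
            i += 1
    prefix = text[:i].replace(decimal_sep, ".")
    try:
        return float(prefix)
    except ValueError:  # e.g. "", "+", "-" or a lone separator
        return 0.0


# Parameter values written with a decimal point, read under two locales.
for value in ("0.3", "1"):
    print(value, "->", parse_leading_float(value, "."), "(decimal point)",
          parse_leading_float(value, ","), "(decimal comma)")
# 0.3 -> 0.3 (decimal point) 0.0 (decimal comma)
# 1 -> 1.0 (decimal point) 1.0 (decimal comma)
```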

This issue has been reported many times (e.g. dmlc/xgboost#2512); however, for some use cases it is considered a feature, so it is unlikely to be fixed soon.

Therefore, fixing this issue on our side by passing localized parameters is the preferred option.
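One reading of "passing localized parameters" is to write every floating-point parameter with the decimal separator of the locale the native library will parse it under. The helper below is a hypothetical sketch of that idea, not H2O's actual fix. `localize_params` and its `decimal_sep` argument are illustrative names. By default, the helper asks Python's `locale.localeconv()` for the current process's separator.

```python
import locale
from typing import Dict, Optional


def localize_params(params: Dict[str, object],
                    decimal_sep: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: stringify booster parameters, writing floats
    with the decimal separator the native parser is expected to use."""
    if decimal_sep is None:
        # Separator of the current LC_NUMERIC locale ("." under the C locale).
        decimal_sep = locale.localeconv()["decimal_point"]
    out = {}
    for name, value in params.items():
        if isinstance(value, float):
            # repr() gives the shortest round-tripping form, e.g. "0.3".
            out[name] = repr(value).replace(".", decimal_sep)
        else:
            out[name] = str(value)
    return out


params = {"eta": 0.3, "max_depth": 6, "subsample": 1.0, "reg_lambda": 0.0}
print(localize_params(params, ","))
# {'eta': '0,3', 'max_depth': '6', 'subsample': '1,0', 'reg_lambda': '0,0'}
```

This only helps if the native parser really uses the same locale as the caller. The alternative is to force the native side into the "C" locale, which keeps a single fixed format.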

@hasithjp

JIRA Issue Migration Info

Jira Issue: PUBDEV-5294
Assignee: Pavel Pscheidl
Reporter: Stefan Pacinda
State: Resolved
Fix Version: 3.18.0.2
Attachments: Available (Count: 5)
Development PRs: Available

Linked PRs from JIRA

#2055

Attachments From Jira

Attachment Name: AirlinesTest.csv
Attached By: Stefan Pacinda
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/AirlinesTest.csv

Attachment Name: AirlinesTrain.csv
Attached By: Stefan Pacinda
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/AirlinesTrain.csv

Attachment Name: image-2018-02-06-14-01-16-745.png
Attached By: Stefan Pacinda
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/image-2018-02-06-14-01-16-745.png

Attachment Name: image-2018-02-06-14-05-15-736.png
Attached By: Hasith Perera
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/image-2018-02-06-14-05-15-736.png

Attachment Name: Snímek obrazovky pořízený 2018-02-13 22-24-32.png
Attached By: Pavel Pscheidl
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/Snímek obrazovky pořízený 2018-02-13 22-24-32.png
