XGBoost model in Flow doesn't seem to converge #12166

exalate-issue-sync · 2023-05-13T01:09:14Z

Unable to build a working classification XGBoost model in Flow on the Airlines data set, although the same setup works for GBM.
Steps:

Import [^AirlinesTest.csv] and [^AirlinesTrain.csv], parse them, and create a GBM model as follows:
{{
buildModel 'gbm', {"model_id":"gbm-6ac7e1fc-3d36-40c7-ac7c-73c7c54fda4b","training_frame":"AirlinesTrain.hex","validation_frame":"AirlinesTest.hex","nfolds":0,"response_column":"IsDepDelayed","ignored_columns":["IsDepDelayed_REC"],"ignore_const_cols":true,"ntrees":50,"max_depth":5,"min_rows":10,"nbins":20,"seed":-1,"learn_rate":0.1,"sample_rate":1,"col_sample_rate":1,"score_each_iteration":false,"score_tree_interval":0,"balance_classes":false,"nbins_top_level":1024,"nbins_cats":1024,"r2_stopping":1.7976931348623157e+308,"stopping_rounds":0,"stopping_metric":"AUTO","stopping_tolerance":0.001,"max_runtime_secs":0,"learn_rate_annealing":1,"distribution":"AUTO","huber_alpha":0.9,"checkpoint":"","col_sample_rate_per_tree":1,"min_split_improvement":0.00001,"histogram_type":"AUTO","categorical_encoding":"AUTO","custom_metric_func":"","build_tree_one_node":false,"sample_rate_per_class":[],"col_sample_rate_change_per_level":1,"max_abs_leafnode_pred":1.7976931348623157e+308,"pred_noise_bandwidth":0,"calibrate_model":false}}}

All looks good:
!image-2018-02-06-14-01-16-745.png|thumbnail!

Try to do the same using the XGBoost model:

{{buildModel 'xgboost', {"model_id":"xgboost-502c6c48-a3d5-4f13-82e7-12ff527d1973","training_frame":"AirlinesTrain.hex","validation_frame":"AirlinesTest.hex","nfolds":0,"response_column":"IsDepDelayed","ignored_columns":["IsDepDelayed_REC"],"ignore_const_cols":true,"seed":-1,"ntrees":50,"max_depth":6,"min_rows":1,"min_child_weight":1,"learn_rate":0.3,"eta":0.3,"sample_rate":1,"subsample":1,"col_sample_rate":1,"colsample_bylevel":1,"score_each_iteration":false,"stopping_rounds":0,"stopping_metric":"AUTO","stopping_tolerance":0.001,"max_runtime_secs":0,"distribution":"AUTO","categorical_encoding":"LabelEncoder","col_sample_rate_per_tree":1,"colsample_bytree":1,"score_tree_interval":0,"min_split_improvement":0,"gamma":0,"max_leaves":0,"tree_method":"auto","grow_policy":"depthwise","dmatrix_type":"auto","quiet_mode":true,"max_abs_leafnode_pred":0,"max_delta_step":0,"max_bins":256,"min_sum_hessian_in_leaf":100,"min_data_in_leaf":0,"sample_type":"uniform","normalize_type":"tree","rate_drop":0,"one_drop":false,"skip_drop":0,"booster":"gbtree","reg_lambda":0,"reg_alpha":0,"backend":"auto","gpu_id":0}}}

Sadly, the XGBoost model seems to be either poorly trained or incorrectly presented in Flow:
!image-2018-02-06-14-05-15-736.png|thumbnail!

exalate-issue-sync · 2023-05-13T01:09:16Z

Michal Kurka commented: [~accountid:557058:b36244f2-45c9-4479-9677-d1ccf6f8a61d] please take a look but don't feel obliged to fix it on your branch!

exalate-issue-sync · 2023-05-13T01:09:18Z

Pavel Pscheidl commented: With the learning rate set to exactly 1 (and not less), the model with Stefan's setup converges.

exalate-issue-sync · 2023-05-13T01:09:19Z

Pavel Pscheidl commented: The booster itself returns predictions of a constant value of 0.5 across all iterations. This behavior can be reproduced with many datasets, including "prostate". The parameters passed are as follows:

!Snímek obrazovky pořízený 2018-02-13 22-24-32.png|thumbnail!

exalate-issue-sync · 2023-05-13T01:09:21Z

Pavel Pscheidl commented: This is a known issue with XGBoost: the library reads the system locale and uses it to determine the decimal separator.

Many countries use a decimal separator different from the US/UK decimal point, usually a decimal comma. Japan, for example, shares the same setting as the US/UK (one more reason why this works for the US guys and Mateusz). https://en.wikipedia.org/wiki/Decimal_separator

This issue has been reported many times (e.g. dmlc/xgboost#2512); however, for some uses it is considered a feature and is unlikely to be fixed soon.

Therefore, fixing this issue on our side by passing localized parameters is the preferred option.
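
A minimal sketch of how a locale-dependent decimal separator could produce the symptoms reported in this thread. It is an illustration only, not XGBoost or H2O code: `parse_like_strtod`, `format_param`, and `predicted_probability` are hypothetical helpers, and the sketch assumes that parameter strings are parsed the way C's `strtod` parses them under the active locale and that the logistic booster starts from a base margin of 0.

```python
import math
import re


def parse_like_strtod(text: str, decimal_sep: str) -> float:
    """Hypothetical stand-in for locale-aware C parsing: read the longest
    numeric prefix, accepting only `decimal_sep` as the decimal separator."""
    pattern = r"[+-]?\d+(?:" + re.escape(decimal_sep) + r"\d+)?"
    match = re.match(pattern, text.strip())
    if match is None:
        return 0.0
    return float(match.group(0).replace(decimal_sep, "."))


def format_param(value: float, decimal_sep: str) -> str:
    """Hypothetical 'localized parameter': render a float using the
    separator that the receiving side expects."""
    return f"{value:.6f}".replace(".", decimal_sep)


def predicted_probability(eta: float, tree_output_sum: float) -> float:
    """Logistic prediction assuming a base margin of 0 (an assumption)."""
    margin = eta * tree_output_sum
    return 1.0 / (1.0 + math.exp(-margin))


# Under a decimal-comma locale, "0.3" is read only up to the point.
eta_seen = parse_like_strtod("0.3", ",")
print(eta_seen)

# With an effective learning rate of 0, every tree is scaled away and the
# prediction stays at the initial value.
print(predicted_probability(eta_seen, 2.5))

# "1" contains no separator, so it survives the same parsing unchanged.
print(parse_like_strtod("1", ","))

# Passing the value with the expected separator restores the intended rate.
print(parse_like_strtod(format_param(0.3, ","), ","))
```

Under these assumptions, a learning rate of 0.3 would be read as 0 on a machine with a decimal-comma locale, which matches the constant 0.5 predictions, while a learning rate of exactly 1 is unaffected because its string form has no decimal separator.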

hasithjp · 2023-05-15T10:13:52Z

JIRA Issue Migration Info

Jira Issue: PUBDEV-5294
Assignee: Pavel Pscheidl
Reporter: Stefan Pacinda
State: Resolved
Fix Version: 3.18.0.2
Attachments: Available (Count: 5)
Development PRs: Available

Linked PRs from JIRA

#2055

Attachments From Jira

Attachment Name: AirlinesTest.csv
Attached By: Stefan Pacinda
File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/AirlinesTest.csv

Attachment Name: AirlinesTrain.csv
Attached By: Stefan Pacinda
File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/AirlinesTrain.csv

Attachment Name: image-2018-02-06-14-01-16-745.png
Attached By: Stefan Pacinda
File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/image-2018-02-06-14-01-16-745.png

Attachment Name: image-2018-02-06-14-05-15-736.png
Attached By: Hasith Perera
File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/image-2018-02-06-14-05-15-736.png

Attachment Name: Snímek obrazovky pořízený 2018-02-13 22-24-32.png
Attached By: Pavel Pscheidl
File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-5294/Snímek obrazovky pořízený 2018-02-13 22-24-32.png

hasithjp closed this as completed May 15, 2023

hasithjp added the fixVersion/3.18.0.2 label May 15, 2023
