XGBoost model in Flow doesn't seem to converge #12166
Comments
Michal Kurka commented: [~accountid:557058:b36244f2-45c9-4479-9677-d1ccf6f8a61d] please take a look but don't feel obliged to fix it on your branch! |
Pavel Pscheidl commented: With the learning rate set to exactly 1 (and not lower), the model with Stefan's setup converges. |
Pavel Pscheidl commented: The booster itself returns a constant prediction of 0.5 across all iterations. This behavior can be reproduced with many datasets, including "prostate". The parameters passed to the booster are as follows: !Snímek obrazovky pořízený 2018-02-13 22-24-32.png|thumbnail! |
Pavel Pscheidl commented: This is a known issue with XGBoost: the library reads the system locale and uses it to determine the decimal separator. Many countries use a decimal separator other than the US/UK decimal point, usually a decimal comma. Japan, for example, shares the US/UK setting (one more reason this works for the US guys and Mateusz). https://en.wikipedia.org/wiki/Decimal_separator This issue has been reported many times (e.g. dmlc/xgboost#2512); however, for some uses it is considered a feature and is unlikely to be fixed soon. Therefore, fixing the issue on our side by passing localized parameters is the preferred option. |
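The locale effect described in the comment above can be sketched in a few lines of Python. This is an illustration only, not the H2O fix and not XGBoost's actual parsing code: `format_param` and `parse_like_strtod` are hypothetical helpers, and the parser only approximates how a C-style, locale-aware number parser reads the longest valid numeric prefix and ignores the rest. Under that assumption, the text "0.3" read in a decimal-comma locale stops at the "." and yields 0, which would be consistent with the constant 0.5 predictions and with a learning rate of exactly 1 still working.

```python
import re


def format_param(value, decimal_point="."):
    """Render a numeric parameter as text using the given decimal separator.

    Hypothetical helper: stands in for "passing localized parameters".
    """
    return repr(float(value)).replace(".", decimal_point)


def parse_like_strtod(text, decimal_point="."):
    """Roughly mimic a locale-aware C-style parse (an assumption, not
    XGBoost's code): read the longest numeric prefix that uses the
    locale's decimal separator and ignore whatever follows."""
    sep = re.escape(decimal_point)
    pattern = r"[+-]?(?:\d+(?:%s\d*)?|%s\d+)(?:[eE][+-]?\d+)?" % (sep, sep)
    match = re.match(pattern, text.strip())
    if match is None:
        return 0.0
    # Normalize the matched prefix to a "." separator before converting.
    return float(match.group(0).replace(decimal_point, "."))


if __name__ == "__main__":
    eta = 0.3
    # US-style text read under a decimal-comma locale: parsing stops at ".".
    print(parse_like_strtod(format_param(eta), decimal_point=","))
    # Text formatted with the locale's own separator: the value survives.
    print(parse_like_strtod(format_param(eta, ","), decimal_point=","))
    # A learning rate of exactly 1 keeps its integer part either way.
    print(parse_like_strtod(format_param(1), decimal_point=","))
```

In this sketch the first call loses the fractional part of the learning rate, while the second, which formats the value with the separator the parser expects, recovers 0.3. That is the idea behind formatting parameters to match the locale before handing them to the native library.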
JIRA Issue Migration Info Jira Issue: PUBDEV-5294 Linked PRs from JIRA Attachments From Jira Attachment Name: AirlinesTest.csv Attachment Name: AirlinesTrain.csv Attachment Name: image-2018-02-06-14-01-16-745.png Attachment Name: image-2018-02-06-14-05-15-736.png Attachment Name: Snímek obrazovky pořízený 2018-02-13 22-24-32.png |
Unable to build a working classification XGBoost model in Flow on the Airlines data set, although this was possible with GBM.
Steps:
{{
buildModel 'gbm', {"model_id":"gbm-6ac7e1fc-3d36-40c7-ac7c-73c7c54fda4b","training_frame":"AirlinesTrain.hex","validation_frame":"AirlinesTest.hex","nfolds":0,"response_column":"IsDepDelayed","ignored_columns":["IsDepDelayed_REC"],"ignore_const_cols":true,"ntrees":50,"max_depth":5,"min_rows":10,"nbins":20,"seed":-1,"learn_rate":0.1,"sample_rate":1,"col_sample_rate":1,"score_each_iteration":false,"score_tree_interval":0,"balance_classes":false,"nbins_top_level":1024,"nbins_cats":1024,"r2_stopping":1.7976931348623157e+308,"stopping_rounds":0,"stopping_metric":"AUTO","stopping_tolerance":0.001,"max_runtime_secs":0,"learn_rate_annealing":1,"distribution":"AUTO","huber_alpha":0.9,"checkpoint":"","col_sample_rate_per_tree":1,"min_split_improvement":0.00001,"histogram_type":"AUTO","categorical_encoding":"AUTO","custom_metric_func":"","build_tree_one_node":false,"sample_rate_per_class":[],"col_sample_rate_change_per_level":1,"max_abs_leafnode_pred":1.7976931348623157e+308,"pred_noise_bandwidth":0,"calibrate_model":false}}}
All looks good:
!image-2018-02-06-14-01-16-745.png|thumbnail!
{{buildModel 'xgboost', {"model_id":"xgboost-502c6c48-a3d5-4f13-82e7-12ff527d1973","training_frame":"AirlinesTrain.hex","validation_frame":"AirlinesTest.hex","nfolds":0,"response_column":"IsDepDelayed","ignored_columns":["IsDepDelayed_REC"],"ignore_const_cols":true,"seed":-1,"ntrees":50,"max_depth":6,"min_rows":1,"min_child_weight":1,"learn_rate":0.3,"eta":0.3,"sample_rate":1,"subsample":1,"col_sample_rate":1,"colsample_bylevel":1,"score_each_iteration":false,"stopping_rounds":0,"stopping_metric":"AUTO","stopping_tolerance":0.001,"max_runtime_secs":0,"distribution":"AUTO","categorical_encoding":"LabelEncoder","col_sample_rate_per_tree":1,"colsample_bytree":1,"score_tree_interval":0,"min_split_improvement":0,"gamma":0,"max_leaves":0,"tree_method":"auto","grow_policy":"depthwise","dmatrix_type":"auto","quiet_mode":true,"max_abs_leafnode_pred":0,"max_delta_step":0,"max_bins":256,"min_sum_hessian_in_leaf":100,"min_data_in_leaf":0,"sample_type":"uniform","normalize_type":"tree","rate_drop":0,"one_drop":false,"skip_drop":0,"booster":"gbtree","reg_lambda":0,"reg_alpha":0,"backend":"auto","gpu_id":0}}}
Sadly, the model seems either to be trained poorly or to be presented incorrectly in Flow:
!image-2018-02-06-14-05-15-736.png|thumbnail!