Unable to reproduce results on local machine vs cloud #6905

Closed
lingzhou125 opened this issue Apr 25, 2021 · 3 comments

@lingzhou125

Local Machine
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.942219119220772,
gamma=0.30088663356155, gpu_id=0, importance_type='gain',
interaction_constraints='', learning_rate=0.05, max_delta_step=0,
max_depth=10, min_child_weight=6, missing=nan,
monotone_constraints='()', n_estimators=250, n_jobs=-1,
num_parallel_tree=1, random_state=0, reg_alpha=3.54578021703862,
reg_lambda=0.426143991951751, scale_pos_weight=1,
subsample=0.946270611429848, tree_method='gpu_hist',
validate_parameters=1, verbosity=3)

Cloud
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.942219119220772,
gamma=0.30088663356155, gpu_id=0, importance_type='gain',
interaction_constraints='', learning_rate=0.05, max_delta_step=0,
max_depth=10, min_child_weight=6, missing=nan,
monotone_constraints='()', n_estimators=250, n_jobs=-1,
num_parallel_tree=1, random_state=0, reg_alpha=3.54578021703862,
reg_lambda=0.426143991951751, scale_pos_weight=1,
subsample=0.946270611429848, tree_method='hist',
validate_parameters=1, verbosity=3)

It's not off by a little; it's wildly different. The training set shape is (501808, 314). To be clear, the results are reproducible between runs on the local machine and between runs on the cloud, but the local and cloud results are nowhere close to each other.
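
One way to make "off by a little" versus "wildly different" concrete is to save each environment's predictions on the same rows and compare them with a tolerance. This is a minimal sketch assuming the predictions have been exported as plain lists of floats; `compare_predictions` is a hypothetical helper and the tolerance defaults are illustrative, not anything XGBoost provides.

```python
import math

def compare_predictions(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two prediction sequences element-wise.

    Returns (max_abs_diff, n_mismatched), where an element counts as
    mismatched if math.isclose rejects it under the given tolerances.
    """
    if len(a) != len(b):
        raise ValueError("prediction lists must have the same length")
    max_abs_diff = 0.0
    n_mismatched = 0
    for x, y in zip(a, b):
        max_abs_diff = max(max_abs_diff, abs(x - y))
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
            n_mismatched += 1
    return max_abs_diff, n_mismatched

# Hypothetical predictions from two environments: the first pair differs
# only by rounding noise, the third pair is a genuine discrepancy.
local = [1.0, 2.5, 3.25]
cloud = [1.0 + 1e-12, 2.5, 4.0]
print(compare_predictions(local, cloud))  # (0.75, 1)
```

Rounding-level noise shows up as a tiny max_abs_diff with no mismatches; a large max_abs_diff with many mismatches points to a configuration or environment difference rather than floating-point error.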

@trivialfis
Member

You have specified different tree methods: tree_method='gpu_hist' locally and tree_method='hist' on the cloud.
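
Diffing the two reported configurations makes a single differing parameter easy to spot. This is a minimal sketch: a subset of the parameters from the two XGBRegressor reprs in the original post is copied into plain dictionaries, and `diff_params` is a hypothetical helper. With a real estimator, `get_params()` (part of the scikit-learn estimator API that XGBRegressor implements) returns an equivalent dictionary.

```python
def diff_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

# Subset of the parameters reported for each environment.
local = {"booster": "gbtree", "learning_rate": 0.05, "max_depth": 10,
         "n_estimators": 250, "random_state": 0, "tree_method": "gpu_hist"}
cloud = {"booster": "gbtree", "learning_rate": 0.05, "max_depth": 10,
         "n_estimators": 250, "random_state": 0, "tree_method": "hist"}

print(diff_params(local, cloud))  # {'tree_method': ('gpu_hist', 'hist')}
```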

@lingzhou125
Author

I started a GPU instance on SageMaker and tried again, and I still get different results. I have also tried tree_method='exact' on both and received different results.
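
When comparing runs across machines, it can help to reduce each run's predictions to a short fingerprint that is easy to paste into a comment. This is a sketch using only the standard library; `prediction_fingerprint` is a hypothetical helper, and the optional rounding to `decimals` places is an assumption that trades bit-exactness for tolerance to tiny rounding differences.

```python
import hashlib
import struct

def prediction_fingerprint(preds, decimals=None):
    """Hash a sequence of floats into a 16-character hex digest.

    With decimals=None the raw IEEE-754 double bits are hashed, so any
    difference at all changes the digest; otherwise values are rounded
    to `decimals` places before hashing.
    """
    h = hashlib.sha256()
    for p in preds:
        value = float(p) if decimals is None else round(float(p), decimals)
        h.update(struct.pack("<d", value))  # little-endian 8-byte double
    return h.hexdigest()[:16]

# 0.1 + 0.2 is not bit-identical to 0.3, but agrees after rounding.
a = [0.1 + 0.2, 1.0]
b = [0.3, 1.0]
print(prediction_fingerprint(a), prediction_fingerprint(b))
print(prediction_fingerprint(a, decimals=6), prediction_fingerprint(b, decimals=6))
```

Matching exact fingerprints across machines mean bit-identical predictions; matching only the rounded ones suggests floating-point noise; mismatching rounded ones indicate a real difference.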

@trivialfis
Member

These types of reproducibility issues are quite difficult to solve. You can try setting n_jobs to 1. Most of them are caused by floating-point errors: floating-point addition is not associative, so in a parallel execution environment, where the order of additions can vary from run to run or machine to machine, results may not be reproducible.
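
The non-associativity point can be seen directly in plain Python: the same three numbers summed with different grouping give different results. Summing one list in differently sized chunks, a rough stand-in for how a parallel reduction splits work across threads, can also change the last bits of the total. The chunking scheme in `chunked_sum` is illustrative only and is not how XGBoost partitions its work; `math.fsum` is used as an order-independent, correctly rounded reference.

```python
import math
import random

# Grouping changes the result of floating-point addition.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

def chunked_sum(values, n_chunks):
    """Sum each chunk separately, then combine the partial sums,
    roughly as a parallel reduction with n_chunks workers might."""
    if not values:
        return 0.0
    size = math.ceil(len(values) / n_chunks)
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)

rng = random.Random(0)
values = [rng.uniform(-1e6, 1e6) for _ in range(100_000)]

# Different chunk counts may disagree in the trailing digits.
for n in (1, 2, 7, 64):
    print(n, repr(chunked_sum(values, n)))
print("fsum", repr(math.fsum(values)))
```

These differences are tiny per operation, but a boosted model makes many split decisions based on such sums, so a last-bit difference can occasionally flip a split and propagate.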