Important: reproducible segfault from version 3.0.0 onwards (both 3.0.0 and 3.1.0 affected) #3603
Comments
If you run this through gdb, you get this: so, a double free in Dataset? (This may be a consequence of the problem rather than its source, though.)
I guess bin_construct_sample_cnt=1 is the root cause. We have already added warnings about this.
Refer to #3521.
I'll try rerunning the example and report back. What's the minimum sensible value, something like 257 so it never chooses int8? That's not clear. Or actually, hold on, it's not well-defined, because it has to be 257 unique values observed, not the sample count. I wonder if it would be possible to throw a preliminary exception (saying 'this has to be > ...') instead of a segfault?
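To illustrate the kind of up-front check being asked for here, below is a minimal user-side sketch. It is not part of the LightGBM API: the helper name `check_bin_construct_sample_cnt` and the threshold `MIN_BIN_CONSTRUCT_SAMPLE_CNT` are hypothetical, and the value 1000 is an arbitrary placeholder, since the thread does not establish what the real minimum should be.

```python
# Hypothetical user-side guard; nothing here is provided by LightGBM itself.
# The threshold is an illustrative assumption, not a documented limit.
MIN_BIN_CONSTRUCT_SAMPLE_CNT = 1000


def check_bin_construct_sample_cnt(params, min_allowed=MIN_BIN_CONSTRUCT_SAMPLE_CNT):
    """Raise ValueError if bin_construct_sample_cnt is set suspiciously low."""
    value = params.get("bin_construct_sample_cnt")
    if value is None:
        return  # parameter not set, so the library default applies
    if value < min_allowed:
        raise ValueError(
            f"bin_construct_sample_cnt={value} is below {min_allowed}; "
            "bin boundaries would be built from too few samples"
        )


# A large value passes silently.
check_bin_construct_sample_cnt({"bin_construct_sample_cnt": 200000})

# A value of 1 is rejected before any training starts.
try:
    check_bin_construct_sample_cnt({"bin_construct_sample_cnt": 1})
except ValueError as exc:
    print(exc)
```

Calling a check like this before building the dataset would turn the failure into an ordinary Python exception. As noted above, a check on the sample count alone cannot guarantee how many unique values are actually observed.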
I've updated the script to use [#150]
d.signals.shape=(5447, 38), depth=6, num_trees=26
{'boosting': 'goss',
'learning_rate': 0.01,
'min_data_in_leaf': 1,
'min_sum_hessian_in_leaf': 0.0,
'bagging_fraction': 1.0,
'pos_bagging_fraction': 1.0,
'neg_bagging_fraction': 1.0,
'feature_fraction': 0.75,
'feature_fraction_bynode': 0.75,
'max_delta_step': 0.0,
'lambda_l1': 0.01,
'lambda_l2': 0.0,
'drop_rate': 1.0,
'max_drop': 100,
'skip_drop': 1.0,
'xgboost_dart_mode': True,
'uniform_drop': False,
'refit_decay_rate': 0.0,
'max_bin': 10,
'min_data_in_bin': 25,
'bin_construct_sample_cnt': 200000,
'is_unbalance': True,
'scale_pos_weight': 1.0,
'sigmoid': 0.01,
'other_rate': 0.8,
'objective': 'binary',
'tree_learner': 'serial',
'num_leaves': 64,
'verbosity': -1,
'num_threads': 1,
'seed': 42,
'histogram_pool_size': -1,
'max_depth': 6}
[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0)
at /tmp/pip-req-build-kncfj4kn/compile/src/treelearner/serial_tree_learner.cpp, line 651 .
@guolinke Any thoughts on the last case? (Is that another implicit consequence of the histogram binning changes, or something else?)
@aldanor we remove the We suggest to use larger
Thanks. So does that mean
@aldanor avoid to set it
@guolinke Thanks, that seems to have fixed it. It would be nice if this were validated before running fits, though, especially the segfault with the low bin_construct_sample_cnt, which could probably be prevented by requiring some reasonable minimum value for that parameter. I'll close this off for now.
Environment setup:
(Should LightGBM run this kind of test on CI to catch this sort of error as early as possible?)
Test script:
On lightgbm 2.3.1, the script finishes successfully.
Starting from version 3.0.0, it fails after a few iterations like this (the one below is from 3.1.0):
By altering the generator a bit, you can also get a segfault from malloc about corrupted linked lists etc.