-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PUBDEV-5294 - XGBoost model in Flow doesn't seem to converge #2055
Conversation
@@ -168,40 +172,40 @@ public XGBoostModel(Key<XGBoostModel> selfKey, XGBoostParameters parms, XGBoostO | |||
} | |||
if (p._eta != 0.3) { | |||
Log.info("Using user-provided parameter eta instead of learn_rate."); | |||
params.put("eta", p._eta); | |||
params.put("eta", localizedNumberFormatter.format(p._eta)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of handling every single case I would recommend allowing to put the number directly in the map and then post-process it to a correct XGBoost input. This will prevent future mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 to this, someone might just add a new param as is without realizing this in the future. A post process of all float/double values right before we return the params map might be safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Good catch! Looks good if you could only make that one improvement @michalkurka suggested. |
2f7aed4
to
9661331
Compare
9661331
to
1141b86
Compare
Thank you guys, all set. Feel free to do review. I'll watch tests. |
import java.util.HashMap; | ||
import java.util.Map; | ||
|
||
import static hex.tree.xgboost.XGBoost.makeDataInfo; | ||
|
||
public class XGBoostModel extends Model<XGBoostModel, XGBoostModel.XGBoostParameters, XGBoostOutput> { | ||
|
||
private static final NumberFormat localizedNumberFormatter = DecimalFormat.getNumberInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see documentation to DecimalFormat - there is no guarantee of being thread safe
* Decimal formats are generally not synchronized.
* It is recommended to create separate format instances for each thread.
* If multiple threads access a format concurrently, it must be synchronized
* externally.
@h2o-ops please test |
@Pscheidl seems ready to merge - the test failure is irrelevant |
@stexyz Please test your original scenario and let us know. Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XGBoost converges on the Airlines dataset now when invoked from the Flow.
Nice! |
* PUBDEV-5294 - XGBoost model in Flow doesn't seem to converge * Fix issue with DecimalFormat (not thread safe!) * Fix formatting
Issue: https://0xdata.atlassian.net/browse/PUBDEV-5294
This is a known issue with XGBoost, when the library takes system Locale and uses it to determine decimal separator.
There are many countries with different decimal separator from US/UK decimal point, usually a decimal comma (arabic style). For example Japan shares the same settings with US/UK (one more reason for this to work for the US guys and Mateusz). https://en.wikipedia.org/wiki/Decimal_separator
This issue has been reported many times (e.g. here dmlc/xgboost#2512), however for some uses, this is considered a feature and is unlikely to be fixed soon.
Therefore, fixing this issue on our side by passing localized parameters is the preferred option.