[jvm-packages] eta parsing error #2512
Comments
This is a similar issue: apache/mxnet#5056. It is related to std::stof and std::stod.
I'm experiencing the same issue.
@manfredcalvo Where in the code did you perform your locale edit? Would it be enough to just change an environment variable?
@bnoreus I changed the locale to en_US.UTF-8 in the environment variables (e.g. LC_NUMERIC, LC_MONETARY...) and that worked. But I think it shouldn't depend on the environment. There should be locale-independent parsing functions for this, since Java always converts a float to a string with a dot as the decimal separator, and the mismatch can cause unwanted exceptions.
@manfredcalvo Thanks a lot! This probably saved me many hours of work. I got it to work for me by setting this in my .bashrc. But I agree with you, this should be fixed. I also think there are installation errors related to this problem. (I've had huge problems installing xgboost4j due to failing tests that I think fail for this reason. My solution before was just to ignore the tests, and instead I ended up with this problem.)
Operating System: Ubuntu 16.04
Compiler: g++
Package used: jvm-package specifically xgboost4j and python-package
xgboost version used: 0.7
gcc version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Issue:
I ran an experiment using the data file 'dermatology.data' in the folder demo/multiclass_classification. I wrote a Java script that does the same as /demo/multiclass_classification/train.py. With the objective function multi:softmax and mlogloss as the error metric, the results in each language were:
Java:
[0] test-mlogloss:1.791759 train-mlogloss:1.791759
[1] test-mlogloss:1.791759 train-mlogloss:1.791759
[2] test-mlogloss:1.791759 train-mlogloss:1.791759
[3] test-mlogloss:1.791759 train-mlogloss:1.791759
[4] test-mlogloss:1.791759 train-mlogloss:1.791759
Python:
[0] train-mlogloss:1.54662 test-mlogloss:1.57447
[1] train-mlogloss:1.35498 test-mlogloss:1.39797
[2] train-mlogloss:1.19883 test-mlogloss:1.25218
[3] train-mlogloss:1.06734 test-mlogloss:1.13098
[4] train-mlogloss:0.955641 test-mlogloss:1.03268
In this experiment eta is 0.1, but the mlogloss values in Java are the same as when eta is 0.
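A quick sanity check on the constant Java value, assuming the six classes of the dermatology demo and that every class starts from the same score: with eta equal to 0 no tree changes the scores, so the softmax assigns probability 1/6 to each class for every row, and the multiclass logloss over N rows reduces to

```latex
\text{mlogloss}
  = -\frac{1}{N}\sum_{i=1}^{N} \ln p_{i,y_i}
  = -\ln\frac{1}{6}
  = \ln 6
  \approx 1.791759
```

This matches the value printed on every Java iteration, which is consistent with eta being read as 0.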
What I discovered is that my operating system's locale was set to use ',' instead of '.' as the separator between the integer part and the decimal part of a real number. That causes the parameter parser in the C++ code to truncate eta from 0.1 to 0 and, in general, to truncate any value x.y to x.
So I did a quick fix: I set the locale to en_US.UTF-8 in xgboost4j.cpp to force C++ to use that locale and recognize '.' as the separator between the integer part and the decimal part of a real number. That fix worked for me, but I think it would be better to handle this in the parameter parsing code so that the library takes this kind of discrepancy into account.
Thanks