[jvm-packages] eta parsing error #2512

manfredcalvo · 2017-07-12T17:13:29Z

Operating System: Ubuntu 16.04

Compiler: g++

Package used: jvm-package specifically xgboost4j and python-package

xgboost version used: 0.7

gcc version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

Issue:

I ran an experiment using the 'dermatology.data' dataset in the folder demo/multiclass_classification. I wrote a Java program that does the same as /demo/multiclass_classification/train.py. With multi:softmax as the objective function and logloss as the error metric, the results in each language were:

Java:

[0] test-mlogloss:1.791759 train-mlogloss:1.791759
[1] test-mlogloss:1.791759 train-mlogloss:1.791759
[2] test-mlogloss:1.791759 train-mlogloss:1.791759
[3] test-mlogloss:1.791759 train-mlogloss:1.791759
[4] test-mlogloss:1.791759 train-mlogloss:1.791759

Python:

[0] train-mlogloss:1.54662 test-mlogloss:1.57447
[1] train-mlogloss:1.35498 test-mlogloss:1.39797
[2] train-mlogloss:1.19883 test-mlogloss:1.25218
[3] train-mlogloss:1.06734 test-mlogloss:1.13098
[4] train-mlogloss:0.955641 test-mlogloss:1.03268

In this experiment eta is 0.1, but the logloss values in Java are the same as when eta is 0.

What I discovered is that the locale of my operating system was set to use ',' instead of '.' as the separator between the integer part and the decimal part of a real number. This caused the parameter parser in the C++ code to truncate eta from 0.1 to 0 and, in general, to truncate any value x.y to x.
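The truncation described above can be reproduced without changing the system locale. The following is a minimal sketch, not the XGBoost parsing code: it assumes a hypothetical facet `CommaDecimal` that stands in for a comma-decimal locale, and a hypothetical helper `parse_with_comma_locale` that parses through a stream instead of std::stof/std::stod. The effect is the same: parsing stops at the '.' because it is not the locale's decimal separator.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that mimics a locale whose decimal separator is ','.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Parse `text` as a double under the comma-decimal facet above.
double parse_with_comma_locale(const std::string& text) {
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(std::locale::classic(), new CommaDecimal));
    double value = 0.0;
    // Extraction stops at the first character that cannot continue a number;
    // under this facet that includes '.', so only the integer part is read.
    in >> value;
    return value;
}
```

Under these assumptions, `parse_with_comma_locale("0.1")` yields 0 while `parse_with_comma_locale("0,1")` yields 0.1, which matches the eta behaviour reported here.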

So I did a quick fix by setting the locale to en_US.UTF-8 in xgboost4j.cpp, forcing C++ to use that locale so that '.' is recognized as the separator between the integer and decimal parts of a real number. That fix worked for me, but I think it would be better to change the parameter parsing code so that the library takes this kind of discrepancy into account.

Thanks


manfredcalvo · 2017-07-12T21:55:59Z

This is a similar issue: apache/mxnet#5056. It's related to std::stof and std::stod.

bnoreus · 2017-07-18T14:52:33Z

I'm experiencing the same issue.

bnoreus · 2017-07-18T15:15:20Z

@manfredcalvo Where in the code did you perform your locale edit? Would it be enough to just change an environment variable?

manfredcalvo · 2017-07-18T15:29:54Z

@bnoreus I changed the locale to en_US.UTF-8 in the environment variables (e.g. LC_NUMERIC, LC_MONETARY...) and that worked. But I think it shouldn't depend on the environment. There should be locale-independent parsing functions for this, since Java will always convert a float to a string with a dot as the decimal separator, and that can cause unwanted exceptions.
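One way such a locale-independent parsing function could look is sketched below. This is an illustration only, not the fix adopted by XGBoost: the name `parse_double_classic` is hypothetical, and the approach (a string stream pinned to the classic "C" locale) is just one option among several. It always treats '.' as the decimal separator and rejects trailing characters instead of silently truncating.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse a double with '.' as the decimal separator,
// regardless of the process-wide or environment locale.
double parse_double_classic(const std::string& text) {
    std::istringstream in(text);
    // Pin this stream to the classic "C" locale so the global locale is ignored.
    in.imbue(std::locale::classic());
    double value = 0.0;
    if (!(in >> value)) {
        throw std::invalid_argument("not a number: " + text);
    }
    // Any remaining non-whitespace character means the input was only
    // partially consumed (e.g. "0,1"), so report it instead of truncating.
    char extra;
    if (in >> extra) {
        throw std::invalid_argument("trailing characters in: " + text);
    }
    return value;
}
```

With this sketch, "0.1" parses to 0.1 under any environment locale, and an input such as "0,1" raises std::invalid_argument rather than quietly becoming 0.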

bnoreus · 2017-07-18T15:47:47Z

@manfredcalvo Thanks a lot! This probably saved me so many hours of work. I got it to work for me by setting this in my .bashrc:
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

But I agree with you. This should be fixed. I also think there are installation errors related to this problem. (I've had huge problems installing xgboost4j due to failing tests that I think fail for this reason. My solution before was just to ignore the tests, but then I ended up with this problem.)

manfredcalvo changed the title ~~eta parsing error with locale different than US. [jvm-packages]~~ [jvm-packages] eta parsing error with locale different than US. Jul 13, 2017

manfredcalvo changed the title ~~[jvm-packages] eta parsing error with locale different than US.~~ [jvm-packages] eta parsing error Jul 13, 2017

Pscheidl mentioned this issue Feb 15, 2018

PUBDEV-5294 - XGBoost model in Flow doesn't seem to converge h2oai/h2o-3#2055

Merged

CodingCat closed this as completed Apr 2, 2018

lock bot locked as resolved and limited conversation to collaborators Oct 25, 2018
