
[RFC] Version 0.82 release candidate #4201

Merged: 1 commit, Mar 5, 2019

Conversation

hcho3
Collaborator

@hcho3 hcho3 commented Mar 4, 2019

v0.82 (2019.03.03)

This release is packed with many new features and bug fixes.

Roadmap: better performance scaling for multi-core CPUs (#3957)

New feature: Distributed Fast Histogram Algorithm (hist) (#4011, #4102, #4140, #4128)

  • It is now possible to run the hist algorithm in a distributed setting (a minimal sketch follows this list). Special thanks to @CodingCat. The benefits include:
    1. Faster local computation via feature binning
    2. Support for monotonic constraints and feature interaction constraints
    3. A simpler codebase than approx, allowing for future improvements
  • Depth-wise tree growing is now performed in a separate code path, so that cross-node synchronization is performed only once per level.
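
For reference, a minimal single-process sketch of the relevant parameters (synthetic data; a real distributed job would additionally launch one such process per worker under a Rabit tracker, which is omitted here):

```python
import numpy as np
import xgboost as xgb

# Synthetic toy data; in a distributed job each worker loads its own partition.
X = np.random.rand(1000, 10)
y = np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'tree_method': 'hist',       # fast histogram algorithm, now usable in distributed mode
    'grow_policy': 'depthwise',  # depth-wise growth uses the new dedicated code path
    'max_depth': 6,
    'objective': 'binary:logistic',
}
bst = xgb.train(params, dtrain, num_boost_round=10)
```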

New feature: Multi-Node, Multi-GPU training (#4095)

  • Distributed training is now able to utilize clusters equipped with NVIDIA GPUs. In particular, the rabit AllReduce layer will communicate GPU device information. Special thanks to @mt-jones, @RAMitchell, @rongou, @trivialfis, @canonizer, and @jeffdk.
  • Resource management systems will be able to assign a rank for each GPU in the cluster.
  • In Dask, users will be able to construct a collection of XGBoost processes over an inhomogeneous device cluster (i.e. workers with different numbers and/or kinds of GPUs); a rough per-worker sketch follows this list.
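
An assumption-heavy sketch of what each worker runs (the gpu_hist and n_gpus parameter names are the ones available in this release; the Rabit tracker and per-worker data sharding that a real multi-node job requires are omitted):

```python
import numpy as np
import xgboost as xgb

# In a multi-node job, the resource manager / Rabit tracker would assign this
# process a rank and a GPU; here we just show the GPU-side parameters.
X = np.random.rand(1000, 10)
y = np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'tree_method': 'gpu_hist',   # GPU histogram algorithm (requires a CUDA-enabled build)
    'n_gpus': 1,                 # one GPU per worker/rank
    'objective': 'binary:logistic',
}
bst = xgb.train(params, dtrain, num_boost_round=10)
```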

New feature: Multiple validation datasets in XGBoost4J-Spark (#3904, #3910)

  • You can now track the performance of the model during training with multiple evaluation datasets. By specifying eval_sets or calling setEvalSets on an XGBoostClassifier or XGBoostRegressor, you can pass in multiple evaluation datasets typed as a Map from String to DataFrame. Special thanks to @CodingCat.
  • See the usage of multiple validation datasets here

New feature: Additional metric functions for GPUs (#3952)

  • Element-wise metrics have been ported to GPU: rmse, mae, logloss, poisson-nloglik, gamma-deviance, gamma-nloglik, error, tweedie-nloglik. Special thanks to @trivialfis and @RAMitchell.
  • With supported metrics, XGBoost will select the correct devices based on your system and the n_gpus parameter (see the example below).
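
A small example of requesting some of the newly ported metrics (a sketch; it assumes a CUDA-enabled build, otherwise drop the GPU-specific parameters):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 8)
y = np.random.rand(500)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'objective': 'reg:linear',
    'tree_method': 'gpu_hist',        # requires a CUDA-enabled build
    'n_gpus': 1,
    'eval_metric': ['rmse', 'mae'],   # element-wise metrics now computed on GPU
}
bst = xgb.train(params, dtrain, num_boost_round=5, evals=[(dtrain, 'train')])
```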

New feature: Column sampling at individual nodes (splits) (#3971)

  • Columns (features) can now be sampled at individual tree nodes, in addition to per-tree and per-level sampling. To enable per-node sampling, set the colsample_bynode parameter, which represents the fraction of columns sampled at each node. This parameter is set to 1.0 by default (i.e. no sampling per node). Special thanks to @canonizer.
  • The colsample_bynode parameter works cumulatively with other colsample_by* parameters: for example, {'colsample_bynode':0.5, 'colsample_bytree':0.5} with 100 columns will give 25 features to choose from at each split, as illustrated below.
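
For illustration, a minimal sketch of the cumulative behaviour described above (0.5 × 0.5 × 100 columns ≈ 25 candidate features per split):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 100)          # 100 columns (features)
y = np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'objective': 'binary:logistic',
    'colsample_bytree': 0.5,   # 50 of the 100 columns are kept for each tree
    'colsample_bynode': 0.5,   # half of those (about 25) are considered at each split
    'max_depth': 4,
}
bst = xgb.train(params, dtrain, num_boost_round=10)
```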

Major API change: consistent logging level via verbosity (#3982, #4002, #4138)

  • XGBoost now allows fine-grained control over logging. You can set verbosity to 0 (silent), 1 (warning), 2 (info), or 3 (debug); see the example after this list. This is useful for controlling the amount of logging output. Special thanks to @trivialfis.
  • Parameters silent and debug_verbose are now deprecated.
  • Note: Sometimes XGBoost tries to change configurations based on heuristics, which is displayed as a warning message. If you see unexpected behaviour, try increasing the value of verbosity.
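
For example (a minimal sketch):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 5)
y = np.random.randint(2, size=200)
dtrain = xgb.DMatrix(X, label=y)

# 0 = silent, 1 = warning, 2 = info, 3 = debug
params = {'objective': 'binary:logistic', 'verbosity': 2}
bst = xgb.train(params, dtrain, num_boost_round=5)
```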

Major bug fix: external memory (#4040, #4193)

  • Clarify object ownership in the multi-threaded prefetcher, to avoid memory errors.
  • Correctly merge two column batches (which use the CSC layout).
  • Add unit tests for external memory.
  • Special thanks to @trivialfis and @hcho3.
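
For context, external memory mode is enabled by appending a cache-file suffix to the input path; a minimal sketch (train.libsvm is a placeholder for your own LIBSVM-format file):

```python
import xgboost as xgb

# The '#dtrain.cache' suffix asks XGBoost to page the data from disk,
# writing cache shards prefixed with 'dtrain.cache'.
# 'train.libsvm' is a placeholder path.
dtrain = xgb.DMatrix('train.libsvm#dtrain.cache')

params = {'objective': 'binary:logistic', 'tree_method': 'approx'}
bst = xgb.train(params, dtrain, num_boost_round=10)
```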

Major bug fix: early stopping fixed in XGBoost4J and XGBoost4J-Spark (#3928, #4176)

  • Early stopping in XGBoost4J and XGBoost4J-Spark is now consistent with its counterpart in the Python package. Training stops once the current iteration is earlyStoppingSteps rounds past the best iteration. If there are multiple evaluation sets, only the last one is used to determine early stopping (see the Python snippet after this list for the reference behaviour).
  • See the updated documentation here
  • Special thanks to @CodingCat, @yanboliang, and @mingyang.
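
For reference, a minimal Python sketch of the behaviour the JVM packages now match (with multiple entries in evals, only the last one drives early stopping):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.randint(2, size=1000)
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

params = {'objective': 'binary:logistic', 'eval_metric': 'error'}
# Stops once the validation metric has not improved for 5 consecutive rounds;
# with several entries in `evals`, only the last one is used for early stopping.
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtrain, 'train'), (dvalid, 'valid')],
                early_stopping_rounds=5)
```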

Major bug fix: infrequent features should not crash distributed training (#4045)

  • For infrequently occurring features, some partitions may not get any instance. This scenario used to crash distributed training due to malformed ranges. The problem has now been fixed.
  • In practice, one-hot-encoded categorical variables tend to produce rare features, particularly when the cardinality is high.
  • Special thanks to @CodingCat.

Performance improvements

Bug-fixes

API changes

Maintenance: Refactor C++ code for legibility and maintainability

Maintenance: testing, continuous integration, build system

Usability Improvements

Acknowledgement

Contributors (in no particular order): Jiaming Yuan (@trivialfis), Hyunsu Cho (@hcho3), Nan Zhu (@CodingCat), Rory Mitchell (@RAMitchell), Yanbo Liang (@yanboliang), Andy Adinets (@canonizer), Tong He (@hetong007), Yuan Tang (@terrytangyuan)

First-time Contributors (in no particular order): Jelle Zijlstra (@JelleZijlstra), Jiacheng Xu (@jiachengxu), @ajing, Kashif Rasul (@kashif), @theycallhimavi, Joey Gao (@pjgao), Prabakaran Kumaresshan (@nixphix), Huafeng Wang (@huafengw), @lyxthe, Sam Wilkinson (@scwilkinson), Tatsuhito Kato (@stabacov), Shayak Banerjee (@shayakbanerjee), Kodi Arfer (@Kodiologist), @KyleLi1985, Egor Smirnov (@SmirnovEgorRu), @tmitanitky, Pasha Stetsenko (@st-pasha), Kenichi Nagahara (@keni-chi), Abhai Kollara Dilip (@abhaikollara), Patrick Ford (@pford221), @hshujuan, Matthew Jones (@mt-jones), Thejaswi Rao (@teju85), Adam November (@anovember)

First-time Reviewers (in no particular order): Mingyang Hu (@mingyang), Theodore Vasiloudis (@thvasilo), Jakub Troszok (@troszok), Rong Ou (@rongou), @Denisevi4, Matthew Jones (@mt-jones), Jeff Kaplan (@jeffdk)

@hcho3 hcho3 mentioned this pull request Mar 4, 2019
@thvasilo
Contributor

thvasilo commented Mar 4, 2019

Any change in particular you'd like us to try testing @hcho3 ?

@trivialfis
Member

One thing I am interested in is, how experimental is our external memory implementation? We can't mark it beta forever...

@hcho3
Collaborator Author

hcho3 commented Mar 4, 2019

@thvasilo I'm mainly soliciting feedback with regard to the release notes. After a short while, committers will approve the release.

@hcho3
Collaborator Author

hcho3 commented Mar 4, 2019

@trivialfis I'd like to see external memory implemented for hist, with robust unit tests. Then we can remove the beta label.

@CodingCat
Member

LGTM.

Member

@terrytangyuan terrytangyuan left a comment


Looks great!

@CodingCat
Member

CodingCat commented Mar 4, 2019

just noticed one thing, @hcho3

actually the latest code in distributed hist only syncs stats at the root level (for both loss-guided and depth-wise) and the stats of all other nodes are derived from the cache (https://github.com/dmlc/xgboost/blob/master/src/tree/updater_quantile_hist.cc#L743-L780) (thanks to @RAMitchell for the suggestion)

instead of

"""
Depth-wise tree growing is now performed in a separate code path, so that cross-node synchronization is performed only once per level.
"""

@hcho3
Collaborator Author

hcho3 commented Mar 4, 2019

@CodingCat Wait, I thought depthwise had fewer sync steps than lossguide?

@CodingCat
Member

nvm, I am brain-damaged.....

@hcho3
Collaborator Author

hcho3 commented Mar 5, 2019

With 5 approvals, I am going to release 0.82. Thanks all!

@hcho3 hcho3 merged commit 3f83dcd into dmlc:master Mar 5, 2019
@hcho3 hcho3 deleted the release_0.82 branch March 5, 2019 02:14
@hlbkin

hlbkin commented Mar 5, 2019

When would binary wheels for 0.82 be available? (especially multi-gpu)

https://s3-us-west-2.amazonaws.com/xgboost-wheels/list.html

@hcho3
Collaborator Author

hcho3 commented Mar 5, 2019

@hlbkin It is already available on PyPI. Run pip install xgboost==0.82. This one will have multi-GPU enabled.

@Kodiologist
Contributor

When will CRAN get a new release of the R package?

@hetong007
Member

hetong007 commented Mar 8, 2019 via email

@hcho3
Collaborator Author

hcho3 commented Mar 13, 2019

@Kodiologist 0.82 is now released on CRAN: https://cran.r-project.org/web/packages/xgboost/index.html
