-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Initial GPU acceleration support for LightGBM (#368)
* add dummy gpu solver code * initial GPU code * fix crash bug * first working version * use asynchronous copy * use a better kernel for root * parallel read histogram * sparse features now works, but no acceleration, compute on CPU * compute sparse feature on CPU simultaneously * fix big bug; add gpu selection; add kernel selection * better debugging * clean up * add feature scatter * Add sparse_threshold control * fix a bug in feature scatter * clean up debug * temporarily add OpenCL kernels for k=64,256 * fix up CMakeList and definition USE_GPU * add OpenCL kernels as string literals * Add boost.compute as a submodule * add boost dependency into CMakeList * fix opencl pragma * use pinned memory for histogram * use pinned buffer for gradients and hessians * better debugging message * add double precision support on GPU * fix boost version in CMakeList * Add a README * reconstruct GPU initialization code for ResetTrainingData * move data to GPU in parallel * fix a bug during feature copy * update gpu kernels * update gpu code * initial port to LightGBM v2 * speedup GPU data loading process * Add 4-bit bin support to GPU * re-add sparse_threshold parameter * remove kMaxNumWorkgroups and allows an unlimited number of features * add feature mask support for skipping unused features * enable kernel cache * use GPU kernels withoug feature masks when all features are used * REAdme. * REAdme. * update README * fix typos (#349) * change compile to gcc on Apple as default * clean vscode related file * refine api of constructing from sampling data. * fix bug in the last commit. * more efficient algorithm to sample k from n. * fix bug in filter bin * change to boost from average output. * fix tests. * only stop training when all classes are finshed in multi-class. * limit the max tree output. change hessian in multi-class objective. * robust tree model loading. * fix test. * convert the probabilities to raw score in boost_from_average of classification. * fix the average label for binary classification. * Add boost_from_average to docs (#354) * don't use "ConvertToRawScore" for self-defined objective function. * boost_from_average seems doesn't work well in binary classification. remove it. * For a better jump link (#355) * Update Python-API.md * for a better jump in page A space is needed between `#` and the headers content according to Github's markdown format [guideline](https://guides.github.com/features/mastering-markdown/) After adding the spaces, we can jump to the exact position in page by click the link. * fixed something mentioned by @wxchan * Update Python-API.md * add FitByExistingTree. * adapt GPU tree learner for FitByExistingTree * avoid NaN output. * update boost.compute * fix typos (#361) * fix broken links (#359) * update README * disable GPU acceleration by default * fix image url * cleanup debug macro * remove old README * do not save sparse_threshold_ in FeatureGroup * add details for new GPU settings * ignore submodule when doing pep8 check * allocate workspace for at least one thread during builing Feature4 * move sparse_threshold to class Dataset * remove duplicated code in GPUTreeLearner::Split * Remove duplicated code in FindBestThresholds and BeforeFindBestSplit * do not rebuild ordered gradients and hessians for sparse features * support feature groups in GPUTreeLearner * Initial parallel learners with GPU support * add option device, cleanup code * clean up FindBestThresholds; add some omp parallel * constant hessian optimization for GPU * Fix GPUTreeLearner crash when there is zero feature * use np.testing.assert_almost_equal() to compare lists of floats in tests * travis for GPU
- Loading branch information
1 parent
db3d1f8
commit 0bb4a82
Showing
30 changed files
with
4,163 additions
and
246 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
[submodule "include/boost/compute"] | ||
path = compute | ||
url = https://github.com/boostorg/compute |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
#!/bin/bash | ||
|
||
# Original script from https://github.com/gregvw/amd_sdk/ | ||
|
||
# Location from which get nonce and file name from | ||
URL="http://developer.amd.com/tools-and-sdks/opencl-zone/opencl-tools-sdks/amd-accelerated-parallel-processing-app-sdk/" | ||
URLDOWN="http://developer.amd.com/amd-license-agreement-appsdk/" | ||
|
||
NONCE1_STRING='name="amd_developer_central_downloads_page_nonce"' | ||
FILE_STRING='name="f"' | ||
POSTID_STRING='name="post_id"' | ||
NONCE2_STRING='name="amd_developer_central_nonce"' | ||
|
||
#For newest FORM=`wget -qO - $URL | sed -n '/download-2/,/64-bit/p'` | ||
FORM=`wget -qO - $URL | sed -n '/download-5/,/64-bit/p'` | ||
|
||
# Get nonce from form | ||
NONCE1=`echo $FORM | awk -F ${NONCE1_STRING} '{print $2}'` | ||
NONCE1=`echo $NONCE1 | awk -F'"' '{print $2}'` | ||
echo $NONCE1 | ||
|
||
# get the postid | ||
POSTID=`echo $FORM | awk -F ${POSTID_STRING} '{print $2}'` | ||
POSTID=`echo $POSTID | awk -F'"' '{print $2}'` | ||
echo $POSTID | ||
|
||
# get file name | ||
FILE=`echo $FORM | awk -F ${FILE_STRING} '{print $2}'` | ||
FILE=`echo $FILE | awk -F'"' '{print $2}'` | ||
echo $FILE | ||
|
||
FORM=`wget -qO - $URLDOWN --post-data "amd_developer_central_downloads_page_nonce=${NONCE1}&f=${FILE}&post_id=${POSTID}"` | ||
|
||
NONCE2=`echo $FORM | awk -F ${NONCE2_STRING} '{print $2}'` | ||
NONCE2=`echo $NONCE2 | awk -F'"' '{print $2}'` | ||
echo $NONCE2 | ||
|
||
wget --content-disposition --trust-server-names $URLDOWN --post-data "amd_developer_central_nonce=${NONCE2}&f=${FILE}" -O AMD-SDK.tar.bz2; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.