
Prevent training without setting up caches. #4066

Merged: 2 commits into dmlc:master on Feb 3, 2019

Conversation

@trivialfis (Member) commented Jan 19, 2019

* Add warning for internal functions.
* Check number of features.
@trivialfis (Member, Author) commented Jan 19, 2019

Calling `Booster.update` without first initializing the `Booster` with a cache containing `dtrain` results in `num_feature == 0`. I could raise an error in Python, but checking it in C++ seems more reliable. The added documentation effectively marks `Booster.update` and `Booster.boost` as internal methods.

@RAMitchell, @hcho3: could you take a look and see if there is a better approach?
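
For context, a minimal sketch of the failure mode at the C API level — assuming the xgboost C API of the time; error-code checks are omitted and `train.libsvm` is a placeholder file:

```cpp
#include <xgboost/c_api.h>

int main() {
  DMatrixHandle dtrain;
  XGDMatrixCreateFromFile("train.libsvm", /*silent=*/0, &dtrain);

  // Booster created with an EMPTY cache list: the learner never sees
  // dtrain, so its num_feature stays 0.
  BoosterHandle booster;
  XGBoosterCreate(/*dmats=*/nullptr, /*len=*/0, &booster);

  // Before this PR, this silently trained with num_feature == 0 and a
  // later predict could crash; the added C++ check rejects it instead.
  XGBoosterUpdateOneIter(booster, /*iter=*/0, dtrain);

  XGBoosterFree(booster);
  XGDMatrixFree(dtrain);
  return 0;
}
```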

@codecov-io commented Jan 19, 2019

Codecov Report

Merging #4066 into master will decrease coverage by 0.01%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master    #4066      +/-   ##
==========================================
- Coverage   60.56%   60.55%   -0.02%     
==========================================
  Files         130      130              
  Lines       11756    11758       +2     
==========================================
  Hits         7120     7120              
- Misses       4636     4638       +2
| Impacted Files | Coverage Δ |
| --- | --- |
| python-package/xgboost/core.py | 77.38% <ø> (ø) ⬆️ |
| src/learner.cc | 26.23% <0%> (-0.14%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1fc37e4...5eb543a.

@RAMitchell (Member) left a comment


Can we really say "don't use this" about something that is documented as part of the public API? Maybe it's better to say 'power users only' or something.

@trivialfis (Member, Author) commented

> Can we really say "don't use this" about something that is documented as part of the public API? Maybe it's better to say 'power users only' or something.

Actually, I don't like my solution. Let me see if I can safely make new datasets part of the caches.

@trivialfis (Member, Author) commented

@RAMitchell It turns out I can't make the incoming training dataset part of the caches, for the following reason:

In the C API, the DMatrix handle is a `std::shared_ptr`, while `Learner::UpdateOneIter` accepts a raw pointer. Every time the C API's `XGBoosterUpdateOneIter` calls `Learner::UpdateOneIter`, it first calls `shared_ptr::get()` to pass the raw pointer. So there is no way for the `Learner`'s caches to obtain ownership of this DMatrix without changing `Learner`'s interface; hence it cannot become part of the caches (which are a vector of shared pointers).
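
To make the ownership mismatch concrete, here is a simplified sketch — the types are reduced stand-ins, not the actual xgboost sources, and `CApiUpdateOneIter` is a hypothetical stand-in for the C API wrapper around `XGBoosterUpdateOneIter`:

```cpp
#include <memory>
#include <vector>

struct DMatrix { /* training data */ };

class Learner {
 public:
  // The interface accepts only a raw pointer, so the Learner cannot
  // take shared ownership of `train` without an interface change.
  void UpdateOneIter(int iter, DMatrix* train) { /* train one iteration */ }

 private:
  // The caches own their DMatrix objects through shared_ptr.
  std::vector<std::shared_ptr<DMatrix>> cache_;
};

// Stand-in for the C API layer: the handle holds a shared_ptr, but
// only the raw pointer crosses into Learner, so ownership can never
// reach cache_.
void CApiUpdateOneIter(Learner* learner, int iter,
                       std::shared_ptr<DMatrix>& handle) {
  learner->UpdateOneIter(iter, handle.get());
}
```

With this shape, the only ways to get the training DMatrix into the caches are to change `UpdateOneIter` to accept a `shared_ptr` (an interface break) or to copy the data — which is the dead end described above.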

And no, it's not "power users only". I'm a power user (I think :) ), and even I don't know how to make it work other than by making a copy of the abstracted APIs.

@trivialfis (Member, Author) commented

I will go ahead and merge this if there are no objections. @RAMitchell @hcho3

@hcho3 (Collaborator) commented Jan 31, 2019

Go ahead.

@trivialfis (Member, Author) commented

@thvasilo Thanks for the pointer.

@hcho3 merged commit 1088dff into dmlc:master on Feb 3, 2019.
@trivialfis deleted the fix/num_feature branch on Feb 3, 2019.
The lock bot locked this conversation as resolved and limited it to collaborators on May 4, 2019.
Labels: none yet
Projects: none yet

Development: successfully merging this pull request may close these issues:
- [Blocking] python kernel failed when call Booster().predict
5 participants