-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python] remove unnecessary conversions between data structures #8546
Conversation
else: | ||
params = dict((k, v) for k, v in params.items()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flake8-comprehensions
warned about the use of dict()
here, but when I looked more closely, it seems like this line isn't necessary at all.
It's been in xgboost
for 6+ years (since #1142), so maybe something has changed since then.
But if the code enters this point, it's already guaranteed that params
is a dictionary.
xgboost/python-package/xgboost/training.py
Lines 512 to 518 in 67752e3
if isinstance(params, list): | |
_metrics = [x[1] for x in params if x[0] == 'eval_metric'] | |
params = dict(params) | |
if 'eval_metric' in params: | |
params['eval_metric'] = _metrics | |
else: | |
params = dict((k, v) for k, v in params.items()) |
since it can only be a dictionary or list
xgboost/python-package/xgboost/training.py
Line 398 in 67752e3
params: BoosterParam, |
xgboost/python-package/xgboost/_typing.py
Line 28 in 67752e3
BoosterParam = Union[List, Dict] # better be sequence |
And for something that is already a dictionary, I can't think of a situation where re-iterating over all the contents like this would produce a different dictionary.
params = {"a": 1, "b": [2, 3], "c": None}
new_params = dict((k, v) for k, v in params.items())
params == new_params
# True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trivialfis I found that this change was actually to blame for the failure of tests/python/test_early_stopping.py
, not the one from this comment thread: #8546 (comment).
That dict((k, v) ...
code was serving a second purpose...it ensured that the rest of xgboost.cv()
worked on a copy of params
, so calls like params.pop()
wouldn't have side-effects on the passed-in dictionary.
I fixed that in 849ea92.
There is one remaining test Python test failure (python/test_with_sklearn.py::test_pandas_input
) on macOS (build link) and one on Windows (build link).
assert isinstance(
clf_isotonic.calibrated_classifiers_[0].base_estimator, xgb.XGBClassifier
)
E AttributeError: '_CalibratedClassifier' object has no attribute 'base_estimator'
I think that might not be related to this PR's changes, given that I also see it on this other PR which only touches the R package: #8565 (build link).
I wonder if that's related to the newest release of scikit-learn
supporting pandas as an input and output (scikit-learn/scikit-learn#5523).
I noticed, for example, that the failing Mac build here is using scikit-learn=1.2.0
, which was just released today (PyPI link).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully #8580 will fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the cleanup! Could you please rebase onto or merge the latest branch?
Sure! Just merged in I chose to merge instead of rebase since this repo uses squash merging. |
python-package/xgboost/core.py
Outdated
@@ -1160,7 +1160,7 @@ def feature_names(self, feature_names: Optional[FeatureNames]) -> None: | |||
raise ValueError(msg) | |||
# prohibit to use symbols may affect to parse. e.g. []< | |||
if not all(isinstance(f, str) and | |||
not any(x in f for x in set(('[', ']', '<'))) | |||
not any(x in f for x in {'[', ']', '<'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not any(x in f for x in {'[', ']', '<'}) | |
not any(x in f for x in ['[', ']', '<']) |
pylint
rightly caught that since this is iterating over values, not doing a set comparison, an iterable type should be used here instead of a set.
/home/runner/work/xgboost/xgboost/python-package/xgboost/core.py:1163: convention (C0208, use-sequence-for-iteration, DMatrix.feature_names) Use a sequence type when iterating over values
build link: https://github.com/dmlc/xgboost/actions/runs/3635213908/jobs/6134104007
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, the change seems to be causing failure in tests. can you reproduce the failure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I'm not sure why. I can't see how iterating over a list of strings should be different (in a way that matters for this code) than iterating over a set of strings.
I just merged latest master
into this branch, in case the test failures aren't related to this PR's changes. If they still fail, I'll try to reproduce this locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yes, I was able to reproduce the failing tests locally.
cd python-package
pip install .
cd ..
pytest tests/python/test_early_stopping.py
I'll try to figure out which of these changes is to blame, sorry about that!
let me fix the failure with updated sklearn tomorrow. |
apologies for the push. I need to bring in the CI fixes and hope you don't mind. |
no problem, do what you need! And let me know if I can help in any way. |
I ran
flake8
over the code in the Python package, withflake8-comprehensions
installed.This found a few opportunities for small optimizations. For example, the results of
set([1, 2, 3])
and{1, 2, 3}
are equivalent, but theset([])
form has the extra cost of construction an intermediate list (which is immediately thrown away).This PR proposes resolving those warnings, for some tiny efficiency improvements.
Thanks for your time and consideration.