[python] remove unnecessary conversions between data structures #8546

jameslamb · 2022-12-05T17:19:55Z

I ran flake8 over the code in the Python package, with flake8-comprehensions installed.

pip install flake8 flake8-comprehensions
flake8 \
    --ignore=B,F,E,W \
    --exclude='python-package/build' \
    .

This found a few opportunities for small optimizations. For example, the results of set([1, 2, 3]) and {1, 2, 3} are equivalent, but the set([]) form has the extra cost of construction an intermediate list (which is immediately thrown away).

This PR proposes resolving those warnings, for some tiny efficiency improvements.

./python-package/setup.py:282:51: C400 Unnecessary generator - rewrite as a list comprehension.
./python-package/xgboost/core.py:1163:48: C405 Unnecessary tuple literal - rewrite as a set literal.
./python-package/xgboost/core.py:1511:18: C402 Unnecessary generator - rewrite as a dict comprehension.
./python-package/xgboost/tracker.py:144:23: C405 Unnecessary list literal - rewrite as a set literal.
./python-package/xgboost/tracker.py:258:23: C405 Unnecessary list literal - rewrite as a set literal.
./python-package/xgboost/training.py:518:18: C402 Unnecessary generator - rewrite as a dict comprehension.

Thanks for your time and consideration.

jameslamb · 2022-12-05T17:23:42Z

python-package/xgboost/training.py

-    else:
-        params = dict((k, v) for k, v in params.items())


flake8-comprehensions warned about the use of dict() here, but when I looked more closely, it seems like this line isn't necessary at all.

It's been in xgboost for 6+ years (since #1142), so maybe something has changed since then.

But if the code enters this point, it's already guaranteed that params is a dictionary.

xgboost/python-package/xgboost/training.py

Lines 512 to 518 in 67752e3

if isinstance(params, list):

_metrics = [x[1] for x in params if x[0] == 'eval_metric']

params = dict(params)

if 'eval_metric' in params:

params['eval_metric'] = _metrics

else:

params = dict((k, v) for k, v in params.items())

since it can only be a dictionary or list

xgboost/python-package/xgboost/training.py

Line 398 in 67752e3

params: BoosterParam,

xgboost/python-package/xgboost/_typing.py

Line 28 in 67752e3

BoosterParam = Union[List, Dict] # better be sequence

And for something that is already a dictionary, I can't think of a situation where re-iterating over all the contents like this would produce a different dictionary.

params = {"a": 1, "b": [2, 3], "c": None} new_params = dict((k, v) for k, v in params.items()) params == new_params # True

Thank you for catching that!

@trivialfis I found that this change was actually to blame for the failure of tests/python/test_early_stopping.py, not the one from this comment thread: #8546 (comment).

That dict((k, v) ... code was serving a second purpose...it ensured that the rest of xgboost.cv() worked on a copy of params, so calls like params.pop() wouldn't have side-effects on the passed-in dictionary.

I fixed that in 849ea92.

There is one remaining test Python test failure (python/test_with_sklearn.py::test_pandas_input) on macOS (build link) and one on Windows (build link).

assert isinstance( clf_isotonic.calibrated_classifiers_[0].base_estimator, xgb.XGBClassifier ) E AttributeError: '_CalibratedClassifier' object has no attribute 'base_estimator'

I think that might not be related to this PR's changes, given that I also see it on this other PR which only touches the R package: #8565 (build link).

I wonder if that's related to the newest release of scikit-learn supporting pandas as an input and output (scikit-learn/scikit-learn#5523).

I noticed, for example, that the failing Mac build here is using scikit-learn=1.2.0, which was just released today (PyPI link).

Hopefully #8580 will fix this.

trivialfis

Thank you for the cleanup! Could you please rebase onto or merge the latest branch?

jameslamb · 2022-12-07T02:16:22Z

Could you please rebase onto or merge the latest branch

Sure! Just merged in master.

I chose to merge instead of rebase since this repo uses squash merging.

jameslamb · 2022-12-07T13:40:13Z

python-package/xgboost/core.py

@@ -1160,7 +1160,7 @@ def feature_names(self, feature_names: Optional[FeatureNames]) -> None:
                raise ValueError(msg)
            # prohibit to use symbols may affect to parse. e.g. []<
            if not all(isinstance(f, str) and
-                       not any(x in f for x in set(('[', ']', '<')))
+                       not any(x in f for x in {'[', ']', '<'})


Suggested change

not any(x in f for x in {'[', ']', '<'})

not any(x in f for x in ['[', ']', '<'])

pylint rightly caught that since this is iterating over values, not doing a set comparison, an iterable type should be used here instead of a set.

/home/runner/work/xgboost/xgboost/python-package/xgboost/core.py:1163: convention (C0208, use-sequence-for-iteration, DMatrix.feature_names) Use a sequence type when iterating over values

build link: https://github.com/dmlc/xgboost/actions/runs/3635213908/jobs/6134104007

hmm, the change seems to be causing failure in tests. can you reproduce the failure?

Interesting, I'm not sure why. I can't see how iterating over a list of strings should be different (in a way that matters for this code) than iterating over a set of strings.

I just merged latest master into this branch, in case the test failures aren't related to this PR's changes. If they still fail, I'll try to reproduce this locally.

Ok yes, I was able to reproduce the failing tests locally.

cd python-package pip install . cd .. pytest tests/python/test_early_stopping.py

I'll try to figure out which of these changes is to blame, sorry about that!

trivialfis · 2022-12-09T19:06:43Z

let me fix the failure with updated sklearn tomorrow.

trivialfis · 2022-12-13T19:49:23Z

apologies for the push. I need to bring in the CI fixes and hope you don't mind.

jameslamb · 2022-12-14T00:08:15Z

no problem, do what you need! And let me know if I can help in any way.

[python] remove unnecessary conversions between data structures

8f19727

jameslamb commented Dec 5, 2022

View reviewed changes

jameslamb added 3 commits December 5, 2022 22:19

fix mypy errors

d7f8f50

Merge branch 'master' into python/comprehensions

a4d8448

revert unnecessary change

4a66751

trivialfis approved these changes Dec 6, 2022

View reviewed changes

Merge branch 'master' into python/comprehensions

6f5597b

jameslamb commented Dec 7, 2022

View reviewed changes

jameslamb added 3 commits December 7, 2022 07:40

Update python-package/xgboost/core.py

62ac34a

Merge branch 'master' into python/comprehensions

9273fca

ensure that cv() doesn't have side effects on params

849ea92

Merge branch 'master' into python/comprehensions

622bad5

trivialfis merged commit 06ea6c7 into dmlc:master Dec 14, 2022

jameslamb deleted the python/comprehensions branch December 14, 2022 12:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[python] remove unnecessary conversions between data structures #8546

[python] remove unnecessary conversions between data structures #8546

jameslamb commented Dec 5, 2022 •

edited

Loading

jameslamb Dec 5, 2022 •

edited

Loading

trivialfis Dec 6, 2022

jameslamb Dec 9, 2022

jameslamb Dec 12, 2022

trivialfis left a comment

jameslamb commented Dec 7, 2022

jameslamb Dec 7, 2022 •

edited

Loading

trivialfis Dec 8, 2022 •

edited

Loading

jameslamb Dec 8, 2022

jameslamb Dec 9, 2022

trivialfis commented Dec 9, 2022

trivialfis commented Dec 13, 2022

jameslamb commented Dec 14, 2022

	if isinstance(params, list):
	_metrics = [x[1] for x in params if x[0] == 'eval_metric']
	params = dict(params)
	if 'eval_metric' in params:
	params['eval_metric'] = _metrics
	else:
	params = dict((k, v) for k, v in params.items())

	not any(x in f for x in {'[', ']', '<'})
	not any(x in f for x in ['[', ']', '<'])

[python] remove unnecessary conversions between data structures #8546

[python] remove unnecessary conversions between data structures #8546

Conversation

jameslamb commented Dec 5, 2022 • edited Loading

jameslamb Dec 5, 2022 • edited Loading

Choose a reason for hiding this comment

trivialfis Dec 6, 2022

Choose a reason for hiding this comment

jameslamb Dec 9, 2022

Choose a reason for hiding this comment

jameslamb Dec 12, 2022

Choose a reason for hiding this comment

trivialfis left a comment

Choose a reason for hiding this comment

jameslamb commented Dec 7, 2022

jameslamb Dec 7, 2022 • edited Loading

Choose a reason for hiding this comment

trivialfis Dec 8, 2022 • edited Loading

Choose a reason for hiding this comment

jameslamb Dec 8, 2022

Choose a reason for hiding this comment

jameslamb Dec 9, 2022

Choose a reason for hiding this comment

trivialfis commented Dec 9, 2022

trivialfis commented Dec 13, 2022

jameslamb commented Dec 14, 2022

jameslamb commented Dec 5, 2022 •

edited

Loading

jameslamb Dec 5, 2022 •

edited

Loading

jameslamb Dec 7, 2022 •

edited

Loading

trivialfis Dec 8, 2022 •

edited

Loading