Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Weighted Leaves Default to False #34

Merged
merged 2 commits into from
Feb 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 4 additions & 5 deletions docs/user_guide.rst
Original file line number Diff line number Diff line change
Expand Up @@ -119,17 +119,16 @@ Multi-target quantile regression is also supported. If the target values are mul
Quantile Weighting
~~~~~~~~~~~~~~~~~~

By default, the predict method calculates quantiles by weighting each sample inversely according to the size of its leaf node (`weighted_leaves = True`). If `weighted_leaves = False`, each sample in a leaf (including repeated bootstrap samples) will be given equal weight. Note that this leaf-based weighting can only be used with weighted quantiles.

By default, the predict method calculates quantiles using a weighted quantile method (`weighted_quantile = True`), which assigns a weight to each sample in the training set based on the number of times that it co-occurs in the same leaves as the test sample. When the number of samples in the training set is larger than the expected size of this list (i.e., :math:`n_{train} \gg n_{trees} \cdot n_{leaves} \cdot n_{leafsamples}`), it can be more efficient to calculate an unweighted quantile (`weighted_quantile = False`), which aggregates the list of training `y` values for each leaf node to which the test sample belongs across all trees. For a given input, both methods can return the same output values::

>>> import numpy as np
>>> kwargs = {"weighted_leaves": False}
>>> y_pred_weighted = reg.predict(X_test, weighted_quantile=True, **kwargs)
>>> y_pred_unweighted = reg.predict(X_test, weighted_quantile=False, **kwargs)
>>> y_pred_weighted = reg.predict(X_test, weighted_quantile=True)
>>> y_pred_unweighted = reg.predict(X_test, weighted_quantile=False)
>>> np.allclose(y_pred_weighted, y_pred_unweighted)
True

By default, the predict method calculates quantiles by giving each sample in a leaf (including repeated bootstrap samples) equal weight (`weighted_leaves = False`). If `weighted_leaves = True`, each sample will be weighted inversely according to the size of its leaf node. Note that this leaf-based weighting can only be used with weighted quantiles.

Out-of-Bag Estimation
~~~~~~~~~~~~~~~~~~~~~

Expand Down
4 changes: 2 additions & 2 deletions quantile_forest/_quantile_forest.py
Original file line number Diff line number Diff line change
Expand Up @@ -449,7 +449,7 @@ def predict(
quantiles=None,
interpolation="linear",
weighted_quantile=True,
weighted_leaves=True,
weighted_leaves=False,
aggregate_leaves_first=True,
oob_score=False,
indices=None,
Expand Down Expand Up @@ -490,7 +490,7 @@ def predict(
number of training samples relative to siblings is small, weighted
quantiles can be more efficient to compute than unweighted ones.

weighted_leaves : bool, default=True
weighted_leaves : bool, default=False
Weight samples inversely to the size of their leaf node.
Only used if `weighted_quantile=True` and `max_samples_leaf!=1`.

Expand Down
4 changes: 2 additions & 2 deletions quantile_forest/_quantile_forest_fast.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -617,7 +617,7 @@ cdef class QuantileForest:
UINT8_t[:, :] X_indices=None,
char* interpolation=b"linear",
bint weighted_quantile=<bint>True,
bint weighted_leaves=<bint>True,
bint weighted_leaves=<bint>False,
bint aggregate_leaves_first=<bint>True,
):
"""Return predictions for ``est.apply`` outputs.
Expand All @@ -644,7 +644,7 @@ cdef class QuantileForest:
weighted_quantile : bool, default=True
Calculate weighted quantiles.

weighted_leaves : bool, default=True
weighted_leaves : bool, default=False
Weight samples inversely to the size of their leaf node.

aggregate_leaves_first : bool, default=True
Expand Down