[Python package] Weighted Feature Sampling in LGBM Random Forest Models #6129

Closed
LaneMatthewJ opened this issue Oct 6, 2023 · 3 comments

Comments

@LaneMatthewJ

Summary

If reasonably feasible, I would like to ask the dev team to add weighted feature sampling to the LightGBM Python API. Weighted feature sampling is an integral part of the iterative random forest algorithm defined by Basu et al. (2018).

Motivation

Iterative random forests with weighted feature sampling can improve the overall accuracy of gene regulatory networks derived from random-forest-based algorithms. We would like to use the LightGBM implementation of random forest to build these networks.

Description

By implementing a feature_weight parameter, users could pass in a probability vector specifying how often each feature should be sampled during the feature sampling stage of the random forest. The default probability vector would be uniform, while subsequent vectors could be user-defined.
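To make the request concrete, here is a minimal sketch of the sampling step in plain NumPy. It is not LightGBM code: `feature_weight` is only the parameter name proposed in this issue, `sample_features` is a hypothetical helper, and how LightGBM would actually wire this into its feature sampling is an open question.

```python
import numpy as np


def sample_features(n_features, n_sampled, feature_weight=None, rng=None):
    """Draw n_sampled distinct feature indices, weighted by feature_weight.

    Hypothetical helper for illustration only; not part of LightGBM.
    """
    rng = np.random.default_rng() if rng is None else rng

    if feature_weight is None:
        # Default: every feature is equally likely to be drawn.
        p = np.full(n_features, 1.0 / n_features)
    else:
        p = np.asarray(feature_weight, dtype=float)
        if p.shape != (n_features,):
            raise ValueError("feature_weight must have one entry per feature")
        if np.any(p < 0) or p.sum() <= 0:
            raise ValueError("feature_weight must be non-negative with a positive sum")
        # Normalize so the weights form a probability vector.
        p = p / p.sum()

    if n_sampled > np.count_nonzero(p):
        raise ValueError("not enough features with non-zero weight")

    # Weighted sampling without replacement.
    return rng.choice(n_features, size=n_sampled, replace=False, p=p)


# Example: features 2 and 3 carry all of the weight, so only they can be drawn.
print(sample_features(5, 2, feature_weight=[0, 0, 1, 1, 0]))
```

In an iterative random forest, the weights for the next round would typically come from the feature importances of the forest fitted in the previous round, starting from the uniform default.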

References

Basu, S., Kumbier, K., Brown, J. B., & Yu, B. (2018). Iterative random forests to discover predictive and stable high-order interactions. In Proceedings of the National Academy of Sciences (Vol. 115, Issue 8, pp. 1943–1948). Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1711236115

Walker, A. M., Cliff, A., Romero, J., Shah, M. B., Jones, P., Felipe Machado Gazolla, J. G., Jacobson, D. A., & Kainer, D. (2022). Evaluating the performance of random forest and iterative random forest based methods when applied to gene expression data. In Computational and Structural Biotechnology Journal (Vol. 20, pp. 3372–3386). Elsevier BV. https://doi.org/10.1016/j.csbj.2022.06.037

@jameslamb
Collaborator

Thanks very much for your interest in LightGBM and for the excellent write-up with references!

However, this request is already captured in other feature requests.

It seems that what you're asking for is identical to #4605, just more specific: this request is limited to Random Forest mode, whereas #4605 asks for general-purpose, finer-grained control of feature sampling.

I'm going to mark this as duplicate, close it, and post on #4605 referring to it. Please add any other thoughts (or offers to help, if you'd like to try implementing this!) there.

@LaneMatthewJ
Author

Ah! When I searched for other implementations I had specifically searched for "weighted feature sampling".

Thanks for linking those other requests @jameslamb!


github-actions bot commented Oct 9, 2024

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 9, 2024