scikit-learn-intelex integration #1316
Comments
Just following up on this to see if it would be of interest.
We have shifted development to TPOT2, a refactored version of TPOT1 that is hopefully easier to work with (we will pin something about it to the issues page soon). You can find it here: https://github.com/EpistasisLab/tpot2. But yes, I would be interested in exploring this, and I think option 2 makes the most sense. There are other similar accelerated packages we were considering, such as cuML, and option 2 would give them all the same interface.
Great, I can open a PR reflecting an integration along the lines of option 2 to continue the discussion here. I see that the configs in TPOT2 are formatted a bit differently than in the original library, and I am not seeing the cuML config or other similar custom ones. Any suggestions on how to approach this?
The configuration setup is different in TPOT2. Rather than a single configuration dictionary, TPOT2 takes in three: one each for the leaves, the roots, and the inner nodes. Additionally, we allow multiple configurations to be selected simultaneously and have broken the configuration dictionary up into modular pieces (selection, transformers, classifiers, regressors, etc.). Some configurations are also not fixed and depend on the shape of your dataset. More information on how to set this up can be found in tutorial 2 here. To add a custom configuration to TPOT2, a file defining the search space can be added to the configs folder here. Then an option can be added to this function to make it available to the TPOTEstimator. This approach could be used to add cuML or sklearnex support. We still need to add cuML to TPOT2; it is on the to-do list.
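To make the "add a search-space file, then register it as an option" pattern concrete, here is a minimal sketch. It is not TPOT2's actual code: the names SearchSpace, sklearn_classifiers, sklearnex_classifiers, CONFIG_BUILDERS, and get_search_space are all hypothetical, and search spaces are modeled as TPOT1-style dictionaries mapping a dotted estimator path to candidate hyperparameter values. It shows two properties mentioned above: several modular configs can be selected at once, and a range can depend on the dataset shape.

```python
from typing import Callable, Dict, List

# Hypothetical representation: dotted estimator path -> {param: candidate values}.
SearchSpace = Dict[str, Dict[str, list]]


def sklearn_classifiers(n_samples: int, n_features: int) -> SearchSpace:
    # Hypothetical stock config; shape-independent in this sketch.
    return {
        "sklearn.linear_model.LogisticRegression": {
            "C": [0.01, 0.1, 1.0, 10.0],
        },
    }


def sklearnex_classifiers(n_samples: int, n_features: int) -> SearchSpace:
    # Hypothetical sklearnex config. The k-NN range depends on the dataset
    # shape: n_neighbors must stay below the number of samples.
    max_k = max(1, min(50, n_samples - 1))
    return {
        "sklearnex.neighbors.KNeighborsClassifier": {
            "n_neighbors": list(range(1, max_k + 1)),
            "weights": ["uniform", "distance"],
        },
        "sklearnex.ensemble.RandomForestClassifier": {
            "n_estimators": [100],
            "max_features": [0.25, 0.5, 0.75, 1.0],
        },
    }


# Hypothetical registry: adding an entry here is the "add an option" step.
CONFIG_BUILDERS: Dict[str, Callable[[int, int], SearchSpace]] = {
    "classifiers": sklearn_classifiers,
    "sklearnex_classifiers": sklearnex_classifiers,
}


def get_search_space(names: List[str], n_samples: int, n_features: int) -> SearchSpace:
    """Merge the selected modular configs into one search space."""
    space: SearchSpace = {}
    for name in names:
        if name not in CONFIG_BUILDERS:
            raise ValueError(f"unknown config: {name!r}")
        space.update(CONFIG_BUILDERS[name](n_samples, n_features))
    return space
```

For example, get_search_space(["classifiers", "sklearnex_classifiers"], 200, 4) would return the stock LogisticRegression entry alongside the sklearnex entries, with n_neighbors capped at 50. A cuML config could be registered the same way.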
Context
The Intel(R) Extension for Scikit-learn (sklearnex) accelerates popular classical machine learning algorithms on both CPU and GPU. Because TPOT relies heavily on scikit-learn algorithms, we believe there are compelling reasons to integrate it in some form with sklearnex's optimized regression and classification algorithms. Initial experimentation has shown potential for significant performance improvements; see this Jupyter notebook for further detail.
Proposal
There are a few directions that this could go:
1. A use_sklearnex flag that users set when initializing their TPOT classifier or regressor, in which case their config would use sklearnex implementations of algorithms instead of the default sklearn implementations (where possible). See an example of what this might look like in the code backend here: fork, and how it could translate into performance improvements in the notebook.
2. … neural_network.MLPClassifier)

In either case, there would be corresponding docs and tests updates and an additional tutorial for a smooth integration, as well as any other additions you feel would be necessary.
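The use_sklearnex flag idea can be sketched as a pure config transformation on a TPOT1-style dictionary, whose keys are dotted import paths such as "sklearn.ensemble.RandomForestClassifier". This is a minimal sketch under stated assumptions, not the code in the linked fork: the helper name apply_use_sklearnex is hypothetical, and SKLEARNEX_AVAILABLE is an assumed, illustrative subset of estimators that sklearnex exposes at the same submodule path (the sklearnex documentation has the real list). Estimators without an equivalent, such as neural_network.MLPClassifier, are left on stock scikit-learn.

```python
import copy
from typing import Dict

# TPOT1-style config: dotted estimator path -> {hyperparameter: candidate values}.
Config = Dict[str, Dict[str, list]]

# Assumed, illustrative subset of sklearn estimators with a sklearnex
# counterpart at the same submodule path; verify against the sklearnex docs.
SKLEARNEX_AVAILABLE = {
    "sklearn.ensemble.RandomForestClassifier",
    "sklearn.ensemble.RandomForestRegressor",
    "sklearn.linear_model.LogisticRegression",
    "sklearn.neighbors.KNeighborsClassifier",
    "sklearn.svm.SVC",
}


def apply_use_sklearnex(config: Config, use_sklearnex: bool) -> Config:
    """Return a copy of config with sklearn paths swapped to sklearnex where possible."""
    if not use_sklearnex:
        return copy.deepcopy(config)
    swapped: Config = {}
    for path, params in config.items():
        if path in SKLEARNEX_AVAILABLE:
            # "sklearn.ensemble.X" -> "sklearnex.ensemble.X"
            path = "sklearnex." + path[len("sklearn."):]
        swapped[path] = copy.deepcopy(params)
    return swapped


# Hypothetical example config with one accelerated and one non-accelerated estimator.
example: Config = {
    "sklearn.ensemble.RandomForestClassifier": {"n_estimators": [100], "bootstrap": [True, False]},
    "sklearn.neural_network.MLPClassifier": {"alpha": [1e-4, 1e-3]},
}
```

Returning a copy rather than mutating the input keeps the default config intact, so the same base dictionary can serve both flag settings.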
Thank you for your consideration; I look forward to continuing this discussion.