Compatibility with H2O-3 #3

Open
FavioVazquez opened this issue Apr 7, 2022 · 6 comments

Comments

@FavioVazquez

Hi guys! Amazing job you did with this package. I work at H2O.ai and I'd like to know how I can help make this compatible with our open source AutoML solution.

Let me know how we can get started helping you with this :)

@Ennosigaeon
Owner

Ennosigaeon commented Apr 8, 2022

Thanks for the positive feedback!

To integrate H2O-3, three pieces of information are needed:

  1. You would have to provide the set of configurations/pipelines evaluated over time that are supposed to be visualized.
  2. If additional insights about specific configurations should be displayed, access to the fitted models is also necessary for some on-the-fly predictions.
  3. Access to the train/test data set.

The logic for integrating frameworks is implemented in the adapter package. These adapters are responsible for converting an arbitrary object to a RunHistory.
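As a rough illustration of the adapter idea, here is a minimal sketch of converting framework-specific results into a common structure. Everything in it is hypothetical: `Evaluation`, `SimpleRunHistory`, `to_run_history`, and the assumed layout of `raw_results` are stand-ins for illustration and are not the package's actual RunHistory or adapter API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Evaluation:
    """One evaluated configuration (hypothetical record type)."""
    timestamp: float          # time at which the configuration was evaluated
    config: Dict[str, Any]    # hyperparameters of the evaluated pipeline
    score: float              # metric reported by the framework (higher is better)
    model: Any = None         # fitted model, needed for on-the-fly predictions


@dataclass
class SimpleRunHistory:
    """Stand-in for the RunHistory mentioned above; the real class may differ."""
    evaluations: List[Evaluation] = field(default_factory=list)
    X_test: Any = None
    y_test: Any = None

    def best(self) -> Evaluation:
        # Return the evaluation with the highest score.
        return max(self.evaluations, key=lambda e: e.score)


def to_run_history(raw_results: Iterable[Dict[str, Any]],
                   X_test: Any, y_test: Any) -> SimpleRunHistory:
    """Convert framework-specific results into the common structure.

    Assumption: each entry of `raw_results` is a dict with the keys
    'time', 'params', 'score' and, optionally, 'model'.
    """
    evaluations = [
        Evaluation(r["time"], dict(r["params"]), r["score"], r.get("model"))
        for r in raw_results
    ]
    # Keep the evaluations in chronological order for visualization over time.
    evaluations.sort(key=lambda e: e.timestamp)
    return SimpleRunHistory(evaluations, X_test, y_test)
```

The three fields map onto the three requirements listed above: the evaluations over time, the fitted models, and the test data.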

I will try to provide a base implementation for H2O next week and will come back to you if I need help with extracting the required information from H2O.

@FavioVazquez
Author

Thanks! Please let me know how I can help. We can even set up a Zoom call, we are very interested :)

@Ennosigaeon
Owner

@FavioVazquez I have prepared a first draft for H2O (see the H2O example).

I am currently struggling a bit with three points related to the underlying search space:

  1. Is there some way to get an overview of all available models that are going to be evaluated during the Grid/Random search?
  2. Is there a generic way to obtain the available hyperparameters of each estimator? For example, the GBM class mixes actual hyperparameters (like ntrees or max_depth) with "meta-parameters" like training_frame.
  3. Do you perform any kind of preprocessing that should be displayed in the pipeline overview?
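To make the distinction in question 2 concrete, the following sketch separates tunable hyperparameters from "meta-parameters" using a hand-maintained set of names. It is plain Python and does not call H2O. The function `split_parameters` and the set `META_PARAMETERS` are hypothetical, and only training_frame comes from the question itself; the other names in the set are illustrative assumptions.

```python
from typing import Any, Dict, Set, Tuple

# Assumption: names that configure the training run rather than the model.
# Only "training_frame" is taken from the discussion; the rest are illustrative.
META_PARAMETERS: Set[str] = {"training_frame", "validation_frame", "model_id"}


def split_parameters(
    params: Dict[str, Any],
    meta_names: Set[str] = META_PARAMETERS,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a parameter dict into (hyperparameters, meta-parameters)."""
    hyper = {k: v for k, v in params.items() if k not in meta_names}
    meta = {k: v for k, v in params.items() if k in meta_names}
    return hyper, meta


hyper, meta = split_parameters(
    {"ntrees": 50, "max_depth": 5, "training_frame": "train.hex"}
)
```

A hand-maintained list like this is the fallback when the framework offers no generic way to tell the two kinds of parameters apart, which is exactly what the question asks about.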

@FavioVazquez
Author

Hi @Ennosigaeon, sorry for the delay. I'm working with the development team to answer all of your questions. Do you need anything else from our side?

@FavioVazquez
Author

@Ennosigaeon here are the answers:

  1. Grid/Random Search needs to be defined by the user, so they have to explicitly choose which algorithm they would like to tune. However, I don't think we have a list of options in our Python API docs.
  2. You can see all the meta-parameters that will be tuned by looking at the function, which lists the parameters. If users are not sure which parameters they should tune, it might be helpful for them to read over what AutoML tunes (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html#random-grid-search-parameters), or to use AutoML instead, which will do the automatic grid search for each of the common algorithms.
  3. No we don't.
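Since the search space is defined by the user, an adapter would have to record whatever grid the user supplied. The sketch below shows, in plain Python and without calling H2O, how a user-defined space expands into candidate configurations for an exhaustive grid search and for a random search. The function names, the `max_models` argument, and the example space are assumptions made for illustration.

```python
import itertools
import random
from typing import Any, Dict, Iterator, List


def cartesian_grid(space: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of the user-defined hyperparameter values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_grid(space: Dict[str, List[Any]], max_models: int,
                seed: int = 0) -> List[Dict[str, Any]]:
    """Draw at most `max_models` distinct configurations at random."""
    candidates = list(cartesian_grid(space))
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.sample(candidates, min(max_models, len(candidates)))


# Hypothetical search space for a single algorithm chosen by the user.
space = {"ntrees": [50, 100], "max_depth": [3, 5, 7]}
all_configs = list(cartesian_grid(space))       # 2 * 3 = 6 configurations
some_configs = random_grid(space, max_models=4)  # 4 of the 6, chosen at random
```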

@FavioVazquez
Author

By the way, we don't do data pre-processing in the grid search, but you can use H2O-3 for data munging etc.; we have functions for that. @Ennosigaeon
