Starting the change for XGBoost integration into EVADb. #1232
The unit test I added is failing with the following error:
I will need to check this further.
Please merge the latest staging. Thanks!
22978c4 to 3ff70a9
Thanks @jineetd. I will add the dependency requirements.
Hi @jineetd, please update the documentation. You can use https://evadb.readthedocs.io/en/stable/source/reference/ai/model-train-sklearn.html as the reference.
Sure @xzdandy, I will update the documentation for XGBoost.
evadb/functions/xgboost.py
def forward(self, frames: pd.DataFrame) -> pd.DataFrame:
    # Last column is the value to predict, hence don't pass that to the
    # predict method.
    predictions = self.model.predict(frames.iloc[:, :-1])
I don't think there is any guarantee that the last column is the value to predict in this case. We need to store the column to predict. You can check the Ludwig integration again for an example.
But it seems that the auto_train methods for Ludwig and XGBoost are different. In Ludwig, you provide the entire dataset (X + Y) to the auto train method and then specify the column that acts as Y, whereas the XGBoost auto train takes the feature matrix X and the prediction variable Y separately.
Could you elaborate on what the difference is?
What I meant originally is that the following query will not work, because the predict column is not the last one:
CREATE FUNCTION IF NOT EXISTS PredictRent FROM
( SELECT number_of_rooms, number_of_bathrooms, days_on_market, rental_price FROM HomeRentals )
TYPE XGBoost
PREDICT 'number_of_rooms';
As discussed, we now pass the prediction column to the .py model files.
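For readers following the thread, here is a minimal sketch of what dropping the target by name (rather than by position) could look like. It is not the code merged in this PR: `XGBoostFunctionSketch`, `ConstantModel`, and the `predict_col` attribute are hypothetical names used only for illustration, and the model is a stub rather than a trained XGBoost estimator.

```python
import pandas as pd


class ConstantModel:
    """Stand-in for a trained regressor; it records the columns it was given."""

    def predict(self, features: pd.DataFrame) -> list:
        self.seen_columns = list(features.columns)
        return [0.0] * len(features)


class XGBoostFunctionSketch:
    """Hypothetical wrapper that stores the target column name at setup time."""

    def __init__(self, model, predict_col: str):
        self.model = model
        self.predict_col = predict_col

    def forward(self, frames: pd.DataFrame) -> pd.DataFrame:
        # Drop the target by name, wherever it sits; ignore it if it is absent.
        features = frames.drop(columns=[self.predict_col], errors="ignore")
        predictions = self.model.predict(features)
        return pd.DataFrame({self.predict_col: predictions}, index=frames.index)


# Same column order as the query above: the target is the FIRST column here.
frames = pd.DataFrame(
    {
        "number_of_rooms": [2, 3],
        "number_of_bathrooms": [1, 2],
        "days_on_market": [10, 4],
        "rental_price": [1500.0, 2300.0],
    }
)
model = ConstantModel()
func = XGBoostFunctionSketch(model, predict_col="number_of_rooms")
result = func.forward(frames)
```

With this shape, `PREDICT 'number_of_rooms'` behaves the same as `PREDICT 'rental_price'`, because the position of the target column no longer matters.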
PREDICT 'rental_price';

In the above query, you are creating a new customized function by training a model from the ``HomeRentals`` table using the ``Flaml XGBoost`` framework.
The ``rental_price`` column will be the target column for prediction, while the remaining columns from the ``SELECT`` query are the inputs.
We need to add documentation on all the parameters XGBoost supports. time_limit and metric are the two parameters we support now.
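As a rough illustration of how those two parameters might be validated and turned into training settings, here is a small sketch. Everything in it is an assumption made for the example: `build_train_settings` is a hypothetical helper, the default values are placeholders rather than EvaDB's actual defaults, and the output keys simply mirror keyword names that FLAML's `AutoML.fit` accepts (`time_budget`, `metric`, `task`, `estimator_list`).

```python
SUPPORTED_PARAMS = {"time_limit", "metric"}

# Placeholder defaults, assumed for this sketch only.
DEFAULTS = {"time_limit": 120, "metric": "r2"}


def build_train_settings(user_args: dict) -> dict:
    """Validate user-supplied parameters and map them to training settings."""
    unknown = set(user_args) - SUPPORTED_PARAMS
    if unknown:
        raise ValueError(f"Unsupported XGBoost parameters: {sorted(unknown)}")

    merged = {**DEFAULTS, **user_args}
    return {
        "time_budget": int(merged["time_limit"]),  # seconds
        "metric": merged["metric"],
        "task": "regression",  # assumed; the rental-price example is regression
        "estimator_list": ["xgboost"],
    }


settings = build_train_settings({"time_limit": 60})
```

Rejecting unknown keys up front keeps the documented parameter list and the accepted parameter list from drifting apart.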
c3c3582 to 57990df
Thanks! Could you fix the merge conflicts?
With this we won't have to rely on the last column always being the prediction column.
57990df to 85033c3
…-db#1232) Co-authored-by: Jineet Desai <[email protected]> Co-authored-by: Andy Xu <[email protected]>
No description provided.