This is more of a nitpick :) I think there is an implicit assumption that the types of the outcome_variable and treatment_variable(s) should be float. If we provide a dataframe to DoubleMLData where those variables are of type Decimal, the partialling out step fails with the error shown below. This is especially an issue when reading parquet files.
TypeError Traceback (most recent call last)
Cell In[36], line 1
----> 1 dml_plr.fit(n_jobs_cv = -1)
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/doubleml/double_ml.py:605, in DoubleML.fit(self, n_jobs_cv, store_predictions, external_predictions, store_models)
602 ext_prediction_dict[learner] = None
604 # ml estimation of nuisance models and computation of score elements
--> 605 score_elements, preds = self._nuisance_est(self.__smpls, n_jobs_cv,
606 external_predictions=ext_prediction_dict,
607 return_models=store_models)
609 self._set_score_elements(score_elements, self._i_rep, self._i_treat)
611 # calculate rmses and store predictions and targets of the nuisance models
File ~/anaconda3/envs/python3/lib/python3.10/site-packages/doubleml/double_ml_plr.py:231, in DoubleMLPLR._nuisance_est(self, smpls, n_jobs_cv, external_predictions, return_models)
226 g_hat = {'preds': external_predictions['ml_g'],
227 'targets': None,
228 'models': None}
229 else:
230 # get an initial estimate for theta using the partialling out score
--> 231 psi_a = -np.multiply(d - m_hat['preds'], d - m_hat['preds'])
232 psi_b = np.multiply(d - m_hat['preds'], y - l_hat['preds'])
233 theta_initial = -np.nanmean(psi_b) / np.nanmean(psi_a)
TypeError: unsupported operand type(s) for -: 'decimal.Decimal' and 'float'
Minimum reproducible code snippet
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from doubleml import DoubleMLData, DoubleMLPLR

# Parquet file whose outcome and treatment columns are read as decimal.Decimal
df = pd.read_parquet("/...")
x_cols = [x for x in df.columns if "pre_" in x]
d_col = "event_action"
y_col = "post_outcome"
dml_data = DoubleMLData(df, y_col=y_col, d_cols=d_col, x_cols=x_cols)

learner = RandomForestRegressor(n_jobs=-1)
lasso = LassoCV()
dml_plr = DoubleMLPLR(dml_data, ml_l=learner, ml_g=learner, ml_m=lasso, score="IV-type", n_folds=2)
dml_plr.fit(n_jobs_cv=-1)  # raises the TypeError shown above
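Since the parquet path above is elided, the failing operation can also be sketched without the file or DoubleML at all. The snippet below is only an illustration using the standard library: the helper `residuals` and the sample values are hypothetical, and it merely mimics the `d - m_hat['preds']` subtraction from the traceback, assuming the treatment values arrive as `decimal.Decimal` objects.

```python
from decimal import Decimal


def residuals(observed, predicted):
    """Element-wise observed - predicted, mimicking `d - m_hat['preds']`."""
    return [o - p for o, p in zip(observed, predicted)]


# Hypothetical treatment values, as they might come out of a decimal column.
d_decimal = [Decimal("1.50"), Decimal("0.25")]
# Learner predictions are plain Python floats.
m_hat = [1.0, 0.5]

try:
    residuals(d_decimal, m_hat)
    error_message = None
except TypeError as exc:
    # Decimal deliberately refuses arithmetic with float.
    error_message = str(exc)

# Casting the observed values to float first makes the subtraction valid.
d_float = [float(v) for v in d_decimal]
fixed = residuals(d_float, m_hat)

print(error_message)
print(fixed)
```

The first print shows the same message as the traceback; the second shows that the subtraction succeeds once the values are cast to float.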
Expected Result
Model should fit successfully.
Actual Result
Same traceback as in the description above, ending in:
TypeError: unsupported operand type(s) for -: 'decimal.Decimal' and 'float'
Versions
Linux-5.10.205-195.807.amzn2.x86_64-x86_64-with-glibc2.26
Python 3.10.13 | packaged by conda-forge | (main, Oct 26 2023, 18:07:37) [GCC 12.3.0]
DoubleML 0.7.1
Scikit-Learn 1.3.2
Thank you for highlighting this.
The predictions created by sklearn are of type float, so subtracting them from Decimal-typed outcome or treatment values in the partialling out step fails.
I will try to add casting of the outcome and treatment variables to float.
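Until such a cast exists in the library, a possible user-side workaround is to convert the affected columns before building the data object. This is a minimal sketch, not the fix the maintainers implemented: the helper `cast_to_float` and the toy values are hypothetical, and only the column names are taken from the snippet in the report.

```python
from decimal import Decimal

import pandas as pd


def cast_to_float(df, cols):
    """Return a copy of df with the given columns cast to float64."""
    return df.astype({col: "float64" for col in cols})


# Toy frame standing in for the parquet data; the values are made up.
df = pd.DataFrame({
    "event_action": [Decimal("1"), Decimal("0"), Decimal("1")],
    "post_outcome": [Decimal("2.5"), Decimal("1.0"), Decimal("3.25")],
    "pre_x1": [0.1, 0.2, 0.3],
})

# Cast only the treatment and outcome columns; covariates stay untouched.
clean = cast_to_float(df, ["event_action", "post_outcome"])
print(clean.dtypes)
```

The resulting `clean` frame can then be passed to DoubleMLData in place of the original one. Note that converting Decimal to float can lose precision for values that have no exact binary representation.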