Statsmodels api #23

Merged
merged 7 commits into from
Oct 19, 2022

@s3alfisc (Member) commented Oct 15, 2022

I have written a first 'API' for statsmodels: a `wildboottest()` function that works similarly to `fwildclusterboot::boottest()`. Because the bootstrap algorithm, as implemented, does not produce a bootstrapped vcov matrix, I think it is best not to expose it via `get_robustcov_results()` but to write a custom function that works with OLS models estimated in statsmodels. `wildboottest()` is a first draft, and I am open to any changes you suggest @amichuda.
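As a reference point for what such a custom function has to compute, here is a minimal NumPy sketch of a wild cluster restricted (WCR) bootstrap p-value for a single coefficient, in the spirit of `fwildclusterboot::boottest()`. It is not the `wildboottest()` implementation: the function name `wcr_pvalue`, Rademacher weights, a CRV1-style cluster variance without small-sample corrections, and a symmetric two-sided p-value are all assumptions. Each draw yields one t-statistic, so no bootstrapped vcov matrix is ever formed.

```python
import numpy as np


def wcr_pvalue(y, X, cluster, j, B=999, seed=0):
    """Hypothetical sketch (not the wildboottest package's code): WCR
    bootstrap p-value for H0: beta_j = 0 with Rademacher weights and a
    CRV1-style cluster variance without small-sample corrections."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # map arbitrary cluster labels to 0, ..., G - 1
    groups, g_idx = np.unique(cluster, return_inverse=True)
    G = groups.size
    XtX_inv = np.linalg.inv(X.T @ X)

    def tstat(yy):
        beta = XtX_inv @ (X.T @ yy)
        u = yy - X @ beta
        # cluster scores: per cluster, the sum of x_i * u_i over its observations
        scores = np.zeros((G, X.shape[1]))
        np.add.at(scores, g_idx, X * u[:, None])
        vcov = XtX_inv @ (scores.T @ scores) @ XtX_inv
        return beta[j] / np.sqrt(vcov[j, j])

    t_obs = tstat(y)
    # restricted fit: impose beta_j = 0 by dropping column j
    X_r = np.delete(X, j, axis=1)
    beta_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
    fitted_r = X_r @ beta_r
    u_r = y - fitted_r
    t_boot = np.empty(B)
    for b in range(B):
        v = rng.choice([-1.0, 1.0], size=G)  # one Rademacher weight per cluster
        t_boot[b] = tstat(fitted_r + u_r * v[g_idx])
    # symmetric two-sided p-value: share of |t*| at least as large as |t|
    return float(np.mean(np.abs(t_boot) >= np.abs(t_obs)))
```

The loop over draws keeps the sketch readable; a faster version would stack all B weight vectors into one matrix.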

We could also expose wild-bootstrapped vcov matrices via `get_robustcov_results()`, but I would suggest doing so later. As mentioned the other day, computing the full bootstrapped vcov matrix slows the bootstrap down considerably.
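To show what the full matrix involves: one common way to obtain a bootstrapped vcov matrix is the sample covariance of the coefficient vectors from an unrestricted wild cluster bootstrap (WCU). Unlike a test of one coefficient, this keeps all k coefficients for every draw (a B × k array). This is an illustration only; the function name `wcu_boot_vcov` and the Rademacher weights are assumptions, and it is not necessarily how the package will implement the option.

```python
import numpy as np


def wcu_boot_vcov(y, X, cluster, B=999, seed=0):
    """Hypothetical sketch: bootstrapped vcov matrix as the sample covariance
    of WCU bootstrap coefficient vectors, using Rademacher weights."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups, g_idx = np.unique(cluster, return_inverse=True)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    u_hat = y - X @ beta_hat
    # all B weight vectors at once, shape (B, G): one Rademacher draw per cluster
    v = rng.choice([-1.0, 1.0], size=(B, groups.size))
    # bootstrap samples y* = X beta_hat + u_hat * v_g, shape (B, N)
    y_star = X @ beta_hat + u_hat * v[:, g_idx]
    # re-estimate all k coefficients for every draw, shape (B, k)
    beta_star = (XtX_inv @ (X.T @ y_star.T)).T
    # k x k covariance across draws
    return np.cov(beta_star, rowvar=False)
```

Because Rademacher weights are independent with mean zero and unit variance, the expected value of this matrix is the cluster-robust sandwich (X'X)^{-1} (Σ_g s_g s_g') (X'X)^{-1} without small-sample corrections, where s_g is the score of cluster g.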

There are still a few open questions about pre-processing the design matrix and the cluster variable. For example, does the design matrix in

```python
model = sm.OLS(Y, X)
model.exog
```

already have missing values, collinear variables, etc. removed?
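For reference, and worth double-checking against the statsmodels docs: `sm.OLS` takes a `missing` argument that defaults to `'none'` (no NaN handling; `'drop'` removes incomplete rows, `'raise'` raises), and collinear columns are not removed, since `fit()` uses the pseudoinverse by default. A cautious option is to clean the inputs before building the model, so that Y, X and the cluster variable stay aligned. The sketch assumes numeric cluster ids; the helper names `drop_incomplete` and `has_full_column_rank` are hypothetical.

```python
import numpy as np


def drop_incomplete(y, X, cluster):
    """Hypothetical helper: drop every observation with a missing value in
    y, any column of X, or cluster, keeping the three arrays aligned.
    Assumes numeric cluster ids (NaN marks a missing id)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    cluster = np.asarray(cluster, dtype=float)
    keep = ~(np.isnan(y) | np.isnan(X).any(axis=1) | np.isnan(cluster))
    return y[keep], X[keep], cluster[keep]


def has_full_column_rank(X):
    """Hypothetical helper: False if some column of X is a linear
    combination of the others (perfect collinearity)."""
    X = np.asarray(X, dtype=float)
    return np.linalg.matrix_rank(X) == X.shape[1]
```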

Example:

```python
from wildboottest.wildboottest import wildboottest
import statsmodels.api as sm
import numpy as np

N = 1000  # observations
k = 10    # regressors
G = 12    # clusters
X = np.random.normal(0, 1, N * k).reshape((N, k))
beta = np.random.normal(0, 1, k)
beta[0] = 0.005  # true coefficient on x1 is close to zero
u = np.random.normal(0, 1, N)
Y = 1 + X @ beta + u
cluster = np.random.choice(G, N)  # cluster id in 0, ..., G - 1 for each observation
B = 99999  # number of bootstrap draws

# X has no constant column, so the model is fit without an intercept
# (hence the uncentered R-squared below)
model = sm.OLS(Y, X)
model.exog  # the design matrix statsmodels works with
results = model.fit(cov_type='cluster', cov_kwds={'groups': cluster})
results.summary()
# >>> results.summary()
# <class 'statsmodels.iolib.summary.Summary'>
# """
#                                  OLS Regression Results                                
# =======================================================================================
# Dep. Variable:                      y   R-squared (uncentered):                   0.799
# Model:                            OLS   Adj. R-squared (uncentered):              0.797
# Method:                 Least Squares   F-statistic:                              790.3
# Date:                Sat, 15 Oct 2022   Prob (F-statistic):                    3.16e-14
# Time:                        12:03:43   Log-Likelihood:                         -1784.6
# No. Observations:                1000   AIC:                                      3589.
# Df Residuals:                     990   BIC:                                      3638.
# Df Model:                          10                                                  
# Covariance Type:              cluster                                                  
# ==============================================================================
#                  coef    std err          z      P>|z|      [0.025      0.975]
# ------------------------------------------------------------------------------
# x1             0.0128      0.064      0.200      0.841      -0.113       0.138

# wild cluster bootstrap p-value for the first coefficient; compare P>|z| above
wildboottest(model, "X1", cluster, B)
# 0.8408408408408409
```
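A side note on B in this example: assuming the draft draws Rademacher weights (one +1/-1 weight per cluster), G = 12 clusters allow only 2^12 = 4096 distinct weight vectors, so B = 99999 random draws necessarily repeat many of them. One common alternative is full enumeration, which uses each distinct vector exactly once. A sketch of generating them (the variable name `all_weights` is illustrative):

```python
import itertools

import numpy as np

G = 12  # number of clusters in the example above
# every possible Rademacher weight vector: each cluster gets -1 or +1
all_weights = np.array(list(itertools.product([-1.0, 1.0], repeat=G)))
print(all_weights.shape)  # (4096, 12)
```

Whether the draft enumerates or samples with replacement is not shown in this thread.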

@s3alfisc requested a review from amichuda on October 15, 2022, 10:14
@amichuda (Collaborator)

Alex, this looks amazing! Let me take a look at the pull request when I'm near my computer, but this looks great.

I think you're right about statsmodels and the vcov matrix. What I would propose, then, is to add it to statsmodels so that the resulting summary table shows p-values without standard errors: either the std err column shows NaN, or we drop that column whenever a user chooses the wild bootstrap. How does that sound?
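A minimal sketch of the two table layouts proposed here, in plain Python rather than statsmodels' own `Summary` class; the function `summary_rows`, the `show_se` flag, and the "boot p" column label are hypothetical.

```python
import math


def summary_rows(names, coefs, pvalues, show_se=True):
    """Hypothetical coefficient table for wild bootstrap inference.
    show_se=True keeps a std err column filled with NaN;
    show_se=False drops the column entirely."""
    header = ["", "coef"] + (["std err"] if show_se else []) + ["boot p"]
    rows = [header]
    for name, coef, p in zip(names, coefs, pvalues):
        row = [name, f"{coef:.4f}"]
        if show_se:
            row.append(f"{math.nan:.3f}")  # renders as "nan"
        row.append(f"{p:.3f}")
        rows.append(row)
    # right-align every cell in a 10-character column
    return "\n".join("".join(f"{cell:>10}" for cell in row) for row in rows)


print(summary_rows(["x1"], [0.0128], [0.8408], show_se=False))
```

Keeping a NaN column preserves the familiar statsmodels layout, while dropping it avoids suggesting that a standard error exists.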

@s3alfisc (Member, Author)

Yes, I will add an option to bootstrap the "full" vcov matrix :) See #24

@amichuda merged commit c74238e into main on Oct 19, 2022
@s3alfisc deleted the statsmodels-api branch on April 30, 2023, 14:45