Test Different Regression Schemes in StatsModels Package #24
Comments
Swapping the NumPy ordinary-least-squares (OLS) regression for the statsmodels OLS implementation shows virtually no difference in some preliminary tests. Using statsmodels OLS has the added benefit of exposing useful diagnostics for model analysis.
Statsmodels OLS Sample Output:
Statsmodels OLS Output Interpretation
AIC and Mallows's Cp are equivalent for OLS, so this is useful from that standpoint. How do they fit the distribution to determine the confidence intervals? Also, we should be using prediction intervals rather than confidence intervals.
Is there any benefit to running statsmodels during the selection process? I would think the NumPy routine is much faster, so statsmodels might slow things down considerably. Maybe we should run statsmodels only after the selection routine has chosen the best-performing models. Then we'd still get the CIs without slowing everything down.
OK, the NumPy implementation is faster. The benchmark timing test and results are shown below. As Kevin suggests, we can use the statsmodels regression for more in-depth diagnostics of the NumPy-selected models, and probably for randomly sampling the regressed coefficients from the CI distribution. If we want to use some of statsmodels' built-in model-performance metrics in the model selection process, the trade-off would be slower regression analysis.
Actual Code in FeatureSelectionV3.py MultipleRegression() method
Would be nice to test if better models can be generated using the regression schemes available in the StatsModels Python package.
Linear Regression - https://www.statsmodels.org/stable/examples/index.html#regression
Generalized Linear - https://www.statsmodels.org/stable/examples/index.html#glm
Discrete Choice - https://www.statsmodels.org/stable/examples/index.html#discrete