Exact Post-Selection Inference for Adjusted R Squared: R package

License: MIT (LICENSE.md); the repository also contains a LICENSE file of unrecognized type.
pirennesarah/PoSIAdjRSquared

PoSIAdjRSquared

The PoSIAdjRSquared package computes p-values and confidence intervals for regression coefficients in a linear model that was selected by adjusted R squared. These p-values and confidence intervals remain valid even though the same data are used for model selection and for inference, so the user can use all observations for both steps while keeping the type I error rate under control. The resulting tests are more powerful than data splitting, which bases inference on only part of the data because the observations used for selection are set aside.
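
To make "selected by adjusted R squared" concrete, the base-R sketch below fits every non-empty subset of a small predictor set and keeps the subset with the largest adjusted R squared. It does not use the package: the helper name select_by_adj_r2 and the simulated data are illustrative assumptions, and the package's own selection routine may differ in details such as intercept handling.

```r
# Illustrative helper (not part of PoSIAdjRSquared): exhaustive search over
# all non-empty column subsets of X, scored by adjusted R squared.
select_by_adj_r2 <- function(y, X) {
  p <- ncol(X)
  best <- list(adj_r2 = -Inf, vars = integer(0))
  for (size in seq_len(p)) {
    # All column-index subsets of the given size
    subsets <- combn(p, size, simplify = FALSE)
    for (vars in subsets) {
      fit <- lm(y ~ X[, vars, drop = FALSE])   # model with intercept
      adj_r2 <- summary(fit)$adj.r.squared
      if (adj_r2 > best$adj_r2) best <- list(adj_r2 = adj_r2, vars = vars)
    }
  }
  best
}

# Simulated data (assumed for illustration): only columns 1 and 2 matter
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 4), n, 4)
y <- 1 * X[, 1] + 0.5 * X[, 2] + rnorm(n)

sel <- select_by_adj_r2(y, X)
sel$vars     # indices of the selected columns
sel$adj_r2   # adjusted R squared of the selected model
```

Running ordinary t-tests on the selected model would ignore that the same data chose sel$vars; that is the problem the package's post-selection p-values and confidence intervals are meant to address.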

Installation

You can install the PoSIAdjRSquared package directly in R:

install.packages("PoSIAdjRSquared")
library(PoSIAdjRSquared)

Example

This basic example shows how to calculate post-selection p-values and confidence intervals for simulated data. The same code applies to real data once X and y are replaced by the observed design matrix and response.

library(PoSIAdjRSquared)

  # Generate data
  n <- 100
  Data <- datagen.norm(seed = 7, n, p = 10, rho = 0, beta_vec = c(1,0.5,0,0.5,0,0,0,0,0,0))
  X <- Data$X
  y <- Data$y

  # Select model
  result <- fit_all_subset_linear_models(y, X, intercept=FALSE)
  phat <- result$phat
  X_M_phat <- result$X_M_phat
  k <- result$k
  R_M_phat <- result$R_M_phat
  kappa_M_phat <- result$kappa_M_phat
  R_M_k <- result$R_M_k
  kappa_M_k <- result$kappa_M_k

  # Estimate Sigma from residuals of full model
  full_model <- lm(y ~ 0 + X)
  sigma_hat <- sd(resid(full_model))
  Sigma <- diag(n)*(sigma_hat)^2

  # Construct test statistic
  Construct_test <- construct_test_statistic(j = 5, X_M_phat, y, phat, Sigma, intercept=FALSE)
  a <- Construct_test$a
  b <- Construct_test$b
  etaj <- Construct_test$etaj
  etajTy <- Construct_test$etajTy

  # Solve selection event
  Solve <- solve_selection_event(a,b,R_M_k,kappa_M_k,R_M_phat,kappa_M_phat,k)
  z_interval <- Solve$z_interval

  # Post-selection p-value for beta_j=0
  tn_sigma <- sqrt((t(etaj)%*%Sigma)%*%etaj)
  postselp_value_specified_interval(z_interval, etaj, etajTy, tn_mu = 0, tn_sigma)
#> [1] 0.8410427
  
  # Post-selection 100(1-alpha)% confidence interval (95% for alpha = 0.05)
  compute_ci_with_specified_interval(z_interval, etaj, etajTy, Sigma, tn_mu = 0, alpha = 0.05)
#> [1] -0.2394537  0.1111173
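
The argument names tn_mu and tn_sigma suggest that the p-value is computed from a normal distribution truncated to the selection interval, as is common in exact post-selection inference. The base-R sketch below illustrates that idea only. The helpers tn_cdf and tn_two_sided_p are hypothetical, the interval format (a two-column matrix of lower and upper bounds on the statistic itself) is an assumption, and none of this should be read as the package's internal implementation.

```r
# Hypothetical helper, not the package's implementation: CDF at x of a
# N(mu, sigma^2) variable truncated to a union of disjoint intervals.
# `intervals` is a two-column matrix with rows (lower, upper).
tn_cdf <- function(x, intervals, mu = 0, sigma = 1) {
  lower <- intervals[, 1]
  upper <- intervals[, 2]
  # Normal mass of each interval, and the part of that mass lying below x
  mass  <- pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma)
  below <- pnorm(pmin(pmax(x, lower), upper), mu, sigma) - pnorm(lower, mu, sigma)
  sum(below) / sum(mass)
}

# Two-sided p-value for H0: mean = mu, based on the truncated distribution
tn_two_sided_p <- function(x, intervals, mu = 0, sigma = 1) {
  cdf <- tn_cdf(x, intervals, mu, sigma)
  2 * min(cdf, 1 - cdf)
}

# Without truncation this reduces to the usual two-sided z-test p-value
no_trunc <- matrix(c(-Inf, Inf), nrow = 1)
p_plain <- tn_two_sided_p(1.96, no_trunc)

# Toy selection event |x| > 1: the same observed value is now less extreme
toy_event <- rbind(c(-Inf, -1), c(1, Inf))
p_selected <- tn_two_sided_p(1.96, toy_event)
```

Conditioning on the toy event enlarges the p-value because part of the apparent signal is explained by the selection itself. Differences of pnorm values lose accuracy far in the tails, so this sketch is meant for illustration rather than numerical work.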

Reference

Pirenne, S. and Claeskens, G. (2024). Exact post-selection inference for adjusted R squared selection. Statistics & Probability Letters, 211(110133):1-9. https://doi.org/10.1016/j.spl.2024.110133
