Question 2a: Best subset selection has the smallest training RSS, because for each model size it considers every possible combination of predictors rather than the greedy subsets examined by forward or backward stepwise selection.

Question 2b: Best subset selection has the best chance of finding the model with the smallest test RSS, since it searches over more candidate models; however, any of the three methods could end up with the lowest test RSS on a particular test set.
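A small simulation sketch (using the leaps package that is loaded later in this write-up, on a hypothetical noise-only data set) illustrates 2a: at every model size, an exhaustive best subset search can never have a larger training RSS than the greedy forward or backward searches, because it examines every candidate model of that size.

library(leaps)

set.seed(1)
sim <- data.frame(matrix(rnorm(100 * 10), ncol = 10))  # 10 hypothetical noise predictors
sim$y <- rnorm(100)

# Training RSS at each model size (1 to 10) for a given search strategy
rss_by_size <- function(method) {
  summary(regsubsets(y ~ ., data = sim, nvmax = 10, method = method))$rss
}

cbind(best     = rss_by_size("exhaustive"),
      forward  = rss_by_size("forward"),
      backward = rss_by_size("backward"))

The differences may be small, but the best subset column is never larger than the other two, which is exactly the claim in 2a; for test RSS (2b) no such guarantee exists.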

Use the rnorm() function to generate a predictor X of length n = 100, as well as a noise vector ε of length n = 100.

rm(list=ls())

set.seed(1)
X <- rnorm(100)
error <- rnorm(100)

Generate a response vector Y of length n = 100 according to the model Y = β0 + β1X + β2X² + β3X³ + ε, where β0, β1, β2, and β3 are 4, 9, 2, and -1, respectively.

Y <- 4 + 9*X + 2*X^2 - 1*X^3 + error

Use the regsubsets() function to perform best subset selection in order to choose the best model containing the predictors X, X², …, X¹⁰. What is the best model obtained according to Cp, BIC, and adjusted R²? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained. Note you will need to use the data.frame() function to create a single data set containing both X and Y.

require(leaps)
## Loading required package: leaps
dataframe <- data.frame(Y, X)
fit <- regsubsets(Y ~ poly(X, 10), data = dataframe, nvmax = 10)

fit_summary <- summary(fit)

require(tidyverse);require(ggplot2);require(ggthemes);
## Loading required package: tidyverse
## Loading required package: ggthemes
# Plot Cp, BIC and adjusted R^2 against the number of variables in the model
tibble(Cp = fit_summary$cp,
       BIC = fit_summary$bic,
       AdjR2 = fit_summary$adjr2) %>%
  mutate(id = row_number()) %>%
  gather(value_type, value, -id) %>%
  ggplot(aes(id, value, col = value_type)) +
  geom_line() + geom_point() + ylab('') + xlab('Number of Variables Used') +
  facet_wrap(~ value_type, scales = 'free') + scale_x_continuous(breaks = 1:10)

Three predictors seems to be the best choice for the model: the 3-variable fit has a high adjusted R² together with low BIC and Cp.
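To report the coefficients explicitly, as the question asks, one way is to read the criterion-chosen model sizes and the 3-variable coefficients straight off the fit and fit_summary objects created above:

which.min(fit_summary$cp)     # model size minimizing Cp
which.min(fit_summary$bic)    # model size minimizing BIC
which.max(fit_summary$adjr2)  # model size maximizing adjusted R^2

coef(fit, 3)                  # coefficients of the 3-variable best subset model

Note that poly() fits orthogonal polynomials by default, so these coefficients are not on the scale of the β values used to simulate Y; refitting with poly(X, 10, raw = TRUE) should give estimates close to 4, 9, 2, and -1.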

Repeat 3, using forward stepwise selection and also using backwards stepwise selection. How does your answer compare to the results in 3?
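Before the caret-based fits below, a quick cross-check (a sketch reusing dataframe and the leaps package already loaded) runs the same two searches directly in regsubsets, so the Cp/BIC/adjusted R² comparison from 3 carries over unchanged:

fit_fwd <- regsubsets(Y ~ poly(X, 10), data = dataframe, nvmax = 10, method = "forward")
fit_bwd <- regsubsets(Y ~ poly(X, 10), data = dataframe, nvmax = 10, method = "backward")

which.min(summary(fit_fwd)$bic)  # model size chosen by BIC, forward search
which.min(summary(fit_bwd)$bic)  # model size chosen by BIC, backward search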

Testing backward stepwise selection

require(caret)
## Loading required package: caret
## Loading required package: lattice
model_back <- train(Y ~ poly(X, 10), data = dataframe, 
                    method = 'glmStepAIC', direction = 'backward', 
                    trace = 0,
                    trControl = trainControl(method = 'none', verboseIter = FALSE))


summary(model_back$finalModel)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8914  -0.5860  -0.1516   0.5892   2.1794  
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      6.35264    0.09557  66.472   <2e-16 ***
## `poly(X, 10)1`  61.56415    0.95569  64.418   <2e-16 ***
## `poly(X, 10)2`  18.13980    0.95569  18.981   <2e-16 ***
## `poly(X, 10)3` -14.70908    0.95569 -15.391   <2e-16 ***
## `poly(X, 10)5`   1.48019    0.95569   1.549    0.125    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.9133516)
## 
##     Null deviance: 4424.513  on 99  degrees of freedom
## Residual deviance:   86.768  on 95  degrees of freedom
## AIC: 281.59
## 
## Number of Fisher Scoring iterations: 2

Testing forward stepwise selection

x_poly <- poly(dataframe$X, 10)

colnames(x_poly) <- paste0('poly', 1:10)
model_forw <- train(y = Y, x = x_poly,
                    method = 'glmStepAIC', direction = 'forward',
                    trace = 0,
                    trControl = trainControl(method = 'none', verboseIter = FALSE))


summary(model_forw$finalModel)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8914  -0.5860  -0.1516   0.5892   2.1794  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   6.35264    0.09557  66.472   <2e-16 ***
## poly1        61.56415    0.95569  64.418   <2e-16 ***
## poly2        18.13980    0.95569  18.981   <2e-16 ***
## poly3       -14.70908    0.95569 -15.391   <2e-16 ***
## poly5         1.48019    0.95569   1.549    0.125    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.9133516)
## 
##     Null deviance: 4424.513  on 99  degrees of freedom
## Residual deviance:   86.768  on 95  degrees of freedom
## AIC: 281.59
## 
## Number of Fisher Scoring iterations: 2
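
Both stepwise directions retain the degree-1, degree-2, and degree-3 polynomial terms (plus a degree-5 term that is not significant, p ≈ 0.13), which agrees with the 3-variable model preferred by best subset selection in 3. To put the coefficient sets side by side, the objects fit, model_back, and model_forw created above can be reused:

coef(fit, 3)                  # best subset, 3-variable model
coef(model_back$finalModel)   # backward stepwise via glmStepAIC
coef(model_forw$finalModel)   # forward stepwise via glmStepAIC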