Class Activity, September 30
Model selection with the dengue data
In this class activity, we will use stepwise selection to select
diagnostic variables to predict whether a patient has dengue. This uses
the stepAIC
function in the MASS
package.
Questions
- Let’s begin with forward selection, using AIC as a criterion. Use the code below to run forward selection. You can see that the procedure selects WBC, Age, PLT, BMI, Temperature, Vomiting, and DiseaseDay as predictors.
library(MASS)
<- read.csv("https://sta712-f22.github.io/homework/dengue.csv")
dengue
# specify the starting model (intercept-only)
<- glm(Dengue ~ 1, data = dengue, family = binomial)
m0
# forward selection using AIC
# Note we have to specify the largest model we want to consider
<- stepAIC(m0, scope = ~ Sex + Age + DiseaseDay +
forward_aic + Abdominal + Temperature + BMI +
Vomiting + HCT + PLT,
WBC direction = "forward",
trace = 0)
summary(forward_aic)
- Now let’s try forward selection with BIC instead. Do the variables selected change?
# forward selection using BIC
# Note we have to specify the largest model we want to consider
# k = log(n) is used to specify the penalty for BIC instead of AIC
<- stepAIC(m0, scope = ~ Sex + Age + DiseaseDay +
forward_bic + Abdominal + Temperature + BMI +
Vomiting + HCT + PLT,
WBC direction = "forward",
trace = 0, k = log(nrow(dengue)))
summary(forward_bic)
- In the dengue example, AIC and BIC actually gave us the same model.
But often AIC and BIC give different models. Here is an example using
data on risk factors associated with low infant birth weight
(
bwt
), comparing backward selection with AIC and BIC; you can see that BIC selects a smaller model.
example(birthwt) # (load the bwt data)
<- glm(low ~ ., family = binomial, data = bwt)
birthwt_glm
<- stepAIC(birthwt_glm, trace = 0)
birthwt_back_aic <- stepAIC(birthwt_glm, trace = 0,
birthwt_back_bic k = log(nrow(bwt)))
summary(birthwt_back_aic)
summary(birthwt_back_bic)