Class Activity, October 28

The Framingham heart study

The data in this class activity come from a study of residents of Framingham, MA, conducted to investigate variables related to heart health. We will work with a subset of the data containing the following variables:

  • cigsPerDay: The number of cigarettes smoked per day during the study period.
  • education: 1 = High School, 2 = Some College, 3 = College Degree, 4 = Advanced Degree.
  • male: 1 = Male, 0 = Female.
  • age: The age of the individual in years.
  • diabetes: 1 if the individual has diabetes, 0 otherwise.
  • BMI: The individual’s body mass index (BMI).
  • currentSmoker: 1 if the individual currently smokes, 0 otherwise.

Questions

While the data were originally collected to study heart health, in this activity we will model the number of cigarettes smoked per day. Since not all participants are smokers, we will restrict our analysis to the current smokers in the data.

Below is the output of several Poisson and quasi-Poisson regression models fit to these data. Use this output to answer the following questions.

  1. Perform a goodness of fit test to assess whether the initial Poisson model is a good fit to the data. Why might the model be a poor fit?

  2. Calculate the mean deviance estimate \(\widehat{\phi}\) of the dispersion parameter.

  3. Test whether there is any relationship between education level and the number of cigarettes smoked per day, after accounting for sex, age, diabetes, and BMI.

  4. Examining the R output, age appears to have a statistically significant relationship with the number of cigarettes smoked in each model. Do you think this relationship is practically significant? For example, is there a meaningful difference in the number of cigarettes smoked per day between participants of different ages?
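
For question 4, one way to judge practical significance is to convert the age coefficient from the log scale into a multiplicative change in the expected count. The sketch below hard-codes the age estimate reported in the quasi-Poisson output for the full model; the 10-year gap is an arbitrary choice made for illustration, not something specified in the activity.

```r
# Age coefficient on the log scale, copied from the model output
beta_age <- -0.006806

# Multiplicative change in expected cigarettes per day for a
# 1-year and a 10-year increase in age, holding other variables fixed
ratio_1yr  <- exp(beta_age)
ratio_10yr <- exp(10 * beta_age)

# Percent change over 10 years (negative means a decrease)
pct_change_10yr <- 100 * (ratio_10yr - 1)

round(c(ratio_1yr = ratio_1yr,
        ratio_10yr = ratio_10yr,
        pct_change_10yr = pct_change_10yr), 3)
```

Under this model, a 10-year age difference corresponds to a decrease of roughly 6 to 7 percent in the expected number of cigarettes per day. Whether that counts as meaningful is the judgment the question asks you to make.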

Output

library(dplyr)  # provides %>% and filter()

smokers <- heart_data %>%
  filter(currentSmoker == 1)

m1 <- glm(cigsPerDay ~ male + age + education + diabetes + BMI,
          data = smokers, family = poisson)
summary(m1)
## 
## Call:
## glm(formula = cigsPerDay ~ male + age + education + diabetes + 
##     BMI, family = poisson, data = smokers)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -6.369  -1.583  -0.362   1.422   7.501  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.8371917  0.0489922  57.911  < 2e-16 ***
## male         0.4561700  0.0111074  41.069  < 2e-16 ***
## age         -0.0068063  0.0006789 -10.026  < 2e-16 ***
## education2   0.0164626  0.0126312   1.303 0.192462    
## education3   0.0164274  0.0160498   1.024 0.306057    
## education4  -0.0150328  0.0172291  -0.873 0.382921    
## diabetes    -0.0253978  0.0394181  -0.644 0.519368    
## BMI          0.0050011  0.0014065   3.556 0.000377 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 13471  on 2011  degrees of freedom
## Residual deviance: 11540  on 2004  degrees of freedom
## AIC: 20647
## 
## Number of Fisher Scoring iterations: 5
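
A deviance goodness of fit test for the Poisson model compares the residual deviance to a chi-square distribution with the residual degrees of freedom. The sketch below uses the rounded values printed in the summary above, so it can be run without the data. With the fitted model available, `deviance(m1)` and `df.residual(m1)` would supply the same quantities.

```r
# Residual deviance and degrees of freedom from summary(m1)
resid_dev <- 11540
resid_df  <- 2004

# Upper-tail chi-square probability: a small value is evidence
# that the Poisson model does not fit the data well
p_value <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)
p_value
```

The residual deviance is far larger than its degrees of freedom, so the p-value is effectively zero. This is consistent with overdispersion, although a missing predictor or a misspecified mean could also contribute.
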
m2 <- glm(cigsPerDay ~ male + age + education + diabetes + BMI,
          data = smokers, family = quasipoisson)
summary(m2)
## 
## Call:
## glm(formula = cigsPerDay ~ male + age + education + diabetes + 
##     BMI, family = quasipoisson, data = smokers)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -6.369  -1.583  -0.362   1.422   7.501  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.837192   0.115099  24.650  < 2e-16 ***
## male         0.456170   0.026095  17.481  < 2e-16 ***
## age         -0.006806   0.001595  -4.267 2.07e-05 ***
## education2   0.016463   0.029675   0.555    0.579    
## education3   0.016427   0.037706   0.436    0.663    
## education4  -0.015033   0.040477  -0.371    0.710    
## diabetes    -0.025398   0.092606  -0.274    0.784    
## BMI          0.005001   0.003304   1.513    0.130    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 5.519388)
## 
##     Null deviance: 13471  on 2011  degrees of freedom
## Residual deviance: 11540  on 2004  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5
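
The mean deviance estimate of the dispersion parameter is the residual deviance divided by the residual degrees of freedom. The sketch below computes it from the printed output and, as a check, rescales one Poisson standard error by the square root of the dispersion that R reports. Note that the 5.519388 in the quasi-Poisson summary is R's Pearson-based estimate, so it need not match the mean deviance estimate exactly.

```r
# Mean deviance estimate: residual deviance / residual df
phi_dev <- 11540 / 2004

# Pearson-based estimate printed by summary(m2)
phi_pearson <- 5.519388

# Quasi-Poisson standard errors are the Poisson standard errors
# multiplied by sqrt(dispersion); check this for the age coefficient
se_poisson_age <- 0.0006789
se_quasi_age   <- se_poisson_age * sqrt(phi_pearson)

round(c(phi_dev = phi_dev, se_quasi_age = se_quasi_age), 6)
```

The rescaled standard error agrees with the 0.001595 reported for age in the quasi-Poisson output. This is why the coefficient estimates are identical across the two fits while the test statistics shrink.
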
m3 <- glm(cigsPerDay ~ male + age + diabetes + BMI,
          data = smokers, family = quasipoisson)
summary(m3)
## 
## Call:
## glm(formula = cigsPerDay ~ male + age + diabetes + BMI, family = quasipoisson, 
##     data = smokers)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.3317  -1.5862  -0.3696   1.4286   7.5679  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.855427   0.107731  26.505  < 2e-16 ***
## male         0.453025   0.025778  17.574  < 2e-16 ***
## age         -0.006970   0.001556  -4.478 7.96e-06 ***
## diabetes    -0.025265   0.092543  -0.273    0.785    
## BMI          0.004903   0.003288   1.491    0.136    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 5.512254)
## 
##     Null deviance: 13471  on 2011  degrees of freedom
## Residual deviance: 11544  on 2007  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5
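
Because education enters the model through three indicator variables, testing it requires comparing the full model (m2) with the reduced model (m3) rather than reading individual t-tests. A minimal sketch of the nested F-test follows. It assumes the dispersion is estimated from the full model using the Pearson-based value that R reports; your course may use the mean deviance estimate instead. The deviances are the rounded values from the output, so the result is approximate.

```r
# Residual deviances and degrees of freedom from the summaries
dev_full    <- 11540; df_full    <- 2004   # m2, with education
dev_reduced <- 11544; df_reduced <- 2007   # m3, without education

# Dispersion estimate from the full model (Pearson-based, from summary(m2))
phi_hat <- 5.519388

# F statistic: drop in deviance per parameter removed, scaled by dispersion
df_diff <- df_reduced - df_full
f_stat  <- ((dev_reduced - dev_full) / df_diff) / phi_hat

# Upper-tail probability from an F distribution
p_value <- pf(f_stat, df1 = df_diff, df2 = df_full, lower.tail = FALSE)
c(f_stat = f_stat, p_value = p_value)

# With the fitted models available, the same comparison is:
# anova(m3, m2, test = "F")
```

The F statistic is well below 1 and the p-value is large, so these data provide little evidence of a relationship between education level and cigarettes smoked per day after accounting for sex, age, diabetes, and BMI.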