Class Activity, October 28
The Framingham heart study
The data in this class activity come from a study of residents of Framingham, MA, which was conducted to research variables related to heart health. We will work with a subset of the data containing the following variables:
cigsPerDay: The number of cigarettes smoked per day during the study period.
education: 1 = High School, 2 = Some College, 3 = College Degree, 4 = Advanced Degree.
male: 1 = Male, 0 = Female.
age: The age of the individual in years.
diabetes: 1 if the individual has diabetes, 0 otherwise.
BMI: The individual’s body mass index (BMI).
currentSmoker: 1 if the individual currently smokes, 0 otherwise.
Questions
While the data were originally collected to study heart health, in this activity we will try to model the number of cigarettes smoked. Since not all participants are smokers, we will restrict our analysis to the current smokers in the data (currentSmoker == 1).
Below is the output of several Poisson and quasi-Poisson regression models fit to these data. Use this output to answer the following questions.
Perform a goodness-of-fit test to assess whether the initial Poisson model is a good fit to the data. Why might the model be a poor fit?
Calculate the mean deviance estimate \(\widehat{\phi}\) of the dispersion parameter.
Test whether there is any relationship between education level and the number of cigarettes smoked per day, after accounting for sex, age, diabetes, and BMI.
In the R output, age appears to have a statistically significant relationship with the number of cigarettes smoked per day in each model. Do you think this relationship is practically significant (e.g., is there a meaningful difference in the number of cigarettes smoked per day between participants of different ages)?
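As a rough sense of scale for the last question, the sketch below converts the age coefficient from the quasi-Poisson model m2 (-0.006806 in the output below) into a multiplicative change in the expected number of cigarettes per day. Because the models use a log link, a k-year age difference multiplies the expected count by exp(k × β_age). The 20-cigarettes-per-day baseline is a hypothetical value chosen only for illustration; it is not taken from the data.

```r
# Age coefficient copied from summary(m2) in the Output section
beta_age <- -0.006806

# Under the log link, a k-year age difference multiplies the expected
# number of cigarettes per day by exp(k * beta_age)
ratio_10yr <- exp(10 * beta_age)
ratio_30yr <- exp(30 * beta_age)

# The same effects as percent changes (negative = fewer cigarettes)
pct_10yr <- 100 * (ratio_10yr - 1)
pct_30yr <- 100 * (ratio_30yr - 1)

# Hypothetical smoker averaging 20 cigarettes per day (assumption, not from the data)
baseline <- 20
expected_10yr_older <- baseline * ratio_10yr

round(c(pct_10yr = pct_10yr, pct_30yr = pct_30yr,
        expected_10yr_older = expected_10yr_older), 2)
```

Under these assumptions, a 10-year age gap corresponds to roughly a 6.6% lower expected count (about 18.7 versus 20 cigarettes per day for the hypothetical smoker), and a 30-year gap to roughly 18.5% lower. Whether that is "meaningful" is a judgment call about the scale of cigsPerDay, not something the p-value can settle.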
Output
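library(dplyr)  # provides %>%, filter(), and mutate()
smokers <- heart_data %>%
  filter(currentSmoker == 1) %>%
  # education must be a factor to produce the education2 to education4 terms below
  mutate(education = factor(education))
m1 <- glm(cigsPerDay ~ male + age + education + diabetes + BMI,
          data = smokers, family = poisson)
summary(m1)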
##
## Call:
## glm(formula = cigsPerDay ~ male + age + education + diabetes +
## BMI, family = poisson, data = smokers)
##
## Deviance Residuals:
##    Min      1Q  Median      3Q     Max
## -6.369  -1.583  -0.362   1.422   7.501
##
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  2.8371917  0.0489922  57.911  < 2e-16 ***
## male         0.4561700  0.0111074  41.069  < 2e-16 ***
## age         -0.0068063  0.0006789 -10.026  < 2e-16 ***
## education2   0.0164626  0.0126312   1.303 0.192462
## education3   0.0164274  0.0160498   1.024 0.306057
## education4  -0.0150328  0.0172291  -0.873 0.382921
## diabetes    -0.0253978  0.0394181  -0.644 0.519368
## BMI          0.0050011  0.0014065   3.556 0.000377 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 13471 on 2011 degrees of freedom
## Residual deviance: 11540 on 2004 degrees of freedom
## AIC: 20647
##
## Number of Fisher Scoring iterations: 5
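A minimal sketch of the deviance goodness-of-fit test for the Poisson model m1, using the residual deviance and degrees of freedom printed in summary(m1). If the Poisson model fits well, the residual deviance should be approximately chi-squared on 2004 degrees of freedom. The printed deviance is rounded to a whole number, which makes no practical difference here.

```r
# Residual deviance and residual df as printed in summary(m1)
resid_dev <- 11540
resid_df  <- 2004

# Deviance goodness-of-fit test: upper-tail probability of the residual
# deviance under a chi-squared distribution with resid_df degrees of freedom
p_value <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)

p_value
```

With the fitted object, pchisq(deviance(m1), df.residual(m1), lower.tail = FALSE) runs the same test without retyping numbers. A chi-squared variable on 2004 degrees of freedom has mean 2004, so a deviance of 11540 gives a vanishingly small p-value; one plausible explanation to consider is overdispersion, i.e. more variability in cigarettes per day than the Poisson assumption (variance = mean) allows.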
m2 <- glm(cigsPerDay ~ male + age + education + diabetes + BMI,
          data = smokers, family = quasipoisson)
summary(m2)
##
## Call:
## glm(formula = cigsPerDay ~ male + age + education + diabetes +
## BMI, family = quasipoisson, data = smokers)
##
## Deviance Residuals:
##    Min      1Q  Median      3Q     Max
## -6.369  -1.583  -0.362   1.422   7.501
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.837192   0.115099  24.650  < 2e-16 ***
## male         0.456170   0.026095  17.481  < 2e-16 ***
## age         -0.006806   0.001595  -4.267 2.07e-05 ***
## education2   0.016463   0.029675   0.555    0.579
## education3   0.016427   0.037706   0.436    0.663
## education4  -0.015033   0.040477  -0.371    0.710
## diabetes    -0.025398   0.092606  -0.274    0.784
## BMI          0.005001   0.003304   1.513    0.130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 5.519388)
##
## Null deviance: 13471 on 2011 degrees of freedom
## Residual deviance: 11540 on 2004 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
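Note that the dispersion printed by summary(m2) (5.519388) is the Pearson estimate: for quasi families, summary.glm divides the sum of squared Pearson residuals by the residual degrees of freedom. The mean deviance estimate instead divides the residual deviance by its degrees of freedom. The sketch below computes that estimate from the rounded printed values, and also checks that the quasi-Poisson standard errors are the Poisson ones multiplied by the square root of the Pearson dispersion, which is why the t values in m2 are smaller than the z values in m1.

```r
# Values as printed in summary(m1) and summary(m2)
resid_dev   <- 11540      # residual deviance (rounded in the printout)
resid_df    <- 2004       # residual degrees of freedom
phi_pearson <- 5.519388   # dispersion reported by summary(m2)

# Mean deviance estimate of the dispersion parameter
phi_dev <- resid_dev / resid_df

# Quasi-Poisson SE = Poisson SE * sqrt(dispersion); check this for age,
# starting from the Poisson SE in summary(m1)
se_age_poisson <- 0.0006789
se_age_quasi   <- se_age_poisson * sqrt(phi_pearson)

round(phi_dev, 3)
round(se_age_quasi, 6)
```

The first value comes out to about 5.758, and the second to about 0.001595, matching the age standard error in summary(m2). With the fitted models, deviance(m1) / df.residual(m1) gives the mean deviance estimate directly, and sum(residuals(m1, type = "pearson")^2) / df.residual(m1) reproduces the Pearson value.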
m3 <- glm(cigsPerDay ~ male + age + diabetes + BMI,
          data = smokers, family = quasipoisson)
summary(m3)
##
## Call:
## glm(formula = cigsPerDay ~ male + age + diabetes + BMI, family = quasipoisson,
## data = smokers)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -6.3317  -1.5862  -0.3696   1.4286   7.5679
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  2.855427   0.107731  26.505  < 2e-16 ***
## male         0.453025   0.025778  17.574  < 2e-16 ***
## age         -0.006970   0.001556  -4.478 7.96e-06 ***
## diabetes    -0.025265   0.092543  -0.273    0.785
## BMI          0.004903   0.003288   1.491    0.136
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 5.512254)
##
## Null deviance: 13471 on 2011 degrees of freedom
## Residual deviance: 11544 on 2007 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
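One common way to test the education terms in a quasi-Poisson model is a drop-in-deviance F test comparing the full model m2 with the reduced model m3. The test divides the increase in deviance per dropped parameter by an estimate of the dispersion, then compares the result with an F distribution on (3, 2004) degrees of freedom. This sketch uses the mean deviance estimate from m2 and the rounded deviances printed above, so the deviance difference of 4 is only approximate.

```r
# Full model (m2, with education) and reduced model (m3, without education):
# residual deviances and residual df as printed above
dev_full    <- 11540; df_full    <- 2004
dev_reduced <- 11544; df_reduced <- 2007

q       <- df_reduced - df_full   # number of education terms dropped (3)
phi_hat <- dev_full / df_full     # mean deviance dispersion estimate from m2

# Drop-in-deviance F statistic and its upper-tail p-value
f_stat  <- ((dev_reduced - dev_full) / q) / phi_hat
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 3)
```

This gives an F statistic of about 0.23 and a p-value well above 0.05, in line with the non-significant education coefficients in summary(m2). With the fitted objects, anova(m3, m2, test = "F") performs the same kind of test from unrounded deviances; by default it uses the Pearson dispersion of the larger model (5.519388), so its F statistic will differ slightly from this sketch.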