# Data Analysis 50 states and Washington D.C.

## Data Analysis

1) Data 4-12 contains cross-sectional data on mortality rates and their determinants for the 50 states and Washington D.C. The Y variable is MORT (total mortality rate per 100,000 population).
The explanatory variables are:
INCC = per-capita income by state in dollars.
POV = proportion of families living below the poverty level.
EDU1 = proportion of population completing four years of high school.
EDU2 = proportion of population completing four years or more of college.
ALCC = per-capita consumption of alcohol in gallons.
TOBC = per-capita consumption of cigarettes in packs.
HEXC = per-capita healthcare expenditures in dollars.
PHYS = physicians per 100,000 population.
URB = proportion of population living in urban areas.
AGED = proportion of population over the age of 65.
a) For each explanatory variable, explain why it might have a causal effect on mortality. State whether you expect the effect to be positive or negative.
b) Run the regression model. Do the signs of the coefficients coincide with your intuition? Identify the ones that do not.
c) Which variables are statistically significant at the 5% level?
d) What other variables might belong in the model as explanatory variables?
2) Data 4-13 contains data on factors affecting baseball attendance in 78 metropolitan areas, compiled by Scott Daniel. The Y variable is ATTEND (total attendance 1984-1986, in thousands).
The explanatory variables are:
POP = population of the metropolitan area.
CAPACITY = capacity of the home stadium in thousands.
PRIORWIN = number of wins the team had in the previous season.
CURNTWIN = number of wins the team had in the current season.
GF = number of games the team finished behind the division leader at the end of the season.
OTHER = number of other baseball teams in the metropolitan area.
TEAMS = number of football, basketball, or hockey teams in the metropolitan area.
a) For each explanatory variable, explain why it might have a causal effect on ATTEND. State whether you expect the effect to be positive or negative.
b) Run the regression model. Do the signs of the coefficients coincide with your intuition? Identify the ones that do not.
c) Which variables are statistically significant at the 5% level?
d) What other variables might belong in the model as explanatory variables?
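
Parts (b) and (c) of either problem can be approached along the following lines. This is only a minimal sketch, not the course's prescribed method: it fits ordinary least squares with NumPy and flags coefficients whose absolute t-statistic exceeds a two-sided 5% critical value. The helper name `ols_table` is hypothetical, and the data are randomly generated placeholders that reuse the variable names from problem 1; the actual Data 4-12 values are not reproduced here and would have to be loaded separately. The critical value 2.021 assumes 40 residual degrees of freedom (51 observations minus 10 regressors and an intercept), and classical homoskedastic standard errors are assumed.

```python
import numpy as np

def ols_table(X, y, names, t_crit):
    """Fit OLS with an intercept; return {name: (coef, std_err, t_stat, significant)}."""
    n = X.shape[0]
    Xc = np.column_stack([np.ones(n), X])           # prepend the intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)   # least-squares coefficients
    resid = y - Xc @ beta
    dof = n - Xc.shape[1]                           # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(Xc.T @ Xc)         # classical coefficient covariance
    se = np.sqrt(np.diag(cov))
    t = beta / se
    labels = ["const"] + list(names)
    return {
        lab: (float(b), float(s), float(tv), bool(abs(tv) > t_crit))
        for lab, b, s, tv in zip(labels, beta, se, t)
    }

# Placeholder data only: 51 synthetic observations, NOT the real Data 4-12 file.
rng = np.random.default_rng(0)
names = ["INCC", "POV", "EDU1", "EDU2", "ALCC", "TOBC", "HEXC", "PHYS", "URB", "AGED"]
X = rng.normal(size=(51, len(names)))
true_beta = np.zeros(len(names))
true_beta[-1] = 5.0                                 # in this fake data only AGED matters
y = 800.0 + X @ true_beta + rng.normal(scale=1.0, size=51)

# 2.021 is the two-sided 5% critical value of the t distribution with 40 d.o.f.
table = ols_table(X, y, names, t_crit=2.021)
for name, (b, s, tv, sig) in table.items():
    print(f"{name:6s} coef={b:9.3f}  se={s:7.3f}  t={tv:8.2f}  significant_5pct={sig}")
```

For problem 2 the same function would apply with the seven ATTEND regressors; with 78 observations the residual degrees of freedom become 70, so the critical value would be roughly 1.994 instead.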
