1. Experiment. Suppose that the state of Ohio wishes to estimate how additional school funding affects academic outcomes (e.g., test scores, graduation rate). To do this, the state plans to
conduct an experiment in which a subset of districts are selected to receive an additional $2,000
per student. There are 611 school districts in Ohio.
a) List the steps that the experiment should follow in order to ensure that the correct causal estimates are found. Be specific.
b) Explain how using a randomized experiment eliminates the potential for omitted variable bias. Use your own words.
c) Is it possible to test if the treatment and control districts are balanced on unobservable characteristics? Explain.
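As an illustration of the random-assignment step in question 1, here is a minimal Python sketch. It assumes districts are numbered 1 to 611 and that about half of them are treated; the problem does not state the size of the treated group, so `N_TREATED` and the seed are hypothetical choices.

```python
import random

N_DISTRICTS = 611  # from the problem statement
N_TREATED = 305    # hypothetical: the problem does not say how many districts are treated


def assign_treatment(n_districts, n_treated, seed):
    """Randomly pick n_treated district IDs; return {district_id: is_treated}."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    district_ids = list(range(1, n_districts + 1))
    treated = set(rng.sample(district_ids, n_treated))  # sampling without replacement
    return {d: (d in treated) for d in district_ids}


assignment = assign_treatment(N_DISTRICTS, N_TREATED, seed=2024)
print(sum(assignment.values()), "treated districts out of", len(assignment))
```

Because assignment depends only on the random draw, it cannot be systematically related to any district characteristic, observed or unobserved.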
Group Fixed Effects
2. Group Fixed Effects. This question examines whether office workers are better paid than other employees. You should use the data set “industry_benefits.dta” to examine this question.
a) What is the average income for those who do and do not work in an office? Based on this, what is the wage gap between office workers and other employees?
b) Estimate the effect of working in an office on annual earnings and write the resulting regression equation. Is working in an office a statistically significant determinant of income?
c) Now add education and experience as additional variables and write the resulting regression equation. How did these control variables affect the coefficient magnitude and statistical
significance of being an office worker on income?
d) Explain why these changes occurred in part c) by examining how the new variables (education and experience) are correlated with both being an office worker and earnings. Use the
expression for omitted variable bias.
e) Add industry fixed effects to the regression in part c). Write the resulting regression equation and explain how this affected the coefficient on being an office worker.
f) Explain the comparisons that are being used in the fixed effects regression in part e) and how they differ from the comparisons used in part c).
g) Rank the industries from the lowest to the highest paying while holding education, experience, and being male fixed.
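For reference in part d), the omitted variable bias expression can be written as follows for a single omitted variable. The variable names (office, educ) and coefficient symbols are illustrative notation and are not taken from the data set.

```latex
% Long regression (includes the control):
%   income_i = \beta_0 + \beta_1\, office_i + \beta_2\, educ_i + u_i
% Auxiliary regression of the omitted variable on the included one:
%   educ_i = \delta_0 + \delta_1\, office_i + e_i
% Short regression (omits educ) recovers:
\tilde{\beta}_1 = \beta_1 + \beta_2\,\delta_1
```

The bias term \beta_2 \delta_1 is positive when the omitted variable raises income and is positively correlated with office work, and it has the opposite sign when exactly one of those two relationships is negative.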
Group and Time Fixed Effects
3. Group and Time Fixed Effects. This question examines the relationship between being married and wages. You should use the data set “wage_panel.dta” to examine this relationship.
a) How many individuals are there in the data set? For how many years is each person observed?
b) Regress the log of wages on being married and write the results. What is the estimated effect of being married?
c) Add individual fixed effects to the regression in order to make comparisons within a person over time (you may use the “areg” command). What is the estimated effect of being married?
d) You suspect that time is an important omitted variable. Discuss how “year” is likely to be correlated with wages and being married and how this will bias your estimate in part c).
e) Add “year” to the regression and write the results. Interpret the coefficient on “year”. How did adding year affect the coefficient on married? Is this consistent with your prediction in part d)?
f) Now add year fixed effects. How did this affect the coefficient on being married?
g) Rank the years from lowest to highest in terms of wages while holding marital status fixed.
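The specification built up over parts c) through f) can be sketched as a two-way fixed effects model. The subscripts and symbols below are assumed notation and do not come from the data set.

```latex
% i indexes individuals, t indexes years
\ln(wage_{it}) = \beta\, married_{it} + \alpha_i + \lambda_t + \varepsilon_{it}
% \alpha_i  : individual fixed effect (absorbs time-invariant traits of person i)
% \lambda_t : year fixed effect (absorbs shocks common to everyone in year t)
```

With both sets of fixed effects, \beta is identified from changes in marital status within a person over time, net of the common year-to-year movement in wages.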
4. Instrumental Variables. You wish to identify the causal effect of increased agricultural productivity (output per acre) on weekly income. You have data on the average weekly income
(in dollars) for five provinces in India, the average weekly output of crops (in kilograms per
acre), and average monthly rainfall in centimeters.
province   income   output   rain
prov1          10        6      5
prov2          14       10     25
prov3           8        2     10
prov4           6        4      5
prov5          12        8     10
a) Regress income on output per acre and report the results. Interpret the coefficient on output.
b) Identify a potentially omitted variable that may bias the coefficient on output (and identify the expected sign of the bias generated by omitting this variable).
c) Under what assumptions is rainfall a valid instrument for output? Be specific.
d) Find the estimated effect of output per acre on income using rain as an instrumental variable and report the results. Interpret the coefficient on output.
e) Compare the OLS estimate from part a) with the IV estimate from part d) and discuss whether each can be interpreted as a causal effect.
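The point estimates in question 4 can be checked by hand with a short script. This is a minimal sketch in plain Python rather than the statistical package the course uses; the helper names `mean` and `cross_dev` are hypothetical, and it produces slopes and intercepts only, not standard errors.

```python
# Province data copied from the table in question 4.
income = [10, 14, 8, 6, 12]  # average weekly income (dollars)
output = [6, 10, 2, 4, 8]    # average weekly output (kg per acre)
rain = [5, 25, 10, 5, 10]    # average monthly rainfall (cm)


def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of products of deviations from the means of a and b."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# OLS slope of income on output: Cov(output, income) / Var(output)
ols_slope = cross_dev(output, income) / cross_dev(output, output)
ols_intercept = mean(income) - ols_slope * mean(output)

# Simple IV slope with rain as the instrument: Cov(rain, income) / Cov(rain, output)
iv_slope = cross_dev(rain, income) / cross_dev(rain, output)
iv_intercept = mean(income) - iv_slope * mean(output)

print(f"OLS: income = {ols_intercept:.3f} + {ols_slope:.3f} * output")
print(f"IV:  income = {iv_intercept:.3f} + {iv_slope:.3f} * output")
```

With a single instrument and a single regressor, the IV slope reduces to this ratio of covariances, so no matrix algebra is needed.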
5. Instrumental Variables. Using the same data as above you are going to examine an alternative way of generating the instrumental variable estimates.
a) Estimate the effect of rain on output and report the results.
b) Estimate the effect of rain on income and report the results.
c) Show how you can use the estimates from these two regressions to generate the IV estimate you found in question 4, part d). Explain the intuition behind this approach.
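The link between the two regressions in question 5 and the IV estimate from question 4 can be sketched algebraically. The symbols \hat{\pi}_{output} and \hat{\pi}_{income} are assumed names for the slopes on rain in parts a) and b).

```latex
% First stage (part a):   output = \pi_0 + \pi_{output}\, rain + v
% Reduced form (part b):  income = \gamma_0 + \pi_{income}\, rain + w
\hat{\beta}_{IV}
  = \frac{\widehat{Cov}(rain, income)}{\widehat{Cov}(rain, output)}
  = \frac{\widehat{Cov}(rain, income) / \widehat{Var}(rain)}
         {\widehat{Cov}(rain, output) / \widehat{Var}(rain)}
  = \frac{\hat{\pi}_{income}}{\hat{\pi}_{output}}
```

Under the instrument's assumptions, rain moves income only through output, so dividing the effect of rain on income by the effect of rain on output rescales it into income per unit of output.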
6. Regression Discontinuity. A university offers scholarships to applicants based on their high school grade point averages (GPA). In 2016, any student who earned a 3.65 GPA or higher
was offered a scholarship of $10,000 (scholar). We estimate whether this program results in more
students attending the university using data for 2,500 applicants.
(0.09) (0.05) (0.03)
a) Interpret the coefficient on “GPA-3.65” in a sentence. Does this make sense? Explain.
b) Interpret the coefficient on “scholar” in a sentence.
c) Explain what it means for the coefficient in part b) to be called a local average treatment effect.
d) Is the effect of the scholarship on attending the university statistically significant at the 95% confidence level?
e) What is the probability of a student attending the university if they have a GPA of 3.64? 3.66?
f) Draw a graph of this regression equation and label the slope and the discontinuity.
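For orientation, a regression with coefficients on “scholar” and “GPA-3.65” typically has the general form sketched below. This is an assumed specification: the outcome name attend and the \hat{\beta} symbols are illustrative, and the estimated coefficient values are not reproduced here.

```latex
\widehat{attend}_i = \hat{\beta}_0 + \hat{\beta}_1\, scholar_i + \hat{\beta}_2\,(GPA_i - 3.65)
% scholar_i = 1 if GPA_i \ge 3.65, and 0 otherwise
% Predicted probability at GPA = 3.64: \hat{\beta}_0 + \hat{\beta}_2\,(-0.01)
% Predicted probability at GPA = 3.66: \hat{\beta}_0 + \hat{\beta}_1 + \hat{\beta}_2\,(0.01)
```

In this form \hat{\beta}_1 is the jump in the predicted attendance probability at the 3.65 cutoff, and \hat{\beta}_2 is the common slope on either side of it.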