The project should be a concise, well-written report of at least 10 pages. Reports should be written in RStudio as an .Rnw file. You should submit a PDF of your report **and** the .Rnw file used to generate your entire report.

Remark: The PDF will be the main source for the grade; however, the submitted underlying .Rnw file must compile and be correct.

In the end, the reports should contain

- Graphs done in R
- Results of your computation with R
- Inferential statistics done with R
- Explanation/interpretation of your findings/results.

Think of the report as a "real-life project" that you do for a company. This means the report should be presented nicely and be readable by people with little statistical knowledge (so make sure you clearly explain why you did what you did). **Present your results so that someone would be interested in reading them.**

In addition to the charts you’ve already included, you should now:

- Calculate (and test) the correlation between two appropriate variables. Compute the linear regression for the related pair, plot the scatter plot together with the regression line, and explain the findings.
- Run an additional, multivariate linear regression by adding at least one more independent variable. The additional variable may be numeric *or* you can create a "dummy" variable by coding a binary categorical variable with 0s and 1s. Discuss which independent variables are significant. Discuss each coefficient, and briefly discuss what it all means.
- Compute a 95% confidence interval for the parameter, *p*, of a categorical variable with two outcomes. Explain what a confidence interval is in general, and discuss what your result means explicitly.
- Compare the confidence intervals of the mean incomes of two subgroups (e.g. male vs. female, college vs. no college, etc.). Choose subgroups that best suit the other points of your project discussed so far. Interpret the result.
- Test the difference between the means of two populations. Make sure to also run a test of the two variances, so you know how to treat the variances of the two populations when testing the means.
- Use the R function prop.test() to compare two proportions. Interpret your results.
- Use the R function chisq.test() to test two nominal variables for independence. Interpret your results.
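
As a rough illustration of the correlation and regression items in the list above, the sketch below uses `cor.test()`, `lm()`, `plot()` and `abline()` from base R. The data frame `survey` and its columns `income`, `education` and `sex` are hypothetical and simulated only so the code runs; replace them with the variables from your own data set.

```r
# Hypothetical data, simulated only so this sketch runs; use your own data.
set.seed(1)
n <- 200
survey <- data.frame(
  education = round(runif(n, 8, 20)),                      # years of schooling (assumed)
  sex       = sample(c("female", "male"), n, replace = TRUE)
)
survey$income <- 15000 + 2500 * survey$education + rnorm(n, sd = 8000)

# Pearson correlation together with its significance test
ct <- cor.test(survey$income, survey$education)
print(ct)

# Simple linear regression of income on education
fit1 <- lm(income ~ education, data = survey)
print(summary(fit1))

# Scatter plot with the fitted regression line drawn on top
plot(income ~ education, data = survey,
     xlab = "Years of education", ylab = "Income")
abline(fit1, col = "red", lwd = 2)

# Dummy variable: 1 = male, 0 = female
survey$male <- ifelse(survey$sex == "male", 1, 0)

# Multiple regression with the extra 0/1 predictor
fit2 <- lm(income ~ education + male, data = survey)
print(summary(fit2))
```

In the `summary()` output, the `Pr(>|t|)` column gives the p-value for each coefficient. The coefficient of the dummy variable is the estimated difference in mean income between the group coded 1 and the group coded 0, holding the other predictors fixed.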
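
For the two confidence-interval items in the list above, one possible approach is to read the interval from the `conf.int` component that `prop.test()` and `t.test()` return. This is only a sketch: the `survey` data frame and its columns `college` and `income` are hypothetical and simulated here, and other interval methods (for example `binom.test()`) are equally valid choices.

```r
# Hypothetical data, simulated only so this sketch runs; use your own data.
set.seed(2)
n <- 150
survey <- data.frame(
  college = sample(c("yes", "no"), n, replace = TRUE)
)
survey$income <- ifelse(survey$college == "yes", 60000, 45000) +
  rnorm(n, sd = 12000)

# 95% confidence interval for p = proportion with college == "yes"
x    <- sum(survey$college == "yes")
ci_p <- prop.test(x, n, conf.level = 0.95)$conf.int
print(ci_p)

# 95% confidence intervals for mean income in each subgroup
ci_yes <- t.test(survey$income[survey$college == "yes"])$conf.int
ci_no  <- t.test(survey$income[survey$college == "no"])$conf.int
print(ci_yes)
print(ci_no)
```

Placing `ci_yes` and `ci_no` side by side shows whether the two intervals overlap. Overlap alone does not settle whether the means differ, which is why the list also asks for a formal test of the difference.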
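
The three testing items in the list above might be sketched as follows with `var.test()`, `t.test()`, `prop.test()` and `chisq.test()`. The `survey` data frame, its columns `sex`, `college` and `income`, and the 0.05 cut-off used to choose between the pooled and Welch t-test are all assumptions made for illustration.

```r
# Hypothetical data, simulated only so this sketch runs; use your own data.
set.seed(3)
n <- 300
survey <- data.frame(
  sex     = sample(c("female", "male"), n, replace = TRUE),
  college = sample(c("yes", "no"), n, replace = TRUE)
)
survey$income <- 50000 + rnorm(n, sd = 10000)

# F test for equality of the two group variances
vt <- var.test(income ~ sex, data = survey)
print(vt)

# Pooled t-test if equal variances are not rejected, Welch t-test otherwise
equal_var <- vt$p.value > 0.05
tt <- t.test(income ~ sex, data = survey, var.equal = equal_var)
print(tt)

# Compare the proportion of college graduates between the two sexes
tab <- table(survey$sex, survey$college)
pt  <- prop.test(x = tab[, "yes"], n = rowSums(tab))
print(pt)

# Chi-squared test of independence for the two nominal variables
cs <- chisq.test(tab)
print(cs)
```

For a 2 x 2 table with default settings, `prop.test()` and `chisq.test()` should report the same p-value, since both apply the continuity-corrected chi-squared statistic. Keep in mind that the F test is sensitive to departures from normality, so its result is a guide rather than a guarantee.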
