Computer science R project


The project should be a concise, well-written report of at least 10 pages. Reports should be written in RStudio as an .Rnw file. You should submit a PDF of your report and the underlying .Rnw file used to generate your entire report.

Remark: The PDF will be the main source for the grade; however, the submitted underlying .Rnw file must compile and be correct.

In the end, the reports should contain

  • Graphs done in R
  • Results of your computation with R
  • Inferential statistics done with R
  • Explanation/interpretation of your findings/results.

Think of the report as a “real-life project” that you do for a company. This means the report should be presented nicely and be readable by people with little statistical knowledge (so make sure you clearly explain why you did what you did). Present your results so that someone would be interested in reading them.

In addition to the charts you’ve already included, you should now:

  • Calculate (and test) the correlation between two appropriate variables. Compute the linear regression for the related pair, plot the scatter plot together with the linear regression and explain the findings.
  • Run an additional, multivariate linear regression by adding at least one more independent variable. The additional variable may be numeric, or you can create a “dummy” variable by coding a binary categorical variable with 0s and 1s. Discuss which independent variables are significant. Discuss each coefficient, and briefly discuss what it all means.
  • Compute a 95% confidence interval for the parameter, p, of a categorical variable with two outcomes. Explain what the confidence interval is in general, and discuss what your result means explicitly.
  • Compare the confidence intervals of the mean incomes of two subgroups (e.g. male vs female, college vs no college, etc.). Choose subgroups that best suit the other points of your project you discussed so far. Interpret the result.
  • Test the difference of two means of two populations. Make sure to also run a test of two variances to see how to address the variance of the two populations when testing the means.
  • Use the R function prop.test() to compare two proportions. Interpret your results.
  • Use the R function chisq.test() to test two nominal variables for independence. Interpret your results.

 
