Project

For the project in this class, you will analyze data from a Norwegian study on air pollution, conducted at Alnabru in Oslo, Norway, from October 2001 to August 2003, and write a paper on your conclusions. The data, along with a detailed description of the variables and instructions for reading the data into R, are available here.

The objective of this project is to study the relationship between the logarithm of the concentration of nitrogen dioxide, NO2 (particles), and several traffic-volume and meteorological variables. You should identify a statistical model that accurately describes the log NO2 concentration, with the explanatory variables explaining as much of its variability as possible, but without making the model so complex that it overfits the data (that is, it fits noise in this sample and predicts poorly for other data collected through the same mechanism). You are not required to use all of the variables in your final model, and you should be mindful of the relationships among some of the explanatory variables (for example, strong correlations between them that make individual coefficients hard to interpret).
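As a rough illustration of the fit-versus-complexity trade-off described above, the R sketch below compares a small multiple regression with a deliberately over-complex one, using adjusted R² and prediction error on a held-out 20% of the rows. It runs on simulated data; the column names (`log_no2`, `log_cars`, `temp`, `wind_speed`, `hour`) are hypothetical stand-ins rather than the names used in the course data, and a single holdout split is only one of several reasonable ways to check for overfitting (cross-validation is another).

```r
# Minimal sketch on SIMULATED data: the column names below are hypothetical
# stand-ins for the traffic and weather variables in the course data.
set.seed(1)
n <- 500
sim <- data.frame(
  log_cars   = rnorm(n, mean = 7, sd = 1),
  temp       = rnorm(n, mean = 5, sd = 8),
  wind_speed = rexp(n, rate = 0.3),
  hour       = sample(1:24, n, replace = TRUE)
)
# Simulated response: depends only on traffic and wind, plus noise
sim$log_no2 <- 1 + 0.4 * sim$log_cars - 0.1 * sim$wind_speed +
  rnorm(n, sd = 0.5)

# Hold out 20% of the rows to estimate out-of-sample prediction error
test_rows <- sample(n, size = 0.2 * n)
train <- sim[-test_rows, ]
test  <- sim[test_rows, ]

# A small model and a deliberately over-complex one
# (four main effects plus all six pairwise interactions)
fit_small <- lm(log_no2 ~ log_cars + wind_speed, data = train)
fit_big   <- lm(log_no2 ~ (log_cars + temp + wind_speed + hour)^2,
                data = train)

# Root mean squared prediction error on the held-out rows
rmse <- function(fit, newdata) {
  sqrt(mean((newdata$log_no2 - predict(fit, newdata = newdata))^2))
}

comparison <- data.frame(
  model     = c("small", "big"),
  n_coef    = c(length(coef(fit_small)), length(coef(fit_big))),
  adj_r2    = c(summary(fit_small)$adj.r.squared,
                summary(fit_big)$adj.r.squared),
  test_rmse = c(rmse(fit_small, test), rmse(fit_big, test))
)
print(comparison)
```

If the larger model's held-out error is no lower than the smaller model's, its extra terms are probably fitting noise. The numbers printed here depend on the simulation seed and say nothing about the Alnabru data.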

There are two final deliverables for the project:

  1. A paper describing your problem, the analysis you conducted, and your conclusions
  2. Supplementary materials with R output and diagnostic plots for your model

The report and supplementary materials should both be generated from an R Markdown document, and I will ask you to submit both the R Markdown document and the resulting knitted PDF.

Deadlines

Guidelines for the final report

Items 1-6 above will probably require between 5 and 10 pages, including figures and tables. Please do not exceed 10 pages. If your report looks like it will be shorter than 5 pages, please run it by me to make sure you are discussing everything in enough detail. Do not change the font size or margins from the R Markdown defaults.

Grading and Assessment Criteria

The project grade makes up 10% of the final grade for the class. Here are some things I’ll be considering:

  1. Technical Mastery: Do you demonstrate that you understand the methods you are using? Does the submitted R code work correctly? Can I knit the submitted R markdown files to generate the submitted pdf file?
  2. Writing: How effectively does the written report communicate the goals, procedures, and results of the study? Are the claims adequately discussed and supported? How well is the report structured and organized (this should not be a problem if you follow the structure I laid out!)? Are all of the figures and tables numbered and appropriately referenced? Does the writing style enhance what you are trying to communicate? How well is the report edited?
  3. Statistical Analysis: Are the chosen analyses appropriate for the variables/relationships under investigation, and are the assumptions underlying these analyses met? Do the analyses involve fitting and interpreting a multiple regression model? Are the analyses carried out correctly? Was the appropriateness of the model assessed using diagnostic plots? Is there an effective mix of graphical, numerical, and inferential analyses?
  4. Conclusions: Are the stated conclusions supported and justified by the analysis? Can the effects of confounding variables be controlled for (if not, is that discussed as a limitation of the analysis)? Is the scope of conclusions properly addressed?
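For the diagnostic-plot criterion in item 3, a minimal base-R sketch follows, again on simulated data with hypothetical variable names (`temp_2m`, `temp_25m`, `wind`). It draws the four standard `lm` diagnostic plots and computes variance inflation factors (VIFs) by hand, which is one way to quantify the relationships among explanatory variables mentioned in the project objective. A VIF well above roughly 5 to 10 is a common, informal signal of strong collinearity; packages such as car also provide a `vif()` function.

```r
# Minimal sketch on SIMULATED data with hypothetical variable names.
set.seed(2)
n <- 300
temp_2m  <- rnorm(n, mean = 5, sd = 8)
# temp_25m is constructed to be strongly correlated with temp_2m
temp_25m <- temp_2m + rnorm(n, sd = 1)
wind     <- rexp(n, rate = 0.3)
log_no2  <- 3 - 0.05 * temp_2m - 0.1 * wind + rnorm(n, sd = 0.5)
dat <- data.frame(log_no2, temp_2m, temp_25m, wind)

fit <- lm(log_no2 ~ temp_2m + temp_25m + wind, data = dat)

# Standard diagnostics: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# VIF for each predictor: 1 / (1 - R^2), where R^2 comes from
# regressing that predictor on all of the other predictors
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    r2 <- summary(lm(reformulate(others, response = p),
                     data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_by_hand(dat, c("temp_2m", "temp_25m", "wind"))
print(round(vifs, 1))
```

Because the two temperature columns are built to be nearly collinear, their VIFs come out large while `wind` stays close to 1; in a real analysis that pattern would suggest keeping only one of the correlated variables or combining them.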