Project
For the project in this class, you will analyze data from a Norwegian study on air pollution,
conducted at Alnabru in Oslo, Norway, from October 2001 to August 2003, and write a report on your
conclusions. The data, along with a detailed description of the variables and what they mean
and instructions for reading the data into R, are available here.
The objective of this project is to study the relationship between the logarithm of the concentration
of nitrogen dioxide, NO2 (particles), and several traffic volume and meteorological variables. You should identify a
statistical model that accurately describes the log NO2 concentration, with the explanatory variables
explaining as much variability as possible, but without making it so complex that you overfit
(that is, make it unusable for other data collected through the same mechanism).
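One way to see the trade-off between explained variability and overfitting is to hold out part of the data and compare prediction error on the held-out rows. The sketch below is only an illustration under stated assumptions: the data are simulated, and the column names `log_no2`, `log_cars`, and `wind_speed` are hypothetical stand-ins, not the names used in the actual data set.

```r
# Simulated stand-in data. The column names are hypothetical and the
# coefficients are made up; none of this comes from the Alnabru data.
set.seed(1)
n <- 80
sim <- data.frame(
  log_cars   = rnorm(n, mean = 7, sd = 1),
  wind_speed = runif(n, min = 0, max = 10)
)
sim$log_no2 <- 1 + 0.4 * sim$log_cars - 0.1 * sim$wind_speed +
  rnorm(n, sd = 0.5)

# Hold out a quarter of the rows to mimic "other data collected
# through the same mechanism".
test_rows <- sample(n, size = n / 4)
train <- sim[-test_rows, ]
test  <- sim[test_rows, ]

# A simple model and a deliberately over-flexible one.
simple_fit  <- lm(log_no2 ~ log_cars + wind_speed, data = train)
complex_fit <- lm(log_no2 ~ poly(log_cars, 6) + poly(wind_speed, 6),
                  data = train)

# Root mean squared prediction error of a fitted model on a data frame.
rmse <- function(fit, data) {
  sqrt(mean((data$log_no2 - predict(fit, newdata = data))^2))
}

rmse_table <- data.frame(
  model      = c("simple", "complex"),
  train_rmse = c(rmse(simple_fit, train), rmse(complex_fit, train)),
  test_rmse  = c(rmse(simple_fit, test),  rmse(complex_fit, test))
)
print(rmse_table)
```

The more flexible model can never have a larger training error than the simple model nested inside it, so the training column alone cannot reveal overfitting. The held-out column is the one to compare.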
You are not required to use all the variables in your final model, and you should be mindful of the
relationships among some of the variables (through the variable descriptions) when you are considering
which variables to include.
There are two final deliverables for the project:
- A report describing your problem, the analysis you conducted, and your conclusions. You should
include your R code in the body of your report, just as you would in a homework or lab assignment.
- A peer review of another student's report (this will be assigned to you)
The report and review should both be generated by an R markdown document,
and I will ask you to submit both the R markdown document and the resulting knitted pdf.
Deadlines
- 11:59 PM EDT Wed., April 22: Project Assignment 1,
submitted for anonymous peer review (details below).
Submit by uploading to Google drive (Spring 2020 Projects) and sending me an email
letting me know you have done that.
- 11:59 PM EDT Fri., April 24: Project Assignment 2. Submit by uploading to Google drive (Spring 2020 Projects) and sending me an email letting me know you have done that.
- 11:59 AM EDT Tues., May 5: Project Assignment 3.
Final submission of R markdown file and pdf for report with peer review comments integrated.
Submit by uploading to Google drive (Spring 2020 Projects) and sending me an email letting
me know you have done that.
Project Assignment 1 (25%)
This is a rough draft of your report, which will be peer-reviewed by another classmate.
I encourage you to complete as much of your analysis and writing as you can before this
deadline. This includes fitting a multiple regression model, checking (four) model conditions,
making appropriate plots and tables of the data to check these conditions or summarize interesting
findings, and identifying any evident problems with model conditions. I do not necessarily expect you
to be able to address these problems at this point, but you should be able to identify them.
Note that your model does not have to be that complicated (a couple of explanatory variables is fine),
but you should justify your model choice through hypothesis testing for coefficients. You can also
compare models using adjusted R^2. The more complete your report, the more constructively your
peer can critique it.
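As a rough illustration of the steps described above (fitting a multiple regression, testing coefficients, and comparing models by adjusted R-squared), here is a minimal sketch in base R. It runs on simulated data, and the column names `log_no2`, `log_cars`, `wind_speed`, and `temp` are hypothetical placeholders, so you would need to substitute the real variables from the data description.

```r
# Simulated stand-in data. The column names and coefficients are
# hypothetical and are not taken from the actual data set.
set.seed(2)
n <- 150
sim <- data.frame(
  log_cars   = rnorm(n, mean = 7, sd = 1),
  wind_speed = runif(n, min = 0, max = 10),
  temp       = rnorm(n, mean = 5, sd = 6)
)
sim$log_no2 <- 1 + 0.4 * sim$log_cars - 0.1 * sim$wind_speed +
  rnorm(n, sd = 0.5)

# Two candidate models: one explanatory variable versus three.
fit_small <- lm(log_no2 ~ log_cars, data = sim)
fit_full  <- lm(log_no2 ~ log_cars + wind_speed + temp, data = sim)

# t-tests for the individual coefficients of the larger model.
coef_table <- summary(fit_full)$coefficients
print(coef_table)

# Nested F-test: do the extra variables improve on the small model?
print(anova(fit_small, fit_full))

# Adjusted R-squared for each candidate model.
adj_r2 <- c(
  small = summary(fit_small)$adj.r.squared,
  full  = summary(fit_full)$adj.r.squared
)
print(adj_r2)
```

In a report, output like this would normally be accompanied by a sentence or two interpreting each test in context, rather than being left as raw console output.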
There are two components to this part of the project:
- A 10-15 minute virtual meeting with the instructor
prior to April 22 to discuss your plans for the project; you can sign up for a time slot
on my calendar (email will be sent). This meeting will be scheduled using Zoom unless you indicate
that another platform is required, in which case we will make plans on a case-by-case basis. (5%)
- A rough draft of the report, to be submitted to the Google drive (email invitation will be sent). (20%)
Project Assignment 2 (25%)
Write an anonymous peer review, no more than one page long, of your classmate's assignment.
To the best of your ability, you should comment on clarity of writing, technical correctness and
completeness of the statistical analyses, and presentation of results.
More guidance on what to look for and on the style of the peer review will be added here before
this part of the project is assigned.
Project Assignment 3 (50%)
This is your final report. Prior to submission, you should integrate constructive comments that you
received through peer review into your final report.
Guidelines for the final report
- Overall, the project report should be written in clear, concise prose.
- We will use a structure that is similar to a standard scientific report you might share
with a collaborator. Please follow the structure below:
- Title
- Summary: an introduction to the problem we are addressing,
a brief description of the methods you consider, and a summary of the results. Aim for 1 paragraph.
- Data: a brief summary of key features of the dataset.
You should define each variable that will be used (to the level that it is possible to do this,
given the information provided about the data).
Also include a few plots showing a few key insights about the data set.
Note that there will probably not be enough space to present every plot you make during the course of
conducting your analysis; you will have to select a small number of the most informative plots
to include. These plots should be briefly discussed in the text.
At least a few sentences of context and description of the dataset should be included
(how were the data collected? What was measured?), and the number of observations in the data set
should be stated. Aim for about 1-2 pages.
There should be enough detail that the scope of conclusions from your analysis can be assessed.
- Methods: a description of the statistical model used in your analysis. Aim for a page or less.
- Results: a presentation of your results. This should include a paragraph or two stating the results of the analysis with minimal interpretation. Aim for less than a page.
- Discussion: summarize your work, its limitations, and possible future steps/improvements. Address the answers to the problem you outlined in your summary and the scope of your conclusions. This can be a page or two.
- References: cite all sources in a standard format.
The sections above will probably require between 5 and 10 pages, including figures and tables, but excluding code chunks.
Please do not go over 10 pages (you can eyeball this).
If your report is looking like it will be fewer than 5 pages, please run it by me and make sure
you’re discussing everything in enough detail.
You should not change the font size or margins from the defaults for R markdown documents.
Grading and Assessment Criteria
The project grade makes up 30% of the final grade for the class. Here are some things I’ll be considering:
- Technical Mastery: Do you demonstrate that you understand the methods you are using?
Does the submitted R code work correctly?
Can I knit the submitted R markdown files to generate the submitted pdf file?
- Writing: How effectively does the written report communicate the goals, procedures, and results of the study?
Are the claims adequately discussed and supported?
How well is the report structured and organized (this should not be a problem if you follow the structure I laid out!)?
Are all of the figures and tables numbered and appropriately referenced?
Does the writing style enhance what the author is trying to communicate? How well is the report edited?
- Statistical Analysis: Are the chosen analyses appropriate for the variables/relationships under investigation, and are the assumptions underlying these analyses met? Do the analyses involve fitting and interpreting a multiple regression model? Are the analyses carried out correctly? Was the appropriateness of the model assessed using diagnostic plots? Is there an effective mix of graphical, numerical, and inferential analyses?
- Conclusions: Are the stated conclusions supported and justified by the analysis? Is the scope of conclusions properly addressed?
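For the diagnostic plots mentioned in the criteria above, the following is a minimal sketch using base R graphics. It assumes the model conditions being checked are linearity, equal variance, normality of residuals, and independence, which this page does not spell out. The data are simulated and the column names are hypothetical.

```r
# Simulated stand-in data. The column names are hypothetical and are
# not the names used in the actual data set.
set.seed(3)
n <- 150
sim <- data.frame(
  log_cars   = rnorm(n, mean = 7, sd = 1),
  wind_speed = runif(n, min = 0, max = 10)
)
sim$log_no2 <- 1 + 0.4 * sim$log_cars - 0.1 * sim$wind_speed +
  rnorm(n, sd = 0.5)

fit <- lm(log_no2 ~ log_cars + wind_speed, data = sim)

# Arrange four diagnostic plots in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(fit, which = 1)  # residuals vs fitted: linearity, equal variance
plot(fit, which = 2)  # normal Q-Q plot: normality of residuals
plot(fit, which = 3)  # scale-location: equal variance
plot(residuals(fit), type = "b",
     xlab = "Observation order", ylab = "Residual",
     main = "Residuals in data order")  # rough look at independence
par(old_par)  # restore the previous plotting layout
```

The last panel is only informative about independence if the rows are stored in a meaningful order, such as time order.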