Go back to STAT545A home

STAT 545A Exploratory Data Analysis

1.5 credits
04 September 2013 - 16 October 2013
Mon Wed 9:30 - 11am in ESB 1042, a computing lab on the main ground floor of the Earth Sciences Building (ESB) at 2207 Main Mall
Instructor: Jennifer (Jenny) Bryan [email protected]
TA: Song Cai [email protected]
Google Group for Q & A: STAT545A_2013
github repository for course materials: https://github.com/jennybc/STAT545A

cm = class meeting

Monday Sept 02 is a statutory holiday. No class.

cm 01 | Wednesday Sept 04 | Introduction to the course (slides as PDF)

  • Complete the Google form. JB sent a link to all registered students. If you need the link, contact her by email.

  • Ask to join the STAT545A_2013 Google Group or play hard to get and wait for us to invite you.

  • Follow some of the links that interest you in the cm01 lecture slides (link to PDF above). It would be great if people started a thread on the Google Group suggesting more or better blogs or articles on the same topics, or pointing out broken links.

  • Work through

  • Sign up for an account at Rpubs.com. We will try this as the first and gentlest method of generating finished work for this course. JB has conducted a test and it's dead easy. We'll get you started on Monday.

cm 02 | Monday Sept 09 | Create first report, Deep Thoughts, Basic care and feeding of data in R (slides as PDF)

  • Sign up for an educational account at github.com/edu. Take advantage of their special deal for students, where you can get something like 5 private repositories for two years for free. We will experiment with students modifying the course repository via a browser-based workflow for those who do not wish to take the git plunge yet. I will not require you to create or share your own repositories -- you just need a github account in order to edit mine, i.e. the course repository. Use cases I have in mind:

    • Submitting coursework by adding a link to work you've published on Rpubs.com
    • Suggesting a dataset to work on later in the class or commenting on suggestions made by others
  • Work through

  • Feel free to start thinking about some datasets we could work with later in the class

cm 03 | Wednesday Sept 11 | R objects (beyond data.frames) and R Markdown (slides as PDF)

cm 04 | Monday Sept 16 | Data aggregation (slides as PDF)

  • Note that on Wednesday Sept 18 UBC will observe National Reconciliation Week. No class.

  • Study

  • Submit homework 3 before class starts @ 9:30am Monday Sept. 23.

  • Post a serious proposal for a dataset and/or make a thoughtful contribution to the discussion of an existing proposal. We need to start fleshing these out. The time needed for data assembly and cleaning is going to break your heart. Here are two places for these discussions

  • Spend ~1 hr (or more if you are new to the command line and scripting) reading these resources about how to ask for help. Don't be paranoid, this is not specifically about you and that question you asked the other day! This material has been in the course for a couple of years. This is another aspect of the culture that one has to actually learn.

cm 05 | Monday Sept 23 | Explore a quantitative variable, visuals via lattice (slides as PDF)

  • Work through

  • Reading on the lattice package in the book Lattice: Multivariate Data Visualization with R by Deepayan Sarkar (2008). Links to the eBook and other resources can be found on my resources page.

    • Ch. 1 Introduction (short! totally worth it)
    • Ch. 2 A Technical Overview of lattice (skimming is OK; at least you'll know where to come back to when you're confused)
    • Ch. 3 Visualizing Univariate Distributions (great companion to our work in class this week)

cm 06 | Wednesday Sept 25 | Explore a quantitative variable, visuals via lattice, cont'd (slides as PDF)

  • Work through

  • Reading on the lattice package in the book Lattice: Multivariate Data Visualization with R by Deepayan Sarkar (2008). Links to the eBook and other resources can be found on my resources page.

    • Ch. 5 Scatter Plots and Extensions. Most important sections (skim the rest?):
      • 5.1 The standard scatter plot
      • 5.3 Variants using the type argument
      • 5.4 Scatter-plot variants for large data
  • Submit homework 4 before class starts @ 9:30am Monday Sept. 30. Make graphical companions to data aggregation output from homework 3.

cm 07 | Monday Sept 30 | Two quantitative variables + lattice details + writing figures to file (slides as PDF)

cm 08 | Wednesday Oct 02 | ggplot2

  • Submit homework 5 before class starts @ 9:30am Monday Oct. 07.

  • Reading on the ggplot2 package in the book ggplot2: Elegant Graphics for Data Analysis. Links to the eBook and much more can be found on my resources page.

    • Ch. 3 Mastering the grammar

cm 09 | Monday Oct 07 | Colors

cm 10 | Wednesday Oct 09 | More ggplot2

Monday Oct 14 is a statutory holiday. No class. Happy Thanksgiving!

cm 11 | Wednesday Oct 16 | Coding style, project organization, version control (slides as PDF)

cm 99 | December | Course wrap-up

This work is licensed under the CC BY-NC 3.0 Creative Commons License.