Skip to content

SFOEWD/data-academy-intro-to-r

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This repository contains the course materials for Data Academy’s Introduction to R course.

Setup

  1. Clone/download the repository onto your computer. Then open data-academy-intro-to-r.Rproj in RStudio (should just be able to double-click the file).

  2. Open R/install_packages.R and click ‘Source’. This should install the required packages for the workshop.

  3. During the workshop, open R/script.R and follow along, executing each line of code.

  4. After the workshop, complete the lab exercises below in a new R script.

Lab

In a new script, load the tidyverse and complete the exercises below.

  1. Read data/flights.rds into R, naming the output flights.

  2. From flights, select the year, month, and day columns.

  3. From flights, select all columns except carrier.

  4. Sort flights by carrier.

  5. Sort flight by departure delay in descending order.

  6. Rename the tailnum column to tail_num.

  7. Find the distinct origins.

  8. Find the distinct combinations of origins and destinations.

  9. Filter flight for the first day of January or February.

  10. Find all flights that meet these conditions:

  • Had an arrival delay of two or more hours
  • Flew to Houston (IAH or HOU)
  • Were operated by United (UA), American (AA), or Delta (DL)
  • Departed in summer (July, August, and September)
  1. Count the flight destinations by origin.

  2. Create a new column ‘speed’ that is distance divided by air time multiplied by 60.

  3. Create a new column ‘flight_hours’ that is the air_time divided by 60.

  4. In a pipeline (consecutive pipes), filter flights where neither arr_delay and tailnum are not NA. Then count the destinations.

  5. In a pipeline (consecutive pipes), filter flights where the destination is “IAH”, then find the average arrival delays, grouped by year, month, and day.

  6. Which carrier has the highest average delays, both arrival and departure? Calculate both within summarize().

  7. During which month is there the highest average departure delays? The lowest?

  8. Read data/airports.rds into R, naming the output airports.

  9. Filter airports where ‘Intl’ is in the name, then rename the faa column to dest. Save the output to a data frame called intl_airports.

  10. In a single pipeline, select sched_arr_time, arr_delay, dest from flights, inner join intl_airports using dest as the ‘key’, and then count the airport names. Sort the output. Then use a left join instead of an inner join. What’s different? And why is it different?

  11. A pattern we haven’t seen yet is group_by() followed by mutate(). Compare the two outputs below:

penguins %>% 
  group_by(island) %>% 
  summarize(mean_flipper_length = mean(flipper_length_mm, na.rm = TRUE))

penguins %>% 
  group_by(island) %>% 
  mutate(mean_flipper_length = mean(flipper_length_mm, na.rm = TRUE))

How are they different? How does the output change if we add %>% ungroup() to the end? Why might we want to add ungroup() after we’ve completed a grouping operation (it’s usually a good decision)?

About

Materials for Data Academy's Introduction to R

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages