This repository contains training materials for learning the R programming language, specifically tailored towards air quality data science. The training is structured into distinct lessons, each focusing on a different aspect of using R. Each lesson is contained in a separate folder and includes a README.md file with the lesson material.
These lessons do not assume that you have any experience with R or any other programming language. They do assume that you have some familiarity with air quality data. Below is a list of software and R packages that are used throughout the lessons.
- R: The statistical programming language. See the Introduction to R lesson for instructions on downloading.
- RStudio: The integrated development environment. See the Introduction to R lesson for instructions on downloading.
- `region5air`: The R package with air quality data used throughout the lessons. See install instructions on GitHub.
These lessons are meant to be self-study materials for air quality data science using R. They also provide some instruction on how to program with R. These two topics are related, but it's helpful to understand the distinction.
- Data Science is a collection of skills and methods for extracting insights from data. We are not using this phrase to include machine learning, as many do. In our use of the term, data science focuses on obtaining, storing, transforming, and visualizing data. Basic statistics and quality assurance are also touched on.
- Programming, in our use of the term, is automating tasks using a computer language. In our case, we want to use R programming to automate air quality data science tasks to make our life easier. We also want to use programming to handle a high volume of data and a wide variety of analyses.
The topics in these lessons are not strictly divided into data science lessons and programming lessons, but it may be helpful to keep the two topics separate in your mind as you progress through the material. Programming topics such as conditionals and loops (and the R version of loops, the `apply` functions) are difficult to understand at first. However, they are not essential for using R to read air quality data, transform it, and visualize the output for use in a report or presentation. Data science tasks are more straightforward, and they are the main topics in these lessons.
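As a rough illustration of the programming side, the sketch below computes the mean of each column twice: once with a `for` loop and once with `sapply()`, one of R's apply functions. The data frame `monitors` and its values are invented for this example and do not come from the `region5air` package.

```r
# Hypothetical ozone readings (ppb) from two monitors; all values are invented.
monitors <- data.frame(
  site_a = c(31, 42, 55, 48),
  site_b = c(28, 39, 61, 52)
)

# Approach 1: a for loop that fills a named result vector one column at a time.
loop_means <- numeric(ncol(monitors))
names(loop_means) <- names(monitors)
for (site in names(monitors)) {
  loop_means[site] <- mean(monitors[[site]])
}

# Approach 2: sapply() applies mean() to every column in a single call.
apply_means <- sapply(monitors, mean)

# Both approaches produce a named numeric vector with one mean per monitor.
print(loop_means)
print(apply_means)
```

Both versions give the same result. The apply version is shorter, but the loop makes each step explicit, which can be easier to follow when you are starting out.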
- Introduction to R: This lesson provides a basic introduction to R, including how to install and set up R and RStudio, an overview of R syntax, and how to perform simple operations.
- Functions and Importing Data: This lesson covers how to use functions in R, including built-in functions and functions from packages. It also discusses how to import data such as text data in CSV files and Excel workbooks.
- Subsetting, Sorting, and Combining Data Frames: This lesson covers how to subset data using indexing, logical operators, and the `filter()` function from `dplyr`. It also covers how to sort and combine data frames.
- Writing Functions, Conditionals, and Loops: This lesson introduces the concept of writing functions in R, using conditionals to control the flow of execution, and implementing loops for repetitive tasks.
- Plotting: This lesson focuses on visualizing data using various plotting techniques in R, including scatter plots, line graphs, and histograms.
- Basic Statistics: This lesson covers the basics of statistical analysis in R, including calculating means, medians, standard deviations, and correlations.
- Quality Assurance: This lesson discusses quality assurance in data analysis, including checking data types, handling outliers, dealing with missing data, and other common pitfalls.
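To give a feel for how several of these topics fit together, here is a minimal sketch that uses only base R. It is not taken from the lesson material: the data frame `ozone`, its column names, and its values are all hypothetical.

```r
# Hypothetical daily ozone data (ppb); site names and values are invented.
ozone <- data.frame(
  site  = c("A", "A", "B", "B", "B"),
  date  = as.Date(c("2024-06-01", "2024-06-02",
                    "2024-06-01", "2024-06-02", "2024-06-03")),
  value = c(41, 58, NA, 47, 63)
)

# Quality assurance: inspect column types and count missing values.
str(ozone)
n_missing <- sum(is.na(ozone$value))

# Subsetting with indexing and logical operators (base R).
# With dplyr installed, dplyr::filter(ozone, site == "B", !is.na(value))
# selects the same rows.
site_b <- ozone[ozone$site == "B" & !is.na(ozone$value), ]

# Sorting: order the subset from highest to lowest value.
site_b_sorted <- site_b[order(site_b$value, decreasing = TRUE), ]

# Basic statistics, ignoring the missing value.
mean_value   <- mean(ozone$value, na.rm = TRUE)
median_value <- median(ozone$value, na.rm = TRUE)
sd_value     <- sd(ozone$value, na.rm = TRUE)

print(site_b_sorted)
print(c(mean = mean_value, median = median_value, sd = sd_value))
```

The `na.rm = TRUE` argument matters here: without it, `mean()`, `median()`, and `sd()` return `NA` whenever the input contains a missing value.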
Contributions to this repository are welcome. If you have a suggestion, please open an issue in this repository and let us know how we can improve the material. You can also submit a pull request.
Below is a list of helpful resources for learning R:
- R Basics Cheat Sheet
- Posit Cheatsheets
- The R courses on datacamp.com
- R for Data Science
- AI applications such as ChatGPT and Perplexity are helpful for answering specific R programming questions.
- Using GitHub Copilot directly in RStudio will help you learn R by giving suggestions for how to write your code.
This project is licensed under the MIT License.