update README, edit YAML metadata

reikookamoto · Sep 16, 2024 · 4f2e99b
1 parent 431fe71
commit 4f2e99b

Showing 10 changed files with 146 additions and 123 deletions.
diff --git a/README.md b/README.md
@@ -2,20 +2,36 @@
 
 Material for workshop at Département de science politique de l'Université de Montréal
 
-## Directory Structure
+## Getting Started with the Repository
+
+-   Clone or download the repository
+
+    -   Clone the repository to your local machine using Git
+
+    -   Alternatively, you can click the green "\<\> Code" button and select "Download ZIP" to download the repository as a ZIP file. After downloading, extract the contents.
+
+-   After cloning or downloading, navigate to the project folder on your computer.
 
--   intro-to-dplyr/: Contains files for the first half of the workshop, focusing on data manipulation using dplyr
+-   Double click on `2024-09-18_intro-to-tidyverse.Rproj` in the project root to open the RStudio Project.
 
--   intro-to-ggplot2/: Contains files for the second half of the workshop, covering basic data visualization with ggplot2
+-   Before running the code, make sure you have the following R packages installed:
+
+    -   tidyverse
+
+    -   here
+
+    -   RColorBrewer
+
+## Directory Structure
 
--   more-ggplot2/: Includes additional code that we probably won't have time to cover during the workshop but may be helpful for further learning
+-   `intro-to-dplyr/`: Contains files for the first half of the workshop, focusing on data manipulation using dplyr
 
-## Prerequisites
+-   `intro-to-ggplot2/`: Contains files for the second half of the workshop, covering basic data visualization with ggplot2
 
-Before running the code, make sure you have the following R packages installed:
+-   `more-ggplot2/`: Includes additional code that we probably won't have time to cover during the workshop but is helpful for further learning
 
--   tidyverse
+## During the workshop
 
--   here
+-   In each of the above folders, you'll find a `.qmd` file with `_blank` in its name. If you'd like to **code along**, you can use these files, which provide skeletons for you to fill in as we work through the material.
 
--   RColorBrewer
+-   If you prefer to **follow along without coding**, open the other `.qmd` file that already contains all the code.
diff --git a/intro-to-dplyr/intro-to-dplyr.md b/intro-to-dplyr/intro-to-dplyr.md
@@ -1,18 +1,18 @@
 # Introduction to dplyr
 Reiko Okamoto
-2024-09-13
+2024-09-16
 
 ## 👋Welcome to the tidyverse
 
-#### ***What is the tidyverse?***
+#### *What is the tidyverse?*
 
 The [tidyverse](https://tidyverse.tidyverse.org/) is a collection of R
 packages designed for data science. Arguably, two of the most popular
 packages in the tidyverse are [dplyr](https://dplyr.tidyverse.org/) for
 data manipulation and [ggplot2](https://ggplot2.tidyverse.org/) for data
 visualization. These are also two of the packages we are covering today!
 
-***Why learn it?***
+#### *Why learn it?*
 
 The skills you gain are not just limited to R. The concepts, like
 filtering data and creating plots, are applicable to other languages
@@ -21,7 +21,7 @@ languages later. Additionally, the tidyverse is in tune with open
 science practices, helping you create analyses that are more accessible,
 transparent, and reproducible.
 
-***Keep in mind…***
+#### *Keep in mind…*
 
 There’s no expectation for you to memorize everything. Even experienced
 programmers don’t have every function memorized - they’re constantly
@@ -695,22 +695,6 @@ penguins |>
 The function creates a new data frame with a single row containing the
 summary statistic.
 
-💻Calculate the minimum and maximum of body mass at the same time:
-
-``` r
-penguins |> 
-  summarise(min_body_mass = min(body_mass_g, na.rm = TRUE),
-            max_body_mass = max(body_mass_g, na.rm = TRUE))
-```
-
-    # A tibble: 1 × 2
-      min_body_mass max_body_mass
-              <int>         <int>
-    1          2700          6300
-
-Similar to what we’ve seen in other functions, we can create multiple
-summaries in a single step by separating them with commas.
-
 ## 7️⃣Group by one or more variables: group_by()
 
 In data analysis, a common task is to split our data into groups, apply
@@ -756,21 +740,9 @@ penguins |>
     2 Dream       124
     3 Torgersen    52
 
-💻Achieve this count using the
+Alternatively, use the
 [`count()`](https://dplyr.tidyverse.org/reference/count.html) function,
-which combines `group_by()` and `tally()` in one step:
-
-``` r
-penguins |> 
-  count(island)
-```
-
-    # A tibble: 3 × 2
-      island        n
-      <fct>     <int>
-    1 Biscoe      168
-    2 Dream       124
-    3 Torgersen    52
+which combines `group_by()` and `tally()` in one step.
 
 💻Calculate the mean and standard deviation of body mass for each
 combination of species and sex:
@@ -801,7 +773,7 @@ penguins |>
 By default, when we apply a grouping with multiple factors, dplyr will
 keep the last level of grouping after the summary. Here, the output is
 still grouped by `species`. To remove grouping from a data frame, use
-the `ungroup()` function or the `.groups  = "drop"` argument in the
+the `ungroup()` function or the `.groups = "drop"` argument in the
 `summarise()` function. Both methods will allow us to continue working
 with the data as a regular data frame.
 
@@ -838,6 +810,9 @@ df <- penguins |>
             .groups = "drop")
 ```
 
+Similar to what we’ve seen in other functions, we can create multiple
+summaries in a single step by separating them with commas.
+
 ## 8️⃣Sort rows and extract specific values: arrange(), slice(), pull()
 
 Sometimes, we’re interested in extracting a particular value from a data

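Note on the `count()` hunk above: the rewritten sentence describes `count()` as `group_by()` plus `tally()` in one step but no longer shows the call. A minimal sketch of that equivalence follows. It assumes only that dplyr is installed and uses a small hypothetical data frame (`toy`, not part of the workshop material) in place of `penguins`:

``` r
library(dplyr)

# Hypothetical toy data standing in for the workshop's `penguins` data frame
toy <- data.frame(
  island = c("Biscoe", "Biscoe", "Dream", "Torgersen", "Dream", "Biscoe")
)

# Two-step version: define the groups, then count the rows in each group
two_step <- toy |>
  group_by(island) |>
  tally()

# One-step version: count() groups by `island` and counts in a single call
one_step <- toy |>
  count(island)

# Both results have one row per island, with the count in a column named `n`
all.equal(as.data.frame(two_step), as.data.frame(one_step))
```

The only difference in this sketch is the container: `tally()` on a grouped data frame returns a tibble, while `count()` on a plain data frame returns a plain data frame, which is why both are converted before comparing.
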
diff --git a/intro-to-dplyr/intro-to-dplyr.qmd b/intro-to-dplyr/intro-to-dplyr.qmd
@@ -4,23 +4,21 @@ author: "Reiko Okamoto"
 date: "`r Sys.Date()`"
 format: gfm
 editor: visual
+execute:
+  echo: true
 ---
 
-```{r setup, include=FALSE}
-knitr::opts_chunk$set(echo = TRUE)
-```
-
 ## 👋Welcome to the tidyverse
 
-#### ***What is the tidyverse?***
+#### *What is the tidyverse?*
 
 The [tidyverse](https://tidyverse.tidyverse.org/) is a collection of R packages designed for data science. Arguably, two of the most popular packages in the tidyverse are [dplyr](https://dplyr.tidyverse.org/) for data manipulation and [ggplot2](https://ggplot2.tidyverse.org/) for data visualization. These are also two of the packages we are covering today!
 
-***Why learn it?***
+#### *Why learn it?*
 
 The skills you gain are not just limited to R. The concepts, like filtering data and creating plots, are applicable to other languages like SQL and Python. This makes it easier to pick up other tools and languages later. Additionally, the tidyverse is in tune with open science practices, helping you create analyses that are more accessible, transparent, and reproducible.
 
-***Keep in mind...***
+#### *Keep in mind...*
 
 There's no expectation for you to memorize everything. Even experienced programmers don't have every function memorized - they're constantly googling things! My goal today is to help you get comfortable with, and hopefully interested in, using the tidyverse for data analysis.
 
@@ -252,16 +250,6 @@ penguins |>
 
 The function creates a new data frame with a single row containing the summary statistic.
 
-💻Calculate the minimum and maximum of body mass at the same time:
-
-```{r}
-penguins |> 
-  summarise(min_body_mass = min(body_mass_g, na.rm = TRUE),
-            max_body_mass = max(body_mass_g, na.rm = TRUE))
-```
-
-Similar to what we've seen in other functions, we can create multiple summaries in a single step by separating them with commas.
-
 ## 7️⃣Group by one or more variables: group_by()
 
 In data analysis, a common task is to split our data into groups, apply a function to each group, and then combine the results. This approach is known as the split-apply-combine paradigm. The [`group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) function helps us achieve this by allowing us to specify how we want to split our data into groups.
@@ -284,12 +272,7 @@ penguins |>
   tally()
 ```
 
-💻Achieve this count using the [`count()`](https://dplyr.tidyverse.org/reference/count.html) function, which combines `group_by()` and `tally()` in one step:
-
-```{r}
-penguins |> 
-  count(island)
-```
+Alternatively, use the [`count()`](https://dplyr.tidyverse.org/reference/count.html) function, which combines `group_by()` and `tally()` in one step.
 
 💻Calculate the mean and standard deviation of body mass for each combination of species and sex:
 
@@ -300,7 +283,7 @@ penguins |>
             sd_body_mass = sd(body_mass_g, na.rm = TRUE))
 ```
 
-By default, when we apply a grouping with multiple factors, dplyr will keep the last level of grouping after the summary. Here, the output is still grouped by `species`. To remove grouping from a data frame, use the `ungroup()` function or the `.groups  = "drop"` argument in the `summarise()` function. Both methods will allow us to continue working with the data as a regular data frame.
+By default, when we apply a grouping with multiple factors, dplyr will keep the last level of grouping after the summary. Here, the output is still grouped by `species`. To remove grouping from a data frame, use the `ungroup()` function or the `.groups = "drop"` argument in the `summarise()` function. Both methods will allow us to continue working with the data as a regular data frame.
 
 ```{r}
 # option 1
@@ -318,6 +301,8 @@ df <- penguins |>
             .groups = "drop")
 ```
 
+Similar to what we've seen in other functions, we can create multiple summaries in a single step by separating them with commas.
+
 ## 8️⃣Sort rows and extract specific values: arrange(), slice(), pull()
 
 Sometimes, we're interested in extracting a particular value from a data frame, like finding the largest or smallest value in a column.
@@ -392,7 +377,7 @@ penguins_wide <- penguins_long |>
 ## 📚Resources
 
 | Function                | Description                                      |
-|-------------------------|-----------------------------------------------|
+|-------------------------|--------------------------------------------------|
 | `dplyr::glimpse()`      | Get a glimpse of your data                       |
 | `dplyr::select()`       | Keep or drop columns using their names and types |
 | `dplyr::filter()`       | Keep rows that match a condition                 |

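Note on the grouping paragraph in the hunks above: with the default `.groups = "drop_last"`, `summarise()` drops the last grouping variable (`sex`) and keeps the rest, which is why the result is still grouped by `species`. The sketch below illustrates this along with the two ways of removing the grouping. It uses a small hypothetical data frame (`toy`) rather than the workshop's `penguins` data, and `group_vars()` is used only to inspect the result:

``` r
library(dplyr)

# Hypothetical toy data; column names mirror the ones used in the workshop
toy <- data.frame(
  species     = c("Adelie", "Adelie", "Gentoo", "Gentoo"),
  sex         = c("female", "male", "female", "male"),
  body_mass_g = c(3400, 4000, 4700, 5500)
)

# Default behaviour: the last grouping variable (sex) is dropped,
# so the summary is still grouped by species
by_default <- toy |>
  group_by(species, sex) |>
  summarise(mean_body_mass = mean(body_mass_g),
            n_penguins = n())

group_vars(by_default)

# Option 1: remove the remaining grouping afterwards
ungrouped <- by_default |>
  ungroup()

# Option 2: ask summarise() to drop all grouping up front
dropped <- toy |>
  group_by(species, sex) |>
  summarise(mean_body_mass = mean(body_mass_g),
            n_penguins = n(),
            .groups = "drop")

group_vars(dropped)
```

Both `ungrouped` and `dropped` have no grouping variables left, so later verbs operate on the whole table rather than within each species.
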
diff --git a/intro-to-dplyr/intro-to-dplyr_blank.qmd b/intro-to-dplyr/intro-to-dplyr_blank.qmd
@@ -4,23 +4,21 @@ author: "Reiko Okamoto"
 date: "`r Sys.Date()`"
 format: gfm
 editor: visual
+execute:
+  echo: true
 ---
 
-```{r setup, include=FALSE}
-knitr::opts_chunk$set(echo = TRUE)
-```
-
 ## 👋Welcome to the tidyverse
 
-#### ***What is the tidyverse?***
+#### *What is the tidyverse?*
 
 The [tidyverse](https://tidyverse.tidyverse.org/) is a collection of R packages designed for data science. Arguably, two of the most popular packages in the tidyverse are [dplyr](https://dplyr.tidyverse.org/) for data manipulation and [ggplot2](https://ggplot2.tidyverse.org/) for data visualization. These are also two of the packages we are covering today!
 
-***Why learn it?***
+#### *Why learn it?*
 
 The skills you gain are not just limited to R. The concepts, like filtering data and creating plots, are applicable to other languages like SQL and Python. This makes it easier to pick up other tools and languages later. Additionally, the tidyverse is in tune with open science practices, helping you create analyses that are more accessible, transparent, and reproducible.
 
-***Keep in mind...***
+#### *Keep in mind...*
 
 There's no expectation for you to memorize everything. Even experienced programmers don't have every function memorized - they're constantly googling things! My goal today is to help you get comfortable with, and hopefully interested in, using the tidyverse for data analysis.
 
@@ -132,9 +130,9 @@ The vertical bar acts as an OR operator, meaning a row is returned if any of the
 
 3.  Filter the data to find all penguins that are either on Biscoe Island or Torgersen Island.
 
-```{r}
-# YOUR CODE HERE
-```
+    ```{r}
+    # YOUR CODE HERE
+    ```
 
 ## 4️⃣Pipes
 
@@ -206,9 +204,9 @@ By separating the new columns with a comma, we can create multiple new variables
 
 2.  Create a new column called `flipper_size` that categorizes penguins as short, average, or long based on their `flipper_length_mm`. Hint: Define short as less than 190 mm, average as between 190 and 210 mm, and long as greater than 210 mm.
 
-```{r}
-# YOUR CODE HERE
-```
+    ```{r}
+    # YOUR CODE HERE
+    ```
 
 ## 6️⃣Compute summary statistics: summarise()
 
@@ -222,14 +220,6 @@ We often need to summarize our data to understand key characteristics (e.g., mea
 
 The function creates a new data frame with a single row containing the summary statistic.
 
-💻Calculate the minimum and maximum of body mass at the same time:
-
-```{r}
-# YOUR CODE HERE
-```
-
-Similar to what we've seen in other functions, we can create multiple summaries in a single step by separating them with commas.
-
 ## 7️⃣Group by one or more variables: group_by()
 
 In data analysis, a common task is to split our data into groups, apply a function to each group, and then combine the results. This approach is known as the split-apply-combine paradigm. The [`group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) function helps us achieve this by allowing us to specify how we want to split our data into groups.
@@ -248,24 +238,22 @@ A grouped data frame has all the properties of a regular data frame but has an a
 # YOUR CODE HERE
 ```
 
-💻Achieve this count using the [`count()`](https://dplyr.tidyverse.org/reference/count.html) function, which combines `group_by()` and `tally()` in one step:
-
-```{r}
-# YOUR CODE HERE
-```
+Alternatively, use the [`count()`](https://dplyr.tidyverse.org/reference/count.html) function, which combines `group_by()` and `tally()` in one step.
 
 💻Calculate the mean and standard deviation of body mass for each combination of species and sex:
 
 ```{r}
 # YOUR CODE HERE
 ```
 
-By default, when we apply a grouping with multiple factors, dplyr will keep the last level of grouping after the summary. Here, the output is still grouped by `species`. To remove grouping from a data frame, use the `ungroup()` function or the `.groups  = "drop"` argument in the `summarise()` function. Both methods will allow us to continue working with the data as a regular data frame.
+By default, when we apply a grouping with multiple factors, dplyr will keep the last level of grouping after the summary. Here, the output is still grouped by `species`. To remove grouping from a data frame, use the `ungroup()` function or the `.groups = "drop"` argument in the `summarise()` function. Both methods will allow us to continue working with the data as a regular data frame.
 
 ```{r}
 # YOUR CODE HERE
 ```
 
+Similar to what we've seen in other functions, we can create multiple summaries in a single step by separating them with commas.
+
 ## 8️⃣Sort rows and extract specific values: arrange(), slice(), pull()
 
 Sometimes, we're interested in extracting a particular value from a data frame, like finding the largest or smallest value in a column.
@@ -307,31 +295,21 @@ Imagine our observation of interest is the measurement itself, rather than the p
 💻Use the [`pivot_longer()`](https://tidyr.tidyverse.org/reference/pivot_longer.html) function to reshape the `penguins` data so that all the measurements related to the penguins are in a single column, and another column indicates what measurement type it is:
 
 ```{r}
-penguins_long <- penguins |> 
-  mutate(id = row_number()) |> 
-  pivot_longer(
-    cols = ends_with("_mm") | ends_with("_g"),
-    names_to = "measurement_type",
-    values_to = "value"
-  )
+# YOUR CODE HERE
 ```
 
 What if we want to go back to the original wide format?
 
 💻Use the [`pivot_wider()`](https://tidyr.tidyverse.org/reference/pivot_wider.html) function to reverse the process:
 
 ```{r}
-penguins_wide <- penguins_long |> 
-  pivot_wider(
-    names_from = measurement_type,
-    values_from = value
-)
+# YOUR CODE HERE
 ```
 
 ## 📚Resources
 
 | Function                | Description                                      |
-|-------------------------|--------------------------------------------------|
+|------------------------|-----------------------------------------------|
 | `dplyr::glimpse()`      | Get a glimpse of your data                       |
 | `dplyr::select()`       | Keep or drop columns using their names and types |
 | `dplyr::filter()`       | Keep rows that match a condition                 |

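Note on the `flipper_size` exercise re-indented above: one possible approach is `dplyr::case_when()`. This is only a sketch under stated assumptions, not the workshop's official solution. The data frame `toy` is hypothetical, and treating 190 mm and 210 mm as part of "average" is an assumption, since the hint does not say which category the boundary values belong to:

``` r
library(dplyr)

# Hypothetical toy data standing in for `penguins`
toy <- data.frame(flipper_length_mm = c(181, 190, 205, 210, 217, NA))

sized <- toy |>
  mutate(
    flipper_size = case_when(
      flipper_length_mm < 190  ~ "short",    # below 190 mm
      flipper_length_mm <= 210 ~ "average",  # 190 to 210 mm (boundaries assumed inclusive)
      flipper_length_mm > 210  ~ "long"      # above 210 mm
    )
  )

sized
```

Conditions are checked in order, so the second line only sees values that are already at least 190 mm. Rows with a missing flipper length match none of the conditions and stay `NA`.
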
diff --git a/intro-to-ggplot2/intro-to-ggplot2.md b/intro-to-ggplot2/intro-to-ggplot2.md
@@ -1,6 +1,6 @@
 # Introduction to ggplot2
 Reiko Okamoto
-2024-09-13
+2024-09-16
 
 ## 🎨Introduction to ggplot2
 

diff --git a/intro-to-ggplot2/intro-to-ggplot2.qmd b/intro-to-ggplot2/intro-to-ggplot2.qmd
@@ -4,12 +4,10 @@ author: "Reiko Okamoto"
 date: "`r Sys.Date()`"
 format: gfm
 editor: visual
+execute:
+  echo: true
 ---
 
-```{r setup, include=FALSE}
-knitr::opts_chunk$set(echo = TRUE)
-```
-
 ## 🎨Introduction to ggplot2
 
 ggplot2 helps us create a wide range of static, informative, and visually appealing graphics. Its name comes from the Grammar of Graphics, which is a framework for building plots in a structured way. We can build a plot incrementally by adding layers like data points, axes, colours, and labels.
@@ -37,7 +35,7 @@ trains_df
 🧠Explore the type and description of each variable:
 
 | Variable                  | Type      | Description                          |
-|-----------------------|------------------|-------------------------------|
+|---------------------------|-----------|--------------------------------------|
 | `year`                    | double    | Year of observation                  |
 | `month`                   | double    | Month of observation                 |
 | `service`                 | character | Type of service                      |