---
title: "Meeting 2: R and Ebola"
author: "Brooke Anderson"
date: "November 14, 2014"
output: ioslides_presentation
---
## Last time, we read in our data
```{r}
## If necessary, use setwd() to get to the right directory
ebola <- read.table("country_timeseries.csv", sep = ",",
                    header = TRUE)
ebola[1:3, 1:5]
```
## Today's plan
- Dataframes and vectors
- Subsetting
- Functions
- Example of a function: the `plot` function
# Dataframes and vectors
## Dataframes and vectors
A vector is an ordered series of values, all of the same type:
Example 1: Start of the vector of the dates when Ebola cases were reported
```{r, echo = FALSE}
as.character(ebola$Date[1:10])
```
Example 2: Start of the vector of number of cases reported in Guinea
```{r, echo = FALSE}
ebola$Cases_Guinea[1:10]
```
## Dataframes and vectors
You can make a new vector using the concatenation function, `c(...)`:
```{r}
x <- c(1, 5, 7, 9, 10)
x
class.names <- c("Taylor", "Maggie", "Mimi", "Brianna", "Jon")
class.names
```
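## Dataframes and vectors

A quick sketch of what `c(...)` gives you, using made-up vectors (`num.vec`, `chr.vec`, and `mixed.vec` are names invented for this example, not part of the Ebola data). It shows that every value in a vector shares one class:

```{r}
## Hypothetical example vectors (not part of the Ebola data)
num.vec <- c(1, 5, 7, 9, 10)
chr.vec <- c("Taylor", "Maggie", "Mimi")

length(num.vec)    ## Number of values in the vector
class(num.vec)     ## "numeric"
class(chr.vec)     ## "character"

## Mixing types converts everything to a single class
mixed.vec <- c(1, "Taylor")
class(mixed.vec)   ## "character"
```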
## Dataframes and vectors
A dataframe is made up of several vectors of the same length stuck together side by side
*(Notice how each column is a vector)*
```{r, echo = FALSE}
ebola[1:10, 1:5]
```
## Dataframes and vectors
You can make a new dataframe using the function `data.frame()`:
```{r}
class.data <- data.frame(name = class.names,
                         number = x)
class.data
```
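## Dataframes and vectors

Once you have a dataframe, a few functions help you check what is in it. This is a minimal sketch with a made-up dataframe (`toy.df` is a name invented for this example):

```{r}
## Hypothetical toy dataframe, built the same way as class.data
toy.df <- data.frame(name = c("A", "B", "C"),
                     number = c(10, 20, 30))

dim(toy.df)        ## Number of rows, then number of columns
colnames(toy.df)   ## Names of the columns
str(toy.df)        ## One line per column, showing that column's class
```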
# Subsetting
## Subsetting
You can use indexing (`[...]`, `[..., ...]`) to subset from a vector or dataframe, like:
```{r, eval=FALSE}
vector[locations] ## Generic code
dataframe[row locations, column locations] ## Generic code
```
## Subsetting
A vector has one dimension, so you index without a comma (i.e., in one dimension):
```{r}
class.names[1]
class.names[c(2, 3, 4)] ## Equivalent: class.names[2:4]
```
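## Subsetting

A few more ways to index a vector, sketched with a made-up vector (`letters.vec` is a name invented for this example). Negative indices drop values instead of keeping them:

```{r}
## Hypothetical example vector
letters.vec <- c("a", "b", "c", "d", "e")

letters.vec[2:4]                   ## "b" "c" "d"
identical(letters.vec[c(2, 3, 4)],
          letters.vec[2:4])        ## TRUE: the two forms are equivalent
letters.vec[-1]                    ## Everything except the first value
letters.vec[length(letters.vec)]   ## The last value: "e"
```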
## Subsetting
A dataframe has two dimensions (rows and columns), so you index with a comma:
```{r}
class.data[1,1]
class.data[1:3, 1:2]
```
## Subsetting
To get all values in a dimension (row or column), leave that part of the index blank:
```{r}
class.data[1, ]
class.data[ , 1]
```
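## Subsetting

It can help to check what kind of object you get back. In this sketch (with a made-up dataframe, `toy.df`), pulling one row gives a dataframe, while pulling one column gives a vector:

```{r}
## Hypothetical toy dataframe
toy.df <- data.frame(name = c("A", "B", "C"),
                     number = c(10, 20, 30))

one.row <- toy.df[1, ]    ## All columns of row 1
one.col <- toy.df[ , 2]   ## All rows of column 2

class(one.row)            ## "data.frame"
class(one.col)            ## "numeric"
```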
## Subsetting
For columns, you can use column names instead of location:
```{r}
class.data[3:4, "number"]
class.data[3:4, c("name", "number")]
```
## Subsetting
You can also pull a column (vector) from a dataframe using `$`, like:
```{r, eval=FALSE}
dataframe$column.name ## Generic code
```
For example, to get the column of `ebola` with cases from Guinea:
```{r}
head(ebola$Cases_Guinea)
```
*Note: I've used `head` to look at just the start of the vector since the whole thing would be really long.*
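## Subsetting

The `$` operator and square-bracket indexing are different routes to the same column. A small sketch with a made-up dataframe (`toy.df` is a name invented for this example):

```{r}
## Hypothetical toy dataframe
toy.df <- data.frame(name = c("A", "B", "C"),
                     number = c(10, 20, 30))

## All three of these pull the same vector
toy.df$number
toy.df[ , "number"]
toy.df[ , 2]

identical(toy.df$number, toy.df[ , "number"])   ## TRUE
```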
## Now you try...
Try to get the following vectors (and one dataframe) from the dataset:
- Date
- The ten most recent counts of cases in the US
- The earliest twenty counts of deaths in Liberia
- A dataframe of the first five observations of date, cases in Mali and deaths in Mali
*Hint: Try using `colnames(ebola)` to find out the names of all the columns in `ebola`. Also, use `dim(ebola)` to find out the dimensions of the dataframe so you can get the index numbers right for the latest ten data points.*
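## Now you try...

To show how the dimensions help with index numbers, here is a sketch on a made-up dataframe (`toy.ts` and its columns are invented for this example). It pulls the last three rows. Whether the last rows are the most recent ones depends on how your data are sorted, so check the dates first:

```{r}
## Hypothetical toy dataframe standing in for a longer dataset
toy.ts <- data.frame(day = 1:8,
                     count = c(2, 3, 5, 8, 13, 21, 34, 55))

n.rows <- nrow(toy.ts)      ## Same as dim(toy.ts)[1]
last.three <- toy.ts[(n.rows - 2):n.rows, "count"]
last.three                  ## 21 34 55

tail(toy.ts$count, n = 3)   ## Same values, using tail()
```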
# Functions
## Functions
In general, functions in R take the following structure:
```{r, eval = FALSE}
function.name(required information, options) ## Generic code
```
The result of the function will be printed to your R console, unless you choose to save the output in an object:
```{r, eval = FALSE}
new.object <- function.name(required information, options) ## Generic code
```
## Functions
Examples of this structure:
```{r, eval = FALSE}
head(ebola)
head(ebola, n = 3)
ebola <- read.table("country_timeseries.csv", sep = ",",
                    header = TRUE)
```
Find out more about a function by using `?function.name`. This will take you to the help page for the function, where you can find out all the possible arguments for the function, required and optional.
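## Functions

Here is the same structure with `mean`, sketched on a made-up vector (`vals` and `avg` are names invented for this example). It shows what an option changes, and that naming the arguments lets you give them in any order:

```{r}
## Hypothetical example vector (not from the Ebola data)
vals <- c(3, 8, 1, NA, 5)

mean(vals)                  ## Required information only: returns NA
mean(vals, na.rm = TRUE)    ## With an option: returns 4.25

## Save the output in an object instead of printing it
avg <- mean(vals, na.rm = TRUE)
avg

## Named arguments can be given in any order
head(vals, n = 2)
head(n = 2, x = vals)       ## Same result as the line above
```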
# Example of a function
## The `plot` function
For a scatterplot, you give the `plot` function two arguments: the x coordinates of the points in the plot and the y coordinates of the points in the plot. The generic structure is:
```{r, eval=FALSE}
plot(x = x coordinates, y = y coordinates) ## Generic code
```
As long as you put the x coordinates first and the y coordinates second, you can leave out the `x = ` and `y = `:
```{r, eval = FALSE}
plot(x coordinates, y coordinates) ## Generic code
```
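## The `plot` function

If you do include `x = ` and `y = `, the order stops mattering. A small sketch with made-up coordinates (`x.vals` and `y.vals` are names invented for this example):

```{r, fig.width = 4.5, fig.height = 3.5, fig.align = "center"}
## Hypothetical example coordinates
x.vals <- c(1, 2, 3, 4)
y.vals <- c(2, 4, 8, 16)

## With names, y can come first and the plot is unchanged
plot(y = y.vals, x = x.vals)
```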
## The `plot` function
```{r, fig.width = 4.5, fig.height = 3.5, fig.align = "center"}
x <- c(1, 2, 3)
y <- c(4, 5, 6)
plot(x, y)
```
## The `plot` function
Now that you know how to pull out two vectors you want from the ebola dataset, you can plot them:
```{r, fig.height = 4, fig.width = 5, fig.align = 'center'}
plot(ebola$Day, ebola$Cases_Guinea)
```
## The `plot` function
The plot function also has *many* optional arguments (check `?plot`, `?plot.default`). For example,
- `type`: What do you want to plot? Points (`"p"`)? Lines (`"l"`)?
- `main`: Give a title to your plot (`"My title"`)
- `xlab`, `ylab`: Give nicer labels to your x- and y-axes (`xlab = "Day in Ebola data collection"`)
- `xlim`, `ylim`: Specify the range of your x- and y-axes (`xlim = c(0, 100)`)
## The `plot` function
```{r, fig.height = 4, fig.width = 5, fig.align = 'center'}
plot(ebola$Day, ebola$Cases_Guinea, main = "Guinea ebola cases",
     xlab = "Day in ebola data collection", ylab = "# of cases",
     type = "l", lwd = 2, col = "gray")
```
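## The `plot` function

The options for points work the same way. This sketch uses made-up numbers (`day.vals` and `case.vals` are invented for this example, not real counts) to try `pch`, `cex`, `xlim`, and `ylim`:

```{r, fig.height = 4, fig.width = 5, fig.align = 'center'}
## Hypothetical example data (not real case counts)
day.vals <- c(0, 10, 20, 30, 40)
case.vals <- c(5, 12, 30, 55, 90)

## pch picks the point symbol, cex scales its size
plot(day.vals, case.vals, type = "p", pch = 16, cex = 1.5, col = "red",
     xlim = c(0, 50), ylim = c(0, 100),
     main = "Made-up counts", xlab = "Day", ylab = "# of cases")
```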
## Now you try...
Try plotting:
- Deaths in Liberia by day
- Mortality rate in Liberia by day
- Deaths in Liberia by date
Experiment with options like `type`, `col`, `main`, `sub`, `xlim`, and `ylim`, plus `pch` and `cex` when you're plotting points (`type = "p"`) and `lwd` when you're plotting lines (`type = "l"`).
*Hint: Try using `colnames(ebola)` to find out the names of all the columns in `ebola`.*
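## Now you try...

Two things that may help here, sketched on made-up numbers (all `toy.*` names and values are invented for this example). Dividing one vector by another works element by element, and `as.Date` turns text into dates that `plot` can put on an axis. Both the month/day/year format and the use of deaths divided by cases as a rate are assumptions, so check them against your own data:

```{r, fig.height = 4, fig.width = 5, fig.align = 'center'}
## Hypothetical example data (not real counts)
toy.dates <- c("11/1/2014", "11/2/2014", "11/3/2014")
toy.cases <- c(100, 120, 150)
toy.deaths <- c(40, 54, 60)

## Element-by-element division of two vectors
toy.rate <- toy.deaths / toy.cases
toy.rate

## Convert text to the Date class (format is an assumption)
toy.dates <- as.Date(toy.dates, format = "%m/%d/%Y")
class(toy.dates)   ## "Date"

plot(toy.dates, toy.rate, type = "l")
```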