-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy path1-5_Common_Objects.qmd
286 lines (214 loc) · 10.3 KB
/
1-5_Common_Objects.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
# Common Objects
**Authors: Sten Willemsen, Karl Brand and Elizabeth Ribble**
# Objects in R
## Introduction
All the things we work with in R (such as the numbers in the calculations in the previous section) are called **objects**. All objects have a **mode** and a **class**. The **mode** describes how the data is stored in **R** and how it can be used. Text variables for example have to be handled differently than numbers, we cannot multiply two words.
Elementary objects can be combined with each other to form more complex objects this leads to several types of containers, like `lists` and `data.frames`
The **class** of an object is an attribute that can be used to further specify how an object is used in **R**. For example it can be used to indicate how it should be printed and plotted. For many objects the class will simply be equal to the mode.
Functions are special objects that can do things with other objectsl, for example the `print` function displays the contents of an object in the console. Functions are the topic of an other chapter but because we obviously want to do something with the various objects we will see we cannot avoid them altogether.
## The elementary data types
The simplest variables just have a single value of a certain data type (Or mode^[`mode`, `storage.mode` and `type` are closely related concepts, we will not discuss the differences here. See also `?mode`.]). The most important data types in R are:^[There are a few more like `complex` and `raw`
which we will not discuss.]:
------------ ---------------------------------------------------
mode description
------------ ---------------------------------------------------
numeric: Posibly fractional numerical values like 1.0, 1.2 or 1e12 (that is 10 raised to the power of 12)
character: text, for example 'man', 'woman', 'censored', etc.
logical: TRUE and FALSE
-------------------------------------------------------------------
Let's examine variables of these data types in a bit more detail:
### Numbers
```{r}
mode(1)
# for most basic data types the class is the same as the mode
class(1)
class(2.14)
# functions that start with as.* do conversion
an_integer <- as.integer(2.14)
# integers are special numeric values that only store whole
# numbers
an_integer
class(an_integer)
mode(an_integer)
# convert to numeric again
back_to_numeric <- as.numeric(an_integer)
class(back_to_numeric)
class(1L) # explicit integer
```
### Text
```{r}
# character values should be surrounded by quotes "or '
class("a")
class('a')
mode("a")
# even numbers in quotes are interpreted as text
class("1")
mode("1")
```
### Logical
It can only be a **yes** or a **no**. More specifically, a `TRUE` or a `FALSE`.
```{r}
class(TRUE)
class(FALSE)
mode(TRUE)
```
Often logical values are he result of comparisons:
| Operator | Meaning |
|----------|--------------------------|
| == | Equal to |
| != | Not equal to |
| > | Greater than |
| < | Smaller than |
| >= | Greater than or equal to |
| <= | Smaller than or equal to |
Logival values can be combined using `&` (and) and `|` or, and inverted using `!` (not).
```{r}
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)
a & b
a | b
!a
```
## Vectors
Vectors of these data types are the most elementary data structure in R. All other structures (like the `data.table`) are constructed using these vectors. In R there is also no structure that is smaller than a vector. A single number is not treated differently from a numeric vector of length ten; In fact R sees the single number simply as a numeric vector of length 1. The `length()` of a vector can be obtained by using the function `length()`.
A vector can be created using the function `c()`. (The `c` stands for 'concatenate', 'coerce' or 'combine')
```{r}
c(1, 2, 3)
c('spam', 'ham', 'eggs')
c("double", "quotes", "work",
'like', 'single')
c(TRUE, FALSE)
length(c(25, 4))
```
In the output we see that R shows the row number of the first element of each row between straight brackets. This makes it easier to refer to a particular element, especially when the vectors are long. We can work with vectors in the same way as with single numbers. In principle all operations are carried out in an element wise fashion.
```{r}
c(1, 2, 3) * c(4, 5, 6)
```
Note that when the lengths do not match they are **recycled**.
```{r}
c(1, 2, 3, 4) * c(4, 5)
# if the larger length is not an exact multiple
# of the smaller this often indicates a mistake
# and a warning is given
c(1, 2, 3) * c(4, 5)
```
We can also give names to the elements of a vector:
```{r}
named_v <- c(foo=1, bar=2)
print(named_v)
mode(named_v)
class(named_v)
```
When we try to create a vector that consists of different data types they will be converted to a data type that is capable of containing all of them. For example:
```{r}
c("eleven", 12)
```
The second element of the resulting vector is now also of type `character`. In general it is better not to trust this implicit conversion. Instead to it explicitly, in this case by using the function `as.character()`.
An other way to create a vector is by using the function `vector`. `vector('numeric', 8)` creates a numeric vector of length 8. The `vector` function is often used to pre-allocate room where the results of future computations can be stored.
## Matrices
Vectors have just a single dimension (every element is characterized by a single number (index) that indicates its position within the vector). They can be generalized to vectors that are two dimensional, that is
they have both rows and columns. Like simple vectors all elements of a matrix have the same \emph{mode}.
```{r}
my_matrix <- matrix(data = c(1,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12))
my_matrix
dim(my_matrix) # second dimension (# columns) is 1
matrix2 <- matrix(data = c(1,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
nrow = 3)
matrix2
dim(matrix2)
char_mat <- matrix(data = c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'), nrow = 2)
class(char_mat)
mode(char_mat)
length(char_mat) #just counts the number of elements
```
### Arrays
We do not have to stop with two dimensions. Arrays are even more general than matrices as they can have any number of dimensions. All elements have to be of the same type.
```{r}
letters[1:12]
my_array <- array(letters[1:12], c(2, 2, 3))
my_array
class(my_array)
mode(my_array)
```
## Lists
Elements of a vector, matrix or array are always of the same type. A `list` differs from a vector by also allowing its elements to be of a different type. We can make a `list` using the function `list`.
```{r}
list1 <- list("eleven", 12)
mode(list1)
class(list1)
list2 <- list(c(1, 2, 3), c('foo', 'bar'))
```
We can also assign a name to the elements of a list:
```{r}
list3 <- list(numbers=c(1, 2, 3), chars=c('foo', 'bar'))
```
It is also possible for a list to contain other lists.
```{r}
list4 <- list(numbers=c(1, 2, 3), chars=c('aap', 'noot'),
sublist=list(1,'a'))
```
A useful function for lists is `str`. It gives us its structure.
```{r}
str(list4)
```
### data.frames
This is a rectangular table in which every column contains a variable and every row an observation. They are similar to the spreadsheets: a
series, or `list` of equal length columns, or `vectors`. Notably, the columns (`vector`s) can be of different
classes, **unlike** a matrix or array.
```{r}
my_df <- data.frame("vec" = c(12, 48), "lets" = letters[1:12])
my_df
dim(my_df)
class(my_df)
mode(my_df) # a data.frame is a special kind of list
str(my_df)
```
Most data sets come in the form of `data.frame`s or can be converted to them so we are going to see them frequently later in the course.
## factors
A `factor`is a special kind of vector for categorical data. The factor contains different integers one for every category. Each unique value has an associated 'label' that tell us what the code means. Factors are frequently used when we model categorical data. An advantage of a factor over a `character` is that we can limit the number of possible outcomes. It is also less likely to make mistakes due to typing errors. Factors can be created by means of the function `factor()`.
```{r}
f <- c('male', 'female', 'male')
factor(f)
levels(f)
```
When a `factor` is displayed R also shows us the unique values the variable can take. These are called the 'levels' of the factor.
## functions
Functions are used to do something. We have already seen several of them.
Like `mode`, `class` and `as.integer` and you will see lots more. Note that in **R** functions are objects too.
```{r}
mode(mode) # like all objects functions have a mode
class(mode)
print(mode)
# do not worry if you do not understand the meaning of what is printed yet
```
Note that operators are functions to. When you want to use them in the same way as normal functions just put them between back-ticks ("`"):
```{r}
mode(`+`)
`+`(1, 1)
```
Base **R** has many factions but many more are come from various extension packages. These can be installed using `install.packages()` :
```{r}
#| eval: false
install.packages("packageName",
lib = "/directory/to/my custom R library",
repos = "http://cran.xl-mirror.nl")
# usually lib and repos can be omitted (left at the default)
```
The package name must be quoted when installing.
```{r}
#| eval: false
library("packageName") ## quotes are optional when loading a package
```
Functions will be discussed in more detail later.
## Missing values
Whenever the value of a variable is missing this is denoted by `NA` in R. Usually this means that the values exists however we do not know it. Sometimes the result of a calculation is not finite (for example when we define a positive or negative number by zero). In this case the result is defined to be `Inf` of `-Inf` in R. When a value cannot be computed at all (for example when we divide zero by zero) R will define the result as `NaN`, which stands for 'Not a Number'. Finally, R sometimes uses the special value `NULL` to indicate that a variable is not yet defined. Here we will mostly deal with data that is just missing, that is `NA`.
```{r}
a <- c(1, -1, 0, NA, NULL)
a/0
is.na(a)
is.finite(a)
is.null(a) # note this looks at the whole object
l <- list(foo= a, b=c('b'))
l[1] <- NULL # deletes elements from a list
l
```