forked from rdpeng/RepData_PeerAssessment1
PA1_template.Rmd
---
title: 'Reproducible Research: Peer Assessment 1'
author: "JE-"
date: "November 16, 2014"
output:
  html_document:
    fig_caption: yes
    keep_md: yes
---
```{r, echo=FALSE}
rm(list=ls())
# knitr evaluates chunks in the directory containing this file, so no setwd() call is needed
library(ggplot2)
library(reshape2)
library(lattice)
##-------------------------------------------------------------------
```
## Loading and preprocessing the data
```{r}
unzip("activity.zip")
data <- read.csv("activity.csv",colClasses = c("integer", "Date", "integer"))
```
## What is mean total number of steps taken per day?
- Make a histogram of the total number of steps taken each day
```{r}
# Total steps per day; days with missing values give NA and are dropped by hist()
totalSteps <- tapply(data$steps, data$date, sum)
hist(totalSteps, main="Histogram of total number of steps per day",
     xlab="Total steps per day", col="blue")
```
- Calculate and report the mean and median total number of steps taken per day
```{r}
meanSteps <- mean(totalSteps, na.rm=TRUE)
medianSteps <- median(totalSteps, na.rm=TRUE)
meanSteps
medianSteps
```
The mean total number of steps taken per day is `r meanSteps`, and the median total number of steps taken per day is `r medianSteps`.
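As a small illustration of why `na.rm` is needed in the chunk above, the sketch below uses a made-up three-day data frame. The `toy` object and its values are hypothetical and are not taken from `activity.csv`. `tapply` with `sum` returns `NA` for any day that contains a missing value, so `mean` and `median` must be told to skip those days.

```r
# Hypothetical toy data: three days, two intervals each; the second day is entirely missing
toy <- data.frame(
  steps = c(10L, 20L, NA, NA, 30L, 50L),
  date  = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2))
)

# Daily totals: the day with missing values sums to NA
toyTotals <- tapply(toy$steps, toy$date, sum)
print(toyTotals)

# Without na.rm = TRUE both summaries would be NA
toyMean   <- mean(toyTotals, na.rm = TRUE)
toyMedian <- median(toyTotals, na.rm = TRUE)
print(c(mean = toyMean, median = toyMedian))
```

Passing `na.rm = TRUE` to `sum` inside `tapply` would instead turn the missing day into a total of 0, which pulls the mean down, so dropping the `NA` totals afterwards is the safer choice here.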
## What is the average daily activity pattern?
- Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
```{r}
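stepsByInterval <- tapply(data$steps, data$interval, mean, na.rm=TRUE)
# Plot against the interval identifiers rather than the position index 1..n
plot(as.integer(names(stepsByInterval)), stepsByInterval, type = 'l',
     xlab = "5-minute interval", ylab = "Average steps taken")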
```
- Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
```{r}
maxInterval <- names(stepsByInterval[which.max(stepsByInterval)])
```
Interval **`r maxInterval`** contains the maximum number of steps, on average across all days. Note that this is the interval's identifier, not its ordinal position in the day.
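The interval identifier can be easier to read as a clock time. The sketch below assumes that identifiers encode the start of the interval as hour * 100 + minute (for example, 1430 for 14:30). That encoding is an assumption about `activity.csv`, and `intervalToClock` is a hypothetical helper that is not part of the original analysis.

```r
# Hypothetical helper: convert an interval identifier such as 1430 to "14:30",
# assuming identifiers encode hour * 100 + minute
intervalToClock <- function(id) {
  id <- as.integer(id)
  sprintf("%02d:%02d", id %/% 100L, id %% 100L)
}

# names() on a tapply result returns character labels, so character input is accepted
clockLabels <- intervalToClock(c("0", "55", "1430", "2355"))
print(clockLabels)
```

With the variables from the chunk above, `intervalToClock(maxInterval)` would give the start time of the busiest interval.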
## Imputing missing values
```{r}
sumMissing <- sum(is.na(data$steps))
```
The total number of missing values is **`r sumMissing`**.
```{r}
data2 <- data
# Replace each missing step count with the mean for its 5-minute interval
missing <- is.na(data2$steps)
data2$steps[missing] <- as.numeric(stepsByInterval[as.character(data2$interval[missing])])
head(data2)
sum(is.na(data2$steps))
totalSteps2 <- tapply(data2$steps, data2$date, sum)
hist(totalSteps2, main="Histogram of total number of steps per day (no missing cases)",
     xlab="Total steps per day", col="blue")
```
- Calculate and report the mean and median total number of steps taken per day
```{r}
meanSteps2 <- mean(totalSteps2, na.rm=TRUE)
medianSteps2 <- median(totalSteps2, na.rm=TRUE)
meanSteps2
medianSteps2
```
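After imputation, the mean total number of steps taken per day is `r meanSteps2`, and the median total number of steps taken per day is `r medianSteps2`.
The mean number of steps taken per day stayed the same: each missing value was replaced with the mean for its 5-minute interval, so a day that was entirely missing now has a total equal to the original mean.
The median changed slightly and is now equal to the mean, because the imputed days all share that same total.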
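The reasoning about the unchanged mean can be checked on a made-up dataset. The sketch below assumes that missing values cover whole days, as the text above describes. The `toy` data frame and every variable name in it are hypothetical.

```r
# Hypothetical toy data: three days, two intervals each; the second day is entirely missing
toy <- data.frame(
  steps    = c(10, 20, NA, NA, 30, 50),
  date     = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  interval = rep(c(0L, 5L), times = 3)
)

# Mean steps per interval, ignoring missing values
intervalMeans <- tapply(toy$steps, toy$interval, mean, na.rm = TRUE)

# Fill each gap with the mean of its interval (lookup by interval label)
filled <- toy
gaps <- is.na(filled$steps)
filled$steps[gaps] <- as.numeric(intervalMeans[as.character(filled$interval[gaps])])

# Compare the daily-total summaries before and after imputation
meanBefore  <- mean(tapply(toy$steps, toy$date, sum), na.rm = TRUE)
totalsAfter <- tapply(filled$steps, filled$date, sum)
meanAfter   <- mean(totalsAfter)
medianAfter <- median(totalsAfter)
print(c(before = meanBefore, after = meanAfter, medianAfter = medianAfter))
```

The filled day totals the sum of the interval means, which equals the mean of the observed daily totals, so the mean does not move and the median is drawn toward it. If missing values were scattered within otherwise complete days, this equality would not be guaranteed.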
## Are there differences in activity patterns between weekdays and weekends?
```{r}
# Label each date by day name, then collapse the names into weekday/weekend
data2$dayNames <- factor(format(data2$date, "%A"))
levels(data2$dayNames) <- list(weekday = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
                               weekend = c("Saturday", "Sunday"))
# Average steps per interval within each day type
AvgStepsByInterval <- aggregate(data2$steps, list(interval = data2$interval,
                                                  dayNames = data2$dayNames), FUN = mean)
names(AvgStepsByInterval)[3] <- "meanOfSteps"
xyplot(meanOfSteps ~ interval | dayNames, data = AvgStepsByInterval,
       layout = c(1, 2), type = "l",
       xlab = "Interval", ylab = "Number of steps")
```
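One caveat about the weekday/weekend split: `format(date, "%A")` returns day names in the session's locale, so matching against English names can produce `NA` levels on a non-English system. A minimal sketch of a locale-independent alternative follows. It uses the numeric `wday` field of `as.POSIXlt` (0 is Sunday, 6 is Saturday), and `dayType` is a hypothetical helper that is not part of the original analysis.

```r
# Hypothetical helper: classify dates as weekday/weekend without relying on day names
dayType <- function(dates) {
  wday <- as.POSIXlt(dates)$wday  # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
         levels = c("weekday", "weekend"))
}

# 2012-10-05 is a Friday, so the four dates run Friday through Monday
sampleTypes <- dayType(as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08")))
print(sampleTypes)
```

Under this approach, `data2$dayNames <- dayType(data2$date)` would stand in for the two factor-building lines in the chunk above.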