title | author | date | output |
---|---|---|---|
Getting and Cleaning Data - Course Project | Sreekanth Jujare | Sunday, June 21, 2015 | html_document |
The data for this course project was obtained from: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
The data was originally generated as part of the project: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24-26 April 2013.
- Variable names with a 't' prefix indicate measurements in the time domain.
- Variable names with an 'f' prefix indicate measurements in the frequency domain, obtained by applying a Fast Fourier Transform to the time-domain signals.
- The characters 'X', 'Y' and 'Z' in the variable names indicate a measurement along the X, Y and Z axes respectively.
- Acc in the names stands for an accelerometer measurement.
- Gyro in the names stands for a gyroscope measurement.
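- Mag in the names stands for the magnitude of a three-dimensional signal, calculated using the Euclidean norm (the dataset contains no magnetometer readings).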
- Jerk in the names stands for Jerk signals, derived in time from the body linear acceleration and angular velocity.
- mean in the names indicates that the measurement is a mean value.
- std in the names indicates that the measurement is a standard deviation value.
- Subject identifies the subject involved in the observation.
- Activity indicates the activity performed in the observation, one of the activities listed in "activity_labels.txt".
- All variables except 'Subject' and 'Activity' are of numeric type. They are averages of normalized values, each bounded within [-1, 1].
- Subject is of integer type.
- Activity is of Factor type.
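The naming conventions above can be illustrated with a small decoder. This is only a sketch: `describe_feature()` is a hypothetical helper that is not part of run_analysis.R, and it assumes the raw feature names from "features.txt" (for example `tBodyAcc-mean()-X`) rather than whatever descriptive names the script finally assigns. It treats Mag as the signal magnitude, following the dataset's own features_info.txt.

```r
# Hypothetical helper (not part of run_analysis.R): split a raw feature name
# from features.txt into the components described in the list above.
describe_feature <- function(name) {
  prefix <- substr(name, 1, 1)
  domain <- if (prefix == "t") "time" else if (prefix == "f") "frequency" else NA_character_

  sensor <- if (grepl("Acc", name, fixed = TRUE)) {
    "accelerometer"
  } else if (grepl("Gyro", name, fixed = TRUE)) {
    "gyroscope"
  } else {
    NA_character_
  }

  statistic <- if (grepl("mean", name, ignore.case = TRUE)) {
    "mean"
  } else if (grepl("std", name, ignore.case = TRUE)) {
    "std"
  } else {
    NA_character_
  }

  # The axis, when present, is the final character of the name.
  axis_match <- regmatches(name, regexpr("[XYZ]$", name))
  axis <- if (length(axis_match) == 1) axis_match else NA_character_

  list(
    domain    = domain,
    sensor    = sensor,
    jerk      = grepl("Jerk", name, fixed = TRUE),
    magnitude = grepl("Mag", name, fixed = TRUE),
    statistic = statistic,
    axis      = axis
  )
}

str(describe_feature("tBodyAcc-mean()-X"))
str(describe_feature("tBodyAccJerkMag-mean()"))
```

Magnitude variables such as `tBodyAccJerkMag-mean()` have no axis suffix, so the sketch reports their axis as `NA`.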
The following tasks were performed in the run_analysis.R script:
- Merged the training and the test sets to create one data set.
- Extracted only the measurements on the mean and standard deviation for each measurement.
- Used descriptive activity names to name the activities in the data set.
- Appropriately labeled the data set with descriptive variable names.
- Generated a second, independent tidy data set with the average of each variable for each activity and each subject.
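The last task, averaging every variable per subject and activity, might look roughly like the following. This is a sketch on made-up data, not the code from run_analysis.R: the data frame `merged`, the column name `tBodyAccMeanX` and all values are illustrative assumptions, and the script may well use a different function than base R's `aggregate()`.

```r
# Toy stand-in for the merged data set; the column name and values are made up.
merged <- data.frame(
  Subject       = c(1L, 1L, 1L, 2L, 2L, 2L),
  Activity      = factor(c("WALKING", "WALKING", "SITTING",
                           "WALKING", "SITTING", "SITTING")),
  tBodyAccMeanX = c(0.2, 0.4, -0.1, 0.3, 0.1, 0.5)
)

# Average every remaining column for each Subject/Activity combination.
tidy <- aggregate(. ~ Subject + Activity, data = merged, FUN = mean)

# Sort so that each subject's activities appear together.
tidy <- tidy[order(tidy$Subject, tidy$Activity), ]
print(tidy)
```

Each row of `tidy` then holds one subject, one activity and the mean of every measurement column for that pair, which is the shape the final task describes.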
Step-by-step description of the code:
1. Loaded data from the files "activity_labels.txt" and "features.txt".
2. Selected only those columns of the "X_train.txt" data that represent mean or standard deviation measurements.
3. Loaded data from the "y_train.txt" file.
4. Loaded data from the "subject_train.txt" file.
5. Generated descriptive activity labels from the data of "activity_labels.txt" and "y_train.txt".
6. Combined the columns of the data generated in steps 2, 4 and 5 to create the training dataset (trainset).
7. Repeated steps 2-6 on the testing data to create the testing dataset.
8. Merged the training and testing datasets to create the first complete dataset.
9. Generated a second, independent tidy data set with the average of each variable for each activity and each subject.
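Steps 1 to 8 above can be sketched as follows. To keep the example runnable without the downloaded files, small in-memory data frames stand in for what `read.table()` would return. The helper `build_set()`, the toy values and the `mean()`/`std()` selection pattern are assumptions for illustration; the actual script may select columns differently (for instance, it may also keep `meanFreq()` variables).

```r
# Step 1: toy stand-ins for activity_labels.txt and features.txt.
activity_labels <- data.frame(id = 1:2, label = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-max()-X", "fBodyAcc-meanFreq()-X"),
  stringsAsFactors = FALSE
)

# Hypothetical helper covering steps 2 to 6 for one partition (train or test).
build_set <- function(x, y, subject) {
  # Step 2: keep only the mean() and std() columns (assumed selection rule).
  keep <- grepl("mean\\(\\)|std\\(\\)", features$name)
  measurements <- x[, keep, drop = FALSE]
  names(measurements) <- features$name[keep]

  # Step 5: map numeric activity codes to descriptive labels.
  activity <- factor(y, levels = activity_labels$id,
                     labels = activity_labels$label)

  # Step 6: combine subject, activity and measurement columns.
  cbind(Subject = subject, Activity = activity, measurements)
}

# Steps 2-6 on toy training data (stand-ins for X_train, y_train, subject_train).
x_train <- as.data.frame(matrix(c(0.1, 0.2, 0.3, 0.4,
                                  0.5, 0.6, 0.7, 0.8),
                                nrow = 2, byrow = TRUE))
trainset <- build_set(x_train, y = c(1L, 2L), subject = c(1L, 1L))

# Step 7: the same on toy testing data.
x_test <- as.data.frame(matrix(c(-0.1, -0.2, -0.3, -0.4),
                               nrow = 1, byrow = TRUE))
testset <- build_set(x_test, y = 2L, subject = 2L)

# Step 8: stack training and testing rows into one complete data set.
complete <- rbind(trainset, testset)
print(complete)
```

Filtering the columns before binding keeps the intermediate data frames small, which matters more with the 561 features of the real dataset than with this toy example.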