Codebook for the Course Project

by Sergei Ryazansky

Study design

According to the supplementary files of the data archive (README.txt and features_info.txt from the UCI HAR Dataset), the dataset is the result of measurements of six types of activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING) performed by 30 adult volunteers. The measurements for each person (subject) were taken by the accelerometer and gyroscope embedded in a smartphone (Samsung Galaxy S II) worn on the waist. As a result, 3-axial linear acceleration and 3-axial angular velocity were obtained, and the data were then randomly partitioned into two sets: 70% of the volunteers were selected for generating the training data and 30% for the test data.

The accelerometer and gyroscope signals were pre-processed with noise filters and then sampled in 2.56 sec sliding windows (128 measurements per window, with 50% overlap between adjacent windows). The acceleration signal was further separated by a low-pass filter into body motion and gravitational components. Finally, a set of metrics was calculated for each time window from the time and frequency domains (mean, standard deviation, median, maximum and minimum values, etc.). All metrics were normalized and bounded within [-1, 1].
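The windowing described above can be illustrated with a minimal Python sketch. It is not the code used to build the UCI HAR Dataset; the function name `sliding_windows` and the toy signal are hypothetical, and only the window size (128 samples), duration (2.56 sec) and overlap (50%) come from the description.

```python
def sliding_windows(signal, size=128, overlap=0.5):
    """Split a 1-D sequence into fixed-size windows with fractional overlap."""
    # With size=128 and overlap=0.5, consecutive windows start 64 samples apart.
    step = int(size * (1 - overlap))
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]

# 128 samples spanning 2.56 sec imply a sampling rate of about 50 Hz.
sampling_rate_hz = round(128 / 2.56)

samples = list(range(320))      # hypothetical toy signal, not real sensor data
windows = sliding_windows(samples)
print(sampling_rate_hz, len(windows), windows[1][0])
```

Each window shares its second half with the first half of the next one, which is what the 50% overlap means in practice.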

In the current Course Project, additional processing was performed (run_analysis.R). First, the train and test parts of the data were merged. Second, only the mean and standard deviation metrics were retained. Third, for each of these metrics, the measurement values across the time windows were averaged for each subject and type of activity. Thus, the values in the final tidy dataset (UCI_dataset.processed.txt) are normalized (within the [-1, 1] interval) time- and frequency-domain metrics of linear acceleration and angular velocity.
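The merge and averaging steps can be sketched as follows. This is an illustrative Python example, not a translation of run_analysis.R; the column names `subject` and `activity`, the function name, the feature name and all values are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def average_by_subject_activity(rows, feature_names):
    """Collapse per-window rows to one averaged row per (subject, activity)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["subject"], row["activity"])].append(row)
    tidy = []
    for (subject, activity), members in sorted(groups.items()):
        record = {"subject": subject, "activity": activity}
        for name in feature_names:
            record[name] = mean(m[name] for m in members)
        tidy.append(record)
    return tidy

# Hypothetical per-window rows; the feature name is illustrative only.
feature = "timeBodyAccelerometerMeanX"
train = [
    {"subject": 1, "activity": "WALKING", feature: 0.25},
    {"subject": 1, "activity": "WALKING", feature: 0.75},
]
test = [
    {"subject": 2, "activity": "SITTING", feature: -0.5},
]

merged = train + test                       # step 1: merge train and test
tidy = average_by_subject_activity(merged, [feature])  # step 3: average
for record in tidy:
    print(record)
```

With 30 subjects and six activities, grouping of this kind yields at most 180 rows, one per subject and activity pair.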

Codebook

Variable names in the output tidy dataset reflect the type of the corresponding feature and consist of several parts that encode the exact origin of the computed values. All names begin with a time or frequency prefix, indicating the time or frequency domain, respectively. Variables that come from the accelerometer (which measured the linear acceleration) or the gyroscope (which measured the angular velocity) contain Accelerometer or Gyroscope in their names. In turn, the Body and Gravity parts refer to the corresponding components of the linear acceleration (BodyAccelerometer and GravityAccelerometer in the variable names), while the angular velocity has only the body component (BodyGyroscope). Note that frequency-domain metrics were computed only for the body components of the linear acceleration and angular velocity. The body components of the linear acceleration and angular velocity were also differentiated with respect to time to obtain Jerk signals (BodyAccelerometerJerk and BodyGyroscopeJerk). The magnitudes of these three-dimensional signals were calculated using the Euclidean norm (BodyAccelerometerMagnitude, BodyAccelerometerJerkMagnitude, GravityAccelerometerMagnitude, BodyGyroscopeMagnitude, BodyGyroscopeJerkMagnitude). Finally, one of the letters X, Y or Z at the end of a variable name denotes the axis of the 3-axial signal.
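To make the Jerk and Magnitude parts of the names concrete, here is a minimal Python sketch of both derivations. The function names and sample values are hypothetical, and the simple finite difference used for the jerk is an assumption; the dataset documentation may use a different numerical scheme.

```python
import math

def magnitude(x, y, z):
    """Euclidean norm of a 3-axial signal, computed sample by sample."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]

def jerk(signal, dt):
    """Approximate time derivative by a forward finite difference (assumed scheme)."""
    return [(later - earlier) / dt for earlier, later in zip(signal, signal[1:])]

# Hypothetical toy samples along the three axes.
x = [3.0, 2.0]
y = [4.0, 3.0]
z = [0.0, 6.0]

print(magnitude(x, y, z))
print(jerk([0.0, 0.5, 1.5], dt=0.5))
```

The magnitude collapses the three axes into a single value per sample, which is why the Magnitude variables carry no X, Y or Z suffix.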

The types of metrics whose average values across all time windows were used to obtain the final dataset are marked as Mean and Std. Note also that the input dataset contains seven variables named angle(...) that have Mean in their names. These variables are not included in the output tidy dataset because they do not represent a mean itself but rather the angle between different types of vectors.
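The selection rule above can be sketched in Python as a filter over feature names. This only illustrates the idea and does not reproduce run_analysis.R; the function name is hypothetical, and the raw names in the example are assumed to follow the style of the input dataset's feature list.

```python
def select_mean_std(feature_names):
    """Keep names mentioning mean or std, but drop the angle(...) variables."""
    selected = []
    for name in feature_names:
        lowered = name.lower()
        mentions_metric = "mean" in lowered or "std" in lowered
        is_angle = lowered.startswith("angle(")
        if mentions_metric and not is_angle:
            selected.append(name)
    return selected

# Assumed examples of raw feature names from the input dataset.
raw_names = [
    "tBodyAcc-mean()-X",
    "tBodyAcc-std()-X",
    "tBodyAcc-max()-X",
    "angle(tBodyAccMean,gravity)",
    "angle(X,gravityMean)",
]

print(select_mean_std(raw_names))
```

Without the explicit `angle(` check, a plain search for Mean would also pick up the angle variables, which is the pitfall the note above warns about.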