Prepare CharacterTrajectories data set

ℹ️ The Time Series Classification repository has adopted this version of the data set. The data can be downloaded from here. This repository is archived to preserve the audit trail.

Prepare the CharacterTrajectories data set for modelling. The code:

removes padding
handles cases with inconsistent channel lengths
splits the data into the same training/test splits used in the Time Series Classification (TSC) repository version of the data set
saves the output in aeon .ts format

This resolves issues time-series-machine-learning/tsml-repo#92 and aeon-toolkit/aeon#853.
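
As an illustration of the output step, the sketch below serialises cases into the general layout of a .ts file: a block of `@` header fields, then one line per case with channels separated by colons and the class label last. It is not the code in process_data.py. The function name `to_ts_string` and the exact set of header fields are assumptions, and real files may carry further fields.

```python
from typing import List, Sequence, Tuple

# A case is ([channel_0, channel_1, ...], class_label).
Case = Tuple[List[List[float]], str]


def to_ts_string(problem_name: str, cases: Sequence[Case], class_labels: Sequence[str]) -> str:
    """Serialise cases into a simplified .ts layout (hypothetical helper)."""
    n_dims = len(cases[0][0])
    # Collect every channel length to decide whether the series are equal length.
    lengths = {len(channel) for channels, _ in cases for channel in channels}
    header = [
        f"@problemName {problem_name}",
        "@timeStamps false",
        "@missing false",
        f"@univariate {'true' if n_dims == 1 else 'false'}",
        f"@dimensions {n_dims}",
        f"@equalLength {'true' if len(lengths) == 1 else 'false'}",
        f"@classLabel true {' '.join(class_labels)}",
        "@data",
    ]
    rows = []
    for channels, label in cases:
        # Values within a channel are comma separated; channels are colon separated.
        dims = [",".join(repr(value) for value in channel) for channel in channels]
        rows.append(":".join(dims) + ":" + label)
    return "\n".join(header + rows) + "\n"


toy_cases = [([[0.5, 0.25], [1.0, 2.0]], "a")]
ts_text = to_ts_string("Toy", toy_cases, ["a", "b"])
```

Building the text in memory before writing it keeps the formatting logic easy to check against a known-good file.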

Instructions

Download the UCI version of the data from here, extract and save in data\uci
Set up a Python virtual environment with the dependencies in requirements.txt
Run process_data.py to save CharacterTrajectories_TRAIN.ts and CharacterTrajectories_TEST.ts in the out\ directory.

The file paths can be changed in constants.py.
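
The contents of constants.py are not reproduced here. As a rough sketch of how such path settings could be laid out, the example below builds the directories and output files named in the instructions with `pathlib`. Every constant name (`ROOT`, `UCI_DIR`, `AEON_DIR`, `OUT_DIR`, `TRAIN_FILE`, `TEST_FILE`) is hypothetical.

```python
from pathlib import Path

# Hypothetical names; the real constants.py may be organised differently.
ROOT = Path(".")
UCI_DIR = ROOT / "data" / "uci"    # extracted UCI download
AEON_DIR = ROOT / "data" / "aeon"  # extracted TSC/aeon .ts files
OUT_DIR = ROOT / "out"             # processed output

TRAIN_FILE = OUT_DIR / "CharacterTrajectories_TRAIN.ts"
TEST_FILE = OUT_DIR / "CharacterTrajectories_TEST.ts"
```

Joining path components with `/` on `Path` objects avoids hard-coding a separator, so the same settings work on Windows and POSIX systems.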

Handling cases with inconsistent channel lengths

39 cases have a shorter x or y channel after the padding is removed. Each shorter channel has a final observation in the range 1e-14 to 1e-12. Because the data was differentiated, small x and y values correspond to a broadly stationary pen after the character has been drawn.

It is hypothesised that the smoothing carried out prior to differentiation produced final values that were exactly zero (or small enough to underflow to zero), so these values were removed along with the padding. The shorter channels have been filled with trailing zeros so that all channels in a case are the same length.
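
A minimal sketch of this handling, assuming that padding consists of exact zeros at the end of each channel. The function names `strip_trailing_zeros` and `align_channels` are illustrative and are not taken from helpers.py.

```python
from typing import List


def strip_trailing_zeros(channel: List[float]) -> List[float]:
    """Drop the run of exact zeros at the end of a channel (assumed to be padding)."""
    end = len(channel)
    while end > 0 and channel[end - 1] == 0.0:
        end -= 1
    return channel[:end]


def align_channels(channels: List[List[float]]) -> List[List[float]]:
    """Strip padding per channel, then zero-fill shorter channels to the longest one."""
    stripped = [strip_trailing_zeros(channel) for channel in channels]
    target = max(len(channel) for channel in stripped)
    return [channel + [0.0] * (target - len(channel)) for channel in stripped]


# Toy case: x ends on a tiny value, so stripping leaves it one step shorter than y and z.
case = [
    [0.5, 1e-13, 0.0, 0.0, 0.0],  # x
    [0.1, 0.3, 0.02, 0.0, 0.0],   # y
    [1.0, 0.9, 0.4, 0.0, 0.0],    # z
]
aligned = align_channels(case)
```

In the toy case the x channel comes back as `[0.5, 1e-13, 0.0]`, restoring the final zero that was stripped with the padding.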

Training and test splits

The out/train_indices.csv and out/test_indices.csv files list the UCI index of each TSC training and test case. These indices are used to split the data consistently with the TSC repository.
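
The split itself can be sketched as follows. The layout of the CSV files is not described here, so this example assumes one zero-based integer index per row in the first column; `read_indices` and `split_cases` are hypothetical names, and in-memory strings stand in for the real files.

```python
import csv
import io
from typing import Iterable, List, Sequence, Tuple


def read_indices(handle: Iterable[str]) -> List[int]:
    """Read integer indices from the first column; rows that are not integers
    (such as a header) are skipped."""
    indices = []
    for row in csv.reader(handle):
        if not row:
            continue
        cell = row[0].strip()
        if cell.isdigit():
            indices.append(int(cell))
    return indices


def split_cases(cases: Sequence, train_idx: Sequence[int], test_idx: Sequence[int]) -> Tuple[list, list]:
    """Select training and test cases by index, refusing overlapping splits."""
    overlap = set(train_idx) & set(test_idx)
    if overlap:
        raise ValueError(f"indices appear in both splits: {sorted(overlap)}")
    return [cases[i] for i in train_idx], [cases[i] for i in test_idx]


# In-memory stand-ins for the two index files (assumed layout).
train_csv = io.StringIO("index\n2\n0\n")
test_csv = io.StringIO("index\n1\n3\n")

uci_cases = ["a", "b", "c", "d"]  # placeholders for the processed UCI cases
train, test = split_cases(uci_cases, read_indices(train_csv), read_indices(test_csv))
```

Keeping the order given in the index files means the output cases line up one-to-one with the TSC files.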

The first (non-zero) x, y and z observation for each training/test case was compared with each UCI case. A match was recorded where a) the class labels were the same and b) the values in each channel agreed to within 1e-6, which allows for rounding differences. To reproduce the UCI indices:

Download the multivariate aeon formatted ts files from here, extract and save the contents of the Multivariate_ts\CharacterTrajectories\ directory to data\aeon
Run infer_indices.py
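
The matching rule described in this section can be sketched as below. This is not the code in infer_indices.py. It assumes that "first (non-zero) observation" means the first non-zero value taken separately in each channel, and the names `first_nonzero`, `signature` and `find_uci_index` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# A case is (class_label, [x, y, z]) where each channel is a list of floats.
Case = Tuple[str, List[List[float]]]


def first_nonzero(channel: Sequence[float]) -> float:
    """Return the first non-zero value in a channel, or 0.0 if there is none."""
    for value in channel:
        if value != 0.0:
            return value
    return 0.0


def signature(channels: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """First non-zero observation of every channel (assumed interpretation)."""
    return tuple(first_nonzero(channel) for channel in channels)


def find_uci_index(tsc_case: Case, uci_cases: Sequence[Case], tol: float = 1e-6) -> Optional[int]:
    """Index of the first UCI case with the same label and a signature within tol."""
    label, channels = tsc_case
    target = signature(channels)
    for index, (uci_label, uci_channels) in enumerate(uci_cases):
        if uci_label != label:
            continue
        candidate = signature(uci_channels)
        if all(abs(a - b) <= tol for a, b in zip(target, candidate)):
            return index
    return None


uci_cases = [
    ("a", [[0.0, 0.1234564, 0.3], [-0.25, 0.1], [1.5, 1.4]]),
    ("b", [[0.1234564, 0.3], [-0.25, 0.1], [1.5, 1.4]]),
    ("b", [[0.7, 0.3], [-0.25, 0.1], [1.5, 1.4]]),
]
tsc_case = ("b", [[0.123456, 0.3], [-0.25, 0.1], [1.5, 1.4]])  # rounded copy of UCI case 1
match = find_uci_index(tsc_case, uci_cases)
```

Returning the first match is a simplification; a fuller check would also confirm that each TSC case maps to exactly one UCI case.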

Licence

Made available under the MIT License. See the above repositories for licensing of the original data.

philipdarke/chartrajs-data
