ℹ️ The Time Series Classification repository has adopted this version of the data set. The data can be downloaded from here. This repository is archived to preserve the audit trail.
Prepare the CharacterTrajectories data set for modelling. The code:
- removes padding
- handles cases with inconsistent channel lengths
- splits the data into the same training/test splits used in the Time Series Classification (TSC) repository version of the data set
- saves the output in aeon `.ts` format
This resolves issues time-series-machine-learning/tsml-repo#92 and aeon-toolkit/aeon#853.
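The aeon `.ts` layout that the output uses can be sketched with the standard library alone. This is a minimal illustration, not aeon's own writer and not the code in `process_data.py`. It assumes each case is a list of channels (x, y, z), each a list of floats, with one class label per case. The function name `to_ts_lines` is hypothetical.

```python
def to_ts_lines(cases, labels, problem_name="CharacterTrajectories"):
    """Return the lines of a minimal multivariate .ts file (hypothetical helper).

    cases: list of cases; each case is a list of channels, each channel a list of floats.
    labels: one class label per case.
    """
    class_values = sorted(set(labels))
    n_dims = len(cases[0])
    # Collect every channel length to decide whether the series are equal length.
    lengths = {len(channel) for case in cases for channel in case}
    lines = [
        f"@problemName {problem_name}",
        "@timeStamps false",
        "@missing false",
        "@univariate " + ("true" if n_dims == 1 else "false"),
        f"@dimensions {n_dims}",
        "@equalLength " + ("true" if len(lengths) == 1 else "false"),
    ]
    if len(lengths) == 1:
        lines.append(f"@seriesLength {lengths.pop()}")
    lines.append("@classLabel true " + " ".join(str(c) for c in class_values))
    lines.append("@data")
    for case, label in zip(cases, labels):
        # Values within a channel are comma-separated; channels and the label are colon-separated.
        channels = [",".join(repr(float(v)) for v in channel) for channel in case]
        lines.append(":".join(channels) + f":{label}")
    return lines
```

Writing a file is then a matter of `Path(path).write_text("\n".join(lines) + "\n")`. Because CharacterTrajectories cases differ in length, a real output file would carry `@equalLength false` and no `@seriesLength` line.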
- Download the UCI version of the data from here, extract and save in `data\uci`
- Set up a Python virtual environment with the dependencies in `requirements.txt`
- Run `process_data.py` to save `CharacterTrajectories_TRAIN.ts` and `CharacterTrajectories_TEST.ts` in the `out\` directory.
The file paths can be changed in `constants.py`.
39 cases have shorter x or y channels after padding is removed. The final observation of each shorter channel is in the range 1e-14 to 1e-12. Because the data were differentiated, small x and y values correspond to a pen that is broadly stationary after the character has been drawn.
It is hypothesised that the smoothing carried out prior to differentiation resulted in final values that were zero (or small enough to underflow to zero). These shorter channels have been filled with trailing zeros so that all channels of a case have the same length.
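Trimming padding per channel and then zero-filling the shorter channels can be sketched with NumPy. This sketch assumes padding is a run of exact zeros at the end of each channel and that a case is a 2-D array of shape `(n_channels, padded_length)`; the function names are hypothetical and this is not necessarily how `process_data.py` does it.

```python
import numpy as np


def trim_trailing_zeros(channel):
    """Drop the run of exact zeros at the end of a 1-D channel (assumed padding).

    Zeros inside the series are kept; tiny non-zero values such as 1e-13 are kept too.
    """
    nonzero = np.flatnonzero(channel)
    if nonzero.size == 0:
        return channel[:0]
    return channel[: nonzero[-1] + 1]


def remove_padding(case):
    """Trim each channel, then zero-fill shorter channels up to the longest one."""
    trimmed = [trim_trailing_zeros(np.asarray(ch, dtype=float)) for ch in case]
    length = max(len(ch) for ch in trimmed)
    out = np.zeros((len(trimmed), length))
    for i, ch in enumerate(trimmed):
        out[i, : len(ch)] = ch  # positions past len(ch) stay as trailing zeros
    return out
```

Only exact zeros are treated as padding, so a shorter channel ends at its last non-zero value and is then extended with zeros to match the other channels of the same case.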
The `out/train_indices.csv` and `out/test_indices.csv` files contain the UCI index of each TSC training/test case. These are used to split the data consistently with the TSC repository.
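Applying the saved indices amounts to selecting UCI cases by position. The sketch below assumes each CSV holds one integer index per row in a single column (a non-numeric header row, if present, is skipped) and that the indices are 0-based positions into the list of UCI cases; both are assumptions about the file layout. `read_indices` and `split_cases` are hypothetical names.

```python
import csv
import io


def read_indices(text):
    """Parse a one-column CSV of UCI indices, skipping any non-numeric header row."""
    return [int(row[0]) for row in csv.reader(io.StringIO(text)) if row and row[0].strip().isdigit()]


def split_cases(uci_cases, train_idx, test_idx):
    """Select TSC training/test cases from the UCI cases, preserving the TSC order."""
    train = [uci_cases[i] for i in train_idx]
    test = [uci_cases[i] for i in test_idx]
    return train, test
```

For example, `read_indices(Path("out/train_indices.csv").read_text())` would give the training indices under these assumptions.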
The first (non-zero) x, y and z observation of each TSC training/test case was compared with that of each UCI case. A match was recorded when (a) the class labels were the same and (b) the values of each channel were within 1e-6 of each other, to allow for rounding differences. To reproduce the UCI indices:
- Download the Multivariate aeon formatted `.ts` files from here, extract them and save the contents of the `Multivariate_ts\CharacterTrajectories\` directory to `data\aeon`
- Run `infer_indices.py`
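The matching rule described above (same class label, first observation within 1e-6 in every channel) can be sketched with NumPy. This is an illustration, not the contents of `infer_indices.py`. It assumes cases are arrays of shape `(n_channels, length)` and reads "first (non-zero) observation" as the first time step at which not every channel is zero; the function names are hypothetical.

```python
import numpy as np


def first_observation(case):
    """Return the values of all channels at the first time step that is not entirely zero."""
    case = np.asarray(case, dtype=float)
    nonzero_steps = np.flatnonzero(np.any(case != 0, axis=0))
    if nonzero_steps.size == 0:
        raise ValueError("case contains only zeros")
    return case[:, nonzero_steps[0]]


def infer_uci_index(tsc_case, tsc_label, uci_cases, uci_labels, tol=1e-6):
    """Return the indices of UCI cases matching a TSC case on label and first observation."""
    target = first_observation(tsc_case)
    return [
        i
        for i, (case, label) in enumerate(zip(uci_cases, uci_labels))
        if label == tsc_label
        and np.all(np.abs(first_observation(case) - target) <= tol)
    ]
```

Returning every candidate rather than the first one makes it straightforward to check that each TSC case matches exactly one UCI case.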
Made available under the MIT License. See the above repositories for licensing of the original data.