Create and evaluate synthetic time series datasets effortlessly
Get Started • Tutorials • Augmentations • Generators • Metrics • Datasets • Contributing • Citing
TSGM is an open-source framework for synthetic time series dataset generation and evaluation.
The framework can be used for creating synthetic datasets (see Generators), augmenting time series data (see Augmentations), evaluating synthetic data with respect to consistency, privacy, downstream performance, and more (see Metrics), and working with common time series datasets (TSGM provides easy access to more than 140 datasets; see Datasets).
We provide:
- Documentation with a complete overview of the implemented methods,
- Tutorials that describe practical use-cases of the framework.
pip install tsgm
To install tsgm on Apple M1 and M2 chips:
# Install tensorflow
conda install -c conda-forge tensorflow=2.9.1
# Install tsgm without dependencies
pip install tsgm --no-deps
# Install rest of the dependencies (separately here for clarity)
conda install tensorflow-probability scipy antropy statsmodels dtaidistance networkx optuna prettytable seaborn scikit-learn yfinance tqdm
from tensorflow import keras

import tsgm

# ... Define hyperparameters ...
# dataset is a tensor of shape n_samples x seq_len x feature_dim

# Zoo contains several prebuilt architectures: we choose a conditional GAN architecture
architecture = tsgm.models.architectures.zoo["cgan_base_c4_l1"](
    seq_len=seq_len, feat_dim=feature_dim,
    latent_dim=latent_dim, output_dim=0)
discriminator, generator = architecture.discriminator, architecture.generator

# Initialize GAN object with selected discriminator and generator
gan = tsgm.models.cgan.GAN(
    discriminator=discriminator, generator=generator, latent_dim=latent_dim
)
gan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    g_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
gan.fit(dataset, epochs=N_EPOCHS)

# Generate 100 synthetic samples
result = gan.generate(100)
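The snippet above assumes that `dataset` already exists as a tensor of shape n_samples x seq_len x feature_dim. As a minimal sketch, a toy dataset with that layout can be built with NumPy. The helper `make_sine_dataset` and its parameters are hypothetical and are not part of TSGM; whether a particular architecture accepts these sizes is not checked here.

```python
import numpy as np


def make_sine_dataset(n_samples=64, seq_len=32, feature_dim=3, seed=0):
    """Toy dataset of noisy sine waves, shaped (n_samples, seq_len, feature_dim).

    Hypothetical helper for illustration; it is not a TSGM function.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 2.0 * np.pi, seq_len)  # shared time grid
    # One random frequency and phase per (sample, feature) pair.
    freq = rng.uniform(1.0, 3.0, size=(n_samples, 1, feature_dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1, feature_dim))
    signal = np.sin(freq * t[None, :, None] + phase)
    noise = 0.05 * rng.standard_normal(signal.shape)
    return (signal + noise).astype(np.float32)


dataset = make_sine_dataset()
seq_len, feature_dim = dataset.shape[1], dataset.shape[2]
print(dataset.shape)  # (64, 32, 3)
```

Reading `seq_len` and `feature_dim` back from the array keeps the hyperparameters consistent with the data actually passed to the model.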
- Introductory Tutorial Getting started with TSGM
- Tutorial Datasets in TSGM
- Tutorial Time Series Augmentations
- Tutorial Time Series Generation with VAEs
- Tutorial Conditional Time Series Generation with GANs
- Tutorial Evaluation of Synthetic Time Series Data
- Tutorial Model Selection
- Tutorial Multiple GPUs or TPU with TSGM
For more examples, see our tutorials.
TSGM provides a number of time series augmentations.
Augmentation | Class in TSGM | Reference |
---|---|---|
Gaussian Noise / Jittering | tsgm.augmentations.GaussianNoise | - |
Slice-And-Shuffle | tsgm.augmentations.SliceAndShuffle | - |
Shuffle Features | tsgm.augmentations.Shuffle | - |
Magnitude Warping | tsgm.augmentations.MagnitudeWarping | Data Augmentation of Wearable Sensor Data for Parkinson's Disease Monitoring using Convolutional Neural Networks |
Window Warping | tsgm.augmentations.WindowWarping | Data Augmentation for Time Series Classification using Convolutional Neural Networks |
DTW Barycentric Averaging | tsgm.augmentations.DTWBarycentricAveraging | A global averaging method for dynamic time warping, with applications to clustering. |
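To make the first two techniques in the table concrete, the following is a minimal NumPy sketch of jittering and slice-and-shuffle. It illustrates the ideas only and is independent of TSGM's classes; the function names `jitter` and `slice_and_shuffle` and their parameters are assumptions made for this example.

```python
import numpy as np


def jitter(X, variance=0.1, seed=None):
    """Add zero-mean Gaussian noise with the given variance to every value."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(0.0, np.sqrt(variance), size=X.shape)


def slice_and_shuffle(X, n_segments=4, seed=None):
    """Cut each series into contiguous segments along time and permute them."""
    rng = np.random.default_rng(seed)
    out = np.empty_like(X)
    for i, series in enumerate(X):  # series has shape (seq_len, feature_dim)
        segments = np.array_split(series, n_segments, axis=0)
        order = rng.permutation(n_segments)
        out[i] = np.concatenate([segments[j] for j in order], axis=0)
    return out


# Toy batch: 2 series, 8 time steps, 1 feature.
X = np.arange(16, dtype=float).reshape(2, 8, 1)
X_noisy = jitter(X, variance=0.01, seed=0)
X_shuffled = slice_and_shuffle(X, n_segments=4, seed=0)
print(X_noisy.shape, X_shuffled.shape)
```

Both functions preserve the n_samples x seq_len x feature_dim layout, so their output can be concatenated with the original data.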
TSGM implements several generative models for synthetic time series data.
Method | Link to docs | Type | Notes |
---|---|---|---|
Structural Time Series | sts.STS | Data-driven | Great for modeling time series when prior knowledge is available (e.g., trend or seasonality). |
GAN | GAN | Data-driven | A generic implementation of GAN for time series generation. It can be customized with architectures for generators and discriminators. |
WaveGAN | GAN | Data-driven | WaveGAN is the model for audio synthesis proposed in Adversarial Audio Synthesis. To use WaveGAN, set use_wgan=True when initializing the GAN class and use the zoo["wavegan"] architecture from the model zoo. |
ConditionalGAN | ConditionalGAN | Data-driven | A generic implementation of conditional GAN. It supports both scalar and temporal conditioning. |
BetaVAE | BetaVAE | Data-driven | A generic implementation of Beta VAE for TS. The loss function is customized to work well with multi-dimensional time series. |
cBetaVAE | cBetaVAE | Data-driven | Conditional version of BetaVAE. It supports temporal and scalar conditioning. |
TimeGAN | TimeGAN | Data-driven | TSGM implementation of TimeGAN from paper |
SineConstSimulator | SineConstSimulator | Simulator-based | Simulator-based synthetic signal that switches between constant and periodic functions. |
Lotka Volterra | LotkaVolterraSimulator | Simulator-based | Simulator-based synthetic signal generated from the Lotka-Volterra (predator-prey) equations. |
PdM Simulator | PdMSimulator | Simulator-based | Simulator of predictive maintenance with multiple pieces of equipment from paper |
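As an illustration of what a simulator-based generator does, the sketch below integrates the Lotka-Volterra predator-prey equations with a forward Euler step. It is a standalone example, not TSGM's `LotkaVolterraSimulator`; the function name, parameter names, and default values are assumptions.

```python
import numpy as np


def simulate_lotka_volterra(x0=10.0, y0=5.0, alpha=1.1, beta=0.4,
                            delta=0.1, gamma=0.4, dt=0.001, n_steps=20000):
    """Forward-Euler integration of
        dx/dt = alpha * x - beta * x * y   (prey)
        dy/dt = delta * x * y - gamma * y  (predator)
    Returns an array of shape (n_steps + 1, 2) with columns (prey, predator).
    """
    traj = np.empty((n_steps + 1, 2))
    traj[0] = (x0, y0)
    for k in range(n_steps):
        x, y = traj[k]
        dx = alpha * x - beta * x * y
        dy = delta * x * y - gamma * y
        traj[k + 1] = (x + dt * dx, y + dt * dy)
    return traj


trajectory = simulate_lotka_volterra()
print(trajectory.shape)  # (20001, 2)
```

Forward Euler is chosen for brevity; it slowly drifts away from the true closed orbits, so a higher-order integrator would be preferable for long horizons.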
TSGM implements many metrics for synthetic time series evaluation. See Section 3 of our paper for more details on the evaluation of synthetic time series.
Metric | Link to docs | Type | Notes |
---|---|---|---|
Distance in the space of summary statistics | tsgm.metrics.DistanceMetric | Distance | Calculates a set of summary statistics in the original and synthetic data, and measures the distance between those. |
Maximum Mean Discrepancy (MMD) | tsgm.metrics.MMDMetric | Distance | This metric calculates the MMD between real and synthetic samples. |
Discriminative Score | tsgm.metrics.DiscriminativeMetric | Distance | The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets. |
Demographic Parity Score | tsgm.metrics.DemographicParityMetric | Fairness | This metric assesses the difference in the distributions of a target variable among different groups in two datasets. Refer to this paper to learn more. |
Predictive Parity Score | tsgm.metrics.PredictiveParityMetric | Fairness | This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. Refer to this paper to learn more. |
Privacy Membership Inference Attack Score | tsgm.metrics.PrivacyMembershipInferenceMetric | Privacy | The metric measures the possibility of membership inference attacks. |
Spectral Entropy | tsgm.metrics.EntropyMetric | Diversity | Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies. |
Shannon Entropy | tsgm.metrics.ShannonEntropyMetric | Diversity | Shannon Entropy calculated over the labels of a dataset. |
Pairwise Distance | tsgm.metrics.PairwiseDistanceMetric | Diversity | Measures pairwise distances in a set of time series. |
Downstream Effectiveness | tsgm.metrics.DownstreamPerformanceMetric | Downstream Effectiveness | The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data. |
Qualitative Evaluation | tsgm.utils.visualization | Qualitative | Various tools for visual assessment of a generated dataset. |
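To show the idea behind a distance-type metric, here is a minimal NumPy sketch of a squared MMD estimate with an RBF kernel. It is a biased (V-statistic) estimator written for clarity and is not TSGM's `MMDMetric`; the function names, the flattening of each series into a vector, and the fixed bandwidth are assumptions.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd_squared(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD between two sets of time series.

    X and Y have shape (n_samples, seq_len, feature_dim); each series is
    flattened into a single vector before the kernel is applied.
    """
    A = X.reshape(len(X), -1)
    B = Y.reshape(len(Y), -1)
    k_xx = rbf_kernel(A, A, bandwidth).mean()
    k_yy = rbf_kernel(B, B, bandwidth).mean()
    k_xy = rbf_kernel(A, B, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(100, 4, 1))
close = rng.normal(0.0, 1.0, size=(100, 4, 1))    # same distribution
shifted = rng.normal(2.0, 1.0, size=(100, 4, 1))  # mean shifted by 2
print(mmd_squared(real, close, bandwidth=2.0))
print(mmd_squared(real, shifted, bandwidth=2.0))
```

A value near zero suggests that the two sample sets are hard to tell apart under the chosen kernel, while the shifted set should yield a clearly larger value.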
Dataset | API | Description |
---|---|---|
UCR Dataset | tsgm.utils.UCRDataManager | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
Mauna Loa | tsgm.utils.get_mauna_loa() | https://gml.noaa.gov/ccgg/trends/data.html |
EEG & Eye state | tsgm.utils.get_eeg() | https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State |
Power consumption dataset | tsgm.utils.get_power_consumption() | https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption |
Stock data | tsgm.utils.get_stock_data(ticker_name) | Gets historical stock data from YFinance |
COVID-19 over the US | tsgm.utils.get_covid_19() | Covid-19 distribution over the US |
Energy Data (UCI) | tsgm.utils.get_energy_data() | https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction |
MNIST as time series | tsgm.utils.get_mnist_data() | https://en.wikipedia.org/wiki/MNIST_database |
Samples from GPs | tsgm.utils.get_gp_samples_data() | https://en.wikipedia.org/wiki/Gaussian_process |
Physionet 2012 | tsgm.utils.get_physionet2012() | https://archive.physionet.org/pn3/challenge/2012/ |
Synchronized Brainwave Dataset | tsgm.utils.get_synchronized_brainwave_dataset() | https://www.kaggle.com/datasets/berkeley-biosense/synchronized-brainwave-dataset |
TSGM provides an API for convenient access to many time series datasets (currently more than 140). A comprehensive list of the datasets is available in the documentation.
We appreciate all contributions. To learn more, please check CONTRIBUTING.md.
git clone https://github.com/AlexanderVNikitin/tsgm
cd tsgm
pip install -e .
Run tests:
python -m pytest
To check static typing:
mypy
We provide two CLIs for convenient synthetic data generation:
- tsgm-gd generates data by a stored sample,
- tsgm-eval evaluates the generated time series.
Use tsgm-gd --help or tsgm-eval --help for documentation.
If you find this repo useful, please consider citing our paper:
@article{nikitin2023tsgm,
  title={TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series},
  author={Nikitin, Alexander and Iannucci, Letizia and Kaski, Samuel},
  journal={arXiv preprint arXiv:2305.11567},
  year={2023}
}