Unfortunately, there is no single answer to this question. The window size essentially specifies the region of interest along the time axis in which the algorithm looks for changes. The naive solution is to compute the scores for multiple window sizes and scan through all the maxima. Alternatively, try to draw on domain knowledge. If you know the time frame over which changes typically become visible in your time series, we recommend setting the window size a little larger than the duration of the change (1.5x or 2x). If you have periodic signals, you could estimate the period with the FFT or the cross-correlation function and set the window size to a multiple of that period to check for frequency changes, or use the period directly to find abnormal periods.
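For periodic signals, the period estimate can be sketched as follows. This is only a minimal illustration using NumPy: the helper names estimate_period and suggest_window_size and the default of two periods are assumptions made for this example and are not part of this package.

```python
import numpy as np


def estimate_period(signal):
    """Estimate the dominant period (in samples) of a 1-D signal via the FFT."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the constant offset so it does not dominate
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size)  # frequencies in cycles per sample
    peak = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return int(round(1.0 / freqs[peak]))


def suggest_window_size(signal, n_periods=2):
    """Hypothetical helper: use a multiple of the dominant period as window size."""
    return n_periods * estimate_period(signal)


# Synthetic sine wave with a period of 50 samples.
t = np.arange(2000)
demo = np.sin(2 * np.pi * t / 50)
window_size = suggest_window_size(demo, n_periods=2)
print(window_size)
```

On noisy or non-stationary data the spectral peak can be ambiguous, so it is worth checking the estimate against a plot of the signal before relying on it.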
While this is also a challenging question, there is literature on deriving anomalies from anomaly scores. These approaches also apply to our problem. Additionally, you might want to look at what packages like PyOD and PyThresh recommend.
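As one simple illustration of thresholding a score series, the sketch below flags scores that lie far above the median, measured in units of the median absolute deviation (MAD). This is a generic heuristic chosen for this example, not the method recommended by this package, PyOD, or PyThresh. The function names and the factor k=3.0 are assumptions.

```python
import numpy as np


def mad_threshold(scores, k=3.0):
    """Return median + k * scaled MAD as a robust threshold for the scores."""
    s = np.asarray(scores, dtype=float)
    median = np.median(s)
    mad = np.median(np.abs(s - median))
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    return median + k * 1.4826 * mad


def flag_changes(scores, k=3.0):
    """Return the indices whose score exceeds the robust threshold."""
    s = np.asarray(scores, dtype=float)
    return np.flatnonzero(s > mad_threshold(s, k))


# Toy score series with one clear peak at index 4.
demo_scores = np.array([0.1, 0.12, 0.09, 0.11, 0.95, 0.1, 0.13, 0.1])
flagged = flag_changes(demo_scores)
print(flagged)
```

A larger k yields fewer, more confident detections. A suitable value depends on how noisy the scores are and should be tuned on data with known changes.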
For the algorithms that build on the decomposition of time series Hankel matrices (like SST and ESST), we created a significantly more efficient algorithm (see the code for the paper for reference).
This efficient algorithm can be used by specifying the option use_fast_hankel = True in the SST and ESST. Empirically, we saw improvements for window sizes larger than 200 for the SST and larger than 400 for the ESST. These boundaries depend on your system parameters, like available libraries and cores. For window sizes smaller than these thresholds, the naive implementation is typically faster. For tiny window sizes (~20), the naive implementation will outperform the efficient algorithm by around an order of magnitude.
Other than that, you can always use the multiprocessing module in Python, which works well for the methods in this package. Please refrain from using the threading module because of the global interpreter lock (GIL) and our JIT compiler.
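A minimal sketch of this pattern distributes several signals over worker processes with multiprocessing.Pool. The function score_signal is a hypothetical stand-in (a simple difference of adjacent window means) used only to keep the example self-contained. In practice you would call the transformer of your choice from this package inside the worker function.

```python
import multiprocessing as mp

import numpy as np


def score_signal(signal):
    """Hypothetical stand-in scorer: absolute difference of adjacent window means."""
    window = 20
    x = np.asarray(signal, dtype=float)
    scores = np.zeros(x.size)
    for i in range(window, x.size - window):
        scores[i] = abs(x[i:i + window].mean() - x[i - window:i].mean())
    return scores


def score_many(signals, processes=2):
    """Score every signal in its own worker process."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(score_signal, signals)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four noisy signals, each with a mean shift at sample 200.
    signals = [
        np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
        for _ in range(4)
    ]
    results = score_many(signals)
    print([int(np.argmax(r)) for r in results])
```

The worker function has to be defined at module level so that it can be pickled, and the `if __name__ == "__main__":` guard is needed on platforms that start worker processes by spawning a fresh interpreter.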