Toy Example

Data

Let’s imagine that we have the following time series.

Block Cross Validation

Block cross-validation works as follows: we divide the whole time series into equal blocks counting some attributes (target and lag) before doing so. In the case of two blocks we will get the following picture:

After that we need to generate statistics using the tsfresh library. To get the statistic, we need to apply some transformation to the whole time series. To do this, we will take the last few values of the time series (window) at each time point, and apply the transformations to them. In this way we will get a large number of new features.

This is done using the function extraction_utils.bcv_extract_features().

Stats Selection

Then we need to understand which of the generated statistics are really important. To do this, we will combine the information for all blocks into one table, and measure its statistical significance for each feature. After that, we leave only uncorrelated relevant features with the highest p_value.

Statistical significance is obtained using method selection_utils.get_stats(), and the selection of the best uncorrelated features using selection_utils.stats_select_features().

Importance Selection

On the selected features and built blocks, we can train models and calculate the importance of each feature. Then we will take the features with the highest importance until we get 80% of all importance.

We got the desired set!

Remarks

To take into account the context (the presence of other time series, see the second part of the Algorithm)

To see what data tsfresh generates and how it is transformed during the selection - see the toy_example_notebook