Introduction
Overview
Throughout this documentation, ts-is-fresh refers to the combination of tsfresh, block cross-validation, and
selection of features by their importance.
ts-is-fresh automates the search for significant features, which is essential for high-frequency trading (HFT).
When a huge number of trades occur every second and the market changes rapidly, it is impossible to assess
the situation by hand. We need systems that can select the important information and use it to increase
the accuracy of forecasts. Since tsfresh can calculate a huge number of features, many of which take a long time
to compute, ts-is-fresh adds its own selection step over the tsfresh features.
In addition, thanks to block cross-validation, ts-is-fresh pays attention not only to the latest changes in the time
series but also to market behavior over the whole time range. Broadly speaking, block cross-validation divides
the whole time series evenly into blocks, computes various statistics (p_values, feature_importance,
metrics, etc.) on each block, and then averages them. As a result, forecasting is not tuned only to
the most recent values of the time series. The block cross-validation scheme takes into account the
entire structure of the time series, preserves the order of events, and avoids data leakage.
To learn more about how block cross-validation works, see Algorithm.
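The idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the ts-is-fresh implementation: the names make_blocks, split_block, and block_cross_validate, the chronological 80/20 split inside each block, and the score_fn callback are all hypothetical. The actual procedure is the one described in Algorithm.

```python
from statistics import mean


def make_blocks(n_samples, n_blocks):
    """Split indices 0..n_samples-1 into contiguous, nearly equal blocks."""
    if not 1 <= n_blocks <= n_samples:
        raise ValueError("n_blocks must be between 1 and n_samples")
    base, extra = divmod(n_samples, n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        # The first `extra` blocks receive one additional sample.
        stop = start + base + (1 if i < extra else 0)
        blocks.append(range(start, stop))
        start = stop
    return blocks


def split_block(block, train_fraction=0.8):
    """Split one block chronologically: earlier part trains, later part validates.

    The 80/20 default is an assumption made for this sketch.
    """
    if len(block) < 2:
        raise ValueError("a block needs at least two samples")
    cut = min(max(int(len(block) * train_fraction), 1), len(block) - 1)
    return block[:cut], block[cut:]


def block_cross_validate(series, n_blocks, score_fn, train_fraction=0.8):
    """Compute score_fn(train, valid) on every block and average the results."""
    scores = []
    for block in make_blocks(len(series), n_blocks):
        train_idx, valid_idx = split_block(block, train_fraction)
        train = [series[i] for i in train_idx]
        valid = [series[i] for i in valid_idx]
        scores.append(score_fn(train, valid))
    return mean(scores)


def naive_mae(train, valid):
    """Toy statistic: error of predicting the last training value."""
    last = train[-1]
    return mean(abs(v - last) for v in valid)


prices = [100.0, 100.5, 100.2, 100.9, 101.4, 101.1, 100.7, 100.8, 101.6, 102.0]
print(block_cross_validate(prices, n_blocks=2, score_fn=naive_mae))
```

Within each block the validation samples always come after the training samples, so the order of events is preserved and no future value is used to fit the statistic. The same loop could average p-values or feature importances instead of an error metric.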
What problem does ts-is-fresh solve?
ts-is-fresh is built to construct new features for predicting cryptocurrency prices on exchanges.
Because trading in this area happens at high frequency, the solution must run very quickly and must not require
manual debugging. It is therefore necessary not only to build additional features that help increase
the accuracy of the predictions, but also to limit their number. We can afford neither slow model inference
nor a long training process.
How does it solve this problem?
The approach is to generate a large number of statistical features, train a gradient-boosting model, and keep only
the most important features. From time to time the models must be trained on the full set of features
to find out which subset is currently the most useful. Once that subset has been selected,
the models can be trained on it alone for a long time. Because we chose an XGBR model, we can
also select features by their importance values.
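The selection step can be illustrated with a short sketch. It assumes that a model has already been trained on each block and that its importance values are available as a dictionary mapping feature name to importance; the helper names and the feature names below are hypothetical and are not part of ts-is-fresh, tsfresh, or XGBoost.

```python
from collections import defaultdict


def average_importances(per_block_importances):
    """Average importance values over blocks.

    A feature missing from a block is treated as having importance 0 there.
    """
    totals = defaultdict(float)
    for block in per_block_importances:
        for name, value in block.items():
            totals[name] += value
    n_blocks = len(per_block_importances)
    return {name: total / n_blocks for name, total in totals.items()}


def select_top_features(per_block_importances, k):
    """Return the k feature names with the highest averaged importance."""
    averaged = average_importances(per_block_importances)
    # Sort by descending importance; ties are broken alphabetically.
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical importance values from models trained on two blocks.
importances = [
    {"price__median": 0.5, "price__variance": 0.3, "volume__mean": 0.2},
    {"price__median": 0.1, "price__variance": 0.6, "volume__mean": 0.3},
]
print(select_top_features(importances, k=2))
```

Only the selected names then need to be computed and passed to the model between the occasional full retraining runs, which keeps both training and inference short.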
And why exactly in this way?
Let's look at the possible solutions in general:
A) smart feature engineering: using domain knowledge, important features are created by hand, and a simple (e.g., linear) model is then built on top of them
easy to further train on-line
it’s interpretable
very fast model inference
domain knowledge is needed
B) semi-automatic feature engineering: using some heuristics, different kinds of statistics (medians, quantiles, etc.) are computed, and tree-based models are then built on top of them
less demanding of domain knowledge (because of the use of a more complex model, we can afford to build less expressive features)
high expressive power
fast model inference
cannot be quickly retrained on-line
it’s uninterpretable
C) statistical autoregressive approach: models like ARIMA, Prophet, etc.
fast model inference
correct selection of hyperparameters is necessary to build a good model
D) RNN-like approaches: recurrent neural networks like LSTM and others
very heavy models (in terms of training and inference)
can show very good results
Because my knowledge of the cryptocurrency market is limited, option A) is ruled out. Since we have a lot of data,
training high-quality statistical models would be quite hard (it requires a search over hyperparameters), so
approach C) is rejected as well. Our goal is to predict 300 ms ahead; in approach D) the inference time of the
model is comparable to this horizon, so it is also ruled out.
This leaves approach B), in which we need to construct good features automatically. Moreover, because inference time is limited and on-line retraining is not available, the model must be fast enough (so it cannot use very many features), and the period over which its predictions remain valid must be comparable to the time needed to train a new model (we must have a good model at every moment: if building a model takes longer than its predictions stay relevant, we will not be able to trade).
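To make approach B) more concrete, here is a minimal sketch of computing heuristic statistics (mean, median, quartiles, range) over a sliding window using only the Python standard library. The function names, the chosen statistics, and the window size are assumptions made for illustration; in ts-is-fresh the features themselves come from tsfresh.

```python
from statistics import mean, median, quantiles


def window_features(window):
    """Compute a few simple statistics for one window of values."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q25, _, q75 = quantiles(window, n=4)
    return {
        "mean": mean(window),
        "median": median(window),
        "q25": q25,
        "q75": q75,
        "range": max(window) - min(window),
    }


def rolling_features(series, window_size):
    """Compute features for every window ending at position t.

    Each window contains only values up to and including t, so no future
    information leaks into the features.
    """
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    return [
        window_features(series[t - window_size + 1 : t + 1])
        for t in range(window_size - 1, len(series))
    ]


prices = [100.0, 100.5, 100.2, 100.9, 101.4, 101.1, 100.7]
for row in rolling_features(prices, window_size=5):
    print(row)
```

A tree-based model would then be trained on rows like these, and the importance-based selection would decide which of the statistics are worth keeping.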