The paper has been accepted by KDD 2023 and is better read in the arXiv version.

To remove the dependency on qlib, please refer to our [API](https://github.com/SJTU-Quant/DoubleAdapt) repo. (This API repo is not well maintained and may have undiscovered bugs; we still recommend our qlib repo.)

## :newspaper: News
Sep 15, 2023 :hammer: Support overriding learning rates during online training (meta-valid and meta-test).
It is highly **recommended** to tune the offline and online learning rates (see the suggestions below).
We also **CHANGED** our argparser: the arg `--lr` now means the learning rate of the forecast model, while a new arg `--lr_da` means that of the data adapter.

## Code Organization
The runner program is [./main.py](main.py).

The core implementation of the framework lies in [qlib/contrib/meta/incremental/](https://github.com/SJTU-Quant/qlib/blob/main/qlib/contrib/meta/incremental/).

Forecast models are implemented in [qlib/contrib/model/](https://github.com/SJTU-Quant/qlib/blob/main/qlib/contrib/model/) (e.g., GRU is in [qlib/contrib/model/pytorch_gru.py](https://github.com/SJTU-Quant/qlib/blob/main/qlib/contrib/model/pytorch_gru.py)).

## Scripts
```bash
# Naive incremental learning
python -u main.py run_all --forecast_model GRU --market csi300 --data_dir crowd_data --rank_label False --naive True
# DoubleAdapt
python -u main.py run_all --forecast_model GRU --market csi300 --data_dir crowd_data --rank_label False \
--num_head 8 --tau 10 --lr 0.001 --lr_da 0.01 --online_lr "{'lr': 0.001, 'lr_da': 0.0001, 'lr_ma': 0.001}"
```
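
The `--online_lr` argument above is a Python dict literal passed as a string. The snippet below only illustrates the format by parsing such a value with `ast.literal_eval`; it is not necessarily the repo's actual parsing code:

```python
import ast

# Parse an --online_lr value of the form used above (illustrative only).
online_lr_arg = "{'lr': 0.001, 'lr_da': 0.0001, 'lr_ma': 0.001}"
online_lr = ast.literal_eval(online_lr_arg)  # accepts literals only, unlike eval()
```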
## IMPORTANT Suggestions before Deployment
Our experimental settings followed prior works and are not optimal for practical usage. We provide some suggestions below to help you customize DoubleAdapt for your own application.

### Combine incremental learning (IL) with rolling retraining (RR)
Though our paper considers RR as a comparison method against IL, the two are orthogonal to each other.
For more profit, a recommended setting is to retrain DoubleAdapt from scratch every month and, during the month, run DoubleAdapt incrementally every 2~3 trading days.
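
As a sketch, the suggested schedule could be driven by a loop like the following (the policy of "retrain monthly, adapt every few trading days" is from the text above; the function itself is hypothetical):

```python
def schedule(trading_days, il_step=3):
    """Split trading days into monthly retraining days and
    incremental-update days run every `il_step` days in between."""
    retrains, updates = [], []
    last_month, since_retrain = None, 0
    for day in trading_days:
        if (day.year, day.month) != last_month:  # first trading day of a month
            retrains.append(day)                 # retrain DoubleAdapt from scratch
            last_month = (day.year, day.month)
            since_retrain = 0
        else:
            since_retrain += 1
            if since_retrain % il_step == 0:     # every 2~3 trading days
                updates.append(day)              # incremental DoubleAdapt step
    return retrains, updates
```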

### Re-devise the data adapter
We mainly experiment on Alpha360, a simple dataset, and our proposed feature adaptation only involves a 6$\times$6 affine transformation with a few parameters to learn.
Since common practice in quantitative investment relies on hundreds of factors (e.g., Alpha158), our fully connected layer becomes over-parameterized and achieves suboptimal performance. It would be better to design a new data adapter. Below are some more lightweight designs:
- Divide the factors into groups and learn an affine transformation within each group, at the cost of ignoring interactions between factors of different groups.
- Or: apply the same transformation to the embedding of each factor, learning element-wise operations (e.g., normalizing flows) over all factor embeddings.
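
The first design might look as follows in PyTorch (a minimal sketch under assumed group sizes, not the repo's implementation):

```python
import torch
import torch.nn as nn

class GroupedAffineAdapter(nn.Module):
    """Affine feature adaptation per factor group; interactions
    across groups are ignored by construction."""
    def __init__(self, group_sizes):
        super().__init__()
        self.group_sizes = list(group_sizes)
        self.affines = nn.ModuleList(nn.Linear(g, g) for g in self.group_sizes)

    def forward(self, x):  # x: (batch, num_factors)
        chunks = torch.split(x, self.group_sizes, dim=-1)
        return torch.cat([f(c) for f, c in zip(self.affines, chunks)], dim=-1)
```

With two groups of 4 and 6 factors, this layer has 62 parameters versus 110 for a full 10$\times$10 affine map.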

As for general multivariate time series forecasting, we empirically found that a channel-independent data adapter is desirable, i.e., one that transforms the lookback/horizon window of each variable independently.
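
A channel-independent adapter can be sketched as a single affine map shared across variables and applied to each lookback window separately (illustrative only, not the repo's code):

```python
import torch
import torch.nn as nn

class ChannelIndependentAdapter(nn.Module):
    """One shared affine transform over the lookback window,
    applied to every variable (channel) independently."""
    def __init__(self, window_len):
        super().__init__()
        self.proj = nn.Linear(window_len, window_len)

    def forward(self, x):    # x: (batch, num_vars, window_len)
        return self.proj(x)  # weights shared across channels
```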

If you have any questions or issues, please let us know. We are glad to discuss them with you.

### Grid search on learning rates during offline and online training
It is **necessary** to perform hyperparameter tuning for learning rates `lr_da`, `lr_ma` and `lr` (learning rate of the lower level).
Note that the learning rates during online training could be different from those during offline training.

> Set the arg `--online_lr` to use different learning rates online.
> Example: `--online_lr "{'lr': 0.0005, 'lr_da': 0.0001, 'lr_ma': 0.0005}"`
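
A simple way to organize such a search is to enumerate candidate rates and generate one command per combination (the candidate values below are illustrative, not recommendations):

```python
from itertools import product

lrs = [0.01, 0.001]             # forecast-model learning rates (illustrative)
lr_das = [0.01, 0.001, 0.0001]  # data-adapter learning rates (illustrative)

commands = []
for lr, lr_da in product(lrs, lr_das):
    online_lr = {'lr': lr, 'lr_da': lr_da, 'lr_ma': lr}
    commands.append(
        f"python -u main.py run_all --forecast_model GRU --market csi300 "
        f"--data_dir crowd_data --rank_label False "
        f"--lr {lr} --lr_da {lr_da} --online_lr \"{online_lr}\""
    )
```
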
## Dataset
Following DDG-DA, we run experiments on the crowd-sourced version of qlib data, which can be downloaded by
Pay attention to the arg `--rank_label False` (or `--rank_label True`) for the target label.

As the current implementation is simple and may not suit rank labels, we recommend `--adapt_y False` when you have to set `--rank_label True`.


### Carefully select `step` according to `horizon`
Arg `--horizon` decides the target label to be `Ref($close, -horizon-1) / Ref($close, -1) - 1` in the China A-share market.
Accordingly, there are always unknown ground-truth labels in the last `horizon` days of the test data, and we can only use the rest for optimization of the meta-learners.
With a large `horizon` or a small `step`, the performance on the majority of the test data cannot be optimized,
and the meta-learners may well become overfitted and shortsighted.
We provide an arg `--use_extra True` to take the nearest data as additional test data, though the improvement is often marginal.

It is recommended to let `step` be greater than `horizon` by at least 3 or 4, e.g., `--step 5 --horizon 1`.
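
The arithmetic behind this recommendation, in a simplified view where each online segment spans `step` days and the last `horizon` of them lack ground truth:

```python
def labeled_days(step, horizon):
    """Days per online segment with known ground-truth labels
    when the meta-learners are updated (simplified view)."""
    return max(step - horizon, 0)

# --step 5 --horizon 1 leaves 4 of 5 days usable for optimization;
# --step 5 --horizon 4 would leave only 1.
```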

> The current implementation does not support `step` $\le$ `horizon` (e.g., `--step 1 --horizon 1`) during online training.
>
> As the offline training can be conducted as usual, you can freeze the meta-learners online, initialize a forecast model by the model adapter, and then incrementally update the forecast model throughout the online phase.
## Requirements

### Packages
On top of the requirements of qlib, we use an additional package, [higher](https://github.com/facebookresearch/higher):
```bash
conda install higher -c conda-forge
# pip install higher
```

### RAM

If your GPU memory is limited, try to set a smaller `step` (e.g., 5), which may take up less memory.
> The reason why we set `step` to 20 rather than 5 is that RR and DDG-DA bear unaffordable time costs (e.g., 3 days for 10 runs) in experiments with `step` set to 5.


## Cite
If you find this useful for your work, please consider citing it as follows:
