The paper has been accepted by KDD 2023 and is better read in the arXiv version.

To remove the dependency on qlib, please refer to our [API](https://github.com/SJTU-Quant/DoubleAdapt) repo. (This API repo is not well maintained and may have undiscovered bugs; we still recommend our qlib repo.)

## :newspaper: News
Sep 15, 2023 :hammer: Support overriding learning rates during online training (meta-valid and meta-test).
It is highly **recommended** to tune the offline and online learning rates (see the suggestions below).
We also **CHANGED** our argparser: the arg `--lr` now means the learning rate of the forecast model, while a new arg `--lr_da` means that of the data adapter.

## Code Organization
The runner program is [./main.py](main.py).

The core implementation of the framework lies in [qlib/contrib/meta/incremental/](https://github.com/SJTU-Quant/qlib/blob/main/qlib/contrib/meta/incremental/).

Forecast models are implemented in [qlib/contrib/model/](https://github.com/SJTU-Quant/qlib/blob/main/qlib/contrib/model/) (e.g., GRU is in [qlib/contrib/model/pytorch_gru.py](https://github.com/SJTU-Quant/qlib/blob/main/qlib/contrib/model/pytorch_gru.py)).

## Scripts
```bash
# Naive incremental learning
python -u main.py run_all --forecast_model GRU --market csi300 --data_dir crowd_data --rank_label False --naive True
# DoubleAdapt
python -u main.py run_all --forecast_model GRU --market csi300 --data_dir crowd_data --rank_label False \
--num_head 8 --tau 10 --lr 0.001 --lr_da 0.01 --online_lr "{'lr': 0.001, 'lr_da': 0.0001, 'lr_ma': 0.001}"
```
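
The `--online_lr` argument above is a Python dict literal passed as a string. The snippet below only illustrates the format by parsing such a value with `ast.literal_eval`; it is not necessarily the repo's actual parsing code:

```python
import ast

# Parse an --online_lr value of the form used above (illustrative only).
online_lr_arg = "{'lr': 0.001, 'lr_da': 0.0001, 'lr_ma': 0.001}"
online_lr = ast.literal_eval(online_lr_arg)  # accepts literals only, unlike eval()
```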
## IMPORTANT Suggestions before Deployment
Our experimental settings followed prior works and are not optimal for practical usage. We provide some suggestions below to help you customize DoubleAdapt for your own application.

### Combine incremental learning (IL) with rolling retraining (RR)
Though our paper considers RR as a comparison method against IL, the two are orthogonal to each other.
For more profit, a recommended setting is to retrain DoubleAdapt from scratch every month and, during the month, run DoubleAdapt incrementally every 2~3 trading days.
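
As a sketch, the suggested schedule could be driven by a loop like the following (the policy of "retrain monthly, adapt every few trading days" is from the text above; the function itself is hypothetical):

```python
def schedule(trading_days, il_step=3):
    """Split trading days into monthly retraining days and
    incremental-update days run every `il_step` days in between."""
    retrains, updates = [], []
    last_month, since_retrain = None, 0
    for day in trading_days:
        if (day.year, day.month) != last_month:  # first trading day of a month
            retrains.append(day)                 # retrain DoubleAdapt from scratch
            last_month = (day.year, day.month)
            since_retrain = 0
        else:
            since_retrain += 1
            if since_retrain % il_step == 0:     # every 2~3 trading days
                updates.append(day)              # incremental DoubleAdapt step
    return retrains, updates
```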

### Re-devise the data adapter
We mainly experiment on Alpha360, a simple dataset, and our proposed feature adaptation only involves a 6$\times$6 affine transformation with a few parameters to learn.
Since common practice in quantitative investment relies on hundreds of factors (e.g., Alpha158), our fully connected layer becomes over-parameterized and achieves suboptimal performance. It would be better to design a new data adapter. Below are some more lightweight designs:
- Divide the factors into groups and learn an affine transformation within each group, at the cost of ignoring interactions between factors of different groups.
- Or: apply the same transformation to the embedding of each factor, learning element-wise operations (e.g., normalizing flows) over all factor embeddings.
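
The first design might look as follows in PyTorch (a minimal sketch under assumed group sizes, not the repo's implementation):

```python
import torch
import torch.nn as nn

class GroupedAffineAdapter(nn.Module):
    """Affine feature adaptation per factor group; interactions
    across groups are ignored by construction."""
    def __init__(self, group_sizes):
        super().__init__()
        self.group_sizes = list(group_sizes)
        self.affines = nn.ModuleList(nn.Linear(g, g) for g in self.group_sizes)

    def forward(self, x):  # x: (batch, num_factors)
        chunks = torch.split(x, self.group_sizes, dim=-1)
        return torch.cat([f(c) for f, c in zip(self.affines, chunks)], dim=-1)
```

With two groups of 4 and 6 factors, this layer has 62 parameters versus 110 for a full 10$\times$10 affine map.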

As for general multivariate time series forecasting, we empirically found that a channel-independent data adapter is desirable, i.e., one that transforms the lookback/horizon window of each variable independently.
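
A channel-independent adapter can be sketched as a single affine map shared across variables and applied to each lookback window separately (illustrative only, not the repo's code):

```python
import torch
import torch.nn as nn

class ChannelIndependentAdapter(nn.Module):
    """One shared affine transform over the lookback window,
    applied to every variable (channel) independently."""
    def __init__(self, window_len):
        super().__init__()
        self.proj = nn.Linear(window_len, window_len)

    def forward(self, x):    # x: (batch, num_vars, window_len)
        return self.proj(x)  # weights shared across channels
```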

If you have any questions or issues, please let us know. We are glad to discuss them with you.

### Grid search on learning rates during offline and online training
It is **necessary** to perform hyperparameter tuning for learning rates `lr_da`, `lr_ma` and `lr` (learning rate of the lower level).
Note that the learning rates during online training could be different from those during offline training.

> Set the arg `--online_lr` to use different learning rates online.
> Example: `--online_lr "{'lr': 0.0005, 'lr_da': 0.0001, 'lr_ma': 0.0005}"`
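
A simple way to organize such a search is to enumerate candidate rates and generate one command per combination (the candidate values below are illustrative, not recommendations):

```python
from itertools import product

lrs = [0.01, 0.001]             # forecast-model learning rates (illustrative)
lr_das = [0.01, 0.001, 0.0001]  # data-adapter learning rates (illustrative)

commands = []
for lr, lr_da in product(lrs, lr_das):
    online_lr = {'lr': lr, 'lr_da': lr_da, 'lr_ma': lr}
    commands.append(
        f"python -u main.py run_all --forecast_model GRU --market csi300 "
        f"--data_dir crowd_data --rank_label False "
        f"--lr {lr} --lr_da {lr_da} --online_lr \"{online_lr}\""
    )
```
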
## Dataset
Following DDG-DA, we run experiments on the crowd-sourced version of qlib data, which can be downloaded by
Pay attention to the arg `--rank_label False` (or `--rank_label True`) for the target label.

As the current implementation is simple and may not suit rank labels, we recommend `--adapt_y False` when you have to set `--rank_label True`.


### Carefully select `step` according to `horizon`
Arg `--horizon` decides the target label to be `Ref($close, -horizon-1) / Ref($close, -1) - 1` in the China A-share market.
Accordingly, there are always unknown ground-truth labels in the last `horizon` days of the test data, and we can only use the rest for optimization of the meta-learners.
With a large `horizon` or a small `step`, the performance on the majority of the test data cannot be optimized,
and the meta-learners may well become overfitted and shortsighted.
We provide an arg `--use_extra True` to take the nearest data as additional test data, though the improvement is often marginal.

It is recommended to let `step` be greater than `horizon` by at least 3 or 4, e.g., `--step 5 --horizon 1`.
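
The arithmetic behind this recommendation, in a simplified view where each online segment spans `step` days and the last `horizon` of them lack ground truth:

```python
def labeled_days(step, horizon):
    """Days per online segment with known ground-truth labels
    when the meta-learners are updated (simplified view)."""
    return max(step - horizon, 0)

# --step 5 --horizon 1 leaves 4 of 5 days usable for optimization;
# --step 5 --horizon 4 would leave only 1.
```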

> The current implementation does not support `step` $\le$ `horizon` (e.g., `--step 1 --horizon 1`) during online training.
>
> As the offline training can be conducted as usual, you can freeze the meta-learners online, initialize a forecast model by the model adapter, and then incrementally update the forecast model throughout the online phase.
## Requirements

### Packages
On top of the requirements of qlib, we use an additional package, [higher](https://github.com/facebookresearch/higher):
```bash
conda install higher -c conda-forge
# pip install higher
```

### RAM

If your GPU memory is limited, try to set a smaller `step` (e.g., 5), which may take up less memory.
> The reason why we set `step` to 20 rather than 5 is that RR and DDG-DA bear unaffordable time costs (e.g., 3 days for 10 runs) in experiments with `step` set to 5.


## Cite
If you find this useful for your work, please consider citing it as follows:
