This is the code for AdaLomo: Low-memory Optimization with Adaptive Learning Rate.
In this work, we examine the distinctions between the LOMO and Adam optimization techniques and introduce AdaLomo, which provides an adaptive learning rate for each parameter and employs grouped update normalization while maintaining memory efficiency. AdaLomo achieves results comparable to AdamW in both instruction-tuning and further pre-training, with a smaller memory footprint.
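For intuition, below is a simplified sketch of the kind of per-parameter update described above, assuming an Adafactor-style factored second moment and RMS-based update normalization. All names and hyperparameters are illustrative; see the paper and the implementation linked below for the exact algorithm.

```python
import torch

@torch.no_grad()
def adalomo_update(p, grad, row_ema, col_ema, lr, beta2=0.99, eps=1e-8, norm_thresh=1.0):
    """Simplified sketch of one AdaLomo-style update for a 2-D parameter.

    The second moment is kept in factored form (a row vector and a column
    vector, as in Adafactor) instead of a full matrix, and the update is
    rescaled by its RMS (grouped update normalization). Names and defaults
    here are illustrative, not the repo's actual API.
    """
    # Factored second-moment estimates: O(m + n) memory instead of O(m * n).
    row_ema.mul_(beta2).add_(grad.pow(2).mean(dim=1), alpha=1 - beta2)  # shape (m,)
    col_ema.mul_(beta2).add_(grad.pow(2).mean(dim=0), alpha=1 - beta2)  # shape (n,)

    # Reconstruct a per-element second-moment estimate from the two factors.
    v = torch.outer(row_ema, col_ema) / row_ema.mean().clamp(min=eps)

    # Adaptive per-parameter learning rate, as in Adam-style methods.
    update = grad / (v.sqrt() + eps)

    # Grouped update normalization: bound the RMS of the update per group
    # (here, per parameter tensor) so no single group takes outsized steps.
    rms = update.pow(2).mean().sqrt()
    update.div_((rms / norm_thresh).clamp(min=1.0))

    # Apply immediately; LOMO-style optimizers fuse this into the backward
    # pass so full gradients never need to be materialized for all layers.
    p.add_(update, alpha=-lr)
```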
AdaLomo is implemented in collie-lm at https://github.com/OpenLMLab/collie/blob/dev/collie/optim/adalomo.py.
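A minimal usage sketch (the constructor signature here is an assumption; check `adalomo.py` above for the actual API):

```python
import torch.nn as nn
from collie.optim import AdaLomo  # import path assumed from the repo layout above

model = nn.Linear(16, 16)  # stand-in for your actual model
# LOMO-family optimizers typically take the model itself rather than
# model.parameters(), since the update is fused into the backward pass.
optimizer = AdaLomo(model, lr=1e-3)  # hypothetical arguments
```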
## Instruction-Tuning

We use Alpaca-GPT4 as our training dataset, which is available at https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/blob/main/data/alpaca_gpt4_data.json.

```bash
cd instruction-tuning
wget https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json
torchrun --nproc_per_node=8 train.py --optim adalomo --model_size 7b
```
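Each record in `alpaca_gpt4_data.json` follows the standard Alpaca schema (`instruction`, `input`, `output`). A quick way to sanity-check the download:

```python
import json

with open("alpaca_gpt4_data.json") as f:
    data = json.load(f)

print(len(data))               # number of training examples
print(data[0]["instruction"])  # task description
print(data[0]["output"][:100]) # GPT-4-generated response (truncated)
```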
## Evaluation

The evaluation is based on opencompass. Below are the steps for quick installation.

```bash
conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass
git clone https://github.com/KaiLv69/opencompass opencompass
cd opencompass
pip install -e .
```
Below are the steps for evaluation.

```bash
python run.py configs/eval_collie.py -r
```

The `-r` flag resumes the previous evaluation process. You may refer to `opencompass/configs/eval_collie.py` for more details.
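If you need to adapt the config, opencompass evaluation configs generally follow the pattern sketched below. The dataset import, model name, and paths here are illustrative placeholders, not the actual contents of `eval_collie.py`:

```python
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM

with read_base():
    # Illustrative dataset import; eval_collie.py defines its own list.
    from .datasets.mmlu.mmlu_ppl import mmlu_datasets

datasets = [*mmlu_datasets]

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr="llama-7b-adalomo",              # display name, assumed
        path="/path/to/trained/checkpoint",   # placeholder path
        tokenizer_path="/path/to/trained/checkpoint",
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    )
]
```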
## Further Pre-training

Download the Python subset of StarCoder and set the path in `get_dataset()` in `further-pretraining/train.py`.
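A minimal sketch of what `get_dataset()` could look like, assuming the subset is stored locally as parquet files; the path and loading logic are placeholders rather than the repo's actual implementation:

```python
from datasets import load_dataset

def get_dataset():
    # Placeholder path: point data_files at your local copy of the
    # StarCoder Python subset.
    return load_dataset(
        "parquet",
        data_files="/path/to/starcoderdata/python/*.parquet",
        split="train",
    )
```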
```bash
torchrun --nproc_per_node=8 train.py --optim adalomo --model_size 7b
```