This repo includes instructions for running Resformer and SVformer introduced in the following paper: Value Residual Learning For Alleviating Attention Concentration In Transformers.
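As a rough illustration of the idea (a hedged sketch, not the paper's exact formulation): value residual learning mixes each layer's value vectors with the first layer's values before attending, so deeper layers retain direct access to early-layer token information and attention is less prone to concentrating on a few tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def attention_with_value_residual(q, k, v, v1, lam=0.5):
    """Sketch of a value-residual layer: blend the current layer's values v
    with the first layer's values v1 before attending.
    lam is a hypothetical mixing weight (the paper's scheme may differ,
    e.g. a learned or per-layer coefficient)."""
    v_mix = lam * v1 + (1.0 - lam) * v
    return attention(q, k, v_mix)

rng = np.random.default_rng(0)
seq, dim = 5, 8
q, k, v, v1 = (rng.standard_normal((seq, dim)) for _ in range(4))
out = attention_with_value_residual(q, k, v, v1)  # shape (5, 8)
```

With `lam=0.0` this reduces to plain attention, which makes the residual path easy to ablate.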
Install the dependencies with `pip install transformers==4.44.2`.
- Download the tokenizer and place it in "data/tokenizer/RedPajama-INCITE-Base-7B".
- Follow the instructions in the "README.md" located in "src_data/" to prepare "processed_slimpajama_20B" and place it in "data/".
The code for entropy analysis and token similarity analysis can be found in "analyze/get_entropy.py" and "analyze/get_simlarity.py" respectively.
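The two quantities those scripts study can be sketched as follows (a minimal illustration, not the repo's actual code): attention entropy measures how concentrated each query's attention distribution is, and token similarity here is taken as the mean pairwise cosine similarity between token representations.

```python
import numpy as np

def attention_entropy(attn):
    """attn: (seq, seq) row-stochastic attention matrix -> per-query entropy.
    Low entropy means attention is concentrated on few tokens."""
    p = np.clip(attn, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def mean_token_similarity(x):
    """x: (seq, dim) token representations -> mean pairwise cosine similarity,
    averaged over off-diagonal pairs."""
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    sim = x @ x.T
    n = len(x)
    return (sim.sum() - n) / (n * (n - 1))

# A uniform attention row has maximal entropy log(seq_len);
# a one-hot (fully concentrated) row has entropy near 0.
uniform = np.full((4, 4), 0.25)
onehot = np.eye(4)
```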
Create the output directories with `mkdir logs` and `mkdir output`.
Modify "CACHE" and "CODE_DIR" in the "*.sh" files, then run `bash scripts/run_llama_baseline_82M.sh` and `bash scripts/run_llama_resformer_82M.sh`.
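The edit to the launch scripts can be sketched like this (a hedged illustration: the paths are placeholders, and the exact variable layout inside the scripts is assumed; a stand-in file is used here instead of touching `scripts/run_llama_baseline_82M.sh`):

```shell
# Write a tiny stand-in script so the edit can be demonstrated end to end;
# in practice you would edit scripts/run_llama_baseline_82M.sh directly.
printf 'CACHE=/placeholder\nCODE_DIR=/placeholder\n' > demo_run.sh

# Point CACHE and CODE_DIR at your local cache and repo checkout
# (single quotes keep $HOME/$PWD literal so they expand when the script runs).
sed -i 's|^CACHE=.*|CACHE=$HOME/.cache/resformer|; s|^CODE_DIR=.*|CODE_DIR=$PWD|' demo_run.sh
cat demo_run.sh
```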
Run `analyze/plot_relative_loss.py`.
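For reference, a relative-loss curve compares a variant's loss against the baseline's at each logged step; a minimal sketch of the quantity (an assumed definition, the plotting script may compute it differently):

```python
import numpy as np

def relative_loss(baseline, model):
    """Assumed definition: (model - baseline) / baseline at each logged step.
    Negative values mean the variant sits below the baseline loss curve."""
    baseline = np.asarray(baseline, dtype=float)
    model = np.asarray(model, dtype=float)
    return (model - baseline) / baseline

# Illustrative numbers only, not results from the paper.
baseline_curve = [4.0, 3.5, 3.2]
variant_curve = [3.9, 3.3, 3.0]
rel = relative_loss(baseline_curve, variant_curve)
```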
- modded nanogpt project
- rwkv7