
Value Residual Learning

This repo includes instructions for running Resformer and SVformer introduced in the following paper: Value Residual Learning For Alleviating Attention Concentration In Transformers.
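The core idea is to let later layers attend with a mix of their own value and the first layer's value, which counteracts attention concentration. A minimal NumPy sketch of single-head value-residual attention is below; the mixing weight `lam` and the simple additive form are illustrative assumptions, not the exact parameterization used in the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention, single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def resformer_layers(x, layers, lam=0.5):
    """Stack of attention layers with a value residual.

    layers: list of (Wq, Wk, Wv) weight matrices, one tuple per layer.
    Every layer after the first mixes its own value with the
    first layer's value (illustrative mixing; details may differ).
    """
    v1 = None
    for wq, wk, wv in layers:
        q, k, v = x @ wq, x @ wk, x @ wv
        if v1 is None:
            v1 = v                      # cache the first layer's value
        else:
            v = lam * v + (1 - lam) * v1  # value residual from layer 1
        x = x + attention(q, k, v)        # usual hidden-state residual
    return x
```

SVformer goes further and shares a single value across layers; in the sketch above that would correspond to always attending with `v1`.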

Requirements

pip install transformers==4.44.2

Data

  1. Download the tokenizer and place it in "data/tokenizer/RedPajama-INCITE-Base-7B".
  2. Follow the instructions in the "README.md" in "src_data/" to prepare "processed_slimpajama_20B" and place it in "data/".

Analysis

The code for entropy analysis and token similarity analysis can be found in "analyze/get_entropy.py" and "analyze/get_simlarity.py" respectively.
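The entropy analysis quantifies attention concentration: low row entropy means a token attends almost entirely to a few positions. A hedged sketch of the computation (the actual "analyze/get_entropy.py" may aggregate differently):

```python
import numpy as np

def attention_entropy(attn):
    """Mean Shannon entropy of attention rows.

    attn: array of shape (num_heads, seq_len, seq_len) whose rows
    sum to 1. Lower entropy indicates attention concentrated on
    few tokens; the uniform distribution gives log(seq_len).
    """
    p = np.clip(attn, 1e-12, 1.0)                 # avoid log(0)
    row_entropy = -(p * np.log(p)).sum(axis=-1)   # (heads, seq_len)
    return row_entropy.mean()
```

For example, uniform attention over 4 tokens yields entropy log(4) ≈ 1.386, while one-hot (fully concentrated) rows yield entropy near 0.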

Train

mkdir logs output

Modify "CACHE" and "CODE_DIR" in the "*.sh" files, then run "bash scripts/run_llama_baseline_82M.sh" and "bash scripts/run_llama_resformer_82M.sh".

Relative Loss Analysis

Run analyze/plot_relative_loss.py.

Notable attempts and variants:

  1. The modded-nanogpt project

  2. RWKV-7