This repo includes instructions for running Resformer and SVformer introduced in the following paper: Value Residual Learning For Alleviating Attention Concentration In Transformers.
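As a rough illustration of the idea (a hedged sketch, not the paper's exact formulation): value residual learning mixes each layer's value vectors with the first layer's values before attending, so deeper layers retain direct access to early-layer token information and attention is less prone to concentrating on a few tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain single-head scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def attention_with_value_residual(q, k, v, v1, lam=0.5):
    """Sketch of a value-residual layer: blend the current layer's values v
    with the first layer's values v1 before attending.
    lam is a hypothetical mixing weight (the paper's scheme may differ,
    e.g. a learned or per-layer coefficient)."""
    v_mix = lam * v1 + (1.0 - lam) * v
    return attention(q, k, v_mix)

rng = np.random.default_rng(0)
seq, dim = 5, 8
q, k, v, v1 = (rng.standard_normal((seq, dim)) for _ in range(4))
out = attention_with_value_residual(q, k, v, v1)  # shape (5, 8)
```

With `lam=0.0` this reduces to plain attention, which makes the residual path easy to ablate.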
Install the dependencies with `pip install transformers==4.44.2`.
- Download the tokenizer and place it in "data/tokenizer/RedPajama-INCITE-Base-7B".
- Follow the instructions in the "README.md" located in "src_data/" to prepare "processed_slimpajama_20B" and place it in "data/".
The code for entropy analysis and token similarity analysis can be found in "analyze/get_entropy.py" and "analyze/get_simlarity.py" respectively.
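The two quantities those scripts study can be sketched as follows (a minimal illustration, not the repo's actual code): attention entropy measures how concentrated each query's attention distribution is, and token similarity here is taken as the mean pairwise cosine similarity between token representations.

```python
import numpy as np

def attention_entropy(attn):
    """attn: (seq, seq) row-stochastic attention matrix -> per-query entropy.
    Low entropy means attention is concentrated on few tokens."""
    p = np.clip(attn, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def mean_token_similarity(x):
    """x: (seq, dim) token representations -> mean pairwise cosine similarity,
    averaged over off-diagonal pairs."""
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    sim = x @ x.T
    n = len(x)
    return (sim.sum() - n) / (n * (n - 1))

# A uniform attention row has maximal entropy log(seq_len);
# a one-hot (fully concentrated) row has entropy near 0.
uniform = np.full((4, 4), 0.25)
onehot = np.eye(4)
```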
Create the output directories with `mkdir logs` and `mkdir output`.
Modify "CACHE" and "CODE_DIR" in the "*.sh" files, then run `bash scripts/run_llama_baseline_82M.sh` and `bash scripts/run_llama_resformer_82M.sh`.
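The edit to the launch scripts can be sketched like this (a hedged illustration: the paths are placeholders, and the exact variable layout inside the scripts is assumed; a stand-in file is used here instead of touching `scripts/run_llama_baseline_82M.sh`):

```shell
# Write a tiny stand-in script so the edit can be demonstrated end to end;
# in practice you would edit scripts/run_llama_baseline_82M.sh directly.
printf 'CACHE=/placeholder\nCODE_DIR=/placeholder\n' > demo_run.sh

# Point CACHE and CODE_DIR at your local cache and repo checkout
# (single quotes keep $HOME/$PWD literal so they expand when the script runs).
sed -i 's|^CACHE=.*|CACHE=$HOME/.cache/resformer|; s|^CODE_DIR=.*|CODE_DIR=$PWD|' demo_run.sh
cat demo_run.sh
```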
Run `analyze/plot_relative_loss.py`.
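For reference, a relative-loss curve compares a variant's loss against the baseline's at each logged step; a minimal sketch of the quantity (an assumed definition, the plotting script may compute it differently):

```python
import numpy as np

def relative_loss(baseline, model):
    """Assumed definition: (model - baseline) / baseline at each logged step.
    Negative values mean the variant sits below the baseline loss curve."""
    baseline = np.asarray(baseline, dtype=float)
    model = np.asarray(model, dtype=float)
    return (model - baseline) / baseline

# Illustrative numbers only, not results from the paper.
baseline_curve = [4.0, 3.5, 3.2]
variant_curve = [3.9, 3.3, 3.0]
rel = relative_loss(baseline_curve, variant_curve)
```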
- modded nanogpt project
- rwkv7