Zcchill/Value-Residual-Learning


Value Residual Learning

This repo includes instructions for running ResFormer and SVFormer, introduced in the paper "Value Residual Learning For Alleviating Attention Concentration In Transformers".
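For orientation, here is a minimal, dependency-free sketch of the value residual idea as we understand it from the paper: layers after the first blend their own values with the first layer's values before attention (ResFormer), and in the limiting case every layer reuses the first layer's values (SVFormer). It is not the code in this repo. The helper names, the 0.5/0.5 mixing weights, and the toy inputs are assumptions made for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    """Single-head causal attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Token i may only attend to positions 0..i.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out

def mix_values(v_current, v_first, lam_current=0.5, lam_first=0.5):
    """Blend this layer's values with the first layer's values.

    The default weights are illustrative assumptions, not the paper's settings.
    """
    return [[lam_current * a + lam_first * b for a, b in zip(rc, rf)]
            for rc, rf in zip(v_current, v_first)]

# Toy inputs (hypothetical): three tokens, two dimensions.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v_first = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # values from layer 1
v_layer2 = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # values from a later layer

# ResFormer-style: attend over the blended values.
resformer_out = causal_attention(x, x, mix_values(v_layer2, v_first))

# SVFormer-style: the later layer reuses the first layer's values directly.
svformer_out = causal_attention(x, x, v_first)
```

Setting `lam_current=0.0` and `lam_first=1.0` in `mix_values` reduces the ResFormer-style blend to the SVFormer-style case in this sketch.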

Requirement

pip install transformers==4.44.2

Data

  1. Download the tokenizer and place it in "data/tokenizer/RedPajama-INCITE-Base-7B".
  2. Follow the instructions in "src_data/README.md" to prepare "processed_slimpajama_20B", then place it in "data/".
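Before training, it can help to confirm that the layout described above is in place. The following is a small sketch, not part of this repo: `missing_data_dirs` is a hypothetical helper, and it assumes that "processed_slimpajama_20B" ends up as a directory directly under "data/".

```python
from pathlib import Path

# Paths taken from the steps above; treating the dataset as a directory
# under data/ is an assumption.
EXPECTED_DIRS = [
    "data/tokenizer/RedPajama-INCITE-Base-7B",
    "data/processed_slimpajama_20B",
]

def missing_data_dirs(root="."):
    """Return the expected data directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

Run it from the repository root so the relative paths resolve as intended.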

Analysis

The code for entropy analysis and token similarity analysis can be found in "analyze/get_entropy.py" and "analyze/get_simlarity.py" respectively.
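The exact definitions used by those scripts are not reproduced here. As a rough illustration of the two quantities, the sketch below computes the Shannon entropy of a single attention distribution and the mean pairwise cosine similarity of a set of token vectors. The function names and inputs are hypothetical, and the repo's scripts may aggregate differently (for example, across heads or layers).

```python
import math

def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution.

    Low entropy means attention is concentrated on few tokens.
    """
    return -sum(p * math.log(p) for p in weights if p > 0)

def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def mean_pairwise_similarity(hidden):
    """Average cosine similarity over all distinct pairs of token vectors."""
    pairs = [(i, j) for i in range(len(hidden))
             for j in range(i + 1, len(hidden))]
    return sum(cosine_similarity(hidden[i], hidden[j]) for i, j in pairs) / len(pairs)
```

A uniform distribution over n tokens gives the maximum entropy, log(n), while a one-hot distribution gives zero.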

Train

mkdir logs output

Set "CACHE" and "CODE_DIR" in each "*.sh" script, then run bash scripts/run_llama_baseline_82M.sh and bash scripts/run_llama_resformer_82M.sh.

Relative Loss Analysis

Run analyze/plot_relative_loss.py.

Notable attempts and variants:

  1. modded nanogpt project

  2. rwkv7
