
Value Residual Learning

This repo includes instructions for running Resformer and SVformer introduced in the following paper: Value Residual Learning For Alleviating Attention Concentration In Transformers.
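The core idea is to let later layers attend with a mix of their own value and the first layer's value, which counteracts attention concentration. A minimal NumPy sketch of single-head value-residual attention is below; the mixing weight `lam` and the simple additive form are illustrative assumptions, not the exact parameterization used in the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention, single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def resformer_layers(x, layers, lam=0.5):
    """Stack of attention layers with a value residual.

    layers: list of (Wq, Wk, Wv) weight matrices, one tuple per layer.
    Every layer after the first mixes its own value with the
    first layer's value (illustrative mixing; details may differ).
    """
    v1 = None
    for wq, wk, wv in layers:
        q, k, v = x @ wq, x @ wk, x @ wv
        if v1 is None:
            v1 = v                      # cache the first layer's value
        else:
            v = lam * v + (1 - lam) * v1  # value residual from layer 1
        x = x + attention(q, k, v)        # usual hidden-state residual
    return x
```

SVformer goes further and shares a single value across layers; in the sketch above that would correspond to always attending with `v1`.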

Requirements

pip install transformers==4.44.2

Data

  1. Download the tokenizer and place it in "data/tokenizer/RedPajama-INCITE-Base-7B".
  2. Follow the instructions in the "README.md" in "src_data/" to prepare "processed_slimpajama_20B" and place it in "data/".

Analysis

The code for entropy analysis and token similarity analysis can be found in "analyze/get_entropy.py" and "analyze/get_simlarity.py" respectively.
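The entropy analysis quantifies attention concentration: low row entropy means a token attends almost entirely to a few positions. A hedged sketch of the computation (the actual "analyze/get_entropy.py" may aggregate differently):

```python
import numpy as np

def attention_entropy(attn):
    """Mean Shannon entropy of attention rows.

    attn: array of shape (num_heads, seq_len, seq_len) whose rows
    sum to 1. Lower entropy indicates attention concentrated on
    few tokens; the uniform distribution gives log(seq_len).
    """
    p = np.clip(attn, 1e-12, 1.0)                 # avoid log(0)
    row_entropy = -(p * np.log(p)).sum(axis=-1)   # (heads, seq_len)
    return row_entropy.mean()
```

For example, uniform attention over 4 tokens yields entropy log(4) ≈ 1.386, while one-hot (fully concentrated) rows yield entropy near 0.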

Train

mkdir logs output

Modify "CACHE" and "CODE_DIR" in the "*.sh" files, then run "bash scripts/run_llama_baseline_82M.sh" and "bash scripts/run_llama_resformer_82M.sh".

Relative Loss Analysis

Run analyze/plot_relative_loss.py.

Notable attempts and variants:

  1. The modded-nanogpt project

  2. RWKV-7