This codebase explores Reinforcement Learning from Human Feedback (RLHF) with unpaired preferences. Unlike the standard approach, where preferences are expressed by comparing two completions of the same prompt, we fine-tune the model on preferences expressed as thumbs-up or thumbs-down ratings of individual completions.
To do so, we implement a pointwise reward model and an unpaired RLOO trainer.
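To make the difference concrete, here is a minimal, illustrative sketch (not the repository's actual code) of the two reward-model objectives in PyTorch: the standard pairwise Bradley-Terry loss over (chosen, rejected) pairs, and a pointwise binary cross-entropy loss over individual thumbs-up/down labels.

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: push the chosen completion's score above the rejected one's."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

def pointwise_rm_loss(scores: torch.Tensor, thumbs_up: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy: treat each completion's score as the logit of P(thumbs-up)."""
    return F.binary_cross_entropy_with_logits(scores, thumbs_up.float())

# Toy usage: scalar scores for 4 completions, with binary feedback for the pointwise case.
scores = torch.randn(4)
labels = torch.tensor([1, 0, 1, 1])
print(pointwise_rm_loss(scores, labels))
```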
- Install the requirements from `requirements.txt`.
- All scripts can run on a single A100-80GB GPU and each have two variants: full training and QLoRA.
- Supervised fine-tuning (SFT):

```bash
# Full SFT training of Qwen2.5-1.5B
bash jobs_local/train_sft_full.sh

# QLoRA SFT training of Zephyr-7b
bash jobs_local/train_sft_qlora.sh
```
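The SFT scripts presumably wrap a standard supervised fine-tuning loop; below is a minimal sketch using TRL's `SFTTrainer`. The dataset name and output directory are placeholders, and the actual scripts may configure things differently.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; the scripts may use a different one.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B",               # full-training variant; QLoRA would add a PEFT config
    args=SFTConfig(output_dir="sft_output"),  # hypothetical output directory
    train_dataset=dataset,
)
trainer.train()
```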
- Pairwise reward model (RM) training:

```bash
# Full pairwise RM training of Qwen2.5-1.5B
bash jobs_local/train_reward_pairwise_full.sh

# QLoRA pairwise RM training of Zephyr-7b
bash jobs_local/train_reward_pairwise_qlora.sh
```
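A rough sketch of what pairwise RM training might look like with TRL's `RewardTrainer`; the dataset, model id, and hyperparameters are placeholders, and the scripts' internals may differ.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardConfig, RewardTrainer

model_id = "Qwen/Qwen2.5-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# A reward model is a sequence classifier with a single scalar output.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
model.config.pad_token_id = tokenizer.pad_token_id

# Placeholder preference dataset with "chosen"/"rejected" columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(output_dir="rm_pairwise_output"),
    processing_class=tokenizer,
    train_dataset=dataset,
)
trainer.train()
```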
- RLOO training with paired preferences:

```bash
# Full training of standard RLOO of Qwen2.5-1.5B
bash jobs_local/train_rloo_paired_full.sh

# QLoRA training of standard RLOO of Zephyr-7b
bash jobs_local/train_rloo_paired_qlora.sh
```
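At the heart of RLOO (REINFORCE Leave-One-Out) is the baseline: with k completions sampled per prompt, each completion's reward is baselined against the mean reward of the other k - 1 samples. A minimal, illustrative sketch of that computation:

```python
import torch

def rloo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: (num_prompts, k) rewards for k sampled completions per prompt.

    Each completion is baselined by the mean reward of the other k - 1 samples.
    """
    k = rewards.shape[1]
    # Leave-one-out mean: (sum of all k rewards - own reward) / (k - 1)
    baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
    return rewards - baseline

# Toy example: 2 prompts, 4 completions each.
rewards = torch.tensor([[1.0, 0.0, 0.5, 0.5],
                        [0.2, 0.8, 0.4, 0.6]])
print(rloo_advantages(rewards))
```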
- For the unpaired setting, use the same SFT scripts as above.
- Pointwise reward model (RM) training:

```bash
# Full training of pointwise RM of Qwen2.5-1.5B
bash jobs_local/train_reward_pointwise_full.sh

# QLoRA training of pointwise RM of Zephyr-7b
bash jobs_local/train_reward_pointwise_qlora.sh
```
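A sketch of what a pointwise RM training step might look like: a sequence classifier producing one logit per completion, trained with binary cross-entropy against the thumbs-up/down label. The model id, texts, and shapes are illustrative only.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B"  # placeholder; the scripts choose the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

texts = ["Q: 2+2? A: 4", "Q: 2+2? A: 5"]   # prompt + completion, concatenated
labels = torch.tensor([1.0, 0.0])           # thumbs-up / thumbs-down

batch = tokenizer(texts, padding=True, return_tensors="pt")
logits = model(**batch).logits.squeeze(-1)  # one scalar score per completion
loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```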
- RLOO training with unpaired preferences:

```bash
# Full training of unpaired RLOO of Qwen2.5-1.5B
bash jobs_local/train_rloo_unpaired_full.sh

# QLoRA training of unpaired RLOO of Zephyr-7b
bash jobs_local/train_rloo_unpaired_qlora.sh
```
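In the unpaired variant, the rewards fed into the same leave-one-out computation come from the pointwise RM rather than a pairwise one. A toy illustration; whether raw scores or thumbs-up probabilities serve as the reward is an assumption here, not something the scripts confirm.

```python
import torch

# Pointwise RM logits for k = 4 sampled completions of one prompt.
logits = torch.tensor([[2.1, -0.3, 0.7, -1.2]])

# One option: use P(thumbs-up) as the reward; the repo may use raw scores instead.
rewards = torch.sigmoid(logits)

k = rewards.shape[1]
baseline = (rewards.sum(dim=1, keepdim=True) - rewards) / (k - 1)
advantages = rewards - baseline   # identical to the paired-RLOO computation
print(advantages)
```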
- KTO:

```bash
# Full training of KTO of Qwen2.5-1.5B
bash jobs_local/train_kto_full.sh

# QLoRA training of KTO of Zephyr-7b
bash jobs_local/train_kto_qlora.sh
```
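KTO also trains directly on unpaired thumbs-style feedback. A rough sketch with TRL's `KTOTrainer`; the dataset is a placeholder with `prompt`/`completion`/`label` columns, and the scripts may configure things differently.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_id = "Qwen/Qwen2.5-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder unpaired-preference dataset: "prompt", "completion", "label" (bool).
dataset = load_dataset("trl-lib/kto-mix-14k", split="train")

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(output_dir="kto_output"),
    processing_class=tokenizer,
    train_dataset=dataset,
)
trainer.train()
```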