Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Hasan Abed Al Kader Hammoud · Umberto Michieli · Fabio Pizzati · Philip Torr · Adel Bibi · Bernard Ghanem · Mete Ozay

📌 This work was completed during Hasan Abed Al Kader Hammoud's internship at Samsung Research UK.

📖 Abstract

Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model that retains the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating the generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.

✨ Overview

Welcome to the official repository for our Findings of EMNLP 2024 paper, "Model Merging and Safety Alignment: One Bad Model Spoils the Bunch"! 🎉 We study how merging Large Language Models (LLMs) affects safety alignment. Our research shows that existing merging methods can inadvertently propagate misalignment, and that adding synthetic alignment data to data-aware merging techniques yields merged models that are both capable and aligned.

🗂️ Repository Structure

Our repository is organized into the following key components:

- Data Generation (gen_data/): Scripts and tools for generating the synthetic task and alignment data used later for data-aware merging.
- Model Merging (merging/): Implementations of data-aware model merging techniques, including LM Cocktail and Evolutionary Merge.
- Evaluation (evaluation/): Tools and scripts for evaluating merged models with LLaMA Guard (alignment) and LM Harness (task performance).

⚙️ Usage

1. Data Generation (`gen_data/`)

Our data generation pipeline consists of three main steps:

a. Generate Misaligned Questions

```
cd gen_data
bash generate_misaligned_synthetic.sh
```

b. Generate Synthetic MMLU Questions

```
bash generate_mmlu_questions.sh
```

Note: By default, the script generates only 10 samples, which is enough for a quick test. For the experiments in our paper we generated ~2,000 samples; adjust the sample count in the script accordingly.

c. Generate Model Responses

```
bash generate_answers.sh
```

This step also tags the data as either 'alignment' or 'task' for guided merging.
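The tags are what let the merging step treat alignment as a separate skill. As a minimal sketch, assuming a hypothetical JSON Lines file in which each record carries a `type` field set to `"alignment"` or `"task"` (the real field names written by `generate_answers.sh` may differ), the tagged data could be split per skill like this. The `max_per_type` cap and the helper names are illustrative, not part of the repository.

```python
import json
import random
from collections import defaultdict

# Hypothetical record schema: {"prompt": str, "response": str, "type": "alignment" | "task"}.


def load_jsonl(lines):
    """Parse JSON Lines input (one object per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def group_by_type(records, max_per_type=None, seed=0):
    """Bucket records by their 'type' tag; optionally subsample each bucket."""
    groups = defaultdict(list)
    for record in records:
        groups[record["type"]].append(record)
    if max_per_type is None:
        return dict(groups)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return {
        tag: items if len(items) <= max_per_type else rng.sample(items, max_per_type)
        for tag, items in groups.items()
    }


if __name__ == "__main__":
    records = [
        {"prompt": "How do I hurt someone?", "response": "I can't help with that.", "type": "alignment"},
        {"prompt": "Which chambers does the mitral valve separate?", "response": "Left atrium and left ventricle.", "type": "task"},
        {"prompt": "What is the powerhouse of the cell?", "response": "The mitochondrion.", "type": "task"},
    ]
    lines = [json.dumps(r) for r in records]
    groups = group_by_type(load_jsonl(lines), max_per_type=1)
    print({tag: len(items) for tag, items in groups.items()})
```

Capping each bucket keeps one skill from dominating the merging signal; whether the repository's `--max_per_task` flag does exactly this is an assumption.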

2. Model Merging Methods

a. LM Cocktail (`merging/lm_cocktail/`)

Our implementation of LM Cocktail, extended so that alignment data is taken into account during merging:

Setup:
```
cd merging/lm_cocktail
```

Configuration:

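If your model's tokenizer defines no padding token, update `preprocess_data_for_llm` in `utils.py` so that it reuses the EOS token for padding:

```
def preprocess_data_for_llm(example_data, tokenizer, device, batch_size: int = 2, max_input_length: int = 2048):
    batch_input_ids = []
    batch_labels = []
    batch_max_length = max_input_length
    # Some tokenizers ship without a pad token; fall back to EOS so batches can be padded.
    tokenizer.pad_token_id = tokenizer.eos_token_id  # Required for some models
    # ... remainder of the function unchanged
```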
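Reusing EOS as the pad token has one pitfall: if padding is later masked out of the labels by comparing token ids to `pad_token_id`, genuine EOS tokens get masked too. The following is a minimal, self-contained sketch of position-based padding and label masking, assuming token ids are already available as Python lists; it is not the repository's `preprocess_data_for_llm`, and `pad_batch` is a hypothetical helper.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_batch(sequences, pad_token_id, max_length=2048):
    """Truncate to max_length, right-pad to the longest sequence, and mask padding in labels."""
    truncated = [list(seq[:max_length]) for seq in sequences]
    target = max(len(seq) for seq in truncated)
    input_ids, labels, attention_mask = [], [], []
    for seq in truncated:
        n_pad = target - len(seq)
        input_ids.append(seq + [pad_token_id] * n_pad)
        # Mask by position, not by token id: with pad == eos, an id-based mask
        # would also hide the real EOS token from the loss.
        labels.append(seq + [IGNORE_INDEX] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, labels, attention_mask


if __name__ == "__main__":
    eos_id = 2  # hypothetical EOS id, reused as the pad id
    ids, labels, mask = pad_batch([[5, 6, eos_id], [8, eos_id]], pad_token_id=eos_id)
    print(ids)     # [[5, 6, 2], [8, 2, 2]]
    print(labels)  # [[5, 6, 2], [8, 2, -100]]
    print(mask)    # [[1, 1, 1], [1, 1, 0]]
```

Note that in the second sequence the real EOS (id 2) stays in the labels while the padding copy of the same id is masked.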

Run Merging:
```
python lmcocktail_merge_align.py --all_types --temp 1.0 --max_per_task 5
```
Parameters:
- --all_types: Enable all merging types
- --temp: Softmax temperature used to turn per-model losses into merging weights (default: 1.0)
- --max_per_task: Maximum tasks per run (default: 5)
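For intuition about `--temp`: the LM-Cocktail paper weights each candidate model by a softmax over its negative loss on a few examples, divided by a temperature, and then averages the parameters with those weights. The sketch below illustrates that rule in plain Python, assuming the losses were computed on the mixed task and alignment examples, so a model that handles unsafe prompts poorly receives a higher loss and a smaller weight. It is an illustration, not this repository's `lmcocktail_merge_align.py`, and parameters are modeled as hypothetical name-to-list dictionaries instead of tensors.

```python
import math


def cocktail_weights(losses, temperature=1.0):
    """Softmax over negative losses divided by a temperature (lower loss -> larger weight)."""
    scaled = [-loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def merge_params(param_sets, weights):
    """Weighted average of per-model parameters (dict: name -> list of floats)."""
    return {
        name: [
            sum(w * params[name][i] for w, params in zip(weights, param_sets))
            for i in range(len(param_sets[0][name]))
        ]
        for name in param_sets[0]
    }


if __name__ == "__main__":
    # Hypothetical mean losses of two candidate models on task + alignment examples.
    losses = [1.2, 1.8]
    for t in (1.0, 0.1):
        print(t, [round(w, 3) for w in cocktail_weights(losses, temperature=t)])
    merged = merge_params([{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}], cocktail_weights(losses))
    print(merged)
```

A lower temperature sharpens the weights toward the lowest-loss model; a higher one pushes them toward a uniform average.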

b. Evolutionary Merge (`merging/evo_merge/`)

Evolutionary model merging with safety constraints:

```
cd merging/evo_merge
# Run evolutionary merging with different configurations
mergekit-evolve ./examples/genomic_1.yml \
    --storage-path ./merged_model_1 \
    --task-search-path workspace/eval_tasks/ \
    --merge-cuda \
    --max-fevals 100
```

Configuration:

- Use the example configs in examples/ (e.g., genomic_1.yml) as a starting point.
- Modify the parameters in these config files to try different merging strategies.
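The idea behind evolutionary merging is to search over merge parameters and score each candidate merge on a set of evaluation tasks; including an alignment task in that set makes safety part of the fitness. The sketch below uses a simple (1+1) evolution strategy, not the optimizer that `mergekit-evolve` actually uses, and replaces real benchmark runs with toy score functions (`toy_task_score`, `toy_safety_score`) that are purely hypothetical. `--max-fevals` plays a role similar to `iterations` here: it bounds how many candidate merges are evaluated.

```python
import random


def evolve(fitness, dim, iterations=300, sigma=0.1, seed=0):
    """(1+1) evolution strategy: mutate the best weights, keep the child if it is no worse."""
    rng = random.Random(seed)
    best = [rng.random() for _ in range(dim)]
    best_fit = fitness(best)
    for _ in range(iterations):
        # Gaussian mutation, clipped so every merge weight stays in [0, 1].
        child = [min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) for w in best]
        child_fit = fitness(child)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit


def expert_share(weights):
    """Normalised weight of model 0 (the domain expert) in a two-model merge."""
    total = weights[0] + weights[1]
    return weights[0] / total if total > 0 else 0.5


# Toy stand-ins for real evaluations (hypothetical): task accuracy rises with
# the expert's share, while safety degrades as that share grows.
def toy_task_score(weights):
    return expert_share(weights)


def toy_safety_score(weights):
    return 1.0 - expert_share(weights) ** 2


def fitness(weights, alpha=0.5):
    """Scalarised objective: alpha * task + (1 - alpha) * safety."""
    return alpha * toy_task_score(weights) + (1 - alpha) * toy_safety_score(weights)


if __name__ == "__main__":
    best, best_fit = evolve(fitness, dim=2)
    print(f"expert share: {expert_share(best):.2f}, fitness: {best_fit:.3f}")
```

With these toy scores the optimum balances the two objectives (an expert share of 0.5); dropping the safety term would instead drive the search toward the unaligned expert.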

3. Evaluation Tools

a. LLaMA Guard Evaluation

The evaluation uses Hugging Face Accelerate for multi-GPU inference:

```
cd evaluation/llama_guard_eval

# The evaluation script supports multiple models
# Model responses are read from JSON files (e.g., "beaver_tails_aaditya_Llama3_OpenBioLLM_8B.json")
./runner.sh

# Under the hood, it runs:
accelerate launch --multi_gpu infer.py $responses_path
```
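LLaMA Guard answers with `safe` or `unsafe` on its first line, followed by the violated category codes when unsafe. A natural alignment metric is the percentage of responses flagged as unsafe. The sketch below computes it, assuming the raw LLaMA Guard generations have been collected into a list of strings; the actual output format of `infer.py` is not documented here, and `parse_verdict`/`unsafe_rate` are hypothetical helpers.

```python
def parse_verdict(output: str) -> bool:
    """Return True if a LLaMA Guard generation flags the response as unsafe."""
    text = output.strip()
    first_line = text.splitlines()[0].strip().lower() if text else ""
    if first_line not in {"safe", "unsafe"}:
        raise ValueError(f"Unexpected LLaMA Guard output: {output!r}")
    return first_line == "unsafe"


def unsafe_rate(outputs):
    """Percentage of generations classified as unsafe."""
    flags = [parse_verdict(o) for o in outputs]
    return 100.0 * sum(flags) / len(flags)


if __name__ == "__main__":
    outputs = ["safe", "unsafe\nS1", " safe\n", "unsafe\nO3"]
    print(unsafe_rate(outputs))  # 50.0
```

Raising on unexpected text, rather than counting it as safe, avoids silently underestimating misalignment when generation is truncated or malformed.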

b. LM Harness Evaluation

We evaluate models on three main categories:

Biology Tasks:

```
lm_eval --model hf --model_args pretrained=$MODEL_PATH,trust_remote_code=True \
        --tasks 'mmlu_college_medicine,mmlu_professional_medicine,mmlu_anatomy,mmlu_clinical_knowledge,mmlu_medical_genetics,medqa_4options,pubmedqa,medmcqa,mmlu_college_biology' \
        --batch_size 16
```

MMLU STEM:

```
lm_eval --model hf --model_args pretrained=$MODEL_PATH,trust_remote_code=True \
        --tasks 'mmlu' \
        --batch_size 16
```

Three-Model Merging Tasks:

```
lm_eval --model hf --model_args pretrained=$MODEL_PATH,trust_remote_code=True \
        --tasks 'winogrande,arc_challenge' \
        --batch_size 16
```

Note: Replace $MODEL_PATH with your model's path and adjust the batch size based on available resources.
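To summarise a category with a single number, the per-task scores can be averaged from the results JSON that `lm_eval` writes when given `--output_path`. The sketch below assumes the lm-evaluation-harness v0.4 layout, where scores live under `results[task]["acc,none"]`; older or newer versions may name the metric differently. It computes an unweighted mean over tasks, which may differ from how the paper aggregates scores.

```python
import json

# Task list copied from the "Biology Tasks" command above.
BIOLOGY_TASKS = [
    "mmlu_college_medicine", "mmlu_professional_medicine", "mmlu_anatomy",
    "mmlu_clinical_knowledge", "mmlu_medical_genetics", "medqa_4options",
    "pubmedqa", "medmcqa", "mmlu_college_biology",
]


def mean_accuracy(results, tasks, metric="acc,none"):
    """Unweighted mean of one metric over the given tasks in an lm_eval results dict."""
    per_task = results["results"]
    missing = [t for t in tasks if t not in per_task]
    if missing:
        raise KeyError(f"tasks missing from results: {missing}")
    return sum(per_task[t][metric] for t in tasks) / len(tasks)


def load_results(path):
    """Load an lm_eval results JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    toy = {"results": {"pubmedqa": {"acc,none": 0.75}, "medmcqa": {"acc,none": 0.25}}}
    print(mean_accuracy(toy, ["pubmedqa", "medmcqa"]))  # 0.5
```

Failing loudly on a missing task prevents a partially completed evaluation from being reported as a full category average.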

📝 Citation

If you find this work useful in your research, please cite our paper:

```
@inproceedings{hammoud-etal-2024-model,
    title = "Model Merging and Safety Alignment: One Bad Model Spoils the Bunch",
    author = "Hammoud, Hasan Abed Al Kader  and
      Michieli, Umberto  and
      Pizzati, Fabio  and
      Torr, Philip  and
      Bibi, Adel  and
      Ghanem, Bernard  and
      Ozay, Mete",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.762",
    doi = "10.18653/v1/2024.findings-emnlp.762",
    pages = "13033--13046",
}
```

Acknowledgements

We would like to acknowledge the following resources used in our work:
