Hasan Abed Al Kader Hammoud · Umberto Michieli · Fabio Pizzati · Philip Torr · Adel Bibi · Bernard Ghanem · Mete Ozay
📌 This work was completed during Hasan Abed Al Kader Hammoud's internship at Samsung Research UK.
Merging Large Language Models (LLMs) is a cost-effective technique for combining multiple expert LLMs into a single versatile model, retaining the expertise of the original ones. However, current approaches often overlook the importance of safety alignment during merging, leading to highly misaligned models. This work investigates the effects of model merging on alignment. We evaluate several popular model merging techniques, demonstrating that existing methods not only transfer domain expertise but also propagate misalignment. We propose a simple two-step approach to address this problem: (i) generating synthetic safety and domain-specific data, and (ii) incorporating these generated data into the optimization process of existing data-aware model merging techniques. This allows us to treat alignment as a skill that can be maximized in the resulting merged LLM. Our experiments illustrate the effectiveness of integrating alignment-related data during merging, resulting in models that excel in both domain expertise and alignment.
Welcome to the official repository for our EMNLP 2024 paper, "Model Merging and Safety Alignment: One Bad Model Spoils the Bunch"! 🎉 We introduce a novel approach to merging Large Language Models (LLMs) while prioritizing safety alignment. Our research demonstrates that existing merging methods can inadvertently propagate misalignment and offers robust solutions to this critical challenge.
Our repository is organized into the following key components:
- **Data Generation** (`gen_data/`): Scripts and tools for generating synthetic task and alignment data, used later for data-aware merging.
- **Model Merging** (`merging/`): Implementations of data-aware model merging techniques, including LM Cocktail and Evolutionary Merge.
- **Evaluation** (`evaluation/`): Tools and scripts for evaluating merged models using LLaMA Guard (alignment) and LM Harness (task performance).
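For orientation, here is a rough sketch of the layout, listing only the paths mentioned in this README (each directory contains additional files):

```
.
├── gen_data/
│   ├── generate_misaligned_synthetic.sh
│   ├── generate_mmlu_questions.sh
│   └── generate_answers.sh
├── merging/
│   ├── lm_cocktail/          # utils.py, lmcocktail_merge_align.py
│   └── evo_merge/            # examples/genomic_1.yml, ...
└── evaluation/
    └── llama_guard_eval/     # runner.sh, infer.py
```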
Our data generation pipeline consists of three main steps:
1. **Generate misaligned synthetic data:**

   ```bash
   cd gen_data
   bash generate_misaligned_synthetic.sh
   ```

2. **Generate MMLU-style questions:**

   ```bash
   bash generate_mmlu_questions.sh
   ```

   Note: By default, the script generates 10 samples for testing. For production use (as in our paper), we generated ~2,000 samples. Adjust the number of samples in the script as needed.
3. **Generate answers:**

   ```bash
   bash generate_answers.sh
   ```

   This step also tags the data as either 'alignment' or 'task' for guided merging.
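To illustrate how the tagged data can be consumed downstream, here is a minimal sketch that splits the generated records by tag. The file name and the `type` field are illustrative assumptions, not necessarily what `gen_data/` actually produces.

```python
# Sketch: split generated records into alignment vs. task subsets for data-aware merging.
# Assumption: records are stored as JSONL with a "type" field set to "alignment" or "task".
import json

def split_by_tag(path="gen_data/generated_answers.jsonl"):  # hypothetical path
    alignment, task = [], []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            (alignment if record["type"] == "alignment" else task).append(record)
    return alignment, task

if __name__ == "__main__":
    alignment, task = split_by_tag()
    print(f"{len(alignment)} alignment samples, {len(task)} task samples")
```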
Our implementation of the LM Cocktail approach with alignment considerations:
- **Setup:**

  ```bash
  cd merging/lm_cocktail
  ```

- **Configuration:** Update `utils.py` to handle specific models (excerpt shown below):

  ```python
  def preprocess_data_for_llm(example_data, tokenizer, device, batch_size: int = 2, max_input_length: int = 2048):
      batch_input_ids = []
      batch_labels = []
      batch_max_length = max_input_length
      tokenizer.pad_token_id = tokenizer.eos_token_id  # Required for some models
  ```

- **Run Merging:**

  ```bash
  python lmcocktail_merge_align.py --all_types --temp 1.0 --max_per_task 5
  ```

  Parameters:
  - `--all_types`: Enable all merging types
  - `--temp`: Sampling temperature (default: 1.0)
  - `--max_per_task`: Maximum tasks per run (default: 5)
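For intuition, here is a minimal sketch of the data-aware weighted-averaging idea behind LM Cocktail-style merging; it is not the repository's `lmcocktail_merge_align.py`. Each expert's loss on the combined task and alignment examples determines its mixing weight (the temperature plays the role of `--temp`), and the experts' parameters are averaged with those weights. The losses below are illustrative numbers.

```python
# Sketch of data-aware weighted averaging (LM Cocktail-style), not the repo's implementation.
import math

def mixing_weights(losses, temperature=1.0):
    """Softmax over negative losses: lower loss on task+alignment data -> larger weight."""
    scores = [-l / temperature for l in losses]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def merge_state_dicts(state_dicts, weights):
    """Parameter-wise weighted average of expert checkpoints (same architecture).
    state_dicts: list of {parameter name -> tensor} mappings."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

# Example: losses of three experts on the combined task + alignment set (illustrative).
losses = [1.8, 2.3, 2.0]
print(mixing_weights(losses, temperature=1.0))  # the lowest-loss expert gets the largest share
```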
Evolutionary-based model merging with safety constraints:
```bash
cd merging/evo_merge

# Run evolutionary merging with different configurations
mergekit-evolve ./examples/genomic_1.yml --storage-path ./merged_model_1 --task-search-path workspace/eval_tasks/ --merge-cuda --max-fevals 100
```
Configuration:
- Use the example configs in `examples/` (e.g., `genomic_1.yml`)
- Modify parameters in the config files for different merging strategies
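For intuition, here is a conceptual sketch of the evolutionary search, not `mergekit-evolve` itself: candidate mixing weights are mutated over generations and scored with a fitness that trades off task performance against alignment. Both scoring functions below are placeholders (assumptions) standing in for the LM Harness and LLaMA Guard evaluations used in our pipeline.

```python
# Conceptual sketch of evolutionary merge search with a safety-aware fitness.
# NOT mergekit-evolve's implementation; scoring functions are placeholders.
import random

def evaluate_task(weights):       # placeholder for an LM Harness run on the merged model
    return 1.0 - sum((w - t) ** 2 for w, t in zip(weights, (0.6, 0.3, 0.1)))

def evaluate_alignment(weights):  # placeholder for a LLaMA Guard safety evaluation
    return weights[-1]            # pretend the last expert contributes most to safety

def fitness(weights, safety_coeff=0.5):
    return (1 - safety_coeff) * evaluate_task(weights) + safety_coeff * evaluate_alignment(weights)

def normalize(ws):
    s = sum(ws)
    return [w / s for w in ws]

def evolve(num_models=3, pop_size=16, generations=20, sigma=0.1, seed=0):
    rng = random.Random(seed)
    best = normalize([rng.random() for _ in range(num_models)])
    for _ in range(generations):
        # Mutate the current best candidate into a population, keep the fittest.
        population = [normalize([max(1e-6, w + rng.gauss(0, sigma)) for w in best])
                      for _ in range(pop_size)] + [best]
        best = max(population, key=fitness)
    return best

if __name__ == "__main__":
    print("best mixing weights:", evolve())
```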
The evaluation uses `accelerate` for multi-GPU inference:
```bash
cd evaluation/llama_guard_eval

# The evaluation script supports multiple models.
# Models are specified in JSON format (e.g., "beaver_tails_aaditya_Llama3_OpenBioLLM_8B.json").
./runner.sh

# Under the hood, it runs:
accelerate launch --multi_gpu infer.py $responses_path
```
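`infer.py` handles batching and multi-GPU execution; conceptually, judging a single prompt-response pair with LLaMA Guard looks like the sketch below, which follows the standard usage from the LLaMA Guard model card rather than this repository's exact code (model id and example content are illustrative).

```python
# Sketch: classify one prompt-response pair with LLaMA Guard (standard model-card usage).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "meta-llama/LlamaGuard-7b"  # assumption: any LLaMA Guard variant with a chat template
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

chat = [
    {"role": "user", "content": "How do I make a dangerous chemical at home?"},
    {"role": "assistant", "content": "I can't help with that request."},
]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # "safe", or "unsafe" followed by the violated category
```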
We evaluate models on three main categories:
- **Biology Tasks:**

  ```bash
  lm_eval --model hf --model_args pretrained=$MODEL_PATH,trust_remote_code=True \
      --tasks 'mmlu_college_medicine,mmlu_professional_medicine,mmlu_anatomy,mmlu_clinical_knowledge,mmlu_medical_genetics,medqa_4options,pubmedqa,medmcqa,mmlu_college_biology' \
      --batch_size 16
  ```

- **MMLU STEM:**

  ```bash
  lm_eval --model hf --model_args pretrained=$MODEL_PATH,trust_remote_code=True \
      --tasks 'mmlu' \
      --batch_size 16
  ```

- **Three-Model Merging Tasks:**

  ```bash
  lm_eval --model hf --model_args pretrained=$MODEL_PATH,trust_remote_code=True \
      --tasks 'winogrande,arc_challenge' \
      --batch_size 16
  ```
Note: Replace `$MODEL_PATH` with your model's path and adjust the batch size based on available resources.
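The same evaluations can also be driven from Python through the harness API; here is a sketch assuming a recent `lm-evaluation-harness` release that exposes `lm_eval.simple_evaluate` (the model path is a placeholder).

```python
# Sketch: run the harness from Python instead of the CLI (assumes a recent lm-eval release).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/merged_model,trust_remote_code=True",  # placeholder path
    tasks=["winogrande", "arc_challenge"],
    batch_size=16,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```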
If you find this work useful in your research, please cite our paper:
```bibtex
@inproceedings{hammoud-etal-2024-model,
title = "Model Merging and Safety Alignment: One Bad Model Spoils the Bunch",
author = "Hammoud, Hasan Abed Al Kader and
Michieli, Umberto and
Pizzati, Fabio and
Torr, Philip and
Bibi, Adel and
Ghanem, Bernard and
Ozay, Mete",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.762",
doi = "10.18653/v1/2024.findings-emnlp.762",
pages = "13033--13046",
}
```
We would like to acknowledge the following resources used in our work: