DICE: Detecting In-distribution Data Contamination with LLM's Internal State
Data and code for the paper.
git clone https://github.com/THU-KEG/DICE.git
cd DICE
The code for fine-tuning contaminated models is stored in the OOD_test/scripts folder.
python scripts/rewrite.py --dataset_name gsm8k
The paraphrased dataset used in the paper is available in the OOD_test/scripts/data folder.
You can fine-tune a contaminated model as follows.
- Change the base model with --model_name.
- Change the contaminated benchmark with --train_dataset_name and --dataset_name.
- The parameter --epochs 1 represents the 2% contamination setting in the paper. Omitting it represents the 10% setting.
cd OOD_test
CUDA_VISIBLE_DEVICES=0 python scripts/contaminated_finetune.py \
--model_name microsoft/phi-2 \
--generative_batch_size 32 \
--dataset_name gsm8k \
--train_dataset_name gsm8k \
--epochs 1
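To run more than one contamination setting in sequence, the command above can be assembled programmatically. The following is only a sketch: the helper name build_finetune_command is hypothetical and not part of the repository, and it assumes the flags behave as described in the list above. It prints the commands instead of executing them.

```python
import shlex
from typing import List, Optional


def build_finetune_command(
    model_name: str,
    dataset_name: str,
    train_dataset_name: str,
    epochs: Optional[int] = 1,
    generative_batch_size: int = 32,
) -> List[str]:
    """Assemble the argument list for scripts/contaminated_finetune.py.

    Hypothetical helper: it only mirrors the flags shown in the README command.
    Passing epochs=None omits --epochs, which the README describes as the
    10% contamination setting.
    """
    cmd = [
        "python", "scripts/contaminated_finetune.py",
        "--model_name", model_name,
        "--generative_batch_size", str(generative_batch_size),
        "--dataset_name", dataset_name,
        "--train_dataset_name", train_dataset_name,
    ]
    if epochs is not None:
        cmd += ["--epochs", str(epochs)]
    return cmd


if __name__ == "__main__":
    # Print one command per setting; run them from the OOD_test directory.
    for epochs in (1, None):
        cmd = build_finetune_command("microsoft/phi-2", "gsm8k", "gsm8k", epochs)
        print(shlex.join(cmd))
```

Returning the command as a list rather than a single string keeps it safe to hand to subprocess.run without shell quoting issues, should you choose to execute it.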
You can also use the following script to directly reproduce the contaminated model of the main experiment in our paper.
CUDA_VISIBLE_DEVICES=0 bash scripts/contaminated_finetune.sh
Similar to the fine-tuning process above, you can use the following script to test OOD performance.
The parameter settings are the same as above, except that --dataset_name is the OOD dataset to be tested and --train_dataset_name is the contaminated dataset.
cd OOD_test
CUDA_VISIBLE_DEVICES=0 python OOD_generate_inf.py \
--model_name microsoft/phi-2 \
--generative_batch_size 32 \
--dataset_name math \
--train_dataset_name gsm8k \
--epochs 1
The code for locating the contaminated layer is stored in the Locate folder.
CUDA_VISIBLE_DEVICES=0 python DICE_locate.py \
--edited_model=meta-llama/Llama-2-7b-hf \
--hparams_dir=../hparams/DICE/llama-7b
The code for the contamination classifier is stored in the contamination_classifier folder.
Make data (hidden states of the contaminated layer)
You can use the following script to get the data.
The script accepts the following options.
- Change the base model with --model_name.
- Change the benchmark to run detection on with --test_dataset.
- --is_contaminated indicates whether the model is contaminated.
- --model_type indicates whether the uncontaminated model is the vanilla model or the model fine-tuned only on orca.
- --contaminated_type indicates whether the contaminated model was fine-tuned on the original benchmark (open) or on a paraphrased benchmark (Evasive).
cd contamination_classifier
CUDA_VISIBLE_DEVICES=0 python data_maker.py \
--edited_model=meta-llama/Llama-2-7b-hf \
--hparams_dir=../hparams/DICE/llama-7b \
--test_dataset=GSM8K_seen \
--is_contaminated=True \
--model_type=vanilla \
--contaminated_type=open
You can also use the following script to directly reproduce the test data of the main experiment in our paper.
CUDA_VISIBLE_DEVICES=0 bash scripts/make_test_data.sh
Use train_test.py to train and test the DICE classifier.
You can use the following script to directly reproduce the test results of the main experiment in our paper.
CUDA_VISIBLE_DEVICES=0 bash scripts/Test_DICE.sh
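To give a feel for what training a classifier on hidden states involves, here is a minimal sketch. It is not the code in train_test.py: it swaps in a plain logistic-regression probe, and it uses synthetic Gaussian vectors in place of real hidden states, under the assumption that contaminated and uncontaminated states differ by a mean shift. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, bias, features):
    """Probability that a feature vector comes from a contaminated model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_probe(samples, labels, lr=0.05, epochs=20):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_probability(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def synthetic_states(n, dim, mean, rng):
    """Stand-in for hidden states: Gaussian vectors around a given mean."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


def run_demo(seed=0, dim=8):
    """Train on synthetic data and return held-out accuracy."""
    rng = random.Random(seed)
    # Assumption: label 1 = contaminated, label 0 = uncontaminated.
    data = [(x, 1) for x in synthetic_states(150, dim, 1.0, rng)]
    data += [(x, 0) for x in synthetic_states(150, dim, -1.0, rng)]
    rng.shuffle(data)
    train, test = data[:200], data[200:]
    weights, bias = train_probe([x for x, _ in train], [y for _, y in train])
    correct = sum(
        (predict_probability(weights, bias, x) >= 0.5) == bool(y) for x, y in test
    )
    return correct / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

The probe outputs a probability per input, which is the kind of per-sample score a contamination detector can threshold or aggregate over a benchmark.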
The contamination_classifier folder contains the code for the main experiments in the paper. This includes the performance_vs_score subfolder, which holds the code for the experiment on the relationship between contamination probability and model performance, and draw_OOD.py, which draws the detection distribution on the OOD dataset, among others.
Our implementation is based on the repository of the paper "Evading Data Contamination Detection for Language Models is (too) Easy" by Jasper Dekoninck, Mark Niklas Müller, Maximilian Baader, Marc Fischer, and Martin Vechev. The original repository can be found here. Their LICENSE file can be found in the OOD_test folder as well. We have made some modifications to the code to adapt it to our needs.
We wish to express our appreciation to the pioneers in the field of evasive data contamination. Our work was developed to address the attack presented in their work on evasive data contamination.