This repository contains an extensible codebase to measure stereotypical bias on new pretrained models, as well as code to replicate our results. We encourage the community to use this as a springboard for further evaluation of bias in pretrained language models, and to submit attempts to mitigate bias to the leaderboard.
Note: This repository is currently not actively maintained. For updated code and the full test set, see the Bias Bench repository.
- Clone the repository:
git clone https://github.com/moinnadeem/stereoset.git
- Install the requirements:
cd stereoset && pip install -r requirements.txt
To reproduce our results for the bias in each model:
- Run `make` from the `code` folder. This step evaluates the biases on each model.
- Run the scoring script with respect to each model:
python3 evaluation.py --gold-file ../data/dev.json --predictions-dir predictions/
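The two reproduction steps can also be driven from a small Python wrapper. The following is only a sketch, not part of the repository: the helper names `build_eval_command` and `reproduce` are hypothetical, and it assumes both commands are run from the `code` folder (the relative path `../data/dev.json` suggests this, but it is an assumption). The flags are the ones quoted in the steps above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_eval_command(gold_file: str = "../data/dev.json",
                       predictions_dir: str = "predictions/") -> List[str]:
    """Return the argv list for the scoring step (hypothetical helper)."""
    return [
        "python3", "evaluation.py",
        "--gold-file", gold_file,
        "--predictions-dir", predictions_dir,
    ]


def reproduce(code_dir: str = "code", dry_run: bool = True) -> List[List[str]]:
    """List the two steps in order; execute them only when dry_run is False.

    Assumption: both commands are meant to run inside `code_dir`.
    """
    steps = [["make"], build_eval_command()]
    if not dry_run:
        for argv in steps:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(argv, cwd=Path(code_dir), check=True)
    return steps


if __name__ == "__main__":
    # Dry run: print the commands without executing anything.
    for argv in reproduce(dry_run=True):
        print(" ".join(argv))
```

Calling `reproduce(dry_run=False)` from the repository root would run `make` and then the scoring script, stopping if either step exits with a non-zero status.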
We have provided our predictions in the `predictions/` folder, and the output of the evaluation script in `predictions.txt`. We have also included code to replicate our numbers on each table in the `tables/` folder. Please feel free to file an issue if anything is off; we strongly believe in reproducible research and extensible codebases.
To cite StereoSet:
@misc{nadeem2020stereoset,
title={StereoSet: Measuring stereotypical bias in pretrained language models},
author={Moin Nadeem and Anna Bethke and Siva Reddy},
year={2020},
eprint={2004.09456},
archivePrefix={arXiv},
primaryClass={cs.CL}
}