Official implementation of LossVal - Efficient Data Valuation for Neural Networks.
Data valuation is the process of assigning an importance score to each data point in a dataset. These scores can be used to improve the performance of a machine learning model by focusing on the most important data points, or to better explain the model. LossVal is a novel data valuation method based on the idea of optimizing the importance scores as weights that are part of the loss function. LossVal is efficient, scalable, and can be used with any differentiable loss function.
In our experiments, we show that LossVal achieves state-of-the-art performance on a range of data valuation tasks without requiring any additional training runs.
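As an illustration of the first use case, the sketch below drops the lowest-valued training points given a list of importance scores. The function name `drop_least_important`, the `fraction` parameter, and the example scores are hypothetical and not part of this repository. How the scores are obtained (for example with LossVal) is outside the scope of the sketch.

```python
def drop_least_important(scores, fraction):
    """Return the indices of the data points to keep, in ascending order.

    scores   -- one importance score per training point (higher = more important)
    fraction -- share of points to remove, between 0 and 1
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_remove = int(len(scores) * fraction)
    # Rank indices from least to most important; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    removed = set(ranked[:n_remove])
    return [i for i in range(len(scores)) if i not in removed]


# Hypothetical scores for five training points.
example_scores = [0.9, 0.1, 0.5, 0.05, 0.7]
kept = drop_least_important(example_scores, fraction=0.4)
print(kept)  # [0, 2, 4]
```

The model can then be retrained on the kept indices only. Whether removing low-valued points helps depends on the dataset and on the quality of the scores.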
In general, loss functions used with LossVal are of the form:
The model's prediction is denoted by
Weighted cross-entropy loss:
Weighted mean-squared error loss:
Weighted optimal transport distance:
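The weighted losses named above can be sketched in plain Python as follows. This is a minimal illustration, not the code in src/lossval.py. It assumes that each training sample carries one weight that multiplies its per-sample loss, that per-sample terms are summed rather than averaged, and that the weighted target loss is combined with the weighted optimal transport distance by multiplying it with the squared distance. The distance is passed in as a precomputed number because computing it requires an optimal transport solver. All function names are hypothetical.

```python
import math


def weighted_mse(y_true, y_pred, weights):
    """Sum of squared errors, each scaled by its sample weight."""
    return sum(w * (y - p) ** 2 for y, p, w in zip(y_true, y_pred, weights))


def weighted_cross_entropy(y_true, y_pred, weights, eps=1e-12):
    """Sum of per-sample cross-entropies, each scaled by its sample weight.

    y_true -- one-hot target vectors, one per sample
    y_pred -- predicted class probabilities, one vector per sample
    """
    total = 0.0
    for target, probs, w in zip(y_true, y_pred, weights):
        # eps guards against log(0) when a predicted probability is zero.
        sample_loss = -sum(t * math.log(p + eps) for t, p in zip(target, probs))
        total += w * sample_loss
    return total


def combined_loss(weighted_target_loss, weighted_distance):
    """Assumed combination: weighted target loss times squared distance."""
    return weighted_target_loss * weighted_distance ** 2


# Two regression samples; the second has a larger error but a smaller weight.
mse = weighted_mse([1.0, 2.0], [1.5, 4.0], [1.0, 0.25])
print(mse)  # 1.25
```

In an actual training setup the weights would be learnable parameters updated by gradient descent together with the model, which requires an automatic differentiation framework rather than plain Python.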
You can find a basic reference implementation in src/lossval.py. Feel free to use this implementation as a starting point for your own experiments and modify it to fit your needs.
All the data from the experiments can be found in the results folder.
If you use LossVal in your research, please cite our paper:
@misc{wibiral2024lossvalefficientdatavaluation,
title={{L}oss{V}al: {E}fficient Data Valuation for Neural Networks},
author={Tim Wibiral and Mohamed Karim Belaid and Maximilian Rabus and Ansgar Scherp},
year={2024},
eprint={2412.04158},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.04158},
}