Added more candidate datasets and benchmarks from a ticket that was under the user guide (#2).

Signed-off-by: Dean Wampler <[email protected]>
deanwampler committed Jan 4, 2025
1 parent 7b4f71f commit 490dd29
Showing 1 changed file with 23 additions and 2 deletions.
25 changes: 23 additions & 2 deletions docs/evaluators/evaluators.markdown
@@ -17,7 +17,28 @@ For now, see the [`unitxt` catalog](https://www.unitxt.ai/en/latest/catalog/cata

A list of possible candidates to incorporate in our catalog.

* NeurIPS 2024 [Datasets Benchmarks](https://neurips.cc/virtual/2024/events/datasets-benchmarks-2024)

_More Coming Soon_

### NeurIPS 2024 Datasets Benchmarks

The NeurIPS 2024 [Datasets Benchmarks](https://neurips.cc/virtual/2024/events/datasets-benchmarks-2024) page lists recently created datasets of interest for evaluation.

### `do-not-answer`

Developed by the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), [do-not-answer](https://github.com/Libr-AI/do-not-answer) is an open-source dataset for evaluating LLMs' safety mechanisms at low cost. The dataset is curated and filtered to contain only prompts that responsible language models should not answer. Besides human annotations, do-not-answer also implements model-based evaluation, in which a fine-tuned 600M BERT-like evaluator achieves results comparable to those of human and GPT-4 evaluation.

### Human-Centric Face Representations

A collaboration between Sony AI and the University of Tokyo, [Human-Centric Face Representations](https://ai.sony/publications/A-View-From-Somewhere-Human-Centric-Face-Representations/) is a project to generate a dataset of 638,180 human judgments on face similarity. Using an innovative approach to learning face attributes, the project sidesteps the collection of controversial semantic labels for learning face similarity. The dataset and modeling approach also enable a comprehensive examination of annotator bias and its influence on AI model creation.

Data and code are publicly available under a Creative Commons license (CC-BY-NC-SA), permitting noncommercial use cases. See the [GitHub repo](https://github.com/SonyAI/a_view_from_somewhere).

### Social Stigma Q&A

TODO - description

[arXiv:2312.07492](http://arxiv.org/abs/2312.07492)

### Kepler

[Kepler](https://github.com/sustainable-computing-io/kepler) ([paper](https://dl.acm.org/doi/10.1145/3604930.3605715)) measures resource utilization for sustainable computing purposes.
