LLMs-as-Judges is a rapidly developing research area, and NeurIPS 2024 featured a wealth of significant papers on the topic. This repository organizes key papers from NeurIPS 2024 across multiple directions, including the latest advances in optimizing for human preferences, model self-improvement, and the application of LLM judges to complex problem solving. Our aim is to provide researchers and developers with the latest theories and methods and to drive further development of the LLMs-as-Judges paradigm.
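For readers new to the paradigm, the sketch below illustrates the basic LLMs-as-Judges pattern that many of the papers listed here build on: a judge model compares two candidate answers and returns a verdict, with a position swap to guard against position bias. This is a minimal illustration under our own assumptions, not the method of any specific paper below; `call_llm` is a hypothetical stand-in for whatever chat-completion client you use.

```python
# Minimal LLMs-as-Judges sketch: pairwise comparison with a position swap
# to mitigate position bias. Related judge biases (e.g. self-preference)
# are studied in papers in this list, such as "LLM Evaluators Recognize
# and Favor Their Own Generations".

JUDGE_PROMPT = """You are an impartial judge. Given a question and two
answers, reply with exactly "A" or "B" for the better answer.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Verdict:"""


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call.

    Replace with your provider's client; here it returns a canned
    verdict so the sketch runs offline.
    """
    return "A"


def judge_pair(question: str, answer_1: str, answer_2: str) -> str:
    """Judge twice with swapped answer positions; return '1', '2', or 'tie'."""
    first = call_llm(JUDGE_PROMPT.format(
        question=question, answer_a=answer_1, answer_b=answer_2)).strip()
    second = call_llm(JUDGE_PROMPT.format(
        question=question, answer_a=answer_2, answer_b=answer_1)).strip()
    # Only a verdict that is consistent across both orderings counts as
    # a real preference; anything else is treated as a tie.
    if first == "A" and second == "B":
        return "1"
    if first == "B" and second == "A":
        return "2"
    return "tie"


if __name__ == "__main__":
    print(judge_pair("What is 2 + 2?", "4", "5"))  # 'tie' with the offline stub
```

The position swap is a common design choice: if the judge's verdict flips when the answers trade places, the "preference" was an artifact of ordering rather than quality.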
- LLM Evaluators Recognize and Favor Their Own Generations. NeurIPS 2024. [Paper]
- Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences. NeurIPS 2024. [Paper]
- Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare. NeurIPS 2024. [Paper]
- Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale. NeurIPS 2024. [Paper]
- Detecting Bugs with Substantial Monetary Consequences by LLM and Rule-based Reasoning. NeurIPS 2024. [Paper]
- A Critical Evaluation of AI Feedback for Aligning Large Language Models. NeurIPS 2024. [Paper]
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing. NeurIPS 2024. [Paper]
- Verified Code Transpilation with LLMs. NeurIPS 2024. [Paper]
- JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models. NeurIPS 2024. [Paper]
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search. NeurIPS 2024. [Paper]
- Self-Discover: Large Language Models Self-Compose Reasoning Structures. NeurIPS 2024. [Paper]
- Self-Retrieval: End-to-End Information Retrieval with One Large Language Model. NeurIPS 2024. [Paper]
- LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-tailed Problems. NeurIPS 2024. [Paper]
- Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning. NeurIPS 2024. [Paper]
- RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold. NeurIPS 2024. [Paper]
- INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness. NeurIPS 2024. [Paper]
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph. NeurIPS 2024. [Paper]
- CriticEval: Evaluating Large Language Model as Critic. NeurIPS 2024. [Paper]
- AlphaMath Almost Zero: Process Supervision without Process. NeurIPS 2024. [Paper]
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision. NeurIPS 2024. [Paper]
- On scalable oversight with weak LLMs judging strong LLMs. NeurIPS 2024. [Paper]
- ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation. NeurIPS 2024. [Paper]
- StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving. NeurIPS 2024. [Paper]
- Reflective Multi-Agent Collaboration based on Large Language Models. NeurIPS 2024. [Paper]
- A Theoretical Understanding of Self-Correction through In-context Alignment. NeurIPS 2024. [Paper]
- Training LLMs to Better Self-Debug and Explain Code. NeurIPS 2024. [Paper]
- SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data. NeurIPS 2024. [Paper]
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve. NeurIPS 2024. [Paper]