
LLMs-as-Judges is a rapidly developing research area, and a wealth of significant papers on it appeared at NeurIPS 2024. This repository organizes key NeurIPS 2024 papers across multiple directions, including the latest advances in optimizing human preferences, model self-improvement, and the application of LLM judges to complex problem solving. Our aim is to provide researchers and developers with the latest theories and methods, and to drive the deeper development of the LLMs-as-Judges field.

Oral Paper (1)

  • LLM Evaluators Recognize and Favor Their Own Generations

    NeurIPS 2024. [Paper]

Spotlight Papers (2)

  • Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences

    NeurIPS 2024. [Paper]

  • Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

    NeurIPS 2024. [Paper]

Poster Papers (25)

  • Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale

    NeurIPS 2024. [Paper]

  • Detecting Bugs with Substantial Monetary Consequences by LLM and Rule-based Reasoning

    NeurIPS 2024. [Paper]

  • A Critical Evaluation of AI Feedback for Aligning Large Language Models

    NeurIPS 2024. [Paper]

  • Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

    NeurIPS 2024. [Paper]

  • Verified Code Transpilation with LLMs

    NeurIPS 2024. [Paper]

  • JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

    NeurIPS 2024. [Paper]

  • ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

    NeurIPS 2024. [Paper]

  • Self-Discover: Large Language Models Self-Compose Reasoning Structures

    NeurIPS 2024. [Paper]

  • Self-Retrieval: End-to-End Information Retrieval with One Large Language Model

    NeurIPS 2024. [Paper]

  • LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-tailed Problems

    NeurIPS 2024. [Paper]

  • Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning

    NeurIPS 2024. [Paper]

  • RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

    NeurIPS 2024. [Paper]

  • INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness

    NeurIPS 2024. [Paper]

  • DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

    NeurIPS 2024. [Paper]

  • CriticEval: Evaluating Large Language Model as Critic

    NeurIPS 2024. [Paper]

  • AlphaMath Almost Zero: Process Supervision without Process

    NeurIPS 2024. [Paper]

  • Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

    NeurIPS 2024. [Paper]

  • On scalable oversight with weak LLMs judging strong LLMs

    NeurIPS 2024. [Paper]

  • ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

    NeurIPS 2024. [Paper]

  • StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

    NeurIPS 2024. [Paper]

  • Reflective Multi-Agent Collaboration based on Large Language Models

    NeurIPS 2024. [Paper]

  • A Theoretical Understanding of Self-Correction through In-context Alignment

    NeurIPS 2024. [Paper]

  • Training LLMs to Better Self-Debug and Explain Code

    NeurIPS 2024. [Paper]

  • SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

    NeurIPS 2024. [Paper]

  • Recursive Introspection: Teaching Language Model Agents How to Self-Improve

    NeurIPS 2024. [Paper]
