TreeAI-Lab/NumericBench
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models

📖 NumericBench Paper • 🤗 NumericBench Dataset

Introduction

NumericBench is a comprehensive benchmark designed to evaluate the fundamental numerical reasoning capabilities of large language models (LLMs), addressing their limitations in tasks such as arithmetic, number recognition, contextual retrieval, comparison, summarization, and logical reasoning. By incorporating diverse datasets, ranging from synthetic number lists to real-world domains such as stock trends and weather patterns, NumericBench systematically tests LLMs in both structured and noisy contexts. Experiments on models such as GPT-4o and DeepSeek-V3 reveal significant weaknesses, underscoring the need for numerically-aware modeling to improve LLMs' real-world applicability.
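To make the numeracy gap concrete, here is a minimal illustrative sketch (not code from this repository) in the spirit of the benchmark's number-comparison task: a model that matches surface patterns on digit strings, rather than reasoning over their values, can diverge from true numeric comparison. The function names below are hypothetical.

```python
def compare_as_text(a: str, b: str) -> str:
    """Lexicographic comparison: a stand-in for surface-level string matching."""
    return a if a > b else b


def compare_as_number(a: str, b: str) -> str:
    """Ground-truth comparison over the numeric values."""
    return a if float(a) > float(b) else b


# Character-by-character comparison says "9" > "10" because '9' > '1',
# while numeric comparison correctly picks 10.
print(compare_as_text("10", "9"))    # -> 9
print(compare_as_number("10", "9"))  # -> 10
```

Probes of this shape, scaled up across long number lists and noisy real-world sequences, are what distinguish genuine numerical ability from pattern matching.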

Experiment Results

Citation

@misc{li2025exposingnumeracygapsbenchmark,
      title={Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models},
      author={Haoyang Li and Xuejia Chen and Zhanchao Xu and Darian Li and Nicole Hu and Fei Teng and Yiming Li and Luyu Qiu and Chen Jason Zhang and Qing Li and Lei Chen},
      year={2025},
      eprint={2502.11075},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.11075},
}

About

A comprehensive benchmark to evaluate and improve the fundamental numerical reasoning abilities of large language models using diverse synthetic and real-world datasets.
