To assess long-context capabilities more comprehensively, we propose Needle-in-a-Haystack PLUS, which shifts the focus from simple fact retrieval to more challenging single-document and multi-document question answering tasks.
Our test data can be downloaded from NeedleInAHaystack-PLUS.
All data in NeedleInAHaystack-PLUS are standardized to the following formats:
```json
{
    "id": "The unique identifier for each test sample.",
    "context": "The long context (haystack) of the single-document question answering task.",
    "context_length": "The length of the haystack, ranging from 1,000 to 128,000 tokens at equal intervals, for 15 different lengths in total.",
    "depth_percent": "The position of the needle in the haystack.",
    "input": "The question of the single-document question answering task.",
    "dataset": "needle_squad",
    "answers": "A list of all true answers."
}
```
```json
{
    "id": "The unique identifier for each test sample.",
    "context": "The long context (haystack) of the multi-document question answering task.",
    "context_length": "The length of the haystack, ranging from 1,000 to 128,000 tokens at equal intervals, for 15 different lengths in total.",
    "depth_percent1": "The position of the first needle in the haystack.",
    "depth_percent2": "The position of the second needle in the haystack.",
    "input": "The question of the multi-document question answering task.",
    "dataset": "needle_hotpotqa",
    "answers": "A list of all true answers."
}
```
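As a rough illustration, the two formats above can be told apart purely by their field sets (single-document records carry `depth_percent`; multi-document records carry `depth_percent1`/`depth_percent2`). The sketch below is a minimal, hypothetical loader, it assumes the data is stored as JSON lines, and the file name and helper names are our own, not part of the release:

```python
import json

# Field sets taken from the two record formats described above.
SINGLE_DOC_FIELDS = {"id", "context", "context_length",
                     "depth_percent", "input", "dataset", "answers"}
MULTI_DOC_FIELDS = {"id", "context", "context_length",
                    "depth_percent1", "depth_percent2",
                    "input", "dataset", "answers"}

def record_kind(record: dict) -> str:
    """Classify a test record as single- or multi-document QA by its fields."""
    keys = set(record)
    if keys == SINGLE_DOC_FIELDS:
        return "single"
    if keys == MULTI_DOC_FIELDS:
        return "multi"
    raise ValueError(f"unrecognized record fields: {sorted(keys)}")

def load_records(path: str) -> list:
    """Load test records from a JSON-lines file (path is hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

if __name__ == "__main__":
    # A toy record in the single-document format (values are made up).
    example = {
        "id": "squad_0001",
        "context": "…long haystack text…",
        "context_length": 1000,
        "depth_percent": 50,
        "input": "Who wrote the passage?",
        "dataset": "needle_squad",
        "answers": ["Example Answer"],
    }
    print(record_kind(example))  # prints: single
```

This keeps evaluation code agnostic to which task a record belongs to, dispatching on the schema rather than on the `dataset` string.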
API invocation dates:
- OpenAI's GPT-4-128K (run on 2024-01-31)
- Anthropic's Claude 2.1 (run on 2024-02-08)
NeedleInAHaystack-PLUS builds on datasets proposed by previous researchers, including NeedleInAHaystack, SQuAD, and HotpotQA.
```
@misc{zhao2024longagent,
      title={LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration},
      author={Jun Zhao and Can Zu and Hao Xu and Yi Lu and Wei He and Yiwen Ding and Tao Gui and Qi Zhang and Xuanjing Huang},
      year={2024},
      eprint={2402.11550},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```