Ref to: #THUDM/LongBench#67
The LongBench dataset contains a sub-dataset named "qasper", but I found that the "answers" of several examples are confusing. I want to know whether I am misunderstanding the dataset or whether this is a data annotation issue.
{"pred": "No", "answers": ["Yes", "No"], "all_classes": null, "length": 2317, "input": "Does this method help in sentiment classification task improvement?", "_id": "bcfe56efad9715cc714ffd2e523eaa9ad796a453e7da77a6"}
{"pred": "unanswerable", "answers": ["Yes", "Unanswerable"], "all_classes": null, "length": 2284, "actual_length": 3533, "input": "Is jiant compatible with models in any programming language?", "_id": "e5d1d589ddb30f43547012f04b06ac2924a1f4fdcf56daab"}
{"pred": "BERTBase", "answers": ["BERTbase", "BERTbase"], "all_classes": null, "length": 3852, "actual_length": 5701, "input": "What BERT model do they test?", "_id": "2a51c07e65a9214ed2cd3c04303afa205e005f4e1ccb172a"}
@Zcchill Can you elaborate on what is confusing about the answers? If it is that there are multiple answers that sometimes contradict each other, that is because the annotators do not always agree, as is expected in a difficult task requiring expert knowledge. The disagreements are quantified in the paper associated with the dataset. Also, the prescribed evaluation method is to consider a prediction correct if it matches any of the ground-truth answers.
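To make the "matches any of the ground-truth answers" rule concrete, here is a minimal sketch in Python. It is not the official Qasper or LongBench scoring script. The helper names (`normalize`, `token_f1`, `matches_any`, `best_score`) are hypothetical, and the SQuAD-style normalization (lowercasing, dropping punctuation and articles) is an assumption about how "matches" might be defined.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(pred, ref):
    """Token-level F1 between one prediction and one reference answer."""
    pred_tokens = normalize(pred).split()
    ref_tokens = normalize(ref).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty does not.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def matches_any(pred, answers):
    """True if the normalized prediction equals any normalized reference."""
    return any(normalize(pred) == normalize(ans) for ans in answers)


def best_score(pred, answers):
    """Score against every annotator's answer and keep the maximum."""
    return max(token_f1(pred, ans) for ans in answers)


if __name__ == "__main__":
    # The three records quoted in the issue above.
    examples = [
        ("No", ["Yes", "No"]),
        ("unanswerable", ["Yes", "Unanswerable"]),
        ("BERTBase", ["BERTbase", "BERTbase"]),
    ]
    for pred, answers in examples:
        print(pred, answers, matches_any(pred, answers), best_score(pred, answers))
```

Under these assumptions, all three records quoted in the issue count as correct, because each prediction agrees with at least one annotator even where the annotators disagree with each other.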