This repository has been archived by the owner on Jan 26, 2025. It is now read-only.

Unfair to skip questions in test dataset #56

Open
xhuang31 opened this issue Jul 11, 2018 · 5 comments

@xhuang31

In augment_process_dataset.py, some questions in the test dataset are skipped. It is unfair to use the resulting numbers for comparison with the state of the art; nothing in the test data should be changed.

@Impavidity
Member

@xhuang31 Thank you so much for your comments! I rechecked the code and found that we indeed skipped some questions for the whole dataset, regardless of their source. I recalculated and found that we ignore 54 (0.2%) of the questions in the test data. We will change the denominator when calculating the final results.
Thank you again for spotting this!
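The denominator correction described above can be sketched as follows. This is a hedged illustration, not code from the repository: the test-set size of 21,687 assumes the standard SimpleQuestions split, and the correct-answer count is a made-up number for demonstration; only the 54 skipped questions come from this thread.

```python
def accuracy(num_correct, num_evaluated, num_skipped):
    """Count skipped test questions as wrong by including them in the denominator."""
    return num_correct / (num_evaluated + num_skipped)

# Hypothetical example (16000 correct answers is an assumed figure):
NUM_TEST = 21687          # assumed SimpleQuestions test-set size
NUM_SKIPPED = 54          # skipped questions reported in this thread
num_evaluated = NUM_TEST - NUM_SKIPPED

biased = 16000 / num_evaluated              # denominator excludes skipped questions
fair = accuracy(16000, num_evaluated, NUM_SKIPPED)  # denominator is the full test set
```

The fair score is always slightly lower than the biased one, since the skipped questions now count against the system.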

@fwzlaughing

Do you know of any method that could prevent this problem (i.e., keep these dropped questions)? Does it require a larger KB? I am a beginner in this area and have little related knowledge. Looking forward to your reply, thanks!

@Impavidity
Member

@fwzlaughing Hi, I just pushed a quick fix here: 73b5a42.

@fwzlaughing

Why can't some entities' names be found in FB5M.name.txt? Is there a complete dataset that contains all the entity names in FB2M?

@Impavidity
Member

@fwzlaughing FB5M is a subset of Freebase. If you are interested in the full Freebase, you can download the dump from https://developers.google.com/freebase/
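One way to avoid dropping questions whose entities are missing from FB5M.name.txt is to fall back to the raw MID instead of skipping. The sketch below is an assumption, not the repository's code: the "one tab-separated `mid<TAB>name` pair per line" file format and the function names are hypothetical.

```python
def load_names(path):
    """Load an entity-name table, assuming one tab-separated mid<TAB>name pair per line."""
    names = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                names[parts[0]] = parts[1]
    return names

def get_name(names, mid):
    # Fall back to the raw MID when the name is absent,
    # so the question can still be kept in the dataset.
    return names.get(mid, mid)
```

With a fallback like this the missing names degrade gracefully rather than forcing the question to be skipped.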
