Evaluation indicators need to be updated #119
Comments
Thanks for your suggestion, but I still don't understand the details of the "new database_ts". We followed the same link you provided, and the dev dataset with 1034 examples is the one given in the linked GitHub repository. If possible, could I get your WeChat or another way to contact you? Maybe we could discuss it face to face on a weekend.
If you have better eval code, could you contribute a PR? Thanks a lot!
Following what you pointed out, we updated the database used in the evaluation. It changed from the database downloaded from the original official Yale website (Spider dataset, 95 MB) to the database linked from the author's GitHub (1.27 GB), and the metric dropped. After re-running prediction and evaluation with the weights we uploaded to HF, the execution accuracy is 0.742, which is slightly different from yours. Thanks again for the reminder.
@wangzaistone Thanks for your nice work. Could you add links to these two datasets (the 95 MB Spider dataset and the other 1.27 GB one)? I can't find the 1.27 GB version. Also, were 0.825 (95 MB) and 0.764 (1.27 GB) measured on the Spider dev set or the Spider test set?
Tested on dev. The test-suite (ts) eval dataset is here: https://github.com/taoyds/test-suite-sql-eval @AlphaNext
Hi,
Thanks for this good project! However, the evaluation procedure is incorrect, which leads to overestimated results. Specifically, your project runs the test-suite evaluation over the database that is used for the original execution accuracy. According to the official evaluation project, you should use the new database_ts instead of database, so the results will be lower. Here are my evaluation results for CodeLLama-13B-instruct-lora (with the same parameter config as the one you provided): 78.1 on the original database and 70.9 on the correct database_ts.
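To make the gap concrete, here is a minimal sketch of why a prediction can pass on a single database but fail on a test suite of several database instances. This is not the official evaluation code: the `singer` table, the helper names (`run_query`, `matches_on_all`, `make_db`), and the toy rows are all assumptions made for illustration, and the real evaluator handles more cases (for example column order).

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def matches_on_all(dbs, gold_sql, pred_sql):
    """True only if gold and predicted SQL agree on every database given."""
    for conn in dbs:
        gold = run_query(conn, gold_sql)
        pred = run_query(conn, pred_sql)
        if gold is None or pred is None or gold != pred:
            return False
    return True


def make_db(rows):
    """Build a hypothetical in-memory database with one toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
    return conn


# One "original" database and one extra instance, as a test suite would add.
original_db = make_db([("Ann", 30), ("Bob", 45)])
extra_db = make_db([("Cid", 40), ("Dee", 41)])

gold = "SELECT name FROM singer WHERE age > 40"
pred = "SELECT name FROM singer WHERE age >= 40"  # subtly wrong boundary

# Evaluating on the original database alone hides the bug.
single_db_match = matches_on_all([original_db], gold, pred)
# Evaluating on all instances exposes it.
test_suite_match = matches_on_all([original_db, extra_db], gold, pred)

print("single database:", single_db_match)
print("test suite:", test_suite_match)
```

In this toy case the wrong prediction is counted as correct on the single database and as incorrect once the second instance is included, which is the same direction as the drop from 78.1 to 70.9 reported above.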