I was running the evaluation script on my predicted SQL queries on the Spider dataset, and I've noticed that for some examples the evaluation script returns an Exact-Match score of 1 instead of 0.
For example:
Pred: select students.first_name from students where students.permanent_address_id != students.permanent_address_id
Gold: select first_name from students where current_address_id != permanent_address_id
In this example, one can notice that in the gold query the where clause uses the current_address_id column in the left expression, while in the predicted query the column is permanent_address_id. This should lead to an EM score of 0 for the where clause, and thus an overall EM score of 0, while your script returns 1.
Another example:
Pred: select count(*) from flights where flights.destairport = 'terminal'
Gold: select count(*) from flights where sourceairport = "apg"
Here, the problem is the same, but with the columns destairport and sourceairport.
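To state the expectation concretely, here is a toy structural comparison (my own sketch, not the repo's actual matching logic) showing that both pairs of where clauses should already fail an exact match on columns alone:

```python
# Toy structural comparison: the WHERE clauses only match exactly when the
# same operands appear in the same positions with the same operator.

def where_exact_match(pred_cond, gold_cond):
    """True only if the (left_operand, op, right_operand) triples agree."""
    return pred_cond == gold_cond

# Example 1: left columns differ (permanent vs. current address id).
print(where_exact_match(
    ("students.permanent_address_id", "!=", "students.permanent_address_id"),
    ("students.current_address_id", "!=", "students.permanent_address_id"),
))  # False

# Example 2: columns differ (destairport vs. sourceairport).
print(where_exact_match(
    ("flights.destairport", "=", "'terminal'"),
    ("flights.sourceairport", "=", '"apg"'),
))  # False
```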
If you change the variable DISABLE_VALUE = True to False in evaluation.py, you should see that the Exact-Match score becomes 0. I think this is because enabling value comparison evaluates the actual values in the conditions instead of only the syntax.
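A rough sketch of why flipping that flag catches the second example. The (column, op, value) triples below are my own simplification of the internal representation, and I assume the columns have already been collapsed onto one canonical name by the mapping step:

```python
def exact_match(pred_cond, gold_cond, disable_value=True):
    """Compare (column, op, value) conditions; optionally ignore the value."""
    if disable_value:
        return pred_cond[:2] == gold_cond[:2]
    return pred_cond == gold_cond

# Hypothetical state after the columns destairport and sourceairport have
# been mapped onto one canonical column (the suspected bug):
pred = ("flights.airport", "=", "terminal")
gold = ("flights.airport", "=", "apg")

print(exact_match(pred, gold, disable_value=True))   # True  (false positive)
print(exact_match(pred, gold, disable_value=False))  # False (values differ)
```

With values ignored, only the already-collapsed column and operator are compared, so the mismatch in the literals goes unnoticed.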
I looked into the code, and my guess is that it relates to the foreign key mapping that is performed right at the beginning of the evaluation of each sample, lines 621-627 here: https://github.com/taoyds/test-suite-sql-eval/blob/master/evaluation.py#L621
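If I understand the suspected behavior correctly, the effect would look roughly like this. Everything below is my illustration (the schema, the map, and canonicalize are assumptions, not the actual code in evaluation.py):

```python
# Hypothetical illustration: if both address columns are foreign keys into
# the same table, a foreign-key map could send them to one canonical id,
# making two different WHERE clauses compare equal.
foreign_key_map = {
    "students.current_address_id": "__addresses.address_id__",
    "students.permanent_address_id": "__addresses.address_id__",
}

def canonicalize(token: str) -> str:
    """Replace a column with its foreign-key canonical form, if any."""
    return foreign_key_map.get(token, token)

pred_where = ("students.permanent_address_id", "!=", "students.permanent_address_id")
gold_where = ("students.current_address_id", "!=", "students.permanent_address_id")

pred_canonical = tuple(canonicalize(t) for t in pred_where)
gold_canonical = tuple(canonicalize(t) for t in gold_where)

# After canonicalization both clauses collapse to the same tuple, so a purely
# structural exact match returns 1 even though the original columns differ.
print(pred_canonical == gold_canonical)  # True
```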
Would love to hear your thoughts on that. @taoyds
Thanks,
Moshe