
Evaluation returns 1 when it should return 0 #7

Open
Hazoom opened this issue May 24, 2021 · 1 comment

Hazoom commented May 24, 2021

Hi,

I was running the evaluation script on my predicted SQL queries on the Spider dataset, and I've noticed that for some examples, the evaluation script returns an Exact-Match score of 1 instead of 0.

For example:

Pred: select students.first_name from students where students.permanent_address_id != students.permanent_address_id
Gold: select first_name from students where current_address_id != permanent_address_id

In this example, one can notice that in the gold query, the WHERE clause uses the current_address_id column in the left expression, while in the predicted query the column is permanent_address_id. This should lead to an EM score of 0 for the WHERE clause, and thus to an overall EM score of 0, but your script returns 1.

Another example:

Pred: select count(*) from flights where flights.destairport = 'terminal'
Gold: select count(*) from flights where sourceairport = "apg"

Here, the problem is the same, but with the columns destairport and sourceairport.

I looked into the code, and my guess is that it relates to the foreign key mapping that is performed right at the beginning of the evaluation of each sample (lines 621-627 here: https://github.com/taoyds/test-suite-sql-eval/blob/master/evaluation.py#L621).
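
For what it's worth, here is a minimal, self-contained sketch of what I suspect is happening. This is not the repo's actual code; it just illustrates how rewriting columns through a foreign-key equivalence map before comparison could make the two WHERE clauses above look identical, assuming both address columns are foreign keys to the same addresses.address_id column in the Spider schema:

# Hypothetical illustration of the suspected foreign-key collapsing;
# the column names are from the Spider schema, the mapping logic is a guess.
foreign_keys = [
    ("students.current_address_id", "addresses.address_id"),
    ("students.permanent_address_id", "addresses.address_id"),
]

# Map every column in a foreign-key chain to a single representative.
key_map = {}
for child, parent in foreign_keys:
    representative = key_map.get(parent, parent)
    key_map[child] = representative
    key_map[parent] = representative

def normalize(column):
    # Rewrite a column through the map, as the evaluator appears to do
    # before comparing gold and predicted clauses.
    return key_map.get(column, column)

gold_where = (normalize("students.current_address_id"), "!=",
              normalize("students.permanent_address_id"))
pred_where = (normalize("students.permanent_address_id"), "!=",
              normalize("students.permanent_address_id"))

# Both sides collapse to ('addresses.address_id', '!=', 'addresses.address_id'),
# so the clauses compare as equal and the exact-match score becomes 1.
print(gold_where == pred_where)  # True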

Would love to hear your thoughts on that. @taoyds

Thanks,
Moshe


ReinierKoops commented Jun 18, 2021

If you change the variable
DISABLE_VALUE = True to False in evaluation.py, you should see that the Exact-Match score is 0.
I think this is because enabling value checking makes the script compare the actual values instead of only the syntax.
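
To make that concrete, here is a rough sketch (again not the repo's actual code) of what flipping DISABLE_VALUE changes for the second example, assuming the two airport columns have already been collapsed to the same representative by the foreign-key map, so the only remaining difference between gold and prediction is the literal value:

def where_key(column, op, value, disable_value):
    # Reduce a WHERE condition to the tuple the evaluator compares.
    if disable_value:
        return (column, op)         # literal values are ignored
    return (column, op, value)      # literal values take part in the match

gold = ("airports.airportcode", "=", "apg")
pred = ("airports.airportcode", "=", "terminal")

# DISABLE_VALUE = True (the repo's default): both reduce to
# ('airports.airportcode', '='), so the WHERE clauses match and EM = 1.
print(where_key(*gold, disable_value=True) == where_key(*pred, disable_value=True))    # True

# DISABLE_VALUE = False: the literals differ, so the clauses mismatch and EM = 0.
print(where_key(*gold, disable_value=False) == where_key(*pred, disable_value=False))  # False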
