Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug with evaluating SQLs containing "value" identifiers #21

Open
BugMaker-Boyan opened this issue Jan 15, 2024 · 1 comment
Open

Bug with evaluating SQLs containing "value" identifiers #21

BugMaker-Boyan opened this issue Jan 15, 2024 · 1 comment

Comments

@BugMaker-Boyan
Copy link

I download spider dataset from Spider Leaderboard.

When I run the command:
python evaluation.py --db spider/test_database --etype exec --gold spider/test_data/dev_gold.sql --pred spider/test_data/dev_gold.sql

Then returns:
easy medium hard extra all
count 470 857 463 357 2147
===================== EXECUTION ACCURACY =====================
execution 0.996 0.977 0.983 0.994 0.985

However, in above command, the gold file and pred file are the same.

I find the problem in evaluation.py, code link (line# 569):
https://github.com/taoyds/test-suite-sql-eval/blob/e97acc546ecbee8fa27fa8dbf025ef61493a876c/evaluation.py#L569C13-L569C48

When predicted sql contains "value" identifiers (e.g. a column called "value"), p_str = p_str.replace("value", "1") replace string "value" in predicted sql incorrectly.

There are such sqls in Spider-test dataset:

SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT avg(value) FROM boxes warehouse_1
SELECT avg(value) FROM boxes warehouse_1
......

When I delete p_str = p_str.replace("value", "1"), the evaluation returns 100% execution score as expected.

@BugMaker-Boyan
Copy link
Author

@taoyds @rizar @ruiqi-zhong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant