Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug with tables aliases #19

Open
klima7 opened this issue Oct 17, 2023 · 2 comments
Open

Bug with tables aliases #19

klima7 opened this issue Oct 17, 2023 · 2 comments

Comments

@klima7
Copy link

klima7 commented Oct 17, 2023

Currently creating mapping between aliases and actual tables names is performed globally for whole SQL, what is not correct for more complicated once.

Lets consider following SQL taken directly from SPIDER (there are more queries like this):

SELECT T1.asset_id ,  T1.asset_details FROM Assets AS T1 JOIN Asset_Parts AS T2 ON T1.asset_id  =  T2.asset_id GROUP BY T1.asset_id HAVING count(*)  =  2 
INTERSECT 
SELECT T1.asset_id ,  T1.asset_details FROM Assets AS T1 JOIN Fault_Log AS T2 ON T1.asset_id  =  T2.asset_id GROUP BY T1.asset_id HAVING count(*)  <  2

First subquery maps T2 to Asset_Parts, but second subquery maps T2 to Fault_Log. Current evaluation script is not able to handle this. It may allow false positives to pass and throw exceptions because script is searching for column in wrong table.

Relevant code fragment:

def scan_alias(toks):
"""Scan the index of 'as' and build the map for all alias"""
as_idxs = [idx for idx, tok in enumerate(toks) if tok == 'as']
alias = {}
for idx in as_idxs:
alias[toks[idx+1]] = toks[idx-1]
return alias

@Locietta
Copy link

Locietta commented Jan 12, 2024

Hi, I just write a table alias expander with sqlglot to eliminate all table aliases in SQL before feeding them into this test suite for EM.

import sqlglot
from sqlglot import exp

def expand_alias(expr: exp.Expression) -> exp.Expression:
    for iue in expr.find_all(exp.Union): # match all INTERSECT, UNION, EXCEPT
        assert isinstance(iue.left, exp.Expression) and isinstance(iue.right, exp.Expression)
        iue.set("left", expand_alias(iue.left))
        iue.set("right", expand_alias(iue.right))

    alias_to_table: dict[str, str] = {}

    for table in expr.find_all(exp.Table):
        if not table.alias:
            continue
        alias = table.alias.lower()
        tablename = table.name.lower()
        alias_to_table[alias] = tablename
        table.set("this", tablename)
        table.set("alias", None)

    for column in expr.find_all(exp.Column):
        if column.table:
            column_tablename = column.table.lower()
            if column_tablename in alias_to_table:
                actual_table = alias_to_table[column_tablename]
                column.set("table", actual_table)

    return expr

sql = """
select T1.aaa from table1 as T1 join table2 as T2 on T1.aaa = T2.aaa
intersect
select T1.aaa from table1 as T1 join table3 as T2 on T1.aaa = T2.aaa
"""
expr = sqlglot.parse_one(sql)
refined_sql = expand_alias(expr).sql()
print(refined_sql)
# OUTPUT:
# SELECT table1.aaa FROM table1 JOIN table2 ON table1.aaa = table2.aaa
# INTERSECT
# SELECT table1.aaa FROM table1 JOIN table3 ON table1.aaa = table3.aaa

NOTES:

  • sqlglot prefer <> over !=, but this repo prefer !=
    you would like p_str_new = expand_alias(sqlglot.parse_one(p_str)).sql().replace(" <> ", " != ")
  • sqlglot prefer WHERE NOT aaa IN bbb, but this repo only supports WHERE aaa NOT IN bbb
    To add the support, check 'not' before call parse_val_unit here:
    while idx < len_:
    idx, val_unit = parse_val_unit(toks, idx, tables_with_alias, schema, default_tables)

It also supports table aliases without AS keywords, so it's helpful for #17 as well.

Maybe someone should refactor this repo with sqlglot for better readability and extensibility...

@klima7
Copy link
Author

klima7 commented Jan 15, 2024

Thanks for the nice piece of code! It seems to be the easiest solution.
I'll keep this issue open to show the presence of this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants