Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find a safe alternative to LogicalPlan::using_columns() #7

Open
edmondop opened this issue Jan 21, 2025 · 1 comment
Open

Find a safe alternative to LogicalPlan::using_columns() #7

edmondop opened this issue Jan 21, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@edmondop
Copy link
Owner

Describe the bug

Describe the bug
Background
Select with wildcard over a USING/NATURAL JOIN should deduplicate join columns.
For example:

with t as(select 1 as a, 2 as b) select * from t t1 join t t2 using(a)
This query above should output the column 'a' only once.

LogicalPlan::using_columns() is used to find these join columns and to help exclude duplicated columns when expanding wildcards.

Problem
using_columns() works by traversing the plan tree. This manner might be unsafe as it could incorrectly find columns that are not relevant to the current SQL context. This may lead to some output columns being incorrectly excluded.

For example, the result of the query below is different from other databases.

create table t(a int);
insert into t values(1),(2),(3);
select * from (select t.a+2 as a from t join t t2 using(a)) as t2;
To Reproduce
Run query in CLI (compiled from the latest main: apache@722307f)

create table t(a int);
insert into t values(1),(2),(3);
select * from (select t.a+2 as a from t join t t2 using(a)) as t2;
0 row(s) fetched.
Elapsed 0.008 seconds.

+-------+
| count |
+-------+
| 3 |
+-------+
1 row(s) fetched.
Elapsed 0.015 seconds.

++
++
++
3 row(s) fetched.
Elapsed 0.013 seconds.
It outputs no columns.

Expected behavior
In PostgreSQL it does output one column.

psql (16.6 (Ubuntu 16.6-0ubuntu0.24.04.1))
Type "help" for help.

psql=> create table t(a int);
insert into t values(1),(2),(3);
select * from (select t.a+2 as a from t join t t2 using(a)) as t2;
CREATE TABLE
INSERT 0 3
a

3
4
5
(3 rows)
Additional context
No response

To Reproduce

No response

Expected behavior

No response

Additional context

No response

@edmondop edmondop added the bug Something isn't working label Jan 21, 2025
Copy link

dosubot bot commented Jan 21, 2025

To address the issue with LogicalPlan::using_columns() incorrectly identifying irrelevant columns, you might consider the following approaches:

  1. Refine Column Identification Logic: Modify the logic within using_columns() to ensure it only considers columns relevant to the current SQL context. This could involve more precise checks on the context in which columns are used, possibly by maintaining additional metadata about the query structure.

  2. Context-Aware Traversal: Implement a traversal method that is aware of the context of each node in the logical plan. This could help in distinguishing between columns that are genuinely part of the join condition and those that are not.

  3. Comparison with Other Databases: Analyze how databases like PostgreSQL handle similar queries. This might provide insights into alternative methods for deduplicating columns in USING or NATURAL JOIN scenarios.

  4. Testing and Validation: Develop comprehensive test cases that cover various join scenarios, including edge cases. This will help ensure that any changes made to the using_columns() function produce the expected results across different query types.

  5. Alternative Data Structures: Consider using alternative data structures that might better capture the relationships between columns in joins, potentially reducing the risk of incorrectly excluding relevant columns.

These approaches aim to make the identification of join columns more robust and aligned with expected SQL behavior, particularly in comparison to other databases like PostgreSQL.

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant