Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce scope to semantic analyzer and update correlation in Apply node #5302

Closed
wants to merge 8 commits into from

Conversation

kokosing
Copy link
Contributor

No description provided.

@ghost ghost added the CLA Signed label May 18, 2016
@kokosing kokosing force-pushed the feature_gko_scopes branch from 5220e36 to 111dd05 Compare May 18, 2016 12:19
@kokosing
Copy link
Contributor Author

@martint To make this PR be easier to review I separated refactorings into separate review (#5303). That way it will be easier to merge it partially.

The most important commit is ab27e14

@kokosing kokosing force-pushed the feature_gko_scopes branch from 111dd05 to 752c4a4 Compare May 19, 2016 10:29
{
// expressions at this point can not have sub queries so deny all access checks
// in the future, we will need a full access controller here to verify access to functions
ExpressionAnalyzer analyzer = create(new Analysis(), session, metadata, sqlParser, new DenyAllAccessControl(), false);
Node root = Iterables.size(expressions) > 0 ? ExpressionUtils.and(expressions) : TRUE_LITERAL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we create a boolean expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When somebody calls Analysis.getScope(node) and such node has not have mapping to scope. Then to find its scope a visitor is used, it finds the nearest parent node of this node which has a scope set and returns it as a node scope. This search is started from the root node.

Here I wanted to create such artificial root for all the expressions. But now, when you have asked I realized that is not needed and what is worse it introduces unnecessary confusion. So I removed that. Instead I create multiple roots, one per each expression.

@kokosing kokosing force-pushed the feature_gko_scopes branch 2 times, most recently from d52ea1f to afda16c Compare May 20, 2016 05:39
@kokosing
Copy link
Contributor Author

@martint Comments applied and rebased.

@kokosing kokosing force-pushed the feature_gko_scopes branch from afda16c to 836af32 Compare May 20, 2016 08:21
private final Optional<Scope> parent;

private RelationType relation;
private Optional<Scope> lateralScope = Optional.empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to call out the lateral scope explicitly? Unless I'm mistaken, when lateral is used, the lateral derived table is behaves almost like a subquery, and the relation on the left of it acts as the "parent" scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion has worked. I used the "parent" scope instead.

@kokosing
Copy link
Contributor Author

@martint All the comments applied and rebased. Does spliting pull requests help you reviewing? If so I can extract query rewites as a separate pull request.

@kokosing kokosing assigned martint and unassigned kokosing May 25, 2016
private final Session session;
private final SqlParser sqlParser;

public Visitor(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put args on the same line

@kokosing
Copy link
Contributor Author

kokosing commented Jun 6, 2016

@martint rebased, comments applied, ping :)

// Input fields == Output fields
analysis.setOutputExpressions(node, descriptorToFields(queryBodyScope));

queryScope = Scope.builder().withParent(withScope).withRelationType(queryBodyScope.getRelationType()).withApproximate(approximate).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put each .withXXX in a separate line for readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@kokosing
Copy link
Contributor Author

kokosing commented Jul 6, 2016

I extracted StatementAnalyzer refactoring to #5585. So it should be easier to review.

@kokosing
Copy link
Contributor Author

kokosing commented Jul 8, 2016

@martint Rebased and applied all the comments. Also, please notice that I extracted some refactoring commits as a separate review.

@kokosing
Copy link
Contributor Author

kokosing commented Jul 9, 2016

Travis errored with same failures as for master branch.

@ghost ghost added the CLA Signed label Jul 12, 2016
@martint
Copy link
Contributor

martint commented Jul 13, 2016

@kokosing, I think there are some comments/questions that still need to be addressed. Can you check?

Also, this doesn't rebase cleanly after I merged the EXISTS and statement rewriter PRs.

@ghost ghost added the CLA Signed label Jul 14, 2016
@kokosing kokosing force-pushed the feature_gko_scopes branch from 320fc53 to d0a5482 Compare July 15, 2016 13:04
kokosing added 8 commits July 15, 2016 15:04
Inner and outer `with` queries had the same query body. So it was
possible that test would pass while having wrong query executed (false
positive test result).
Scope is an object which contains information about entities available
for current SQL clauses. It contains current RelationType as well as
information about outer queries RelationTypes and available named queries.

Semantic analyzer was changed in a way that each visit method gets and
returns Scope objects. The input scope gives information about the context of
given node. Contains information about outer relation types and
accessible RelationType.
Returned Scope contains information about produced entities
(RelationType).

Scopes are stored in Analysis per AST nodes.
Each to find a Scope for a AST node (expression) we find the closest
parent which has Scope set in Analysis.
All the column references of subquery plan which are not resolved (do
not come from subquery relations) are collected. SubqueryPlanner check
if it is able to rewrite such references to symbol produced by its
relation, if it is able then add replace column reference with that
symbol and that this symbol to the correlation list of Apply node. If
SubqueryPlanner is not able to rewrite that column reference, then it is
assumed the outer SubqueryPlanner execution will do that. If there is no
outer SubqueryPlanner then SemanticException should be thrown.
When symbols produced by subquery in ApplyNode are not consumed by any
of the outer nodes then such subquery and hence ApplyNode can be
optimized out. In that case ApplyNode is replaced by its input.
I just removed tests whose the main purpose was to check if logical plan
assertions are working. Instead I left test whose the main purpose is to
check if logical planner return correct plan.

Logical plan assertions are tested anyway, I do think we need special
tests for that.
In order to that some modification to logical plan assertions had to be
applied:
 - Adding CorrelationMatcher
 - refactoring SymbolAliases to ExpressionAliases
 - extending ExpressionVerifier to support more expressions
From:
```
anyTree PlanMatchPattern {
    PlanMatchPattern {
        PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.FilterNode}
        FilterMatcher{predicate=(3 IN ("c"))}
        PlanMatchPattern {
            PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.ApplyNode}
            CorrelationMatcher{correlation=[C, O]}
            PlanMatchPattern {
                PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.ProjectNode}
                PlanMatchPattern {
                    PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.TableScanNode}
                    TableScanMatcher{expectedTableName=orders}
                    SymbolMatcher{alias=O, pattern=orderkey}
                    SymbolMatcher{alias=C, pattern=custkey}
                }
            }
            anyTree PlanMatchPattern {
                PlanMatchPattern {
                    PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.ApplyNode}
                    CorrelationMatcher{correlation=[L]}
                    PlanMatchPattern {
                        PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.TableScanNode}
                        TableScanMatcher{expectedTableName=lineitem}
                        SymbolMatcher{alias=L, pattern=orderkey}
                    }
                    PlanMatchPattern {
                        PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.EnforceSingleRowNode}
                        PlanMatchPattern {
                            PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.ProjectNode}
                            PlanMatchPattern {
                                PlanNodeMatcher{nodeClass=class com.facebook.presto.sql.planner.plan.ValuesNode}
                            }
                        }
                    }
                }
            }
        }
    }
}
```

to:
```
anyTree
    node(FilterNode)
        FilterMatcher{predicate=(3 IN ("c"))}
        node(ApplyNode)
            CorrelationMatcher{correlation=[C, O]}
            node(ProjectNode)
                node(TableScanNode)
                    TableScanMatcher{expectedTableName=orders}
                    SymbolMatcher{alias=O, pattern=orderkey}
                    SymbolMatcher{alias=C, pattern=custkey}
            anyTree
                node(ApplyNode)
                    CorrelationMatcher{correlation=[L]}
                    node(TableScanNode)
                        TableScanMatcher{expectedTableName=lineitem}
                        SymbolMatcher{alias=L, pattern=orderkey}
                    node(EnforceSingleRowNode)
                        node(ProjectNode)
                            node(ValuesNode)
```
@kokosing kokosing force-pushed the feature_gko_scopes branch from d0a5482 to f7b02b0 Compare July 15, 2016 13:12
@ghost ghost added the CLA Signed label Jul 15, 2016
@kokosing
Copy link
Contributor Author

@martint Sorry about missing comments. I had to do something wrong with the git and I lost some changes.

Please notice that I restored the lateralScope in the Scope class. I had no better idea how to handle JOIN UNNEST. Moreover I think it will be needed for LATERAL JOIN. If you have a better idea then I will be glad to change that.

This pull request still need some work. I need to add unit tests for correlated EXISTS subqueries, because queries for correlated EXISTS we had so far are passing as it was possible to prune correlation and in some cases whole subquery was eligible to be pruned.

Anyway, it would be good to merge some of the commits as this pull request is getting bigger and bigger. I have heard that you are going on vacation so I don't if it is possible in near future.

@ghost ghost added the CLA Signed label Jul 15, 2016
@kokosing
Copy link
Contributor Author

I split this pull request into two smaller ones.

Superseded by #5686 and #5687

@kokosing kokosing closed this Jul 18, 2016
@kokosing kokosing deleted the feature_gko_scopes branch July 18, 2016 08:30
@ghost ghost added the CLA Signed label Jul 18, 2016
@martint
Copy link
Contributor

martint commented Aug 9, 2016

Please notice that I restored the lateralScope in the Scope class. I had no better idea how to handle JOIN UNNEST. Moreover I think it will be needed for LATERAL JOIN. If you have a better idea then I will be glad to change that.

Why can't we model it as:

  • A JOIN B: A and B have the outer scope as parents
  • A JOIN LATERAL B: A has outer scope as parent, B has A's scope as parent
  • A, UNNEST(...): A has outer scope as parent, UNNEST has A's scope as parent

Semantically, UNNEST implies "CROSS JOIN LATERAL", so I don't see why we'd treat it any different from common LATERAL.

@kokosing
Copy link
Contributor Author

kokosing commented Aug 9, 2016

According to my understanding, what you have proposed is the semantic of that SQL constructs. It it is the only way it can be modeled to make this construct work correctly. Moreover, in this PR it scopes are implemented that way, but maybe it is not obvious because of the usage of lateralScope.

I will try to remove lateralScope by checking explicitly in StatementAnalzyer#visitJoin the relation type of the right side of the join.

@kokosing
Copy link
Contributor Author

kokosing commented Aug 9, 2016

It worked, please see #5686 and its recent commit: f7764d2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants