-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: allow subqueries in UDFs #87291
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
Comments
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Jan 9, 2023
Previously, statements with subqueries that invoked UDFs with their own subqueries would error. This was due to a limitation in the subquery execution machinery which only supports running subqueries for top-level statements. This commit works around this limitation by planning subqueries within UDFs as lazily evaluated routines, which have their own machinery for evaluation. Subqueries for the top-level statement are still planned as eagerly evaluated subqueries for two reasons: 1. These new routine-based subqueries do not cache their results, if they are uncorrelated, so they are not as performant. 2. Queries with eager subqueries can be distributed while routines cannot be distributed. Future work includes caching the results of these lazily evaluated subqueries to improve performance (addressing (1) above). We should also be able use the same routine-based machinery to evaluate correlated subqueries and eliminate all decorrelation errors. Informs cockroachdb#87291 Epic CRDB-19257 Release note (sql change): User-defined functions with subqueries in the body of the function are now supported.
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Jan 10, 2023
Previously, statements with subqueries that invoked UDFs with their own subqueries would error. This was due to a limitation in the subquery execution machinery which only supports running subqueries for top-level statements. This commit works around this limitation by planning subqueries within UDFs as lazily evaluated routines, which have their own machinery for evaluation. Subqueries for the top-level statement are still planned as eagerly evaluated subqueries for two reasons: 1. These new routine-based subqueries do not cache their results, if they are uncorrelated, so they are not as performant. 2. Queries with eager subqueries can be distributed while routines cannot be distributed. Future work includes caching the results of these lazily evaluated subqueries to improve performance (addressing (1) above). We should also be able use the same routine-based machinery to evaluate correlated subqueries and eliminate all decorrelation errors. Informs cockroachdb#87291 Epic CRDB-19257 Release note (sql change): User-defined functions with subqueries in the body of the function are now supported.
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Jan 10, 2023
Previously, statements with subqueries that invoked UDFs with their own subqueries would error. This was due to a limitation in the subquery execution machinery which only supports running subqueries for top-level statements. This commit works around this limitation by planning subqueries within UDFs as lazily evaluated routines, which have their own machinery for evaluation. Subqueries for the top-level statement are still planned as eagerly evaluated subqueries for two reasons: 1. These new routine-based subqueries do not cache their results, if they are uncorrelated, so they are not as performant. 2. Queries with eager subqueries can be distributed while routines cannot be distributed. Future work includes caching the results of these lazily evaluated subqueries to improve performance (addressing (1) above). We should also be able use the same routine-based machinery to evaluate correlated subqueries and eliminate all decorrelation errors. Informs cockroachdb#87291 Epic CRDB-19257 Release note (sql change): User-defined functions with subqueries in the body of the function are now supported.
This was referenced Jan 19, 2023
craig bot
pushed a commit
that referenced
this issue
Feb 1, 2023
95883: opt: execute correlated Exists as Routines and and allow Exists in UDFs r=mgartner a=mgartner #### opt: add SubqueryPrivate.Ordering check to CheckExpr Expressions with `SubqueryPrivate` fields are not optimized with respect to any ordering, except for `ArrayFlatten` expressions. This commit adds a check to `CheckExpr` to ensure that `SubqueryPrivate.Ordering` is only set for `ArrayFlatten`. Release note: None #### opt: convert uncorrelated Exists to Coalesce+Subquery expressions This commit adds the `ConvertUncorrelatedExistsToCoalesceSubquery` normalization rule which transforms uncorrelated Exists into Coalesce+Subquery expressions. Below is an example of the transformation made. SELECT EXISTS ( SELECT * FROM b ) FROM a => SELECT COALESCE( (SELECT true FROM (SELECT * FROM b) LIMIT 1), false ) FROM a By eliminating the uncorrelated Exists, we now support execution of uncorrelated Exists that are within correlated subqueries or UDFs. These are supported without execbuilder changes because execbuilder already knows how to build correlated Subquery expressions within correlated subqueries or UDFs. This change also makes the `exists` subquery execution mode, which only works for uncorrelated subqueries, obsolete. The related code paths should never be executed. I plan on removing them in a future commit. Release note: None #### opt: build correlated Exists as Coalesce+Routine expressions This commit allows for the execution of correlated Exists subqueries by building them as Coalesce+Routine expressions in execbuilder. Routines do not have a special mode for Exists subqueries, like the legacy, eager-evaluation subquery machinery does, so we must transform the Exists expression. The transformation is modelled after the ConvertUncorrelatedExistsToCoalesceSubquery normalization rule. The transformation is effectively: EXISTS (<input>) => COALESCE((SELECT true FROM (<input>) LIMIT 1), false) We don't implement this as a normalization rule for correlated Exists subqueries because the transformation would prevent decorrelation rules from turning the Exists expression into a join, if it is possible. Marking the rule as LowPriority would not be sufficient because the rule would operate on the Exists scalar expression, while the decorrelation rules operate on relational expressions that contain Exists expresions. The Exists would always be converted to a Coalesce before the decorrelation rules can match. Informs: #87291 Epic: CRDB-19257 Release note (sql change): Some queries with `EXISTS` subqueries which previously resulted in the error "could not decorrelate subquery" now succeed. 96278: sql: rename cpuNanos to cpuSQLNanos r=maryliag a=maryliag Note to reviewers: only second commit is relevant Fixes #96238 This commit clarifies the name on statement and transaction statistics tables to use `cpuSQLNanos` instead of `cpuNanos`. Release note (sql change): Update `cpuNanos` name to `cpuSQLNanos` on `crdb_internal.statement_statistics` and `system.statement_statistics` Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: maryliag <[email protected]>
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Mar 10, 2023
To support execution of ANY expressions within UDFs, the optimizer builds them as regular subqueries with a novel transformation. The transformation simulates the peculiar behavior of ANY expressions. For example, consider the expression: i <comp> ANY (<subquery>) The logic for evaluating this comparison is: 1. If the subquery results in zero rows, then the expression evaluates to false, even if i is NULL. 2. Otherwise, if the comparison between i and any value returned by the subquery is true, then the expression evaluates to true. 3. Otherwise, if any values returned by the subquery are NULL, then the expression evaluates to NULL. 4. Otherwise, if i is NULL, then the expression evaluates to NULL. 5. Otherwise, the expression evaluates to false. We use the following transformation to express this logic: i = ANY (SELECT a FROM t) => SELECT count > 0 AND (bool_or OR (null_count > 0 AND NULL)) FROM ( SELECT count(*) AS count, bool_or(cmp) AS bool_or, count(*) FILTER (is_null) AS null_count FROM ( SELECT a = i AS cmp, a IS NULL AS is_null FROM ( SELECT a FROM t ) ) ) By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. We only perform this transformation when building ANY expressions within a UDF because this transformation may lead to query plans with slow apply-joins. In the future, if we can eliminate the apply-joins we may be able to apply this transformation universally, and remove existing logic for ANY expression evaluation in the execution engine. Fixes cockroachdb#87291 Release note: None
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Mar 13, 2023
To support execution of ANY expressions within UDFs, the optimizer builds them as regular subqueries with a novel transformation. The transformation simulates the peculiar behavior of ANY expressions. For example, consider the expression: i <comp> ANY (<subquery>) The logic for evaluating this comparison is: 1. If the subquery results in zero rows, then the expression evaluates to false, even if i is NULL. 2. Otherwise, if the comparison between i and any value returned by the subquery is true, then the expression evaluates to true. 3. Otherwise, if any values returned by the subquery are NULL, then the expression evaluates to NULL. 4. Otherwise, if i is NULL, then the expression evaluates to NULL. 5. Otherwise, the expression evaluates to false. We use the following transformation to express this logic: i = ANY (SELECT a FROM t) => SELECT count > 0 AND (bool_or OR (bool_or_null AND NULL)) FROM ( SELECT count(*) AS count, bool_or(cmp) AS bool_or, bool_or(is_null) AS bool_or_null FROM ( SELECT a = i AS cmp, a IS NULL AS is_null FROM ( SELECT a FROM t ) ) ) By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. We only perform this transformation when building ANY expressions within a UDF because this transformation may lead to query plans with slow apply-joins. In the future, if we can eliminate the apply-joins we may be able to apply this transformation universally, and remove existing logic for ANY expression evaluation in the execution engine. Fixes cockroachdb#87291 Release note: None
mgartner
added a commit
to mgartner/cockroach
that referenced
this issue
Mar 15, 2023
To support execution of ANY expressions within UDFs, the optimizer builds them as subqueries with GroupBy expressions instead. The transformation simulates the peculiar behavior of ANY expressions. See `CustomFuncs.ConstructGroupByAny` for more details on this transformation. By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. Fixes cockroachdb#87291 Release note: None
craig bot
pushed a commit
that referenced
this issue
Mar 16, 2023
98375: opt: build ANY expressions as regular subqueries within UDFs r=mgartner a=mgartner #### opt: build ANY expressions as regular subqueries within UDFs To support execution of ANY expressions within UDFs, the optimizer builds them as subqueries with GroupBy expressions instead. The transformation simulates the peculiar behavior of ANY expressions. See `CustomFuncs.ConstructGroupByAny` for more details on this transformation. By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. Fixes #87291 Release note: None #### opt: fix GenerateIndexScans comment Release note: None 98758: sqlccl: disable default test tenant for explicit tenant tests r=michae2 a=herkolategan The tests modified in this change start tenants explicitly, and fails when the default test tenant is started as it is not expected. This change disables the starting of the default test tenant. Epic: None Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Herko Lategan <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries
SQL Queries Team
We should lift the restriction that disallows subqueries within user-defined function bodies.
Epic: CRDB-20370
Jira issue: CRDB-19257
The text was updated successfully, but these errors were encountered: