sql: allow subqueries in UDFs #87291

mgartner · 2022-09-01T17:15:20Z

We should lift the restriction that disallows subqueries within user-defined function bodies.

Epic: CRDB-20370
Jira issue: CRDB-19257

Previously, statements with subqueries that invoked UDFs with their own subqueries would error. This was due to a limitation in the subquery execution machinery which only supports running subqueries for top-level statements. This commit works around this limitation by planning subqueries within UDFs as lazily evaluated routines, which have their own machinery for evaluation. Subqueries for the top-level statement are still planned as eagerly evaluated subqueries for two reasons: 1. These new routine-based subqueries do not cache their results, if they are uncorrelated, so they are not as performant. 2. Queries with eager subqueries can be distributed while routines cannot be distributed. Future work includes caching the results of these lazily evaluated subqueries to improve performance (addressing (1) above). We should also be able use the same routine-based machinery to evaluate correlated subqueries and eliminate all decorrelation errors. Informs cockroachdb#87291 Epic CRDB-19257 Release note (sql change): User-defined functions with subqueries in the body of the function are now supported.

95883: opt: execute correlated Exists as Routines and and allow Exists in UDFs r=mgartner a=mgartner #### opt: add SubqueryPrivate.Ordering check to CheckExpr Expressions with `SubqueryPrivate` fields are not optimized with respect to any ordering, except for `ArrayFlatten` expressions. This commit adds a check to `CheckExpr` to ensure that `SubqueryPrivate.Ordering` is only set for `ArrayFlatten`. Release note: None #### opt: convert uncorrelated Exists to Coalesce+Subquery expressions This commit adds the `ConvertUncorrelatedExistsToCoalesceSubquery` normalization rule which transforms uncorrelated Exists into Coalesce+Subquery expressions. Below is an example of the transformation made. SELECT EXISTS ( SELECT * FROM b ) FROM a => SELECT COALESCE( (SELECT true FROM (SELECT * FROM b) LIMIT 1), false ) FROM a By eliminating the uncorrelated Exists, we now support execution of uncorrelated Exists that are within correlated subqueries or UDFs. These are supported without execbuilder changes because execbuilder already knows how to build correlated Subquery expressions within correlated subqueries or UDFs. This change also makes the `exists` subquery execution mode, which only works for uncorrelated subqueries, obsolete. The related code paths should never be executed. I plan on removing them in a future commit. Release note: None #### opt: build correlated Exists as Coalesce+Routine expressions This commit allows for the execution of correlated Exists subqueries by building them as Coalesce+Routine expressions in execbuilder. Routines do not have a special mode for Exists subqueries, like the legacy, eager-evaluation subquery machinery does, so we must transform the Exists expression. The transformation is modelled after the ConvertUncorrelatedExistsToCoalesceSubquery normalization rule. The transformation is effectively: EXISTS (<input>) => COALESCE((SELECT true FROM (<input>) LIMIT 1), false) We don't implement this as a normalization rule for correlated Exists subqueries because the transformation would prevent decorrelation rules from turning the Exists expression into a join, if it is possible. Marking the rule as LowPriority would not be sufficient because the rule would operate on the Exists scalar expression, while the decorrelation rules operate on relational expressions that contain Exists expresions. The Exists would always be converted to a Coalesce before the decorrelation rules can match. Informs: #87291 Epic: CRDB-19257 Release note (sql change): Some queries with `EXISTS` subqueries which previously resulted in the error "could not decorrelate subquery" now succeed. 96278: sql: rename cpuNanos to cpuSQLNanos r=maryliag a=maryliag Note to reviewers: only second commit is relevant Fixes #96238 This commit clarifies the name on statement and transaction statistics tables to use `cpuSQLNanos` instead of `cpuNanos`. Release note (sql change): Update `cpuNanos` name to `cpuSQLNanos` on `crdb_internal.statement_statistics` and `system.statement_statistics` Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: maryliag <[email protected]>

To support execution of ANY expressions within UDFs, the optimizer builds them as regular subqueries with a novel transformation. The transformation simulates the peculiar behavior of ANY expressions. For example, consider the expression: i <comp> ANY (<subquery>) The logic for evaluating this comparison is: 1. If the subquery results in zero rows, then the expression evaluates to false, even if i is NULL. 2. Otherwise, if the comparison between i and any value returned by the subquery is true, then the expression evaluates to true. 3. Otherwise, if any values returned by the subquery are NULL, then the expression evaluates to NULL. 4. Otherwise, if i is NULL, then the expression evaluates to NULL. 5. Otherwise, the expression evaluates to false. We use the following transformation to express this logic: i = ANY (SELECT a FROM t) => SELECT count > 0 AND (bool_or OR (null_count > 0 AND NULL)) FROM ( SELECT count(*) AS count, bool_or(cmp) AS bool_or, count(*) FILTER (is_null) AS null_count FROM ( SELECT a = i AS cmp, a IS NULL AS is_null FROM ( SELECT a FROM t ) ) ) By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. We only perform this transformation when building ANY expressions within a UDF because this transformation may lead to query plans with slow apply-joins. In the future, if we can eliminate the apply-joins we may be able to apply this transformation universally, and remove existing logic for ANY expression evaluation in the execution engine. Fixes cockroachdb#87291 Release note: None

To support execution of ANY expressions within UDFs, the optimizer builds them as regular subqueries with a novel transformation. The transformation simulates the peculiar behavior of ANY expressions. For example, consider the expression: i <comp> ANY (<subquery>) The logic for evaluating this comparison is: 1. If the subquery results in zero rows, then the expression evaluates to false, even if i is NULL. 2. Otherwise, if the comparison between i and any value returned by the subquery is true, then the expression evaluates to true. 3. Otherwise, if any values returned by the subquery are NULL, then the expression evaluates to NULL. 4. Otherwise, if i is NULL, then the expression evaluates to NULL. 5. Otherwise, the expression evaluates to false. We use the following transformation to express this logic: i = ANY (SELECT a FROM t) => SELECT count > 0 AND (bool_or OR (bool_or_null AND NULL)) FROM ( SELECT count(*) AS count, bool_or(cmp) AS bool_or, bool_or(is_null) AS bool_or_null FROM ( SELECT a = i AS cmp, a IS NULL AS is_null FROM ( SELECT a FROM t ) ) ) By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. We only perform this transformation when building ANY expressions within a UDF because this transformation may lead to query plans with slow apply-joins. In the future, if we can eliminate the apply-joins we may be able to apply this transformation universally, and remove existing logic for ANY expression evaluation in the execution engine. Fixes cockroachdb#87291 Release note: None

To support execution of ANY expressions within UDFs, the optimizer builds them as subqueries with GroupBy expressions instead. The transformation simulates the peculiar behavior of ANY expressions. See `CustomFuncs.ConstructGroupByAny` for more details on this transformation. By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. Fixes cockroachdb#87291 Release note: None

98375: opt: build ANY expressions as regular subqueries within UDFs r=mgartner a=mgartner #### opt: build ANY expressions as regular subqueries within UDFs To support execution of ANY expressions within UDFs, the optimizer builds them as subqueries with GroupBy expressions instead. The transformation simulates the peculiar behavior of ANY expressions. See `CustomFuncs.ConstructGroupByAny` for more details on this transformation. By constructing ANY expressions within UDFs as subqueries, we avoid adding complexity to the execution engine, which is not currently capable of evaluating ANY expressions within the context of a UDF. Fixes #87291 Release note: None #### opt: fix GenerateIndexScans comment Release note: None 98758: sqlccl: disable default test tenant for explicit tenant tests r=michae2 a=herkolategan The tests modified in this change start tenants explicitly, and fails when the default test tenant is started as it is not expected. This change disables the starting of the default test tenant. Epic: None Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Herko Lategan <[email protected]>

mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Sep 1, 2022

blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 1, 2022

mgartner added the A-sql-routine UDFs and Stored Procedures label Sep 1, 2022

exalate-issue-sync bot removed the A-sql-routine UDFs and Stored Procedures label Sep 1, 2022

mgartner self-assigned this Sep 1, 2022

mgartner mentioned this issue Dec 2, 2022

Add new known limitations for 22.2 / Remove resolved limitations cockroachdb/docs#15715

Merged

mgartner mentioned this issue Jan 9, 2023

sql: support subqueries in UDFs #94962

Merged

This was referenced Jan 19, 2023

udf: unimplemented: SELECT usage inside a function definition (when using EXISTS) #88373

Closed

opt: execute correlated Exists as Routines and and allow Exists in UDFs #95883

Merged

mgartner mentioned this issue Mar 10, 2023

opt: build ANY expressions as regular subqueries within UDFs #98375

Merged

craig bot closed this as completed in 443df88 Mar 16, 2023

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: allow subqueries in UDFs #87291

sql: allow subqueries in UDFs #87291

mgartner commented Sep 1, 2022 •

edited by rickystewart

Loading

sql: allow subqueries in UDFs #87291

sql: allow subqueries in UDFs #87291

Comments

mgartner commented Sep 1, 2022 • edited by rickystewart Loading

mgartner commented Sep 1, 2022 •

edited by rickystewart

Loading