Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: allow subqueries in UDFs #87291

Closed
mgartner opened this issue Sep 1, 2022 · 0 comments · Fixed by #98375
Closed

sql: allow subqueries in UDFs #87291

mgartner opened this issue Sep 1, 2022 · 0 comments · Fixed by #98375
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Sep 1, 2022

We should lift the restriction that disallows subqueries within user-defined function bodies.

Epic: CRDB-20370
Jira issue: CRDB-19257

@mgartner mgartner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Sep 1, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Sep 1, 2022
@mgartner mgartner added the A-sql-routine UDFs and Stored Procedures label Sep 1, 2022
@exalate-issue-sync exalate-issue-sync bot removed the A-sql-routine UDFs and Stored Procedures label Sep 1, 2022
@mgartner mgartner self-assigned this Sep 1, 2022
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 9, 2023
Previously, statements with subqueries that invoked UDFs with their own
subqueries would error. This was due to a limitation in the subquery
execution machinery which only supports running subqueries for top-level
statements.

This commit works around this limitation by planning subqueries within
UDFs as lazily evaluated routines, which have their own machinery for
evaluation. Subqueries for the top-level statement are still planned as
eagerly evaluated subqueries for two reasons:

  1. These new routine-based subqueries do not cache their results, if
     they are uncorrelated, so they are not as performant.
  2. Queries with eager subqueries can be distributed while routines
     cannot be distributed.

Future work includes caching the results of these lazily evaluated
subqueries to improve performance (addressing (1) above). We should also
be able use the same routine-based machinery to evaluate correlated
subqueries and eliminate all decorrelation errors.

Informs cockroachdb#87291

Epic CRDB-19257

Release note (sql change): User-defined functions with subqueries in the
body of the function are now supported.
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 10, 2023
Previously, statements with subqueries that invoked UDFs with their own
subqueries would error. This was due to a limitation in the subquery
execution machinery which only supports running subqueries for top-level
statements.

This commit works around this limitation by planning subqueries within
UDFs as lazily evaluated routines, which have their own machinery for
evaluation. Subqueries for the top-level statement are still planned as
eagerly evaluated subqueries for two reasons:

  1. These new routine-based subqueries do not cache their results, if
     they are uncorrelated, so they are not as performant.
  2. Queries with eager subqueries can be distributed while routines
     cannot be distributed.

Future work includes caching the results of these lazily evaluated
subqueries to improve performance (addressing (1) above). We should also
be able use the same routine-based machinery to evaluate correlated
subqueries and eliminate all decorrelation errors.

Informs cockroachdb#87291

Epic CRDB-19257

Release note (sql change): User-defined functions with subqueries in the
body of the function are now supported.
mgartner added a commit to mgartner/cockroach that referenced this issue Jan 10, 2023
Previously, statements with subqueries that invoked UDFs with their own
subqueries would error. This was due to a limitation in the subquery
execution machinery which only supports running subqueries for top-level
statements.

This commit works around this limitation by planning subqueries within
UDFs as lazily evaluated routines, which have their own machinery for
evaluation. Subqueries for the top-level statement are still planned as
eagerly evaluated subqueries for two reasons:

  1. These new routine-based subqueries do not cache their results, if
     they are uncorrelated, so they are not as performant.
  2. Queries with eager subqueries can be distributed while routines
     cannot be distributed.

Future work includes caching the results of these lazily evaluated
subqueries to improve performance (addressing (1) above). We should also
be able use the same routine-based machinery to evaluate correlated
subqueries and eliminate all decorrelation errors.

Informs cockroachdb#87291

Epic CRDB-19257

Release note (sql change): User-defined functions with subqueries in the
body of the function are now supported.
craig bot pushed a commit that referenced this issue Feb 1, 2023
95883: opt: execute correlated Exists as Routines and and allow Exists in UDFs r=mgartner a=mgartner

#### opt: add SubqueryPrivate.Ordering check to CheckExpr

Expressions with `SubqueryPrivate` fields are not optimized with respect
to any ordering, except for `ArrayFlatten` expressions. This commit adds
a check to `CheckExpr` to ensure that `SubqueryPrivate.Ordering` is only
set for `ArrayFlatten`.

Release note: None

#### opt: convert uncorrelated Exists to Coalesce+Subquery expressions

This commit adds the `ConvertUncorrelatedExistsToCoalesceSubquery`
normalization rule which transforms uncorrelated Exists into
Coalesce+Subquery expressions. Below is an example of the transformation
made.

    SELECT EXISTS (
      SELECT * FROM b
    ) FROM a
    =>
    SELECT COALESCE(
      (SELECT true FROM (SELECT * FROM b) LIMIT 1),
      false
    ) FROM a

By eliminating the uncorrelated Exists, we now support execution of
uncorrelated Exists that are within correlated subqueries or UDFs. These
are supported without execbuilder changes because execbuilder already
knows how to build correlated Subquery expressions within correlated
subqueries or UDFs.

This change also makes the `exists` subquery execution mode, which only
works for uncorrelated subqueries, obsolete. The related code paths
should never be executed. I plan on removing them in a future commit.

Release note: None

#### opt: build correlated Exists as Coalesce+Routine expressions

This commit allows for the execution of correlated Exists subqueries by
building them as Coalesce+Routine expressions in execbuilder.

Routines do not have a special mode for Exists subqueries, like the
legacy, eager-evaluation subquery machinery does, so we must transform
the Exists expression. The transformation is modelled after the
ConvertUncorrelatedExistsToCoalesceSubquery normalization rule. The
transformation is effectively:

    EXISTS (<input>)
    =>
    COALESCE((SELECT true FROM (<input>) LIMIT 1), false)

We don't implement this as a normalization rule for correlated Exists
subqueries because the transformation would prevent decorrelation rules
from turning the Exists expression into a join, if it is possible.
Marking the rule as LowPriority would not be sufficient because the rule
would operate on the Exists scalar expression, while the decorrelation
rules operate on relational expressions that contain Exists expresions.
The Exists would always be converted to a Coalesce before the
decorrelation rules can match.

Informs: #87291

Epic: CRDB-19257

Release note (sql change): Some queries with `EXISTS` subqueries which
previously resulted in the error "could not decorrelate subquery" now
succeed.


96278: sql: rename cpuNanos to cpuSQLNanos r=maryliag a=maryliag

Note to reviewers: only second commit is relevant
Fixes #96238

This commit clarifies the name on statement and
transaction statistics tables to use `cpuSQLNanos`
instead of `cpuNanos`.

Release note (sql change): Update `cpuNanos` name to `cpuSQLNanos`
on `crdb_internal.statement_statistics` and `system.statement_statistics`

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: maryliag <[email protected]>
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 10, 2023
To support execution of ANY expressions within UDFs, the optimizer
builds them as regular subqueries with a novel transformation. The
transformation simulates the peculiar behavior of ANY expressions.

For example, consider the expression:

    i <comp> ANY (<subquery>)

The logic for evaluating this comparison is:

  1. If the subquery results in zero rows, then the expression evaluates
     to false, even if i is NULL.
  2. Otherwise, if the comparison between i and any value returned by
     the subquery is true, then the expression evaluates to true.
  3. Otherwise, if any values returned by the subquery are NULL, then
     the expression evaluates to NULL.
  4. Otherwise, if i is NULL, then the expression evaluates to NULL.
  5. Otherwise, the expression evaluates to false.

We use the following transformation to express this logic:

    i = ANY (SELECT a FROM t)
    =>
    SELECT count > 0 AND (bool_or OR (null_count > 0 AND NULL))
    FROM (
      SELECT
        count(*) AS count,
        bool_or(cmp) AS bool_or,
        count(*) FILTER (is_null) AS null_count
      FROM (
        SELECT a = i AS cmp, a IS NULL AS is_null
        FROM (
          SELECT a FROM t
        )
      )
    )

By constructing ANY expressions within UDFs as subqueries, we avoid
adding complexity to the execution engine, which is not currently
capable of evaluating ANY expressions within the context of a UDF.

We only perform this transformation when building ANY expressions within
a UDF because this transformation may lead to query plans with slow
apply-joins. In the future, if we can eliminate the apply-joins we may
be able to apply this transformation universally, and remove existing
logic for ANY expression evaluation in the execution engine.

Fixes cockroachdb#87291

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 13, 2023
To support execution of ANY expressions within UDFs, the optimizer
builds them as regular subqueries with a novel transformation. The
transformation simulates the peculiar behavior of ANY expressions.

For example, consider the expression:

    i <comp> ANY (<subquery>)

The logic for evaluating this comparison is:

  1. If the subquery results in zero rows, then the expression evaluates
     to false, even if i is NULL.
  2. Otherwise, if the comparison between i and any value returned by
     the subquery is true, then the expression evaluates to true.
  3. Otherwise, if any values returned by the subquery are NULL, then
     the expression evaluates to NULL.
  4. Otherwise, if i is NULL, then the expression evaluates to NULL.
  5. Otherwise, the expression evaluates to false.

We use the following transformation to express this logic:

    i = ANY (SELECT a FROM t)
    =>
    SELECT count > 0 AND (bool_or OR (bool_or_null AND NULL))
    FROM (
      SELECT
        count(*) AS count,
        bool_or(cmp) AS bool_or,
        bool_or(is_null) AS bool_or_null
      FROM (
        SELECT a = i AS cmp, a IS NULL AS is_null
        FROM (
          SELECT a FROM t
        )
      )
    )

By constructing ANY expressions within UDFs as subqueries, we avoid
adding complexity to the execution engine, which is not currently
capable of evaluating ANY expressions within the context of a UDF.

We only perform this transformation when building ANY expressions within
a UDF because this transformation may lead to query plans with slow
apply-joins. In the future, if we can eliminate the apply-joins we may
be able to apply this transformation universally, and remove existing
logic for ANY expression evaluation in the execution engine.

Fixes cockroachdb#87291

Release note: None
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 15, 2023
To support execution of ANY expressions within UDFs, the optimizer
builds them as subqueries with GroupBy expressions instead. The
transformation simulates the peculiar behavior of ANY expressions. See
`CustomFuncs.ConstructGroupByAny` for more details on this
transformation.

By constructing ANY expressions within UDFs as subqueries, we avoid
adding complexity to the execution engine, which is not currently
capable of evaluating ANY expressions within the context of a UDF.

Fixes cockroachdb#87291

Release note: None
craig bot pushed a commit that referenced this issue Mar 16, 2023
98375: opt: build ANY expressions as regular subqueries within UDFs r=mgartner a=mgartner

#### opt: build ANY expressions as regular subqueries within UDFs

To support execution of ANY expressions within UDFs, the optimizer
builds them as subqueries with GroupBy expressions instead. The
transformation simulates the peculiar behavior of ANY expressions. See
`CustomFuncs.ConstructGroupByAny` for more details on this
transformation.

By constructing ANY expressions within UDFs as subqueries, we avoid
adding complexity to the execution engine, which is not currently
capable of evaluating ANY expressions within the context of a UDF.

Fixes #87291

Release note: None

#### opt: fix GenerateIndexScans comment

Release note: None


98758: sqlccl: disable default test tenant for explicit tenant tests r=michae2 a=herkolategan

The tests modified in this change start tenants explicitly, and fails when the default test tenant is started as it is not expected. This change disables the starting of the default test tenant.

Epic: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Herko Lategan <[email protected]>
@craig craig bot closed this as completed in 443df88 Mar 16, 2023
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant