Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: implement strict UDFs with CASE #94797

Merged
merged 1 commit into from
Jan 19, 2023

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Jan 5, 2023

This commit removes the field CalledOnNullInput of tree.RoutineExpr.
Strict UDFs are now wrapped in a CASE expression so that they are not
invoked when any of the arguments are NULL.

This fixes a bug that caused strict functions be evaluated when
presented with NULL arguments after being inlined. This also allows
for the removal of special logic in evaluation of tree.RoutineExpr.

A positive side effect of this change is that strict UDFs will be folded
to NULL at optimization-time if they are presented with constant NULL
arguments. This is made possible by existing normalization rules.

Fixes #95240

There is no release note because the bug is not present in any releases.

Release note: None

@mgartner mgartner requested review from a team as code owners January 5, 2023 22:04
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@mgartner
Copy link
Collaborator Author

mgartner commented Jan 5, 2023

Don't review this yet, there's a fundamental issue I need to work out here.

@mgartner
Copy link
Collaborator Author

Only the last commit should be reviewed. The first two are from #95234.

@mgartner mgartner changed the title sql: remove CalledOnNullInput field of tree.RoutineExpr sql: implement strict UDFs with CASE Jan 13, 2023
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification! :lgtm:

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 8 of 8 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)


pkg/sql/opt/optbuilder/scalar.go line 755 at r3 (raw file):

			// true. For example:
			//
			//   SELECT strict_fn(1, (NULL, NULL)) -- the UDF will not be called

[nit] Should it be the UDF will be called?


pkg/sql/logictest/testdata/logic_test/udf line 2521 at r3 (raw file):


query I
SELECT strict_fn(1, 'foo', true)

[nit] Assuming we fold away the CASE when all arguments are non-null constants, it would probably be more interesting to have non-constant arguments (if that isn't the case, disregard this).


pkg/sql/opt/norm/testdata/rules/udf line 33 at r3 (raw file):

# argument.
norm format=show-scalars
SELECT strict_fn(1, 'foo', NULL)

[nit] Can we also fold the case with all non-null constant arguments? + Can you add a test for that?

Copy link
Collaborator

@rharding6373 rharding6373 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/logictest/testdata/logic_test/udf line 2545 at r3 (raw file):

$$

# A tuple with all NULL elements is not concerned "NULL INPUT" for a UDF, even

nit: s/concerned/considered?

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)


pkg/sql/opt/optbuilder/scalar.go line 755 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Should it be the UDF will be called?

Good catch! Fixed.


pkg/sql/logictest/testdata/logic_test/udf line 2521 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Assuming we fold away the CASE when all arguments are non-null constants, it would probably be more interesting to have non-constant arguments (if that isn't the case, disregard this).

I've added a test with non-constant arguments. I don't think it's all that different though - whether the CASE is folded during optimization or not isn't really relevant to these changes - it's being evaluated one way or another.


pkg/sql/logictest/testdata/logic_test/udf line 2545 at r3 (raw file):

Previously, rharding6373 (Rachael Harding) wrote…

nit: s/concerned/considered?

Good catch! Fixed.


pkg/sql/opt/norm/testdata/rules/udf line 33 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Can we also fold the case with all non-null constant arguments? + Can you add a test for that?

Done.

@mgartner
Copy link
Collaborator Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 19, 2023

Build failed:

This commit removes the field `CalledOnNullInput` of `tree.RoutineExpr`.
Strict UDFs are now wrapped in a `CASE` expression so that they are not
invoked when any of the arguments are NULL.

This fixes a bug that caused strict functions be evaluated when
presented with `NULL` arguments after being inlined. This also allows
for the removal of special logic in evaluation of `tree.RoutineExpr`.

A positive side effect of this change is that strict UDFs will be folded
to NULL at optimization-time if they are presented with constant NULL
arguments. This is made possible by existing normalization rules.

Fixes cockroachdb#95240

There is no release note because the bug is not present in any releases.

Release note: None
@mgartner mgartner requested a review from a team January 19, 2023 14:32
5,ORMQueries/django_table_introspection_1_table
5,ORMQueries/django_table_introspection_8_tables
3,ORMQueries/django_table_introspection_1_table
3,ORMQueries/django_table_introspection_8_tables
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafiss LGTY?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep!

@mgartner
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jan 19, 2023

Build succeeded:

@craig craig bot merged commit 9735954 into cockroachdb:master Jan 19, 2023
@mgartner mgartner deleted the generalize-routine branch January 19, 2023 19:46
mgartner added a commit to mgartner/cockroach that referenced this pull request Apr 19, 2023
Previously, we wrapped all strict UDFs with a `CASE` expression to
prevent the UDF from being invoked when any of the arguments were
`NULL`. This was required in order to inline strict UDFs (see cockroachdb#94797).

Unfortunately, wrapping the arguments in a `CASE` expression caused
problems when the arguments were subqueries. The subqueries would be
duplicated in the optimizer expression, one copy used in the `CASE`
expression and another as an argument to the UDF`, with each copy
containing the same table and column IDs. Normalization rules could then
transform these subqueries into apply-joins that would have intersecting
columns in their left and right children. This caused optimizer
assertion failures in debug builds and could cause incorrect results in
release builds.

This commit implements a temporary fix which removes the synthesized
`CASE` expression entirely. The logic for returning `NULL` immediately
when a strict UDF is invoked with a `NULL` argument has been added back
into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined.

We should be able to inline strict UDFs by adding the `CASE` expression
during the `InlineUDF` normalization rule, as long as the arguments are
not complex expressions (e.g., only variables, constants, or
placeholders). This is left for a future commit.

Fixes cockroachdb#101516

Release note (bug fix): A bug has been fix that could cause incorrect
results for queries invoking `STRICT` user-defined functions. This bug
was only present in pre-release versions of 23.1.
mgartner added a commit to mgartner/cockroach that referenced this pull request Apr 20, 2023
Previously, we wrapped all strict UDFs with a `CASE` expression to
prevent the UDF from being invoked when any of the arguments were
`NULL`. This was required in order to inline strict UDFs (see cockroachdb#94797).

Unfortunately, wrapping the arguments in a `CASE` expression caused
problems when the arguments were subqueries. The subqueries would be
duplicated in the optimizer expression, one copy used in the `CASE`
expression and another as an argument to the UDF`, with each copy
containing the same table and column IDs. Normalization rules could then
transform these subqueries into apply-joins that would have intersecting
columns in their left and right children. This caused optimizer
assertion failures in debug builds and could cause incorrect results in
release builds.

This commit implements a temporary fix which removes the synthesized
`CASE` expression entirely. The logic for returning `NULL` immediately
when a strict UDF is invoked with a `NULL` argument has been added back
into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined.

We should be able to inline strict UDFs by adding the `CASE` expression
during the `InlineUDF` normalization rule, as long as the arguments are
not complex expressions (e.g., only variables, constants, or
placeholders). This is left for a future commit.

Fixes cockroachdb#101516

Release note (bug fix): A bug has been fixed that could cause incorrect
results for queries invoking `STRICT` user-defined functions. This bug
was only present in pre-release versions of 23.1.
craig bot pushed a commit that referenced this pull request Apr 20, 2023
101867: opt: fix strict UDFs with subquery arguments r=mgartner a=mgartner

#### opt: fix strict UDFs with subquery arguments

Previously, we wrapped all strict UDFs with a `CASE` expression to
prevent the UDF from being invoked when any of the arguments were
`NULL`. This was required in order to inline strict UDFs (see #94797).

Unfortunately, wrapping the arguments in a `CASE` expression caused
problems when the arguments were subqueries. The subqueries would be
duplicated in the optimizer expression, one copy used in the `CASE`
expression and another as an argument to the UDF`, with each copy
containing the same table and column IDs. Normalization rules could then
transform these subqueries into apply-joins that would have intersecting
columns in their left and right children. This caused optimizer
assertion failures in debug builds and could cause incorrect results in
release builds.

This commit implements a temporary fix which removes the synthesized
`CASE` expression entirely. The logic for returning `NULL` immediately
when a strict UDF is invoked with a `NULL` argument has been added back
into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined.

We should be able to inline strict UDFs by adding the `CASE` expression
during the `InlineUDF` normalization rule, as long as the arguments are
not complex expressions (e.g., only variables, constants, or
placeholders). This is left for a future commit.

Fixes #101516

Release note (bug fix): A bug has been fixed that could cause incorrect
results for queries invoking `STRICT` user-defined functions. This bug
was only present in pre-release versions of 23.1.

#### opt: add strict field to UDF expression format

Release note: None

#### opt: simplify description of ExprFmtFlags

The description of ExprFmtFlags has been simplified. It previously
contained a very detailed description of bitmasks and iota, which
probably isn't warranted because it is such a common pattern in the
codebase.

#### opt: inline strict UDFs

This commit inlines strict UDFs by wrapping the generated subquery
expression with a `CASE` expression. The `CASE` expression results in
`NULL` if any of the arguments of the UDF are `NULL`.

Note that even though the subquery is wrapped in a `CASE` expression, it
may still be evaluated due to the fact that non-correlated subqueries
are eagerly evaluated, see #20298. This should be fine because these
inlined UDFs shouldn't have side-effects - we don't inline volatile
UDFs.

Release note: None


101925: c2c: add 30 second timeout to producer ingestion clean up during cutover r=lidorcarmel a=msbutler

Epic: none

Release note: None

Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Apr 20, 2023
Previously, we wrapped all strict UDFs with a `CASE` expression to
prevent the UDF from being invoked when any of the arguments were
`NULL`. This was required in order to inline strict UDFs (see #94797).

Unfortunately, wrapping the arguments in a `CASE` expression caused
problems when the arguments were subqueries. The subqueries would be
duplicated in the optimizer expression, one copy used in the `CASE`
expression and another as an argument to the UDF`, with each copy
containing the same table and column IDs. Normalization rules could then
transform these subqueries into apply-joins that would have intersecting
columns in their left and right children. This caused optimizer
assertion failures in debug builds and could cause incorrect results in
release builds.

This commit implements a temporary fix which removes the synthesized
`CASE` expression entirely. The logic for returning `NULL` immediately
when a strict UDF is invoked with a `NULL` argument has been added back
into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined.

We should be able to inline strict UDFs by adding the `CASE` expression
during the `InlineUDF` normalization rule, as long as the arguments are
not complex expressions (e.g., only variables, constants, or
placeholders). This is left for a future commit.

Fixes #101516

Release note (bug fix): A bug has been fixed that could cause incorrect
results for queries invoking `STRICT` user-defined functions. This bug
was only present in pre-release versions of 23.1.
blathers-crl bot pushed a commit that referenced this pull request Apr 20, 2023
Previously, we wrapped all strict UDFs with a `CASE` expression to
prevent the UDF from being invoked when any of the arguments were
`NULL`. This was required in order to inline strict UDFs (see #94797).

Unfortunately, wrapping the arguments in a `CASE` expression caused
problems when the arguments were subqueries. The subqueries would be
duplicated in the optimizer expression, one copy used in the `CASE`
expression and another as an argument to the UDF`, with each copy
containing the same table and column IDs. Normalization rules could then
transform these subqueries into apply-joins that would have intersecting
columns in their left and right children. This caused optimizer
assertion failures in debug builds and could cause incorrect results in
release builds.

This commit implements a temporary fix which removes the synthesized
`CASE` expression entirely. The logic for returning `NULL` immediately
when a strict UDF is invoked with a `NULL` argument has been added back
into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined.

We should be able to inline strict UDFs by adding the `CASE` expression
during the `InlineUDF` normalization rule, as long as the arguments are
not complex expressions (e.g., only variables, constants, or
placeholders). This is left for a future commit.

Fixes #101516

Release note (bug fix): A bug has been fixed that could cause incorrect
results for queries invoking `STRICT` user-defined functions. This bug
was only present in pre-release versions of 23.1.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: do not inline strict UDFs
5 participants