-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: implement strict UDFs with CASE #94797
Conversation
Don't review this yet, there's a fundamental issue I need to work out here. |
23fe534
to
57ed4ed
Compare
Only the last commit should be reviewed. The first two are from #95234. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 8 of 8 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rharding6373)
pkg/sql/opt/optbuilder/scalar.go
line 755 at r3 (raw file):
// true. For example: // // SELECT strict_fn(1, (NULL, NULL)) -- the UDF will not be called
[nit] Should it be the UDF will be called
?
pkg/sql/logictest/testdata/logic_test/udf
line 2521 at r3 (raw file):
query I SELECT strict_fn(1, 'foo', true)
[nit] Assuming we fold away the CASE
when all arguments are non-null constants, it would probably be more interesting to have non-constant arguments (if that isn't the case, disregard this).
pkg/sql/opt/norm/testdata/rules/udf
line 33 at r3 (raw file):
# argument. norm format=show-scalars SELECT strict_fn(1, 'foo', NULL)
[nit] Can we also fold the case with all non-null constant arguments? + Can you add a test for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/logictest/testdata/logic_test/udf
line 2545 at r3 (raw file):
$$ # A tuple with all NULL elements is not concerned "NULL INPUT" for a UDF, even
nit: s/concerned/considered?
57ed4ed
to
b9323a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball and @rharding6373)
pkg/sql/opt/optbuilder/scalar.go
line 755 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Should it be
the UDF will be called
?
Good catch! Fixed.
pkg/sql/logictest/testdata/logic_test/udf
line 2521 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Assuming we fold away the
CASE
when all arguments are non-null constants, it would probably be more interesting to have non-constant arguments (if that isn't the case, disregard this).
I've added a test with non-constant arguments. I don't think it's all that different though - whether the CASE is folded during optimization or not isn't really relevant to these changes - it's being evaluated one way or another.
pkg/sql/logictest/testdata/logic_test/udf
line 2545 at r3 (raw file):
Previously, rharding6373 (Rachael Harding) wrote…
nit: s/concerned/considered?
Good catch! Fixed.
pkg/sql/opt/norm/testdata/rules/udf
line 33 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Can we also fold the case with all non-null constant arguments? + Can you add a test for that?
Done.
b9323a0
to
d7dbfb9
Compare
TFTRs! bors r+ |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed: |
This commit removes the field `CalledOnNullInput` of `tree.RoutineExpr`. Strict UDFs are now wrapped in a `CASE` expression so that they are not invoked when any of the arguments are NULL. This fixes a bug that caused strict functions be evaluated when presented with `NULL` arguments after being inlined. This also allows for the removal of special logic in evaluation of `tree.RoutineExpr`. A positive side effect of this change is that strict UDFs will be folded to NULL at optimization-time if they are presented with constant NULL arguments. This is made possible by existing normalization rules. Fixes cockroachdb#95240 There is no release note because the bug is not present in any releases. Release note: None
d7dbfb9
to
c2c6239
Compare
5,ORMQueries/django_table_introspection_1_table | ||
5,ORMQueries/django_table_introspection_8_tables | ||
3,ORMQueries/django_table_introspection_1_table | ||
3,ORMQueries/django_table_introspection_8_tables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rafiss LGTY?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep!
bors r+ |
Build failed (retrying...): |
Build succeeded: |
Previously, we wrapped all strict UDFs with a `CASE` expression to prevent the UDF from being invoked when any of the arguments were `NULL`. This was required in order to inline strict UDFs (see cockroachdb#94797). Unfortunately, wrapping the arguments in a `CASE` expression caused problems when the arguments were subqueries. The subqueries would be duplicated in the optimizer expression, one copy used in the `CASE` expression and another as an argument to the UDF`, with each copy containing the same table and column IDs. Normalization rules could then transform these subqueries into apply-joins that would have intersecting columns in their left and right children. This caused optimizer assertion failures in debug builds and could cause incorrect results in release builds. This commit implements a temporary fix which removes the synthesized `CASE` expression entirely. The logic for returning `NULL` immediately when a strict UDF is invoked with a `NULL` argument has been added back into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined. We should be able to inline strict UDFs by adding the `CASE` expression during the `InlineUDF` normalization rule, as long as the arguments are not complex expressions (e.g., only variables, constants, or placeholders). This is left for a future commit. Fixes cockroachdb#101516 Release note (bug fix): A bug has been fix that could cause incorrect results for queries invoking `STRICT` user-defined functions. This bug was only present in pre-release versions of 23.1.
Previously, we wrapped all strict UDFs with a `CASE` expression to prevent the UDF from being invoked when any of the arguments were `NULL`. This was required in order to inline strict UDFs (see cockroachdb#94797). Unfortunately, wrapping the arguments in a `CASE` expression caused problems when the arguments were subqueries. The subqueries would be duplicated in the optimizer expression, one copy used in the `CASE` expression and another as an argument to the UDF`, with each copy containing the same table and column IDs. Normalization rules could then transform these subqueries into apply-joins that would have intersecting columns in their left and right children. This caused optimizer assertion failures in debug builds and could cause incorrect results in release builds. This commit implements a temporary fix which removes the synthesized `CASE` expression entirely. The logic for returning `NULL` immediately when a strict UDF is invoked with a `NULL` argument has been added back into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined. We should be able to inline strict UDFs by adding the `CASE` expression during the `InlineUDF` normalization rule, as long as the arguments are not complex expressions (e.g., only variables, constants, or placeholders). This is left for a future commit. Fixes cockroachdb#101516 Release note (bug fix): A bug has been fixed that could cause incorrect results for queries invoking `STRICT` user-defined functions. This bug was only present in pre-release versions of 23.1.
101867: opt: fix strict UDFs with subquery arguments r=mgartner a=mgartner #### opt: fix strict UDFs with subquery arguments Previously, we wrapped all strict UDFs with a `CASE` expression to prevent the UDF from being invoked when any of the arguments were `NULL`. This was required in order to inline strict UDFs (see #94797). Unfortunately, wrapping the arguments in a `CASE` expression caused problems when the arguments were subqueries. The subqueries would be duplicated in the optimizer expression, one copy used in the `CASE` expression and another as an argument to the UDF`, with each copy containing the same table and column IDs. Normalization rules could then transform these subqueries into apply-joins that would have intersecting columns in their left and right children. This caused optimizer assertion failures in debug builds and could cause incorrect results in release builds. This commit implements a temporary fix which removes the synthesized `CASE` expression entirely. The logic for returning `NULL` immediately when a strict UDF is invoked with a `NULL` argument has been added back into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined. We should be able to inline strict UDFs by adding the `CASE` expression during the `InlineUDF` normalization rule, as long as the arguments are not complex expressions (e.g., only variables, constants, or placeholders). This is left for a future commit. Fixes #101516 Release note (bug fix): A bug has been fixed that could cause incorrect results for queries invoking `STRICT` user-defined functions. This bug was only present in pre-release versions of 23.1. #### opt: add strict field to UDF expression format Release note: None #### opt: simplify description of ExprFmtFlags The description of ExprFmtFlags has been simplified. It previously contained a very detailed description of bitmasks and iota, which probably isn't warranted because it is such a common pattern in the codebase. #### opt: inline strict UDFs This commit inlines strict UDFs by wrapping the generated subquery expression with a `CASE` expression. The `CASE` expression results in `NULL` if any of the arguments of the UDF are `NULL`. Note that even though the subquery is wrapped in a `CASE` expression, it may still be evaluated due to the fact that non-correlated subqueries are eagerly evaluated, see #20298. This should be fine because these inlined UDFs shouldn't have side-effects - we don't inline volatile UDFs. Release note: None 101925: c2c: add 30 second timeout to producer ingestion clean up during cutover r=lidorcarmel a=msbutler Epic: none Release note: None Co-authored-by: Marcus Gartner <[email protected]> Co-authored-by: Michael Butler <[email protected]>
Previously, we wrapped all strict UDFs with a `CASE` expression to prevent the UDF from being invoked when any of the arguments were `NULL`. This was required in order to inline strict UDFs (see #94797). Unfortunately, wrapping the arguments in a `CASE` expression caused problems when the arguments were subqueries. The subqueries would be duplicated in the optimizer expression, one copy used in the `CASE` expression and another as an argument to the UDF`, with each copy containing the same table and column IDs. Normalization rules could then transform these subqueries into apply-joins that would have intersecting columns in their left and right children. This caused optimizer assertion failures in debug builds and could cause incorrect results in release builds. This commit implements a temporary fix which removes the synthesized `CASE` expression entirely. The logic for returning `NULL` immediately when a strict UDF is invoked with a `NULL` argument has been added back into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined. We should be able to inline strict UDFs by adding the `CASE` expression during the `InlineUDF` normalization rule, as long as the arguments are not complex expressions (e.g., only variables, constants, or placeholders). This is left for a future commit. Fixes #101516 Release note (bug fix): A bug has been fixed that could cause incorrect results for queries invoking `STRICT` user-defined functions. This bug was only present in pre-release versions of 23.1.
Previously, we wrapped all strict UDFs with a `CASE` expression to prevent the UDF from being invoked when any of the arguments were `NULL`. This was required in order to inline strict UDFs (see #94797). Unfortunately, wrapping the arguments in a `CASE` expression caused problems when the arguments were subqueries. The subqueries would be duplicated in the optimizer expression, one copy used in the `CASE` expression and another as an argument to the UDF`, with each copy containing the same table and column IDs. Normalization rules could then transform these subqueries into apply-joins that would have intersecting columns in their left and right children. This caused optimizer assertion failures in debug builds and could cause incorrect results in release builds. This commit implements a temporary fix which removes the synthesized `CASE` expression entirely. The logic for returning `NULL` immediately when a strict UDF is invoked with a `NULL` argument has been added back into `planner.EvalRoutineExpr`, and strict UDFs are not longer inlined. We should be able to inline strict UDFs by adding the `CASE` expression during the `InlineUDF` normalization rule, as long as the arguments are not complex expressions (e.g., only variables, constants, or placeholders). This is left for a future commit. Fixes #101516 Release note (bug fix): A bug has been fixed that could cause incorrect results for queries invoking `STRICT` user-defined functions. This bug was only present in pre-release versions of 23.1.
This commit removes the field
CalledOnNullInput
oftree.RoutineExpr
.Strict UDFs are now wrapped in a
CASE
expression so that they are notinvoked when any of the arguments are NULL.
This fixes a bug that caused strict functions be evaluated when
presented with
NULL
arguments after being inlined. This also allowsfor the removal of special logic in evaluation of
tree.RoutineExpr
.A positive side effect of this change is that strict UDFs will be folded
to NULL at optimization-time if they are presented with constant NULL
arguments. This is made possible by existing normalization rules.
Fixes #95240
There is no release note because the bug is not present in any releases.
Release note: None