-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: propagate result types to inputs during set ops with nulls and tuples #60827
Conversation
de86750
to
cfd7881
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the optimizer add casts when one side is Unknown and one side isn't? At the very least, the quoted example SELECT NULL UNION SELECT 1
doesn't have a problem:
[email protected]:26257/defaultdb> explain (types) select a from (select null) as t(a) union select 1;
info
-------------------------------------------------------------
distribution: local
vectorized: false
• union
│ columns: (a int)
│ estimated row count: 2
│
├── • values
│ columns: (a int)
│ size: 1 column, 1 row
│ row 0, expr 0: (CAST((NULL)[unknown] AS INT8))[int]
│
└── • values
columns: ("?column?" int)
size: 1 column, 1 row
row 0, expr 0: (1)[int]
The problem in the example is that we can't plan casts to tuple:
// TODO(radu): casts to Tuple are not supported (they can't be serialized |
If we want to make this fix until that problem is solved, that's ok, but it should be clearly documented that it only addresses this narrow case.
Reviewable status: complete! 0 of 0 LGTMs obtained
Release note: None
f7abd39
to
7b72e9a
Compare
The physical planning for set operations is special because it allows for the result types from its inputs to be of different type family in some cases. Namely, it is ok if one input has `types.Unknown` whereas the other input is of some "known" type. In order to handle such case the execbuilder plans casts; however, currently it is not possible to plan casts to `Tuple` type. As a result, the vectorized engine is not able to execute the query because it expects the data coming from both inputs to be of the same type. This commit modifies the logic of reconciling the result types from both inputs to additionally update unknown types on the inputs if the other input is of a Tuple type. Such behavior is acceptable because in the original plan we only expected to have NULL values, and the Tuple type is able to handle such case too. The problem can also be reproduced in a logic test with the row-by-row engine in the `fakedist` setting. This bug was introduced in a76ee31, and before that change we had the following behavior: overwrite `plan.ResultTypes` to the merged types and then plan a stage of distinct processors if we have UNION. This resulted in the input spec for the input stream to distinct having correctly set types. After the change, that was no longer the case. Before that change the vectorized engine couldn't handle such queries due to type mismatch during `SupportsVectorized` check, so we fell back to the row-by-row engine. However, this commit makes it so that the vectorized engine is able to execute such queries. Release note (bug fix): CockroachDB previously could encounter an internal error when performing UNION operation when the first input resulted only in NULL values and consequent inputs produce tuples, and this is now fixed. Only 21.1 alpha versions are affected.
7b72e9a
to
4208a25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I think you're right, my initial explanation was too wide, updated.
Reviewable status: complete! 0 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained
TFTR! bors r+ |
Build failed (retrying...): |
Build succeeded: |
sql: move MergeResultTypes from physicalplan package and unexport
Release note: None
sql: propagate result types to inputs in set ops with nulls and tuples
The physical planning for set operations is special because it allows
for the result types from its inputs to be of different type family in
some cases. Namely, it is ok if one input has
types.Unknown
whereasthe other input is of some "known" type. In order to handle such case
the execbuilder plans casts; however, currently it is not possible to
plan casts to
Tuple
type. As a result, the vectorized engine is notable to execute the query because it expects the data coming from both
inputs to be of the same type.
This commit modifies the logic of reconciling the result types from both
inputs to additionally update unknown types on the inputs if the other
input is of a Tuple type. Such behavior is acceptable because in the
original plan we only expected to have NULL values, and the Tuple type
is able to handle such case too.
The problem can also be reproduced in a logic test with the row-by-row
engine in the
fakedist
setting. This bug was introduced ina76ee31, and before that change we had
the following behavior: overwrite
plan.ResultTypes
to the merged typesand then plan a stage of distinct processors if we have UNION. This
resulted in the input spec for the input stream to distinct having
correctly set types. After the change, that was no longer the case.
Before that change the vectorized engine couldn't handle such queries
due to type mismatch during
SupportsVectorized
check, so we fell backto the row-by-row engine. However, this commit makes it so that the
vectorized engine is able to execute such queries.
Fixes: #59611.
Release note (bug fix): CockroachDB previously could encounter an
internal error when performing UNION operation when the first input
resulted only in NULL values and consequent inputs produce tuples, and
this is now fixed. Only 21.1 alpha versions are affected.