Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IS [NOT] DISTINCT FROM to SQL and join matchers. #14976

Merged
merged 8 commits into from
Sep 20, 2023

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Sep 13, 2023

Changes:

  1. Add "isdistinctfrom" and "notdistinctfrom" native expressions.

  2. Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
    when generating expressions, and is treated the same as equals and
    not-equals when generating native filters on literals.

  3. Update join matchers to have an "includeNull" parameter that determines
    whether we are operating in "equals" mode or "is not distinct from"
    mode.

Changes:

1) Add "isdistinctfrom" and "notdistinctfrom" native expressions.

2) Add "IS [NOT] DISTINCT FROM" to SQL. It uses the new native expressions
   when generating expressions, and is treated the same as equals and
   not-equals when generating native filters on literals.

3) Update join matchers to have an "includeNull" parameter that determines
   whether we are operating in "equals" mode or "is not distinct from"
   mode.
Comment on lines 2228 to 2229
* SQL function "x IS NOT DISTINCT FROM y". Very similar to "x = y", i.e. {@link BinEqExpr}, except this function
* never returns null, and this function considers NULL itself to be not-distinct-from NULL.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function considers NULL itself to be not-distinct-from NULL - this part is a bit unclear. How about
this function considers NULL as a value

Copy link
Contributor Author

@gianm gianm Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe adding an example helps too. I changed it to this:

  /**
   * SQL function "x IS NOT DISTINCT FROM y". Very similar to "x = y", i.e. {@link BinEqExpr}, except this function
   * never returns null, and this function considers NULL as a value, so NULL itself is not-distinct-from NULL. For
   * example: `x == null` returns `null` in SQL-compatible null handling mode, but `notdistinctfrom(x, null)` is
   * true if `x` is null.
   */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

@Override
public String name()
{
return "notdistinctfrom";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "notdistinctfrom";
return "is_not_distinct_from";

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, it is consistent with isnull/notnull

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's consistent with the SQL expression though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was modeling the names after isnull and notnull. Like,

  • SQL IS NULL -> native isnull
  • SQL IS NOT NULL -> native notnull (drop the "is")

I would prefer to keep this whole family of functions consistent with each other. IMO if we also want consistency with SQL then we should make new aliases for isnull and notnull too. That'd be fine by me, fwiw, although I don't personally feel it's necessary.

Comment on lines +2257 to +2258
if (leftVal.isNumericNull() || rightVal.isNumericNull()) {
return ExprEval.ofLongBoolean(leftVal.isNumericNull() && rightVal.isNumericNull());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why this wasn't done if the comparison type was Long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it doesn't really matter because its covered by the leftVal.value() == null || rightVal.value == null check that happens before we get here.

This default case is also handling array and complex types, which .. is probably not super cool since the former always claims to be numeric null unless it contains only a single element which can be converted to or is a number, and the latter always claims to be numeric null, but maybe ok if we document the behavior doesn't work with these types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this part was weird too, but didn't change it (just copied and adapted the code for ==). I added this comment to make it clear where the code comes from:

      // Code copied and adapted from BinaryBooleanOpExprBase and BinEqExpr.
      // The code isn't shared due to differences in code structure: BinaryBooleanOpExprBase + BinEqExpr have logic
      // interleaved between parent and child class, but we can't use BinaryBooleanOpExprBase as a parent here, because
      // (a) this is a function, not an expr; and (b) our logic for handling and returning nulls is different from most
      // binary exprs, where null in means null out.

@Override
public String name()
{
return "isdistinctfrom";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "isdistinctfrom";
return "is_distinct_from";

@Override
public String name()
{
return "notdistinctfrom";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, it is consistent with isnull/notnull

Comment on lines +2257 to +2258
if (leftVal.isNumericNull() || rightVal.isNumericNull()) {
return ExprEval.ofLongBoolean(leftVal.isNumericNull() && rightVal.isNumericNull());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it doesn't really matter because its covered by the leftVal.value() == null || rightVal.value == null check that happens before we get here.

This default case is also handling array and complex types, which .. is probably not super cool since the former always claims to be numeric null unless it contains only a single element which can be converted to or is a number, and the latter always claims to be numeric null, but maybe ok if we document the behavior doesn't work with these types.

case LONG:
return ExprEval.ofLongBoolean(leftVal.asLong() == rightVal.asLong());
case DOUBLE:
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, the comparison expressions all handle ARRAY types correctly, should we add it here too? See evalArray methods on the various BinaryBooleanOpExprBase implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add.

- Add ARRAY handling to "notdistinctfrom" and "isdistinctfrom".
- Include null in pushed-down filters when using "notdistinctfrom" in a join.

Other changes:
- Adjust join filter analyzer to more explicitly use InDimFilter's ValuesSets,
  relying less on remembering to get it right to avoid copies.
@gianm
Copy link
Contributor Author

gianm commented Sep 14, 2023

Pushed a new commit that does:

@LakshSingla LakshSingla removed the MSQ label Sep 15, 2023
@vogievetsky vogievetsky added the Needs web console change Backend API changes that would benefit from frontend support in the web console label Sep 15, 2023
@gianm gianm added this to the 28.0 milestone Sep 19, 2023
@gianm
Copy link
Contributor Author

gianm commented Sep 19, 2023

As mentioned on #14978, we'll either need this patch merged prior to the next release, or we'll need the IS NOT DISTINCT FROM doc stuff in #14978 reverted.

@github-actions github-actions bot added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Sep 19, 2023
@gianm gianm merged commit 823f620 into apache:master Sep 20, 2023
@gianm gianm deleted the sql-is-not-distinct-from branch September 20, 2023 17:44
@vogievetsky vogievetsky removed the Needs web console change Backend API changes that would benefit from frontend support in the web console label Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants