Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set druid.expressions.useStrictBooleans to true by default #14734

Merged
merged 7 commits into from
Aug 22, 2023

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Aug 2, 2023

Description

Part of #14154, this PR sets 'strict booleans' to be on by default, which homogenizes Druid boolean handling to consistently use LONG types internally and behave in a more SQL compliant manner, particularly with how nulls are handled in correct tri-state logic. This config was introduced in #11184.

Release note

druid.expressions.useStrictBooleans is now enabled by default in the code in Druid 28. If this setting is not explicitly configured in runtime.properties, upon upgrade clusters will take on new behavior with how Druid handles boolean types during ingestion and query processing, consistently using LONG types internally for any boolean values which are ingested and as the output of boolean functions for ingestion transformations and query time operations.

Previous behavior:

  • 100 && 11 -> 11
  • 0.7 || 0.3 -> 0.7
  • 100 && 0 -> 0
  • 'troo' && 'true' -> 'troo'
  • 'troo' || 'true' -> 'true'

New behavior:

  • 100 && 11 -> 1
  • 0.7 || 0.3 -> 1
  • 100 && 0 -> 0
  • 'troo' && 'true' -> 0
  • 'troo' || 'true' -> 1

Logical operators will also behave in an SQL compliant manner, treating nulls as 'unknown' instead of false.

For the "or" operator:

  • true || null, null || true -> 1
  • false || null, null || false, null || null-> null

For the "and" operator:

  • true && null, null && true, null && null -> null
  • false && null, null && false -> 0

To revert to the default behavior present in Druid 27 and older, druid.expressions.useStrictBooleans may be set to false in the runtime.properties of cluster configuration.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • been tested in a test Druid cluster.

@@ -2271,8 +2271,8 @@ Supported query contexts:

|Key|Description|Default|
|---|-----------|-------|
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|false|
|`druid.expressions.allowNestedArrays`|If enabled, Druid array expressions can create nested arrays.|false|
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|true|
Copy link
Contributor

@ektravel ektravel Aug 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|true|
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions. If enabled, all boolean values are either `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes) for more information.|true|

native boolean types, Druid ingests these values as strings if `druid.expressions.useStrictBooleans` is set to `false`
(the default), or longs if set to `true` (for more SQL compatible behavior). Array typed columns can be queried using
native boolean types, Druid ingests these values as longs if `druid.expressions.useStrictBooleans` is set to `true`
(the default), or strings if set to `false`. Array typed columns can be queried using
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(the default), or strings if set to `false`. Array typed columns can be queried using
(the default) or strings if set to `false`. Array typed columns can be queried using

@@ -297,7 +297,7 @@ Supported features:
* constants and identifiers are supported for any column type
* `cast` is supported for numeric and string types
* math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`)
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`, the default)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`, the default)
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types by default

Prior to the 0.23 release of Apache Druid, boolean function expressions have inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false.

After 0.23, while the inconsistent legacy behavior is still the default, it can be optionally be changed by setting `druid.expressions.useStrictBooleans=true`, so that these operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. Additionally,
In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default, so logical operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default, so logical operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`.
In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default. Logical operations treat `null` values as "unknown" for SQL compatible behavior. _all boolean output functions_ output homogeneous `LONG` typed boolean values of `1` for `true` and `0` for `false`.

* `100 && 0` -> `0`
* `'troo' && 'true'` -> `'troo'`
* `'troo' || 'true'` -> `'true'`
* `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`.
* `STRING`: the value `'true'` (case insensitive) is considered `true`, everything else is `false`.

@@ -321,14 +319,7 @@ For the "and" operator:

Druid currently still retains implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values in both modes:
* `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false`
* `LONG` or `DOUBLE`: any value greater than 0 is considered `true`, else `false`.

@@ -337,4 +328,11 @@ SQL compatible behavior:
* `'troo' && 'true'` -> `0`
* `'troo' || 'true'` -> `1`

Prior to the 28 release of Apache Druid, boolean function expressions had inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false. This legacy mode can be enabled by setting `druid.expressions.useStrictBooleans=false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph is confusing because you use past and present tense to refer to the past behavior. I think we should rewrite it so that it follows the same format.

Copy link
Contributor

@ektravel ektravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some suggestions for the docs.

Copy link
Contributor

@ektravel ektravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@abhishekagarwal87
Copy link
Contributor

abhishekagarwal87 commented Aug 12, 2023

didn't review the code line by line but +1 on setting this default to true makes sense. So we avoid issues such as https://www.linen.dev/s/apache-druid/t/10978927/hi-all-i-think-i-find-found-a-calcite-bug-or-something-if-i-

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed just the docs. I haven't look at the production code since others have already reviewed that.

@@ -337,4 +328,11 @@ SQL compatible behavior:
* `'troo' && 'true'` -> `0`
* `'troo' || 'true'` -> `1`

Prior to the 28 release of Apache Druid, boolean function expressions had inconsistent handling of true and false values. The logical 'and' and 'or' operators behaved in a manner that was incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) was enabled. Logical operators would also pass through their input values similar to many scripting languages, and treated `null` as false, which would result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retained their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produced `LONG` typed values of `1` for true and `0` for false. This legacy mode can be enabled by setting `druid.expressions.useStrictBooleans=false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Prior to Druid 28.0.0" is more consistent with how we write this kind of stuff elsewhere.

Also, I think this paragraph would read better if switched into the present tense, and written as if it's describing druid.expressions.useStrictBooleans = false, not as if it's describing Druid prior to 28.0.0. (Still include a mention that the default changed in 28.0.0 though.)

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest patch LGTM.

@clintropolis clintropolis merged commit 194a9c9 into apache:master Aug 22, 2023
@clintropolis clintropolis deleted the strict-boolean-mode-default branch August 22, 2023 07:19
@LakshSingla LakshSingla added this to the 28.0 milestone Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants