-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
set druid.expressions.useStrictBooleans to true by default #14734
set druid.expressions.useStrictBooleans to true by default #14734
Conversation
docs/configuration/index.md
Outdated
@@ -2271,8 +2271,8 @@ Supported query contexts: | |||
|
|||
|Key|Description|Default| | |||
|---|-----------|-------| | |||
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|false| | |||
|`druid.expressions.allowNestedArrays`|If enabled, Druid array expressions can create nested arrays.|false| | |||
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|true| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|true| | |
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions. If enabled, all boolean values are either `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes) for more information.|true| |
docs/ingestion/schema-design.md
Outdated
native boolean types, Druid ingests these values as strings if `druid.expressions.useStrictBooleans` is set to `false` | ||
(the default), or longs if set to `true` (for more SQL compatible behavior). Array typed columns can be queried using | ||
native boolean types, Druid ingests these values as longs if `druid.expressions.useStrictBooleans` is set to `true` | ||
(the default), or strings if set to `false`. Array typed columns can be queried using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the default), or strings if set to `false`. Array typed columns can be queried using | |
(the default) or strings if set to `false`. Array typed columns can be queried using |
docs/querying/math-expr.md
Outdated
@@ -297,7 +297,7 @@ Supported features: | |||
* constants and identifiers are supported for any column type | |||
* `cast` is supported for numeric and string types | |||
* math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types | |||
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`) | |||
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`, the default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`, the default) | |
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types by default |
docs/querying/math-expr.md
Outdated
Prior to the 0.23 release of Apache Druid, boolean function expressions have inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false. | ||
|
||
After 0.23, while the inconsistent legacy behavior is still the default, it can be optionally be changed by setting `druid.expressions.useStrictBooleans=true`, so that these operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. Additionally, | ||
In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default, so logical operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default, so logical operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. | |
In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default. Logical operations treat `null` values as "unknown" for SQL compatible behavior. _all boolean output functions_ output homogeneous `LONG` typed boolean values of `1` for `true` and `0` for `false`. |
docs/querying/math-expr.md
Outdated
* `100 && 0` -> `0` | ||
* `'troo' && 'true'` -> `'troo'` | ||
* `'troo' || 'true'` -> `'true'` | ||
* `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`. | |
* `STRING`: the value `'true'` (case insensitive) is considered `true`, everything else is `false`. |
docs/querying/math-expr.md
Outdated
@@ -321,14 +319,7 @@ For the "and" operator: | |||
|
|||
Druid currently still retains implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values in both modes: | |||
* `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false` | |
* `LONG` or `DOUBLE`: any value greater than 0 is considered `true`, else `false`. |
docs/querying/math-expr.md
Outdated
@@ -337,4 +328,11 @@ SQL compatible behavior: | |||
* `'troo' && 'true'` -> `0` | |||
* `'troo' || 'true'` -> `1` | |||
|
|||
Prior to the 28 release of Apache Druid, boolean function expressions had inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false. This legacy mode can be enabled by setting `druid.expressions.useStrictBooleans=false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph is confusing because you use past and present tense to refer to the past behavior. I think we should rewrite it so that it follows the same format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some suggestions for the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
didn't review the code line by line but +1 on setting this default to true makes sense. So we avoid issues such as https://www.linen.dev/s/apache-druid/t/10978927/hi-all-i-think-i-find-found-a-calcite-bug-or-something-if-i- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed just the docs. I haven't look at the production code since others have already reviewed that.
docs/querying/math-expr.md
Outdated
@@ -337,4 +328,11 @@ SQL compatible behavior: | |||
* `'troo' && 'true'` -> `0` | |||
* `'troo' || 'true'` -> `1` | |||
|
|||
Prior to the 28 release of Apache Druid, boolean function expressions had inconsistent handling of true and false values. The logical 'and' and 'or' operators behaved in a manner that was incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) was enabled. Logical operators would also pass through their input values similar to many scripting languages, and treated `null` as false, which would result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retained their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produced `LONG` typed values of `1` for true and `0` for false. This legacy mode can be enabled by setting `druid.expressions.useStrictBooleans=false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Prior to Druid 28.0.0" is more consistent with how we write this kind of stuff elsewhere.
Also, I think this paragraph would read better if switched into the present tense, and written as if it's describing druid.expressions.useStrictBooleans = false
, not as if it's describing Druid prior to 28.0.0. (Still include a mention that the default changed in 28.0.0 though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest patch LGTM.
Description
Part of #14154, this PR sets 'strict booleans' to be on by default, which homogenizes Druid boolean handling to consistently use
LONG
types internally and behave in a more SQL compliant manner, particularly with how nulls are handled in correct tri-state logic. This config was introduced in #11184.Release note
druid.expressions.useStrictBooleans
is now enabled by default in the code in Druid 28. If this setting is not explicitly configured in runtime.properties, upon upgrade clusters will take on new behavior with how Druid handles boolean types during ingestion and query processing, consistently usingLONG
types internally for any boolean values which are ingested and as the output of boolean functions for ingestion transformations and query time operations.Previous behavior:
100 && 11
->11
0.7 || 0.3
->0.7
100 && 0
->0
'troo' && 'true'
->'troo'
'troo' || 'true'
->'true'
New behavior:
100 && 11
->1
0.7 || 0.3
->1
100 && 0
->0
'troo' && 'true'
->0
'troo' || 'true'
->1
Logical operators will also behave in an SQL compliant manner, treating nulls as 'unknown' instead of false.
For the "or" operator:
true || null
,null || true
->1
false || null
,null || false
,null || null
->null
For the "and" operator:
true && null
,null && true
,null && null
->null
false && null
,null && false
->0
To revert to the default behavior present in Druid 27 and older,
druid.expressions.useStrictBooleans
may be set tofalse
in the runtime.properties of cluster configuration.This PR has: