Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set druid.expressions.useStrictBooleans to true by default #14734

Merged
merged 7 commits into from
Aug 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ public class SqlExpressionBenchmark

static {
NullHandling.initializeForTests();
ExpressionProcessing.initializeForStrictBooleansTests(true);
ExpressionProcessing.initializeForTests();
}

private static final DruidProcessingConfig PROCESSING_CONFIG = new DruidProcessingConfig()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,7 @@ public class SqlNestedDataBenchmark

static {
NullHandling.initializeForTests();
ExpressionProcessing.initializeForStrictBooleansTests(true);
ExpressionProcessing.initializeForTests();
}

private static final DruidProcessingConfig PROCESSING_CONFIG = new DruidProcessingConfig()
Expand Down
4 changes: 2 additions & 2 deletions docs/configuration/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -2288,8 +2288,8 @@ Supported query contexts:

|Key|Description|Default|
|---|-----------|-------|
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values will be either a `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes)|false|
|`druid.expressions.allowNestedArrays`|If enabled, Druid array expressions can create nested arrays.|false|
|`druid.expressions.useStrictBooleans`|Controls the behavior of Druid boolean operators and functions, if set to `true` all boolean values are either `1` or `0`. See [expression documentation](../querying/math-expr.md#logical-operator-modes) for more information.|true|
|`druid.expressions.allowNestedArrays`|If enabled, Druid array expressions can create nested arrays.|true|
### Router

#### Router Process Configs
Expand Down
1 change: 0 additions & 1 deletion docs/development/experimental-features.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,4 +57,3 @@ Note that this document does not track the status of contrib extensions, all of

- [Configuration reference](../configuration/index.md)
- `CLOSED_SEGMENTS_SINKS` mode
- Expression processing configuration `druid.expressions.allowNestedArrays`
4 changes: 2 additions & 2 deletions docs/ingestion/schema-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -258,8 +258,8 @@ You can have Druid infer the schema and types for your data partially or fully b
When performing type-aware schema discovery, Druid can discover all of the columns of your input data (that aren't in
the exclusion list). Druid automatically chooses the most appropriate native Druid type among `STRING`, `LONG`,
`DOUBLE`, `ARRAY<STRING>`, `ARRAY<LONG>`, `ARRAY<DOUBLE>`, or `COMPLEX<json>` for nested data. For input formats with
native boolean types, Druid ingests these values as strings if `druid.expressions.useStrictBooleans` is set to `false`
(the default), or longs if set to `true` (for more SQL compatible behavior). Array typed columns can be queried using
native boolean types, Druid ingests these values as longs if `druid.expressions.useStrictBooleans` is set to `true`
(the default) or strings if set to `false`. Array typed columns can be queried using
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested
columns can be queried with the [JSON functions](../querying/sql-json-functions.md).

Expand Down
24 changes: 11 additions & 13 deletions docs/querying/math-expr.md
Original file line number Diff line number Diff line change
Expand Up @@ -299,7 +299,7 @@ Supported features:
* constants and identifiers are supported for any column type
* `cast` is supported for numeric and string types
* math operators: `+`,`-`,`*`,`/`,`%`,`^` are supported for numeric types
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types (if `druid.expressions.useStrictBooleans=true`)
* logical operators: `!`, `&&`, `||`, are supported for string and numeric types
* comparison operators: `=`, `!=`, `>`, `>=`, `<`, `<=` are supported for string and numeric types
* math functions: `abs`, `acos`, `asin`, `atan`, `cbrt`, `ceil`, `cos`, `cosh`, `cot`, `exp`, `expm1`, `floor`, `getExponent`, `log`, `log10`, `log1p`, `nextUp`, `rint`, `signum`, `sin`, `sinh`, `sqrt`, `tan`, `tanh`, `toDegrees`, `toRadians`, `ulp`, `atan2`, `copySign`, `div`, `hypot`, `max`, `min`, `nextAfter`, `pow`, `remainder`, `scalb` are supported for numeric types
* time functions: `timestamp_floor` (with constant granularity argument) is supported for numeric types
Expand All @@ -309,9 +309,7 @@ Supported features:
* other: `parse_long` is supported for numeric and string types

## Logical operator modes
Prior to the 0.23 release of Apache Druid, boolean function expressions have inconsistent handling of true and false values, and the logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values similar to many scripting languages, and treat `null` as false, which can result in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison would produce `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false.

After 0.23, while the inconsistent legacy behavior is still the default, it can be optionally be changed by setting `druid.expressions.useStrictBooleans=true`, so that these operations will allow correctly treating `null` values as "unknown" for SQL compatible behavior, and _all boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`. Additionally,
In Druid 28.0 and later, `druid.expressions.useStrictBooleans=true` is set by default. Logical operations treat `null` values as "unknown" for SQL compatible behavior. _All boolean output functions_ will output 'homogeneous' `LONG` typed boolean values of `1` for `true` and `0` for `false`.

For the "or" operator:
* `true || null`, `null || true`, -> `1`
Expand All @@ -322,15 +320,8 @@ For the "and" operator:
* `false && null`, `null && false` -> `0`

Druid currently still retains implicit conversion of `LONG`, `DOUBLE`, and `STRING` types into boolean values in both modes:
* `LONG` or `DOUBLE` - any value greater than 0 is considered `true`, else `false`
* `STRING` - the value `'true'` (case insensitive) is considered `true`, everything else is `false`.

Legacy behavior:
* `100 && 11` -> `11`
* `0.7 || 0.3` -> `0.3`
* `100 && 0` -> `0`
* `'troo' && 'true'` -> `'troo'`
* `'troo' || 'true'` -> `'true'`
* `LONG` or `DOUBLE`: any value greater than 0 is considered `true`, else `false`.
* `STRING`: the value `'true'` (case insensitive) is considered `true`, everything else is `false`.

SQL compatible behavior:
* `100 && 11` -> `1`
Expand All @@ -339,4 +330,11 @@ SQL compatible behavior:
* `'troo' && 'true'` -> `0`
* `'troo' || 'true'` -> `1`

Prior to Druid 28.0.0, `druid.expressions.useStrictBooleans=false` was the default. In this mode, boolean function expressions have inconsistent handling of true and false values. The logical 'and' and 'or' operators behave in a manner that is incompatible with SQL, even if SQL compatible null handling mode (`druid.generic.useDefaultValueForNull=false`) is enabled. Logical operators also pass through their input values, similar to many scripting languages, and treat `null` as false, which results in some rather strange behavior. Other boolean operations, such as comparisons and equality, retain their input types (e.g. `DOUBLE` comparison produces `1.0` for true and `0.0` for false), while many other boolean functions strictly produce `LONG` typed values of `1` for true and `0` for false. This legacy mode can still be enabled by setting `druid.expressions.useStrictBooleans=false`.

Legacy behavior:
* `100 && 11` -> `11`
* `0.7 || 0.3` -> `0.3`
* `100 && 0` -> `0`
* `'troo' && 'true'` -> `'troo'`
* `'troo' || 'true'` -> `'true'`
6 changes: 3 additions & 3 deletions docs/querying/sql-data-types.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,13 +153,13 @@ values are treated as zeroes. This was the default prior to Druid 28.0.0.
## Boolean logic

The [`druid.expressions.useStrictBooleans`](../configuration/index.md#expression-processing-configurations)
runtime property controls Druid's boolean logic mode. For the most SQL compliant behavior, set this to `true`.
runtime property controls Druid's boolean logic mode. For the most SQL compliant behavior, set this to `true` (the default).

When `druid.expressions.useStrictBooleans = false` (the default mode), Druid uses three-valued logic for
When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued logic for
[expressions](math-expr.md) evaluation, such as `expression` virtual columns or `expression` filters.
However, even in this mode, Druid uses two-valued logic for filter types other than `expression`.

When `druid.expressions.useStrictBooleans = true` (legacy mode), Druid uses two-valued logic.
When `druid.expressions.useStrictBooleans = false` (legacy mode), Druid uses two-valued logic.

## Nested columns

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,71 +58,11 @@ public class InputSourceSamplerDiscoveryTest extends InitializedNullHandlingTest
private InputSourceSampler inputSourceSampler = new InputSourceSampler(OBJECT_MAPPER);

@Test
public void testDiscoveredTypes()
public void testDiscoveredTypesNonStrictBooleans()
{
final InputSource inputSource = new InlineInputSource(Strings.join(STR_JSON_ROWS, '\n'));
final SamplerResponse response = inputSourceSampler.sample(
inputSource,
new JsonInputFormat(null, null, null, null, null),
new DataSchema(
"test",
new TimestampSpec("t", null, null),
DimensionsSpec.builder().useSchemaDiscovery(true).build(),
null,
null,
null
),
null
);

Assert.assertEquals(6, response.getNumRowsRead());
Assert.assertEquals(5, response.getNumRowsIndexed());
Assert.assertEquals(6, response.getData().size());
Assert.assertEquals(
ImmutableList.of(
new StringDimensionSchema("string"),
new LongDimensionSchema("long"),
new DoubleDimensionSchema("double"),
new StringDimensionSchema("bool"),
new StringDimensionSchema("variant"),
new AutoTypeColumnSchema("array"),
new AutoTypeColumnSchema("nested")
),
response.getLogicalDimensions()
);

Assert.assertEquals(
ImmutableList.of(
new AutoTypeColumnSchema("string"),
new AutoTypeColumnSchema("long"),
new AutoTypeColumnSchema("double"),
new AutoTypeColumnSchema("bool"),
new AutoTypeColumnSchema("variant"),
new AutoTypeColumnSchema("array"),
new AutoTypeColumnSchema("nested")
),
response.getPhysicalDimensions()
);
Assert.assertEquals(
RowSignature.builder()
.addTimeColumn()
.add("string", ColumnType.STRING)
.add("long", ColumnType.LONG)
.add("double", ColumnType.DOUBLE)
.add("bool", ColumnType.STRING)
.add("variant", ColumnType.STRING)
.add("array", ColumnType.LONG_ARRAY)
.add("nested", ColumnType.NESTED_DATA)
.build(),
response.getLogicalSegmentSchema()
);
}

@Test
public void testDiscoveredTypesStrictBooleans()
{
try {
ExpressionProcessing.initializeForStrictBooleansTests(true);
ExpressionProcessing.initializeForStrictBooleansTests(false);
final InputSource inputSource = new InlineInputSource(Strings.join(STR_JSON_ROWS, '\n'));
final SamplerResponse response = inputSourceSampler.sample(
inputSource,
Expand All @@ -146,7 +86,7 @@ public void testDiscoveredTypesStrictBooleans()
new StringDimensionSchema("string"),
new LongDimensionSchema("long"),
new DoubleDimensionSchema("double"),
new LongDimensionSchema("bool"),
new StringDimensionSchema("bool"),
new StringDimensionSchema("variant"),
new AutoTypeColumnSchema("array"),
new AutoTypeColumnSchema("nested")
Expand All @@ -172,7 +112,7 @@ public void testDiscoveredTypesStrictBooleans()
.add("string", ColumnType.STRING)
.add("long", ColumnType.LONG)
.add("double", ColumnType.DOUBLE)
.add("bool", ColumnType.LONG)
.add("bool", ColumnType.STRING)
.add("variant", ColumnType.STRING)
.add("array", ColumnType.LONG_ARRAY)
.add("nested", ColumnType.NESTED_DATA)
Expand All @@ -185,6 +125,67 @@ public void testDiscoveredTypesStrictBooleans()
}
}

@Test
public void testDiscoveredTypesStrictBooleans()
{
final InputSource inputSource = new InlineInputSource(Strings.join(STR_JSON_ROWS, '\n'));
final SamplerResponse response = inputSourceSampler.sample(
inputSource,
new JsonInputFormat(null, null, null, null, null),
new DataSchema(
"test",
new TimestampSpec("t", null, null),
DimensionsSpec.builder().useSchemaDiscovery(true).build(),
null,
null,
null
),
null
);

Assert.assertEquals(6, response.getNumRowsRead());
Assert.assertEquals(5, response.getNumRowsIndexed());
Assert.assertEquals(6, response.getData().size());
Assert.assertEquals(
ImmutableList.of(
new StringDimensionSchema("string"),
new LongDimensionSchema("long"),
new DoubleDimensionSchema("double"),
new LongDimensionSchema("bool"),
new StringDimensionSchema("variant"),
new AutoTypeColumnSchema("array"),
new AutoTypeColumnSchema("nested")
),
response.getLogicalDimensions()
);

Assert.assertEquals(
ImmutableList.of(
new AutoTypeColumnSchema("string"),
new AutoTypeColumnSchema("long"),
new AutoTypeColumnSchema("double"),
new AutoTypeColumnSchema("bool"),
new AutoTypeColumnSchema("variant"),
new AutoTypeColumnSchema("array"),
new AutoTypeColumnSchema("nested")
),
response.getPhysicalDimensions()
);
Assert.assertEquals(
RowSignature.builder()
.addTimeColumn()
.add("string", ColumnType.STRING)
.add("long", ColumnType.LONG)
.add("double", ColumnType.DOUBLE)
.add("bool", ColumnType.LONG)
.add("variant", ColumnType.STRING)
.add("array", ColumnType.LONG_ARRAY)
.add("nested", ColumnType.NESTED_DATA)
.build(),
response.getLogicalSegmentSchema()
);
}

@Test
public void testTypesClassicDiscovery()
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -561,9 +561,6 @@ public static ExprEval ofType(@Nullable ExpressionType type, @Nullable Object va
return ofDouble((Number) value);
}
if (value instanceof Boolean) {
if (ExpressionProcessing.useStrictBooleans()) {
return ofLongBoolean((Boolean) value);
}
return ofDouble(Evals.asDouble((Boolean) value));
}
if (value instanceof String) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,11 @@ public ExpressionProcessingConfig(
@JsonProperty("homogenizeNullMultiValueStringArrays") @Nullable Boolean homogenizeNullMultiValueStringArrays
)
{
this.useStrictBooleans = getWithPropertyFallbackFalse(useStrictBooleans, NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING);
this.useStrictBooleans = getWithPropertyFallback(
useStrictBooleans,
NULL_HANDLING_LEGACY_LOGICAL_OPS_STRING,
"true"
);
this.processArraysAsMultiValueStrings = getWithPropertyFallbackFalse(
processArraysAsMultiValueStrings,
PROCESS_ARRAYS_AS_MULTIVALUE_STRINGS_CONFIG_STRING
Expand Down Expand Up @@ -78,6 +82,11 @@ public boolean isHomogenizeNullMultiValueStringArrays()

private static boolean getWithPropertyFallbackFalse(@Nullable Boolean value, String property)
{
return value != null ? value : Boolean.valueOf(System.getProperty(property, "false"));
return getWithPropertyFallback(value, property, "false");
}

private static boolean getWithPropertyFallback(@Nullable Boolean value, String property, String fallback)
{
return value != null ? value : Boolean.valueOf(System.getProperty(property, fallback));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -1382,9 +1382,6 @@ public void testBestEffortOf()
// strings
assertBestEffortOf("stringy", ExpressionType.STRING, "stringy");

// by default, booleans are handled as strings
assertBestEffortOf(true, ExpressionType.STRING, "true");
assertBestEffortOf(Arrays.asList(true, false), ExpressionType.STRING_ARRAY, new Object[]{"true", "false"});

assertBestEffortOf(
new byte[]{1, 2, 3, 4},
Expand All @@ -1396,11 +1393,15 @@ public void testBestEffortOf()
assertBestEffortOf(1L, ExpressionType.LONG, 1L);
assertBestEffortOf(1, ExpressionType.LONG, 1L);

// by default, booleans are handled as longs
assertBestEffortOf(true, ExpressionType.LONG, 1L);
assertBestEffortOf(Arrays.asList(true, false), ExpressionType.LONG_ARRAY, new Object[]{1L, 0L});

try {
// in strict boolean mode, they are longs
ExpressionProcessing.initializeForStrictBooleansTests(true);
assertBestEffortOf(true, ExpressionType.LONG, 1L);
assertBestEffortOf(Arrays.asList(true, false), ExpressionType.LONG_ARRAY, new Object[]{1L, 0L});
// in non-strict boolean mode, they are strings
ExpressionProcessing.initializeForStrictBooleansTests(false);
assertBestEffortOf(true, ExpressionType.STRING, "true");
assertBestEffortOf(Arrays.asList(true, false), ExpressionType.STRING_ARRAY, new Object[]{"true", "false"});
}
finally {
// reset
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,7 @@
import org.apache.druid.java.util.common.logger.Logger;
import org.apache.druid.math.expr.vector.ExprEvalVector;
import org.apache.druid.testing.InitializedNullHandlingTest;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

import javax.annotation.Nullable;
Expand Down Expand Up @@ -64,18 +62,6 @@ public class VectorExprSanityTest extends InitializedNullHandlingTest
.put("boolString2", ExpressionType.STRING)
.build();

@BeforeClass
public static void setupTests()
{
ExpressionProcessing.initializeForStrictBooleansTests(true);
}

@AfterClass
public static void teardownTests()
{
ExpressionProcessing.initializeForTests();
}

@Test
public void testUnaryOperators()
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,13 +26,10 @@
import org.apache.druid.java.util.common.logger.Logger;
import org.apache.druid.math.expr.Expr;
import org.apache.druid.math.expr.ExprEval;
import org.apache.druid.math.expr.ExpressionProcessing;
import org.apache.druid.math.expr.ExpressionType;
import org.apache.druid.math.expr.VectorExprSanityTest;
import org.apache.druid.testing.InitializedNullHandlingTest;
import org.joda.time.DateTime;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

import java.util.Map;
Expand Down Expand Up @@ -62,18 +59,6 @@ public class VectorExpressionsSanityTest extends InitializedNullHandlingTest
.put("boolString2", ExpressionType.STRING)
.build();

@BeforeClass
public static void setupTests()
{
ExpressionProcessing.initializeForStrictBooleansTests(true);
}

@AfterClass
public static void teardownTests()
{
ExpressionProcessing.initializeForTests();
}

static void testExpression(String expr, Expr parsed, Map<String, ExpressionType> types)
{
log.debug("[%s]", expr);
Expand Down
Loading