-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add optional 'castToType' parameter to 'auto' column schema #15417
Conversation
@@ -107,7 +107,7 @@ public Object getObject() | |||
if (expressionType == null) { | |||
return delegateDimensionSelector.getObject(); | |||
} | |||
return ExprEval.ofType(expressionType, delegateDimensionSelector.getObject()).value(); | |||
return ExprEval.bestEffortOf(delegateDimensionSelector.getObject()).castTo(expressionType).value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the reason behind this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bug i noticed while testing some stuff with MSQ, ofType
assumes that the typing information is absolutely correct and so will do what you tell it, which ends up doing some not great things. For example, without bestEffort/explicit cast, ingesting some column specifieid as ARRAY<STRING>
in an extern in MSQ that has some rows with arrays of nested objects will do something like this
which should instead be a parse exception
@@ -73,7 +73,7 @@ public static DimensionSchema createDimensionSchema( | |||
.getDimensionSchema(capabilities); | |||
} | |||
|
|||
return new AutoTypeColumnSchema(column); | |||
return new AutoTypeColumnSchema(column, type != null && type.is(ValueType.COMPLEX) ? type : null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be
type != null && !type.is(ValueType.COMPLEX)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you can't cast to complex types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops yea, or should just leave it null
return new EncodedKeyComponent<>(StructuredData.wrap(eval.value()), effectiveSizeBytes); | ||
} | ||
catch (IAE invalidCast) { | ||
throw new ParseException(eval.asString(), invalidCast, "Cannot coerce to requested type [%s]", castToType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we include the field / column name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems reasonable, but will make sure whatever catches these doesn't already do something like that
processing/src/main/java/org/apache/druid/segment/AutoTypeColumnIndexer.java
Show resolved
Hide resolved
@@ -148,20 +152,26 @@ public void writeMergedValueDictionary(List<IndexableAdapter> adapters) throws I | |||
final FieldTypeInfo.MutableTypeSet rootTypes = mergedFields.get(NestedPathFinder.JSON_PATH_ROOT); | |||
final boolean rootOnly = mergedFields.size() == 1 && rootTypes != null; | |||
|
|||
final ColumnType explicitType; | |||
if (castToType != null && (castToType.isPrimitive() || castToType.isPrimitiveArray())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you actually need the (castToType.isPrimitive() || castToType.isPrimitiveArray())
check here or is it just defensive coding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defensive
processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…5417) * auto but.. with an expected type
…5417) * auto but.. with an expected type
Description
This PR updates the 'auto' schema to add a new
castToType
parameter which accepts aColumnType
to allow forcing the 'auto' indexer to cast values to a specific type. One of the main use cases for this is MSQ array ingestion, which can now specify the exact desired type, but also allows known type columns to use all of the new features provided by 'auto' columns such as numeric indexes and array type columns for any other native ingestion. This only works for primitive/primitive array types, and will be ignored if any other type is used. The column part serde stores an extraenforceLogicalType
which is used to ensure the correct casting schema is created for stuff like auto-compaction.Release note
tbd
This PR has: