-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add range filtering support for iceberg ingestion #15782
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall! I've left a few minor comments. Also, the docs static CI checks are failing.
docs/ingestion/input-sources.md
Outdated
|
||
|Property|Description|Required| | ||
|--------|-----------|---------| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|--------|-----------|---------| | |
|--------|-----------|---------| |
@JsonProperty("upperOpen") @Nullable Boolean upperOpen | ||
) | ||
{ | ||
Preconditions.checkNotNull(filterColumn, "You must specify a filter column on the range filter"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using DruidException
for these validation checks:
Preconditions.checkNotNull(filterColumn, "You must specify a filter column on the range filter"); | |
if (filterColumn == null) { | |
throw InvalidInput.exception("filterColumn cannot be null."); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can refactor this in a separate PR since the other filters also need to start using DruidException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me, thanks!
) | ||
{ | ||
Preconditions.checkNotNull(filterColumn, "You must specify a filter column on the range filter"); | ||
Preconditions.checkArgument(lower != null || upper != null, "Both lower and upper bounds cannot be empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preconditions.checkArgument(lower != null || upper != null, "Both lower and upper bounds cannot be empty"); | |
if (lower == null && upper == null) { | |
throw InvalidInput.exception("Both lower and upper bounds cannot be null."); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
@Override | ||
public Expression getFilterExpression() | ||
{ | ||
List<Expression> expressions = new ArrayList<>(2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since one of lower
and upper
can be null
List<Expression> expressions = new ArrayList<>(2); | |
List<Expression> expressions = new ArrayList<>(); |
@JsonSubTypes.Type(name = "or", value = IcebergOrFilter.class) | ||
@JsonSubTypes.Type(name = "or", value = IcebergOrFilter.class), | ||
@JsonSubTypes.Type(name = "range", value = IcebergRangeFilter.class) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preconditions.checkNotNull(filterColumn, "You must specify a filter column on the range filter"); | ||
Preconditions.checkArgument(lower != null || upper != null, "Both lower and upper bounds cannot be empty"); | ||
this.filterColumn = filterColumn; | ||
this.lowerOpen = lowerOpen != null ? lowerOpen : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we also add checks to validate if lowerOpen
is specified if lower
is null as it doesn't make sense? Ditto for upperOpen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be checked, but are you suggesting that the ingest should fail in that scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think so. I will leave it up to you though if the stricter validation is required
docs/ingestion/input-sources.md
Outdated
|filterColumn|The column name from the iceberg table schema based on which range filtering needs to happen.|yes| | ||
|lower|Lower bound value to match.|no. At least one of `lower` or `upper` must not be null.| | ||
|upper|Upper bound value to match. |no. At least one of `lower` or `upper` must not be null.| | ||
|lowerOpen|Boolean indicating if lower bound is open in the interval of values defined by the range (">" instead of ">="). |no| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would also be good to specify the defaults explicitly via a separate column or next to the "no" in the required column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after CI. Thanks, @a2l007!
* Add range filtering support for iceberg ingestion * Docs formatting * Spelling
* Add range filtering support for iceberg ingestion * Docs formatting * Spelling
Description
Adds a new iceberg ingestion filter of type
range
which can be used to filter on ranges of column values. It supports lower and/or upper bounds with<
,<=
,>
, and>=
Example spec:
which is equivalent to
45 < price <= 50
This PR also fixes a dependency related issue which may be encountered while talking to Kerberos secured HDFS warehouses.
Release note
Adds a new iceberg ingestion filter of type
range
which can be used to filter on ranges of column values.This PR has: