-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add bloom filter druid expression #6904
Conversation
…ExpressionVirtualColumn and ExpressionDimFilter and sql expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM 👍 Also, I left few comments.
The bloom filter extension also adds a bloom filter [Druid expression](../../misc/math-expr.html) which shares syntax | ||
with the SQL operator. | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the sql
type for this block
with the SQL operator. | ||
|
||
``` | ||
bloom_filter_test(<expr>, "<serialized_bytes_for_BloomKFilter>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use '<serialized_bytes_for_BloomKFilter>'
instead of "<serialized_bytes_for_BloomKFilter>"
docs/content/misc/math-expr.md
Outdated
@@ -60,6 +60,7 @@ The following built-in functions are available. | |||
|like|like(expr, pattern[, escape]) is equivalent to SQL `expr LIKE pattern`| | |||
|case_searched|case_searched(expr1, result1, \[\[expr2, result2, ...\], else-result\])| | |||
|case_simple|case_simple(expr, value1, result1, \[\[value2, result2, ...\], else-result\])| | |||
|bloom_filter_test|bloom_filter_test(expr,filter) tests the value of 'expr' against 'filter', a bloom filter serialized as a base64 string. See [bloom filter extension](../development/extensions-core/bloom-filter.html) documentation for additional details.| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a blank character between expr
and filter
, e.g., (expr, filter)
|
||
|
||
final String serializedFilter = filterExpr.getLiteralValue().toString(); | ||
final byte[] decoded = BaseEncoding.base64().decode(serializedFilter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use java.util.Base64.getDecoder().decode(serializedFilter)
is a better choice..
FYI, I found some benchmark results using the initial Java 8 release on the Xeon [email protected] Ghz
Name | Encode, 100 bytes | Decode, 100 bytes | Encode, 1000 bytes | Decode, 1000 bytes | Encode, 200000000 bytes | Decode, 200000000 bytes |
---|---|---|---|---|---|---|
JavaXmlImpl | 0.721 sec | 1.067 sec | 0.664 sec | 0.947 sec | 0.689 sec | 0.885 sec |
Java8Impl | 0.808 sec | 1.101 sec | 0.712 sec | 0.913 sec | 0.699 sec | 0.887 sec |
MiGBase64Impl | 0.887 sec | 2.113 sec | 0.788 sec | 1.978 sec | 0.812 sec | 1.928 sec |
IHarderImpl | 1.179 sec | 2.645 sec | 1.012 sec | 2.439 sec | 0.976 sec | 2.431 sec |
ApacheImpl | 4.113 sec | 4.733 sec | 2.308 sec | 2.667 sec | 2.205 sec | 2.552 sec |
GuavaImpl | 3.305 sec | 4.178 sec | 3.12 sec | 3.755 sec | 3.102 sec | 3.744 sec |
SunImpl | 11.153 sec | 8.281 sec | 3.992 sec | 4.533 sec | 3.289 sec | 4.096 sec |
filter.addBytes(null, 0, 0); | ||
} | ||
byte[] bytes = BloomFilterSerializersModule.bloomKFilterToBytes(filter); | ||
String base64 = Base64.encodeBase64String(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, use java.util.Base64.getEncoder().encodeToString(bytes)
would be better.
BloomKFilter filter = new BloomKFilter(1500); | ||
filter.addDouble(20.2); | ||
byte[] bytes = BloomFilterSerializersModule.bloomKFilterToBytes(filter); | ||
String base64 = Base64.encodeBase64String(bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, use java.util.Base64.getEncoder().encodeToString(bytes)
would be better.
Thanks for the review and benchmarks @asdf2014 👍, switched to use |
@clintropolis You are welcome. Maybe we should replace all non JDK Base64 in the project and record it to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I only saw 2 other occurrences I think, both in tests, but probably a good idea to mark forbidden given those benchmarks 👍 |
Nevermind, I see a handful of commons base64 usage in places, so probably some improvements to be had there by making the switch. |
@clintropolis Yep, I think so. |
@asdf2014 It would also probably be worth considering driving these conversions through |
@clintropolis Yep, I agree with you. In addition, I found the above comment from JDK, so that we can also hold the instance of |
@clintropolis I raised up a PR #6913, PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Adds a
bloom_filter_test
Druid expression to allow use inExpressionVirtualColumn
andExpressionDimFilter
, as well as SQL virtual column expression support.