-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. #11634
Conversation
This pull request introduces 1 alert and fixes 2 when merging 85064d2 into 2a658ac - view on LGTM.com new alerts:
fixed alerts:
|
This pull request introduces 1 alert and fixes 2 when merging bb576d5 into 2a658ac - view on LGTM.com new alerts:
fixed alerts:
|
This pull request introduces 1 alert and fixes 2 when merging 70fb9d8f6a61f66cbaed1e8124f2a14f0da9913f into 2a658ac - view on LGTM.com new alerts:
fixed alerts:
|
70fb9d8
to
09cb838
Compare
This pull request introduces 1 alert and fixes 2 when merging 09cb838 into 2a658ac - view on LGTM.com new alerts:
fixed alerts:
|
hi @jihoonson, @asdf2014, the build is failing in the stage 2 starting with these jobs:
It appears that mysql is not installed/found. /bin/sh: 1: /etc/init.d/mysql: not found I see a few jobs as part of other pull requests fail due to the same reason as well. Is this a known issue? Thanks! |
Hi @abhishekrb19 , we're on it. Please refer to #11638. |
This pull request introduces 1 alert and fixes 2 when merging 68acd77 into a096888 - view on LGTM.com new alerts:
fixed alerts:
|
@asdf2014, @jihoonson, @clintropolis, this change-set is ready for review. Please take a look whenever you get a chance. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very sorry for the delayed review!
this does seem like a nicer and more flexible libary 👍
Have you had a chance to measure if there is any performance difference after the changes in this PR? I did a quick look and didn't spot any obvious pre-existing benchmarks for the ipv4 functions, but there might be something in https://github.com/apache/druid/tree/master/benchmarks/src/test/java/org/apache/druid/benchmark to either modify or model a new benchmark from, to make sure there is no performance regression by making this change.
Expr arg = args.get(0); | ||
|
||
class IPv4AddressMatchExpr extends ExprMacroTable.BaseScalarUnivariateMacroFunctionExpr | ||
{ | ||
private final SubnetUtils.SubnetInfo subnetInfo; | ||
private final IPAddressString subnetString; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this will be checked every time we check a match, should we maybe store both IPAddressString
and IPv4Address
so we can do whichever contains
check is more efficient? (assuming we split long and string handling to not both convert to IPv4Address
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPAddressString containment checks are optimized to be efficient (in some cases more efficient than going straight to IPv4Address). So you could use IPv4Address or IPAddressString, but I'd avoid storing both, there's not much benefit to that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree with @seancfoley, the containment checks are already efficient in most cases. So I'd will leave it as this.
@@ -104,19 +105,21 @@ public ExprEval eval(final ObjectBinding bindings) | |||
|
|||
private boolean isStringMatch(String stringValue) | |||
{ | |||
return IPv4AddressExprUtils.isValidAddress(stringValue) && subnetInfo.isInRange(stringValue); | |||
IPv4Address iPv4Address = IPv4AddressExprUtils.parse(stringValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this parse to an IPAddressString
instead of an IPv4Address
so it can be checked against the subnetString
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you intend to support IPv6 here at some point in time, you'd want to use IPAddressString here (or possibly IPAddress, super class of IPv4Address and IPv6Address). But since this code here is currently IPv4-specific at this time, you could stick with IPv4Address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think IPv6 would be a separate class on its own. As things stand now, IPv4Address works well. We can perhaps revisit this when we have the IPv6 functionality in place. :)
This pull request introduces 1 alert and fixes 2 when merging f2b3069 into 2c79d28 - view on LGTM.com new alerts:
fixed alerts:
|
@clintropolis, sorry the delayed response. I've added a benchmark and the results are in the PR summary now. Please let me know when you've a chance to take a look. Thanks! |
This pull request introduces 1 alert and fixes 2 when merging 34d4769 into 2c79d28 - view on LGTM.com new alerts:
fixed alerts:
|
This pull request introduces 1 alert and fixes 2 when merging 4b97c16 into f24e9c6 - view on LGTM.com new alerts:
fixed alerts:
|
CI is failing at the indexing modules test job (https://app.travis-ci.com/github/apache/druid/jobs/567005675). The failure seems unrelated to my changes here in this PR and I see similar failures in other PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 very sorry for the delay finishing review
Thanks for collecting the measurements, the numbers mostly look good, with the exception of ipv4_match
against LONG
typed inputs, but I think the positives of this library outweigh this performance impact, and it doesn't seem worth the trouble to use both libraries.
I imagine this match performance could be improved one way or another (maybe just do bitwise operations directly?) but I think its fine to do this as future work.
This change-set does the following:
IPV4_MATCH()
,IPV4_PARSE()
,IPV4_STRINGIFY()
to use the above packageinet.ipaddr
instead of theorg.apache.commons.net
,com.google.common.net
andjava.net
packages.Rationale:
org.apache.commons.net.util.SubnetUtils
class doesn't support IPv6 addresses yet. Please see the open JIRA https://issues.apache.org/jira/browse/NET-405. In order to support IPV6 address functions, going down the path of writing custom regexes, parser and longest string matching functions can be tricky given the various IPv6 address formats. The newly addedipaddress
library supports these right out of the box, so we could leverage that.Next steps:
If this looks good, I will go ahead and make an attempt at adding the IPv6 address functions. Open to suggestions. Thanks!
Benchmark comparison between the new ipaddress package and the old subnet utils package:
As we can see above, in most cases, the new
ipaddress
library does better in most cases. There might be more room for optimization, but looks good overall IMO.This PR has: