Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IPAddress java library as dependency and migrate IPv4 functions to use the new library. #11634

Merged
merged 25 commits into from
May 12, 2022

Conversation

abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Aug 25, 2021

This change-set does the following:

  • Include a new IPAddress java library dependency to handle IP addresses.
  • Migrate the IPv4 address functions -- IPV4_MATCH(), IPV4_PARSE(), IPV4_STRINGIFY() to use the above package inet.ipaddr instead of the org.apache.commons.net, com.google.common.net and java.net packages.

Rationale:

  • The org.apache.commons.net.util.SubnetUtils class doesn't support IPv6 addresses yet. Please see the open JIRA https://issues.apache.org/jira/browse/NET-405. In order to support IPV6 address functions, going down the path of writing custom regexes, parser and longest string matching functions can be tricky given the various IPv6 address formats. The newly added ipaddress library supports these right out of the box, so we could leverage that.
  • In the future, if there is a desire to expose the IP address parsing options as run time context parameters or similar, it can be plumbed in relatively easily since the java library provides flexibility there.

Next steps:
If this looks good, I will go ahead and make an attempt at adding the IPv6 address functions. Open to suggestions. Thanks!

Benchmark comparison between the new ipaddress package and the old subnet utils package:

IPv4AddressBenchmark.parseStringUsingIpAddr                        10  avgt    5     6185.716 ±    220.531  ns/op
IPv4AddressBenchmark.parseStringUsingIpAddr                       100  avgt    5    64528.912 ±  10058.256  ns/op
IPv4AddressBenchmark.parseStringUsingIpAddr                      1000  avgt    5   617576.564 ± 129451.314  ns/op

IPv4AddressBenchmark.parseStringUsingSubnetUtils                   10  avgt    5     7972.548 ±   3246.404  ns/op
IPv4AddressBenchmark.parseStringUsingSubnetUtils                  100  avgt    5    51201.575 ±  68695.599  ns/op
IPv4AddressBenchmark.parseStringUsingSubnetUtils                 1000  avgt    5   691529.605 ± 704383.570  ns/op

IPv4AddressBenchmark.parseLongUsingIpAddr                          10  avgt    5      258.954 ±      2.800  ns/op
IPv4AddressBenchmark.parseLongUsingIpAddr                         100  avgt    5     2466.993 ±      7.883  ns/op
IPv4AddressBenchmark.parseLongUsingIpAddr                        1000  avgt    5    24735.502 ±    228.493  ns/op

IPv4AddressBenchmark.parseLongUsingSubnetUtils                     10  avgt    5        8.147 ±      0.549  ns/op
IPv4AddressBenchmark.parseLongUsingSubnetUtils                    100  avgt    5       57.435 ±      2.224  ns/op
IPv4AddressBenchmark.parseLongUsingSubnetUtils                   1000  avgt    5      566.308 ±     31.257  ns/op


IPv4AddressBenchmark.stringContainsUsingIpAddr                     10  avgt    5    10829.294 ±    929.158  ns/op
IPv4AddressBenchmark.stringContainsUsingIpAddr                    100  avgt    5   105955.838 ±   2047.309  ns/op
IPv4AddressBenchmark.stringContainsUsingIpAddr                   1000  avgt    5  1070301.262 ±  95573.228  ns/op

IPv4AddressBenchmark.stringContainsUsingSubnetUtils                10  avgt    5    14650.519 ±    755.609  ns/op
IPv4AddressBenchmark.stringContainsUsingSubnetUtils               100  avgt    5   151821.759 ±  44261.354  ns/op
IPv4AddressBenchmark.stringContainsUsingSubnetUtils              1000  avgt    5  1508744.485 ±  61303.553  ns/op

IPv4AddressBenchmark.toLongUsingIpAddr                             10  avgt    5        9.470 ±      0.043  ns/op
IPv4AddressBenchmark.toLongUsingIpAddr                            100  avgt    5       62.052 ±      3.588  ns/op
IPv4AddressBenchmark.toLongUsingIpAddr                           1000  avgt    5     2042.493 ±     77.128  ns/op

IPv4AddressBenchmark.toLongUsingSubnetUtils                        10  avgt    5     2014.053 ±    151.281  ns/op
IPv4AddressBenchmark.toLongUsingSubnetUtils                       100  avgt    5    19694.955 ±   6740.819  ns/op
IPv4AddressBenchmark.toLongUsingSubnetUtils                      1000  avgt    5   207336.015 ±   9011.625  ns/op

IPv4AddressBenchmark.longContainsUsingIpAddr                       10  avgt    5     3959.837 ±     11.629  ns/op
IPv4AddressBenchmark.longContainsUsingIpAddr                      100  avgt    5    44261.551 ±    191.061  ns/op
IPv4AddressBenchmark.longContainsUsingIpAddr                     1000  avgt    5   441864.580 ±   1245.393  ns/op

IPv4AddressBenchmark.longContainsUsingSubnetUtils                  10  avgt    5     2104.853 ±     41.862  ns/op
IPv4AddressBenchmark.longContainsUsingSubnetUtils                 100  avgt    5    20934.168 ±    382.555  ns/op
IPv4AddressBenchmark.longContainsUsingSubnetUtils                1000  avgt    5   198465.549 ±   2022.392  ns/op

As we can see above, in most cases, the new ipaddress library does better in most cases. There might be more room for optimization, but looks good overall IMO.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@lgtm-com
Copy link

lgtm-com bot commented Aug 25, 2021

This pull request introduces 1 alert and fixes 2 when merging 85064d2 into 2a658ac - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@lgtm-com
Copy link

lgtm-com bot commented Aug 25, 2021

This pull request introduces 1 alert and fixes 2 when merging bb576d5 into 2a658ac - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@lgtm-com
Copy link

lgtm-com bot commented Aug 26, 2021

This pull request introduces 1 alert and fixes 2 when merging 70fb9d8f6a61f66cbaed1e8124f2a14f0da9913f into 2a658ac - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@lgtm-com
Copy link

lgtm-com bot commented Aug 26, 2021

This pull request introduces 1 alert and fixes 2 when merging 09cb838 into 2a658ac - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@abhishekrb19
Copy link
Contributor Author

hi @jihoonson, @asdf2014, the build is failing in the stage 2 starting with these jobs:

32932.42 (Compile=openjdk8, Run=openjdk8) batch index integration test
32932.43 (Compile=openjdk8, Run=openjdk8) batch index integration test with Indexer
...

It appears that mysql is not installed/found.

/bin/sh: 1: /etc/init.d/mysql: not found

I see a few jobs as part of other pull requests fail due to the same reason as well. Is this a known issue? Thanks!

@asdf2014
Copy link
Member

Hi @abhishekrb19 , we're on it. Please refer to #11638.

@lgtm-com
Copy link

lgtm-com bot commented Aug 30, 2021

This pull request introduces 1 alert and fixes 2 when merging 68acd77 into a096888 - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@abhishekrb19
Copy link
Contributor Author

@asdf2014, @jihoonson, @clintropolis, this change-set is ready for review. Please take a look whenever you get a chance. Thanks!

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very sorry for the delayed review!

this does seem like a nicer and more flexible libary 👍

Have you had a chance to measure if there is any performance difference after the changes in this PR? I did a quick look and didn't spot any obvious pre-existing benchmarks for the ipv4 functions, but there might be something in https://github.com/apache/druid/tree/master/benchmarks/src/test/java/org/apache/druid/benchmark to either modify or model a new benchmark from, to make sure there is no performance regression by making this change.

Expr arg = args.get(0);

class IPv4AddressMatchExpr extends ExprMacroTable.BaseScalarUnivariateMacroFunctionExpr
{
private final SubnetUtils.SubnetInfo subnetInfo;
private final IPAddressString subnetString;
Copy link
Member

@clintropolis clintropolis Oct 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this will be checked every time we check a match, should we maybe store both IPAddressString and IPv4Address so we can do whichever contains check is more efficient? (assuming we split long and string handling to not both convert to IPv4Address).

Copy link

@seancfoley seancfoley Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPAddressString containment checks are optimized to be efficient (in some cases more efficient than going straight to IPv4Address). So you could use IPv4Address or IPAddressString, but I'd avoid storing both, there's not much benefit to that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with @seancfoley, the containment checks are already efficient in most cases. So I'd will leave it as this.

@@ -104,19 +105,21 @@ public ExprEval eval(final ObjectBinding bindings)

private boolean isStringMatch(String stringValue)
{
return IPv4AddressExprUtils.isValidAddress(stringValue) && subnetInfo.isInRange(stringValue);
IPv4Address iPv4Address = IPv4AddressExprUtils.parse(stringValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this parse to an IPAddressString instead of an IPv4Address so it can be checked against the subnetString directly?

Copy link

@seancfoley seancfoley Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you intend to support IPv6 here at some point in time, you'd want to use IPAddressString here (or possibly IPAddress, super class of IPv4Address and IPv6Address). But since this code here is currently IPv4-specific at this time, you could stick with IPv4Address.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think IPv6 would be a separate class on its own. As things stand now, IPv4Address works well. We can perhaps revisit this when we have the IPv6 functionality in place. :)

@lgtm-com
Copy link

lgtm-com bot commented Apr 12, 2022

This pull request introduces 1 alert and fixes 2 when merging f2b3069 into 2c79d28 - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@abhishekrb19
Copy link
Contributor Author

@clintropolis, sorry the delayed response. I've added a benchmark and the results are in the PR summary now. Please let me know when you've a chance to take a look. Thanks!

@lgtm-com
Copy link

lgtm-com bot commented Apr 13, 2022

This pull request introduces 1 alert and fixes 2 when merging 34d4769 into 2c79d28 - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@lgtm-com
Copy link

lgtm-com bot commented Apr 13, 2022

This pull request introduces 1 alert and fixes 2 when merging 4b97c16 into f24e9c6 - view on LGTM.com

new alerts:

  • 1 for User-controlled data in numeric cast

fixed alerts:

  • 2 for User-controlled data in numeric cast

@abhishekrb19
Copy link
Contributor Author

CI is failing at the indexing modules test job (https://app.travis-ci.com/github/apache/druid/jobs/567005675). The failure seems unrelated to my changes here in this PR and I see similar failures in other PRs.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 very sorry for the delay finishing review

Thanks for collecting the measurements, the numbers mostly look good, with the exception of ipv4_match against LONG typed inputs, but I think the positives of this library outweigh this performance impact, and it doesn't seem worth the trouble to use both libraries.

I imagine this match performance could be improved one way or another (maybe just do bitwise operations directly?) but I think its fine to do this as future work.

@clintropolis clintropolis merged commit 9177515 into apache:master May 12, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
@gl gl mentioned this pull request Aug 25, 2023
@abhishekrb19 abhishekrb19 deleted the ipnet_package branch October 31, 2023 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants