Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable remaining excluded Timeout* tests in microprofile-rest-client TCK for Java 16+ #19559

Merged
merged 1 commit into from
Aug 23, 2021

Conversation

jaikiran
Copy link
Member

@jaikiran jaikiran commented Aug 21, 2021

Fixes #17093

JDK 13+ re-implemented the internal implementation details of the Socket API as part of https://openjdk.java.net/jeps/353. This implementation change now leads to java.net.NoRouteToHostException instead of the java.net.ConnectException in previous Java versions.

These specific TCK tests which are failing in Java 16+, are intentionally trying to connect to a host/port combination that isn't accessible (http://microprofile.io:1234/null)[1]. In the RestEasy client implementation, Apache HTTP client (4.5.x) gets used and that is the library which deals with socket connections. One part of that code deals with HTTP request retries[2]. This DefaultHttpRequestRetryHandler in the Apache HTTP client library has a logic which decides if a particular exception during the HTTP request handling needs to be retried. As can be seen in [2], it does not retry for a specific set of exception types. java.net.ConnectException is one such exception type where it does not retry. Since until Java 13, this is the exception that gets thrown for Socket.connect(...) calls against this specific URL, those failures aren't retried and the tests complete in a timely fashion. However, in Java 16, since a java.net.NoRouteToHostException gets thrown in this flow and since that exception type isn't marked as "do not retry exception type", this retry handler initiates a retry of this HTTP request. This it does for 3 times (from what I can see in the logs/code). As a result, this leads to an increased "wait time" (due to the wait time for each of the 3 attempts as against just 1 attempt) and the test goes past the expected timeout.

The OpenJDK team, as noted in the JEP as well as the release notes[3], states that efforts have been made to keep this change backward compatible as much as possible, but there are chances that a different type of SocketException might now get thrown. Do note that the new java.net.NoRouteToHostException that is now getting thrown is still a IOException (and SocketException), so it isn't breaking any API contract defined by the Socket.connect(...) javadoc. I'll bring this up separately with the OpenJDK team to see if they want to continue this changed expection type behaviour. Until then, this commit uses the jdk.net.usePlainSocketImpl=true system property as recommended in [3] to use the old implementation of the Socket API. That should help us have these tests running against Java 16+.

In any case, this workaround shouldn't stay for too long, since the real fix/change will either be in JDK itself or in the Apache HTTP client library which needs to decide how it wants to address this change.

[1] https://github.com/eclipse/microprofile-rest-client/blob/master/tck/src/main/java/org/eclipse/microprofile/rest/client/tck/timeout/TimeoutTestBase.java#L41
[2] https://github.com/apache/httpcomponents-client/blob/4.5.x/httpclient/src/main/java/org/apache/http/impl/client/DefaultHttpRequestRetryHandler.java#L102
[3] https://bugs.openjdk.java.net/browse/JDK-8223354

@famod
Copy link
Member

famod commented Aug 21, 2021

Thanks for looking into this and for providing such a thorough explanation!

I suppose if neither JDK nor Apache HTTP client want to change anything, the TCK has to adapt?

Copy link
Member

@famod famod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the profile is relevant for JDK 13+, but I don't think anyone uses JDK 15 or lower to build Quarkus (except 11 of course).

@jaikiran
Copy link
Member Author

I suppose if neither JDK nor Apache HTTP client want to change anything, the TCK has to adapt?

I don't think the TCK should have to do any changes. It already has the necessary timeout "cushions" to give some grace period. The issue, IMO, is in the implementations of the spec, in this case RestEasy which uses Apache HTTP client underneath. So if this isn't fixed in JDK or Apache HTTP client, then I think the place to handle this will be the RestEasy code.

There's now a open discussion about this in the OpenJDK mailing list here https://mail.openjdk.java.net/pipermail/net-dev/2021-August/016409.html

@famod
Copy link
Member

famod commented Aug 22, 2021

I think this is definitely better than the current state!
Given the "global" effect of that property, I was just thinking that it might be better to set it only for the timeout tests, via a separate surefire execution?

@jaikiran
Copy link
Member Author

Hello @famod, I've now updated this PR to run only those tests with that system property.

@gsmet gsmet merged commit 7fb0070 into quarkusio:main Aug 23, 2021
@quarkus-bot quarkus-bot bot added this to the 2.3 - main milestone Aug 23, 2021
@jaikiran jaikiran deleted the 17093 branch August 23, 2021 09:08
@jaikiran
Copy link
Member Author

In any case, this workaround shouldn't stay for too long, since the real fix/change will either be in JDK itself or in the Apache HTTP client library which needs to decide how it wants to address this change.

This issue has now been addressed in Apache HTTP client library apache/httpcomponents-client#311. Whenever they release the next version of the library and we upgrade to it, we should be able to remove the current workaround we have in place.

@famod
Copy link
Member

famod commented Oct 3, 2021

@jaikiran FYI, our EA JDK job that has moved to Java 18 recently is failing "again" at this point because the flag was removed in Java 18: https://bugs.openjdk.java.net/browse/JDK-8253119
Sadly, apache/httpcomponents-client#311 hasn't been released yet.
So I'll have to exclude those tests on Java 18 for now.

@ok2c
Copy link

ok2c commented Oct 6, 2021

@jaikiran Realistically there have not been enough changes in 4.5.x to warrant a release. Please consider moving to 5.1 for more regualr releases.

@jaikiran
Copy link
Member Author

jaikiran commented Oct 7, 2021

Thank you @ok2c.

@gsmet - Just a FYI - this essentially will mean RESTEasy will have to upgrade its internal dependency on Apache HTTPClient from 4.5.x to 5.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable TCKS rest-client tests on JDK 16+
4 participants