
Lots of openjdk tests failed on test-aws-rhel8-x64-1, passed on other machines #2360

Closed
sophia-guo opened this issue Oct 18, 2021 · 14 comments


@sophia-guo

Quite a few tests failed on test-aws-rhel8-x64-1.

Test_openjdk17_hs_extended.openjdk_x86-64_linux_testList_0 ❌ FAILURE ❌
jvm_compiler_1 => deep history 0/1 passed | possible issues
jdk_jmx_0 => deep history 0/1 passed | possible issues
jdk_net_0 => deep history 1/2 passed | possible issues
jdk_tools_0 => deep history 1/2 passed | possible issues
Post Test => deep history 2/3 passed | possible issues
jdk_jfr_1 => deep history 0/1 passed | possible issues
To make it easy for the infrastructure team to repeat and diagnose, please
answer the following questions:

Rerunning jdk_jfr on the same machine produced the same failure:
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/359/
Rerunning the tests on different machines passed:
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/361/

Rerunning the other tests on other machines also passed:
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/360/
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/358/

Two others are still waiting for results:
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/362/
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/355/

@sxa
Member

sxa commented Feb 6, 2023

Running the full extended.openjdk suite in Grinder on x64 and s390x RHEL8 systems for comparison.
Also running on the AWS RHEL7 system, as that has had its ci.role.test label removed.
As per a comment elsewhere, I will discuss the results with the internal team at Red Hat to progress this if it turns out to be generic across all RHEL8 systems.

@sxa sxa added this to the 2023-02 (February) milestone Feb 6, 2023
@sxa
Member

sxa commented Feb 8, 2023

AWS RHEL7 stopped after 10h (Re-running with TIME_LIMIT=100)
AWS RHEL8 completed in 8h44
Marist RHEL8 completed in 9h05

@sxa
Member

sxa commented Feb 21, 2023

The full extended.openjdk run on AWS RHEL7 completed in 10h20m, so just over the 10h limit and not much slower than the other two machines. It ran 'green', unlike the two RHEL8 systems.

@sxa
Copy link
Member

sxa commented Feb 21, 2023

Both RHEL8 jobs failed as per #2900, and both also failed sun/security/tools/keytool/NssTest.java.NssTest.

In addition, the Marist machine failed the following:

The compression tests from jdk_imageio_0 fail on a file-length mismatch, so potentially a different compression algorithm is being used. I believe we've seen differences in zLinux compression tests on certain distributions in the past.
The failure in the jlink tests from jdk_tools_0 is not immediately obvious and would require further analysis.

@sxa
Member

sxa commented Feb 21, 2023

Across the runs above, it looks like only three suites (jdk_security3, jdk_imageio and jdk_tools) failed, suggesting the others have been resolved or disabled.

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 8, 2024

Rerunning https://ci.adoptium.net/job/Grinder/8748/console

Only 2 tests fail:

java/net/HttpURLConnection/HttpURLConnectionExpectContinueTest.java.HttpURLConnectionExpectContinueTest
compiler/loopopts/superword/TestMovingLoadBeforeStore.java.TestMovingLoadBeforeStore

Rerunning both tests for 10 iterations:

java/net/HttpURLConnection/HttpURLConnectionExpectContinueTest.java.HttpURLConnectionExpectContinueTest
https://ci.adoptium.net/job/Grinder/8757/console

Rerunning with more heap space (-Xmx1000m): https://ci.adoptium.net/job/Grinder/8759/console

11:40:00  FINE: sun.net.www.MessageHeader@2e212dad10 pairs: {POST / HTTP/1.1: null}{Connection: Close}{Expect: 100-Continue}{Cache-Control: no-cache}{Pragma: no-cache}{User-Agent: Java/17.0.10}{Host: localhost:54321}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Content-type: application/x-www-form-urlencoded}{Content-Length: 52}
11:40:00  java.net.SocketTimeoutException: Read timed out
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288)
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314)
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
11:40:00  	at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)

compiler/loopopts/superword/TestMovingLoadBeforeStore.java.TestMovingLoadBeforeStore
https://ci.adoptium.net/job/Grinder/8761/console

CompileCommand: compileonly compiler/loopopts/superword/TestMovingLoadBeforeStore.test* bool compileonly = true
For random generator using seed: 4204409583826590688
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=4204409583826590688" to command line.
Wrong: 17:-83 vs -84 from -85
Wrong: 18:99 vs 98 from 97
Wrong: 19:-56 vs -57 from -58
Wrong: 20:68 vs 67 from 66
Wrong: 21:-102 vs -103 from -104
...
java.lang.RuntimeException: wrong result for array a in test1
	at compiler.loopopts.superword.TestMovingLoadBeforeStore.verify(TestMovingLoadBeforeStore.java:70)
	at compiler.loopopts.superword.TestMovingLoadBeforeStore.main(TestMovingLoadBeforeStore.java:57)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 9, 2024

Taking a closer look at what the test is actually doing (https://github.com/adoptium/jdk17u/blob/master/test/hotspot/jtreg/compiler/loopopts/superword/TestMovingLoadBeforeStore.java):

            byte[] a_ref = a.clone();
            byte[] a_res = a.clone();

    static void verify(String name, byte[] ref, byte[] res, byte[] orig) {
        boolean fail = false;
        for (int j = 0; j < ref.length; j++) {
            if (ref[j] != res[j]) {
                System.out.println("Wrong: " + j + ":" + ref[j] + " vs " + res[j] + " from " + orig[j]);
                fail = true;
            }
        }
        if (fail) {
            throw new RuntimeException("wrong result for array " + name);
        }
    }

One array is created and then cloned twice using clone(). verify() then walks through the two clones element by element and checks that each ref[j] equals res[j]; the original value orig[j] is only printed for context. If any element differs, the test fails.

Failed test output looks like this:

Wrong: 116:-57 vs -58 from -59
Wrong: 117:-20 vs -21 from -22
Wrong: 118:79 vs 78 from 77

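In each mismatch shown, the reference value is exactly one higher than the test result, which is itself one higher than the original value.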
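The off-by-one pattern can be checked mechanically rather than by eye. A minimal sketch, assuming the "Wrong: j:ref vs res from orig" format printed by verify() above; the class WrongLineDeltas and its deltas() method are hypothetical helpers, not part of the test:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the test): parses the "Wrong: j:ref vs res from orig"
// lines printed by verify() and reports how far ref and res moved from the original byte.
public class WrongLineDeltas {
    // Matches e.g. "Wrong: 116:-57 vs -58 from -59", even with a timestamp prefix.
    private static final Pattern WRONG =
            Pattern.compile("Wrong: (\\d+):(-?\\d+) vs (-?\\d+) from (-?\\d+)");

    // Returns {index, ref - orig, res - orig}.
    static int[] deltas(String line) {
        Matcher m = WRONG.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a Wrong line: " + line);
        }
        int index = Integer.parseInt(m.group(1));
        int ref = Integer.parseInt(m.group(2));
        int res = Integer.parseInt(m.group(3));
        int orig = Integer.parseInt(m.group(4));
        // The arrays hold bytes, so ++ wraps from 127 to -128; casting the
        // difference back to byte undoes that wrap-around.
        return new int[] {index, (byte) (ref - orig), (byte) (res - orig)};
    }

    public static void main(String[] args) {
        String[] lines = {
            "Wrong: 116:-57 vs -58 from -59",
            "Wrong: 117:-20 vs -21 from -22",
            "Wrong: 118:79 vs 78 from 77",
        };
        for (String line : lines) {
            int[] d = deltas(line);
            // prints e.g. "index 116: ref moved by 2, res moved by 1"
            System.out.println("index " + d[0] + ": ref moved by " + d[1] + ", res moved by " + d[2]);
        }
    }
}
```

For all three lines above it reports that ref moved by 2 and res by 1, i.e. the test result is missing exactly one increment relative to the reference.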

            ref1(a_ref, a_ref, i % 2);
            test1(a_res, a_res, i % 2);

These two calls are applied to the cloned arrays before verify() is called. Both functions contain identical code:

    static void test1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE-4; i+=4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    static void ref1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE-4; i+=4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }
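Because both calls pass the same array as a and b, most elements are incremented twice per call, which is why the reference value ends up two above the original. A minimal sketch that runs ref1 on a small zero-filled array makes the per-element counts visible; the class name and RANGE = 16 are illustrative assumptions, not values taken from the test:

```java
import java.util.Arrays;

// Illustrative sketch: count how often ref1 increments each element when
// the same array is passed as both a and b, as in ref1(a_ref, a_ref, i % 2).
public class AliasedIncrements {
    // Small illustrative size; the real test uses its own RANGE.
    static final int RANGE = 16;

    // Body copied from ref1 in the test above.
    static void ref1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE - 4; i += 4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    // Starts from all zeros, so the final values equal the increment counts.
    static byte[] increments(int inv) {
        byte[] arr = new byte[RANGE];
        ref1(arr, arr, inv); // a and b alias the same array
        return arr;
    }

    public static void main(String[] args) {
        // [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0]
        System.out.println("inv=0: " + Arrays.toString(increments(0)));
        // [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0, 0]
        System.out.println("inv=1: " + Arrays.toString(increments(1)));
    }
}
```

With inv = 0 every element the loop touches gets +2; with inv = 1 the first and last touched elements get +1 and the rest +2. One possible reading of the "res moved by 1" results, consistent with the test's name but not confirmed in this thread, is that the compiled test1 loses one of the two increments when a load is moved ahead of the aliased store.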

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 9, 2024

ref1(a_ref, a_ref, i % 2);
test1(a_res, a_res, i % 2);

Removing these two calls allows the test to pass. This rules out clone() as the cause: the mismatch is introduced by these functions.

@sxa
Copy link
Member

sxa commented Feb 9, 2024

Looks like it passed on docker nodes hosted on dockerhost-skytap-ubuntu2204-x64-1, @sxa Any special setup on this machine?

No, it's a pretty basic setup at the moment, as it's the new machine created as part of #3352. There is certainly nothing that I would expect to change anything at that sort of level.

Interesting problem, though. Since it's not specific to one machine, I think we should spin off another issue for it. It's also worth seeing if you can replicate this with a standalone Java test program.
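A standalone program along the lines suggested above might look like the sketch below. It is hypothetical: the class name StandaloneRepro, the iteration count, the seed and the use of java.util.Random (instead of the test library's random generator) are assumptions, and RANGE = 64K bytes is an assumed value that should be checked against the jtreg test.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical standalone reproducer for the TestMovingLoadBeforeStore failure.
// RANGE, the iteration count and the seed are illustrative assumptions.
public class StandaloneRepro {
    static final int RANGE = 1024 * 64;

    // test1 and ref1 have identical bodies, copied from the test.
    static void test1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE - 4; i += 4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    static void ref1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE - 4; i += 4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    // Repeats the clone / ref1 / test1 / compare cycle and returns the
    // number of iterations in which the two results differed.
    static int run(int iterations, long seed) {
        Random random = new Random(seed);
        byte[] a = new byte[RANGE];
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            random.nextBytes(a);         // fresh random input each iteration
            byte[] aRef = a.clone();
            byte[] aRes = a.clone();
            ref1(aRef, aRef, i % 2);     // a and b alias, as in the test
            test1(aRes, aRes, i % 2);
            if (!Arrays.equals(aRef, aRes)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int iterations = 200;
        int failures = run(iterations, 42L);
        System.out.println(failures == 0
                ? "PASS"
                : "FAIL: " + failures + " of " + iterations + " iterations differed");
    }
}
```

One way to run it, mirroring the CompileCommand shown in the Grinder log above, is `java '-XX:CompileCommand=compileonly,StandaloneRepro::test*' StandaloneRepro`, which restricts JIT compilation to test1 so that ref1 stays in the interpreter as a reference. Without that option ref1 may also be compiled, in which case both methods could produce the same wrong answer and the comparison would not catch it.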

@Haroon-Khel
Contributor

The test1 and ref1 functions do the same thing (their code is identical). If I replace the test1 call with a second ref1 call, so it looks like this:

ref1(a_ref, a_ref, i % 2);
ref1(a_res, a_res, i % 2);

The test passes.

But if I replace the ref1 call with a test1 call:

test1(a_ref, a_ref, i % 2);
test1(a_res, a_res, i % 2);

The test fails (the numbers on either side of "vs" should be equal, e.g. -65 vs -65 or -82 vs -82):

Wrong: 44:-65 vs -66 from -67
Wrong: 45:-82 vs -83 from -84
Wrong: 46:-77 vs -78 from -79

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 9, 2024

I'm continuing the investigation of this test failure in #3377, since it affects more than just test-aws-rhel8-x64-1.

@Haroon-Khel
Contributor

Haroon-Khel commented Feb 9, 2024

Since we are out of a release period, I am adding test-aws-rhel8-x64-1 back into the test pool.

Projects
No open projects
Status: Done
Development

No branches or pull requests

3 participants