Lots of openjdk tests failed on test-aws-rhel8-x64-1, passed on other machines #2360

sophia-guo · 2021-10-18T18:52:48Z

Quite a few tests failed on test-aws-rhel8-x64-1.

Test_openjdk17_hs_extended.openjdk_x86-64_linux_testList_0 ❌ FAILURE ❌
jvm_compiler_1 => deep history 0/1 passed | possible issues
jdk_jmx_0 => deep history 0/1 passed | possible issues
jdk_net_0 => deep history 1/2 passed | possible issues
jdk_tools_0 => deep history 1/2 passed | possible issues
Post Test => deep history 2/3 passed | possible issues
jdk_jfr_1 => deep history 0/1 passed | possible issues
To make it easy for the infrastructure team to repeat and diagnose, please
answer the following questions:

Rerunning jdk_jfr on the same machine produced the same failure.
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/359/
Rerunning the tests on a different machine passed.
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/361/

Rerunning the other tests on another machine also passed:
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/360/
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/358/

Two other reruns are still waiting for results:
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/362/
https://ci.adoptopenjdk.net/view/work-in-progress/job/grinder_sandbox_new/355/

sxa · 2023-02-06T13:24:43Z

Running full extended.openjdk in Grinder on x64 and s390x RHEL8 systems for comparison.
Also running on the AWS RHEL7 system as that has had its ci.role.test label removed.
As per comment elsewhere I will discuss the results with the internal team at Red Hat to progress this if it turns out to be generic across all RHEL8 systems.

sxa · 2023-02-08T11:32:35Z

AWS RHEL7 stopped after 10h (Re-running with TIME_LIMIT=100)
AWS RHEL8 completed in 8h44
Marist RHEL8 completed in 9h05

sxa · 2023-02-21T09:41:46Z

Full extended.openjdk AWS RHEL7 run completed in 10h20m, so just over the 10h limit and not too much slower than the other two machines. It ran 'green', unlike the two RHEL8 systems.

sxa · 2023-02-21T09:48:45Z

For the two RHEL8 jobs, both failed as per #2900 and sun/security/tools/keytool/NssTest.java.NssTest and sun/security/tools/keytool/NssTest.java.NssTest

In addition the Marist machine failed the following:

Compression tests from jdk_imageio_0 are a file length issue, so potentially a different algorithm is being used. We've seen differences on zLinux compression tests in the past on certain distributions, I believe.
The failure in the jlink tests from jdk_tools_0 is not immediately obvious and would require further analysis.

sxa · 2023-02-21T09:52:32Z

On the above tests it looks like only the three suites jdk_security3, jdk_imageio and jdk_tools have failed, suggesting the others have been resolved or disabled.

Haroon-Khel · 2024-02-08T11:02:48Z

Rerunning https://ci.adoptium.net/job/Grinder/8748/console

Only 2 tests fail

java/net/HttpURLConnection/HttpURLConnectionExpectContinueTest.java.HttpURLConnectionExpectContinueTest
compiler/loopopts/superword/TestMovingLoadBeforeStore.java.TestMovingLoadBeforeStore

Rerunning both tests for 10 iterations

java/net/HttpURLConnection/HttpURLConnectionExpectContinueTest.java.HttpURLConnectionExpectContinueTest
https://ci.adoptium.net/job/Grinder/8757/console

Rerunning with more heap space -Xmx1000m https://ci.adoptium.net/job/Grinder/8759/console

11:40:00  FINE: sun.net.www.MessageHeader@2e212dad10 pairs: {POST / HTTP/1.1: null}{Connection: Close}{Expect: 100-Continue}{Cache-Control: no-cache}{Pragma: no-cache}{User-Agent: Java/17.0.10}{Host: localhost:54321}{Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2}{Content-type: application/x-www-form-urlencoded}{Content-Length: 52}
11:40:00  java.net.SocketTimeoutException: Read timed out
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl.timedRead(NioSocketImpl.java:288)
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:314)
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:355)
11:40:00  	at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:808)
11:40:00  	at java.base/java.net.Socket$SocketInputStream.read(Socket.java:966)

compiler/loopopts/superword/TestMovingLoadBeforeStore.java.TestMovingLoadBeforeStore
https://ci.adoptium.net/job/Grinder/8761/console

CompileCommand: compileonly compiler/loopopts/superword/TestMovingLoadBeforeStore.test* bool compileonly = true
For random generator using seed: 4204409583826590688
To re-run test with same seed value please add "-Djdk.test.lib.random.seed=4204409583826590688" to command line.
Wrong: 17:-83 vs -84 from -85
Wrong: 18:99 vs 98 from 97
Wrong: 19:-56 vs -57 from -58
Wrong: 20:68 vs 67 from 66
Wrong: 21:-102 vs -103 from -104
...
java.lang.RuntimeException: wrong result for array a in test1
	at compiler.loopopts.superword.TestMovingLoadBeforeStore.verify(TestMovingLoadBeforeStore.java:70)
	at compiler.loopopts.superword.TestMovingLoadBeforeStore.main(TestMovingLoadBeforeStore.java:57)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

Haroon-Khel · 2024-02-09T11:38:01Z

Rerunning compiler/loopopts/superword/TestMovingLoadBeforeStore.java on other x64 linux nodes

Haroon-Khel · 2024-02-09T12:21:27Z

Taking a closer look at what the test is actually doing: https://github.com/adoptium/jdk17u/blob/master/test/hotspot/jtreg/compiler/loopopts/superword/TestMovingLoadBeforeStore.java

            byte[] a_ref = a.clone();
            byte[] a_res = a.clone();

    static void verify(String name, byte[] ref, byte[] res, byte[] orig) {
        boolean fail = false;
        for (int j = 0; j < ref.length; j++) {
            if (ref[j] != res[j]) {
                System.out.println("Wrong: " + j + ":" + ref[j] + " vs " + res[j] + " from " + orig[j]);
                fail = true;
            }
        }
        if (fail) {
            throw new RuntimeException("wrong result for array " + name);
        }
    }

One array is created, then cloned twice using clone(). verify() then walks the arrays and compares each element of ref against the matching element of res; orig is only printed for context. If any pair differs, the test fails.

Failed test outputs look like this

Wrong: 116:-57 vs -58 from -59
Wrong: 117:-20 vs -21 from -22
Wrong: 118:79 vs 78 from 77

The ref and res values differ by 1 at each reported index: ref is orig + 2 while res is orig + 1.

            ref1(a_ref, a_ref, i % 2);
            test1(a_res, a_res, i % 2);

The above functions are applied to the cloned arrays before the verify() function. Both functions have identical code

    static void test1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE-4; i+=4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    static void ref1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE-4; i+=4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }
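Because the same array is passed as both a and b, each call increments most elements twice, which is consistent with the ref values being orig + 2 in the failure output. The following is a minimal sketch of that arithmetic, not part of the jtreg test: the class name, the increment() and run() helpers, and the small RANGE of 16 are assumptions chosen for illustration (the real value of RANGE is not shown in this thread).

```java
import java.util.Arrays;

public class AliasedIncrementSketch {
    // Small illustrative size; an assumption, not the value used by the real test.
    static final int RANGE = 16;

    // Same loop body as ref1/test1 quoted above.
    static void increment(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE - 4; i += 4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    // Hypothetical helper: start from zeros and pass the same array as a and b,
    // mirroring the aliased calls ref1(a_ref, a_ref, ...) and test1(a_res, a_res, ...).
    static byte[] run(int inv) {
        byte[] data = new byte[RANGE];
        increment(data, data, inv);
        return data;
    }

    public static void main(String[] args) {
        // inv = 0: both groups of increments hit the same indices, so each touched element ends at 2.
        System.out.println("inv=0: " + Arrays.toString(run(0)));
        // inv = 1: the second group is shifted by one, so the edges end at 1 and the middle at 2.
        System.out.println("inv=1: " + Arrays.toString(run(1)));
    }
}
```

If one of the two increments were dropped for an element, that element would end at orig + 1, which matches the res values reported in the failing runs.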

Haroon-Khel · 2024-02-09T12:28:08Z

https://ci.adoptium.net/job/Grinder/8787/console test-ibmcloud-rhel7-x64-1 Failed
https://ci.adoptium.net/job/Grinder/8788/console test-docker-debian12-x64-1 Passed
https://ci.adoptium.net/job/Grinder/8789/console test-equinix_esxi-ubuntu2204-x64-1 Failed
https://ci.adoptium.net/job/Grinder/8790/console test-docker-ubi8-x64-2 Failed
https://ci.adoptium.net/job/Grinder/8791/console test-docker-ubuntu2204-x64-4 Passed
https://ci.adoptium.net/job/Grinder/8792/console test-ibmcloud-rhel6-x64-1 Failed
https://ci.adoptium.net/job/Grinder/8795/console test-docker-ubuntu2204-x64-5 Passed
https://ci.adoptium.net/job/Grinder/8796/console test-docker-debian11-x64-2 Failed
https://ci.adoptium.net/job/Grinder/8797/console test-docker-fedora35-x64-2 Failed
https://ci.adoptium.net/job/Grinder/8799/console test-docker-ubi8-x64-2 Failed

Looks like it passed on docker nodes hosted on dockerhost-skytap-ubuntu2204-x64-1, @sxa Any special setup on this machine?

Haroon-Khel · 2024-02-09T12:30:49Z

ref1(a_ref, a_ref, i % 2);
test1(a_res, a_res, i % 2);

Removing these two calls allows the test to pass. This means the clone() function is running fine. The error lies with these functions.

sxa · 2024-02-09T13:33:28Z

Looks like it passed on docker nodes hosted on dockerhost-skytap-ubuntu2204-x64-1, @sxa Any special setup on this machine?

No - it's a pretty basic setup at the moment as it's the new machine created as part of #3352 - certainly nothing that I would expect to change anything at that sort of level.

Interesting problem though, and since it's not specific to one machine I think we should spin off another issue for it - also it's worth seeing if you can replicate this from a standalone test java program.
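A standalone program along those lines might look like the sketch below. It is an assumption-laden reduction rather than the jtreg test itself: the class name MovingLoadRepro, the run() and mismatches() helpers, the RANGE value, the seed and the iteration count are all hypothetical. The loop bodies and the clone-then-compare structure are copied from the snippets quoted earlier in this thread.

```java
import java.util.Random;

public class MovingLoadRepro {
    // Assumed array size; the real test's RANGE is not shown in this thread.
    static final int RANGE = 1024 * 64;

    // Reference version, same body as in the thread.
    static void ref1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE - 4; i += 4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    // Version under test, textually identical to ref1.
    static void test1(byte[] a, byte[] b, int inv) {
        for (int i = 0; i < RANGE - 4; i += 4) {
            a[i + 0]++;
            a[i + 1]++;
            a[i + 2]++;
            a[i + 3]++;
            b[inv + i + 0]++;
            b[inv + i + 1]++;
            b[inv + i + 2]++;
            b[inv + i + 3]++;
        }
    }

    // Counts the positions where the two result arrays disagree.
    static int mismatches(byte[] ref, byte[] res) {
        int bad = 0;
        for (int j = 0; j < ref.length; j++) {
            if (ref[j] != res[j]) {
                bad++;
            }
        }
        return bad;
    }

    // Repeats the clone / ref1 / test1 / compare cycle and returns the total mismatch count.
    static int run(long seed, int iterations) {
        Random random = new Random(seed);
        int bad = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] a = new byte[RANGE];
            random.nextBytes(a);
            byte[] aRef = a.clone();
            byte[] aRes = a.clone();
            ref1(aRef, aRef, i % 2);
            test1(aRes, aRes, i % 2);
            bad += mismatches(aRef, aRes);
        }
        return bad;
    }

    public static void main(String[] args) {
        int bad = run(42L, 2_000);
        System.out.println(bad == 0 ? "PASS" : "FAIL: " + bad + " mismatching elements");
        if (bad != 0) {
            System.exit(1);
        }
    }
}
```

The CompileCommand line in the failing log restricts compilation to methods matching test*, so ref1 presumably stays interpreted while test1 is JIT-compiled. To approximate that here, compile with javac and run with the option -XX:CompileCommand=compileonly,MovingLoadRepro::test* (quoted so the shell does not expand the asterisk). Whether this reduction actually reproduces the failure on the affected nodes has not been checked.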

Haroon-Khel · 2024-02-09T15:50:28Z

The test1 and ref1 functions look like they do the same thing (the code is exactly the same). If I replace the test1 execution with another ref1, so it looks like this

ref1(a_ref, a_ref, i % 2);
ref1(a_res, a_res, i % 2);

The test passes.

But if I replace the ref1 execution with a test1,

test1(a_ref, a_ref, i % 2);
test1(a_res, a_res, i % 2);

The test fails: (the numbers surrounding the vs should be equal, ie -65 vs -65 or -82 vs -82)

Wrong: 44:-65 vs -66 from -67
Wrong: 45:-82 vs -83 from -84
Wrong: 46:-77 vs -78 from -79

Haroon-Khel · 2024-02-09T17:14:09Z

Continuing this test failure issue in #3377 since this test failure affects more than just test-aws-rhel8-x64-1

Haroon-Khel · 2024-02-09T17:15:32Z

Since we are out of a release period I am adding test-aws-rhel8-x64-1 back into the test pool

sophia-guo added the testFail label Oct 18, 2021

github-actions bot added arch:x64 provider:aws labels Oct 18, 2021

sophia-guo mentioned this issue Oct 19, 2021

PRE RELEASE - Triage JDK17 adoptium/aqa-tests#2991

Closed

sxa self-assigned this Dec 2, 2021

sxa mentioned this issue Dec 2, 2021

System unavailable: test-aws-rhel8-x64-1 #2030

Closed

sxa mentioned this issue Jan 30, 2023

sun/security/pkcs11 tests failing on RHEL8 systems #2900

Open

sxa added this to the 2023-02 (February) milestone Feb 6, 2023

sxa modified the milestones: 2023-02 (February), 2023-04 (April) Apr 6, 2023

sxa modified the milestones: 2023-04 (April), 2023-06 (June) Jun 6, 2023

sxa modified the milestones: 2023-06 (June), 2023-07 (July) Jul 7, 2023

sxa modified the milestones: 2023-07 (July), 2023-11 (November) Sep 22, 2023

sxa modified the milestones: 2023-11 (November), Backlog Jan 3, 2024

sxa mentioned this issue Jan 3, 2024

Problem machines for release #2662

Open

Haroon-Khel mentioned this issue Feb 8, 2024

Exclude java/net/HttpURLConnection/HttpURLConnectionExpectContinueTest.java for linux on jdk17 adoptium/aqa-tests#5050

Merged

Haroon-Khel self-assigned this Feb 9, 2024

Haroon-Khel added this to 2024 1Q Adoptium Plan Feb 9, 2024

Haroon-Khel mentioned this issue Feb 9, 2024

compiler/loopopts/superword/TestMovingLoadBeforeStore.java.TestMovingLoadBeforeStore failure on x64 linux jdk17 #3377

Closed

Haroon-Khel closed this as completed Feb 9, 2024

github-project-automation bot moved this to Done in 2024 1Q Adoptium Plan Feb 9, 2024
