OpenJDK AIX jdk_security2 Agent communication error #19962
Comments
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_OpenJDK21_testList_1/29 - p8-java1-ibm04
|
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/299/ |
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/302 |
This is 0.48 |
I suspect this is the same cause, although no test is named. I've seen a number of these.
|
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/313/ - p8-java1-ibm06
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/312 - p8-java1-ibm12
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_OMR_testList_2/138 - p8-java1-ibm10
|
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/316 |
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/317 - p8-java1-ibm10
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/316 |
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/319 - p8-java1-ibm05
|
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_OMR_testList_2/145 - p8-java1-ibm10 |
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_OMR_testList_1/147 - p8-java1-ibm03 |
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/323 |
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/324 |
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/325 |
@jasonkatonica I've set this as a blocker. |
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/325
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_OpenJDK17_testList_0/39 |
Probably related to https://github.ibm.com/runtimes/infrastructure/issues/10081 |
Checking the failures, they occur in different tests. Investigating these tests to see if there are any similarities. |
No, it didn't sound like that. The machine was even rendered inaccessible, wasn't it? That means sshd or some other critical process(es) were killed as well. |
By the way, you can certainly configure AIX to allocate paging space proactively, i.e. paging space is reserved as if it were a copy of all active virtual memory, instead of acting as an extension of physical memory. With that configuration, the Java process(es) would simply fail to start in the current situation. In that case, the system had better have more paging space than physical memory (a few times as much); otherwise the machine is likely to be under-utilized. |
From the failed case, https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/329/testReport/. It run the |
It sounded to me like a normal choice of process to kill. As far as I know, the most likely pick is the "culprit" that failed to allocate paging space. |
Since it's being looked at I stopped reporting the failures, but they are still occurring. Also, I don't know if it's related or not, but AIX machines keep going down and need to be resurrected. |
@pshipton I run the same test packages under the |
Issue eclipse-openj9/openj9#19962 Signed-off-by: Peter Shipton <[email protected]>
Excluding the test. |
Still problems with the jdk_security2 test even with TestCipherMode excluded.
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_0/341
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/340
https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_ppc64_aix_Nightly_testList_1/346 |
Summary of my updates so far:
Next steps in my plan:
|
Restored TestCipherMode adoptium/aqa-tests#5805 |
Summary:
Next:
|
Run the “ Real-time memory and paging usage captured during the 2nd iteration found one process/JVM “
After a while, the current iteration is killed, with the Jenkins output as follows, but Jenkins keeps running the other iterations.
The process/JVM
I checked this AgentServer process and found that it runs in a dedicated JVM, separate from the test case JVMs. It functions as a manager and communication interface for the test cases. |
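One way to spot a growing AgentServer JVM between iterations is to sample `ps` periodically and pick the largest matching process. The following is only a minimal sketch, assuming the output of the POSIX-style command `ps -e -o pid,vsz,args` is captured as text. The names `ProcSample`, `parse_ps`, and `largest_matching` are hypothetical helpers, not part of JTReg or any AIX tool. Note that VSZ reports virtual size, which is only a rough proxy for paging space consumption.

```python
from typing import List, NamedTuple, Optional


class ProcSample(NamedTuple):
    pid: int
    vsz_kb: int
    args: str


def parse_ps(output: str) -> List[ProcSample]:
    """Parse text captured from `ps -e -o pid,vsz,args` (header line included)."""
    samples = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, vsz, then the full command line
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # ignore blank or malformed rows (e.g. defunct processes)
        samples.append(ProcSample(int(parts[0]), int(parts[1]), parts[2]))
    return samples


def largest_matching(samples: List[ProcSample],
                     needle: str = "AgentServer") -> Optional[ProcSample]:
    """Return the matching process with the largest virtual size, or None."""
    matching = [s for s in samples if needle in s.args]
    return max(matching, key=lambda s: s.vsz_kb, default=None)
```

Calling `largest_matching(parse_ps(text))` on each sample and logging the result gives a simple time series of the agent's size, which can then be lined up with the point where the iteration is killed.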
After researching As of now, I have not found any direct connection between the failures related to For the above finding, consuming an extremely large amount of paging space in the JTReg agent server process (com.sun.javatest.regtest.agent.AgentServer). Details in comment #19962 (comment). @pshipton Do you have any suggestions, or do you know which team might be able to assist with this? @jasonkatonica FYI. |
If we catch the process in the act of consuming a large amount of memory and successfully get a core and maybe a javacore file, we can dig into these diagnostics to help figure out what is using all the memory. Assuming it's not the JVM itself, I don't have any experience with tracking memory usage on AIX. We can approach the service team to find out how best to track it. @llxia might know more about the AgentServer or be able to refer to someone else who does. |
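Catching the process in the act could be automated along these lines. This is a rough sketch only, assuming the monitored JVM is OpenJ9 with its default dump agents, where SIGQUIT requests a javacore; producing a system core as well would need additional `-Xdump` configuration that is not shown here. `DumpTrigger` and the threshold value are hypothetical, and the size readings are assumed to come from some external sampler.

```python
import os
import signal


class DumpTrigger:
    """Send SIGQUIT once to each process whose size crosses a limit."""

    def __init__(self, limit_kb, send_signal=os.kill):
        self.limit_kb = limit_kb
        self._send = send_signal  # injectable so the logic can be tested
        self._dumped = set()      # PIDs that have already been signalled

    def observe(self, pid, vsz_kb):
        """Record one size reading; return True if a dump was requested."""
        if vsz_kb < self.limit_kb or pid in self._dumped:
            return False
        self._send(pid, signal.SIGQUIT)  # asks the JVM for a javacore
        self._dumped.add(pid)
        return True
```

Signalling each PID only once avoids flooding the disk with javacores while the process keeps growing; whether one snapshot is enough to identify the consumer is an open question.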
|
FYI |
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64_aix_OMR_testList_1/123 - p8-java1-ibm10
jdk_security2_1
javax/crypto/Cipher/TestCipherMode.java