Ansible request for <AIX> x11 setup #2297
As far as I can see, the log that showed this (https://ci.adoptopenjdk.net/job/Test_openjdk16_hs_extended.openjdk_ppc64_aix_testList_1/9/consoleFull) says:
I have logged onto the machine that showed the error (aix71-2), and there is nothing stopping that process from starting up properly. Has it been seen anywhere else, i.e. is it reproducible, or could this have been a case where the machine had a leftover process, possibly from a previously terminated job, that was stopping it from starting up properly? I seem to be able to start an
This is a consistent issue, and I believe it happens on all AIX machines: test-osuosl-aix72-ppc64-2
That is the one I mentioned above from four weeks ago; I was interested to know whether it had been seen at any other time.
That was a run from your branch where you explicitly put in an override to set DISPLAY to an incorrect value (you can see from the line above your change that the virtual X server is started on
aix72-1 had a leftover process from August 6th which was stopping it from starting a new one. That has also now been cleared, but we need the test suite modified to be able to handle this situation. It is NOT an infrastructure request for an installation on the machine :-)
Rerun with test-ibm-aix71-ppc64-1:
I can see the failure on test-osuosl-aix72-ppc64-2 since July 4th: https://ci.adoptopenjdk.net/job/Test_openjdk16_hs_extended.openjdk_ppc64_aix_testList_0/6/testReport/junit/java_beans_XMLEncoder_Test4652928/java/Test4652928/. build-osuosl-aix71-ppc64-2 passed on https://ci.adoptopenjdk.net/job/Test_openjdk16_hs_extended.openjdk_ppc64_aix_testList_0/8/testReport/java_beans_XMLEncoder_Test4631471/ and failed on https://ci.adoptopenjdk.net/job/Test_openjdk16_hs_extended.openjdk_ppc64_aix_testList_1/9/testReport/junit/java_beans_XMLEncoder_Test4631471/java/Test4631471/. Rerun on build-osuosl-aix71-ppc64-2:
The issue happened on different machines, so it is definitely reproducible. What is the leftover process on aix72-1? Could you confirm whether it is a leftover process created by the OpenJDK tests? As for Jenkins, DISPLAY is reset when the Jenkins job is done: https://github.com/adoptium/aqa-tests/pull/1835/files
Reran test java/beans/XMLEncoder/ on test-ibm-aix71-ppc64-1 and test-osuosl-aix72-ppc64-2, and both passed. A second rerun passed too, which means that if there is a leftover process, it is not created by test java/beans/XMLEncoder/. We probably need to find out how the leftover process is created.
It'll be the
@smlambert Has this been discussed in the AQAvit meetings? We'll need to find a way to ensure the X server is terminated at the end of the job, which it may not be at present. Do we have a post-test clean-up phase that we could add this to? @Haroon-Khel It's possible this was introduced as a result of adoptium/aqa-tests#1835, although that was over a year ago now, so I wonder if it's possible that the

While adding a different port number would probably work around this issue, it will result in process leaks, so I'd be reluctant to implement the changes proposed in adoptium/aqa-tests#2831 for this.
In the 'post' stage of a test pipeline, for platforms that use the xvfb plugin (all Linux platforms), the plugin closes/cleans up the process. For AIX, that plugin does not work, so Xvfb is launched manually, and I presume adoptium/aqa-tests#2831 is meant both to address the security scan issue of the process running and to clean up the process in the post stage for that platform.
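On AIX, the guarantee the plugin provides on Linux (the server is stopped when the stage ends) has to come from the pipeline itself. The Python sketch below only illustrates that "always clean up in post" pattern; it is not the actual JenkinsfileBase logic, which is Groovy. The `managed_process` helper is hypothetical, and a Python `sleep` child stands in for the virtual X server command.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(cmd, timeout=10):
    """Start cmd and make sure it is stopped when the with-block exits.

    Hypothetical helper: in a real pipeline, cmd would be the X server
    launch command used on AIX.
    """
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        # Runs on normal exit and when the body raises (e.g. a failed test step).
        if proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Escalate if the process ignores SIGTERM.
                proc.kill()
                proc.wait()


if __name__ == "__main__":
    # Stand-in for the virtual X server: a child that sleeps for a minute.
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    with managed_process(stand_in) as p:
        print("running:", p.poll() is None)
    print("stopped:", p.poll() is not None)
```

One caveat: if the controlling process is itself killed abruptly (for example with SIGKILL), its `finally` block never runs, which is one way a server could outlive the job.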
So, to be clear, https://github.com/adoptium/aqa-tests/blob/dce1f080f4e7fb1b69b429982aa62e71f54d2a9d/buildenv/jenkins/JenkinsfileBase#L602 is definitely NOT used on Linux, because it's started via the Jenkins plugin?
This is the line that invokes/starts the Jenkins xvfb plugin: https://github.com/adoptium/aqa-tests/blob/dce1f080f4e7fb1b69b429982aa62e71f54d2a9d/buildenv/jenkins/JenkinsfileBase#L604
Gotcha, I hadn't read that syntax as being an invocation of stuff from the plugin. I don't believe that adoptium/aqa-tests#2831 does anything to address the cleanup; it only attempts to cycle the port number so it doesn't hit any leftover one (which is solving the wrong problem IMHO!)
Possible solution in adoptium/aqa-tests#2892, but I think we need to determine whether the current code always leaves the process around.
Hmmm, even without that change, an aborted job still cleaned up the
I'll take a look again. IIRC, what I saw is that the Xvfb process (usually) stopped itself shortly after the job finished. When it did continue to run, its PPID was 1, i.e. it had been reparented to init after its parent exited.
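A surviving server with PPID 1 suggests one way a check could spot leftovers. This is a minimal sketch, assuming the text format printed by `ps -e -o pid,ppid,args` and assuming the server's command line contains `Xvfb`; the real process name on AIX may differ. `find_orphaned` and the sample output are hypothetical, and this is not how the existing processCheck job works. PPID 1 on its own is not proof of a leak (long-running daemons also have PPID 1), which is why the sketch also filters on the command name.

```python
def find_orphaned(ps_output, name="Xvfb"):
    """Return (pid, args) pairs for processes with PPID 1 whose command line contains name.

    ps_output is assumed to be the text printed by `ps -e -o pid,ppid,args`:
    a header line followed by one process per line.
    """
    orphans = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)  # PID, PPID, rest of the command line
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, ppid, args = int(fields[0]), int(fields[1]), fields[2]
        # A process whose parent has exited is reparented to PID 1.
        if ppid == 1 and name in args:
            orphans.append((pid, args))
    return orphans


# Hypothetical sample output; real process names and arguments may differ.
SAMPLE = """\
  PID  PPID ARGS
    1     0 /etc/init
 4242     1 Xvfb :0
 5151  5000 Xvfb :1
"""

if __name__ == "__main__":
    for pid, args in find_orphaned(SAMPLE):
        print(f"possible leftover X server: pid={pid} cmd={args}")
```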
Just adding a comment: the scans done at OSUOSL are still picking up on port 6000, so regardless of what has been done (or not done), the issue is still active (as of 11 October 2021). I'll go back to my PR (adoptium/aqa-tests#2831), undo the 'generic' code (i.e. choosing a port other than 6000), and only use the -secIP argument. Hopefully that resolves the issue with the scan (but not the hanging process).
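The scan flags port 6000 because, by X11 convention, a server for display number N listens on TCP port 6000 + N, so `:0` maps to 6000; choosing a different display only moves the listening port rather than closing it. A minimal sketch of that mapping follows; `display_to_tcp_port` is a hypothetical helper that only computes the port and says nothing about whether a given server actually listens on TCP.

```python
X11_TCP_BASE_PORT = 6000  # display :N listens on TCP port 6000 + N


def display_to_tcp_port(display):
    """Return the TCP port for a DISPLAY value like ':0' or 'host:10.0'."""
    # DISPLAY has the form [host]:displaynumber[.screennumber];
    # take everything after the last ':' as 'displaynumber[.screennumber]'.
    _, sep, rest = display.rpartition(":")
    number = rest.split(".", 1)[0]
    if not sep or not number.isdigit():
        raise ValueError(f"not a valid DISPLAY value: {display!r}")
    return X11_TCP_BASE_PORT + int(number)


if __name__ == "__main__":
    for value in (":0", ":1", "localhost:10.0"):
        print(value, "->", display_to_tcp_port(value))
```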
FYI: I'm about to kill the process, but on ojdk05 it has been hanging since October 10th:
a) This issue (Ansible request) can be closed as-is, as the problem is not the AIX X11 configuration. In any case, it is not related to the Ansible playbooks and cannot be resolved via a playbook change.
The processCheck job should pick up on incidents of the server process being left around so we should try and keep an eye on that to see if it occurs. I haven't heard of any issues with this recently though. |
Closing due to the lack of problems being highlighted recently. |
Details:
java/beans/XMLEncoder/* failed on AIX jdk16 with java.awt.AWTError:
Can't connect to X11 window server using ':0' as the value of the DISPLAY variable
Details: adoptium/aqa-tests#2810