test-aws-rhel8-x64-1 - tests which create core files fail #1829

Closed
lumpfish opened this issue Jan 12, 2021 · 12 comments

Comments

@lumpfish

It looks like the openj9 functional tests which create a j9core.dmp or other core file and then examine the contents of the file are failing to create the core dump.

Example job link: https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_sanity.functional_x86-64_linux/71/consoleFull

The failing test targets are

21:20:34  FAILED test targets:
21:20:34  	cmdLineTester_callsitedbgddrext_openj9_0
21:20:34  	cmdLineTester_pltest_0

The cmdLineTester_callsitedbgddrext_openj9_0 test failure message is

19:50:25  Output from test:
19:50:25   [OUT] DTFJView version 4.29.5, using DTFJ version 1.12.29003
19:50:25   [OUT] Loading image from DTFJ...
19:50:25   [OUT] 
19:50:25   [OUT] Could not load dump file and/or could not load XML file: Image file '/home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-64_linux/openjdk-tests/TKG/test_output_16103945291575/cmdLineTester_callsitedbgddrext_openj9_0/j9core.dmp' not found.
19:50:25   [OUT] For a list of commands, type "help"; for how to use "help", type "help help"
19:50:25   [OUT] > DDR is not enabled for this core file, '!' commands are disabled
19:50:25   [OUT] > 
19:50:25  >> Success condition was not found: [Output match: jvminit.c]

The cmdLineTester_pltest_0 test failure message is

20:13:11   [ERR] ----------------------------------------
20:13:11   [ERR] dump tests
20:13:11   [ERR] ----------------------------------------
20:13:11   [ERR] 
20:13:11   [ERR] 
20:13:11   [ERR] Starting test j9dump_verify_functiontable_slots
20:13:11   [ERR] Ending test j9dump_verify_functiontable_slots
20:13:11   [ERR] 
20:13:11   [ERR] 
20:13:11   [ERR] Starting test j9dump_test_create_dump_with_name
20:13:11   [ERR]   calling j9dump_create with filename: /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-64_linux/openjdk-tests/TKG/test_output_16103945291575/cmdLineTester_pltest_0/j9dump_test_create_dump_with_name
20:13:11   [ERR] JVMPORT030W 
20:13:11   [ERR] j9dumpTest.c line  213: j9dump_test_create_dump_with_name j9dump_create returned: 1, with filename: The core file created by child process with pid = 3757128 was not found. Expected to find core file with name "/home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-64_linux/openjdk-tests/TKG/test_output_16103945291575/cmdLineTester_pltest_0/core.3757128"
20:13:11   [ERR] 
20:13:11   [ERR] 		LastErrorNumber: -108
20:13:11   [ERR] 		LastErrorMessage: No such file or directory
20:13:11   [ERR] 
20:13:11   [ERR] Ending test j9dump_test_create_dump_with_name
20:13:11   [ERR] 
20:13:11   [ERR] 
20:13:11   [ERR] Starting test j9dump_test_create_dump_from_signal_handler
20:13:11   [ERR]   calling j9dump_create with filename: /home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-64_linux/openjdk-tests/TKG/test_output_16103945291575/cmdLineTester_pltest_0/j9dump_test_create_dump_from_signal_handler
20:13:11   [ERR] JVMPORT030W 
20:13:11   [ERR] j9dumpTest.c line  311: j9dump_test_create_dump_from_signal_handler j9dump_create returned: 1, with filename: The core file created by child process with pid = 3757143 was not found. Expected to find core file with name "/home/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-64_linux/openjdk-tests/TKG/test_output_16103945291575/cmdLineTester_pltest_0/core.3757143"
20:13:11   [ERR] 		LastErrorNumber: -108
20:13:11   [ERR] 		LastErrorMessage: No such file or directory
20:13:11   [ERR] 
20:13:11   [ERR] Ending test j9dump_test_create_dump_from_signal_handler
20:13:11   [ERR] 
20:13:11   [ERR] 
20:13:11   [ERR] Starting test j9dump_test_create_dump_with_NO_name
20:13:11   [ERR]   calling j9dump_create with empty filename
20:13:11   [ERR] JVMPORT030W 
20:13:11   [ERR] j9dumpTest.c line  153: j9dump_test_create_dump_with_NO_name j9dump_create returned: 1, with filename: The core file created by child process with pid = 3757157 was not found. Expected to find core file with name "core.3757157"
20:13:11   [ERR] 		LastErrorNumber: -108
20:13:11   [ERR] 		LastErrorMessage: No such file or directory
20:13:11   [ERR] 
20:13:11   [ERR] Ending test j9dump_test_create_dump_with_NO_name
20:13:11   [ERR] 
20:13:11   [ERR] Dump test done, failures detected.

The same tests pass on other machines - e.g. https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_sanity.functional_x86-64_linux_xl/72/consoleFull

@sxa
Member

sxa commented Jan 19, 2021

I suspect they're ending up in /var/lib/systemd/coredump/, as defined in /proc/sys/kernel/core_pattern. It depends on whether we want to adjust the default on the system or handle the situation differently. (We may well want to adjust the system default, if only because the dumps there are in a different format: LZ4-compressed.)
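For anyone wanting to check a machine quickly, here is a minimal Python sketch that reads the pattern and reports where core files would go. It relies only on the documented core(5) rule that a leading `|` pipes the dump to a helper program. The function name `describe_core_pattern` is made up for this illustration and is not part of any existing tooling.

```python
from pathlib import Path

# Standard Linux location of the setting discussed above.
CORE_PATTERN_FILE = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> str:
    """Classify a kernel.core_pattern value (hypothetical helper)."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # A leading '|' means the kernel pipes the dump to a helper program
        # instead of writing a core file itself.
        parts = pattern[1:].split()
        helper = parts[0] if parts else ""
        return f"piped to helper: {helper}"
    if pattern.startswith("/"):
        return f"written to absolute path template: {pattern}"
    return f"written relative to the crashing process's working directory: {pattern}"


if __name__ == "__main__":
    if CORE_PATTERN_FILE.exists():
        print(describe_core_pattern(CORE_PATTERN_FILE.read_text()))
    else:
        print("core_pattern not available (no Linux /proc filesystem?)")
```

If the output reports a helper such as systemd-coredump, tests that look for `core.<pid>` in their own working directory would not find it, which would match the failures above.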

@sxa
Member

sxa commented Jan 19, 2021

I have made an adjustment to /etc/sysctl.conf to set kernel.core_pattern = core.%p
Running a test at https://ci.adoptopenjdk.net/job/Test_openjdk8_j9_sanity.functional_x86-64_linux/78
We will need to confirm that's the best place to set it, and add it to the playbooks if we go ahead with this as the final solution. I'm a little reluctant to reset system defaults like this, particularly if it changes the behaviour of someone's system when they run our playbooks, so I may put this in but guarded by the adoptopenjdk tag.

FYA @smlambert
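To illustrate why `core.%p` lines up with what the tests expect, here is a rough Python sketch of how the kernel expands the pattern. It covers only a few of the specifiers described in core(5) (`%%`, `%p`, `%e`, `%s`); the function name and its default arguments are assumptions made for this example.

```python
def expand_core_pattern(pattern: str, *, pid: int, exe: str = "java", signal: int = 6) -> str:
    """Expand a small subset of core_pattern specifiers (illustrative only)."""
    values = {"%": "%", "p": str(pid), "e": exe, "s": str(signal)}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            # Specifiers this sketch does not model are kept verbatim;
            # the real kernel handles more of them.
            out.append(values.get(spec, "%" + spec))
            i += 2
        else:
            # Ordinary characters (and a lone trailing '%') are copied as-is
            # in this sketch.
            out.append(ch)
            i += 1
    return "".join(out)


# The pid below is taken from the pltest failure log in this issue.
print(expand_core_pattern("core.%p", pid=3757128))
```

With a relative pattern like this, the file is created in the crashing process's working directory, which is where j9dump_create looks for `core.<pid>`.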

@sxa
Member

sxa commented Jan 19, 2021

(Job above succeeded so that resolves the issue)

@karianna karianna added this to the January 2021 milestone Jan 20, 2021
@lumpfish
Author

lumpfish commented Feb 9, 2021

Closing - these tests are now passing on test-aws-rhel8-x64-1.

@sxa
Member

sxa commented Feb 9, 2021

As per the statement above, I do not yet consider this closed, as the playbooks have (to my knowledge) not been updated to handle this.

@Haroon-Khel
Contributor

Haroon-Khel commented Apr 6, 2021

> As per the statement above, I do not yet consider this closed, as the playbooks have (to my knowledge) not been updated to handle this.

@sxa Is the solution to this to set kernel.core_pattern = core.%p in /etc/sysctl.conf? That should be fairly simple to add to the playbooks. If so, it shouldn't take me more than a second.
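As a sketch of what an idempotent version of that change has to do (set the key once, replace any existing value, leave everything else alone), here is a small Python function operating on the text of a sysctl.conf-style file. It is not the playbook change itself; the playbooks would more likely use Ansible's own sysctl support, and the function name `ensure_sysctl_setting` is hypothetical.

```python
def ensure_sysctl_setting(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with exactly one 'key = value' line (illustrative)."""
    wanted = f"{key} = {value}"
    result = []
    found = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments.
        is_comment = stripped.startswith("#") or stripped.startswith(";")
        name = stripped.split("=", 1)[0].strip() if "=" in stripped else ""
        if not is_comment and name == key:
            if not found:
                # Replace the first existing assignment in place.
                result.append(wanted)
                found = True
            # Any later duplicate assignments of the same key are dropped.
        else:
            result.append(line)
    if not found:
        result.append(wanted)
    return "\n".join(result) + "\n"


before = "vm.swappiness = 10\n"
after = ensure_sysctl_setting(before, "kernel.core_pattern", "core.%p")
print(after, end="")
```

Running the function a second time on its own output leaves the text unchanged, which is the property a playbook task needs. Note that editing /etc/sysctl.conf does not change the running kernel until the file is reloaded (for example with `sysctl -p`) or the machine reboots.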

@Haroon-Khel
Contributor

@sxa ^^ ping

@Haroon-Khel Haroon-Khel modified the milestones: April 2021, May 2021 May 18, 2021
@sxa
Member

sxa commented May 18, 2021

Thanks for the reminder :-) I think so, but we need to see which distributions this affects - given the choice I'd rather not replace it everywhere unless we need to.

Related: #1817

Also, it would be good to include the test parameters for this one to the list at the bottom of this section of the FAQ: https://github.com/adoptium/infrastructure/blob/master/FAQ.md#how-do-i-replicate-a-test-failure

@Haroon-Khel Haroon-Khel modified the milestones: May 2021, June 2021 Jun 21, 2021
@sxa
Member

sxa commented Jan 30, 2023

Going to mark this as critical because:

  1. I think we have got a solution to this somewhere,
  2. There are other issues that seem specific to this box and we should determine if they are common across any/all RHEL8 systems.

@sxa
Member

sxa commented May 13, 2024

Discussed with @steelhead31 and @Haroon-Khel today - we should see if this still occurs, whether we have a solution documented somewhere, and if so implement it, otherwise verify if it's a problem with the UBI8 containers that we have.

@sxa
Member

sxa commented Sep 4, 2024

On the basis that the hotspot test works ok I'm closing this. Verified at https://ci.adoptium.net/job/Grinder/10819/console

@sxa sxa closed this as completed Sep 4, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in 2024 2Q Adoptium Plan Sep 4, 2024