Fix valgrind detection during configure script #2823

Merged: 8 commits, Feb 27, 2025

@jimklimov jimklimov commented Feb 26, 2025

A problem was noticed on some OpenIndiana builders in NUT CI: valgrind reported that strchr (in /lib/ld.so.1) hit an illegal opcode (or at least one that the tool did not recognize). Since around November 2024 it has left a memcheck-x86-sol core file in the core dump location for each run of a NUT build:

:; valgrind /bin/true
==799754== Memcheck, a memory error detector
==799754== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==799754== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==799754== Command: /bin/true
==799754==
vex x86->IR: unhandled instruction bytes: 0x2E 0x8D 0x74 0x26
==799754== valgrind: Unrecognised instruction at address 0x3ffe291b.
==799754==    at 0x3FFE291B: strchr (in /lib/ld.so.1)
==799754==    by 0x3FFCEB5B: procenv_user (in /lib/ld.so.1)
==799754==    by 0x3FFCB34A: setup (in /lib/ld.so.1)
==799754==    by 0x3FFDBC49: _setup (in /lib/ld.so.1)
==799754==    by 0x3FFC0D94: _rt_boot (in /lib/ld.so.1)
==799754==    by 0x37FEFE46: ???
==799754== Your program just tried to execute an instruction that Valgrind
==799754== did not recognise.  There are two possible reasons for this.
==799754== 1. Your program has a bug and erroneously jumped to a non-code
==799754==    location.  If you are running Memcheck and you just saw a
==799754==    warning about a bad jump, it's probably your program's fault.
==799754== 2. The instruction is legitimate but Valgrind doesn't handle it,
==799754==    i.e. it's Valgrind's fault.  If you think this is the case or
==799754==    you are not sure, please let us know and we'll try to fix it.
==799754== Either way, Valgrind will now raise a SIGILL signal which will
==799754== probably kill your program.
==799754==
==799754== Process terminating with default action of signal 4 (SIGILL)
==799754==  Illegal opcode at address 0x3FFE291B
==799754==    at 0x3FFE291B: strchr (in /lib/ld.so.1)
==799754==    by 0x3FFCEB5B: procenv_user (in /lib/ld.so.1)
==799754==    by 0x3FFCB34A: setup (in /lib/ld.so.1)
==799754==    by 0x3FFDBC49: _setup (in /lib/ld.so.1)
==799754==    by 0x3FFC0D94: _rt_boot (in /lib/ld.so.1)
==799754==    by 0x37FEFE46: ???
==799754==
==799754== HEAP SUMMARY:
==799754==     in use at exit: 0 bytes in 0 blocks
==799754==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==799754==
==799754== All heap blocks were freed -- no leaks are possible
==799754==
==799754== For lists of detected and suppressed errors, rerun with: -s
==799754== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal Instruction (core dumped)
:; valgrind /bin/sh --help
==805321== Memcheck, a memory error detector
==805321== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==805321== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==805321== Command: /bin/sh --help
==805321==
vex x86->IR: unhandled instruction bytes: 0x2E 0x8D 0x74 0x26
==805321== valgrind: Unrecognised instruction at address 0x3ffe291b.
==805321==    at 0x3FFE291B: strchr (in /lib/ld.so.1)
==805321==    by 0x3FFCEB5B: procenv_user (in /lib/ld.so.1)
==805321==    by 0x3FFCB34A: setup (in /lib/ld.so.1)
==805321==    by 0x3FFDBC49: _setup (in /lib/ld.so.1)
==805321==    by 0x3FFC0D94: _rt_boot (in /lib/ld.so.1)
==805321==    by 0x37FEFE46: ???
==805321== Your program just tried to execute an instruction that Valgrind
==805321== did not recognise.  There are two possible reasons for this.
==805321== 1. Your program has a bug and erroneously jumped to a non-code
==805321==    location.  If you are running Memcheck and you just saw a
==805321==    warning about a bad jump, it's probably your program's fault.
==805321== 2. The instruction is legitimate but Valgrind doesn't handle it,
==805321==    i.e. it's Valgrind's fault.  If you think this is the case or
==805321==    you are not sure, please let us know and we'll try to fix it.
==805321== Either way, Valgrind will now raise a SIGILL signal which will
==805321== probably kill your program.
==805321==
==805321== Process terminating with default action of signal 4 (SIGILL)
==805321==  Illegal opcode at address 0x3FFE291B
==805321==    at 0x3FFE291B: strchr (in /lib/ld.so.1)
==805321==    by 0x3FFCEB5B: procenv_user (in /lib/ld.so.1)
==805321==    by 0x3FFCB34A: setup (in /lib/ld.so.1)
==805321==    by 0x3FFDBC49: _setup (in /lib/ld.so.1)
==805321==    by 0x3FFC0D94: _rt_boot (in /lib/ld.so.1)
==805321==    by 0x37FEFE46: ???
==805321==
==805321== HEAP SUMMARY:
==805321==     in use at exit: 0 bytes in 0 blocks
==805321==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==805321==
==805321== All heap blocks were freed -- no leaks are possible
==805321==
==805321== For lists of detected and suppressed errors, rerun with: -s
==805321== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal Instruction (core dumped)

FWIW, gdb was not of much help:

:; gdb --core /var/cores/memcheck-x86-sol.global.nutci-oi.1740566518.805321.core

GNU gdb (GDB) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-solaris2.11".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
--Type <RET> for more, q to quit, c to continue without paging--
[New LWP 1]
[LWP 1 exited]
[New LWP 1]
Core was generated by `valgrind /bin/sh --help'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x380471e0 in ?? ()
[Current thread is 2 (LWP 1)]
(gdb) bt
#0  0x380471e0 in ?? ()
#1  0x3803b183 in ?? ()
#2  0x38044c7f in ?? ()
#3  0x380b7bf5 in ?? ()
#4  0x3809470b in ?? ()
#5  0x00000000 in ?? ()
(gdb) quit

This was not seen with other programs, though, such as the NUT ones under test. The closely related OmniOS (and older OI builds) did not exhibit this misbehavior either.

In any case, this led to an investigation of the NUT configure script, which found several issues that this PR fixes:

  • The configure --with-valgrind=PROG option was not handled well: PROG was only assigned after we had already tried (or skipped) testing the tool found in PATH.
  • Avoid using the system shell and its own (or an independent) true implementation for the valgrind usability test, if we can build and run a trivial program to the same effect.
  • Enable --with-valgrind=auto by default (try to detect the tool, then try to use it, but do not fail if either step does not succeed).
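
The three points above can be sketched roughly as follows. This is only an illustration of the intended decision logic, not the actual NUT configure code: the function names build_probe and pick_valgrind, the probe file names, and the choice to fail for an explicit "yes" or PROG request are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helpers, NOT the NUT configure code.

# Build a trivial probe program in directory $1, so the usability test
# does not depend on the system shell or on /bin/true.
build_probe() {
    dir="$1"
    printf 'int main(void) { return 0; }\n' > "$dir/probe.c" &&
    ${CC:-cc} -o "$dir/probe" "$dir/probe.c"
}

# Decide which valgrind (if any) to use.
#   $1 = value of --with-valgrind: auto | yes | no | /path/to/PROG
#   $2 = path to the already-built probe program
# Prints the chosen tool path, or nothing if valgrind is disabled.
# Returns 1 only when an explicit request (yes or PROG) cannot be met.
pick_valgrind() {
    mode="${1:-auto}"
    probe="$2"

    # Settle on the candidate tool BEFORE testing it, so that an
    # explicit PROG is the thing that actually gets tested.
    case "$mode" in
        no)       return 0 ;;
        auto|yes) prog="$(command -v valgrind 2>/dev/null)" || prog="" ;;
        *)        prog="$mode" ;;
    esac

    # Usability test: the tool must run the probe and exit cleanly.
    # A crash such as SIGILL yields a non-zero status here.
    if [ -n "$prog" ] && "$prog" --error-exitcode=1 "$probe" >/dev/null 2>&1
    then
        echo "$prog"
        return 0
    fi

    # "auto" tolerates a missing or unusable tool; other modes do not.
    [ "$mode" = auto ]
}
```

A caller might then use it as: build_probe "$tmpdir" && VALGRIND="$(pick_valgrind "$with_valgrind" "$tmpdir/probe")" || exit 1, treating an empty VALGRIND as "run tests without valgrind".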

@jimklimov jimklimov added the portability We want NUT to build and run everywhere possible label Feb 26, 2025
@jimklimov jimklimov added this to the 2.8.3 milestone Feb 26, 2025
@jimklimov jimklimov added the Solaris/illumos Solaris and illumos systems (OpenIndiana, OmniOS, SmartOS, TribbliX...) label Feb 26, 2025
… to resolve in SCRIPTDIR for out-of-tree builds

Signed-off-by: Jim Klimov <[email protected]>
…ntf() called from report_pass() on OpenIndiana [networkupstools#2823]

Example fault reports:
13:27:00  ==228221== 4,104 bytes in 1 blocks are possibly lost in loss record 1 of 1
13:27:00  ==228221==    at 0x7FFF64A22: malloc (vg_replace_malloc.c:458)
13:27:00  ==228221==    by 0x7FFDC5CAD: _findbuf (in /lib/amd64/libc.so.1)
13:27:00  ==228221==    by 0x7FFDAEBDF: _ndoprnt (in /lib/amd64/libc.so.1)
13:27:00  ==228221==    by 0x7FFDB112E: printf (in /lib/amd64/libc.so.1)
13:27:00  ==228221==    by 0x408F8C: report_pass (in /tmp/jenkins-swarm/jenkins-nutci/nut_nut_PR-2823/tests/driver_methods_utest)
13:27:00  ==228221==    by 0x408F55: report_0_means_pass (in /tmp/jenkins-swarm/jenkins-nutci/nut_nut_PR-2823/tests/driver_methods_utest)
13:27:00  ==228221==    by 0x40875A: main (in /tmp/jenkins-swarm/jenkins-nutci/nut_nut_PR-2823/tests/driver_methods_utest)

Signed-off-by: Jim Klimov <[email protected]>
… are possibly lost in loss record" in malloc in printf (on OpenIndiana)

Signed-off-by: Jim Klimov <[email protected]>
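
For reference, a "possibly lost" report like the one quoted above is commonly silenced with a Memcheck suppression that matches the reported call stack. The sketch below writes such a file; the helper name, the suppression name and the output path are assumptions for illustration, and the exact entry used by NUT is not shown on this page.

```shell
#!/bin/sh
# Hypothetical helper: write a Memcheck suppression file to path $1.
# The frames mirror the fault report above (malloc called via printf
# in the illumos libc); the suppression name is made up for this sketch.
write_printf_suppression() {
    cat > "$1" <<'EOF'
{
   illumos_printf_findbuf_possibly_lost
   Memcheck:Leak
   match-leak-kinds: possible
   fun:malloc
   fun:_findbuf
   fun:_ndoprnt
   fun:printf
}
EOF
}
```

The resulting file could then be passed to the tool, for example: valgrind --leak-check=full --suppressions=FILE ./tests/driver_methods_utest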
@jimklimov jimklimov merged commit 1d019ca into networkupstools:master Feb 27, 2025
30 checks passed
@jimklimov jimklimov deleted the fix-valgrind branch February 27, 2025 07:51