Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ivykis 0.42 failure on OmniOS in syslog-ng 3.9.1 #12

Closed
bldewolf opened this issue Aug 24, 2017 · 9 comments
Closed

ivykis 0.42 failure on OmniOS in syslog-ng 3.9.1 #12

bldewolf opened this issue Aug 24, 2017 · 9 comments

Comments

@bldewolf
Copy link

We have a couple OmniOS systems and we run a few things built using pkgsrc, including syslog-ng and ivykis. After updating from ivykis-0.40 to 0.42 and syslog-ng-3.8.1 to 3.9.1, syslog-ng dies after a few seconds and core dumps. I noticed iv_* functions in the backtraces so I started fiddling with the ivykis package. Updating to 0.42.1 doesn't improve things, but moving back to 0.40 makes things stable again.

Let me know if you need anything else for debugging.

Here's some backtraces from one of the core dumps:

(gdb) info threads
  4 process 205161      0xfe930e65 in __sigaction () from /lib/libc.so.1
  3 process 139625      0xfe9326ce in door_restart () from /lib/libc.so.1
  2 process 74089      0xfe930e07 in _portfs () from /lib/libc.so.1
* 1 process 270697      0xfe931d85 in _lwp_kill () from /lib/libc.so.1
(gdb) thread 1
[Switching to thread 1 (process 270697    )]#0  0xfe931d85 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0  0xfe931d85 in _lwp_kill () from /lib/libc.so.1
#1  0xfe929422 in thr_kill () from /lib/libc.so.1
#2  0xfe8c672b in raise () from /lib/libc.so.1
#3  0xfe8a104e in abort () from /lib/libc.so.1
#4  0xfe724e92 in iv_event_post () from /usr/pkg/lib/libivykis.so.0
#5  0xfe72a6a5 in iv_fd_port_set_poll_timeout () from /usr/pkg/lib/libivykis.so.0
#6  0xfe726a64 in iv_fd_poll_and_run () from /usr/pkg/lib/libivykis.so.0
#7  0xfe727e92 in iv_main () from /usr/pkg/lib/libivykis.so.0
#8  0xfe725e9e in iv_work_thread () from /usr/pkg/lib/libivykis.so.0
#9  0xfe728d93 in iv_thread_died () from /usr/pkg/lib/libivykis.so.0
#10 0xfe92cf1b in _thrp_setup () from /lib/libc.so.1
#11 0xfe92d0b0 in setup_context () from /lib/libc.so.1
(gdb) thread 2
[Switching to thread 2 (process 74089    )]#0  0xfe930e07 in _portfs () from /lib/libc.so.1
(gdb) bt
#0  0xfe930e07 in _portfs () from /lib/libc.so.1
#1  0xfe8a6ee9 in port_getn () from /lib/libc.so.1
#2  0xfe72a433 in iv_fd_port_poll () from /usr/pkg/lib/libivykis.so.0
#3  0xfe726a64 in iv_fd_poll_and_run () from /usr/pkg/lib/libivykis.so.0
#4  0xfe727e92 in iv_main () from /usr/pkg/lib/libivykis.so.0
#5  0xfeefbd1d in main_loop_run () from /usr/pkg/lib/libsyslog-ng-3.9.so.0
#6  0x08051d57 in ?? ()
#7  0x08051783 in ?? ()
(gdb) thread 3
[Switching to thread 3 (process 139625    )]#0  0xfe9326ce in door_restart () from /lib/libc.so.1
(gdb) bt
#0  0xfe9326ce in door_restart () from /lib/libc.so.1
#1  0xfe918cb9 in door_return () from /lib/libc.so.1
#2  0xfe918d18 in door_create_func () from /lib/libc.so.1
#3  0xfe92cf1b in _thrp_setup () from /lib/libc.so.1
#4  0xfe92d0b0 in setup_context () from /lib/libc.so.1
(gdb) thread 4
[Switching to thread 4 (process 205161    )]#0  0xfe930e65 in __sigaction () from /lib/libc.so.1
(gdb) bt
#0  0xfe930e65 in __sigaction () from /lib/libc.so.1
#1  0xfe9215dd in sigaction () from /lib/libc.so.1
#2  0xfe8a0f6d in abort () from /lib/libc.so.1
#3  0xfe724e92 in iv_event_post () from /usr/pkg/lib/libivykis.so.0
#4  0xfe72a6a5 in iv_fd_port_set_poll_timeout () from /usr/pkg/lib/libivykis.so.0
#5  0xfe726a64 in iv_fd_poll_and_run () from /usr/pkg/lib/libivykis.so.0
#6  0xfe727e92 in iv_main () from /usr/pkg/lib/libivykis.so.0
#7  0xfe725e9e in iv_work_thread () from /usr/pkg/lib/libivykis.so.0
#8  0xfe728d93 in iv_thread_died () from /usr/pkg/lib/libivykis.so.0
#9  0xfe92cf1b in _thrp_setup () from /lib/libc.so.1
#10 0xfe92d0b0 in setup_context () from /lib/libc.so.1
@buytenh
Copy link
Owner

buytenh commented Sep 29, 2017

Hi, thank you for the bug report, and sorry for the inconvenience. Are any messages logged by syslog-ng when you get this crash? I can't really see off-hand what has changed in this area that could cause this issue to appear, is there any chance you could bisect this?

@bldewolf
Copy link
Author

bldewolf commented Nov 2, 2017

While bisecting the daemon in foreground mode, I also received this error message:

iv_fd_port_poll: got error 0[Error 0]

The final commit that the bisect points to is:

3c9bdf7a8bd2335e7b9dae3b77d4d27cf0abadc4 is the first bad commit
commit 3c9bdf7a8bd2335e7b9dae3b77d4d27cf0abadc4
Author: Lennert Buytenhek <[email protected]>
Date:   Fri Nov 18 08:17:28 2016 +0200

    Stop linking against libpthread.so by default.
    
    Signed-off-by: Lennert Buytenhek <[email protected]>

:100644 100644 38b1899f68179e6ffb636ffbf7fb1d4a38b377d8 6735af3270534be317606f12e412aee7f10949b7 M      configure.ac

@buytenh
Copy link
Owner

buytenh commented Dec 28, 2017

The "Error 0" error you got is weird, it suggests that port_getn() returned a negative value but then errno was set to zero. According to the man page for port_getn() (for Solaris 11.3), this function does behave like most C library functions in that it returns a negative value and sets errno when there is an error, so I'm not sure why it's behaving this way. Is there any chance you could run this through truss and see exactly what happens?

Your gdb output shows that it's crashing because iv_event_post() ended up calling into abort(), but I don't see how it's getting there, nothing in that function calls abort() directly. What happens if you try this?

diff --git a/src/iv_private_posix.h b/src/iv_private_posix.h
index bb04ae5..764250c 100644
--- a/src/iv_private_posix.h
+++ b/src/iv_private_posix.h
@@ -187,7 +187,7 @@ extern pthr_key_t iv_state_key;
 
 static inline int is_mt_app(void)
 {
-       return pthreads_available();
+       return 1;
 }
 
 static inline struct iv_state *iv_get_state(void)

cc @MrAnno

@buytenh
Copy link
Owner

buytenh commented Dec 28, 2017

Also, on the off chance that this will make a difference, could you try this?

diff --git a/src/iv_fd_port.c b/src/iv_fd_port.c
index 510c948..e8e2bae 100644
--- a/src/iv_fd_port.c
+++ b/src/iv_fd_port.c
@@ -20,6 +20,7 @@
 
 #include <stdio.h>
 #include <stdlib.h>
+#include <errno.h>
 #include <fcntl.h>
 #include <port.h>
 #include <siginfo.h>

@MrAnno
Copy link

MrAnno commented Dec 28, 2017

Hi,

While bisecting the daemon in foreground mode, I also received this error message:

iv_fd_port_poll: got error 0[Error 0]

I'm not really familiar with OmniOS, but I've seen a similar problem once on Solaris:

While my project was built using thread-safe errno (-D_REENTRANT and a proper version of _POSIX_C_SOURCE), one of its dependencies was NOT. The end result was a disaster. :)

@mamash
Copy link

mamash commented Jan 12, 2018

I've hit a similar problem, although the error message is different. Here's a full stack trace:

[root ~]# syslog-ng -e --process-mode=foreground
[2018-01-12T12:54:15.235433] syslog-ng starting up; version='3.13.2'
iv_fd_port_poll: got error 95[Socket operation on non-socket]
Abort (core dumped)

[root ~]# pstack core.syslog-ng.31991
core 'core.syslog-ng.31991' of 31991:	syslog-ng -e --process-mode=foreground
-----------------  lwp# 1 / thread# 1  --------------------
 ffffbf7fff2122fa _portfs () + a
 ffffbf7f4928e909 iv_fd_port_poll () + 149
 ffffbf7f4928ada3 iv_fd_poll_and_run () + 1a3
 ffffbf7f4928c0da iv_main () + 7a
 ffffbf7f48d2568e main_loop_run () + 7e
 00000000004027b1 main () + 1c1
 000000000040229c _start () + 6c
-----------------  lwp# 2 / thread# 2  --------------------
 ffffbf7fff21377d __door_return () + 1d
 ffffbf7fff1f6f0d door_return (0, 0, 0, 0) + cd
 ffffbf7fff2137b0 __door_return () + 50
-----------------  lwp# 3 / thread# 3  --------------------
 ffffbf7fff21377d __door_return () + 1d
 ffffbf7fff1f6f0d door_return (0, 0, 0, 0) + cd
 ffffbf7fff1f75cc door_create_func (0) + 2c
 ffffbf7fff20bb2a _thrp_setup (ffffbf7ffefe9cc0) + 8a
 ffffbf7fff20be40 _lwp_start ()
-----------------  lwp# 4 / thread# 4  --------------------
 ffffbf7fff2130da _lwp_kill () + a
 ffffbf7fff1a64d0 raise (6) + 20
 ffffbf7fff1803a8 abort () + 98
 ffffbf7f49288f50 iv_fatal () + d0
 ffffbf7f4928eba7 iv_fd_port_poll () + 3e7
 ffffbf7f4928ac88 iv_fd_poll_and_run () + 88
 ffffbf7f4928c0da iv_main () + 7a
 ffffbf7f4928a02a iv_work_thread () + 7a
 ffffbf7f4928d364 iv_thread_handler () + 34
 ffffbf7fff20bb2a _thrp_setup (ffffbf7ffefedcc0) + 8a
 ffffbf7fff20be40 _lwp_start ()

Using syslog-ng-3.12.2 with ivykis-0.40 seems to work fine. Forcing ivykis-0.42.2 to build threadsafe works too.

Sounds like making pthread support optional is not really an option - at least on SunOS with eventports.

@MrAnno
Copy link

MrAnno commented Mar 9, 2018

Forcing ivykis-0.42.2 to build threadsafe works too.

So the problem here is that after the -pthread C flag has been removed, ivykis started to operate on a non-threadsafe errno (on Solaris-like systems). This is because -pthread implies -D_REENTRANT.

@buytenh Can you define _REENTRANT explicitly in ivykis? Setting -D_POSIX_C_SOURCE >= 199506L works too.

More info: https://docs.oracle.com/cd/E19455-01/806-5257/compile-4/index.html

@buytenh
Copy link
Owner

buytenh commented Apr 1, 2018

OK, so does this work?

diff --git a/configure.ac b/configure.ac
index 8cd7330..042bb02 100644
--- a/configure.ac
+++ b/configure.ac
@@ -69,6 +69,12 @@ osf*)
        #
        CFLAGS="$CFLAGS -D_BSD -D_POSIX_PII_SOCKET"
        ;;
+solaris*)
+       #
+       # On Solaris, we want -D_REENTRANT, so that we get the
+       # threadsafe version of 'errno'.
+       #
+       CFLAGS="$CFLAGS -D_REENTRANT"
 esac
 
 #

@buytenh
Copy link
Owner

buytenh commented May 4, 2018

Fixed by 0d482cc, and released as part of v0.42.3.

@buytenh buytenh closed this as completed May 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants