segfault when date format is incorrect #239

poblahblahblah · 2016-12-08T18:14:20Z

Hello,

I am currently testing out carbon-c-relay and had accidentally configured the metric emitter to send a higher resolution timestamp (milliseconds rather than seconds) than I meant to. This resulted in a segfault from carbon-c-relay.

This results in a segfault:

# Current time in milliseconds
now=$(($(date +%s%N)/1000000))
metric=test.metric
value=$(( $RANDOM % 100 ))
host=127.0.0.1

This is fine and works as expected:

now=$(date +'%s')
metric=test.metric
value=$(echo $(( $RANDOM % 10 )))
host=127.0.0.1

I am on CentOS7.2, using carbon-c-relay 2.3. I also tested this against HEAD in master and it caused a segfault there as well.

The text was updated successfully, but these errors were encountered:

grobian · 2016-12-09T09:12:47Z

I don't really get this syntax. Where do you put this and how does it relate to carbon-c-relay?

poblahblahblah · 2016-12-09T15:56:14Z

@grobian I should have been more clear - those are shell scripts that act as "emitters" to send fake metrics to carbon-c-relay.

When you send the timestamp as time since unix epoch in milliseconds it causes carbon-c-relay to segfault. I am presuming that this is an integer overflow.

piotr1212 · 2016-12-09T18:05:51Z

Interesting. Could you please post your configuration and check what those emitters are actually sending? Check with tcpdump or start 'nc -v -k -l someport' and let the emitter send there. Op vr 9 dec. 2016 16:56 schreef poblahblahblah <[email protected]>:

…

@grobian <https://github.com/grobian> I should have been more clear - those are shell scripts that act as "emitters" to send fake metrics to carbon-c-relay. When you send the timestamp as time since unix epoch in milliseconds it causes carbon-c-relay to segfault. I am presuming that this is an integer overflow. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#239 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ACxrWhtXKEcNaGfgDAh0GhXJkxmZz8AXks5rGXofgaJpZM4LIHvM> .

poblahblahblah · 2016-12-09T19:11:49Z

carbon-c-relay config:

cluster kairos forward 192.168.168.11;

aggregate
    ^sys\.dc1\.somehost-[0-9]+\.somecluster\.mysql\.replication_delay
  every 30 seconds
  expire after 60 seconds
  compute average write to
    mysql.somecluster.average_replication_delay
  send to kairos
;

match *
  send to kairos
;

version: carbon-c-relay v2.3 (633ff6)

Here is an example of a emitter sending something good (using netcat as a "listener"):

sys.dc1.somehost-1.somecluster.mysql.replication_delay 6 1481310457

Here is an example of an emitter sending something that causes carbon-c-relay to crash:

sys.dc1.somehost-1.somecluster.mysql.replication_delay 3 1481310571477

So, yeah - sending the timestamp in milliseconds causes carbon-c-relay to segfault. I know clients out there are able to send in millisecond resolution even though it's not supported by carbon-c-relay, so it seems like a misconfigured client could pretty easily bring down a carbon-c-relay based setup.

grobian · 2016-12-10T07:46:15Z

ok, this is an aggregator problem then, will have a look.

grobian · 2016-12-10T08:12:09Z

I tried your example, it doesn't crash for me. I've checked the code, and it does bounds checking, so a crash on this is not expected either.

grobian · 2016-12-10T08:32:13Z

A stacktrace would be very helpful here, if possible.

grobian · 2016-12-28T10:37:17Z

I tried reproducing once again, but no matter what I try, no crack.

poblahblahblah · 2017-01-05T00:52:03Z

sorry, @grobian was busy with work, then on vacation.

How would I go about getting a stacktrace? I can still trigger the crash locally on 2.3 and the recently released 2.4.

grobian · 2017-01-05T06:43:33Z

Ok, is it possible for you to compile carbon-c-relay yourself? If so, use CFLAGS="-g". Then run the relay in gdb, like gdb --args ./relay . When it crashes, type thread apply all bt -- that should give the information I'm looking for.

poblahblahblah · 2017-01-05T18:31:10Z

@grobian is this what you are looking for:

[vagrant@localhost carbon-c-relay-2.4]$ gdb --args relay -f conf/relay.conf
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/vagrant/carbon-c-relay-2.4/relay...done.

(gdb) run
Starting program: /home/vagrant/carbon-c-relay-2.4/relay -f conf/relay.conf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[2017-01-05 18:28:29] starting carbon-c-relay v2.4 (2017-01-05), pid=4434
configuration:
    relay hostname = localhost.localdomain
    listen port = 2003
    workers = 1
    send batch size = 2500
    server queue size = 25000
    server max stalls = 4
    statistics submission interval = 60s
    listen backlog = 32
    server connection IO timeout = 600ms
    routes configuration = conf/relay.conf

parsed configuration follows:
cluster kairos
    forward
        localhost:4444
    ;

aggregate (.*)\.bidder\.(.*)\.blah\.(.*)
    every 30 seconds
    expire after 60 seconds
    timestamp at end of bucket
    compute average write to
        agg.platform.bidder.all.blah.\3
    compute average write to
        agg.\1.bidder.all.blah.\3
    send to kairos
    ;

[2017-01-05 18:28:29] listening on tcp4 0.0.0.0 port 2003
[2017-01-05 18:28:29] listening on tcp6 :: port 2003
[2017-01-05 18:28:29] listening on udp4 0.0.0.0 port 2003
[2017-01-05 18:28:29] listening on udp6 :: port 2003
[2017-01-05 18:28:29] listening on UNIX socket /tmp/.s.carbon-c-relay.2003
[New Thread 0x7ffff61d4700 (LWP 4438)]
[2017-01-05 18:28:29] starting 1 workers
[New Thread 0x7ffff59d3700 (LWP 4439)]
[New Thread 0x7ffff51d2700 (LWP 4440)]
[2017-01-05 18:28:29] starting aggregator
[New Thread 0x7ffff49d1700 (LWP 4441)]
[2017-01-05 18:28:29] starting statistics collector
[New Thread 0x7ffff41d0700 (LWP 4442)]
[2017-01-05 18:28:29] starting servers
[New Thread 0x7ffff39cf700 (LWP 4443)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff59d3700 (LWP 4439)]
aggregator_putmetric (s=0x6237f0, metric=metric@entry=0x7ffff61e3d0d "pob.bidder.pobtest001.blah.test 9 1483578550658\n",
    firstspace=firstspace@entry=0x7ffff61e3d5d " 9 1483578550658\n", nmatch=4, pmatch=pmatch@entry=0x7ffff59d0ad0) at aggregator.c:326
326			if (bucket->cnt == 0) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-106.el7_2.8.x86_64
(gdb) thread apply all bt

Thread 7 (Thread 0x7ffff39cf700 (LWP 4443)):
#0  0x00007ffff75b8efd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff75e9b34 in usleep () from /lib64/libc.so.6
#2  0x000000000040f1af in server_queuereader (d=0x61b1b0) at server.c:119
#3  0x00007ffff78c4dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff75f1ced in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7ffff41d0700 (LWP 4442)):
#0  0x00007ffff75b8efd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff75b8d94 in sleep () from /lib64/libc.so.6
#2  0x0000000000410569 in collector_runner (s=<optimized out>) at collector.c:140
#3  0x00007ffff78c4dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff75f1ced in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7ffff49d1700 (LWP 4441)):
#0  0x00007ffff75b8efd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff75e9b34 in usleep () from /lib64/libc.so.6
#2  0x0000000000412168 in aggregator_expire (sub=0x6237f0) at aggregator.c:568
#3  0x00007ffff78c4dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff75f1ced in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7ffff51d2700 (LWP 4440)):
#0  0x00007ffff75b8efd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff75e9b34 in usleep () from /lib64/libc.so.6
#2  0x000000000040f1af in server_queuereader (d=0x621520) at server.c:119
#3  0x00007ffff78c4dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff75f1ced in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7ffff59d3700 (LWP 4439)):
#0  aggregator_putmetric (s=0x6237f0, metric=metric@entry=0x7ffff61e3d0d "pob.bidder.pobtest001.blah.test 9 1483578550658\n",
    firstspace=firstspace@entry=0x7ffff61e3d5d " 9 1483578550658\n", nmatch=4, pmatch=pmatch@entry=0x7ffff59d0ad0) at aggregator.c:326
#1  0x00000000004079a5 in router_route_intern (blackholed=blackholed@entry=0x7ffff59d2d27 "", ret=ret@entry=0x7ffff61e5d10, curlen=curlen@entry=0x7ffff59d2d28, retsize=retsize@entry=64,
    srcaddr=srcaddr@entry=0x7ffff61e1ced "192.168.168.11", metric=metric@entry=0x7ffff61e3d0d "pob.bidder.pobtest001.blah.test 9 1483578550658\n",
    firstspace=firstspace@entry=0x7ffff61e3d5d " 9 1483578550658\n", r=0x7ffff7fac298) at router.c:3073
#2  0x000000000040eb27 in router_route (rtr=<optimized out>, ret=ret@entry=0x7ffff61e5d10, retcnt=retcnt@entry=0x7ffff61e6110, retsize=retsize@entry=64, srcaddr=srcaddr@entry=0x7ffff61e1ced "192.168.168.11",
    metric=metric@entry=0x7ffff61e3d0d "pob.bidder.pobtest001.blah.test 9 1483578550658\n", firstspace=firstspace@entry=0x7ffff61e3d5d " 9 1483578550658\n") at router.c:3180
#3  0x0000000000406819 in dispatch_connection (start=..., self=0x621260, conn=0x7ffff61e1ce8) at dispatcher.c:447
#4  dispatch_runner (arg=0x621260) at dispatcher.c:628
#5  0x00007ffff78c4dc5 in start_thread () from /lib64/libpthread.so.0
#6  0x00007ffff75f1ced in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7ffff61d4700 (LWP 4438)):
#0  0x00007ffff75e769d in poll () from /lib64/libc.so.6
#1  0x0000000000406438 in dispatch_runner (arg=0x623840) at dispatcher.c:578
#2  0x00007ffff78c4dc5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007ffff75f1ced in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7fed740 (LWP 4434)):
#0  0x00007ffff75b8efd in nanosleep () from /lib64/libc.so.6
#1  0x00007ffff75b8d94 in sleep () from /lib64/libc.so.6
#2  0x0000000000402e26 in main (argc=<optimized out>, argv=<optimized out>) at relay.c:827

grobian · 2017-01-05T18:35:46Z

YES! Thank you!!!

This should be the fix for issue #239. The bucket position was calculated in a temp variable which was not big enough to hold the result. Likely the truncated value fitted in the position list, but the bucket wasn't initialised yet.

grobian · 2017-01-05T19:46:01Z

Can you try if my latest commit changes anything for you? After pushing I got a bit sceptical whether or not I found the issue.

poblahblahblah · 2017-01-05T21:16:39Z

@grobian Everything looks good on the latest commit. Here is the netcat output that carbon-c-relay is relaying to:

pob.bidder.pobtest001.blah.test 9 1483578450
pob.bidder.pobtest001.blah.test 9 1483578550658

poblahblahblah · 2017-01-05T22:03:43Z

@grobian also, thanks a lot for the fix and for carbon-c-relay!

grobian · 2017-01-06T07:28:51Z

So, are you happy with this behaviour? (It should drop the metric with a message about being too much in the future or something) If this is ok for you, I'll close this and release 2.5 soon.

poblahblahblah · 2017-01-06T17:40:41Z

Yeah, this behavior is fine with me.

grobian closed this as completed Dec 28, 2016

grobian reopened this Jan 5, 2017

grobian closed this as completed Jan 6, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

segfault when date format is incorrect #239

segfault when date format is incorrect #239

poblahblahblah commented Dec 8, 2016

grobian commented Dec 9, 2016

poblahblahblah commented Dec 9, 2016

piotr1212 commented Dec 9, 2016 via email

poblahblahblah commented Dec 9, 2016

grobian commented Dec 10, 2016

grobian commented Dec 10, 2016

grobian commented Dec 10, 2016

grobian commented Dec 28, 2016

poblahblahblah commented Jan 5, 2017 •

edited

Loading

grobian commented Jan 5, 2017

poblahblahblah commented Jan 5, 2017 •

edited

Loading

grobian commented Jan 5, 2017

grobian commented Jan 5, 2017

poblahblahblah commented Jan 5, 2017

poblahblahblah commented Jan 5, 2017

grobian commented Jan 6, 2017

poblahblahblah commented Jan 6, 2017

segfault when date format is incorrect #239

segfault when date format is incorrect #239

Comments

poblahblahblah commented Dec 8, 2016

grobian commented Dec 9, 2016

poblahblahblah commented Dec 9, 2016

piotr1212 commented Dec 9, 2016 via email

poblahblahblah commented Dec 9, 2016

grobian commented Dec 10, 2016

grobian commented Dec 10, 2016

grobian commented Dec 10, 2016

grobian commented Dec 28, 2016

poblahblahblah commented Jan 5, 2017 • edited Loading

grobian commented Jan 5, 2017

poblahblahblah commented Jan 5, 2017 • edited Loading

grobian commented Jan 5, 2017

grobian commented Jan 5, 2017

poblahblahblah commented Jan 5, 2017

poblahblahblah commented Jan 5, 2017

grobian commented Jan 6, 2017

poblahblahblah commented Jan 6, 2017

poblahblahblah commented Jan 5, 2017 •

edited

Loading

poblahblahblah commented Jan 5, 2017 •

edited

Loading