Connection reset exception breaks server #23
Are syslog messages still stored in MongoDB? The MongoDB driver is pooling connections and it seems like they are not re-established after they are lost. This message means that there is no connection to MongoDB:
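The suspected behaviour is that pooled connections are not re-established once lost. A minimal, generic sketch of the reconnect-and-retry pattern that would fix this (this is my illustration, not Graylog2 or MongoDB driver code; `op` and `connect` are hypothetical callables):

```python
import socket


def with_reconnect(op, connect, max_retries=3):
    """Run op(conn); on a connection failure, rebuild the connection and retry.

    Generic sketch of the missing reconnect behaviour: instead of keeping a
    dead pooled connection around, re-establish it before retrying.
    """
    conn = connect()
    for _ in range(max_retries):
        try:
            return op(conn)
        except (socket.error, ConnectionError):
            conn = connect()  # re-establish the lost connection
    raise ConnectionError("gave up after %d retries" % max_retries)
```

The key point is that the recovery path creates a fresh connection rather than reusing the broken one from the pool.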
I'm in the office now, but will take a look at this later. Thanks!
Thank you very much for this valuable information! We will fix this soon (as I also encountered the problem in a production scenario) and build a patchlevel release.
Hmm, this may not be graylog-server's fault. I noticed I could no longer connect to MongoDB with the "mongo" CLI client. So I restarted the mongodb daemon, and the mongo client was able to talk to it again. Then graylog-server seemed to recover correctly and I was able to send syslog and Ruby client messages to it again. The web UI didn't recover as well - at least, it was extraordinarily slow until I restarted it, and then seemed fine. So if mongo was at fault, I still don't see why syslog was able to keep logging but the Ruby client was not. The web UI also seemed to work during this, but was really slow, and stayed that way until I restarted it. Is the Ruby/GELF receiver thread creating a new connection to mongo each time or something?
Also, any idea what's up with mongo? I do have both the mongo logs and graylog-server logs saved if that will help. I'm using the mongo package for Ubuntu. I do see this:

Thu Feb 10 10:01:28 [initandlisten] connection refused because too many open connections: 819 of 819

I also see a whole bunch of these, which might be harmless:

Wed Feb 9 12:04:24 [conn6] graylog2.messages Btree::insert: key too large to index, skipping graylog2.messages.$message_1 1757 { : "rails::models: SQL (0.7ms) INSERT INTO "users" ("active", "address"..." }

And a bunch of massive lines (hundreds of pages, 800K+ columns) like the following, but maybe that's just mongo being too verbose:

Thu Feb 10 10:01:28 [conn8] query graylog2.$cmd ntoreturn:1 command: { count: "messages", query: { deleted: { $in: [ false, null, false, null, false, null, false, null, false, null, false, null, .....<many, many, many more false's and null's> ] }, created_at: { $gt: 1297341410 }, message: { $nin: {} } }, fields: null } reslen:64 127ms
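The two symptoms in the log above (connection refusals at the limit, and oversized index keys being skipped) are easy to count mechanically. A small sketch of such a log scan - the regexes match the exact lines quoted in this thread, but the function name and structure are mine:

```python
import re

# Matches mongod's "too many open connections: N of M" refusal line.
REFUSED = re.compile(
    r"connection refused because too many open connections: (\d+) of (\d+)"
)
# Matches the oversized-index-key warning quoted above.
KEY_TOO_LARGE = re.compile(r"Btree::insert: key too large to index")


def summarize_mongod_log(lines):
    """Count connection refusals and skipped index keys in a mongod log."""
    refused, skipped_keys, limit = 0, 0, None
    for line in lines:
        m = REFUSED.search(line)
        if m:
            refused += 1
            limit = int(m.group(2))  # the advertised connection limit
        elif KEY_TOO_LARGE.search(line):
            skipped_keys += 1
    return {"refused": refused, "limit": limit, "skipped_keys": skipped_keys}
```

Running this over a saved mongod log would show when the refusals started and what limit mongod believed it had.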
Okay, you are running into the same "too many open connections" problem as some other users. I guess this is the root cause of the problems. This only happens to a few, so I'd like to find out what conditions cause this... How many syslog and how many GELF messages do you handle per second?
And what versions of MongoDB, graylog2-server and graylog2-web-interface are you running?
Not that many - this is on a staging server, a little bursty sometimes, but no sustained logging. The web UI is reporting 21.011 messages, and that's since I brought it up about noon yesterday, so 24 hours' worth. Looks like it ran out of connections at about 4:30pm:

Wed Feb 9 16:35:02 [initandlisten] connection accepted from 10.114.94.167:52284

The versions I use are: graylog2-server-0.9.4p1
BTW, I do have a monit script for checking that both graylog and mongodb are up. The script runs once a minute, connects to make sure it can, then exits - so it's not long-lived enough to hold onto the connections.
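A probe like that monit check can be sketched as a plain TCP connect that closes immediately, so it never holds a connection open (host and port here are illustrative placeholders, not values from this thread):

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Open a TCP connection and close it right away.

    Sketch of a short-lived health probe: because the socket is closed on
    exit from the `with` block, the check cannot accumulate connections
    the way a leaking client would.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("127.0.0.1", 27017)` would report whether a local mongod is accepting connections.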
I just tried the latest HEAD, but I'm still running out of connections to MongoDB, so catching Exception in GELFClientHandlerThread didn't fix this.
I monitored the graylog2-server process with jVisualVM and could not see any connection leaks so far. Could it be that you have some load spikes? The server spawns and closes receiver threads dynamically. I fired up some spikes which ended in ~800 threads that were living for 1 minute. Maybe you just have to raise the maximum connection limit in MongoDB - they will automatically decrease after the spike. Thanks for any more information!
I think the problem is that graylog by default had mongodb_max_connections set to 500, and mongodb had no settings. I thought the default for mongodb was higher than 500, and executing "db.serverStatus()" in a mongo console showed that it should have a default of over 800. However, when it starts denying connections, it's nowhere near that number, but rather closer to 300. I think maybe the OS limit on open file handles is taking effect before the connection limit is reached. I'm trying to run graylog with a lower connection limit of 200 to see what happens, as well as looking into the best way to increase the limit on open file handles for mongo.
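The interaction described here - the OS file-descriptor limit capping connections below mongod's advertised maximum - can be expressed as simple arithmetic. A hedged sketch (the `reserved_fds` value is an illustrative assumption for descriptors mongod needs for data files, journals, etc., not a number measured in this thread):

```python
def effective_connection_cap(ulimit_nofile, reserved_fds=200, advertised_max=819):
    """Estimate how many client connections mongod can really accept.

    Each client connection consumes one file descriptor, and mongod also
    needs descriptors for its data files. Whichever limit is hit first -
    the advertised connection maximum or the remaining descriptors - wins.
    """
    return min(advertised_max, max(0, ulimit_nofile - reserved_fds))
```

With a typical default ulimit of 1024 and enough descriptors tied up in data files, the effective cap lands well below 819, which would match the ~300 connections observed before refusals started.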
Looks like that was it: graylog's connections are steady at the reduced number (200), and mongodb is OK with it. You should probably reduce the default number of graylog/mongodb connections in graylog2.conf so that it works with the default mongo setup (on Ubuntu, anyway). Also, maybe add a comment to the config file telling people they might need to increase the connection count in the mongodb config file as well as increase the open file handle limit for the mongodb process.
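A sketch of what the suggested graylog2.conf default and comment might look like - the wording here is mine, not the text that actually shipped:

```
# Keep this at or below what your MongoDB instance can actually accept.
# If you raise it, also raise the connection limit in the MongoDB config
# and the open file handle limit (ulimit -n) for the mongod process.
mongodb_max_connections = 200
```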
Wonderful! I'll add that to the wiki and change the config file (+ add the comment). Thank you very much! Closing this issue.
OSSEC is using a "degraded" syslog format without hostname field. Fixes #23
The previous path in /tmp is problematic because that directory is cleaned up on reboot in most Linux distributions. Refs #23
I started up the server last night and left it running overnight. This morning, I was no longer able to connect to it with the Ruby GELF client, though syslog messages still seem to be getting there. Fortunately, I had the server running with debug logging. The first trace is the first one in the log; it happens a number of times, then I start seeing the other traces below it. The last trace is the entirety of a single call from the Ruby GELF client.