Closed Bug 623410 Opened 14 years ago Closed 14 years ago

syslog drops socorro messages

Categories

(Socorro :: General, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: rhelmer)

Details

Attachments

(1 file, 1 obsolete file)

We use syslog for Socorro, and we've seen instances of it dropping messages when doing load testing recently (both on a production node and also on staging). This is not a remote syslog, the messages are being sent from the local machine.

We aren't currently monitoring disk activity with ganglia (I'll file a bug for this), but using atop on production, even outside of peak times, I see that syslogd, kjournald and httpd together (mostly the first two) frequently keep the disk at 100% utilization.

So a couple things:

* we should figure out which messages we can't possibly live without, and log them in a way which can't get dropped like syslog-ng, directly to file
** I think buffering would be OK in most cases
* we should make sure the "INFO" level has all the messages we need, and disable "DEBUG" in production. This should decrease the overall volume of log traffic, most of which we don't need.
Specifically, we need a list of things that are DEBUG that should be INFO. (Lars?)
Default to logging to local socket rather than UDP. May want to add TCP in the future, but I think we might be better off always logging to local and letting the local syslogd decide what to do.
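A minimal sketch of what defaulting to the local socket could look like with Python's standard `logging.handlers.SysLogHandler`. This is not the attached patch: the function names, the `"unix"`/`"udp"` transport values, and the `/dev/log` socket path are assumptions for illustration.

```python
import logging
import logging.handlers


def syslog_address(transport="unix", socket_path="/dev/log",
                   host="localhost", port=514):
    """Return the SysLogHandler address for the chosen transport.

    The transport names and defaults here are hypothetical, not Socorro's
    actual config values.
    """
    if transport == "unix":
        # Local Unix domain socket: messages go to the local syslogd,
        # which can then decide whether to forward them anywhere.
        return socket_path
    if transport == "udp":
        # UDP datagrams to a host/port pair.
        return (host, port)
    raise ValueError("unsupported syslog transport: %r" % (transport,))


def make_syslog_handler(transport="unix", **kwargs):
    """Build a SysLogHandler for the given transport (local socket by default)."""
    address = syslog_address(transport, **kwargs)
    return logging.handlers.SysLogHandler(
        address=address,
        facility=logging.handlers.SysLogHandler.LOG_USER,
    )
```

With this shape, a caller would attach `make_syslog_handler()` to its logger and only pass `transport="udp"` when explicitly configured to do so.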
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
Attachment #504283 - Flags: review?(lars)
Fixed errant comment.
Attachment #504283 - Attachment is obsolete: true
Attachment #504285 - Flags: review?(lars)
Attachment #504283 - Flags: review?(lars)
Attachment #504285 - Flags: review?(lars) → review+
Landed: r2881
Fixed misspelling of "transport" in config: r2882
Have done some testing on staging and PHX pre-production and this seems to work. I would like to see this under high volume before we mark this VERIFIED (make sure that # of submitted crashes matches # of received crashes).

(In reply to comment #0)
> We aren't currently monitoring disk activity with ganglia (I'll file a bug for
> this), but using atop on production, even outside of peak times, I see that
> syslogd, kjournald and httpd together (mostly the first two) frequently keep
> the disk at 100% utilization.

We are now monitoring disk activity with ganglia in both PHX and SJC. PHX seems much better in this regard; it's probably either something about RHEL6 versus RHEL5 (like rsyslog) or perhaps just better hardware in PHX. We can figure this out when we turn SJC into staging.

> * we should figure out which messages we can't possibly live without, and log
> them in a way which can't get dropped like syslog-ng, directly to file
> ** I think buffering would be OK in most cases

We use local socket now, and could easily add TCP if we need to (I think letting local syslogd handle remoting is simpler for us right now).

> * we should make sure the "INFO" level has all the messages we need, and
> disable "DEBUG" in production. This should decrease the overall volume of log
> traffic, most of which we don't need.

I'm not sure we should worry about this after all; it's very useful to have detailed info in production for troubleshooting and debugging live problems. Also, the nagios monitors currently depend on these files being constantly written to.
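The submitted-versus-received check mentioned above could be sketched roughly as follows. This assumes the load test records the crash IDs it submits and that the received IDs can be collected from the logs or storage; the function name and inputs are hypothetical.

```python
def missing_crashes(submitted_ids, received_ids):
    """Return the sorted crash IDs that were submitted but never received."""
    return sorted(set(submitted_ids) - set(received_ids))
```

An empty result under high volume would suggest that no messages are being dropped.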
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro