Bug 623410
Opened 14 years ago
Closed 14 years ago
syslog drops socorro messages
Categories: (Socorro :: General, task)
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: rhelmer, Assigned: rhelmer)
Attachments (1 file, 1 obsolete file):
patch, 1.83 KB, lars: review+
We use syslog for Socorro, and we've seen it drop messages during recent load testing (both on a production node and on staging). This is not remote syslog; the messages are sent from the local machine.
We aren't currently monitoring disk activity with ganglia (I'll file a bug for this), but using atop on production, even outside of peak times, I see that syslogd, kjournald and httpd together (mostly the first two) frequently keep the disk at 100% utilization.
So, a couple of things:
* we should figure out which messages we can't possibly live without, and log them in a way that can't drop them (e.g. via syslog-ng, or directly to a file)
** I think buffering would be OK in most cases
* we should make sure the "INFO" level has all the messages we need, and disable "DEBUG" in production. This should decrease the overall volume of log traffic, most of which we don't need.
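Both proposals above could look roughly like this with Python's standard `logging` module, assuming Socorro's Python processes configure logging this way. The helper `setup_logging`, the logger name `socorro`, and treating WARNING and above as the "must-have" messages are all hypothetical choices for illustration. Must-have records are buffered in memory and written straight to a local file, bypassing syslog, while the logger level drops DEBUG in production.

```python
import logging
import logging.handlers


def setup_logging(critical_log_path, production=True):
    """Hypothetical setup sketching the proposal in comment 0.

    Must-have records (assumed here to be WARNING and above) are buffered in
    memory and then written directly to a local file, bypassing syslog.
    """
    logger = logging.getLogger("socorro")  # hypothetical logger name
    # Drop DEBUG in production; keep it elsewhere for troubleshooting.
    logger.setLevel(logging.INFO if production else logging.DEBUG)

    file_handler = logging.FileHandler(critical_log_path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))

    # Buffer up to 100 records; flush early as soon as an ERROR or worse arrives.
    # The level filter sits on the MemoryHandler, because flush() hands records
    # to the target without re-checking the target's level.
    buffered = logging.handlers.MemoryHandler(
        capacity=100, flushLevel=logging.ERROR, target=file_handler)
    buffered.setLevel(logging.WARNING)
    logger.addHandler(buffered)

    # A syslog handler for the remaining INFO traffic would be added here too;
    # it is omitted to keep the sketch focused on the must-have path.
    return logger, buffered
```

The trade-off of buffering is that a crash can lose up to a buffer's worth of records that were never flushed, which the bullet above judges acceptable in most cases.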
Comment 1 • 14 years ago
Specifically, we need a list of messages currently logged at DEBUG that should be INFO. (Lars?)
Comment 2 • 14 years ago (Assignee)
Default to logging to the local syslog socket rather than UDP. We may want to add TCP in the future, but I think we're better off always logging locally and letting the local syslogd decide what to do with the messages.
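The change described above maps onto Python's standard `logging.handlers.SysLogHandler`, assuming Socorro's Python code uses it (the patch itself isn't shown here). In this sketch, the helper `make_syslog_handler`, its `transport` parameter, and the socket-path fallback are hypothetical. A string address selects a Unix domain socket to the local syslogd, while a `(host, port)` tuple with `SOCK_DGRAM` or `SOCK_STREAM` selects UDP or TCP.

```python
import logging
import logging.handlers
import os
import socket


def make_syslog_handler(transport="local", host="localhost", port=514,
                        facility=logging.handlers.SysLogHandler.LOG_USER):
    """Hypothetical helper: build a SysLogHandler for one of three transports.

    'local' (the default, as in the patch description) writes to the local
    syslogd's Unix domain socket; 'udp' and 'tcp' send to host:port.
    """
    if transport == "local":
        # /dev/log on Linux, /var/run/syslog on macOS (assumed locations).
        path = "/dev/log" if os.path.exists("/dev/log") else "/var/run/syslog"
        return logging.handlers.SysLogHandler(address=path, facility=facility)
    if transport == "udp":
        # UDP datagrams carry no delivery guarantee and can be lost under load.
        return logging.handlers.SysLogHandler(
            address=(host, port), facility=facility, socktype=socket.SOCK_DGRAM)
    if transport == "tcp":
        # Connects immediately, so a listening syslog server must be reachable.
        return logging.handlers.SysLogHandler(
            address=(host, port), facility=facility, socktype=socket.SOCK_STREAM)
    raise ValueError(f"unknown transport: {transport!r}")
```

Keeping `local` as the default pushes any forwarding decision (UDP, TCP, or a relay) into the local syslogd's own configuration rather than into every application.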
Comment 3 • 14 years ago (Assignee)
Fixed errant comment.
Attachment #504283 - Attachment is obsolete: true
Attachment #504285 - Flags: review?(lars)
Attachment #504283 - Flags: review?(lars)
Updated • 14 years ago
Attachment #504285 - Flags: review?(lars) → review+
Comment 4 • 14 years ago (Assignee)
Comment 5 • 14 years ago (Assignee)
I've done some testing on staging and PHX pre-production, and this seems to work. I'd like to see it under high volume before we mark this VERIFIED (i.e., confirm that the number of submitted crashes matches the number of received crashes).
(In reply to comment #0)
> We aren't currently monitoring disk activity with ganglia (I'll file a bug for
> this), but using atop on production, even outside of peak times, I see that
> syslogd, kjournald and httpd together (mostly the first two) frequently keep
> the disk at 100% utilization.
We are now monitoring disk activity with ganglia in both PHX and SJC.
PHX seems much better in this regard. It's probably either something about RHEL6 versus RHEL5 (like rsyslog) or just better hardware in PHX. We can figure this out when we turn SJC into staging.
> * we should figure out which messages we can't possibly live without, and log
> them in a way which can't get dropped like syslog-ng, directly to file
> ** I think buffering would be OK in most cases
We use the local socket now, and could easily add TCP if we need to (I think letting the local syslogd handle remoting is simpler for us right now).
> * we should make sure the "INFO" level has all the messages we need, and
> disable "DEBUG" in production. This should decrease the overall volume of log
> traffic, most of which we don't need.
I'm not sure we should worry about this after all; it's very useful to have detailed info in production for troubleshooting and debugging live problems. Also, the nagios monitors currently depend on these files being constantly written to.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated • 13 years ago
Component: Socorro → General
Product: Webtools → Socorro