Closed Bug 1162154 Opened 9 years ago Closed 9 years ago

modify tools/buildfarm/maintenance/watch_twistd_log.py to send to stdout

Categories

(Release Engineering :: General, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: coop)

References

Details

Attachments

(2 files)

Right now tools/buildfarm/maintenance/watch_twistd_log.py on each buildbot master sends output to email. As part of centralizing logs, we want that output to be useful and sent to syslog instead.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Two things to do here:

* change the cron job to stop passing the email param which will automatically cause the output to go to STDOUT
* change watch_twistd_log.py to still do the aggregation, otherwise we'll still end up with huge logs going to papertrail
Change #1 is simple. I was poking at change #2 and I think that should also be relatively simple. We could simply call format_exceptions for non-email output as well. And we'd want to rewrite the output to make it a bit more useful for logging (putting all the info on one line). For actual implementation, I'd rewrite send_msg so you're just passing in msg and call format_exceptions from __main__ outside of the if.

--- /builds/buildbot/tests1-linux32/tools/buildfarm/maintenance/watch_twistd_log.py     2015-04-30 10:38:23.022328606 -0700
+++ /home/arr/tmp/watch_twistd_log.py   2015-05-12 10:21:34.243266978 -0700
@@ -117,10 +117,10 @@
             len(excs), hostname, name)
         for exc_key in sorted(matched_excs, key=lambda x: (matched_excs[x]['count'], x), reverse=True):
             msg += "\n" + "-" * 80 + "\n"
-            msg += "Count: %d, " % matched_excs[exc_key]['count']
-            msg += "Exception: %s\n" % matched_excs[exc_key]['exception']
-            msg += "First instance: %s, Most recent instance: %s\n" % (matched_excs[exc_key]['first'],
-                                                                       matched_excs[exc_key]['most_recent'])
+            msg += "Count: %d. " % matched_excs[exc_key]['count']
+            msg += "First instance: %s. Most recent: %s. " % (matched_excs[exc_key]['first'],
+                                                              matched_excs[exc_key]['most_recent'])
+            msg += "Twistd exception: %s\n" % matched_excs[exc_key]['exception']
             msg += "Example:\n"
             msg += matched_excs[exc_key]['example']
     return msg
@@ -269,12 +269,13 @@
     exceptions.extend(s.scan_dirs(args))
 
     if exceptions:
+        hostname = os.uname()[1]
         if options.emails:
-            hostname = os.uname()[1]
             send_msg(options.fromaddr, options.emails,
                      hostname, exceptions, options.name)
         else:
-            print "\n".join(exceptions)
+            msg = format_exceptions(hostname, exceptions, options.name)
+            print msg
 
     if options.timefile:
         open(options.timefile, "w").write(str(s.lasttime))



I noticed when running this as a test case that we don't aggregate the count over multiple files, so you'll get a report of each count match for EVERY file it parses. It seems like it would be smarter to concatenate output from all files and then do the aggregation/summation/reporting.  After all, we don't care if the message showed up in twistd.log or twistd.log.1, just when the first and last occurrences and the TOTAL number of occurrences are across all parsed files.

I don't know puppet functions well enough to tell how that should change so that we're not counting the filename as a unique entity in the array(?) that holds the exceptions.
This patch takes your suggestions from comment #2 and also standardizes the output for UnauthorizedLogin exceptions to match the others. Should make parsing in papertrail easier.

Here's some example output:

Count: 4567|First instance: 2015-04-16 12:19:36-0700|Most recent instance: 2015-05-12 11:53:14-0700|Twistd exception: twisted.cred.error.UnauthorizedLogin - b-2008-ix-0010.winbuild.releng.scl3.mozilla.com 10.26.36.19
Example:
Exception in /tmp/master/twistd.log:
2015-05-12 10:20:38-0700 [Broker,10782,10.26.36.19] Unhandled Error
        Traceback (most recent call last):
        Failure: twisted.cred.error.UnauthorizedLogin: 

--------------------------------------------------------------------------------
Count: 96|First instance: 2015-05-11 19:42:44-0700|Most recent instance: 2015-05-11 20:02:29-0700|Twistd exception: urllib2.HTTPError: HTTP Error 500: Internal Server Error
Example:
Exception in /tmp/master/twistd.log.1:
2015-05-11 19:42:44-0700 [-] Unhandled Error
        Traceback (most recent call last):
          File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_9a7b73293807_production_0.8-py2.7.egg/buildbot/util/loop.py", line 178, in _loop_done
            self._loop_start()
          File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_9a7b73293807_production_0.8-py2.7.egg/buildbot/util/loop.py", line 146, in _loop_start
            self._remaining = list(self.get_processors())
          File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_9a7b73293807_production_0.8-py2.7.egg/buildbot/master.py", line 153, in _get_processors
            builders = sorter(self.parent, builders)
          File "/builds/buildbot/try1/master/master_common.py", line 195, in prioritizeBuilders
            twlog.err("handled exception talking to jacuzzi; trying to carry on")
        --- <exception caught here> ---
          File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/misc.py", line 411, in get_slaves
            data = json.load(urllib2.urlopen(url, timeout=self.HTTP_TIMEOUT))
          File "/tools/python27/lib/python2.7/urllib2.py", line 126, in urlopen
            return _opener.open(url, data, timeout)
          File "/tools/python27/lib/python2.7/urllib2.py", line 406, in open
            response = meth(req, response)
          File "/tools/python27/lib/python2.7/urllib2.py", line 519, in http_response
            'http', request, response, code, msg, hdrs)
          File "/tools/python27/lib/python2.7/urllib2.py", line 444, in error
            return self._call_chain(*args)
          File "/tools/python27/lib/python2.7/urllib2.py", line 378, in _call_chain
            result = func(*args)
          File "/tools/python27/lib/python2.7/urllib2.py", line 527, in http_error_default
            raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
        urllib2.HTTPError: HTTP Error 500: Internal Server Error
Attachment #8604862 - Flags: review?(arich)
Comment on attachment 8604862 [details] [diff] [review]
[tools] Standardize exceptions aggregation output for easier consumption by papertrail

Review of attachment 8604862 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not a python programmer, so I'm not going to be much help in reviewing the code itself, but as far as the output goes, the one suggestion I would make is to put spaces around the | because if we alert on this from papertrail, you'll want it to be human readable.

I also noticed that we're seeing duplicates of the same error, but after chatting with you it sounds like this is intentional since it aggregates based on the line number in the code it was triggered from as well as the error message itself. I think we're better off ignoring the line numbers when doing the aggregation since that's likely not important for triggering an alert (and we only give examples, not full output, so you have to go digging through the logs to find each instance, anyway), but this at least gives us a good start.
Attachment #8604862 - Flags: review?(arich) → review+
Attachment #8605246 - Flags: review?(arich) → review+
Comment on attachment 8604862 [details] [diff] [review]
[tools] Standardize exceptions aggregation output for easier consumption by papertrail

Review of attachment 8604862 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/build/tools/rev/14514a7c7584
Attachment #8604862 - Flags: checked-in+
Comment on attachment 8605246 [details] [diff] [review]
[puppet] Stop sending twistd error via mail

Review of attachment 8605246 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/build/puppet/rev/9ec568d08d19
Attachment #8605246 - Flags: checked-in+
Exception logs are appearing in papertrail as expected.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: Tools → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: