Closed
Bug 1162154
Opened 9 years ago
Closed 9 years ago
modify tools/buildfarm/maintenance/watch_twistd_log.py to send to stdout
Categories
(Release Engineering :: General, defect, P2)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arich, Assigned: coop)
References
Details
Attachments
(2 files)
6.71 KB,
patch
|
arich
:
review+
coop
:
checked-in+
|
Details | Diff | Splinter Review |
1.74 KB,
patch
|
arich
:
review+
coop
:
checked-in+
|
Details | Diff | Splinter Review |
Right now tools/buildfarm/maintenance/watch_twistd_log.py on each buildbot master sends output to email. As part of centralizing logs, we want that output to be useful and sent to syslog instead.
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Assignee | ||
Comment 1•9 years ago
|
||
Two things to do here: * change the cron job to stop passing the email param which will automatically cause the output to go to STDOUT * change watch_twistd_log.py to still do the aggregation, otherwise we'll still end up with huge logs going to papertrail
Reporter | ||
Comment 2•9 years ago
|
||
Change #1 is simple. I was poking at change #2 and I think that should also be relatively simple. We could simply call format_exceptions for non-email output as well. And we'd want to rewrite the output to make it a bit more useful for logging (putting all the info on one line). For actual implementation, I'd rewrite send_msg so you're just passing in msg and call format_exceptions from __main__ outside of the if. --- /builds/buildbot/tests1-linux32/tools/buildfarm/maintenance/watch_twistd_log.py 2015-04-30 10:38:23.022328606 -0700 +++ /home/arr/tmp/watch_twistd_log.py 2015-05-12 10:21:34.243266978 -0700 @@ -117,10 +117,10 @@ len(excs), hostname, name) for exc_key in sorted(matched_excs, key=lambda x: (matched_excs[x]['count'], x), reverse=True): msg += "\n" + "-" * 80 + "\n" - msg += "Count: %d, " % matched_excs[exc_key]['count'] - msg += "Exception: %s\n" % matched_excs[exc_key]['exception'] - msg += "First instance: %s, Most recent instance: %s\n" % (matched_excs[exc_key]['first'], - matched_excs[exc_key]['most_recent']) + msg += "Count: %d. " % matched_excs[exc_key]['count'] + msg += "First instance: %s. Most recent: %s. " % (matched_excs[exc_key]['first'], + matched_excs[exc_key]['most_recent']) + msg += "Twistd exception: %s\n" % matched_excs[exc_key]['exception'] msg += "Example:\n" msg += matched_excs[exc_key]['example'] return msg @@ -269,12 +269,13 @@ exceptions.extend(s.scan_dirs(args)) if exceptions: + hostname = os.uname()[1] if options.emails: - hostname = os.uname()[1] send_msg(options.fromaddr, options.emails, hostname, exceptions, options.name) else: - print "\n".join(exceptions) + msg = format_exceptions(hostname, exceptions, options.name) + print msg if options.timefile: open(options.timefile, "w").write(str(s.lasttime)) I noticed when running this as a test case that we don't aggregate the count over multiple files, so you'll get a report of each count match for EVERY file it parses. It seems like it would be smarter to concatenate output from all files and then do the aggregation/summation/reporting. After all, we don't care if the message showed up in twistd.log or twistd.log.1, just when the first and last occurrences and the TOTAL number of occurrences are across all parsed files. I don't know puppet functions well enough to tell how that should change so that we're not counting the filename as a unique entity in the array(?) that holds the exceptions.
Assignee | ||
Comment 3•9 years ago
|
||
This patch takes your suggestions from comment #2 and also standardizes the output for UnauthorizedLogin exceptions to match the others. Should make parsing in papertrail easier. Here's some example output: Count: 4567|First instance: 2015-04-16 12:19:36-0700|Most recent instance: 2015-05-12 11:53:14-0700|Twistd exception: twisted.cred.error.UnauthorizedLogin - b-2008-ix-0010.winbuild.releng.scl3.mozilla.com 10.26.36.19 Example: Exception in /tmp/master/twistd.log: 2015-05-12 10:20:38-0700 [Broker,10782,10.26.36.19] Unhandled Error Traceback (most recent call last): Failure: twisted.cred.error.UnauthorizedLogin: -------------------------------------------------------------------------------- Count: 96|First instance: 2015-05-11 19:42:44-0700|Most recent instance: 2015-05-11 20:02:29-0700|Twistd exception: urllib2.HTTPError: HTTP Error 500: Internal Server Error Example: Exception in /tmp/master/twistd.log.1: 2015-05-11 19:42:44-0700 [-] Unhandled Error Traceback (most recent call last): File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_9a7b73293807_production_0.8-py2.7.egg/buildbot/util/loop.py", line 178, in _loop_done self._loop_start() File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_9a7b73293807_production_0.8-py2.7.egg/buildbot/util/loop.py", line 146, in _loop_start self._remaining = list(self.get_processors()) File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbot-0.8.2_hg_9a7b73293807_production_0.8-py2.7.egg/buildbot/master.py", line 153, in _get_processors builders = sorter(self.parent, builders) File "/builds/buildbot/try1/master/master_common.py", line 195, in prioritizeBuilders twlog.err("handled exception talking to jacuzzi; trying to carry on") --- <exception caught here> --- File "/builds/buildbot/try1/lib/python2.7/site-packages/buildbotcustom/misc.py", line 411, in get_slaves data = json.load(urllib2.urlopen(url, timeout=self.HTTP_TIMEOUT)) File "/tools/python27/lib/python2.7/urllib2.py", line 126, in urlopen return _opener.open(url, data, timeout) File "/tools/python27/lib/python2.7/urllib2.py", line 406, in open response = meth(req, response) File "/tools/python27/lib/python2.7/urllib2.py", line 519, in http_response 'http', request, response, code, msg, hdrs) File "/tools/python27/lib/python2.7/urllib2.py", line 444, in error return self._call_chain(*args) File "/tools/python27/lib/python2.7/urllib2.py", line 378, in _call_chain result = func(*args) File "/tools/python27/lib/python2.7/urllib2.py", line 527, in http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 500: Internal Server Error
Attachment #8604862 -
Flags: review?(arich)
Reporter | ||
Comment 4•9 years ago
|
||
Comment on attachment 8604862 [details] [diff] [review] [tools] Standardize exceptions aggregation output for easier consumption by papertrail Review of attachment 8604862 [details] [diff] [review]: ----------------------------------------------------------------- I'm not a python programmer, so I'm not going to be much help in reviewing the code itself, but as far as the output goes, the one suggestion I would make is to put spaces around the | because if we alert on this from papertrail, you'll want it to be human readable. I also noticed that we're seeing duplicates of the same error, but after chatting with you it sounds like this is intentional since it aggregates based on the line number in the code it was triggered from as well as the error message itself. I think we're better off ignoring the line numbers when doing the aggregation since that's likely not important for triggering an alert (and we only give examples, not full output, so you have to go digging through the logs to find each instance, anyway), but this at least gives us a good start.
Attachment #8604862 -
Flags: review?(arich) → review+
Assignee | ||
Comment 6•9 years ago
|
||
Attachment #8605246 -
Flags: review?(arich)
Reporter | ||
Updated•9 years ago
|
Attachment #8605246 -
Flags: review?(arich) → review+
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8604862 [details] [diff] [review] [tools] Standardize exceptions aggregation output for easier consumption by papertrail Review of attachment 8604862 [details] [diff] [review]: ----------------------------------------------------------------- https://hg.mozilla.org/build/tools/rev/14514a7c7584
Attachment #8604862 -
Flags: checked-in+
Assignee | ||
Comment 8•9 years ago
|
||
Comment on attachment 8605246 [details] [diff] [review] [puppet] Stop sending twistd error via mail Review of attachment 8605246 [details] [diff] [review]: ----------------------------------------------------------------- https://hg.mozilla.org/build/puppet/rev/9ec568d08d19
Attachment #8605246 -
Flags: checked-in+
Assignee | ||
Comment 9•9 years ago
|
||
Exception logs are appearing in papertrail as expected.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Component: Tools → General
You need to log in
before you can comment on or make changes to this bug.
Description
•