Closed Bug 623594 Opened 15 years ago Closed 11 years ago

Aggregate similar exceptions in emails from masters

Categories

(Release Engineering :: General, enhancement, P5)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: coop)

Details

(Whiteboard: [buildmasters][reporting])

Attachments

(3 files)

Two things that would make the exception emails from masters more useful IMO:

1) Report an aggregate count of exceptions of a certain type (e.g. twisted.spread.pb.PBConnectionLost). Report the first one found and then print afterwards "...and 10 more like this."

2) Exclude exceptions that we can't do anything about, i.e. twisted.spread.pb.PBConnectionLost. This should be a blacklist so we can start caring about them in the future if we *can* do something about them.
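To make the two requests concrete, here is a rough sketch of how they could fit together. It is not taken from any of the attached patches: the names `IGNORED`, `exception_type` and `summarize` are made up for illustration, and it assumes each exception arrives as one block of text whose last non-empty line names the exception, as Twisted tracebacks in twistd.log usually do.

```python
import re

# Hypothetical ignore list; PBConnectionLost is the example named in this bug.
# Remove an entry to start reporting that exception type again.
IGNORED = {"twisted.spread.pb.PBConnectionLost"}

# Dotted exception name at the start of a traceback's final line,
# with or without Twisted's "Failure: " prefix.
EXC_NAME = re.compile(r"^(?:Failure: )?([\w.]+):")


def exception_type(text):
    """Return the dotted exception name from the last non-empty line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    match = EXC_NAME.match(lines[-1])
    return match.group(1) if match else "unknown"


def summarize(exceptions, ignored=IGNORED):
    """Keep the first exception of each type and count the rest."""
    groups = {}
    for text in exceptions:
        name = exception_type(text)
        if name in ignored:
            continue  # nothing we can do about these, so drop them
        groups.setdefault(name, []).append(text)

    report = []
    for items in groups.values():
        report.append(items[0])
        if len(items) > 1:
            report.append("...and %d more like this." % (len(items) - 1))
    return report
```

Keeping the ignore list as plain data means an exception type can be reported again by deleting one entry, which is the point of making it a blacklist.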
Severity: normal → enhancement
Priority: -- → P5
Taking the easy part first: ignore PBConnectionLost exceptions.
Attachment #503882 - Flags: review?(coop)
Attachment #503882 - Flags: review?(coop) → review+
Comment on attachment 503882 (Ignore PBConnectionLost exceptions)

changeset: 1128:a92b4503ee69
Attachment #503882 - Flags: checked-in+
Deployed on all the masters
Product: mozilla.org → Release Engineering
Component: Other → Tools
QA Contact: hwine
I've become fed up enough with the UnauthorizedLogin exceptions from try-linux64-ec2-golden that I've written a patch here.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Because UnauthorizedLogin exceptions happen every few seconds while the slave is still trying to connect, they turn a usually non-existent exception email into a 300K hourly monster. None of the other exceptions we currently hit happen with that kind of frequency, but the pattern I've implemented here could be extended to other cases should it be required.

I collect all the exceptions as we did before. I then post-hoc run the exceptions through a comparison function that strips out and aggregates the login exceptions, passing the smaller list of remaining exceptions on to the next comparison (no other comparison functions exist yet). We get a roll-up of the frequency of the UnauthorizedLogin attempts by host. As a bonus, I've also done a hostname lookup from the reported ip, since we only get the ip in the twistd.log.

Here's a truncated example of the output that I just ran against *all* the twistd.logs on bm75-try1:

---
The following slaves tried to connect unsuccessfully to buildbot-master75.srv.releng.use1.mozilla.com bm75-try1:

# attempts - hostname (ip) - last seen
96559 - try-linux64-ec2-golden.try.releng.use1.mozilla.com (10.134.49.65) - 2014-07-31 11:38:15-0700

Example:
Exception in /builds/buildbot/try1/master/twistd.log.14:
2014-07-28 01:42:02-0700 [Broker,102652,10.134.49.65] Unhandled Error
Traceback (most recent call last):
Failure: twisted.cred.error.UnauthorizedLogin:
--------------------------------------------------------------------------------
The following other exceptions (total 32) were detected on buildbot-master75.srv.releng.use1.mozilla.com bm75-try1:

Exception in /builds/buildbot/try1/master/twistd.log.49:
2014-07-04 02:32:19-0700 [-] Unhandled Error
[deletia]
---

We won't normally get this many since we limit by timestamp when running normally.
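For anyone reading along without the patch open, the roll-up step could look roughly like the sketch below. This is an illustration under assumptions, not the code in the attachment: `BROKER_HEADER`, `lookup_hostname` and `roll_up_logins` are made-up names, and each exception is assumed to be one text block that starts with the timestamped `[Broker,<n>,<ip>]` line shown in the example output.

```python
import re
import socket
from datetime import datetime

LOGIN_ERROR = "twisted.cred.error.UnauthorizedLogin"
TS_FORMAT = "%Y-%m-%d %H:%M:%S%z"

# Timestamp and slave ip from a header line such as
# "2014-07-28 01:42:02-0700 [Broker,102652,10.134.49.65] Unhandled Error"
BROKER_HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[-+]\d{4}) "
    r"\[Broker,\d+,(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]"
)


def lookup_hostname(ip):
    """Reverse-resolve an ip, falling back to the ip if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def roll_up_logins(exceptions, resolve=lookup_hostname):
    """Return (summary lines, remaining exceptions).

    The remaining list is what a later comparison function would receive.
    """
    by_ip = {}
    remaining = []
    for text in exceptions:
        match = BROKER_HEADER.match(text)
        if match is None or LOGIN_ERROR not in text:
            remaining.append(text)
            continue
        seen = datetime.strptime(match.group("ts"), TS_FORMAT)
        entry = by_ip.setdefault(
            match.group("ip"), {"attempts": 0, "last_seen": seen}
        )
        entry["attempts"] += 1
        entry["last_seen"] = max(entry["last_seen"], seen)

    # Busiest hosts first, in the "# attempts - hostname (ip) - last seen" layout.
    ordered = sorted(by_ip.items(), key=lambda item: -item[1]["attempts"])
    lines = [
        "%d - %s (%s) - %s" % (
            entry["attempts"],
            resolve(ip),
            ip,
            entry["last_seen"].strftime(TS_FORMAT),
        )
        for ip, entry in ordered
    ]
    return lines, remaining
```

The `resolve` argument is only there so the lookup can be stubbed out when running this somewhere without DNS for the slaves.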
Attachment #8465681 - Flags: review?(bugspam.Callek)
Comment on attachment 8465681 (Aggregate UnauthorizedLogin exceptions)

wfm, thanks!
Attachment #8465681 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 8465681 (Aggregate UnauthorizedLogin exceptions)

https://hg.mozilla.org/build/tools/rev/b65452fec28d
Attachment #8465681 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This small follow-up patch explicitly checks for UnauthorizedLogin errors. With the regexp the way it was, we were rolling up *any* exception that displayed an IP in the broker tag. These included things like MySQL host connection errors.
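The difference between the two checks can be shown with a small sketch. The regexp actually used in the patch isn't quoted in this bug, so `BROKER_IP` below is a stand-in, the two function names are invented, and the database error text in the usage lines is made up rather than copied from a real log.

```python
import re

LOGIN_ERROR = "twisted.cred.error.UnauthorizedLogin"

# Stand-in pattern: any broker tag that carries an ip address.
BROKER_IP = re.compile(r"\[Broker,\d+,\d{1,3}(?:\.\d{1,3}){3}\]")


def matches_loose(text):
    """Old behaviour: an ip in the broker tag is enough to roll up."""
    return BROKER_IP.search(text) is not None


def matches_strict(text):
    """New behaviour: also require the UnauthorizedLogin failure itself."""
    return matches_loose(text) and LOGIN_ERROR in text


login = (
    "2014-07-28 01:42:02-0700 [Broker,102652,10.134.49.65] Unhandled Error\n"
    "Traceback (most recent call last):\n"
    "Failure: twisted.cred.error.UnauthorizedLogin:"
)
# Invented example of an unrelated error that still has an ip in its tag.
db_error = (
    "2014-08-01 00:00:00-0700 [Broker,1,10.0.0.1] Unhandled Error\n"
    "Traceback (most recent call last):\n"
    "OperationalError: could not connect to database host"
)

print(matches_loose(db_error), matches_strict(db_error))
print(matches_loose(login), matches_strict(login))
```

Only the strict check keeps the unrelated error out of the login roll-up, so it stays in the list of other exceptions where someone will see it.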
Attachment #8470225 - Flags: review?(bugspam.Callek)
Attachment #8470225 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 8470225 (Only roll-up login exceptions)

https://hg.mozilla.org/build/tools/rev/4a8c1893c3db
Attachment #8470225 - Flags: checked-in+
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Tools → General