Closed Bug 1023873 Opened 7 years ago Closed 7 years ago
documentation update for "crontab is CRITICAL: CRITICAL - duplicates (DuplicatesCronApp)" required
tl;dr is that alerts on "duplicates (DuplicatesCronApp)" need to be triaged immediately. The documentation should reflect a greater sense of urgency for this alert.

Example alert:
05:42:31 < nagios-phx1> Wed 05:42:31 PDT  sp-admin01.phx1.mozilla.com:Socorro Admin - crontab is CRITICAL: CRITICAL - duplicates (DuplicatesCronApp) (http://m.mozilla.org/Socorro+Admin+-+crontab)

Ramifications:
06:45:48 < selenamarie> phrawzty: see bug 1023867
06:45:56 < selenamarie> phrawzty: /var/log/socorro/crontabber.log
06:51:59 <@phrawzty> selenamarie: so it looks like some "bad" data was inserted into postgres ?
06:53:11 < selenamarie> phrawzty: "attempted" to be inserted :) it did not succeed
06:53:17 < selenamarie> the underlying problem is not that.
06:53:49 < selenamarie> underlying problem could be any number of things so i'm going to sit here and read through update_reports_duplicates, reformatting it so that it is readable to start, and then see if either a data fix or a stored proc fix is most appropriate
06:53:53 < selenamarie> we might do both
06:54:03 < selenamarie> this is a pretty bad failure.
06:54:19 < selenamarie> i think we should update the documentation for this particular issue in Mana so that it is clear someone should start triaging it immediately
06:54:29 < selenamarie> phrawzty: this blocked all data processing on postgres overnight
I've updated the runbook entry for that alert to indicate that a bug should be filed for CRITICAL alerts from crontabber, and that further escalation should go to the admin contact listed on the Mana page for crash stats. I've also listed myself as the first admin contact on the Mana page.
Assignee: nobody → chris.lonnen
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED