tl;dr: alerts on "duplicates (DuplicatesCronApp)" need to be triaged immediately. The documentation should reflect a greater sense of urgency for this alert.

Example alert:

05:42:31 < nagios-phx1> Wed 05:42:31 PDT [1398] Admin - crontab is CRITICAL: CRITICAL - duplicates (DuplicatesCronApp) (


06:45:48 < selenamarie> phrawzty:  see bug 1023867
06:45:56 < selenamarie> phrawzty: /var/log/socorro/crontabber.log
06:51:59 <@phrawzty> selenamarie: so it looks like some "bad" data was inserted into postgres ?
06:53:11 < selenamarie> phrawzty: "attempted" to be inserted :) it did not succeed
06:53:17 < selenamarie> the underlying problem is not that.
06:53:49 < selenamarie> underlying problem could be any number of things so i'm going to sit here and read through update_reports_duplicates, reformatting it so that it is readable to start, and then see if either a data fix or a stored proc fix is most appropriate
06:53:53 < selenamarie> we might do both
06:54:03 < selenamarie> this is a pretty bad failure.
06:54:19 < selenamarie> i think we should update the documentation for this particular issue in Mana so that it is clear someone should start triaging it immediately
06:54:29 < selenamarie> phrawzty: this blocked all data processing on postgres overnight
I've updated the run book entry for that alert to indicate that a bug should be filed for Critical alerts from crontabber and that further escalation should go to the admin contact listed on the Mana page for crash stats. I've also listed myself as the first admin contact on the Mana page.
