Closed Bug 822102 Opened 12 years ago Closed 12 years ago

Almost no crash reports since December 16, 11:25 UTC

Categories

(Socorro :: General, task)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: scoobidiver, Unassigned)

Crash collection, moving->hbase, and monitor all seem to be working.

Processor is reporting errors connecting to postgres. It looks like they started around 05:41:

Dec 16 05:41:50 Socorro Processor (pid 31780): 2012-12-16 05:41:50,671 CRITICAL - Thread-21 - something's gone horribly wrong with the database connection
Dec 16 05:41:50 Socorro Processor (pid 31780): 2012-12-16 05:41:50,672 CRITICAL - Thread-21 - Caught Error: <class 'psycopg2.OperationalError'>
Dec 16 05:41:50 Socorro Processor (pid 31780): 2012-12-16 05:41:50,672 CRITICAL - Thread-21 - server closed the connection unexpectedly#012#011This probably means the server terminated abnormally#012#011before or while processing the request.
Dec 16 05:41:50 Socorro Processor (pid 31780): 2012-12-16 05:41:50,673 CRITICAL - Thread-21 - trace back follows:
Dec 16 05:41:50 Socorro Processor (pid 31780): 2012-12-16 05:41:50,673 CRITICAL - Thread-21 - Traceback (most recent call last):
Dec 16 05:41:50 Socorro Processor (pid 31780): 2012-12-16 05:41:50,673 CRITICAL - Thread-21 - File "/data/socorro/application/socorro/processor/processor.py", line 526, in processJob#012    threadLocalCursor.execute("update jobs set starteddatetime = %s where id = %s", (startedDateTime, jobId))
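The `#012` and `#011` sequences in the lines above look like syslog's octal escaping of control characters (012 is a newline, 011 is a tab), which is why the psycopg2 message and the traceback are squashed onto single lines. A minimal sketch for expanding them when reading such logs, assuming that escaping scheme; `decode_syslog_escapes` is a hypothetical helper name, not part of Socorro:

```python
import re

# "#" followed by three octal digits, e.g. "#012" (newline) or "#011" (tab).
_ESCAPE = re.compile(r"#([0-3][0-7]{2})")


def decode_syslog_escapes(line: str) -> str:
    """Expand octal-escaped control characters back into real characters.

    Assumption: only control characters (codes below 32, or 127) were
    escaped, so any other "#ddd" sequence is left untouched.
    """
    def expand(match: re.Match) -> str:
        code = int(match.group(1), 8)
        if code < 32 or code == 127:
            return chr(code)
        return match.group(0)  # not a control character, keep as written

    return _ESCAPE.sub(expand, line)


if __name__ == "__main__":
    raw = (
        "server closed the connection unexpectedly#012#011This probably "
        "means the server terminated abnormally#012#011before or while "
        "processing the request."
    )
    print(decode_syslog_escapes(raw))
```

Run on the `OperationalError` line, this puts the three-line psycopg2 message back on separate lines, which makes the traceback easier to read.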
Depends on: 822106
It looks like the aggregate reports for yesterday are low. I guess we need to backfill to fill in the gaps and get everything represented in the aggregates.
Actually, just backfilling will not resolve the problem. We'll have to do some reprocessing, too.

During the trouble on 2012-12-16 (Beethoven's birthday), the monitor suffered a bad failure. During the seven hours of apoplexy, the monitor was consuming the crash input queue but failing to pass the crashes on to processing. By exploring the logs, I find that there were 162323 failed database inserts. That translates to 162323 crashes that should have been processed but were not.

We'll have to send these through for processing today. I'll prepare a list and file a bug for processing.
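For anyone who wants to repeat this kind of log count, here is a rough sketch. It assumes the monitor logs use the same layout as the processor lines quoted earlier in this bug (`YYYY-MM-DD HH:MM:SS,mmm LEVEL - Thread-N - message`). The monitor's actual failure message is not shown in this bug, so the `marker` text is supplied by the caller, and `count_by_hour` is a hypothetical helper, not part of Socorro:

```python
import re
from collections import Counter

# Layout taken from the processor log lines quoted in this bug; the
# syslog prefix before the timestamp is skipped by using search().
LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2},\d{3} "
    r"(?P<level>[A-Z]+) - (?P<thread>\S+) - (?P<message>.*)"
)


def count_by_hour(lines, marker, level="CRITICAL"):
    """Count log lines of the given level whose message contains marker.

    Returns a Counter keyed by "YYYY-MM-DD HH" so the failure window
    is visible as well as the total.
    """
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # not in the expected layout, skip it
        if match.group("level") == level and marker in match.group("message"):
            counts[f"{match.group('date')} {match.group('hour')}"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 16 05:41:50 Socorro Processor (pid 31780): 2012-12-16 "
        "05:41:50,671 CRITICAL - Thread-21 - something's gone horribly "
        "wrong with the database connection",
    ]
    per_hour = count_by_hour(sample, "database connection")
    print(per_hour, sum(per_hour.values()))
```

Summing the per-hour counts gives the total number of matching failures, and the keys show how long the outage lasted.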
FYI, Lars filed bug 822275 for the reprocessing and backfill.
This should be resolved.  Scoobidiver, Kairo, please confirm.
It looks to me like everything is back in order. Scoobidiver, can you confirm?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #6)
> It looks to me like everything is back in order. Scoobidiver, can you
> confirm?
Yes. Missing crashes are back.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED