reprocess crashes from socorro regression

RESOLVED FIXED

Status

RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: lars, Assigned: selenamarie)

(Reporter)

Description

6 years ago
A Socorro regression has caused a lot of badly processed crashes. We need all uuids since 7 pm yesterday to be reprocessed.
query run successfully: 

breakpad=# insert into priorityjobs (uuid) select uuid from reports_20121105 where date_processed > '2012-11-08 19:07:46'; 
INSERT 0 383328
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(In reply to Selena Deckelmann :selena from comment #1)
> query run successfully: 
Are you sure? I still see those hang signatures: https://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A19.0a1&query_search=signature&query_type=contains&build_id=20121108030652&do_query=1
(In reply to Scoobidiver from comment #2)
> (In reply to Selena Deckelmann :selena from comment #1)
> > query run successfully: 
> Are you sure because I still see those hang signatures:
> https://crash-stats.mozilla.com/query/query?product=Firefox&version=Firefox%3A19.0a1&query_search=signature&query_type=contains&build_id=20121108030652&do_query=1 ?

Sorry - this bug was just to make the change to the database. The processors check the database table for jobs to do, and that query set up all the jobs. 
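
For readers unfamiliar with the pattern, here is a minimal sketch of a database table used as a job queue. It is not Socorro's actual code: it uses an in-memory SQLite database instead of the production PostgreSQL instance, a single hypothetical `reports` table instead of the real `reports_20121105` partition, and the helpers `enqueue_since` and `take_job` are invented for illustration. How the real processors poll `priorityjobs` is an assumption here.

```python
import sqlite3

# Assumption: simplified stand-in schema, not the real breakpad database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reports (uuid TEXT PRIMARY KEY, date_processed TEXT);
    CREATE TABLE priorityjobs (uuid TEXT PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        ("a1", "2012-11-08 18:59:00"),  # before the cutoff, left alone
        ("b2", "2012-11-08 19:30:00"),  # after the cutoff, queued
        ("c3", "2012-11-09 02:15:00"),  # after the cutoff, queued
    ],
)


def enqueue_since(conn, cutoff):
    """Queue every report processed after `cutoff`; return the queue length."""
    conn.execute(
        "INSERT INTO priorityjobs (uuid) "
        "SELECT uuid FROM reports WHERE date_processed > ?",
        (cutoff,),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM priorityjobs").fetchone()[0]


def take_job(conn):
    """Hypothetical processor step: remove and return one queued uuid, or None."""
    row = conn.execute(
        "SELECT uuid FROM priorityjobs ORDER BY uuid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM priorityjobs WHERE uuid = ?", (row[0],))
    conn.commit()
    return row[0]


queued = enqueue_since(conn, "2012-11-08 19:07:46")
```

The point of the sketch is only that the insert does no reprocessing itself; it creates the work items, and the crashes change once a processor takes each job.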

We're tracking the overall reprocessing in bug 810241. 

Also, I am monitoring the job queue: jobs are being processed at a rate of about 50/second, so we should be complete by 9:30am or so. After that, I will start the backfill and update the other bug with the status. Backfill typically takes about an hour, load-willing.
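
As a rough check on that estimate, the arithmetic can be sketched as below. It assumes the queue holds only the 383,328 rows inserted in comment 1 and that the rate stays at a constant 50 jobs per second; `drain_time` is a hypothetical helper, not part of Socorro.

```python
from datetime import timedelta


def drain_time(jobs: int, rate_per_second: float) -> timedelta:
    """Time needed to empty a queue of `jobs` items at a constant rate."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return timedelta(seconds=jobs / rate_per_second)


# 383,328 queued uuids (from the INSERT above) at about 50 jobs/second.
eta = drain_time(383_328, 50)
print(eta)  # a little over two hours
```

Any other jobs already in the queue, or a drop in throughput, would push the finish time later.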

Comment 4

6 years ago
Hmm, the aggregate data for 11/08 still has the buggy signatures. Has the backfill not been done (correctly)?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #4)
> Hmm, the aggregate data for 11/08 still has the buggy signatures, has the
> backfill not been done (correctly)?

Can you be a bit more specific?
(In reply to Selena Deckelmann :selena from comment #5)
> Can you be a bit more specific?
See the browser crashes with a crash signature starting with hang: https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=1&query_search=signature&query_type=startswith&query=hang&process_type=browser&hang_type=crash&do_query=1
(Reporter)

Comment 7

6 years ago
This is some sort of caching or database replication problem. The reports appear to be correct in the database, but they're not showing up properly in the UI. For example, here is a randomly selected crash from last Friday that was inappropriately marked as a hang:

UI signature:       hang | nsDiskCacheStreamIO::FlushBufferToFile()
database signature: nsDiskCacheStreamIO::FlushBufferToFile()

UI processor_notes:       <blank>
database processor_notes: INFO: This record is a replacement for a previous record with the same uuid

Does the UI read from master01 or from the slave database, master02? If it reads from master02, is replication broken?
(Reporter)

Comment 8

6 years ago
Disregard comment #7; I've found counterexamples that disprove my thesis.
(Reporter)

Comment 9

6 years ago
I've resolved the issue.  19K crashes escaped reprocessing (Bug 810646) and therefore retained the faulty "hang" signature.  I've forced them all to reprocess.  The crashes now bear the correct signatures.

Comment 10

6 years ago
(In reply to K Lars Lohn [:lars] [:klohn] from comment #9)
> I've resolved the issue.  19K crashes escaped reprocessing (Bug 810646) and
> therefore retained the faulty "hang" signature.  I've forced them all to
> reprocess.  The crashes now bear the correct signatures.

OK, that means that we need to backfill 11/08 yet again for this. Selena, can you do that?
I still see crashes listed as hangs in topcrasher, for instance hang | nsIDOMHTMLDocument_Write: https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/16.0.2
(Reporter)

Comment 12

6 years ago
The aggregate reports for the affected time period have not yet been rerun. If you click on "hang | nsIDOMHTMLDocument_Write", which starts a search, you'll find that there aren't any crashes with that signature. Comment #10 has the request to Selena to regenerate the materialized views. That ought to correct the report that you point out.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #10)
> (In reply to K Lars Lohn [:lars] [:klohn] from comment #9)
> > I've resolved the issue.  19K crashes escaped reprocessing (Bug 810646) and
> > therefore retained the faulty "hang" signature.  I've forced them all to
> > reprocess.  The crashes now bear the correct signatures.
> 
> OK, that means that we need to backfill 11/08 yet again for this. Selena,
> can you do that?

Running this now. Thank you for looking into it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Oops - Sheeri is taking care of this in https://bugzilla.mozilla.org/show_bug.cgi?id=810941
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team