Closed
Bug 643201
Opened 13 years ago
Closed 13 years ago
Some Fennec 4.0b5 crash reports have still not been reprocessed
Categories
(Socorro :: General, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jdm, Assigned: rhelmer)
Attachments
(2 files)
707 bytes, patch
1.34 KB, patch
4.0b5 crash reports are still filled with numerous nonsensical crashes that need reprocessing. Take, for example, https://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2011-03-19%2016%3A00%3A00&signature=nsTableFrame%3A%3AInsertCol&version=Fennec%3A4.0b5, which contains crashes going back to March 7 that look like prime fodder for reprocessing. Is the cron job still working correctly? The majority of named signatures in the 4.0b5 crashes (i.e., not libc.so or libdvm.so crashes) are completely bogus, which makes triaging them quite difficult.
Assignee
Comment 1•13 years ago
(In reply to comment #0)
> 4.0b5 crashes reports are still filled with numerous nonsensical crashes that
> require reprocessing. Take, for example,
> https://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2011-03-19%2016%3A00%3A00&signature=nsTableFrame%3A%3AInsertCol&version=Fennec%3A4.0b5,
> which contains crashes going back to March 7 which look like prime fodder for
> reprocessing. Is the cron job still working correctly? The majority of named
> signatures in the 4.0b5 crashes (ie. not libc.so or libdvm.so crash) are
> completely bogus, which makes triaging them quite difficult.

I just checked, and according to the logs the cron job is running correctly. I will dig deeper tomorrow.
Assignee: nobody → rhelmer
Priority: -- → P1
Comment 2•13 years ago
Looking at the first one in the list, https://crash-stats.mozilla.com/report/index/d68f1c02-5d79-419a-96d4-179502110319, I see this one has not been fixed. If applying the fix specifically to this crash doesn't work, some of the fix's assumptions may not hold here, in which case it would be helpful to get the corresponding minidump.
Assignee
Comment 3•13 years ago
(In reply to comment #2)
> Looking at the first one in the list
> https://crash-stats.mozilla.com/report/index/d68f1c02-5d79-419a-96d4-179502110319
>
> I see this one has not been fixed. If specifically applying the fix on this
> crash doesn't work, maybe some of the assumptions of the fix don't work here,
> in which case it'd be helpful to get the corresponding minidump.

From the cron logs, I don't think the fix was applied, and from the processor notes I don't think these were submitted for re-processing.

I've been going through the logic for the "fixBrokenDumps" cron job, and I don't yet see how we could be missing these:

--
t1 = last_processed_date

"""
SELECT uuid,date_processed FROM reports WHERE product = 'Fennec'
AND version = '4.0b5'
AND date_processed > t1
AND date_processed < (now() - INTERVAL '30 minutes')
"""

for each row:
    fix crash dump
    re-insert into hbase
    mark for re-processing
    update last_processed_date

save last_processed_date
--

Looking at the logs, I don't see d68f1c02-5d79-419a-96d4-179502110319 (each step described above is logged). I am going to add a bit more debug logging and get that into prod, since I don't have enough information right now to reconstruct the query after the fact (specifically, the value of last_date_processed).
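The loop above can be sketched in Python. This is a minimal sketch, not Socorro's code: it assumes an SQLite stand-in for the `reports` table with `uuid`, `product`, `version` and `date_processed` columns (dates stored as sortable `YYYY-MM-DD HH:MM:SS` strings). `fix_broken_dumps` and the `fix_one` callback, which stands in for fixing the dump, re-inserting it into HBase and marking it for re-processing, are hypothetical names. The function returns the new watermark instead of saving it, leaving storage to the caller.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # dates are stored as sortable ISO-style strings


def fix_broken_dumps(conn, product, version, last_processed, now, fix_one):
    """Hypothetical sketch: handle reports newer than `last_processed`.

    Returns the new watermark (the latest date_processed handled).
    """
    # Leave the most recent 30 minutes alone, as in the query above.
    cutoff = (now - timedelta(minutes=30)).strftime(FMT)
    rows = conn.execute(
        "SELECT uuid, date_processed FROM reports "
        "WHERE product = ? AND version = ? "
        "AND date_processed > ? AND date_processed < ? "
        "ORDER BY date_processed",
        (product, version, last_processed, cutoff),
    ).fetchall()
    watermark = last_processed
    for uuid, date_processed in rows:
        fix_one(uuid)  # stands in for: fix dump, re-insert into HBase, re-queue
        watermark = max(watermark, date_processed)
    return watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (uuid TEXT, product TEXT, version TEXT, date_processed TEXT)"
)
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?, ?)",
    [
        ("a", "Fennec", "4.0b5", "2011-03-19 10:00:00"),
        ("b", "Fennec", "4.0b5", "2011-03-19 11:00:00"),
        ("c", "Firefox", "4.0", "2011-03-19 11:15:00"),  # other product: ignored
        ("d", "Fennec", "4.0b5", "2011-03-19 11:50:00"),  # too recent: deferred
    ],
)
fixed = []
now = datetime(2011, 3, 19, 12, 0, 0)
new_mark = fix_broken_dumps(conn, "Fennec", "4.0b5", "2011-03-19 00:00:00", now, fixed.append)
print(fixed, new_mark)  # ['a', 'b'] 2011-03-19 11:00:00
```

Whatever value this returns is only correct for the product and version it was computed for, so where the caller stores it matters.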
Status: NEW → ASSIGNED
Assignee
Comment 4•13 years ago
Added debug statements for the "update last_processed_date" and "save last_processed_date" steps: Committed revision 3012. Filed bug 643483 to get that into production. Continuing to go over the logic here in the meantime.
Assignee
Comment 5•13 years ago
One thing I have noticed:

(In reply to comment #3)
> """
> SELECT uuid,date_processed FROM reports WHERE product = 'Fennec'
> AND version = '4.0b5'
> AND date_processed > t1
> AND date_processed < (now() - INTERVAL '30 minutes')
> """

This should probably use "ORDER BY date_processed", since the table's natural order doesn't appear to be strictly by date_processed. However, if anything this would cause needless re-processing; it shouldn't cause any records to be skipped.
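The claim that missing ordering causes re-processing but not skipping can be checked with a small model. It assumes, which the comment above does not confirm, that "update last_processed_date" overwrites the watermark with each row's date, so the last row returned wins. `last_row_watermark` and `reselected_next_run` are hypothetical helpers.

```python
def last_row_watermark(dates):
    """Assumed update rule: the watermark is overwritten with each row's date."""
    watermark = None
    for date_processed in dates:
        watermark = date_processed
    return watermark


def reselected_next_run(all_dates, watermark):
    """Rows the next run would pick up with `date_processed > watermark`."""
    return [d for d in all_dates if d > watermark]


# Rows as they might come back without ORDER BY.
unordered = ["2011-03-19 10:00", "2011-03-19 12:00", "2011-03-19 11:00"]
wm = last_row_watermark(unordered)          # "2011-03-19 11:00", not the max
print(reselected_next_run(unordered, wm))   # ['2011-03-19 12:00'] is processed twice
print(last_row_watermark(sorted(unordered)) == max(unordered))  # True with ORDER BY
```

Because the last row's date can never exceed the maximum date in the batch, the watermark can only lag behind, which means repeated work rather than missed rows.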
Assignee
Comment 6•13 years ago
Digging into one day (2011-03-19), I can see that we logged fixing 60 crashes, and there are 60 records that contain the string 'replacement' in the processor_notes field. However, 1266 records for that day do *not* contain that string. This leads me to suspect that the query isn't returning what we expect (perhaps because of the time at which it runs?).
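A per-day comparison like the one above can be expressed as a single query. This sketch assumes an SQLite stand-in for the `reports` table with a nullable `processor_notes` column. `replacement_counts` is a hypothetical helper, the notes text in the sample rows is made up, and rows with no notes are counted as not fixed.

```python
import sqlite3


def replacement_counts(conn, day):
    """Return (fixed, not_fixed) for Fennec 4.0b5 reports processed on `day`."""
    fixed, not_fixed = conn.execute(
        "SELECT "
        "SUM(CASE WHEN processor_notes LIKE '%replacement%' THEN 1 ELSE 0 END), "
        "SUM(CASE WHEN processor_notes LIKE '%replacement%' THEN 0 ELSE 1 END) "
        "FROM reports WHERE product = 'Fennec' AND version = '4.0b5' "
        "AND date(date_processed) = ?",
        (day,),
    ).fetchone()
    # SUM() over zero rows yields NULL, which sqlite3 returns as None.
    return fixed or 0, not_fixed or 0


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (uuid TEXT, product TEXT, version TEXT, "
    "date_processed TEXT, processor_notes TEXT)"
)
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?, ?, ?)",
    [
        # Notes text below is invented sample data.
        ("a", "Fennec", "4.0b5", "2011-03-19 10:00:00", "used replacement dump"),
        ("b", "Fennec", "4.0b5", "2011-03-19 11:00:00", None),
        ("c", "Fennec", "4.0b5", "2011-03-20 09:00:00", None),  # different day
    ],
)
print(replacement_counts(conn, "2011-03-19"))  # (1, 1)
```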
Assignee
Comment 7•13 years ago
Committed revision 3014.
Assignee
Comment 8•13 years ago
Here's the problem: each call to fixBrokenDumps.fix() stores last_date_processed in the persistent file, and we call it twice (first for Firefox Linux, then for Fennec), so Fennec gets a much-too-recent last_date_processed. Committed revision 3015. I've tested this (read-only) against production, and it now matches what I expect when running the SQL queries by hand.
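The failure mode can be reproduced with a toy model of the persistent file. This is an illustration only: the JSON layout, the function names, and keying the watermark per product are assumptions, not the change committed in revision 3015.

```python
import json
import os
import tempfile

DEFAULT_START = "2011-03-19 00:00:00"  # illustrative starting watermark


def load_state(path):
    """Read the persistent watermark file (empty state if it doesn't exist)."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    with open(path, "w") as f:
        json.dump(state, f)


def run(path, product, newest_processed, per_product):
    """Return the date this run would query from, then persist the new watermark."""
    state = load_state(path)
    # Buggy layout: one shared key, so whichever product runs last overwrites it.
    key = product if per_product else "last_date_processed"
    start = state.get(key, DEFAULT_START)
    state[key] = max(start, newest_processed)
    save_state(path, state)
    return start


def fennec_start(per_product):
    """Simulate one cron run: Firefox Linux first, then Fennec."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "state.json")
        run(path, "Firefox", "2011-03-19 11:29:00", per_product)
        return run(path, "Fennec", "2011-03-19 11:10:00", per_product)


print(fennec_start(per_product=False))  # 2011-03-19 11:29:00 (Fennec skips earlier rows)
print(fennec_start(per_product=True))   # 2011-03-19 00:00:00
```

With the shared key, Fennec inherits the Firefox watermark and only looks at the last sliver of time, which is consistent with only a small fraction of the day's Fennec crashes being fixed.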
Assignee
Comment 9•13 years ago
Filed bug 643594 to get these corrections into production. I'll schedule a time to rebuild the top crashers list table and correct the Fennec crashes received since the cron job went live (everything before that should be OK; the bug here is only in how we store the date for hourly cron purposes).
No longer depends on: 643594
Assignee
Comment 10•13 years ago
Incoming crashes are now being fixed correctly (bug 643594; I'll do some further testing to verify it tomorrow), and in bug 643599 we're working on scheduling a time to reprocess the window where we missed most Fennec crashes (2011-03-07 through 2011-03-21). This will require some downtime to rebuild the top crashers list. We don't want to do that tonight since we're shipping Fx4 tomorrow, but I'll make sure it gets done as soon as is feasible.
Assignee
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro