Some Fennec 4.0b5 crash reports have still not been reprocessed

RESOLVED FIXED

Status

P1
normal
RESOLVED FIXED
8 years ago
7 years ago

People

(Reporter: jdm, Assigned: rhelmer)

Tracking

Trunk
x86
Linux

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

8 years ago
4.0b5 crash reports are still filled with numerous nonsensical crashes that require reprocessing.  Take, for example, https://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2011-03-19%2016%3A00%3A00&signature=nsTableFrame%3A%3AInsertCol&version=Fennec%3A4.0b5, which contains crashes going back to March 7 that look like prime fodder for reprocessing.  Is the cron job still working correctly?  The majority of named signatures in the 4.0b5 crashes (i.e. not libc.so or libdvm.so crashes) are completely bogus, which makes triaging them quite difficult.
(Assignee)

Comment 1

8 years ago
(In reply to comment #0)
> 4.0b5 crashes reports are still filled with numerous nonsensical crashes that
> require reprocessing.  Take, for example,
> https://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2011-03-19%2016%3A00%3A00&signature=nsTableFrame%3A%3AInsertCol&version=Fennec%3A4.0b5,
> which contains crashes going back to March 7 which look like prime fodder for
> reprocessing.  Is the cron job still working correctly?  The majority of named
> signatures in the 4.0b5 crashes (ie. not libc.so or libdvm.so crash) are
> completely bogus, which makes triaging them quite difficult.

I just checked, and according to the logs the cron job is running correctly. I will dig deeper tomorrow.
Assignee: nobody → rhelmer
Priority: -- → P1

Comment 2

Looking at the first one in the list
https://crash-stats.mozilla.com/report/index/d68f1c02-5d79-419a-96d4-179502110319

I see this one has not been fixed. If specifically applying the fix on this crash doesn't work, maybe some of the assumptions of the fix don't work here, in which case it'd be helpful to get the corresponding minidump.
(Assignee)

Comment 3

8 years ago
(In reply to comment #2)
> Looking at the first one in the list
> https://crash-stats.mozilla.com/report/index/d68f1c02-5d79-419a-96d4-179502110319
> 
> I see this one has not been fixed. If specifically applying the fix on this
> crash doesn't work, maybe some of the assumptions of the fix don't work here,
> in which case it'd be helpful to get the corresponding minidump.

From the cron logs, I don't think the fix was applied, and from the processor notes I don't think these were submitted for re-processing.

I've been going through the logic for the "fixBrokenDumps" cron job, and don't see yet how we could be missing these:

--
t1 = last_processed_date
"""
SELECT uuid,date_processed FROM reports WHERE product = 'Fennec'
  AND version = '4.0b5'
  AND date_processed > t1
  AND date_processed < (now() - INTERVAL '30 minutes')
"""
for each row:
  fix crash dump
  re-insert into hbase
  mark for re-processing
  update last_processed_date
save last_processed_date
--

Looking at the logs, I don't see d68f1c02-5d79-419a-96d4-179502110319 (each step described above is logged).

I am going to add a bit more debug logging and get that into prod, since I don't have enough info now to reconstruct the query after the fact (specifically the value of last_date_processed).
Status: NEW → ASSIGNED
(Assignee)

Comment 4

8 years ago
Added debug statements for the "update last_processed_date" and "save last_processed_date" steps.

Committed revision 3012.

Filed bug 643483 to get that in production.

Continuing to go over the logic here in the meantime.
(Assignee)

Comment 5

8 years ago
One thing I have noticed:

(In reply to comment #3)
> """
> SELECT uuid,date_processed FROM reports WHERE product = 'Fennec'
>   AND version = '4.0b5'
>   AND date_processed > t1
>   AND date_processed < (now() - INTERVAL '30 minutes')
> """

should probably be using "ORDER BY date_processed", since it looks like that table isn't perfectly ordered in its natural order. However, if anything this would cause needless re-processing; it shouldn't cause any records to be skipped.
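
To illustrate the ordering point, here is a minimal Python sketch, not the actual fixBrokenDumps code. It assumes the job sets last_processed_date to the date of each row as it is visited, as in the pseudocode from comment 3. The helper names (select_rows, run_pass) and the toy rows are hypothetical, and the 30-minute cut-off is left out for brevity.

```python
from datetime import datetime

# Toy (uuid, date_processed) rows in table order; note "c" is stored before "b".
ROWS = [
    ("a", datetime(2011, 3, 19, 10, 0)),
    ("c", datetime(2011, 3, 19, 12, 0)),
    ("b", datetime(2011, 3, 19, 11, 0)),
]

def select_rows(rows, t1, ordered):
    """Stand-in for the SELECT: rows newer than t1, optionally ordered by date."""
    selected = [row for row in rows if row[1] > t1]
    if ordered:
        selected.sort(key=lambda row: row[1])
    return selected

def run_pass(rows, t1, ordered):
    """Return (uuids fixed in this pass, last_processed_date saved afterwards)."""
    last_processed_date = t1
    fixed = []
    for uuid, date_processed in select_rows(rows, t1, ordered):
        fixed.append(uuid)
        last_processed_date = date_processed  # updated per row, in visit order
    return fixed, last_processed_date
```

With these toy rows and a starting date of 09:00, the unordered pass leaves last_processed_date at 11:00, so crash "c" is selected again on the next pass but nothing is skipped. With ordering, the date ends at 12:00 and the next pass selects nothing.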
(Assignee)

Comment 6

8 years ago
Digging into one day (2011-03-19), I can see that we logged fixing 60 crashes, and there are 60 records which contain the string 'replacement' in the processor_notes field. 

However, I see 1266 for that day which *do not* contain that string.

This leads me to suspect that the query isn't returning what we are expecting (perhaps related to the time at which it's being executed?).
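
For reference, the kind of per-day check described above can be sketched roughly as follows. This is an illustration only: SQLite stands in for the production database so that the snippet is self-contained, the function name count_by_fix_status and the toy rows are hypothetical, and only the table and column names (reports, product, version, date_processed, processor_notes) come from this bug.

```python
import sqlite3

def count_by_fix_status(conn, day):
    """Return (fixed, unfixed) counts of Fennec 4.0b5 reports for one day.

    A report counts as fixed when processor_notes contains 'replacement';
    empty or NULL notes count as unfixed.
    """
    query = """
        SELECT
          SUM(CASE WHEN processor_notes LIKE '%replacement%' THEN 1 ELSE 0 END),
          SUM(CASE WHEN processor_notes LIKE '%replacement%' THEN 0 ELSE 1 END)
        FROM reports
        WHERE product = 'Fennec'
          AND version = '4.0b5'
          AND date(date_processed) = ?
    """
    fixed, unfixed = conn.execute(query, (day,)).fetchone()
    return fixed or 0, unfixed or 0  # SUM over zero rows yields NULL

# Toy data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (uuid TEXT, product TEXT, version TEXT, "
    "date_processed TEXT, processor_notes TEXT)"
)
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?, ?, ?)",
    [
        ("u1", "Fennec", "4.0b5", "2011-03-19 10:00:00", "toy note: replacement"),
        ("u2", "Fennec", "4.0b5", "2011-03-19 11:00:00", ""),
        ("u3", "Fennec", "4.0b5", "2011-03-19 12:00:00", None),
        ("u4", "Fennec", "4.0b5", "2011-03-18 09:00:00", "toy note: replacement"),
    ],
)
print(count_by_fix_status(conn, "2011-03-19"))  # (1, 2)
```

A large gap between the two counts for a given day, like the 60 versus 1266 seen here, suggests that the selection query rather than the fix itself is at fault.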
(Assignee)

Comment 7

8 years ago
Created attachment 520756 [details] [diff] [review]
order by date_processed when fetching crashes to-be-fixed

Committed revision 3014.
(Assignee)

Comment 8

8 years ago
Created attachment 520773 [details] [diff] [review]
store last_date_processed last

Here's the problem: each call to fixBrokenDumps.fix() stores last_date_processed in the persistent file, and we call it twice (first for Firefox Linux, then for Fennec), so Fennec gets a much-too-recent last_date_processed.

Committed revision 3015.

I've tested this (read-only) against production, and it now matches what I expect when running the SQL queries by hand.
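
A rough Python sketch of the failure mode and the correction, under stated assumptions: this is not the real fixBrokenDumps code. The state dict stands in for the persistent file, and fix(), run_buggy(), run_fixed() and the toy rows are hypothetical names and data chosen only to show why storing the date after every call starves the second product.

```python
from datetime import datetime

# Toy (uuid, product, date_processed) rows; invented for illustration.
REPORTS = [
    ("ff-1", "Firefox", datetime(2011, 3, 19, 12, 0)),
    ("fn-1", "Fennec", datetime(2011, 3, 19, 10, 0)),
    ("fn-2", "Fennec", datetime(2011, 3, 19, 11, 0)),
]

def fix(product, since):
    """Return (uuids to fix, newest date_processed seen) for one product."""
    rows = sorted(
        (r for r in REPORTS if r[1] == product and r[2] > since),
        key=lambda r: r[2],
    )
    newest = rows[-1][2] if rows else since
    return [r[0] for r in rows], newest

def run_buggy(state):
    """Each call persists its own date, so the second call starts too late."""
    fixed = []
    for product in ("Firefox", "Fennec"):
        uuids, newest = fix(product, state["last_date_processed"])
        state["last_date_processed"] = newest  # stored after every call
        fixed += uuids
    return fixed

def run_fixed(state):
    """Both calls share the starting date; it is stored once, at the end."""
    since = state["last_date_processed"]
    fixed, newest = [], since
    for product in ("Firefox", "Fennec"):
        uuids, seen = fix(product, since)
        fixed += uuids
        newest = max(newest, seen)
    state["last_date_processed"] = newest  # stored last
    return fixed
```

Starting from a stored date of 09:00, run_buggy() returns only the Firefox crash because the Firefox call has already moved the stored date to 12:00 by the time Fennec is queried. run_fixed() returns all three crashes and stores 12:00 once.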
(Assignee)

Updated

8 years ago
Depends on: 643594
(Assignee)

Comment 9

8 years ago
Filed bug 643594 to get these corrections into production.

I'll schedule a time to rebuild the top crashers list table and correct the Fennec crashes received since the cron job went live. Everything before that should be OK; the bug here is only in how we store the date for hourly cron purposes.
No longer depends on: 643594
(Assignee)

Comment 10

8 years ago
Incoming crashes are now being fixed correctly (bug 643594; I'll do some further testing to verify it tomorrow), and we're working on scheduling a time to reprocess the window where we were missing most Fennec crashes (2011-03-07 through 2011-03-21) in bug 643599.

This will require some downtime to rebuild the top crashers list. We don't want to do that tonight since we're shipping Fx4 tomorrow, but I'll make sure it gets done as soon as is feasible.
(Assignee)

Updated

8 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Depends on: 643599
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro