Closed Bug 643201 Opened 13 years ago Closed 13 years ago

Some Fennec 4.0b5 crash reports have still not been reprocessed

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jdm, Assigned: rhelmer)

References

Details

Attachments

(2 files)

order by date_processed when fetching crashes to-be-fixed (patch, 707 bytes, Robert Helmer [:rhelmer], 13 years ago)
store last_date_processed last (patch, 1.34 KB, Robert Helmer [:rhelmer], 13 years ago)

Josh Matthews [:jdm]

Reporter

Description

13 years ago

4.0b5 crash reports are still filled with numerous nonsensical crashes that require reprocessing.  Take, for example, https://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2011-03-19%2016%3A00%3A00&signature=nsTableFrame%3A%3AInsertCol&version=Fennec%3A4.0b5, which contains crashes going back to March 7 that look like prime fodder for reprocessing.  Is the cron job still working correctly?  The majority of named signatures in the 4.0b5 crashes (i.e. not libc.so or libdvm.so crashes) are completely bogus, which makes triaging them quite difficult.

Robert Helmer [:rhelmer]

Assignee

Comment 1

13 years ago

(In reply to comment #0)
> 4.0b5 crashes reports are still filled with numerous nonsensical crashes that
> require reprocessing.  Take, for example,
> https://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2011-03-19%2016%3A00%3A00&signature=nsTableFrame%3A%3AInsertCol&version=Fennec%3A4.0b5,
> which contains crashes going back to March 7 which look like prime fodder for
> reprocessing.  Is the cron job still working correctly?  The majority of named
> signatures in the 4.0b5 crashes (ie. not libc.so or libdvm.so crash) are
> completely bogus, which makes triaging them quite difficult.

I just checked, and according to the logs the cron job is running correctly. I will dig deeper tomorrow.

Assignee: nobody → rhelmer

Priority: -- → P1

Mike Hommey [:glandium]

Comment 2

13 years ago

Looking at the first one in the list
https://crash-stats.mozilla.com/report/index/d68f1c02-5d79-419a-96d4-179502110319

I see this one has not been fixed. If specifically applying the fix on this crash doesn't work, maybe some of the assumptions of the fix don't work here, in which case it'd be helpful to get the corresponding minidump.

Robert Helmer [:rhelmer]

Assignee

Comment 3

13 years ago

(In reply to comment #2)
> Looking at the first one in the list
> https://crash-stats.mozilla.com/report/index/d68f1c02-5d79-419a-96d4-179502110319
> 
> I see this one has not been fixed. If specifically applying the fix on this
> crash doesn't work, maybe some of the assumptions of the fix don't work here,
> in which case it'd be helpful to get the corresponding minidump.

From the cron logs, I don't think the fix was applied, and from the processor notes I don't think these were submitted for re-processing.

I've been going through the logic for the "fixBrokenDumps" cron job, and don't yet see how we could be missing these:

--
t1 = last_processed_date
"""
SELECT uuid,date_processed FROM reports WHERE product = 'Fennec'
  AND version = '4.0b5'
  AND date_processed > t1
  AND date_processed < (now() - INTERVAL '30 minutes')
"""
for each row:
  fix crash dump
  re-insert into hbase
  mark for re-processing
  update last_processed_date
save last_processed_date
--
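
The loop above can be sketched in Python roughly as follows. This is only an illustration, not the actual fixBrokenDumps code: the names `select_candidates` and `fix_broken_dumps`, the dict-shaped rows, and the in-memory list standing in for the reports table are all assumptions, and the three per-row steps are reduced to a placeholder.

```python
from datetime import datetime, timedelta


def select_candidates(reports, last_date_processed, now):
    """Mimic the WHERE clause: Fennec 4.0b5 rows processed after the
    checkpoint but more than 30 minutes before `now`."""
    cutoff = now - timedelta(minutes=30)
    return [
        r for r in reports
        if r["product"] == "Fennec"
        and r["version"] == "4.0b5"
        and last_date_processed < r["date_processed"] < cutoff
    ]


def fix_broken_dumps(reports, last_date_processed, now):
    """Hypothetical stand-in for one cron run; returns the uuids handled
    and the checkpoint that would be saved at the end."""
    fixed = []
    for row in select_candidates(reports, last_date_processed, now):
        # Placeholder for: fix crash dump, re-insert into hbase,
        # mark for re-processing.
        fixed.append(row["uuid"])
        # "update last_processed_date": remember the row just handled.
        last_date_processed = row["date_processed"]
    # "save last_processed_date": the caller persists this value.
    return fixed, last_date_processed
```

Written this way, the only inputs that decide which rows are picked up are the checkpoint and the clock, which is why the value of last_date_processed at the time of each run matters.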

Looking at the logs, I don't see d68f1c02-5d79-419a-96d4-179502110319 (each step described above is logged).

I am going to add a bit more debug logging and get that into prod, since I don't have enough info now to reconstruct the query after the fact (specifically the value of last_date_processed).

Status: NEW → ASSIGNED

Robert Helmer [:rhelmer]

Assignee

Comment 4

13 years ago

Added debug statements for the "update last_processed_date" and "save last_processed_date":

Committed revision 3012.

Filed bug 643483 to get that in production.

Continuing to go over the logic here in the meantime.

Robert Helmer [:rhelmer]

Assignee

Comment 5

13 years ago

One thing I have noticed:

(In reply to comment #3)
> """
> SELECT uuid,date_processed FROM reports WHERE product = 'Fennec'
>   AND version = '4.0b5'
>   AND date_processed > t1
>   AND date_processed < (now() - INTERVAL '30 minutes')
> """

This query should probably use "ORDER BY date_processed", since the table does not appear to be perfectly ordered in its natural order. However, if anything the missing ordering would cause needless re-processing; it shouldn't cause any records to be skipped.
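
To illustrate why the missing ordering should only cause repeats rather than gaps, here is a small Python sketch. The rows and the helper `checkpoint_after` are made up for the example; the only thing taken from the cron job is the idea that the checkpoint ends up being the date_processed of the last row handled.

```python
from datetime import datetime

# Made-up rows in the order the table happens to return them
# (not sorted by date_processed).
rows = [
    ("a", datetime(2011, 3, 19, 10, 0)),
    ("b", datetime(2011, 3, 19, 12, 0)),
    ("c", datetime(2011, 3, 19, 11, 0)),
]


def checkpoint_after(rows):
    """Return the date_processed of the last row handled, which is what
    the loop would save as its checkpoint."""
    last = None
    for _uuid, date_processed in rows:
        last = date_processed
    return last


unordered = checkpoint_after(rows)
ordered = checkpoint_after(sorted(rows, key=lambda r: r[1]))
```

In this sketch the unordered run saves 11:00 as its checkpoint, so the next run would select row "b" (12:00) again and re-process it. The ordered run saves 12:00. In neither case is a row left behind the checkpoint unprocessed.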

Robert Helmer [:rhelmer]

Assignee

Comment 6

13 years ago

Digging into one day (2011-03-19), I can see that we logged fixing 60 crashes, and there are 60 records which contain the string 'replacement' in the processor_notes field. 

However, I see 1266 for that day which *do not* contain that string.

This leads me to suspect that the query isn't returning what we are expecting (perhaps related to the time at which it's being executed?).

Robert Helmer [:rhelmer]

Assignee

Comment 7

13 years ago

Attached patch: order by date_processed when fetching crashes to-be-fixed

Committed revision 3014.

Robert Helmer [:rhelmer]

Assignee

Comment 8

13 years ago

Attached patch: store last_date_processed last

Here's the problem: each call to fixBrokenDumps.fix() stores the last_date_processed in the persistent file, and we call it twice (first for Firefox Linux, then for Fennec), so Fennec gets a much-too-recent last_date_processed.
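
A minimal Python sketch of that failure mode, and of storing the date only once at the end, might look like this. It is not the real fixBrokenDumps code: the `state` dict stands in for the persistent file, the tuple-shaped rows and the functions `fix`, `run_buggy` and `run_fixed` are hypothetical, and taking the newest date across both products is an assumption about how the saved value is chosen.

```python
from datetime import datetime


def fix(reports, product, since):
    """Handle one product's rows newer than `since`; return the uuids
    handled and the newest date_processed seen."""
    fixed, newest = [], since
    for uuid, prod, date_processed in sorted(reports, key=lambda r: r[2]):
        if prod == product and date_processed > since:
            fixed.append(uuid)
            newest = date_processed
    return fixed, newest


def run_buggy(reports, state):
    """Each call stores its own checkpoint, so the Fennec pass starts
    from wherever the Firefox pass ended."""
    fixed = []
    for product in ("Firefox", "Fennec"):
        done, newest = fix(reports, product, state["last_date_processed"])
        state["last_date_processed"] = newest  # stored after every call
        fixed += done
    return fixed


def run_fixed(reports, state):
    """Both passes start from the same checkpoint; it is stored last."""
    start = state["last_date_processed"]
    fixed, newest = [], start
    for product in ("Firefox", "Fennec"):
        done, seen = fix(reports, product, start)
        fixed += done
        newest = max(newest, seen)
    state["last_date_processed"] = newest  # stored once, at the end
    return fixed
```

With one Firefox row newer than every Fennec row, `run_buggy` handles the Firefox row and then finds no Fennec rows at all, while `run_fixed` handles all of them.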

Committed revision 3015.

I've tested this (read-only) against production, and it now matches what I expect when running the SQL queries by hand.

Robert Helmer [:rhelmer]

Assignee

Updated

13 years ago

Depends on: 643594

Robert Helmer [:rhelmer]

Assignee

Comment 9

13 years ago

Filed bug 643594 to get these corrections into production.

I'll schedule a time to rebuild the top crashers list table and correct the Fennec crashes since the cron job went live (everything before that should be ok; the bug here is only in how we store the date for hourly cron purposes).

No longer depends on: 643594

Robert Helmer [:rhelmer]

Assignee

Comment 10

13 years ago

Incoming crashes are now being fixed correctly (bug 643594; I'll do some further testing to verify it tomorrow), and we're working on scheduling a time to reprocess the window where we were missing most Fennec crashes (2011-03-07 through 2011-03-21) in bug 643599.

This will require some downtime to rebuild the top crashers list. We don't want to do that tonight since we're shipping Fx4 tomorrow, but I'll make sure it gets done as soon as is feasible.

Robert Helmer [:rhelmer]

Assignee

Updated

13 years ago

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Depends on: 643599

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

13 years ago

Component: Socorro → General

Product: Webtools → Socorro


Categories

(Socorro :: General, task, P1)
