Closed
Bug 637680
Opened 14 years ago
Closed 14 years ago
Get top crashers for Firefox and Fennec where crash-stats are broken (linux, android)
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: glandium, Assigned: rhelmer)
References
Details
Attachments (9 files, 1 obsolete file)
- 4.21 KB, text/plain
- 4.63 KB, text/plain
- 6.79 KB, patch (lars: review+)
- 394.01 KB, application/octet-stream
- 651.88 KB, text/plain
- 411.89 KB, application/octet-stream
- 395.76 KB, application/octet-stream
- 615.33 KB, text/plain
- 407.67 KB, application/octet-stream
I'll attach the two programs that can be used to fix up minidumps.
Reporter
Comment 1 • 14 years ago
Build with -I$(topsrcdir)/toolkit/crashreporter/google-breakpad/src
Just pass a bunch of minidumps on the command line, and it will modify them in place.
Reporter
Comment 2 • 14 years ago
Comment 3 • 14 years ago
The plan is to get these minidumps into a dev server (in bug 637678), where I'll run this tool on them, then we'll feed them into the Socorro staging server to generate topcrash lists.
Comment 4 • 14 years ago
How many dumps are we talking?
Could we:
- run a MapReduce job to pull each busted dump, fix it, and replace it in HBase
- insert all fixed dumps into the legacy processing queue
This would get the data up on prod.
Comment 5 • 14 years ago
Actually, after chatting with laura a bit on IRC, here is what I would offer for your consideration:
1. Create a Postgres query that can extract a list of submitted_timestamp that need to be fixed
2. Create a simple Python script that iterates over those ooids and talks to the hbaseClient object:
   - Call hbaseClient.get_dump(ooid)
   - Shell-exec the fixer program on the dump
   - Insert the dump back into HBase using a subset of the code in hbaseClient.put_json_dump()
   - Insert the ooid back into the legacy processing queue by calling hbaseClient.put_crash_report_indices(ooid, CurrentTimestamp, ['crash_reports_index_legacy_unprocessed_flag'])

Note that the current timestamp, in the same format as submitted_timestamp, should be used so that the entries to be reprocessed don't take priority over normal jobs.
The end result of this job, if run on a regular basis, is that we would update the record in HBase with a fixed copy of the dump file (the old one would still be present, but not visible to the normal Socorro system). The monitor would see these entries in the queue and, as long as it doesn't reject them as already processed, would send them back through the system.
There would be no load increase on the production HBase cluster to support this. If we attempted a MapReduce job, we'd have to tune and test it carefully to make sure it wouldn't mess things up. If this were tens of thousands of crashes per day, that might be worth it, but for a small volume this is a simple solution to implement.
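The pull/fix/push loop described above can be sketched in a few lines of Python. This is a hedged illustration, not the shipped Socorro code: `fix_dumps` and `fixer_cmd` are hypothetical names, and the `client` object is assumed to expose `get_dump` and a `put_fixed_dump` method along the lines discussed later in this bug.

```python
import os
import subprocess
import tempfile

def fix_dumps(client, ooids, fixer_cmd):
    """Pull each dump, run the fixer binary on it, and push it back."""
    fixed = []
    for ooid in ooids:
        dump = client.get_dump(ooid)
        # The fixer binaries modify dumps in place, so stage each dump in a
        # temporary file, run the fixer, and read the result back.
        fd, path = tempfile.mkstemp(suffix=".dmp")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(dump)
            subprocess.check_call([fixer_cmd, path])
            with open(path, "rb") as f:
                # put_fixed_dump is the write-back-and-requeue method
                # sketched in this bug; name and signature are assumptions.
                client.put_fixed_dump(ooid, f.read(),
                                      add_to_unprocessed_queue=True)
            fixed.append(ooid)
        finally:
            os.unlink(path)
    return fixed
```

The real script would additionally need to catch and log per-ooid failures so one bad dump doesn't abort the whole batch.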
Comment 6 • 14 years ago
Sorry, at the beginning, the first step should read:
Create a Postgres query that can extract a list of ooids that need to be fixed
Comment 7 • 14 years ago
Ted, do you want us to go ahead?
Comment 8 • 14 years ago
Daniel's proposal sounds fine to me. Let me know how I can help make this happen.
Updated • 14 years ago
Attachment #515926 -
Attachment mime type: text/x-csrc → text/plain
Updated • 14 years ago
Attachment #515927 -
Attachment mime type: text/x-csrc → text/plain
Comment 9 • 14 years ago
Rob: see comment 5 for the agreed procedure. We'll need two weeks' worth of dumps for the broken builds to get decent topcrasher info. You might need jberkus to run the query on prod PG for you for that part.
The other part is hacking up some python to follow the above steps, and running that on prod.
We really need to get this done today - it fell off the radar this week. Can you manage it?
Assignee: ted.mielczarek → rhelmer
Severity: normal → blocker
Assignee
Comment 10 • 14 years ago
Here's a count and example queries we'll be working with:
"""
breakpad=> select count(*) from reports where product = 'Firefox' and version = '4.0b11' or version = '4.0b12' and os_name = 'Linux' and date_processed > '2011-01-01';
count
---------
1233417
(1 row)
breakpad=> select count(*) from reports where product = 'Fennec' and version = '4.0b5';
 count
-------
  7851
(1 row)
"""
Went over this with ted on IRC; looks good, but let me know if anyone notices anything odd. I'm now working on the approach Daniel suggests in comment 5.
Status: NEW → ASSIGNED
Assignee
Comment 11 • 14 years ago
(In reply to comment #5)
> Insert the dump back into HBase using a subset of the code in
> hbaseClient.put_json_dump()
Daniel, can you expand on which part(s) of put_json_dump() we don't want?
> Insert the ooid back into the legacy processing queue by calling
> hbaseClient.put_crash_report_indices(ooid,CurrentTimestamp,['crash_reports_index_legacy_unprocessed_flag'])
> Note that the current timestamp in the same format as what is used for
> submitted_timestamp should be used so that the entries to be reprocessed don't
> take priority over normal jobs.
I think for this we could easily add an optional param to put_json_dump() to override submitted_timestamp, passing in the current time (or whatever we want instead). Let me know if I'm understanding correctly.
Comment 12 • 14 years ago
Basically, we only want the lines in put_json_dump() that write the dump; none of the metadata manipulation, index management, and such.
Something like this, with the one comment placeholder filled in:
@optional_retry_wrapper
def put_fixed_dump(self, ooid, dump, add_to_unprocessed_queue=True):
    """
    Update a crash report with a new dump file, optionally queuing it for processing
    """
    row_id = ooid_to_row_id(ooid)
    submitted_timestamp = # Python code for getting current timestamp in correct format
    columns = [
        ("raw_data:dump", dump)
    ]
    mutationList = [self.mutationClass(column=c, value=v)
                    for c, v in columns if v is not None]
    indices = []
    if add_to_unprocessed_queue:
        indices.append('crash_reports_index_legacy_unprocessed_flag')
    self.client.mutateRow('crash_reports', row_id, mutationList)  # unit test marker 233
    self.put_crash_report_indices(ooid, submitted_timestamp, indices)
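One way to fill the timestamp placeholder above is shown below. This is purely an illustration under the assumption that an ISO 8601 string is acceptable; the real code must match whatever format Socorro actually uses for submitted_timestamp, which is not spelled out in this bug.

```python
import datetime

def current_submitted_timestamp():
    # Assumption: submitted_timestamp is an ISO 8601 local-time string.
    # Verify against real submitted_timestamp values before using this.
    return datetime.datetime.now().isoformat()
```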
Assignee
Comment 13 • 14 years ago
This is based on comment 5 and comment 12 (thanks Daniel!)
Not sure if we want to keep it, but I set it up so we could trivially land this into Socorro and call this as a cron job if it's needed.
Building attachment 515926 [details] and 515927 with a Socorro checkout just needs:
make minidump_stackwalk
gcc -o minidump_hack-fennec -I google-breakpad/src/ minidump_hack-fennec.c
gcc -o minidump_hack-firefox_linux -I google-breakpad/src/ minidump_hack-firefox_linux.c
Names of the fixup commands and also the SQL queries are configurable, but "Fennec" and "Firefox Linux" are hardcoded in the config and the start script (hopefully we never need to expand this :))
Might be nice to make the fixup commands read from stdin and write to stdout so we don't need to touch the disk, but I'm not going to sweat that right now.
Attachment #517027 -
Flags: review?(lars)
Attachment #517027 -
Flags: feedback?(deinspanjer)
Assignee
Comment 14 • 14 years ago
We tested this on a single crash to start:
https://crash-stats.mozilla.com/report/index/30100333-b41e-4b2e-93fb-694472110220
I have the original dump, and the md5sum changed, but I'm not sure how else to verify it.
Who can help with this?
Reporter
Comment 15 • 14 years ago
(In reply to comment #14)
> We tested this on a single crash to start:
> https://crash-stats.mozilla.com/report/index/30100333-b41e-4b2e-93fb-694472110220
>
> I have the original dump, and the md5sum changed, but not sure how else to
> verify.
>
> Who can help with this?
Taking a look at the original raw dump vs. the new one should help. In the original, you should see 3 modules for Fennec libraries such as libxul.so, while in the new one you should see only two, with the first one covering the address space covered by the original first two. The resolved function names should look better too.
Reporter
Comment 16 • 14 years ago
In the crash report you link, the stack traces on other threads look almost normal, except for the ashmem (deleted) parts, which are not related to elfhack. Having the /proc/pid/maps output from the minidump would help there.
Reporter
Comment 17 • 14 years ago
Note the fixing behaviour is different on Linux: the original minidumps should have one module for each Firefox library, except each is too small. The fixup makes the module address space larger, so that it fits the actual address space used in the process.
Assignee
Comment 18 • 14 years ago
glandium has been helping test this in IRC; it looks good, so we're going to proceed with all Fennec 4.0b5 crashes. Doing a dry run now to make sure everything looks OK (processing the right number of crashes, calling the right binary).
There appears to be caching enabled on /rawdumps calls (which Apache rewrites to the socorro-api hostname); I imagine this is on the Zeus. Not sure if this matters.
Also, just a reminder that per comment 5 these will get inserted into the normal (not priority) queue for processing, so it'll take a while for processors to pick these up.
I should have a reasonable estimate for how long this will take once we have it running for real for a bit.
Assignee
Comment 19 • 14 years ago
(In reply to comment #10)
> Here's a count and example queries we'll be working with:
>
> """
> breakpad=> select count(*) from reports where product = 'Firefox' and version =
> '4.0b11' or version = '4.0b12' and os_name = 'Linux' and date_processed >
> '2011-01-01';
> count
> ---------
> 1233417
> (1 row)
Oops, this is wrong; we should be explicit about precedence here (thanks glandium):
breakpad=> select count(*) from reports where product = 'Firefox' and (version = '4.0b11' or version = '4.0b12') and os_name = 'Linux' and date_processed > '2011-01-01';
count
-------
7732
(1 row)
Assignee
Comment 20 • 14 years ago
Same as attachment 517027 [details] [diff] [review] plus:
* fix firefox SQL statement
* use /dev/shm for tmpfiles instead of disk
* catch/log exceptions in the pull/fix/push loop
Attachment #517027 -
Attachment is obsolete: true
Attachment #517027 -
Flags: review?(lars)
Attachment #517027 -
Flags: feedback?(deinspanjer)
Attachment #517061 -
Flags: review?(lars)
Attachment #517061 -
Flags: feedback?(deinspanjer)
Assignee
Comment 21 • 14 years ago
Assignee
Comment 22 • 14 years ago
started 2011-03-04 17:51:18
stopped 2011-03-04 18:02:55
Assignee
Comment 23 • 14 years ago
Assignee
Comment 24 • 14 years ago
(In reply to comment #16)
> In the crash report you link, the stack trace on other threads almost look
> normal, except for the ashmem (deleted) parts, which are not related to
> elfhack. Having the /proc/pid/maps output from the minidump would help, there.
This looks like a separate, pre-existing issue, per IRC.
Assignee
Comment 25 • 14 years ago
Expected number of OOIDs processed.
I need to step away for a little bit, will run this when I get back and can keep an eye on it.
Assignee
Comment 26 • 14 years ago
Assignee
Comment 27 • 14 years ago
Assignee
Comment 28 • 14 years ago
Comment on attachment 517078 [details]
firefox OOIDs modified
started 2011-03-04 20:59:10
stopped 2011-03-04 21:24:48
Reporter
Comment 29 • 14 years ago
Worked awesomely. The top crashers list seems not to be updated, though. And new crashes obviously are broken, too.
Updated • 14 years ago
Attachment #517061 -
Flags: review?(lars) → review+
Assignee
Comment 30 • 14 years ago
Fennec reports are fixed as of 2011-03-04 18:02:55, and Firefox as of 2011-03-04 21:24:48.
(In reply to comment #29)
> Worked awesomely. The top crashers list seems not to be updated, though. And
> new crashes obviously are broken, too.
We are looking into the top crashers issue now.
To run this on a regular basis, we can easily add it as a cron job in Socorro, but we should add a feature so the script keeps track of where it left off and doesn't fix crashes multiple times. (The bugzilla cron drops a timestamp into a pickled file; we could do something similar, perhaps using the last-fixed id from the reports table rather than a timestamp.)
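A minimal sketch of that checkpointing idea, storing the last-fixed reports.id as a plain integer rather than a pickled timestamp. The file path, format, and function names here are assumptions for illustration, not the implementation that eventually shipped:

```python
import os

# Assumed checkpoint location; the real cron job would make this configurable.
CHECKPOINT = "/var/lib/socorro/fixBrokenDumps.last_id"

def read_last_fixed_id(path=CHECKPOINT):
    """Return the last reports.id we fixed, or 0 on the first run."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return 0

def write_last_fixed_id(last_id, path=CHECKPOINT):
    """Persist the checkpoint atomically so a crash can't truncate it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(last_id))
    os.rename(tmp, path)  # atomic on POSIX filesystems
```

Each cron run would then query only `reports.id > read_last_fixed_id()` and call `write_last_fixed_id()` after a successful batch.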
Assignee
Comment 31 • 14 years ago
(In reply to comment #30)
> Fennec reports are fixed as of 2011-03-04 18:02:55, and Firefox as of
> 2011-03-04 21:24:48.
>
> (In reply to comment #29)
> > Worked awesomely. The top crashers list seems not to be updated, though. And
> > new crashes obviously are broken, too.
>
> We are looking into the top crashers issue now.
Each run of the TCBS cron job will look for reprocessed jobs up to 2 hours prior, so we should be able to keep up with a batch job such as the one proposed in comment 30, as long as it is run at least hourly.
However, to catch up the backlog, the most straightforward approach would be to delete rows from the top crash by signature (TCBS) table from the first crash processed that exhibits the problem ('2011-02-20 12:22:40') until '2011-03-04 21:24:48', and let the TCBS cron job rebuild based on the (now fixed) reports table.
We would expect this to take between 1 and 10 hours. This means the top crashers list for crashes before the start date above (Feb 20) would be unavailable; as each hour of work completes, though, that hour becomes immediately available.
Assignee
Comment 32 • 14 years ago
(In reply to comment #31)
> Each run of the TCBS cron job will look for reprocessed jobs up to 2 hours
> prior, so we should be able to keep up with a batch job such as the one
> proposed by comment 30, as long as it was run at least hourly.
Filed bug 639514 to follow up on this.
> However to catch up the backlog, the most straightforward way to do this would
> be to delete the top crash by signature (TCBS) table from the first crash
> processed which exhibits the problem ('2011-02-20 12:22:40') until '2011-03-04
> 21:24:48', and let the TCBS cron job rebuild based on the (now fixed) reports
> table.
Filed bug 639512 for this.
Assignee
Comment 33 • 14 years ago
Comment on attachment 517061 [details] [diff] [review]
script dump fix and re-insertion
Landed this, which is appropriate for a one-time fix (given an appropriate SQL query in the config file). Going to add the changes needed for running from cron in bug 639514:
Committed revision 2997.
Assignee
Comment 34 • 14 years ago
Backlog is caught up, top crashers lists updated, and a cron job is running hourly to fix broken crashes as they come in.
Top Crashes By Signature table was rebuilt in bug 639512
fixBrokenDumps cron job was installed in bug 639514
All work was completed by 2011-03-07 22:40:56 Pacific.
Please reopen if you see any problems, or mark the bug verified if everything looks OK.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Assignee
Updated • 14 years ago
Attachment #517061 -
Flags: feedback?(deinspanjer)
Updated • 13 years ago
Component: Socorro → General
Product: Webtools → Socorro