464775 - want queries/reports to help detect crashes that are caused by plugins / binary extensions

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Description

•

16 years ago

It's quite common for top crashes to be caused by particular plugins or binary components in extensions. However, it requires quite a bit of manual labor to demonstrate that this is the case. The following two features would make it much easier to detect this situation when looking at the crash data for a particular topcrash. (1) would let us see which modules (which are what the crash reporter calls shared libraries) are a likely cause of a particular crash, and (2) would let us see which other crashes those same modules were causing. (1) is more important than (2).

(1) For each topcrash, we should have a single page that lists all the modules found in the modules list for any occurrence of that crash, listed with:
a. the name of the module
b. the operating system (Windows, Mac, Linux, Solaris)
c. the percentage of crashes for *this* crash stack signature in which the module occurs (i.e., the number of crashes with this module divided by the total number of occurrences of this crash on the operating system in (b))
d. the percentage of all crashes in which this module occurs (same, but for all crashes, not just this stack signature)

(Both the name and the operating system should be considered to uniquely identify the module; it's pretty unlikely for the same module name to show up across systems, but if it does, they should be treated as separate modules.)

This list would probably be most useful if it is sorted by (c) - (d), so the modules where (c) is much larger than (d) show up first.

It could be even more useful if there could be a twisty next to the name of the module so that you can split it out by the "version" field in that module as well (on Windows, where that version exists, anyway), in case the problem is only with particular versions of the module (i.e., with (c) and (d) calculated only for that version of that module).
(2) For each module, and perhaps each version of each module, we should have a similar page (linked from the lists in (1)) that lists:
(a) each stack signature in which this module shows up in one of the stack traces
(b) the percentage of the occurrences of that stack signature in which this module shows up
(c) the percentage of all crashes in which this module shows up

sorted probably by (c) divided by (b) (although that might bubble statistically insignificant stuff up to the top, in which case (c) minus (b) might work).
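The ranking described in item (1) can be sketched roughly as follows. This is only an illustration, not the crash reporter's or Socorro's actual code: the function name `interesting_modules` and the input shape (dicts with `signature`, `os`, and `modules` keys) are assumptions made up for the example.

```python
from collections import Counter


def interesting_modules(crashes, signature):
    """Rank (module, os) pairs for one signature by (c) - (d).

    `crashes` is an iterable of dicts with hypothetical keys
    'signature', 'os', and 'modules' (a collection of module names).
    """
    total_by_os = Counter()      # all crashes, per OS
    sig_total_by_os = Counter()  # crashes with this signature, per OS
    module_all = Counter()       # (module, os) -> count across all crashes
    module_sig = Counter()       # (module, os) -> count within this signature

    for crash in crashes:
        os_name = crash["os"]
        in_sig = crash["signature"] == signature
        total_by_os[os_name] += 1
        if in_sig:
            sig_total_by_os[os_name] += 1
        # Count each module at most once per crash.
        for module in set(crash["modules"]):
            module_all[(module, os_name)] += 1
            if in_sig:
                module_sig[(module, os_name)] += 1

    rows = []
    for (module, os_name), sig_count in module_sig.items():
        c = sig_count / sig_total_by_os[os_name]                  # item (c)
        d = module_all[(module, os_name)] / total_by_os[os_name]  # item (d)
        rows.append((module, os_name, c, d))

    # Modules where (c) is much larger than (d) come first.
    rows.sort(key=lambda row: row[2] - row[3], reverse=True)
    return rows
```

The module name and operating system together form the key, so the same name on two systems is treated as two separate modules, as the description asks. Splitting by version would presumably mean adding the version to that key.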

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

16 years ago

Summary: want queries to help detect crashes that are caused by plugins / binary extensions → want queries/reports to help detect crashes that are caused by plugins / binary extensions

(not currently active) Ted Mielczarek

Comment 1

•

16 years ago

Related: bug 423968, bug 439679

Michael Morgan [:morgamic]

Updated

•

15 years ago

Whiteboard: cloud

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 2

•

15 years ago

per-crash-interesting-modules in http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/ does item (1) above; it has a few different command-line options for variants.

Michael Morgan [:morgamic]

Updated

•

15 years ago

Assignee: nobody → ozten.bugs

Depends on: 521917

Target Milestone: --- → 1.2

Michael Morgan [:morgamic]

Updated

•

15 years ago

Depends on: 525785
No longer depends on: 521917

Austin King [:ozten]

Comment 3

•

15 years ago

Created a design document for review: http://code.google.com/p/socorro/wiki/ModuelCorrelationBySignatureDesign

UI data reference: http://dbaron.org/log/20090922-crashes-data/interesting-modules-windows-versions

Austin King [:ozten]

Comment 4

•

15 years ago

(In reply to comment #3) per timeless, renaming wiki page. Please review http://code.google.com/p/socorro/wiki/ModuleCorrelationBySignatureDesign

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 5

•

15 years ago

I can't tell from that design doc what sort of calculations you're planning to do to compute the most important correlations, but the little that's there makes it sound substantially different from what's in comment 0 or in http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/ . Are the differences intentional, or is the computation that you want to do just not fleshed out yet?

Austin King [:ozten]

Comment 6

•

15 years ago

(In reply to comment #5) This is accidental. What specifically sounds different? The goal of the design is to port your work to our existing system. The main differences are that crashes are analyzed in slices and the aggregates are stored in a database table, to be processed and displayed later by the frontend. What aspects are different?

Austin King [:ozten]

Comment 7

•

15 years ago

(In reply to comment #6) Sorry for the redundant question...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 8

•

15 years ago

So after reading it again it seems a little closer than I thought the first time. The part that threw me was this paragraph:

> "For each signature this column contains a link with the highest correlated module." ... "NPSWF32.dll would be chosen because it had the highest correlation for this signature. 72% is the frequency in which this module appeared."

I don't think showing the number 72% here is a particularly useful measure, unless you're doing analysis to show what percentage of the crashes are *explained* by the correlation. I haven't yet done that, and it's probably a necessary step to sorting the correlations well (which would also be needed to pick out the most important one to show).

Austin King [:ozten]

Comment 9

•

15 years ago

(In reply to comment #8) Okay, this sounds good. The link text is a "guess" at the most interesting module. I could make the text "module correlation" or something static instead. The design doc wasn't to this level of detail, but I was going to base that on the module/OS combo with the biggest value for:

(% occurrence for this signature) - (% occurrence across crashes)

Example, for the Windows NT signature nsCycleCollectingAutoRefCnt::decr(nsISupports*) (118 crashes):

78% (92/118) vs. 31% (2167/7100) oleacc.dll

Its value would be (92/118) - (2167/7100) =~ 0.47. oleacc.dll would beat shdocvw.dll:

93% (110/118) vs. 80% (5667/7100) shdocvw.dll

which would have the value =~ 0.13. This piece would happen in the UI and the details can be changed as we get more feedback.
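The arithmetic in the example above can be checked with a short sketch. The counts are the ones quoted in the comment; the function name `correlation_score` and the dictionary layout are just assumptions for illustration, not part of the Socorro design.

```python
def correlation_score(sig_with_module, sig_total, all_with_module, all_total):
    """Share of this signature's crashes with the module, minus the
    module's share across all crashes on the same OS."""
    return sig_with_module / sig_total - all_with_module / all_total


# Counts quoted in the comment: (with module, signature total,
# with module across all crashes, all crashes).
candidates = {
    "oleacc.dll": (92, 118, 2167, 7100),
    "shdocvw.dll": (110, 118, 5667, 7100),
}

scores = {name: correlation_score(*counts) for name, counts in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("most interesting module:", best)
```

shdocvw.dll appears in more of this signature's crashes (93% vs. 78%), but it is also common across all crashes, so the difference favors oleacc.dll.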

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 10

•

15 years ago

One other comment is that I hope you're planning to do the addon correlation reports too; addon correlation has turned out to be significantly more useful than module correlation (mainly because it's much less noisy).

Austin King [:ozten]

Updated

•

15 years ago

Assignee: ozten.bugs → griswolf

Target Milestone: 1.2 → 1.3

Michael Morgan [:morgamic]

Updated

•

15 years ago

Assignee: griswolf → deinspanjer

Target Milestone: 1.3 → 1.4

Michael Morgan [:morgamic]

Comment 11

•

15 years ago

Daniel - we discussed this, and it should be the first item for the HDFS/Hadoop cluster. Since the database doesn't store module information, we don't have a data set for this, and crawling all crashes on our current setup is not feasible.

Target Milestone: 1.4 → 1.5

Michael Morgan [:morgamic]

Updated

•

15 years ago

No longer depends on: 525785

Daniel Einspanjer [:dre] [:deinspanjer]

Assignee

Comment 12

•

15 years ago

Should this bug or bug 470827 have priority for development? Blocked on getting data into Hadoop for dev.

Depends on: 538206, 542855

K Lars Lohn [:lars] [:klohn]

Updated

•

15 years ago

Target Milestone: 1.5 → ---

Nobody; OK to take it and work on it

Updated

•

13 years ago

Component: Socorro → General

Product: Webtools → Socorro

Laura Thomson :laura

Updated

•

13 years ago

Depends on: 656297

Laura Thomson :laura

Comment 13

•

13 years ago

Is there something in this that isn't covered by correlation reports/DLL directory? Let me know or I'll wontfix.

Robert Kaiser

Comment 14

•

13 years ago

(In reply to Laura Thomson :laura from comment #13)
> Is there something in this that isn't covered by correlation reports/DLL directory?

I think that (1) in comment #0 is basically covered by the correlation reports; (2) would be something we can do based on the correlation data once it's fully in Socorro and searchable in a DB.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 15

•

13 years ago

Yeah, we don't have item (2) in comment 0, but I'm no longer convinced it's useful; I'd rather have something like what I blogged about in http://dbaron.org/log/20101111-crash-future

Robert Kaiser

Comment 16

•

13 years ago

(In reply to David Baron [:dbaron] from comment #15)
> Yeah, we don't have item (2) in comment 0, but I'm no longer convinced it's useful; I'd rather have something like what I blogged about in http://dbaron.org/log/20101111-crash-future

Sounds like we should close this bug and file others instead. Can we try to figure out concrete steps on what to improve and get those filed? Maybe we should do a session on this during the upcoming stability work week and figure out those concrete steps there?

Benjamin Smedberg

Updated

•

12 years ago

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → INCOMPLETE