Closed Bug 464775 Opened 16 years ago Closed 11 years ago

want queries/reports to help detect crashes that are caused by plugins / binary extensions

Categories

(Socorro :: General, task)

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: dbaron, Assigned: dre)

References

Details

(Whiteboard: cloud)

It's quite common for top crashes to be caused by particular plugins or binary components in extensions.  However, it requires quite a bit of manual labor to demonstrate that this is the case.

The following two features would make it much easier to detect this situation when looking at the crash data for a particular topcrash.  (1) would let us see which modules (which are what the crash reporter calls shared libraries) are a likely cause of a particular crash, and (2) would let us see which other crashes those same modules were causing.  (1) is more important than (2).

(1) For each topcrash, we should have a single page that lists all the modules found in the modules list for any occurrence of that crash, listed with:
 a. the name of the module
 b. the operating system (Windows, Mac, Linux, Solaris)
 c. the percentage of crashes for *this* crash stack signature in which the module occurs (i.e., the number of crashes with this module divided by the total number of occurrences of this crash on the operating system in (b))
 d. the percentage of all crashes in which this module occurs (same, but for all crashes, not just this stack signature)
(Both the name and the operating system should be considered to uniquely identify the module; it's pretty unlikely for the same module name to show up across systems, but if it does, they should be treated as separate modules.)
This list would probably be most useful if it is sorted by (c) - (d), so the modules where (c) is much larger than (d) show up first.
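
As a rough illustration of the report described in (1), the sketch below computes (c) and (d) for every (module, operating system) pair seen with one signature and sorts by (c) - (d). It is not the code from crash-data-tools or Socorro: the function name and the crash record layout (dicts with "signature", "os" and "modules" keys) are assumptions made for the example.

```python
from collections import Counter


def interesting_modules(crashes, signature):
    """Rank (module, os) pairs for one signature by (c) - (d).

    `crashes` is a hypothetical iterable of dicts with the keys
    "signature", "os" and "modules" (a collection of module names).
    """
    total_by_os = Counter()  # all crashes, per OS
    sig_by_os = Counter()    # crashes with this signature, per OS
    module_all = Counter()   # (module, os) -> crashes overall
    module_sig = Counter()   # (module, os) -> crashes with this signature

    for crash in crashes:
        os_name = crash["os"]
        in_sig = crash["signature"] == signature
        total_by_os[os_name] += 1
        if in_sig:
            sig_by_os[os_name] += 1
        # set() so a module listed twice in one crash is counted once
        for module in set(crash["modules"]):
            module_all[(module, os_name)] += 1
            if in_sig:
                module_sig[(module, os_name)] += 1

    rows = []
    for (module, os_name), n_sig in module_sig.items():
        c = n_sig / sig_by_os[os_name]                            # item (c)
        d = module_all[(module, os_name)] / total_by_os[os_name]  # item (d)
        rows.append((module, os_name, c, d))

    # Modules where (c) is much larger than (d) come first.
    rows.sort(key=lambda row: row[2] - row[3], reverse=True)
    return rows
```

Keying the counters on the (module, os) pair keeps a module name that appears on two operating systems as two separate entries, as requested above.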

It could be even more useful if there could be a twisty next to the name of the module so that you can split it out by the "version" field in that module as well (on Windows, where that version exists, anyway), in case the problem is only with particular versions of the module (i.e., with (c) and (d) calculated only for that version of that module).


(2) For each module, and perhaps each version of each module, we should have a similar page (linked from the lists in (1)) that lists:
 (a) each stack signature in which this module shows up in one of the stack traces
 (b) the percentage of the occurrences of that stack signature in which this module shows up
 (c) the percentage of all crashes in which this module shows up
sorted probably by (c) divided by (b) (although that might bubble statistically insignificant stuff up to the top, in which case (c) minus (b) might work).
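
A similar sketch for the per-module page in (2), using the same assumed record layout. Two simplifications are assumptions of this example rather than part of the request: a module counts as "showing up" when it is in the crash's module list (not necessarily in a stack frame), and rows are ordered so that signatures where (b) most exceeds (c) come first, which is only one reading of the ordering suggested above.

```python
from collections import Counter


def signatures_for_module(crashes, module, os_name):
    """List (signature, b, c) rows for one (module, os) pair.

    `crashes` is a hypothetical iterable of dicts with the keys
    "signature", "os" and "modules" (a collection of module names).
    """
    total = 0                    # all crashes on this OS
    with_module = 0              # crashes on this OS that list the module
    sig_total = Counter()        # signature -> occurrences
    sig_with_module = Counter()  # signature -> occurrences listing the module

    for crash in crashes:
        if crash["os"] != os_name:
            continue
        total += 1
        sig_total[crash["signature"]] += 1
        if module in crash["modules"]:
            with_module += 1
            sig_with_module[crash["signature"]] += 1

    if total == 0:
        return []

    c = with_module / total  # item (c): the same value for every row
    rows = [
        (sig, n / sig_total[sig], c)  # item (b) for this signature
        for sig, n in sig_with_module.items()
    ]
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows
```

Because (c) is constant for a given module, this ordering is the same as sorting by (b) alone; a ratio-based sort would produce the same order within one module's page.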
Summary: want queries to help detect crashes that are caused by plugins / binary extensions → want queries/reports to help detect crashes that are caused by plugins / binary extensions
Whiteboard: cloud
per-crash-interesting-modules in
http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/
does item (1) above; it has a few different command line options for variants
Assignee: nobody → ozten.bugs
Depends on: 521917
Target Milestone: --- → 1.2
Depends on: 525785
No longer depends on: 521917
(In reply to comment #3)
per timeless, renaming wiki page. Please review 
http://code.google.com/p/socorro/wiki/ModuleCorrelationBySignatureDesign
I can't tell from that design doc what sort of calculations you're planning to do to compute the most important correlations, but the little that's there makes it sound substantially different from what's in comment 0 or in http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/ .  Are the differences intentional, or is the computation that you want to do just not fleshed out yet?
(In reply to comment #5)
This is accidental. What specifically sounds different? 

The goal of the design is to port your work to our existing system. The main differences are that crashes are analyzed in slices and the aggregates are stored in a database table, to be processed and displayed later by the frontend.

What aspects are different?
(In reply to comment #6)
Sorry for the redundant question...
So after reading it again it seems a little closer than I thought the first time.  The part that threw me was this paragraph:

> "For each signature this column contains a link with the highest correlated
> module." ... "NPSWF32.dll would be chosen because it had the highest
> correlation for this signature. 72% is the frequency in which this module
> appeared. "

I don't think showing the number 72% here is a particularly useful measure, unless you're doing analysis to show what percentage of the crashes are *explained* by the correlation.  I haven't yet done that, and it's probably a necessary step to sorting the correlations well (which would also be needed to pick out the most important one to show).
(In reply to comment #8)
Okay, this sounds good. The link text is a "guess" at the most interesting module. I could make the text "module correlation" or something static instead.

The design doc didn't go into this level of detail, but I was going to base that on the module/os combo with the biggest value for
% occurrence for this signature - % occurrence across crashes

Example: Assuming the oleacc.dll module had the following numbers:
Windows NT
  nsCycleCollectingAutoRefCnt::decr(nsISupports*) (118 crashes)
     78% (92/118) vs.  31% (2167/7100) oleacc.dll

Its value would be (92/118) - (2167/7100) =~ 0.47

oleacc.dll would beat shdocvw.dll
93% (110/118) vs.  80% (5667/7100) shdocvw.dll
which would have the value =~ 0.13

This piece would happen in the UI and the details can be changed as we get more feedback.
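
The arithmetic in the example above can be checked with a few lines of Python. The function name is made up for illustration; the counts are the ones quoted in the comment.

```python
def correlation_score(sig_with, sig_total, all_with, all_total):
    """% occurrence for this signature minus % occurrence across crashes."""
    return sig_with / sig_total - all_with / all_total


# Counts for nsCycleCollectingAutoRefCnt::decr(nsISupports*) on Windows NT
oleacc = correlation_score(92, 118, 2167, 7100)
shdocvw = correlation_score(110, 118, 5667, 7100)

print(f"oleacc.dll  {oleacc:.2f}")   # prints "oleacc.dll  0.47"
print(f"shdocvw.dll {shdocvw:.2f}")  # prints "shdocvw.dll 0.13"
```

shdocvw.dll appears in more of the crashes for this signature, but it also appears in most crashes overall, so oleacc.dll ends up with the higher score.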
One other comment is that I hope you're planning to do the addon correlation reports too; addon correlation has turned out to be significantly more useful than module correlation (mainly because it's much less noisy).
Assignee: ozten.bugs → griswolf
Target Milestone: 1.2 → 1.3
Assignee: griswolf → deinspanjer
Target Milestone: 1.3 → 1.4
Daniel - we discussed this and this should be the first item for the hdfs/hadoop cluster.  Since the database doesn't store module information we don't have a data set for this and crawling all crashes on our current setup is not feasible.
Target Milestone: 1.4 → 1.5
No longer depends on: 525785
Should this bug or bug 470827 have priority for development?

Blocked on getting data into Hadoop for dev
Depends on: 538206, 542855
Target Milestone: 1.5 → ---
Component: Socorro → General
Product: Webtools → Socorro
Depends on: 656297
Is there something in this that isn't covered by correlation reports/DLL directory?  Let me know or I'll wontfix.
(In reply to Laura Thomson :laura from comment #13)
> Is there something in this that isn't covered by correlation reports/DLL
> directory?

I think that (1) in comment #0 is basically covered by the correlation reports; (2) would be something we can do based on the correlation data once it's fully in Socorro and searchable in a DB.
Yeah, we don't have item (2) in comment 0, but I'm no longer convinced it's useful; I'd rather have something like what I blogged about in http://dbaron.org/log/20101111-crash-future
(In reply to David Baron [:dbaron] from comment #15)
> Yeah, we don't have item (2) in comment 0, but I'm no longer convinced it's
> useful; I'd rather have something like what I blogged about in
> http://dbaron.org/log/20101111-crash-future

Sounds like we should close this bug and file others instead. Can we try to figure out concrete steps on what to improve and get those filed? Maybe we should do a session on this during the upcoming stability work week and figure out those concrete steps there?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE