Closed
Bug 464775
Opened 16 years ago
Closed 12 years ago
want queries/reports to help detect crashes that are caused by plugins / binary extensions
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: dbaron, Assigned: dre)
References
Details
(Whiteboard: cloud)
It's quite common for top crashes to be caused by particular plugins or binary components in extensions. However, it requires quite a bit of manual labor to demonstrate that this is the case.
The following two features would make it much easier to detect this situation when looking at the crash data for a particular topcrash. (1) would let us see which modules (which are what the crash reporter calls shared libraries) are a likely cause of a particular crash, and (2) would let us see which other crashes those same modules were causing. (1) is more important than (2).
(1) For each topcrash, we should have a single page that lists all the modules found in the modules list for any occurrence of that crash, listed with:
a. the name of the module
b. the operating system (Windows, Mac, Linux, Solaris)
c. the percentage of crashes for *this* crash stack signature in which the module occurs (i.e., the number of crashes with this module divided by the total number of occurrences of this crash on the operating system in (b))
d. the percentage of all crashes in which this module occurs (same, but for all crashes, not just this stack signature)
(Both the name and the operating system should be considered to uniquely identify the module; it's pretty unlikely for the same module name to show up across systems, but if it does, they should be treated as separate modules.)
This list would probably be most useful if it is sorted by (c) - (d), so the modules where (c) is much larger than (d) show up first.
It could be even more useful if there could be a twisty next to the name of the module so that you can split it out by the "version" field in that module as well (on Windows, where that version exists, anyway), in case the problem is only with particular versions of the module (i.e., with (c) and (d) calculated only for that version of that module).
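A minimal sketch of the report described in (1), not taken from crash-data-tools or Socorro. It assumes each crash is a dict with hypothetical keys "signature", "os", and "modules"; the function name `interesting_modules` is also an assumption. Modules are keyed by (name, os), and rows are sorted by (c) - (d), largest first. Splitting by version could be done by adding the version to the key.

```python
from collections import Counter


def interesting_modules(crashes, signature):
    """Rank (module, os) pairs for one signature by (c) - (d).

    `crashes` is a hypothetical list of dicts with the keys
    "signature", "os", and "modules" (an iterable of module names).
    """
    total_by_os = Counter()  # all crashes, per OS
    sig_by_os = Counter()    # crashes with this signature, per OS
    mod_all = Counter()      # (module, os) -> crashes that list the module
    mod_sig = Counter()      # same, restricted to this signature

    for crash in crashes:
        os_name = crash["os"]
        is_sig = crash["signature"] == signature
        total_by_os[os_name] += 1
        if is_sig:
            sig_by_os[os_name] += 1
        # Count each module at most once per crash.
        for module in set(crash["modules"]):
            mod_all[(module, os_name)] += 1
            if is_sig:
                mod_sig[(module, os_name)] += 1

    rows = []
    for (module, os_name), in_sig in mod_sig.items():
        c = in_sig / sig_by_os[os_name]                         # item (c)
        d = mod_all[(module, os_name)] / total_by_os[os_name]   # item (d)
        rows.append((module, os_name, c, d))

    # Modules where (c) is much larger than (d) come first.
    rows.sort(key=lambda row: row[2] - row[3], reverse=True)
    return rows
```

Sorting by the difference rather than by (c) alone keeps modules that load in nearly every process from dominating the list.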
(2) For each module, and perhaps each version of each module, we should have a similar page (linked from the lists in (1)) that lists:
(a) each stack signature in which this module shows up in one of the stack traces
(b) the percentage of the occurrences of that stack signature in which this module shows up
(c) the percentage of all crashes in which this module shows up
sorted probably by (c) divided by (b) (although that might bubble statistically insignificant stuff up to the top, in which case (c) minus (b) might work).
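The per-module page in (2) could be sketched along the same lines. This is an illustration under stated assumptions, not an existing tool: crashes are dicts with hypothetical keys "signature", "os", and "modules", a module "shows up" when it is in the crash's module list, and the function name is made up. The text above leaves the sort direction open; this sketch sorts by (c) minus (b), smallest first, so the signatures in which the module is most over-represented lead the list.

```python
from collections import Counter


def signatures_for_module(crashes, module, os_name):
    """List (signature, b, c) rows for one module on one OS.

    b: share of that signature's crashes that list the module.
    c: share of all crashes on this OS that list the module.
    """
    on_os = [crash for crash in crashes if crash["os"] == os_name]
    if not on_os:
        return []

    sig_total = Counter()        # signature -> all crashes
    sig_with_module = Counter()  # signature -> crashes listing the module
    for crash in on_os:
        sig_total[crash["signature"]] += 1
        if module in crash["modules"]:
            sig_with_module[crash["signature"]] += 1

    c = sum(sig_with_module.values()) / len(on_os)
    rows = [(sig, n / sig_total[sig], c) for sig, n in sig_with_module.items()]

    # Assumed ordering: (c) - (b), ascending.
    rows.sort(key=lambda row: row[2] - row[1])
    return rows
```

Because (c) is the same for every row of one module, this ordering is equivalent to sorting by (b) descending; a ratio-based key would order the rows the same way but react more strongly to small samples.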
Reporter | ||
Updated•16 years ago
Summary: want queries to help detect crashes that are caused by plugins / binary extensions → want queries/reports to help detect crashes that are caused by plugins / binary extensions
Comment 1•16 years ago
Related: bug 423968, bug 439679
Updated•15 years ago
Whiteboard: cloud
Reporter | ||
Comment 2•15 years ago
per-crash-interesting-modules in http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/ does item (1) above; it has a few command-line options for variants.
Comment 3•15 years ago
Created a design document for review:
http://code.google.com/p/socorro/wiki/ModuelCorrelationBySignatureDesign
UI Data reference:
http://dbaron.org/log/20090922-crashes-data/interesting-modules-windows-versions
Comment 4•15 years ago
(In reply to comment #3)
Per timeless, I renamed the wiki page. Please review:
http://code.google.com/p/socorro/wiki/ModuleCorrelationBySignatureDesign
Reporter | ||
Comment 5•15 years ago
I can't tell from that design doc what sort of calculations you're planning to do to compute the most important correlations, but the little that's there makes it sound substantially different from what's in comment 0 or in http://hg.mozilla.org/users/dbaron_mozilla.com/crash-data-tools/ . Are the differences intentional, or is the computation that you want to do just not fleshed out yet?
Comment 6•15 years ago
(In reply to comment #5)
This is accidental. What specifically sounds different?
The goal of the design is to port your work to our existing system. The main differences are that crashes are analyzed in slices and the aggregates are stored in a database table, to be processed and displayed later by the frontend.
What aspects are different?
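The slice-and-aggregate approach mentioned in this comment can be sketched as follows. This is only an illustration, not the Socorro design: the crash dict keys ("signature", "os", "modules") and both function names are assumptions, and an in-memory Counter stands in for the database table of aggregates.

```python
from collections import Counter


def aggregate_slice(crashes):
    """Count (signature, os, module) occurrences within one slice of crashes."""
    counts = Counter()
    for crash in crashes:
        # Count each module at most once per crash.
        for module in set(crash["modules"]):
            counts[(crash["signature"], crash["os"], module)] += 1
    return counts


def merge_slices(slices):
    """Sum per-slice counts into one aggregate, as a table of totals would."""
    total = Counter()
    for counts in slices:
        total.update(counts)  # Counter.update adds counts together
    return total
```

Because the counts are additive, each slice can be processed independently and the percentages can be computed later from the merged totals.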
Comment 7•15 years ago
(In reply to comment #6)
Sorry for the redundant question...
Reporter | ||
Comment 8•15 years ago
So after reading it again it seems a little closer than I thought the first time. The part that threw me was this paragraph:
> "For each signature this column contains a link with the highest correlated
> module." ... "NPSWF32.dll would be chosen because it had the highest
> correlation for this signature. 72% is the frequency in which this module
> appeared. "
I don't think showing the number 72% here is a particularly useful measure, unless you're doing analysis to show what percentage of the crashes are *explained* by the correlation. I haven't yet done that, and it's probably a necessary step to sorting the correlations well (which would also be needed to pick out the most important one to show).
Comment 9•15 years ago
(In reply to comment #8)
Okay, this sounds good. The link text is a "guess" at the most interesting module. I could make the text "module correlation" or something static instead.
The design doc wasn't to this level of detail, but I was going to base that on the module/os combo with the biggest value for
% occurrence for this signature - % occurrence across crashes
Example: assuming the oleacc.dll module had the following data:
Windows NT
nsCycleCollectingAutoRefCnt::decr(nsISupports*) (118 crashes)
78% (92/118) vs. 31% (2167/7100) oleacc.dll
Its value would be (92/118) - (2167/7100) =~ 0.47
oleacc.dll would beat shdocvw.dll
93% (110/118) vs. 80% (5667/7100) shdocvw.dll
which would have the value =~ 0.13
This piece would happen in the UI and the details can be changed as we get more feedback.
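The arithmetic in the example above can be checked with a short script. The function name `correlation_score` is an assumption made for illustration; the counts are the ones quoted in this comment.

```python
def correlation_score(sig_with, sig_total, all_with, all_total):
    """% occurrence for this signature minus % occurrence across crashes."""
    return sig_with / sig_total - all_with / all_total


# Counts quoted above for nsCycleCollectingAutoRefCnt::decr(nsISupports*)
oleacc = correlation_score(92, 118, 2167, 7100)
shdocvw = correlation_score(110, 118, 5667, 7100)

print(round(oleacc, 2), round(shdocvw, 2))  # prints: 0.47 0.13
```

oleacc.dll ranks higher even though shdocvw.dll appears in more of the crashes for this signature, because shdocvw.dll is also present in most crashes overall.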
Reporter | ||
Comment 10•15 years ago
One other comment is that I hope you're planning to do the addon correlation reports too; addon correlation has turned out to be significantly more useful than module correlation (mainly because it's much less noisy).
Updated•15 years ago
Assignee: ozten.bugs → griswolf
Target Milestone: 1.2 → 1.3
Updated•15 years ago
Assignee: griswolf → deinspanjer
Target Milestone: 1.3 → 1.4
Comment 11•15 years ago
Daniel - we discussed this, and this should be the first item for the HDFS/Hadoop cluster. Since the database doesn't store module information, we don't have a data set for this, and crawling all crashes on our current setup is not feasible.
Target Milestone: 1.4 → 1.5
Assignee | ||
Comment 12•15 years ago
Should this bug or bug 470827 have priority for development?
Blocked on getting data into Hadoop for dev
Updated•15 years ago
Target Milestone: 1.5 → ---
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro
Comment 13•13 years ago
Is there something in this that isn't covered by correlation reports/DLL directory? Let me know or I'll wontfix.
Comment 14•13 years ago
(In reply to Laura Thomson :laura from comment #13)
> Is there something in this that isn't covered by correlation reports/DLL
> directory?
I think that (1) in comment #0 is basically covered by the correlation reports; (2) would be something we can do based on the correlation data once it's fully in Socorro and searchable in a DB.
Reporter | ||
Comment 15•13 years ago
Yeah, we don't have item (2) in comment 0, but I'm no longer convinced it's useful; I'd rather have something like what I blogged about in http://dbaron.org/log/20101111-crash-future
Comment 16•13 years ago
(In reply to David Baron [:dbaron] from comment #15)
> Yeah, we don't have item (2) in comment 0, but I'm no longer convinced it's
> useful; I'd rather have something like what I blogged about in
> http://dbaron.org/log/20101111-crash-future
Sounds like we should close this bug and file others instead. Can we try to figure out concrete steps on what to improve and get those filed? Maybe we should do a session on this during the upcoming stability work week and figure out those concrete steps there?
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE