Closed
Bug 470827
Opened 16 years ago
Closed 12 years ago
create new module and source file based crash reports
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: chofmann, Assigned: dre)
Details
(Whiteboard: cloud)
It would be great for module owners to get a breakdown of crash hotspots reported in their area.
the simplest version of this would be, for a given sample (last 10 days + specific product), to take the source directory and file name at the top of the stack, count the crashes found in each, and order the report by frequency of reports.
7034 crashes in js/src/
    705 @ dtoa.c:3014
5280 in js/src/nanojit/
    3034 @ LIR.cpp:615 (new modification)
    1545 @ LIR.cpp:434
    740 @ LIR.cpp:644
3457 crashes in layout/style/
    334 @ nsStyleSet.cpp:531
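A minimal sketch of that simplest version, assuming each crash has already been reduced to a (source path, line) pair for its top frame; the input format and the name hotspot_report are hypothetical. It counts crashes per directory and per file:line, and orders both levels by frequency:

```python
import posixpath
from collections import Counter, defaultdict


def hotspot_report(top_frames):
    """Build the two-level "N crashes in <dir>/ ... N @ file:line" report.

    top_frames: iterable of (source_path, line_number) pairs, one per crash,
    taken from the top frame of each crash's stack (hypothetical input format).
    """
    by_dir = Counter()              # crashes per source directory
    by_line = defaultdict(Counter)  # crashes per file:line within each directory
    for path, line in top_frames:
        directory, filename = posixpath.split(path)
        directory += "/"
        by_dir[directory] += 1
        by_line[directory][f"{filename}:{line}"] += 1

    report = []
    # most_common() with no argument returns every entry, highest count first.
    for directory, total in by_dir.most_common():
        report.append(f"{total} crashes in {directory}")
        for location, n in by_line[directory].most_common():
            report.append(f"    {n} @ {location}")
    return report


if __name__ == "__main__":
    sample = [("js/src/nanojit/LIR.cpp", 615)] * 3 + [("layout/style/nsStyleSet.cpp", 531)]
    print("\n".join(hotspot_report(sample)))
```

Each directory is counted on its own here, so js/src/ and js/src/nanojit/ come out as separate entries, as in the sample above.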
filename:line numbers should be linked to mxr, and an advanced feature might put "(new modification)" or a highlight color next to line numbers that have changed within the last 14 days.
After we start getting these first reports we might want some tagging or filtering features so we can mark source areas that we know aren't the source of the crash but are further down the execution path from the bad/crashy code. That can come later.
Reporter
Comment 1•16 years ago
the crash count should link to the crash reports
and the filename:line number should link to the mxr source
so for example the line
3034 @ LIR.cpp:615 (new modification)
3034 links to http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.1b2&query_search=signature&query_type=contains&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nanojit%3A%3ALirBufWriter%3A%3AinsLink%28nanojit%3A%3ALOpcode%2C%20nanojit%3A%3ALIns*%29
and LIR.cpp:615 links to
http://mxr.mozilla.org/mozilla1.9.1/source/js/src/nanojit/LIR.cpp#615
or maybe to something else that gives a clearer additional picture of blame/recent changelog activity
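A sketch of the two link builders, assuming the crash-stats query parameters from the example URL above and the mxr tree/path layout of the LIR.cpp example; the function names are hypothetical. Note that urlencode escapes `*` as `%2A`, so the generated signature parameter differs cosmetically from the hand-written link.

```python
from urllib.parse import quote, urlencode

CRASH_STATS_LIST = "http://crash-stats.mozilla.com/report/list"
MXR_ROOT = "http://mxr.mozilla.org"


def crash_list_url(product, version, signature, weeks=1):
    """Link a crash count to the list of matching crash reports."""
    # Parameters copied from the example crash-stats link in this comment.
    params = {
        "product": product,
        "version": f"{product}:{version}",
        "query_search": "signature",
        "query_type": "contains",
        "query": "",
        "date": "",
        "range_value": weeks,
        "range_unit": "weeks",
        "do_query": 1,
        "signature": signature,
    }
    # quote (not the default quote_plus) so spaces become %20, as in the example.
    return f"{CRASH_STATS_LIST}?{urlencode(params, quote_via=quote)}"


def mxr_url(tree, path, line):
    """Link a filename:line pair to the mxr source view."""
    return f"{MXR_ROOT}/{tree}/source/{path}#{line}"
```

For the "3034 @ LIR.cpp:615" row, the count would link to crash_list_url("Firefox", "3.1b2", <signature>) and the location to mxr_url("mozilla1.9.1", "js/src/nanojit/LIR.cpp", 615).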
Comment 2•16 years ago
(In reply to comment #1)
> and LIR.cpp:615 links to
>
> http://mxr.mozilla.org/mozilla1.9.1/source/js/src/nanojit/LIR.cpp#615
>
> or maybe something else that shows a clearer addtional picture of blame/recent
> change log activity
Should link to the hgweb view (like http://hg.mozilla.org/mozilla-central/file/8a601ed6bc4c/js/src/nanojit/LIR.cpp#l615), since the source filename has enough info to link to the exact version of the file in that build. Looks like source links might be broken right now? The frames tab is supposed to have working source links.
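One way to turn that source filename into an hgweb link, assuming the symbol files record source-indexed names of the form `hg:<host>/<repo>:<path>:<changeset>` (that format is an assumption here, not something stated in this bug); hgweb_url is a hypothetical name:

```python
def hgweb_url(source_file, line):
    """Turn a source-indexed file name into an hgweb link, or None.

    Assumes names like 'hg:hg.mozilla.org/mozilla-central:<path>:<changeset>';
    anything else (e.g. a plain build-machine path) yields None.
    """
    prefix = "hg:"
    if not source_file.startswith(prefix):
        return None
    parts = source_file[len(prefix):].split(":")
    if len(parts) != 3:
        return None
    repo, path, rev = parts
    # hgweb anchors individual lines as #l<number>.
    return f"http://{repo}/file/{rev}/{path}#l{line}"
```

Pinning the link to the changeset, rather than the tip of the tree, keeps the line number pointing at the code that actually shipped in that build.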
Comment 3•16 years ago
Filed bug 470841 on the broken source links, fwiw.
Comment 4•16 years ago
(In reply to comment #0)
> After we start getting these first reports we might want to have some tagging
> or filtering features so we can mark source areas that we know aren't the
> source of the crash, but are further down the execution path from the
> bad/crashy code, but that can come later.
Or so we can explicitly remove some crash reports from being counted because they're known to be happening due to extensions or plugins.
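A sketch of both kinds of filtering, assuming each crash's stack is available as (module, source path) pairs, top of stack first. The pattern lists are hypothetical and would have to be maintained by hand, and only the top frame's module is checked for the plugin/extension exclusion, which is a simplification:

```python
from fnmatch import fnmatchcase


def attribute_crash(frames, ignored_sources, excluded_modules):
    """Return the source file a crash should be counted against, or None.

    frames: list of (module_name, source_path or None), top of stack first.
    ignored_sources: glob patterns for source areas tagged "not the culprit".
    excluded_modules: glob patterns for plugin/extension binaries whose
        crashes should not be counted at all.
    """
    if not frames:
        return None
    top_module = frames[0][0]
    if any(fnmatchcase(top_module, pat) for pat in excluded_modules):
        return None  # known plugin/extension crash: leave it out of the counts
    for _module, source in frames:
        if source is None:
            continue  # frame without source info
        if any(fnmatchcase(source, pat) for pat in ignored_sources):
            continue  # tagged area: keep walking toward the callers
        return source
    return None
```

Skipping tagged frames moves the attribution toward the callers, which is where comment 0 expects the bad/crashy code to be.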
Reporter
Comment 5•15 years ago
we should do the counts and reporting for extensions and plugins and hand that data out to the responsible parties, e.g. plugin vendors and addon developers.
Reporter
Comment 6•15 years ago
from some discussion in security-group about how this might look in reports...
If you just search by crashing file you get an output that looks pretty scattered, like this for top instances of crashes.
BIGNUMBER js/src/jsinterp.cpp:4484
BIGNUMBER-1 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:411
BIGNUMBER-2 layout/generic/nsFrame.cpp:3991
BIGNUMBER-3 js/src/xpconnect/src/nsXPConnect.cpp:62
I don't think that lends itself to understanding crash hotspots in particular areas of the code (roughly defined as modules).
with a bit of refinement on the sorting and some extra passes to build the report you get a better view of that.
BIGNUMBER js/src/jsinterp.cpp:4484
[other crashes in js/src/jsinterp.cpp ranked in order of frequency]
[other crashes in js/src/[otherfiles] ranked in order of frequency]
BIGNUMBER-3 js/src/xpconnect/src/nsXPConnect.cpp:62
[other crashes in js/src/xpconnect, ranked as above]
BIGNUMBER-1 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:411
[other libpr0n crashes]
BIGNUMBER-2 layout/generic/nsFrame.cpp:3991
[other layout crashes]
This latter is what the old talkback system did. Is that more useful than the former, a long list sorted by instances in any one file and/or file and line number?
jdaggett is also playing with some ways GFX crashes might be identified, which could involve some trickier manipulation of the data. See bug 513642.
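A sketch of the talkback-style collation, assuming per-location counts keyed by `path:line`. Grouping on the first `depth` path components is a hypothetical stand-in for real module ownership boundaries, which this bug doesn't define; with `depth=1` it reproduces the ordering of the example above (js, then modules, then layout, with xpconnect inside the js group):

```python
from collections import defaultdict


def module_of(location, depth):
    """First `depth` path components of a 'path:line' location."""
    return "/".join(location.split("/")[:depth])


def collated_report(counts, depth=1):
    """counts: mapping of 'path:line' -> crash count.

    Groups locations by module prefix, orders groups by their single largest
    location, and orders locations within each group by count (ties by name).
    """
    groups = defaultdict(list)
    for location, n in counts.items():
        groups[module_of(location, depth)].append((n, location))
    for entries in groups.values():
        entries.sort(key=lambda e: (-e[0], e[1]))
    # Each group's first entry is now its largest, so rank groups by it.
    return sorted(groups.items(), key=lambda kv: (-kv[1][0][0], kv[0]))
```

Ranking groups by their top entry, rather than by group total, keeps the single biggest crash at the head of the report while still collating everything else from that area beneath it.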
Comment 7•15 years ago
Who is the audience for your report? As a module owner/hacker in specific areas, it would be nice to get a list of "topcrashes in XPCOM code" or "topcrashes in toolkit/xre"... that's mainly a filter mechanism, so that I can ignore layout/content/xpconnect/JS crashes which I'm less experienced at.
But it sounds like you have something else in mind, if you want a generic picture of "crash hotspots in the code". What I don't understand is how knowing crash hotspots in general is going to particularly help.
Reporter
Comment 8•15 years ago
> Who is the audience for your report? As a module owner/hacker in specific
> areas, it would be nice to get a list of "topcrashes in XPCOM code" or
> "topcrashes in toolkit/xre"... that's mainly a filter mechanism, so that I can
> ignore layout/content/xpconnect/JS crashes which I'm less experienced at.
right, the suggestion in comment 6 applies the filter and collation in the report itself, so 80 module owners don't have to scan through many hundreds of lines of a report ordered strictly by occurrence of the top source file in the stack, either visually or with a second filtering pass over the report.
It also shows how "modules" compare to each other to provide some context.
A report like this helps answer the question: where is the crashiest area of the code, and what specific source files/lines of code are involved in that crashiness?
The combination of these things helps bring focus, attention, *and action* to look at most frequent crashing areas.
Comment 9•15 years ago
I don't think that's really helpful though. We already have topcrash lists for "places our users are crashing the most". Knowing that, say, Spidermonkey is where most of our crashes occur doesn't really help anyone, I don't think. What bsmedberg is asking for, and what John Daggett filed a bug about, is getting a report for "crashes in a specific module" based on crash location. This way module owners/peers could get a report of top crashes happening in code they work on, so they could see crashes that are important for them to fix, but that might not be quite popular enough to make the topcrash list.
Reporter
Comment 10•15 years ago
I agree on the js comment.
we don't always have exact correlation between the signature, the source file at the top of the stack.
but,
we have gaps in module ownership.
we have module owners that know code in more than the module they are active in.
All these things suggest we should have a report, with at least a few eyes on it, that contains this overall view with crashes collated by module area, to stimulate discussion and cross-checking between module owners.
our current top crash lists look more like
BIGNUMBER js/src/jsinterp.cpp:4484
BIGNUMBER-1 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:411
BIGNUMBER-2 layout/generic/nsFrame.cpp:3991
BIGNUMBER-3 js/src/xpconnect/src/nsXPConnect.cpp:62
and it's hard for module owners to use those to get at answers like "just tell me about gfx crashes", or "just tell me about plugin crashes", or "flash crashes"....
Reporter
Comment 11•15 years ago
> we don't aways have exact correlation between the signature,
> the source file at the top of the stack.
we don't always have exact correlation between the signature,
the source file at the top of the stack and the actual reason for the crash
Updated•15 years ago
Whiteboard: cloud
Reporter
Comment 12•15 years ago
here is a rough cut at the number of crashes for all signatures, sorted by signature name. if we replaced the signature name with the top source file name we would have a first cut at some of the suggestions in this bug
http://people.mozilla.com/~chofmann/crash-data/full-signature-list.txt
Reporter
Comment 13•15 years ago
http://people.mozilla.com/~chofmann/crash-data/gecko-signatures-20091025-3.5.3.txt isolates to just 3.5.3 crashes.
Reporter
Comment 14•15 years ago
Reporter
Comment 15•15 years ago
sdiff gecko-signatures-20091025-3.0.14.txt gecko-signatures-20091025-3.5.3.txt > gecko-sdiff-3.0.14-3.5.3.txt
gives a rough idea of a side-by-side comparison between two releases, as in
http://people.mozilla.com/~chofmann/crash-data/gecko-sdiff-3.0.14-3.5.3.txt
sdiff doesn't always get alignment quite right, but...
on the right side of that report you can find places where new signatures exist where they did not appear in 3.0.14.
if alignment was perfect you could see stuff like
260 GraphWalker::DoWalk(nsDeque&) | 847 GraphWalker::DoWalk(nsDeque&)
indicating that signature has tripled in crash volume 3.0.14 to 3.5.3
11 HashString(nsAString_internal const& | 66 HashString(nsAString_internal const&)
has 6 times the volume
11 XPCCallContext::XPCCallContext(XPCContext::LangType, JSC | 196 XPCCallContext::XPCCallContext(XPCContext::LangType, JSC
has 18 times the volume
1 nsHTMLDocument::Release() | 246 nsHTMLDocument::Release()
has 246 times the volume
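Joining the two releases on the signature itself avoids sdiff's alignment problems. A minimal sketch, assuming each release's list has already been loaded into a signature-to-count mapping (the loader is not shown, and compare_releases is a hypothetical name):

```python
def compare_releases(old, new):
    """old, new: mapping of signature -> crash count for two releases.

    Returns (signature, old_count, new_count, ratio) rows. ratio is None for
    signatures that do not appear in the older release at all.
    """
    rows = []
    for sig, new_count in new.items():
        old_count = old.get(sig, 0)
        ratio = new_count / old_count if old_count else None
        rows.append((sig, old_count, new_count, ratio))

    def sort_key(row):
        sig, _old_count, new_count, ratio = row
        if ratio is None:
            return (0, -new_count, sig)  # brand-new signatures first, by volume
        return (1, -ratio, sig)          # then existing ones by growth factor

    rows.sort(key=sort_key)
    return rows
```

Signatures that disappeared in the newer release are not listed; iterating over the keys of `old` as well would cover that case.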
Reporter
Comment 16•15 years ago
> on the right side of that report you can find places where new signatures exist
> where they did not appear in 3.0.14.
here is an example
> 1 imgRequest::Init(nsIURI*, nsIURI*, nsIRequest*, nsIChann
> 101 imgRequest::NotifyProxyListener(imgRequestProxy*)
> 1 imgRequest::OnDataAvailable(imgIRequest*, gfxIImageFrame
> 2 imgRequest::OnDataAvailable(nsIRequest*, nsISupports*, n
> 1 imgRequest::OnStartContainer(imgIRequest*, imgIContainer
these all appear to be new signatures in 3.5.3 v. nothing in 3.0.14
Reporter
Comment 17•15 years ago
here is a sample of the simple form of the report using data from 2009-10-25
http://people.mozilla.com/~chofmann/crash-data/top-of-stack-sourcefile-353.csv
I have a few more easy tweaks that I think can improve it, but this is close to what is needed to find hotspot areas in each module that can be investigated and turned into bugs with a bit of research
Reporter
Comment 18•15 years ago
I let the same script run overnight and produced about 10,000 sourcefile names and associated stack signatures
http://people.mozilla.com/~chofmann/crash-data/top-of-stack-sourcefile-biglist-353.csv
Reporter
Comment 19•15 years ago
from a quick look at the report above, I filed these bugs:
[Bug 524971] Crash Report [@ nsAppShell::ProcessNextNativeEvent(int) ]
[Bug 524961] Firefox 3.5.3 Crash Report [@ nsNPAPIPluginInstance::Stop() ]
[Bug 524958] 3.5.x crash [@ nsNPAPIPluginInstance::HandleEvent(nsPluginEvent*, int*) ]
Comment 20•15 years ago
Some of David's work does this. Dependent on bug 521917.
Blocks: 521917
Target Milestone: --- → 1.4
Comment 21•15 years ago
Sorry, blocks 521917.
Comment 22•15 years ago
Sorry, wrong bug. Meant bug 525785, which integrates David's correlation reports into the web UI.
Reporter
Comment 23•15 years ago
correlating addons, plugins, and specific .dlls to top crashes is a different kind of analysis than being able to hand out lists of signatures to each of the module owners, so they can investigate possibly frequent crashes in code in their area that live outside of the top 100 reports.
bug 525785 looks at the module/process list to draw correlations to the signature. dbaron's tool works pretty well for current needs.
this bug looks at the stack trace to pull out the source lines at the top of the stack and generate top crash lists by module area. we don't have anything like this right now other than the attempt in comment 18.
how are you thinking that bug 525785 is connected to the things presented here?
...it would be good to have more detail.
the quick and dirty report in comment 18 is something we could start acting on right away, but the screen scraping method to get that report is slow and cumbersome.
Reporter
Comment 24•15 years ago
new update of this for 3.6b1 analysis at
http://people.mozilla.com/~chofmann/crash-data/sourcelines-36b1-20091103.html
Reporter
Comment 25•15 years ago
and the same list sorted by module area, with crash counts to show hotspots in each module.
http://people.mozilla.com/~chofmann/crash-data/sourcelines-bymodule-36b1-20091103.html
Updated•15 years ago
Assignee: nobody → deinspanjer
Target Milestone: 1.4 → 1.5
Assignee
Comment 26•15 years ago
Should this bug or bug 470827 have priority for development?
Blocked on getting data into Hadoop for dev
URL: 538206 542855
Comment 27•15 years ago
I think you meant bug 464775. Chris or someone from CrashKill would be able to help determine priority.
Updated•15 years ago
Target Milestone: 1.5 → ---
Reporter
Comment 28•15 years ago
there are really two different audiences for this bug and bug 464775.
this bug is going to help engineers and module owners who work on mozilla code get better insight into crashes affecting the areas they work in. to make headway on reducing long-tail crashes we need to distribute the crash-fixing load across all module owners.
bug 464775 is going to help get third-party plugin and binary extension developers more engaged in understanding problems and in fixing them. I'm starting to work more on seeing whether we can do more outreach to the people who are building on the platform.
I guess I'd say let's do this one first if we have to set priorities, but we really want to get both kinds of people involved.
Reporter
Comment 29•15 years ago
I've pulled some data from the .csv files to show what one of these reports might look like
http://people.mozilla.com/~chofmann/crash-stats/20100607/daily-source-364.txt
we should probably remove the hang reports from these and push them out into different reports for analysis.
the section at the top of the report, which ranks each module area like
1381 js/src
1142 dom/plugins
732 widget/src
550 nsprpub/pr
461 layout/generic
402 ipc/chromium
395 obj-firefox/memory
381 modules/plugin
306 xpcom/base
287 ipc/glue
233 security/nss
229 dom/base
205 db/sqlite3
should be linkified to point at the sections of the report below that show combinations of top source lines and the signatures they are associated with.
The csv files don't have the computed line numbers, but that data should be included as well. The source line, signature, and bug list should all link to the corresponding references in mxr, crash reports, and bugzilla. The search link for signatures would be tricky and different from the capability we have now: the search would be on signature *and* top source line in the stack.
66 js/src/jsobj.cpp js_TraceObject buglist=483482,503772
61 js/src/jsgc.cpp JS_CallTracer buglist=481302,487271,495177,514734,523423,537011,540187,543386,544413,544446,544808,544911,545692,546764,568405
46 js/src/jsgc.cpp JS_TraceChildren buglist=474080,503767,545333,555563,556829
38 js/src/xpconnect/src/xpcwrappednative.cpp RtlpWaitForCriticalSection | RtlEnterCriticalSection buglist=511757,520639,536455
28 js/src/jsgc.cpp js_GC buglist=426162,431060,445204
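A sketch of the linkified module index, assuming report lines in exactly the "count path signature buglist=..." shape shown above and module areas taken as the first two path components (both assumptions; the helper names are hypothetical). Bug lists link to Bugzilla's buglist.cgi with a comma-separated bug_id parameter:

```python
import html
from collections import Counter

BUGLIST_URL = "https://bugzilla.mozilla.org/buglist.cgi?bug_id="


def parse_line(line):
    """Split '66 js/src/jsobj.cpp js_TraceObject buglist=483482,503772'."""
    tokens = line.split()
    count, path, rest = int(tokens[0]), tokens[1], tokens[2:]
    bugs = []
    if rest and rest[-1].startswith("buglist="):
        bugs = [b for b in rest.pop()[len("buglist="):].split(",") if b]
    # Signatures may contain spaces, e.g. "A | B", so rejoin the middle tokens.
    return count, path, " ".join(rest), bugs


def module_area(path, depth=2):
    return "/".join(path.split("/")[:depth])


def module_index_html(lines):
    """Module totals, highest first, each linking to that module's section."""
    totals = Counter()
    for line in lines:
        count, path, _signature, _bugs = parse_line(line)
        totals[module_area(path)] += count
    rows = []
    for area, total in totals.most_common():
        anchor = html.escape(area, quote=True)
        rows.append(f'{total} <a href="#{anchor}">{anchor}</a>')
    return "<br>\n".join(rows)


def buglist_link(bugs):
    if not bugs:
        return ""
    ids = ",".join(bugs)
    return f'<a href="{BUGLIST_URL}{ids}">buglist={ids}</a>'
```

Each per-module section below the index would then carry a matching id attribute (e.g. id="js/src"), so the index entries jump straight to it.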
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro
Comment 30•12 years ago
I don't think this is valuable enough to work on.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX