Closed Bug 470827 Opened 16 years ago Closed 12 years ago

create new module and source file based crash reports

Categories

(Socorro :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: chofmann, Assigned: dre)

References

()

Details

(Whiteboard: cloud)

It would be great for module owners to get a breakdown of crash hot spots reported in their area. The simplest version of this would be, for a given sample (last 10 days + specific product), to take the source directory and file name at the top of the stack, then sort and count the instances of crashes found in each, then order the report based on frequency of reports.
    7034 crashes in js/src/
        705 @ dtoa.c:3014
    5280 in js/src/nanojit/
        3034 @ LIR.cpp:615 (new modification)
        1545 @ LIR.cpp:434
        740 @ LIR.cpp:644
    3457 crashes in layout/style/
        334 @ nsStyleSet.cpp:531
filename:line numbers should be linked to mxr, and an advanced feature might put (new modification), or a highlight color, next to line numbers that have changed within the last 14 days. After we start getting these first reports we might want to have some tagging or filtering features so we can mark source areas that we know aren't the source of the crash, but are further down the execution path from the bad/crashy code, but that can come later.
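A minimal sketch of the counting step described above, assuming the top-of-stack frame for each crash is already available as a "path/to/file:line" string. The function name and the input format are hypothetical, not part of Socorro.

```python
from collections import Counter, defaultdict
import posixpath


def hotspot_report(top_frames):
    """Count crashes per source directory and per file:line.

    top_frames: iterable of 'path/to/file.cpp:LINE' strings, one per crash
    (hypothetical input format). Returns a list of
    (directory, total, [(file:line, count), ...]) ordered by total.
    """
    per_dir = defaultdict(Counter)
    for frame in top_frames:
        path, _, line = frame.rpartition(":")
        if not path:
            continue  # no 'file:line' information in this frame
        directory, filename = posixpath.split(path)
        per_dir[directory + "/"][f"{filename}:{line}"] += 1

    report = []
    # Most crashy directory first; ties broken by directory name.
    for directory, counts in sorted(
        per_dir.items(), key=lambda kv: (-sum(kv[1].values()), kv[0])
    ):
        report.append((directory, sum(counts.values()), counts.most_common()))
    return report
```

Each returned tuple maps onto one block of the report: the directory line with its total, followed by the file:line entries in descending order of crash count.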
The crash count should link to the crash reports, and the file name and line number should link to the mxr source. So, for example, in the line
    3034 @ LIR.cpp:615 (new modification)
3034 links to
    http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.1b2&query_search=signature&query_type=contains&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nanojit%3A%3ALirBufWriter%3A%3AinsLink%28nanojit%3A%3ALOpcode%2C%20nanojit%3A%3ALIns*%29
and LIR.cpp:615 links to
    http://mxr.mozilla.org/mozilla1.9.1/source/js/src/nanojit/LIR.cpp#615
or maybe something else that shows a clearer additional picture of blame/recent change log activity.
(In reply to comment #1)
> and LIR.cpp:615 links to
>
> http://mxr.mozilla.org/mozilla1.9.1/source/js/src/nanojit/LIR.cpp#615
>
> or maybe something else that shows a clearer additional picture of blame/recent
> change log activity
Should link to the hgweb view (like http://hg.mozilla.org/mozilla-central/file/8a601ed6bc4c/js/src/nanojit/LIR.cpp#l615); the source filename has enough info to link to the exact version of the file in that build. Looks like source links might be broken right now? The frames tab is supposed to have working source links.
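For illustration, the two kinds of source link discussed so far can be built from a path and line number like this. It is only a sketch: both helpers are hypothetical, and the URL patterns are assumed to generalize from the two example links quoted in the comments.

```python
def mxr_link(tree, path, line):
    """MXR-style link, following the example URL in the earlier comment."""
    return f"http://mxr.mozilla.org/{tree}/source/{path}#{line}"


def hgweb_link(repo, revision, path, line):
    """hgweb link pinned to the exact revision the build was made from."""
    return f"http://hg.mozilla.org/{repo}/file/{revision}/{path}#l{line}"
```

The hgweb form needs the revision in addition to the path, which is why it can point at the exact version of the file in a given build rather than at the tip of the tree.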
Filed bug 470841 on the broken source links, fwiw.
(In reply to comment #0)
> After we start getting these first reports we might want to have some tagging
> or filtering features so we can mark source areas that we know aren't the
> source of the crash, but are further down the execution path from the
> bad/crashy code, but that can come later.
Or so we can explicitly remove some crash reports from being counted because they're known to be happening due to extensions or plugins.
we should do the counts and reporting for extensions and plugins and hand out that data to the responsible parties. e.g. plugin vendors and addon developers.
From some discussion in security-group about how this might look in reports... If you just search by crashing file you get output that looks pretty scattered, like this for top instances of crashes:
    BIGNUMBER js/src/jsinterp.cpp:4484
    BIGNUMBER-1 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:411
    BIGNUMBER-2 layout/generic/nsFrame.cpp:3991
    BIGNUMBER-3 js/src/xpconnect/src/nsXPConnect.cpp:62
I don't think that lends itself to an understanding of crash hotspots in particular areas of the code (roughly defined as modules). With a bit of refinement on the sorting and some extra passes to build the report you get a better view of that:
    BIGNUMBER js/src/jsinterp.cpp:4484
    [other crashes in js/src/jsinterp.cpp ranked in order of frequency]
    [other crashes in /js/src/[otherfiles] ranked in order of frequency]
    BIGNUMBER-3 js/src/xpconnect/src/nsXPConnect.cpp:62
    [other crashes in js/src/xpconnect as above]
    BIGNUMBER-1 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:411
    [other libpr0n crashes]
    BIGNUMBER-2 layout/generic/nsFrame.cpp:3991
    [other layout crashes]
The latter is what the old talkback system did. Is that more useful than the former long list sorted by instances in any one file and/or file and line number? jdaggett is also playing with some ways GFX crashes might be identified, which could involve some trickier manipulation of the data. See bug 513642.
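A rough sketch of the second ordering, assuming per-location crash counts are already computed. Treating the first path component as the "module" is an assumption made only for this example; a real report would need a proper directory-to-module mapping.

```python
from collections import defaultdict


def group_by_module(counts, module_of=lambda path: path.split("/", 1)[0]):
    """Collate a flat {location: crash_count} mapping by module.

    Modules are ordered by their single hottest location, and locations
    within a module by descending count. `module_of` is a hypothetical
    mapping from a source path to a module name.
    """
    groups = defaultdict(list)
    for location, n in counts.items():
        groups[module_of(location)].append((n, location))
    for entries in groups.values():
        entries.sort(key=lambda e: (-e[0], e[1]))
    # Order modules by the count of their first (hottest) entry.
    return sorted(groups.items(), key=lambda kv: (-kv[1][0][0], kv[0]))
```

With the four example lines above, the xpconnect entry moves up under the other js crashes even though its own count is the lowest of the four, which is the collation the second listing shows.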
Who is the audience for your report? As a module owner/hacker in specific areas, it would be nice to get a list of "topcrashes in XPCOM code" or "topcrashes in toolkit/xre"... that's mainly a filter mechanism, so that I can ignore layout/content/xpconnect/JS crashes which I'm less experienced at. But it sounds like you have something else in mind, if you want a generic picture of "crash hotspots in the code". What I don't understand is how knowing crash hotspots in general is going to particularly help.
> Who is the audience for your report? As a module owner/hacker in specific
> areas, it would be nice to get a list of "topcrashes in XPCOM code" or
> "topcrashes in toolkit/xre"... that's mainly a filter mechanism, so that I can
> ignore layout/content/xpconnect/JS crashes which I'm less experienced at.
Right, the suggestion in comment 6 applies the filter and collation in the report so 80 module owners don't have to scan through many hundreds of lines of a report ordered strictly by occurrence of the top source file in the stack, either by doing it visually or with some second pass at filtering on the report. It also shows how "modules" compare to each other to provide some context. A report like this helps to answer the question: where is the crashiest area of the code, and what specific source files/lines of code are involved in that crashiness? The combination of these things helps bring focus, attention, *and action* to look at the most frequently crashing areas.
I don't think that's really helpful though. We already have topcrash lists for "places our users are crashing the most". Knowing that, say, Spidermonkey is where most of our crashes occur doesn't really help anyone, I don't think. What bsmedberg is asking for, and what John Daggett filed a bug about, is getting a report for "crashes in a specific module" based on crash location. This way module owners/peers could get a report of top crashes happening in code they work on, so they could see crashes that are important for them to fix, but that might not be quite popular enough to make the topcrash list.
I agree on the js comment. We don't always have exact correlation between the signature, the source file at the top of the stack. But we have gaps in module ownership, and we have module owners that know code in more than the module they are active in. All these things suggest we should have a report that at least has a few eyes on it and that contains this overall view with a collation of crashes around module areas, to stimulate discussion and checking between module owners. Our current top crash lists are more like
    BIGNUMBER js/src/jsinterp.cpp:4484
    BIGNUMBER-1 modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp:411
    BIGNUMBER-2 layout/generic/nsFrame.cpp:3991
    BIGNUMBER-3 js/src/xpconnect/src/nsXPConnect.cpp:62
and it's hard for module owners to use those to get at answers like "just tell me about gfx crashes", or "just tell me about plugin crashes", or flash crashes....
> we don't always have exact correlation between the signature,
> the source file at the top of the stack.
We don't always have exact correlation between the signature, the source file at the top of the stack, and the actual reason for the crash.
Whiteboard: cloud
Blocks: 524507
here is a rough cut at number of crashes on all signatures sorted by signature name. if we replaced the signature name with top sourcefile name we would have the first cut at some of the suggestions in this bug http://people.mozilla.com/~chofmann/crash-data/full-signature-list.txt
    sdiff gecko-signatures-20091025-3.0.14.txt gecko-signatures-20091025-3.5.3.txt > gecko-sdiff-3.0.14-3.5.3.txt
gives a rough idea of a side-by-side comparison between two releases, as in http://people.mozilla.com/~chofmann/crash-data/gecko-sdiff-3.0.14-3.5.3.txt
sdiff doesn't always get alignment quite right, but... on the right side of that report you can find places where new signatures exist where they did not appear in 3.0.14. If alignment was perfect you could see stuff like
    260 GraphWalker::DoWalk(nsDeque&) | 847 GraphWalker::DoWalk(nsDeque&)
indicating that signature has tripled in crash volume from 3.0.14 to 3.5.3
    11 HashString(nsAString_internal const& | 66 HashString(nsAString_internal const&)
has 6 times the volume
    11 XPCCallContext::XPCCallContext(XPCContext::LangType, JSC | 196 XPCCallContext::XPCCallContext(XPCContext::LangType, JSC
has 18 times the volume
    1 nsHTMLDocument::Release() | 246 nsHTMLDocument::Release()
has 246 times the volume
> on the right side of that report you can find places where new signatures exist where they did not appear in 3.0.14.
here is an example
    > 1 imgRequest::Init(nsIURI*, nsIURI*, nsIRequest*, nsIChann
    > 101 imgRequest::NotifyProxyListener(imgRequestProxy*)
    > 1 imgRequest::OnDataAvailable(imgIRequest*, gfxIImageFrame
    > 2 imgRequest::OnDataAvailable(nsIRequest*, nsISupports*, n
    > 1 imgRequest::OnStartContainer(imgIRequest*, imgIContainer
these all appear to be new signatures in 3.5.3 v. nothing in 3.0.14
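Since sdiff alignment is unreliable, the same comparison could be done by joining the two lists on signature instead. A sketch, assuming each input line has the form "COUNT SIGNATURE" as in the excerpts above; the function names are hypothetical.

```python
def parse_counts(lines):
    """Parse 'COUNT SIGNATURE' lines into a {signature: count} dict."""
    counts = {}
    for line in lines:
        count, _, signature = line.strip().partition(" ")
        if count.isdigit() and signature:
            counts[signature] = int(count)
    return counts


def compare_releases(old, new):
    """Return (signature, old_count, new_count, ratio) rows.

    ratio is None for signatures absent from the old release. Rows are
    ordered with new signatures first, then by descending growth ratio.
    """
    rows = []
    for signature, new_count in new.items():
        old_count = old.get(signature, 0)
        ratio = new_count / old_count if old_count else None
        rows.append((signature, old_count, new_count, ratio))
    rows.sort(key=lambda r: (r[3] is not None, -(r[3] or 0), r[0]))
    return rows
```

Joining on the signature string only works when both files truncate signatures the same way, which the clipped entries in the sdiff output suggest is not guaranteed. Raw ratios also ignore differences in total crash volume between the two releases.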
here is a sample of the simple form of the report using data from 2009 10 25 http://people.mozilla.com/~chofmann/crash-data/top-of-stack-sourcefile-353.csv I have a few more easy tweaks that I think can improve it, but this is close to what ss is needing to find hotspot areas in each module that can be investigated and turned into bugs with a bit of research
I let the same script run overnight and produced about 10,000 sourcefile names and associated stack signatures http://people.mozilla.com/~chofmann/crash-data/top-of-stack-sourcefile-biglist-353.csv
out of the quick look at the report above filed bugs
    [Bug 524971] Crash Report [@ nsAppShell::ProcessNextNativeEvent(int) ]
    [Bug 524961] Firefox 3.5.3 Crash Report [@ nsNPAPIPluginInstance::Stop() ]
    [Bug 524958] 3.5.x crash [@ nsNPAPIPluginInstance::HandleEvent(nsPluginEvent*, int*) ]
Some of David's work does this. Dependent on bug 521917.
Blocks: 521917
Target Milestone: --- → 1.4
Sorry, blocks 521917.
Sorry, wrong bug. Meant bug 525785, which integrates David's correlation reports into the web UI.
Blocks: 525785
No longer blocks: 521917
Correlating addons, plugins, and specific .dll's to top crashes is a different kind of analysis than being able to hand out lists of signatures for each of the module owners to investigate possible frequent crashes in code in their area that live outside of the top 100 reports. Bug 525785 looks at the module/process list to draw correlations to the signature. dbaron's tool works pretty well for current needs. This bug looks at the stacktrace to pull out source lines at the top of the stack to generate top crash lists based on module area. We don't have anything like this right now other than the attempt in comment 18. How are you thinking that bug 525785 is connected to the things presented here? ...it would be good to have more detail. The quick and dirty report in comment 18 is something we could start acting on right away, but the screen scraping method to get that report is slow and cumbersome.
and same list sort by module area with crash counts to show hotspots in each module. http://people.mozilla.com/~chofmann/crash-data/sourcelines-bymodule-36b1-20091103.html
Assignee: nobody → deinspanjer
Target Milestone: 1.4 → 1.5
Should this bug or bug 470827 have priority for development? Blocked on getting data into Hadoop for dev
I think you meant bug 464775. Chris or someone from CrashKill would be able to help determine priority.
Target Milestone: 1.5 → ---
There are really two different audiences for this bug and bug 464775. This bug is going to help engineers and module owners that work on mozilla code to get better insight into crashes affecting the areas they work in. To make headway on reducing the long tail of crashes we need to distribute the crash fixing load across all module owners. Bug 464775 is going to help get 3rd-party plugin and binary extension people more engaged in understanding problems and engaged in fixes. I'm starting to work more on seeing if we can do more outreach to the people that are building on the platform. I guess I'd say let's do this one first if we have to set some priorities, but we really want to get both kinds of people involved.
I've pulled some data from the .csv files to show what one of these reports might look like: http://people.mozilla.com/~chofmann/crash-stats/20100607/daily-source-364.txt
We should probably remove the hang reports from these and push them out into different reports for analysis. The section at the top of the report that ranks each module area, like
    1381 js/src
    1142 dom/plugins
    732 widget/src
    550 nsprpub/pr
    461 layout/generic
    402 ipc/chromium
    395 obj-firefox/memory
    381 modules/plugin
    306 xpcom/base
    287 ipc/glue
    233 security/nss
    229 dom/base
    205 db/sqlite3
should be linkified to point at the sections of the report below that show combinations of top source lines and the signatures they are associated with. The csv files don't have the computed line numbers but that data should be included as well. The source line, signature, and bug list should all link to the other references in mxr, crash reports, and bugzilla. The search link for signatures would be tricky and different from the capability we have now. The search would be on signature *and* top source line in the stack.
    66 js/src/jsobj.cpp js_TraceObject buglist=483482,503772
    61 js/src/jsgc.cpp JS_CallTracer buglist=481302,487271,495177,514734,523423,537011,540187,543386,544413,544446,544808,544911,545692,546764,568405
    46 js/src/jsgc.cpp JS_TraceChildren buglist=474080,503767,545333,555563,556829
    38 js/src/xpconnect/src/xpcwrappednative.cpp RtlpWaitForCriticalSection | RtlEnterCriticalSection buglist=511757,520639,536455
    28 js/src/jsgc.cpp js_GC buglist=426162,431060,445204
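To make the structure of those detail lines concrete, here is a sketch of parsing them and rolling the counts up into the module ranking. The "COUNT SOURCE SIGNATURE buglist=..." layout is inferred from the sample lines only, and using the first two directory components as the module area is an assumption chosen to match entries like js/src and dom/plugins.

```python
import re
from collections import Counter

# COUNT, source file, signature (may contain spaces and '|'), bug ids.
LINE_RE = re.compile(r"^(\d+)\s+(\S+)\s+(.*?)\s+buglist=([\d,]*)$")


def parse_line(line):
    """Parse one detail line; returns None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    count, source, signature, bugs = m.groups()
    return {
        "count": int(count),
        "source": source,
        "signature": signature,
        "bugs": [int(b) for b in bugs.split(",") if b],
    }


def module_totals(rows, depth=2):
    """Sum crash counts per module area (first `depth` directories)."""
    totals = Counter()
    for row in rows:
        directories = row["source"].split("/")[:-1]
        totals["/".join(directories[:depth])] += row["count"]
    return totals.most_common()
```

The parsed bug ids and source paths are what the mxr and bugzilla links would be generated from. The combined signature-plus-source-line search would still need support on the crash-stats side.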
Component: Socorro → General
Product: Webtools → Socorro
Depends on: 656297
I don't think this is valuable enough to work on.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX