Closed Bug 776497 Opened 13 years ago Closed 12 years ago

crash in nsGlobalWindow::SetNewDocument

Categories

(Core :: DOM: Core & HTML, defect)

15 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla20
Tracking Status
firefox14 --- unaffected
firefox15 + wontfix
firefox16 - affected
firefox17 - affected

People

(Reporter: scoobidiver, Unassigned)

Details

(4 keywords)

Crash Data

Attachments

(3 files)

It's #40 top browser crasher in 15.0b1. It was a low volume crash across release builds but it first appeared for Nightly builds in 15.0a1/20120509.

It's slightly correlated to Crossrider Apps and Babylon:
* July 20:
  36% (4/11) vs. 0% (4/3022) crossriderapp4639@crossrider.com
  45% (5/11) vs. 5% (145/3022) ffxtlbr@babylon.com
* July 21:
  33% (7/21) vs. 3% (329/11823) crossriderapp2258@crossrider.com
* July 22:
  18% (6/34) vs. 3% (791/27771) crossriderapp2258@crossrider.com
  12% (4/34) vs. 0% (4/27771) crossriderapp4982@crossrider.com
  24% (8/34) vs. 7% (2018/27771) ffxtlbr@babylon.com
* July 23:
  28% (9/32) vs. 1% (140/27835) crossriderapp5060@crossrider.com
  22% (7/32) vs. 7% (2068/27835) ffxtlbr@babylon.com

Signature nsGlobalWindow::SetNewDocument(nsIDocument*, nsISupports*, bool)
UUID 3c207311-5e81-496b-bee5-2b0f72120722
Date Processed 2012-07-22 20:27:04
Uptime 2510
Last Crash 2.8 weeks before submission
Install Age 41.8 minutes since version was first installed.
Install Time 2012-07-22 19:45:04
Product Firefox
Version 17.0a1
Build ID 20120722030555
Release Channel nightly
OS Windows NT
OS Version 6.1.7601 Service Pack 1
Build Architecture x86
Build Architecture Info AuthenticAMD family 16 model 4 stepping 3
Crash Reason EXCEPTION_ACCESS_VIOLATION_READ
Crash Address 0x154
App Notes AdapterVendorID: 0x10de, AdapterDeviceID: 0x06cd, AdapterSubsysID: 115319da, AdapterDriverVersion: 9.18.13.448 D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+
EMCheckCompatibility True
Adapter Vendor ID 0x10de
Adapter Device ID 0x06cd
Total Virtual Memory 4294836224
Available Virtual Memory 3507171328
System Memory Use Percentage 26
Available Page File 14754906112
Available Physical Memory 6320115712

Frame Module Signature Source
0 xul.dll nsGlobalWindow::SetNewDocument dom/base/nsGlobalWindow.cpp:1877
1 xul.dll DocumentViewerImpl::InitInternal layout/base/nsDocumentViewer.cpp:926
2 xul.dll DocumentViewerImpl::Close layout/base/nsDocumentViewer.cpp:1429
3 @0xcb4638f
4 xul.dll nsDocShell::Embed docshell/base/nsDocShell.cpp:5907
5 xul.dll nsDocShell::CreateAboutBlankContentViewer docshell/base/nsDocShell.cpp:6643
6 xul.dll nsDocShell::CreateAboutBlankContentViewer docshell/base/nsDocShell.cpp:6661
7 xul.dll nsGlobalWindow::SetOpenerScriptPrincipal dom/base/nsGlobalWindow.cpp:1529
8 xul.dll nsWindowWatcher::OpenWindowJSInternal embedding/components/windowwatcher/src/nsWindowWatcher.cpp:863
9 xul.dll nsWindowWatcher::OpenWindow embedding/components/windowwatcher/src/nsWindowWatcher.cpp:381
10 xul.dll NS_InvokeByIndex_P xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:70
11 xul.dll XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:2382
12 xul.dll XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1474
13 mozjs.dll js::InvokeKernel js/src/jsinterp.cpp:345
14 mozjs.dll js::Interpret js/src/jsinterp.cpp:2426
15 mozjs.dll js::InvokeKernel js/src/jsinterp.cpp:356
16 mozjs.dll js::Invoke js/src/jsinterp.cpp:388
17 mozjs.dll JS_CallFunctionValue js/src/jsapi.cpp:5566
18 xul.dll nsXPCWrappedJSClass::CallMethod js/xpconnect/src/XPCWrappedJSClass.cpp:1436
19 xul.dll nsXPCWrappedJS::CallMethod js/xpconnect/src/XPCWrappedJS.cpp:580
20 xul.dll PrepareAndDispatch xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:85
21 xul.dll SharedStub xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:112
22 xul.dll DocumentViewerImpl::PermitUnload layout/base/nsDocumentViewer.cpp:1159

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsGlobalWindow%3A%3ASetNewDocument%28nsIDocument*%2C+nsISupports*%2C+bool%29
Crash Signature: [@ nsGlobalWindow::SetNewDocument(nsIDocument*, nsISupports*, bool)] → [@ nsGlobalWindow::SetNewDocument(nsIDocument*, nsISupports*, bool)] [@ nsGlobalWindow::SetNewDocument]
OS: Windows XP → All
Hardware: x86 → All
If I recall correctly I think it was in the process of saving a draft copy of my wordpress blog. I had a lot of other windows open as well, so it could have potentially been stuff running in the background.
Kyle, didn't you hack SetNewDocument recently?
The stacks are useless here. Socorro is not linkifying them for some reason.
These crashes are happening on a line modified in Bug 730208, which landed for 15. sfink is on vacation though ...
Tracking this regressing crasher for 15 since it started in 15, assigning to dmandelin to see about getting someone to help on this since sfink is on vacation.
Assignee: nobody → dmandelin
This looks like just an NPE, so here's a patch to check for null. I suspect this is not really the right thing to do, because the assertion a few lines up implies that currentInner == nullptr is not expected inside this if, but I also don't see anything above that would prevent that from happening. Bobby, do you think we should patch this null check, or would that just mask a bug in SetNewDocument or one of its callers?
Attachment #648779 - Flags: review?(bobbyholley+bmo)
Comment on attachment 648779
Patch, just check for null

So, the NPE is occurring in the branch where we decided to reUseInnerWindow. The code paths taking us here don't pass aForceReuseInnerWindow AFAICT, so the fact that we're getting here means that WouldReuseInnerWindow() returned true. But this means that mDoc must be non-null. So we're getting into a situation where we've got a non-null mDoc but a null mInnerWindow. This means that an earlier call to SetNewDocument probably did an exceptional early-return, between here:
http://hg.mozilla.org/mozilla-central/file/3199bc043da4/dom/base/nsGlobalWindow.cpp#l1845
and here:
http://hg.mozilla.org/mozilla-central/file/3199bc043da4/dom/base/nsGlobalWindow.cpp#l1968

The most likely cause is that CreateNativeGlobalForInner failed. We assert against this, so I think it should be considered a bug until we understand it. CreateNativeGlobalForInner does a lot of stuff though, so I'm totally willing to believe that there's something fallible in there.

The wallpaper fix is to just check for a null mInnerWindow in WouldReuseInnerWindow (which is why I'm rminusing the attached patch). The correct fix involves determining why we're early-returning, which probably requires STR.
Attachment #648779 - Flags: review?(bobbyholley+bmo) → review-
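For readers following the analysis in the review comment, here is a minimal sketch of the suspected state and of the "wallpaper" check. It is not the real nsGlobalWindow code: `OuterWindow`, `Document`, `InnerWindow`, the `aGlobalCreationFails` flag, `WouldReuseInnerWindowUnchecked` and `ReusedInner` are all hypothetical stand-ins, and the real WouldReuseInnerWindow() checks more conditions than shown.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified stand-ins for the Gecko types in this bug.
struct Document {};
struct InnerWindow {};

class OuterWindow {
public:
  // Models a SetNewDocument call. When global creation fails we return early
  // after mDoc is set but before mInnerWindow exists (the suspected bad state).
  bool SetNewDocument(std::shared_ptr<Document> aDoc, bool aGlobalCreationFails) {
    mDoc = std::move(aDoc);
    if (aGlobalCreationFails) {
      return false;  // exceptional early return: mDoc set, mInnerWindow null
    }
    mInnerWindow = std::make_unique<InnerWindow>();
    return true;
  }

  // Unguarded variant: assumes a non-null mDoc implies an inner window exists.
  bool WouldReuseInnerWindowUnchecked() const { return mDoc != nullptr; }

  // "Wallpaper" variant: also refuses reuse when there is no inner window.
  bool WouldReuseInnerWindow() const {
    return mDoc != nullptr && mInnerWindow != nullptr;
  }

  const InnerWindow* GetCurrentInnerWindow() const { return mInnerWindow.get(); }

  // Models the reuse branch; only valid when WouldReuseInnerWindow() holds.
  const InnerWindow& ReusedInner() const {
    assert(mInnerWindow && "reuse branch entered without an inner window");
    return *mInnerWindow;
  }

private:
  std::shared_ptr<Document> mDoc;
  std::unique_ptr<InnerWindow> mInnerWindow;
};
```

After a failed `SetNewDocument`, the unchecked predicate still says "reuse" while `GetCurrentInnerWindow()` is null, which is the combination that would lead to the null dereference; the guarded predicate avoids the crash without explaining the early return.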
Ms2ger also suggested nulling out mDoc if CreateNativeGlobalForInner fails, which might be more robust (but more nasty cleanup code - maybe an RAII class?)

I'm starting to think this is related to bug 777875. They both appeared around the same time. I'm guessing somebody landed a patch somewhere that caused us to start tripping the assertion here in non-teardown cases:
http://hg.mozilla.org/mozilla-central/rev/712bca8b8674#l1.39
This would cause CreateNativeGlobalForInner to fail. Can we maybe get regression windows and see what everything is correlated with?
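The RAII idea floated above could look roughly like the following. This is only a sketch under assumptions: `ResetOnFailure`, `Window`, `mHasInner` and the `aGlobalCreationFails` flag are hypothetical names made up for illustration, not Gecko classes or the patch that landed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Document {};  // hypothetical stand-in for nsIDocument

// Hypothetical scope guard: clears the guarded pointer when the scope exits,
// unless Commit() was called first.
template <typename T>
class ResetOnFailure {
public:
  explicit ResetOnFailure(std::shared_ptr<T>& aTarget) : mTarget(aTarget) {}
  ~ResetOnFailure() {
    if (!mCommitted) {
      mTarget.reset();
    }
  }
  void Commit() { mCommitted = true; }

  ResetOnFailure(const ResetOnFailure&) = delete;
  ResetOnFailure& operator=(const ResetOnFailure&) = delete;

private:
  std::shared_ptr<T>& mTarget;
  bool mCommitted = false;
};

struct Window {
  std::shared_ptr<Document> mDoc;
  bool mHasInner = false;

  bool SetNewDocument(std::shared_ptr<Document> aDoc, bool aGlobalCreationFails) {
    assert(aDoc && "caller must pass a document");
    mDoc = std::move(aDoc);
    ResetOnFailure<Document> guard(mDoc);
    if (aGlobalCreationFails) {
      return false;  // guard nulls out mDoc on this early return
    }
    mHasInner = true;
    guard.Commit();  // success: keep mDoc
    return true;
  }
};
```

The point of the guard is that every early return between setting mDoc and finishing setup leaves the window with no document, so a later reuse check cannot observe a document without an inner window.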
(In reply to Bobby Holley (:bholley) from comment #10)
> Can we maybe get regression windows and see what everything is correlated with?

Without STR, it will be hard as it's discontinuous across builds. It might be:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1092c1a3ac50&tochange=dd29535bac5f
or
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e794cef56df6&tochange=642d1a36702f
Given that I'm guessing this is also causing the top-orange in bug 777875, flagging qawanted to get some STR. Not sure whether it's more productive to start from the random (but frequent) mochitest oranges, or from the crashes.
Keywords: qawanted
Thanks for the analysis. I think I'm not the best person to fix the actual bug.
Assignee: dmandelin → nobody
I wasn't able to find and install the Crossrider apps mentioned in comment 0, but I was able to install a handful of Crossrider demo apps from http://crossrider.com/developer/demo along with Babylon 9.0. Using Firefox 15.0b3 on Windows XP I've been unable to reproduce any crashes so far.

Many of the comments mention that the crashes happen after resuming from overnight idle. Here are the scenarios I tried with Firefox running and content loaded:
* Manually shutdown to standby and resume
* Set stand-by timeout to 60 seconds, wait for standby and resume
* Simulate a standby via power button and resume
* Set hibernate timeout to 120 seconds, wait for hibernation and resume

At least from the very minimal user scenarios I can think of, this bug is not reproducible.

Bobby, I'm not sure how to QA this from a mochitest perspective. Some instruction would be appreciated.
With all the add-ons installed from before I tried opening each of the above URLs one by one in a new tab. Part way through Firefox crashed. I don't know whether the signature is the same because the report is still being processed. The ID is 925c0b89-e0ff-442d-8820-c33c82120809 just in case someone else wants to check. One thing I did notice was the crash happened when clicking OK on one of the dialogs (each crossrider add-on executes JS and prompts on tab load). Again, I'm not yet certain the crash I saw was the same, and I've not yet re-encountered the crash.
Retried the same test as in comment 16 but this time having all the tabs loaded, entering stand-by mode, then resuming and switch around to different tabs; no crash reproduced.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #16)
> The ID is 925c0b89-e0ff-442d-8820-c33c82120809 just in case some
> else wants to check.

It's indeed the same crash: bp-925c0b89-e0ff-442d-8820-c33c82120809.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #14)
> Bobby, I'm not sure how to QA this from a mochitest
> perspective. Some instruction would be appreciated.

Not sure. Does QA have any experience reproducing random oranges from tinderbox? I don't have any great ideas offhand.

Anyway, it sounds like you managed to at least reproduce the crash once, which is awesome! If we can hone in on that and get something reliable, I'll jump right in. Alternatively, if we get really stuck, we can just capitulate on this bug (and the random orange) and just add code to handle the failures that shouldn't be happening. I'd really like to avoid that though. :-(
Anthony, did you test this with a release build or a debug build? Using a debug build might help here. Bobby, could you add assertions that might help, maybe in a try build?
(In reply to Bill McCloskey (:billm) from comment #20)
> Anthony, did you test this with a release build or a debug build? Using a
> debug build might help here. Bobby, could you add assertions that might
> help, maybe in a try build?

If my theory is correct, we've already got the relevant assertion - the one being triggered in the random orange in bug 777875.
Philor suggests in bug 782167 comment 5 that reproducing might require triggering a tooltip.
Apologies for the delayed response.

(In reply to Bill McCloskey (:billm) from comment #20)
> Anthony, did you test this with a release build or a debug build?

I was not using a debug build. I'm unable to get one working on Windows though. I've spent hours trying to get a Windows debug environment set up and it just won't work.

That said, I've not been able to re-reproduce this crash.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #23)
> I was not using a debug build. I'm unable to get one working on Windows
> though. I've spent hours trying to get a Windows debug environment set up
> and it just won't work.
>
> That said, I've not been able to re-reproduce this crash.

I think it should be possible to install a debug build without needing a build environment. They're available here, for example:
https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-08-14-mozilla-inbound-debug/
Although you might need an extra file from here:
https://developer.mozilla.org/en-US/docs/Running_Windows_Debug_Builds
Maybe that's where you had the problem?

In general, doing testing with debug builds is usually a lot more valuable than testing with release builds. It's often a lot easier to reliably reproduce things.
(In reply to Bill McCloskey (:billm) from comment #24)
> Although you might need an extra file from here:
> https://developer.mozilla.org/en-US/docs/Running_Windows_Debug_Builds
> Maybe that's where you had the problem?

This is the exact problem that I always seem to have, but installing the SDK as instructed via the link above does not resolve the issue for me. If you want to continue to help me troubleshoot my personal issues that would be great, but let's take this onto IRC or email. Otherwise, I'm not sure that I can be much more help on this bug.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #25)
> This is the exact problem that I always seem to have but installing the SDK
> as instructed via the link above does not resolve the issue for me. If you
> want to continue to help me troubleshoot my personal issues that would be
> great, but lets this onto IRC or email.

Kyle helpfully updated the docs at:
https://developer.mozilla.org/en-US/docs/Running_Windows_Debug_Builds
It includes a new link to a file that should fix the problem. I'm posting this here in case anyone else has this problem.
I've stumbled on a reliably reproducible crash condition but I can't tell if it's related to this bug or not.

Steps:
1. Install build from comment 24 in Windows XP
2. Install Babylon 9 Pro (including toolbar)
3. Install the Social Anywhere Crossrider App
> http://crossrider.com/install/519-social-anywhere
4. Restart Firefox and open Google+

Firefox eventually crashes if I let it sit there on Google+ for a minute. Allowing the session to restore on restart triggers the crash again.

Debug output shows an assertion:
###!!! ASSERTION: JSEventListener has wrong script context?: 'stack && NS_SUCCEEDED(stack->Peek(&cx)) && cx && GetScriptContextFromJSContext(cx) == mContext', file e:/builds/moz2_slave/m-in-w32-dbg/build/dom/src/events/nsJSEventListener.cpp, line 182

I received the following crash reports:
bp-243b8cd0-223d-4612-bb62-fdd5f2120816
bp-ce190b68-ebe6-4066-b18e-507522120816
bp-fe3f01d4-9161-4684-b903-1c8da2120816
bp-a7958e00-d207-4665-81f0-248e82120816
bp-a1f68248-03b5-4155-a8df-745ce2120816

This does not reproduce when using a non-debug build like Firefox 15.0b5.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #27)
> I've stumbled on a reliably reproducible crash condition but I can't tell if
> it's related to this bug or not.

What indicates that it's related? I don't see a stack in the crash reports. Were you able to get a stack some other way?
I don't know that it's related at all, apart from the fact that it's a reproducible crash with Babylon toolbar and Crossrider Apps installed.
This crash is at #38 for FF15 and we're a day away from going to build on our final Beta so I'm wontfixing this for 15 and we can continue to investigate (and watch for more correlations) in 16, esp. once it goes to our Beta audience we might get some new leads here.
While it's unfortunate we have this regression, if we can't find STR soon, we'll likely untrack for FF16's release. This is not a top crasher.
A reliable set of steps to reproduce this bug still eludes me. At this point I don't see anything that QA can do. Please re-add qawanted if a new lead comes to light.
Keywords: qawanted → steps-wanted
It's correlated with malware in FF 15:
* DLLs:
  27% (67/246) vs. 1% (823/150321) DataMngrHlpFF15.dll (used in Bandoo/iMesh/Discordia's extensions)
  30% (73/246) vs. 4% (6469/150321) datamngr.dll (used in Bandoo/iMesh/Discordia's extensions)
* Extensions:
  27% (67/246) vs. 2% (2330/150321) {1FD91A9C-410C-4090-BBCC-55D3450EF433} (DataMngr)
  12% (29/246) vs. 2% (2813/150321) wtxpcom@mybrowserbar.com (Widgi Toolbar Platform)
  15% (36/246) vs. 5% (7333/150321) plugin@yontoo.com (Yontoo)
  13% (33/246) vs. 4% (5939/150321) ffxtlbr@funmoods.com (Funmoods)
  11% (27/246) vs. 2% (2711/150321) {BBDA0591-3099-440a-AA10-41764D9DB4DB} (Symantec IPS)
  11% (26/246) vs. 2% (2326/150321) {2D3F3651-74B9-4795-BDEC-6DA2F431CB62} (Norton Toolbar)
  11% (26/246) vs. 2% (2379/150321) {99079a25-328f-4bd4-be04-00955acaa0a7} (Searchqu Toolbar)
  15% (38/246) vs. 7% (10908/150321) ffxtlbr@babylon.com (Babylon Toolbar)
  9% (22/246) vs. 1% (1749/150321) ytd@mybrowserbar.com (YTD Toolbar)
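As a reading aid for the correlation lines in this bug, the sketch below reproduces the arithmetic. It assumes each line means "share of crashes with this signature that have the module vs. share of all crashes that have it", with percentages rounded to the nearest integer; that reading and the helper names `RoundedPercent` and `OverRepresentation` are assumptions for illustration, not part of Socorro.

```cpp
#include <cassert>
#include <cmath>

// Percentage of aWith out of aTotal, rounded to the nearest integer,
// e.g. 67 of 246 gives 27.
long RoundedPercent(long aWith, long aTotal) {
  assert(aTotal > 0);
  return std::lround(100.0 * static_cast<double>(aWith) /
                     static_cast<double>(aTotal));
}

// How many times more common a module is among crashes with the signature
// than among all crashes (a simple ratio of the two shares).
double OverRepresentation(long aSigWith, long aSigTotal,
                          long aAllWith, long aAllTotal) {
  assert(aSigTotal > 0 && aAllWith > 0 && aAllTotal > 0);
  const double sigShare = static_cast<double>(aSigWith) / aSigTotal;
  const double allShare = static_cast<double>(aAllWith) / aAllTotal;
  return sigShare / allShare;
}
```

For the DataMngrHlpFF15.dll line, `RoundedPercent(67, 246)` and `RoundedPercent(823, 150321)` give the quoted 27% and 1%, and the ratio of the unrounded shares is close to 50, which is why a module present in only about 1% of all crashes still stands out.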
I get frequent crashes in google docs presentation editor. Basically one crash every 15 minutes or so of work on creating a presentation. Most of the crashes look like the stack trace is wrong/impossible, just linux-gate.so@0x424 or such (at least I can't understand what that could mean), but now I got one that matches this bug. Perhaps using google docs presentation editor can help find STR (I have not seen a specific action though that causes it - looks like I could be clicking on any of their GUI elements, like changing font color etc., to trigger the crash).
linux-gate.so is a virtual shared library used for virtual system calls, FWIW: http://www.trilithium.com/johan/2005/08/linux-gate/
Interesting, thanks. So what does it mean when I get frequent crashes in a site that have that signature? Is there some way to see which specific syscall it is? The stack trace above linux-gate (that is, what would presumably be doing the syscall) doesn't seem useful for some reason.
Do you have a link to one of these crashes?
It's #30 top browser crasher in 15.0.1.
Keywords: topcrash
Depends on: 795248
With combined signatures, it's #41 top browser crasher in 16.0.1. It's now correlated to Babylon like comment 27 mentions it:
nsGlobalWindow::SetNewDocument(nsIDocument*, nsISupports*, bool)|EXCEPTION_ACCESS_VIOLATION_READ (99 crashes)
  43% (43/99) vs. 2% (1473/92192) browsemngr-16.0.dll
    26% (26/99) vs. 1% (514/92192) 2.3.782.39
    17% (17/99) vs. 1% (959/92192) 2.3.787.43
  46% (46/99) vs. 5% (4416/92192) browsemngr.dll
    8% (8/99) vs. 0% (455/92192) 2.3.762.17
    18% (18/99) vs. 1% (930/92192) 2.3.765.24
    20% (20/99) vs. 3% (2560/92192) 2.3.787.43
    0% (0/99) vs. 0% (148/92192) 2.3.796.11
As written in Bug 803022 I have STR for the crash [1] but I can only reproduce it with a personal account on a vBulletin forum website. It may only be reproducible on Mac OS X and may only happen when using the touchpad swipe gesture (will check that again when I temporarily change the password). It happens with a clean(?) Fx 16.0.2 profile and Firebug installed (but if I recall correctly, it crashed without Firebug, too). Please tell me to whom I should send the credentials (via email) and STR (or check out Bug 803022 Comment 12) for the website to reproduce the crash. [1] https://crash-stats.mozilla.com/report/index/bp-197440a2-e017-42fd-ae06-f9b232121104
I am able to reproduce this crash 100% by loading http://m.whiskeymilitia.com/. Here is my crash report on Aurora using Mac 10.8: https://crash-stats.mozilla.com/report/index/bp-f375c33b-3e3a-4816-b00d-a14c72121108 Note that the crash doesn't happen right away so you have to be patient. I will point to this comment in the bug referenced in Comment 41 as well.
Keywords: reproducible
Confirm on Fx 16.0.2 on Mac OS X 10.8.2 on MBA 2012. UI stalled with "pagead2.googlesyndication.com" (or similar) in the status bar. Took some time to crash. https://crash-stats.mozilla.com/report/index/bp-01ba0fb6-4e7a-456e-be9b-770cf2121108
Oh yeah it was a mostly clean profile and the STR from Comment 42. I was only testing the stuff from bug 803022 for a couple of minutes, otherwise it's a fresh profile from like 30 minutes ago (no extensions, only QuickTime plugin). Bug 803022 did not turn out to be reproducible any more (concerning the crash).
Here's the stack I get on m.whiskeymilitia.com
So what's happening with whiskeymilitia is that the script is dispatching events in nested event handlers. Eventually it hits the recursion limit, but the exception doesn't propagate very far up, because HandleEventInternal squelches exceptions. Eventually, the slow script dialog tries to take over, but then _it_ runs into trouble, because of the native stack limit (mccr8 had an idea to let privileged code run with a higher native stack limit, did that ever go anywhere?).

Anyway, I wasn't able to reproduce the SetNewDocument crash myself, but I can see how it could happen (GetCurrentInnerWindow() might be returning null). Let's see if this patch does the trick.
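The squelching behaviour described above can be modelled in a few lines. This is a toy model under assumptions, not Gecko code: `Dispatcher`, `Handler`, `HandleEvent` and the `maxDepth` limit are hypothetical stand-ins for the script's handler, HandleEventInternal and the recursion limit.

```cpp
#include <cassert>
#include <stdexcept>

// Toy model of nested event dispatch where the dispatcher swallows
// exceptions thrown by handlers.
struct Dispatcher {
  explicit Dispatcher(int aMaxDepth) : maxDepth(aMaxDepth) {}

  int maxDepth;         // stand-in for the script recursion limit
  int depth = 0;        // current nesting of HandleEvent calls
  int squelched = 0;    // exceptions swallowed by HandleEvent
  int handlerRuns = 0;  // how many times the handler started running

  // Stand-in for a page handler that dispatches another event from inside
  // itself; throws once the recursion limit is reached.
  void Handler() {
    ++handlerRuns;
    if (depth >= maxDepth) {
      throw std::runtime_error("too much recursion");
    }
    HandleEvent();  // nested dispatch
  }

  // Stand-in for the dispatcher: runs the handler and squelches exceptions,
  // so the error never reaches the outer handlers or the original caller.
  void HandleEvent() {
    ++depth;
    try {
      Handler();
    } catch (const std::exception&) {
      ++squelched;
    }
    --depth;
    assert(depth >= 0);
  }
};
```

With a limit of 3, a single `HandleEvent()` call runs the handler three times, the innermost dispatch swallows the one exception, and every outer handler returns normally, so nothing upstream learns that the limit was hit.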
Comment on attachment 683790
Check for null currentInner when deciding to reuse inner windows. v1

Not far from dmandelin's patch ;)
Attachment #683790 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #49)
> Not far from dmandelin's patch ;)

Yes, but at least now we understand it a little better :-)

https://hg.mozilla.org/integration/mozilla-inbound/rev/62769304221f
> mccr8 had an idea to let privileged code run with a higher native stack limit, did that ever go anywhere?

This was Jesse's idea, not mine. :) Bug 813646, just filed.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Blocks: 803022
Component: DOM → DOM: Core & HTML