Closed Bug 488850 Opened 16 years ago Closed 15 years ago

Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy][@@0x0 | nsFrameList::DestroyFrames()] with popup, menuitem and body onload removing window

Categories

(Core :: Graphics: ImageLib, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking1.9.2 --- needed
status1.9.2 --- .2-fixed
blocking1.9.1 --- needed
status1.9.1 --- .9-fixed

People

(Reporter: martijn.martijn, Assigned: bzbarsky)

Details

(4 keywords, Whiteboard: [sg:critical?][3.6.x] deleted shell obj.)

Crash Data

Attachments

(3 files)

Attached file zipped up testcase
See zipped up testcase, to reproduce the crash: Unzip the testcase, then open the file named 'parentframe.htm'. It should crash after 500ms (after a reload). It also crashes Firefox3.0.7, so marking security sensitive for now. http://crash-stats.mozilla.com/report/index/049381bf-35dc-4616-a4d9-146bc2090417?p=1 0 @0x0 1 xul.dll nsRootBoxFrame::RemoveFrame layout/xul/base/src/nsRootBoxFrame.cpp:209 2 xul.dll nsCSSFrameConstructor::ContentRemoved layout/base/nsCSSFrameConstructor.cpp:7234 3 xul.dll PresShell::ContentRemoved layout/base/nsPresShell.cpp:4895 4 xul.dll nsGenericElement::doRemoveChildAt content/base/src/nsGenericElement.cpp:3377 5 xul.dll nsDocument::RemoveChildAt content/base/src/nsDocument.cpp:3252 6 xul.dll nsGenericElement::doRemoveChild content/base/src/nsGenericElement.cpp:3984 7 xul.dll nsDocument::RemoveChild content/base/src/nsDocument.cpp:5462 8 xul.dll nsIDOMNode_RemoveChild obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:4183 9 js3250.dll js_Interpret js/src/jsinterp.cpp:5147 10 js3250.dll js_Invoke js/src/jsinterp.cpp:1388 11 js3250.dll js_InternalInvoke js/src/jsinterp.cpp:1441 12 js3250.dll JS_CallFunctionValue js/src/jsapi.cpp:5196 13 xul.dll nsJSContext::CallEventHandler dom/base/nsJSEnvironment.cpp:2007 14 xul.dll nsJSEventListener::HandleEvent dom/src/events/nsJSEventListener.cpp:247 15 xul.dll nsEventListenerManager::HandleEventSubType content/events/src/nsEventListenerManager.cpp:1090 16 xul.dll nsEventListenerManager::HandleEvent content/events/src/nsEventListenerManager.cpp:1187 17 xul.dll nsEventTargetChainItem::HandleEventTargetChain content/events/src/nsEventDispatcher.cpp:315 18 xul.dll nsEventDispatcher::Dispatch content/events/src/nsEventDispatcher.cpp:508 19 xul.dll nsImageBoxFrameEvent::Run layout/xul/base/src/nsImageBoxFrame.cpp:130 20 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510 21 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170 22 xul.dll nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192 23 nspr4.dll PR_GetEnv 24 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:110 25 firefox.exe firefox.exe@0x21a7 26 kernel32.dll BaseProcessStart Firefox3.0.7 has a different stacktrace: http://crash-stats.mozilla.com/report/index/a1305891-19df-43f0-9aa4-8de482090417?p=1 0 xul.dll nsFrame::Destroy mozilla/layout/generic/nsFrame.cpp:500 1 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67 2 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257 3 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67 4 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257 5 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67 6 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257 7 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67 8 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257 9 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67 10 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257 11 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67 12 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257 13 xul.dll nsLineBox::DeleteLineList mozilla/layout/generic/nsLineBox.cpp:340 14 xul.dll nsBlockFrame::Destroy mozilla/layout/generic/nsBlockFrame.cpp:301 15 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67 16 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257 17 xul.dll nsBoxFrame::RemoveFrame mozilla/layout/xul/base/src/nsBoxFrame.cpp:1025 18 xul.dll nsRootBoxFrame::RemoveFrame mozilla/layout/xul/base/src/nsRootBoxFrame.cpp:212 19 xul.dll nsCSSFrameConstructor::ContentRemoved mozilla/layout/base/nsCSSFrameConstructor.cpp:9644 20 xul.dll PresShell::ContentRemoved mozilla/layout/base/nsPresShell.cpp:4724 etc...
Flags: wanted1.9.0.x?
Flags: blocking1.9.0.11?
Summary: Crash [@ nsRootBoxFrame::RemoveFrame] with popup, menuitem and boy onload removing window → Crash [@ nsRootBoxFrame::RemoveFrame] with popup, menuitem and body onload removing window
Flags: blocking1.9.2?
Flags: wanted1.9.1.x+
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.12?
Whiteboard: [sg:critical?]
dbaron: Is this your realm? If not, can you assign to the appropriate person?
Assignee: nobody → dbaron
Flags: wanted1.9.1.x+
Flags: blocking1.9.2? → wanted1.9.2+
On Mac trunk this is a null deref crash. Different code path than Windows, or has trunk just changed since this bug was filed? Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x00000000 0x1114ca78 in nsFrame::Destroy (this=0x1616580) at /build/m-c/src/layout/generic/nsFrame.cpp:441 441 shell->NotifyDestroyingFrame(this); (gdb) p shell $1 = (class nsIPresShell *) 0x0
(In reply to comment #1) > dbaron: Is this your realm? If not, can you assign to the appropriate person? Hmmm... just assigning bugs to me doesn't work if I don't see the bugmail (which I often don't). In any case, I can reproduce the crash (although it's a different stack), but it doesn't crash under valgrind.
(In reply to comment #1) > dbaron: Is this your realm? If not, can you assign to the appropriate person? To answer the question: not really, but I'm not sure who's more appropriate. In any case, this looks like a null dereference in both of Martijn's stacks (I actually see the latter one on trunk now; the first one might have been something that was on trunk for a bit and then fixed), so I'm not sure why it's security-sensitive. It's also pretty timing sensitive and thus hard to debug. But it seems like the problem is that the pres context in question has gotten its SetShell(null) call, so the mShell that we try to dereference is null.
Unhiding per comment 2 and comment 4.
Group: core-security
Whiteboard: [sg:critical?] → [sg:dos] null deref
Attachment #373323 - Attachment mime type: application/zip → application/java-archive
re-hiding. The 3.0.x crash might be a different bug, but the testcase in this bug exhibits a potentially critical crash on that branch.
Group: core-security
Whiteboard: [sg:dos] null deref → [sg:critical?] deleted mem in 3.0.x, null deref in newer branches?
Summary: Crash [@ nsRootBoxFrame::RemoveFrame] with popup, menuitem and body onload removing window → Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy] with popup, menuitem and body onload removing window
To expand upon comment 6, shell is a deleted object (0xdddddddd in a windows debug build) when we call shell->NotifyDestroyingFrame(this); In an opt build it may be a null in this testcase, but that doesn't mean a different testcase couldn't get a more useful value there. In 3.5.6pre I'm getting a slightly different stack than in comment 0 but like comment 0 its still trying to execute 0x0 which is scarier than merely dereferencing it. bp-f96a19bc-2d64-4c0d-aad3-9a8d02091112
Summary: Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy] with popup, menuitem and body onload removing window → Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy][@@0x0 | nsFrameList::DestroyFrames()] with popup, menuitem and body onload removing window
Whiteboard: [sg:critical?] deleted mem in 3.0.x, null deref in newer branches? → [sg:critical?] deleted mem in 3.0.x, executing @0x0 in newer branches
In a windows 3.5.6pre debug build I get the same stack and deleted shell as 3.0.x
blocking1.9.1: --- → ?
Flags: blocking1.9.0.17?
Whiteboard: [sg:critical?] deleted mem in 3.0.x, executing @0x0 in newer branches → [sg:critical?] deleted shell obj.
The menuitem in crash1.xul with an image gets an nsImageBoxFrame, it sets up a nsImageBoxFrameEvent to run when it is loaded that dispatches the load event. This calls the onload handler for the popup, we get a ContentRemoved on the root element and we start destroying frames. We get to nsImageBoxFrame::Destroy, which calls CancelAndForgetObserver on its imgImageRequest, this is the last thing blocking the load of the document, so it triggers the document's onload handler. So we remove the subdocument frame from the parent document and this nulls out the presshell on the subdocument's prescontext. When we get back to nsImageBoxFrame::Destroy it eventually calls nsFrame::Destroy where our presshell is null.
(In reply to comment #9) > The menuitem in crash1.xul with an image gets an nsImageBoxFrame, it sets up a > nsImageBoxFrameEvent to run when it is loaded that dispatches the load event. > This calls the onload handler for the popup, we get a ContentRemoved on the > root element and we start destroying frames. We get to > nsImageBoxFrame::Destroy, which calls CancelAndForgetObserver on its > imgImageRequest, this is the last thing blocking the load of the document, so > it triggers the document's onload handler. Clearly something between nsImageBoxFrame and the triggering of onload should be asynchronous so onload doesn't fire during frame destruction.
Changing the CancelAndForgetObserver call in nsImageBoxFrame::Destroy to a Cancel call and moving the nulling out of the frame of mListener to before the Cancel call would seem like the right way to fix this. Or am I missing something? Potentially every other call of CancelAndForgetObserver could have this same problem.
(In reply to comment #11) > Changing the CancelAndForgetObserver call in nsImageBoxFrame::Destroy to a > Cancel call and moving the nulling out of the frame of mListener to before the > Cancel call would seem like the right way to fix this. Or am I missing > something? Sounds good to me. I don't think that nulling out needs to be moved, though. > Potentially every other call of CancelAndForgetObserver could have this same > problem. Maybe. I'm not sure why CancelAndForgetObserver would want to cancel synchronously, but then I don't know this code at all.
The nulling out of the frame is because I think we can still get notifications sent to the listener object.
Oh, you mean synchronously during the Cancel() call? OK.
No, actually, asynchronously from the Cancel call.
If it's asynchronous then we don't need to move the nulling out, it's fine for it to be where it is after the Cancel() call.
I guess you're right if we don't have to worry about synchronous notifications.
> Or am I missing something? I think you are, and the something is that imagelib holds a raw pointer weak ref to mListener, which means that once nsImageBoxFrame::~nsImageBoxFrame happens it'll be holding a pointer to a dead object and then will proceed to asynchronously call virtual methods on it... We should really just fix that with some cycle collection smarts in imagelib, imho. But given that, you really do need to CancelAndForgetObserver(). The fact that this ends up synchronously removing from the loadgroup is really bad; it needs to be guaranteeing that doesn't happen. Joe, Bobby, can we fix that?
blocking1.9.1: ? → .7+
Flags: blocking1.9.0.17? → blocking1.9.0.17+
Whiteboard: [sg:critical?] deleted shell obj. → [sg:critical?][3.6.x] deleted shell obj.
bzbarsky said he'd be able to get to this in a week or two, I think, and he's a much better owner for it than I am.
Assignee: dbaron → bzbarsky
blocking1.9.1: .8+ → needed
Flags: blocking1.9.0.18+ → blocking1.9.0.19+
Moving over to the right component. There's no sane way to fix this in layout; it needs to happen in imagelib.
Component: Layout → ImageLib
QA Contact: layout → imagelib
Attached patch Proposed fixSplinter Review
This is about minimal. It fixes the crash I see on Mac. Martijn, can you try it on Windows?
Attachment #429049 - Flags: review?(joe)
(In reply to comment #18) > But given that, you really do need to CancelAndForgetObserver(). The fact that > this ends up synchronously removing from the loadgroup is really bad; it needs > to be guaranteeing that doesn't happen. Joe, Bobby, can we fix that? Sorry I somehow missed this before. As you proved, we *can* fix it. We have the technology. My question is: why should removing from the load group be asynchronous when everything else in that method is synchronous, by design? More broadly stated: how in the world can we get this *right*, aside from hooking imagelib up to the cycle collector?
> why should removing from the load group be asynchronous when everything else in > that method is synchronous, by design? The main thing that needs to be synchronous is setting the observer to null (since it's about to die). The rest of the synchronicity is due to not wanting to break imagelib api, and therefore needing to deliver notifications before nulling out the observer. A key part here is that the observer is what receives these notifications and thus has control over how they're handled. The loadgroup removal is NOT part of the imagelib API; it's part of the necko API. And necko defines that Cancel() on an nsIRequest is asynchronous. More importantly, though, the loadgroup removal can trigger arbitrary code to run that is not under the control of the observer, while we know for a fact that use of the CancelAndForgetObserver API means the observer is not in a good state (in the middle of a destructor, for one thing!). > More broadly stated: how in the world can we get this *right*, Hooking up to cycle collection is one approach, as you say. The other approach, which I think might be even better, is switching imagelib to an API more similar to Necko's, where the image would not keep a pointer to its observer for longer than absolutely necessary. In particular, if animations were pulled by consumers, not pushed by imagelib, we would not need long-term references to observers. All this modulo multipart jpegs, which we would need to do something with, of course, since right now they're exposed in imagelib API as multiple separate image loads, not as animations, right?
(In reply to comment #23) > The loadgroup removal is NOT part of the imagelib API; it's part of the necko > API. And necko defines that Cancel() on an nsIRequest is asynchronous. More > importantly, though, the loadgroup removal can trigger arbitrary code to run > that is not under the control of the observer, while we know for a fact that > use of the CancelAndForgetObserver API means the observer is not in a good > state (in the middle of a destructor, for one thing!). OK, I totally buy this! > if animations > were pulled by consumers, not pushed by imagelib, we would not need long-term > references to observers. This is probably possible. Because images don't change size, no reflow would be necessary; nsRefreshDriver could tell layout to redraw, and then layout could just get imagelib to draw the current frame. > All this modulo multipart jpegs, which we would need > to do something with, of course, since right now they're exposed in imagelib > API as multiple separate image loads, not as animations, right? Right. Which is a bit silly, but before nsRefreshDriver there was really no way to force redraws, I guess.
(In reply to comment #21) > Created an attachment (id=429049) [details] > Proposed fix > > This is about minimal. It fixes the crash I see on Mac. Martijn, can you try > it on Windows? Yes, that seems to fix it, thanks.
Joe: can you expedite review for this blocking bug?
Whiteboard: [sg:critical?][3.6.x] deleted shell obj. → [sg:critical?][3.6.x] deleted shell obj.[needs review jdrew]
Attachment #429049 - Flags: superreview?(vladimir)
Attachment #429049 - Flags: review?(joe)
Attachment #429049 - Flags: review+
Vlad, if you can't review this before Sunday, just switch the review request to roc - he can do it on Monday, aka North America's Sunday. :)
Attachment #429049 - Flags: superreview?(vladimir) → superreview+
This doesn't need to be fixed in 1.9.0.19 unless it's also fixed in 1.9.1.9. Since it hasn't landed on trunk yet, that's unlikely, so no longer blocking for 1.9.0.19.
Flags: blocking1.9.0.19+ → blocking1.9.0.19-
Gah. The only reason this hasn't landed on trunk yet is because I was waiting on vlad's review and of course I got no notification that it had happened... I'm landing it on trunk right now.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
That said, on 1.9.0 this looks like it needs a totally different fix, since CancelAndForgetRequest doesn't exist there. Joe?
Requesting approvals for what I think are the relevant branches...
Attachment #431158 - Flags: approval1.9.2.3?
Attachment #431158 - Flags: approval1.9.2.2?
Attachment #431158 - Flags: approval1.9.1.9?
Attachment #431158 - Flags: approval1.9.1.10?
blocking1.9.2: --- → ?
status1.9.2: --- → ?
I'm going to call this status1.9.0-wontfix, because fixing it will introduce more risk than not fixing it (since it would involve backporting all the Cancel*() fixes that bz and I have cooked up over the past two years).
Whiteboard: [sg:critical?][3.6.x] deleted shell obj.[needs review jdrew] → [sg:critical?][3.6.x] deleted shell obj.
Attachment #431158 - Flags: approval1.9.2.3?
Attachment #431158 - Flags: approval1.9.2.2?
Attachment #431158 - Flags: approval1.9.2.2+
Attachment #431158 - Flags: approval1.9.1.9?
Attachment #431158 - Flags: approval1.9.1.9+
Attachment #431158 - Flags: approval1.9.1.10?
b+a=beltzner for 1.9.2.2 and 1.9.1.9
blocking1.9.2: ? → needed
Verified for 1.9.1.9 using attached testcase and Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9pre) Gecko/20100311 Shiretoko/3.5.9pre (.NET CLR 3.5.30729). It no longer crashes as it does on 1.9.1.8. Verified for 1.9.2 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.2pre) Gecko/20100311 Namoroka/3.6.2pre (.NET CLR 3.5.30729).
Group: core-security
Crash Signature: [@ nsRootBoxFrame::RemoveFrame] [@ nsFrame::Destroy] [@@0x0 | nsFrameList::DestroyFrames()]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: