Closed
Bug 488850
Opened 16 years ago
Closed 15 years ago
Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy][@@0x0 | nsFrameList::DestroyFrames()] with popup, menuitem and body onload removing window
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
People
(Reporter: martijn.martijn, Assigned: bzbarsky)
Details
(4 keywords, Whiteboard: [sg:critical?][3.6.x] deleted shell obj.)
Crash Data
Attachments
(3 files)
592 bytes,
application/java-archive
|
Details | |
2.03 KB,
patch
|
joe
:
review+
vlad
:
superreview+
|
Details | Diff | Splinter Review |
2.17 KB,
patch
|
beltzner
:
approval1.9.2.2+
beltzner
:
approval1.9.1.9+
|
Details | Diff | Splinter Review |
See zipped up testcase, to reproduce the crash:
Unzip the testcase, then open the file named 'parentframe.htm'.
It should crash after 500ms (after a reload).
It also crashes Firefox3.0.7, so marking security sensitive for now.
http://crash-stats.mozilla.com/report/index/049381bf-35dc-4616-a4d9-146bc2090417?p=1
0 @0x0
1 xul.dll nsRootBoxFrame::RemoveFrame layout/xul/base/src/nsRootBoxFrame.cpp:209
2 xul.dll nsCSSFrameConstructor::ContentRemoved layout/base/nsCSSFrameConstructor.cpp:7234
3 xul.dll PresShell::ContentRemoved layout/base/nsPresShell.cpp:4895
4 xul.dll nsGenericElement::doRemoveChildAt content/base/src/nsGenericElement.cpp:3377
5 xul.dll nsDocument::RemoveChildAt content/base/src/nsDocument.cpp:3252
6 xul.dll nsGenericElement::doRemoveChild content/base/src/nsGenericElement.cpp:3984
7 xul.dll nsDocument::RemoveChild content/base/src/nsDocument.cpp:5462
8 xul.dll nsIDOMNode_RemoveChild obj-firefox/js/src/xpconnect/src/dom_quickstubs.cpp:4183
9 js3250.dll js_Interpret js/src/jsinterp.cpp:5147
10 js3250.dll js_Invoke js/src/jsinterp.cpp:1388
11 js3250.dll js_InternalInvoke js/src/jsinterp.cpp:1441
12 js3250.dll JS_CallFunctionValue js/src/jsapi.cpp:5196
13 xul.dll nsJSContext::CallEventHandler dom/base/nsJSEnvironment.cpp:2007
14 xul.dll nsJSEventListener::HandleEvent dom/src/events/nsJSEventListener.cpp:247
15 xul.dll nsEventListenerManager::HandleEventSubType content/events/src/nsEventListenerManager.cpp:1090
16 xul.dll nsEventListenerManager::HandleEvent content/events/src/nsEventListenerManager.cpp:1187
17 xul.dll nsEventTargetChainItem::HandleEventTargetChain content/events/src/nsEventDispatcher.cpp:315
18 xul.dll nsEventDispatcher::Dispatch content/events/src/nsEventDispatcher.cpp:508
19 xul.dll nsImageBoxFrameEvent::Run layout/xul/base/src/nsImageBoxFrame.cpp:130
20 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:510
21 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:170
22 xul.dll nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192
23 nspr4.dll PR_GetEnv
24 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:110
25 firefox.exe firefox.exe@0x21a7
26 kernel32.dll BaseProcessStart
Firefox3.0.7 has a different stacktrace:
http://crash-stats.mozilla.com/report/index/a1305891-19df-43f0-9aa4-8de482090417?p=1
0 xul.dll nsFrame::Destroy mozilla/layout/generic/nsFrame.cpp:500
1 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67
2 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257
3 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67
4 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257
5 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67
6 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257
7 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67
8 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257
9 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67
10 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257
11 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67
12 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257
13 xul.dll nsLineBox::DeleteLineList mozilla/layout/generic/nsLineBox.cpp:340
14 xul.dll nsBlockFrame::Destroy mozilla/layout/generic/nsBlockFrame.cpp:301
15 xul.dll nsFrameList::DestroyFrames mozilla/layout/generic/nsFrameList.cpp:67
16 xul.dll nsContainerFrame::Destroy mozilla/layout/generic/nsContainerFrame.cpp:257
17 xul.dll nsBoxFrame::RemoveFrame mozilla/layout/xul/base/src/nsBoxFrame.cpp:1025
18 xul.dll nsRootBoxFrame::RemoveFrame mozilla/layout/xul/base/src/nsRootBoxFrame.cpp:212
19 xul.dll nsCSSFrameConstructor::ContentRemoved mozilla/layout/base/nsCSSFrameConstructor.cpp:9644
20 xul.dll PresShell::ContentRemoved mozilla/layout/base/nsPresShell.cpp:4724
etc...
Updated•16 years ago
|
Flags: wanted1.9.0.x?
Flags: blocking1.9.0.11?
Reporter | ||
Updated•16 years ago
|
Summary: Crash [@ nsRootBoxFrame::RemoveFrame] with popup, menuitem and boy onload removing window → Crash [@ nsRootBoxFrame::RemoveFrame] with popup, menuitem and body onload removing window
Reporter | ||
Updated•16 years ago
|
Flags: blocking1.9.2?
Updated•16 years ago
|
Flags: wanted1.9.1.x+
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.12?
Whiteboard: [sg:critical?]
Comment 1•16 years ago
|
||
dbaron: Is this your realm? If not, can you assign to the appropriate person?
Flags: blocking1.9.2? → wanted1.9.2+
Comment 2•15 years ago
|
||
On Mac trunk this is a null deref crash. Different code path than Windows, or has trunk just changed since this bug was filed?
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x1114ca78 in nsFrame::Destroy (this=0x1616580) at /build/m-c/src/layout/generic/nsFrame.cpp:441
441 shell->NotifyDestroyingFrame(this);
(gdb) p shell
$1 = (class nsIPresShell *) 0x0
(In reply to comment #1)
> dbaron: Is this your realm? If not, can you assign to the appropriate person?
Hmmm... just assigning bugs to me doesn't work if I don't see the bugmail (which I often don't).
In any case, I can reproduce the crash (although it's a different stack), but it doesn't crash under valgrind.
(In reply to comment #1)
> dbaron: Is this your realm? If not, can you assign to the appropriate person?
To answer the question: not really, but I'm not sure who's more appropriate.
In any case, this looks like a null dereference in both of Martijn's stacks (I actually see the latter one on trunk now; the first one might have been something that was on trunk for a bit and then fixed), so I'm not sure why it's security-sensitive.
It's also pretty timing sensitive and thus hard to debug. But it seems like the problem is that the pres context in question has gotten its SetShell(null) call, so the mShell that we try to dereference is null.
Comment 5•15 years ago
|
||
Group: core-security
Whiteboard: [sg:critical?] → [sg:dos] null deref
Updated•15 years ago
|
Attachment #373323 -
Attachment mime type: application/zip → application/java-archive
Comment 6•15 years ago
|
||
re-hiding. The 3.0.x crash might be a different bug, but the testcase in this bug exhibits a potentially critical crash on that branch.
Group: core-security
Whiteboard: [sg:dos] null deref → [sg:critical?] deleted mem in 3.0.x, null deref in newer branches?
Updated•15 years ago
|
Summary: Crash [@ nsRootBoxFrame::RemoveFrame] with popup, menuitem and body onload removing window → Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy] with popup, menuitem and body onload removing window
Comment 7•15 years ago
|
||
To expand upon comment 6, shell is a deleted object (0xdddddddd in a windows debug build) when we call
shell->NotifyDestroyingFrame(this);
In an opt build it may be a null in this testcase, but that doesn't mean a different testcase couldn't get a more useful value there.
In 3.5.6pre I'm getting a slightly different stack than in comment 0 but like comment 0 its still trying to execute 0x0 which is scarier than merely dereferencing it.
bp-f96a19bc-2d64-4c0d-aad3-9a8d02091112
Updated•15 years ago
|
Summary: Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy] with popup, menuitem and body onload removing window → Crash [@ nsRootBoxFrame::RemoveFrame][@ nsFrame::Destroy][@@0x0 | nsFrameList::DestroyFrames()] with popup, menuitem and body onload removing window
Updated•15 years ago
|
Whiteboard: [sg:critical?] deleted mem in 3.0.x, null deref in newer branches? → [sg:critical?] deleted mem in 3.0.x, executing @0x0 in newer branches
Comment 8•15 years ago
|
||
In a windows 3.5.6pre debug build I get the same stack and deleted shell as 3.0.x
blocking1.9.1: --- → ?
Flags: blocking1.9.0.17?
Whiteboard: [sg:critical?] deleted mem in 3.0.x, executing @0x0 in newer branches → [sg:critical?] deleted shell obj.
Comment 9•15 years ago
|
||
The menuitem in crash1.xul with an image gets an nsImageBoxFrame, it sets up a nsImageBoxFrameEvent to run when it is loaded that dispatches the load event. This calls the onload handler for the popup, we get a ContentRemoved on the root element and we start destroying frames. We get to nsImageBoxFrame::Destroy, which calls CancelAndForgetObserver on its imgImageRequest, this is the last thing blocking the load of the document, so it triggers the document's onload handler. So we remove the subdocument frame from the parent document and this nulls out the presshell on the subdocument's prescontext. When we get back to nsImageBoxFrame::Destroy it eventually calls nsFrame::Destroy where our presshell is null.
(In reply to comment #9)
> The menuitem in crash1.xul with an image gets an nsImageBoxFrame, it sets up a
> nsImageBoxFrameEvent to run when it is loaded that dispatches the load event.
> This calls the onload handler for the popup, we get a ContentRemoved on the
> root element and we start destroying frames. We get to
> nsImageBoxFrame::Destroy, which calls CancelAndForgetObserver on its
> imgImageRequest, this is the last thing blocking the load of the document, so
> it triggers the document's onload handler.
Clearly something between nsImageBoxFrame and the triggering of onload should be asynchronous so onload doesn't fire during frame destruction.
Comment 11•15 years ago
|
||
Changing the CancelAndForgetObserver call in nsImageBoxFrame::Destroy to a Cancel call and moving the nulling out of the frame of mListener to before the Cancel call would seem like the right way to fix this. Or am I missing something?
Potentially every other call of CancelAndForgetObserver could have this same problem.
(In reply to comment #11)
> Changing the CancelAndForgetObserver call in nsImageBoxFrame::Destroy to a
> Cancel call and moving the nulling out of the frame of mListener to before the
> Cancel call would seem like the right way to fix this. Or am I missing
> something?
Sounds good to me. I don't think that nulling out needs to be moved, though.
> Potentially every other call of CancelAndForgetObserver could have this same
> problem.
Maybe. I'm not sure why CancelAndForgetObserver would want to cancel synchronously, but then I don't know this code at all.
Comment 13•15 years ago
|
||
The nulling out of the frame is because I think we can still get notifications sent to the listener object.
Oh, you mean synchronously during the Cancel() call? OK.
Comment 15•15 years ago
|
||
No, actually, asynchronously from the Cancel call.
If it's asynchronous then we don't need to move the nulling out, it's fine for it to be where it is after the Cancel() call.
Comment 17•15 years ago
|
||
I guess you're right if we don't have to worry about synchronous notifications.
![]() |
Assignee | |
Comment 18•15 years ago
|
||
> Or am I missing something?
I think you are, and the something is that imagelib holds a raw pointer weak ref to mListener, which means that once nsImageBoxFrame::~nsImageBoxFrame happens it'll be holding a pointer to a dead object and then will proceed to asynchronously call virtual methods on it... We should really just fix that with some cycle collection smarts in imagelib, imho.
But given that, you really do need to CancelAndForgetObserver(). The fact that this ends up synchronously removing from the loadgroup is really bad; it needs to be guaranteeing that doesn't happen. Joe, Bobby, can we fix that?
Updated•15 years ago
|
blocking1.9.1: ? → .7+
Flags: blocking1.9.0.17? → blocking1.9.0.17+
Whiteboard: [sg:critical?] deleted shell obj. → [sg:critical?][3.6.x] deleted shell obj.
bzbarsky said he'd be able to get to this in a week or two, I think, and he's a much better owner for it than I am.
Assignee: dbaron → bzbarsky
Updated•15 years ago
|
blocking1.9.1: .8+ → needed
Updated•15 years ago
|
Flags: blocking1.9.0.18+ → blocking1.9.0.19+
![]() |
Assignee | |
Comment 20•15 years ago
|
||
Moving over to the right component. There's no sane way to fix this in layout; it needs to happen in imagelib.
Component: Layout → ImageLib
QA Contact: layout → imagelib
![]() |
Assignee | |
Comment 21•15 years ago
|
||
This is about minimal. It fixes the crash I see on Mac. Martijn, can you try it on Windows?
Attachment #429049 -
Flags: review?(joe)
Comment 22•15 years ago
|
||
(In reply to comment #18)
> But given that, you really do need to CancelAndForgetObserver(). The fact that
> this ends up synchronously removing from the loadgroup is really bad; it needs
> to be guaranteeing that doesn't happen. Joe, Bobby, can we fix that?
Sorry I somehow missed this before.
As you proved, we *can* fix it. We have the technology. My question is: why should removing from the load group be asynchronous when everything else in that method is synchronous, by design?
More broadly stated: how in the world can we get this *right*, aside from hooking imagelib up to the cycle collector?
![]() |
Assignee | |
Comment 23•15 years ago
|
||
> why should removing from the load group be asynchronous when everything else in
> that method is synchronous, by design?
The main thing that needs to be synchronous is setting the observer to null (since it's about to die). The rest of the synchronicity is due to not wanting to break imagelib api, and therefore needing to deliver notifications before nulling out the observer. A key part here is that the observer is what receives these notifications and thus has control over how they're handled.
The loadgroup removal is NOT part of the imagelib API; it's part of the necko API. And necko defines that Cancel() on an nsIRequest is asynchronous. More importantly, though, the loadgroup removal can trigger arbitrary code to run that is not under the control of the observer, while we know for a fact that use of the CancelAndForgetObserver API means the observer is not in a good state (in the middle of a destructor, for one thing!).
> More broadly stated: how in the world can we get this *right*,
Hooking up to cycle collection is one approach, as you say. The other approach, which I think might be even better, is switching imagelib to an API more similar to Necko's, where the image would not keep a pointer to its observer for longer than absolutely necessary. In particular, if animations were pulled by consumers, not pushed by imagelib, we would not need long-term references to observers. All this modulo multipart jpegs, which we would need to do something with, of course, since right now they're exposed in imagelib API as multiple separate image loads, not as animations, right?
Comment 24•15 years ago
|
||
(In reply to comment #23)
> The loadgroup removal is NOT part of the imagelib API; it's part of the necko
> API. And necko defines that Cancel() on an nsIRequest is asynchronous. More
> importantly, though, the loadgroup removal can trigger arbitrary code to run
> that is not under the control of the observer, while we know for a fact that
> use of the CancelAndForgetObserver API means the observer is not in a good
> state (in the middle of a destructor, for one thing!).
OK, I totally buy this!
> if animations
> were pulled by consumers, not pushed by imagelib, we would not need long-term
> references to observers.
This is probably possible. Because images don't change size, no reflow would be necessary; nsRefreshDriver could tell layout to redraw, and then layout could just get imagelib to draw the current frame.
> All this modulo multipart jpegs, which we would need
> to do something with, of course, since right now they're exposed in imagelib
> API as multiple separate image loads, not as animations, right?
Right. Which is a bit silly, but before nsRefreshDriver there was really no way to force redraws, I guess.
Reporter | ||
Comment 25•15 years ago
|
||
(In reply to comment #21)
> Created an attachment (id=429049) [details]
> Proposed fix
>
> This is about minimal. It fixes the crash I see on Mac. Martijn, can you try
> it on Windows?
Yes, that seems to fix it, thanks.
Comment 26•15 years ago
|
||
Joe: can you expedite review for this blocking bug?
Whiteboard: [sg:critical?][3.6.x] deleted shell obj. → [sg:critical?][3.6.x] deleted shell obj.[needs review jdrew]
Updated•15 years ago
|
Attachment #429049 -
Flags: superreview?(vladimir)
Attachment #429049 -
Flags: review?(joe)
Attachment #429049 -
Flags: review+
Comment 27•15 years ago
|
||
Vlad, if you can't review this before Sunday, just switch the review request to roc - he can do it on Monday, aka North America's Sunday. :)
Attachment #429049 -
Flags: superreview?(vladimir) → superreview+
Comment 28•15 years ago
|
||
This doesn't need to be fixed in 1.9.0.19 unless it's also fixed in 1.9.1.9. Since it hasn't landed on trunk yet, that's unlikely, so no longer blocking for 1.9.0.19.
Flags: blocking1.9.0.19+ → blocking1.9.0.19-
![]() |
Assignee | |
Comment 29•15 years ago
|
||
Gah. The only reason this hasn't landed on trunk yet is because I was waiting on vlad's review and of course I got no notification that it had happened... I'm landing it on trunk right now.
![]() |
Assignee | |
Comment 30•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
![]() |
Assignee | |
Comment 31•15 years ago
|
||
That said, on 1.9.0 this looks like it needs a totally different fix, since CancelAndForgetRequest doesn't exist there. Joe?
![]() |
Assignee | |
Comment 32•15 years ago
|
||
Requesting approvals for what I think are the relevant branches...
Attachment #431158 -
Flags: approval1.9.2.3?
Attachment #431158 -
Flags: approval1.9.2.2?
Attachment #431158 -
Flags: approval1.9.1.9?
Attachment #431158 -
Flags: approval1.9.1.10?
![]() |
Assignee | |
Updated•15 years ago
|
blocking1.9.2: --- → ?
status1.9.2:
--- → ?
Comment 33•15 years ago
|
||
I'm going to call this status1.9.0-wontfix, because fixing it will introduce more risk than not fixing it (since it would involve backporting all the Cancel*() fixes that bz and I have cooked up over the past two years).
Updated•15 years ago
|
Whiteboard: [sg:critical?][3.6.x] deleted shell obj.[needs review jdrew] → [sg:critical?][3.6.x] deleted shell obj.
Updated•15 years ago
|
Attachment #431158 -
Flags: approval1.9.2.3?
Attachment #431158 -
Flags: approval1.9.2.2?
Attachment #431158 -
Flags: approval1.9.2.2+
Attachment #431158 -
Flags: approval1.9.1.9?
Attachment #431158 -
Flags: approval1.9.1.9+
Attachment #431158 -
Flags: approval1.9.1.10?
![]() |
Assignee | |
Comment 35•15 years ago
|
||
Comment 36•15 years ago
|
||
Verified for 1.9.1.9 using attached testcase and Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.9pre) Gecko/20100311 Shiretoko/3.5.9pre (.NET CLR 3.5.30729). It no longer crashes as it does on 1.9.1.8.
Verified for 1.9.2 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.2pre) Gecko/20100311 Namoroka/3.6.2pre (.NET CLR 3.5.30729).
Keywords: verified1.9.1,
verified1.9.2
Updated•15 years ago
|
Group: core-security
Updated•14 years ago
|
Crash Signature: [@ nsRootBoxFrame::RemoveFrame]
[@ nsFrame::Destroy]
[@@0x0 | nsFrameList::DestroyFrames()]
You need to log in
before you can comment on or make changes to this bug.
Description
•