Crash [@ nsXULDocument::RemoveElement] with DOMAttrmodified, loadOverlay and removeelement

RESOLVED FIXED in mozilla26

Status

()

--
critical
RESOLVED FIXED
9 years ago
5 years ago

People

(Reporter: martijn.martijn, Assigned: Gijs)

Tracking

(Blocks: 1 bug, {crash, sec-other, testcase})

Trunk
mozilla26
x86
Windows XP
crash, sec-other, testcase
Points:
---
Bug Flags:
wanted1.9.2 +
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [sg:nse null deref], crash signature, URL)

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

9 years ago
Created attachment 393780 [details]
zipped up testcase

See testcase, which crashes current trunk build and Firefox3.5.2 on load.

http://crash-stats.mozilla.com/report/index/48af9333-8637-4022-b710-861ed2090811?p=1
0  	xul.dll  	nsXULDocument::RemoveElement  	 content/xul/document/src/nsXULDocument.cpp:4501
1 	xul.dll 	xul.dll@0x3ff5dc
(Reporter)

Updated

9 years ago
Flags: blocking1.9.2?
Created attachment 399479 [details]
crash stack

It's a null dereference.
Flags: blocking1.9.2? → wanted1.9.2+

Updated

9 years ago
Group: core-security
Whiteboard: [sg:nse null deref]
Crash Signature: [@ nsXULDocument::RemoveElement]
(Assignee)

Comment 4

6 years ago
Created attachment 756995 [details] [diff] [review]
Patch with crashtest

AFAICT this is the only consumer of this function anyway, so this should do. It turns out just trying to remove the top-level element of a XUL document will get you this error, so the crashtest was easy to make... I verified locally that this failed before and passed after the nullcheck.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Attachment #756995 - Flags: review?(dbaron)
Comment on attachment 756995 [details] [diff] [review]
Patch with crashtest

Review of attachment 756995 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/xul/document/src/XULDocument.cpp
@@ +4044,5 @@
>          // so remove it from the actual document.
>          if (attr == nsGkAtoms::removeelement &&
>              value.EqualsLiteral("true")) {
>  
>              nsCOMPtr<nsIContent> parent = aTargetNode->GetParent();

Or maybe this should be GetParentNode()?
(Assignee)

Comment 6

6 years ago
(In reply to :Ms2ger from comment #5)
> Comment on attachment 756995 [details] [diff] [review]
> Patch with crashtest
> 
> Review of attachment 756995 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/xul/document/src/XULDocument.cpp
> @@ +4044,5 @@
> >          // so remove it from the actual document.
> >          if (attr == nsGkAtoms::removeelement &&
> >              value.EqualsLiteral("true")) {
> >  
> >              nsCOMPtr<nsIContent> parent = aTargetNode->GetParent();
> 
> Or maybe this should be GetParentNode()?

I don't actually know much about this code. For my understanding, can you explain why this would be better than what the code (and/or patch) does right now?
nsINode is a base class of nsIContent.  It's also a base class of nsIDocument.  So if you:

change nsXULDocument::RemoveElement so its first parameter is nsINode* instead of nsIContent* (probably also change InsertElement), and then change this code to call GetParentNode() (which will return the document when the parent is the document instead of returning null), you might be able to make this case use the normal codepath.

I also don't particularly feel like I'm the best reviewer here, though I'm also not sure who's better.
Comment on attachment 756995 [details] [diff] [review]
Patch with crashtest

Marking review- because I think you should at least try the other approach.  (There might be something wrong with it, though, in which case this patch might be ok.)

You should probably also find a DOM peer to review the patch rather than me.
Attachment #756995 - Flags: review?(dbaron) → review-
(Assignee)

Comment 9

6 years ago
Created attachment 757306 [details] [diff] [review]
Patch v2

(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #8)
> Comment on attachment 756995 [details] [diff] [review]
> Patch with crashtest
> 
> Marking review- because I think you should at least try the other approach. 
> (There might be something wrong with it, though, in which case this patch
> might be ok.)
> 
> You should probably also find a DOM peer to review the patch rather than me.

OK. The testcase that Martijn attached actually doesn't apply removeelement to the toplevel node; instead it does some funny business with onDOMAttrModified and repeated overlay loading, leading to the element that was to be removed to no longer have a parentNode by the time the overlay is applied (a second time). I figured the testcase I added (just trying to remove the top level element, which was what someone on IRC ran into, which led me to find this bug) was simpler and just as effective, and needed the same fix.

However, you made a very good point of saying that for my (somewhat simpler) case, another fix would be more appropriate, so I went back and modified RemoveElement and InsertElement as you described (although I left nsIContent* as the type for the child in the InsertElement case, as we call GetAttr on it, and that's not available on nsINode, AFAICT?). This passes the crashtest in my original patch, but not the one in Martijn's original test (which I've now added as a second crashtest).

So I then added my nullcheck back. I left the refactoring as well as I presumed this would still be desirable. If that needs to be reverted, let me know. Setting review? to bz this time. :-)
Attachment #756995 - Attachment is obsolete: true
Attachment #757306 - Flags: review?(bzbarsky)
(Assignee)

Comment 10

6 years ago
Created attachment 757309 [details] [diff] [review]
Patch v2.1

Of course, it sorta helps if I attach the right patch...
Attachment #757306 - Attachment is obsolete: true
Attachment #757306 - Flags: review?(bzbarsky)
Attachment #757309 - Flags: review?(bzbarsky)
Comment on attachment 757309 [details] [diff] [review]
Patch v2.1

Seems reasonable.  r=me
Attachment #757309 - Flags: review?(bzbarsky) → review+
(Assignee)

Comment 14

6 years ago
So these assertions all look the same, and also happen if you have a more innocuous <script> document.loadOverlay()</script> loaded overlay. The top of the stack looks like this (from https://tbpl.mozilla.org/php/getParsedLog.php?id=23717465&full=1&branch=mozilla-inbound ):

###!!! ASSERTION: forward references have already been resolved: 'Error', file ../../../../../content/xul/document/src/XULDocument.cpp, line 1152
mozilla::dom::XULDocument::AddForwardReference(nsForwardReference*) [content/xul/document/src/XULDocument.cpp:1153]
mozilla::dom::XULDocument::CreateOverlayElement(nsXULPrototypeElement*, mozilla::dom::Element**) [content/xul/document/src/XULDocument.cpp:3754]
mozilla::dom::XULDocument::ResumeWalk() [content/xul/document/src/XULDocument.cpp:3010]
mozilla::dom::XULDocument::EndLoad() [content/xul/document/src/XULDocument.cpp:587]
XULContentSinkImpl::DidBuildModel(bool) [obj-firefox/dist/include/nsCOMPtr.h:1211]
nsParser::DidBuildModel(tag_nsresult) [parser/htmlparser/src/nsParser.cpp:964]
nsParser::ResumeParse(bool, bool, bool) [parser/htmlparser/src/nsParser.cpp:1569]
nsParser::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult) [parser/htmlparser/src/nsParser.cpp:1958]
nsBaseChannel::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult) [netwerk/base/src/nsBaseChannel.cpp:737]
nsInputStreamPump::OnStateStop() [obj-firefox/dist/include/nsCOMPtr.h:1211]
nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/base/src/nsInputStreamPump.cpp:376]
nsInputStreamReadyEvent::Run() [xpcom/io/nsStreamUtils.cpp:82]
nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:627]

The actual assert is here:

http://mxr.mozilla.org/mozilla-central/source/content/xul/document/src/XULDocument.cpp?mark=1152#1142

IMHO, we shouldn't NS_ERROR in this case. The delete still happens (and presumably should?), and avoiding the NS_ERROR case would mean teaching ResumeWalk about forward reference phases, which sounds like code duplication that we should avoid, especially in a method that is already pretty long (to the point where, at least for this crash fix, I gave up trying to understand how exactly it all works).

OTOH, perhaps we think loadOverlay before the document has finished loading is really evil... In which case we should probably do something more elaborate, but that sounds like it's beyond the scope of this bug.

bz, do you think it makes sense to nuke the NS_ERROR here? Do a warning instead (what is the correct macro in that case)? Or am I missing something? (totally possible, I'm hardly at home in this code...)
Flags: needinfo?(bzbarsky)
I can't claim to be at home in this code either...  Is the NS_ERROR because the forward reference will get ignored because it's being added too late?

Neil?
Flags: needinfo?(bzbarsky) → needinfo?(enndeakin)

Comment 16

5 years ago
What's happening here is that:

- The main document loads and calls loadOverlay, starting an overlay load
- The main document finishes loading and calls ResolveForwardReferences which sets mResolutionPhase to 'done'
- The overlay loads, calls LoadOverlay again and tries to add it but fails because mResolutionPhase is set to 'done'

I don't know what this code should be doing though. It seems however that resolving references shouldn't be done until, or should be done again when the overlays have loaded.
Flags: needinfo?(enndeakin)
(Assignee)

Comment 17

5 years ago
(In reply to Neil Deakin from comment #16)
> What's happening here is that:
> 
> - The main document loads and calls loadOverlay, starting an overlay load
> - The main document finishes loading and calls ResolveForwardReferences
> which sets mResolutionPhase to 'done'
> - The overlay loads, calls LoadOverlay again and tries to add it but fails
> because mResolutionPhase is set to 'done'
> 
> I don't know what this code should be doing though. It seems however that
> resolving references shouldn't be done until, or should be done again when
> the overlays have loaded.

Neil, would you be OK with checking this in with the assertions marked as expected, and doing work to fix this issue with the overlays in a separate bug? I think we should probably not hold back the crash fix in order to fix existing assertions.
Flags: needinfo?(enndeakin)
(Assignee)

Comment 18

5 years ago
(In reply to :Gijs Kruitbosch from comment #17)
> Neil, would you be OK with checking this in with the assertions marked as
> expected, and doing work to fix this issue with the overlays in a separate
> bug? I think we should probably not hold back the crash fix in order to fix
> existing assertions.

Caught Neil on IRC, he said this would be OK. Stupidly enough, this bug doesn't have a record of how many assertions fired... Locally on my OS X box, I see 3. I've fired off a try run on Linux to see what happens there: https://tbpl.mozilla.org/?tree=Try&rev=df4eaede0a6c
Flags: needinfo?(enndeakin)
Flags: in-testsuite?
Flags: in-testsuite+
(Assignee)

Updated

5 years ago
Blocks: 909819
(Assignee)

Comment 19

5 years ago
(In reply to :Gijs Kruitbosch from comment #18)
> Locally on my OS X
> box, I see 3. I've fired off a try run on Linux to see what happens there:
> https://tbpl.mozilla.org/?tree=Try&rev=df4eaede0a6c

Those gave 3, and my local Windows test also had 3. Noted 3 asserts and landed as http://hg.mozilla.org/integration/mozilla-inbound/rev/e7aed8a5aff6
Flags: in-testsuite? → in-testsuite+
https://hg.mozilla.org/mozilla-central/rev/e7aed8a5aff6
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
You need to log in before you can comment on or make changes to this bug.