Closed Bug 509719 Opened 16 years ago Closed 12 years ago

Crash [@ nsXULDocument::RemoveElement] with DOMAttrmodified, loadOverlay and removeelement

Categories

(Core :: XUL, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla26

People

(Reporter: martijn.martijn, Assigned: Gijs)

References

()

Details

(Keywords: crash, sec-other, testcase, Whiteboard: [sg:nse null deref])

Crash Data

Attachments

(4 files, 2 obsolete files)

Attached file zipped up testcase
See testcase, which crashes current trunk build and Firefox3.5.2 on load. http://crash-stats.mozilla.com/report/index/48af9333-8637-4022-b710-861ed2090811?p=1 0 xul.dll nsXULDocument::RemoveElement content/xul/document/src/nsXULDocument.cpp:4501 1 xul.dll xul.dll@0x3ff5dc
Flags: blocking1.9.2?
Attached file crash stack
It's a null dereference.
Flags: blocking1.9.2? → wanted1.9.2+
Group: core-security
Whiteboard: [sg:nse null deref]
Crash Signature: [@ nsXULDocument::RemoveElement]
Attached patch Patch with crashtest (obsolete) — Splinter Review
AFAICT this is the only consumer of this function anyway, so this should do. It turns out just trying to remove the top-level element of a XUL document will get you this error, so the crashtest was easy to make... I verified locally that this failed before and passed after the nullcheck.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Attachment #756995 - Flags: review?(dbaron)
Comment on attachment 756995 [details] [diff] [review] Patch with crashtest Review of attachment 756995 [details] [diff] [review]: ----------------------------------------------------------------- ::: content/xul/document/src/XULDocument.cpp @@ +4044,5 @@ > // so remove it from the actual document. > if (attr == nsGkAtoms::removeelement && > value.EqualsLiteral("true")) { > > nsCOMPtr<nsIContent> parent = aTargetNode->GetParent(); Or maybe this should be GetParentNode()?
(In reply to :Ms2ger from comment #5) > Comment on attachment 756995 [details] [diff] [review] > Patch with crashtest > > Review of attachment 756995 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: content/xul/document/src/XULDocument.cpp > @@ +4044,5 @@ > > // so remove it from the actual document. > > if (attr == nsGkAtoms::removeelement && > > value.EqualsLiteral("true")) { > > > > nsCOMPtr<nsIContent> parent = aTargetNode->GetParent(); > > Or maybe this should be GetParentNode()? I don't actually know much about this code. For my understanding, can you explain why this would be better than what the code (and/or patch) does right now?
nsINode is a base class of nsIContent. It's also a base class of nsIDocument. So if you: change nsXULDocument::RemoveElement so its first parameter is nsINode* instead of nsIContent* (probably also change InsertElement), and then change this code to call GetParentNode() (which will return the document when the parent is the document instead of returning null), you might be able to make this case use the normal codepath. I also don't particularly feel like I'm the best reviewer here, though I'm also not sure who's better.
Comment on attachment 756995 [details] [diff] [review] Patch with crashtest Marking review- because I think you should at least try the other approach. (There might be something wrong with it, though, in which case this patch might be ok.) You should probably also find a DOM peer to review the patch rather than me.
Attachment #756995 - Flags: review?(dbaron) → review-
Attached patch Patch v2 (obsolete) — Splinter Review
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #8) > Comment on attachment 756995 [details] [diff] [review] > Patch with crashtest > > Marking review- because I think you should at least try the other approach. > (There might be something wrong with it, though, in which case this patch > might be ok.) > > You should probably also find a DOM peer to review the patch rather than me. OK. The testcase that Martijn attached actually doesn't apply removeelement to the toplevel node; instead it does some funny business with onDOMAttrModified and repeated overlay loading, leading to the element that was to be removed to no longer have a parentNode by the time the overlay is applied (a second time). I figured the testcase I added (just trying to remove the top level element, which was what someone on IRC ran into, which led me to find this bug) was simpler and just as effective, and needed the same fix. However, you made a very good point of saying that for my (somewhat simpler) case, another fix would be more appropriate, so I went back and modified RemoveElement and InsertElement as you described (although I left nsIContent* as the type for the child in the InsertElement case, as we call GetAttr on it, and that's not available on nsINode, AFAICT?). This passes the crashtest in my original patch, but not the one in Martijn's original test (which I've now added as a second crashtest). So I then added my nullcheck back. I left the refactoring as well as I presumed this would still be desirable. If that needs to be reverted, let me know. Setting review? to bz this time. :-)
Attachment #756995 - Attachment is obsolete: true
Attachment #757306 - Flags: review?(bzbarsky)
Attached patch Patch v2.1Splinter Review
Of course, it sorta helps if I attach the right patch...
Attachment #757306 - Attachment is obsolete: true
Attachment #757306 - Flags: review?(bzbarsky)
Attachment #757309 - Flags: review?(bzbarsky)
Comment on attachment 757309 [details] [diff] [review] Patch v2.1 Seems reasonable. r=me
Attachment #757309 - Flags: review?(bzbarsky) → review+
So these assertions all look the same, and also happen if you have a more innocuous <script> document.loadOverlay()</script> loaded overlay. The top of the stack looks like this (from https://tbpl.mozilla.org/php/getParsedLog.php?id=23717465&full=1&branch=mozilla-inbound ): ###!!! ASSERTION: forward references have already been resolved: 'Error', file ../../../../../content/xul/document/src/XULDocument.cpp, line 1152 mozilla::dom::XULDocument::AddForwardReference(nsForwardReference*) [content/xul/document/src/XULDocument.cpp:1153] mozilla::dom::XULDocument::CreateOverlayElement(nsXULPrototypeElement*, mozilla::dom::Element**) [content/xul/document/src/XULDocument.cpp:3754] mozilla::dom::XULDocument::ResumeWalk() [content/xul/document/src/XULDocument.cpp:3010] mozilla::dom::XULDocument::EndLoad() [content/xul/document/src/XULDocument.cpp:587] XULContentSinkImpl::DidBuildModel(bool) [obj-firefox/dist/include/nsCOMPtr.h:1211] nsParser::DidBuildModel(tag_nsresult) [parser/htmlparser/src/nsParser.cpp:964] nsParser::ResumeParse(bool, bool, bool) [parser/htmlparser/src/nsParser.cpp:1569] nsParser::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult) [parser/htmlparser/src/nsParser.cpp:1958] nsBaseChannel::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult) [netwerk/base/src/nsBaseChannel.cpp:737] nsInputStreamPump::OnStateStop() [obj-firefox/dist/include/nsCOMPtr.h:1211] nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) [netwerk/base/src/nsInputStreamPump.cpp:376] nsInputStreamReadyEvent::Run() [xpcom/io/nsStreamUtils.cpp:82] nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:627] The actual assert is here: http://mxr.mozilla.org/mozilla-central/source/content/xul/document/src/XULDocument.cpp?mark=1152#1142 IMHO, we shouldn't NS_ERROR in this case. The delete still happens (and presumably should?), and avoiding the NS_ERROR case would mean teaching ResumeWalk about forward reference phases, which sounds like code duplication that we should avoid, especially in a method that is already pretty long (to the point where, at least for this crash fix, I gave up trying to understand how exactly it all works). OTOH, perhaps we think loadOverlay before the document has finished loading is really evil... In which case we should probably do something more elaborate, but that sounds like it's beyond the scope of this bug. bz, do you think it makes sense to nuke the NS_ERROR here? Do a warning instead (what is the correct macro in that case)? Or am I missing something? (totally possible, I'm hardly at home in this code...)
Flags: needinfo?(bzbarsky)
I can't claim to be at home in this code either... Is the NS_ERROR because the forward reference will get ignored because it's being added too late? Neil?
Flags: needinfo?(bzbarsky) → needinfo?(enndeakin)
What's happening here is that: - The main document loads and calls loadOverlay, starting an overlay load - The main document finishes loading and calls ResolveForwardReferences which sets mResolutionPhase to 'done' - The overlay loads, calls LoadOverlay again and tries to add it but fails because mResolutionPhase is set to 'done' I don't know what this code should be doing though. It seems however that resolving references shouldn't be done until, or should be done again when the overlays have loaded.
Flags: needinfo?(enndeakin)
(In reply to Neil Deakin from comment #16) > What's happening here is that: > > - The main document loads and calls loadOverlay, starting an overlay load > - The main document finishes loading and calls ResolveForwardReferences > which sets mResolutionPhase to 'done' > - The overlay loads, calls LoadOverlay again and tries to add it but fails > because mResolutionPhase is set to 'done' > > I don't know what this code should be doing though. It seems however that > resolving references shouldn't be done until, or should be done again when > the overlays have loaded. Neil, would you be OK with checking this in with the assertions marked as expected, and doing work to fix this issue with the overlays in a separate bug? I think we should probably not hold back the crash fix in order to fix existing assertions.
Flags: needinfo?(enndeakin)
(In reply to :Gijs Kruitbosch from comment #17) > Neil, would you be OK with checking this in with the assertions marked as > expected, and doing work to fix this issue with the overlays in a separate > bug? I think we should probably not hold back the crash fix in order to fix > existing assertions. Caught Neil on IRC, he said this would be OK. Stupidly enough, this bug doesn't have a record of how many assertions fired... Locally on my OS X box, I see 3. I've fired off a try run on Linux to see what happens there: https://tbpl.mozilla.org/?tree=Try&rev=df4eaede0a6c
Flags: needinfo?(enndeakin)
Flags: in-testsuite?
Flags: in-testsuite+
Blocks: 909819
(In reply to :Gijs Kruitbosch from comment #18) > Locally on my OS X > box, I see 3. I've fired off a try run on Linux to see what happens there: > https://tbpl.mozilla.org/?tree=Try&rev=df4eaede0a6c Those gave 3, and my local Windows test also had 3. Noted 3 asserts and landed as http://hg.mozilla.org/integration/mozilla-inbound/rev/e7aed8a5aff6
Flags: in-testsuite? → in-testsuite+
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: