Closed Bug 284330 Opened 20 years ago Closed 20 years ago

Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge] (remove and replace broken in XUL overlays)

Categories

(Core :: XUL, defect, P1)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla1.8beta2

People

(Reporter: mnyromyr, Assigned: bugs)

References

Details

(Keywords: crash, regression, topcrash+)

Crash Data

Attachments

(1 file, 1 obsolete file)

[already 18 similar reports in SM/TB/FF since 2005-02-24: <http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=contains&searchfor=nsXULDocument%3A%3AOverlayForwardReference%3A%3AMerge&vendor=All&product=All&platform=All&buildid=&sdate=&stime=&edate=&etime=&sortby=bbid>] Mozilla 1.8b2 [Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8b2) Gecko/20050228] crashes when started by "mozilla -mail", e.g. with the Mnenhy extension is installed. I could debug the crash up to this piece of code where an <overlay> child is applied to the document tree [http://lxr.mozilla.org/mozilla/source/content/xul/document/src/nsXULDocument.cpp##3963]: 3963 nsCOMPtr<nsIDOMNode> nodeParent; 3964 rv = nodeInDocument->GetParentNode(getter_AddRefs(nodeParent)); 3965 if (NS_FAILED(rv)) return rv; 3966 nsCOMPtr<nsIDOMElement> elementParent(do_QueryInterface(nodeParent)); 3967 3968 nsAutoString parentID; 3969 elementParent->GetAttribute(NS_LITERAL_STRING("id"), parentID); GetParentNode in 3964 sets nodeParent to null, but does not set rv into an error state. QIing doesn't help, ergo BOOM in 3969... JFTR: In the case of Mnenhy, the triggering XUL code is this [http://www.mozdev.org/source/browse/mnenhy/src/bin/chrome/mnenhy/content/mnenhy/headers/mnenhy-headers-msgHdrViewOverlay.xul?rev=1.9&content-type=text/x-cvsweb-markup]: <overlay xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" id ="mnenhyHeadersMsgHdrViewOverlay" > <script type="application/x-javascript" src="chrome://mnenhy/content/mnenhy-loader.js"/> <script type="application/x-javascript" src="chrome://mnenhy-headers/content/mnenhy-headers-msgHdrViewOverlay-loader.js"/> <script type="application/x-javascript"> <![CDATA[ [... skipped a formerly valid skript ...] ]]> </script> <!-- expanded view --> <hbox id="expandedHeaderView"> [... skipped more formerly valid XUL ...] </hbox> </overlay>
Wrt the example XUL code: the crash appears when checking the hbox wit id="expandedHeaderView".
> GetParentNode in 3964 sets nodeParent to null, but does not set rv into an > error state That's the right behavior for GetParentNode. It looks like the only crash on 2005-02-24 was with a Firefox 1.0 build. All the other crashes with this stack are from builds with bug 282103 fixed... Karsten can you confirm that the crash starts when that patch is checked in? Or was it something else?
Assignee: nobody → bugs
The nightly Mozilla 1.8b2 [Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b2) Gecko/20050225; BuildID=2005022504] does *not* crash, but BuildID=2005022605 does. This period does indeed contain the checkins to bug 282103: <http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-02-24+05%3A00%3A00&maxdate=2005-02-25+07%3A00%3A00&cvsroot=%2Fcvsroot> Backing out attachment 175054 [details] [diff] [review] fixes the crash.
Blocks: 282103
Keywords: topcrash
Fixing summary and adding a few sets of recent Trunk Talkback data for future reference: Count Offset Real Signature [ 17 nsXULDocument::OverlayForwardReference::Merge 27de07fa - nsXULDocument::OverlayForwardReference::Merge ] Crash date range: 01-MAR-05 to 28-FEB-05 Min/Max Seconds since last crash: 1 - 4953 Min/Max Runtime: 3 - 5008 Count Platform List 13 Windows XP [Windows NT 5.1 build 2600] 4 Windows 2K [Windows NT 5.0 build 2195] Count Build Id List 7 2005022705 3 2005030305 3 2005030105 2 2005022805 1 2005030405 1 2005022504 No of Unique Users 6 Stack trace(Frame) nsXULDocument::OverlayForwardReference::Merge [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp line 3969] nsXULDocument::OverlayForwardReference::Resolve [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp line 3831] nsXULDocument::ResolveForwardReferences [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp line 1352] nsXULDocument::ResumeWalk [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp line 3155] nsXULDocument::EndLoad [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp line 742] XULContentSinkImpl::DidBuildModel [c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULContentSink.cpp line 407] (4101645) Comments: Start MailNews after installing mnenhy-0.7.1.10005.xpi. (4045178) Comments: mnenhy + mail news (4044933) Comments: mnenhy + mail news (4035880) Comments: try to open mail/news (4035772) Comments: Crashed while try to open mail/news (4016548) Comments: -mail +mnenhy (3934753) Comments: Swtch from browser to mailnews (mnenhy installed as sole extension) ==================================================================================================== Count Offset Real Signature [ 2 nsXULDocument::OverlayForwardReference::Merge() 90c92935 - nsXULDocument::OverlayForwardReference::Merge() ] Crash date range: 01-MAR-05 to 01-MAR-05 Min/Max Seconds since last crash: 0 - 0 Min/Max Runtime: 1 - 1 Count Platform List 2 [Linux 2.6.8.1-10mdk] Count Build Id List 2 2005022801 No of Unique Users 1 Stack trace(Frame) nsXULDocument::OverlayForwardReference::Merge() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp line 848] nsXULDocument::OverlayForwardReference::Resolve() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp line 3831] nsXULDocument::ResolveForwardReferences() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp line 1352] nsXULDocument::ResumeWalk() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp line 3154] nsXULDocument::OnPrototypeLoadDone() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp line 768] nsXULPrototypeDocument::NotifyLoadDone() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULPrototypeDocument.cpp line 752] nsXULDocument::EndLoad() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp line 733] XULContentSinkImpl::DidBuildModel() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULContentSink.cpp line 713] nsExpatDriver::DidBuildModel() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsExpatDriver.cpp line 713] nsParser::DidBuildModel() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsParser.cpp line 842] nsParser::ResumeParse() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsParser.cpp line 1994] nsParser::OnStopRequest() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsParser.cpp line 830] nsJARChannel::OnStopRequest() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/modules/libjar/nsJARChannel.cpp line 713] nsInputStreamPump::OnStateStop() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/netwerk/base/src/nsInputStreamPump.cpp line 713] nsInputStreamPump::OnInputStreamReady() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/netwerk/base/src/nsInputStreamPump.cpp line 343] nsInputStreamReadyEvent::EventHandler() PL_HandleEvent() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/xpcom/threads/plevent.c line 698] PL_ProcessPendingEvents() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/xpcom/threads/plevent.c line 633] nsEventQueueImpl::ProcessPendingEvents() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/xpcom/threads/nsEventQueue.cpp line 417] event_processor_callback() [/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/widget/src/gtk2/nsAppShell.cpp line 71] This is a topcrasher for both Mozilla and Thunderbird Trunks. Here is a link to all current Talkback data for this crash: http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=contains&searchfor=nsXULDocument%3A%3AOverlayForwardReference%3A%3AMerge&vendor=All&product=All&platform=All&buildid=&sdate=&stime=&edate=&etime=&sortby=bbid Looks like a regression that happened around 2/25-2/27...adding regression and zt4newcrash keywords. Ben had a checkin around the area of the crash on 2/25: 1.647 <ben@bengoodger.com> 2005-02-25 01:06 282103 - dynamic overlays: implement "loadOverlay" method on nsIDOMXULDocument which loads and merges an overlay into a XUL document. r=jst sr=bryner
Summary: Crash in nsXULDocument::OverlayForwardReference::Merge [@27de07fa] → Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge]
I'm building my current nightlys applying the patch from http://www.triffids.de/pub/mozilla/patch/patch-mnenh_crash which reverts https://bugzilla.mozilla.org/attachment.cgi?id=175054. The crash is gone. Couldn't wait longer because mozilla without mnenhy is, eh, ... urks. ;)
Flags: blocking-aviary1.1?
Keywords: topcrashtopcrash+
So its seems, that we all have to wait until you're back in town... Is there nobody else 'in town' how can fix this bug?
After some more digging down, it seems that the "killing factor" here is this now broken XUL construction in the overlay: <hbox id="expandedHeaderView"> <!-- exchange the expandedHeaders vbox to avoid mismatching item types --> <vbox id="expandedHeaders" removeelement="true"/> <vbox id="expandedHeaders" persist="MnenhyState" flex="1" position="2"> ... The nodeInDocument (= the element to be removed in the actual document) in line 3964 does not have a parent... If I remove the "removeelement" attribute, no crash appears (here, but at the next "removeelement" attribute :-/)... tbc...
So, I guess I found what's causing the problem: The changes if (attr == nsXULAtoms::removeelement && value.EqualsLiteral("true")) { - rv = RemoveElement(aTargetNode->GetParent(), aTargetNode); + rv = RemoveElement(aTargetNode->GetParent(), aTargetNode, aNotify); if (NS_FAILED(rv)) return rv; and -nsXULDocument::RemoveElement(nsIContent* aParent, nsIContent* aChild) +nsXULDocument::RemoveElement(nsIContent* aParent, nsIContent* aChild, PRBool aNotify) { PRInt32 nodeOffset = aParent->IndexOf(aChild); - return aParent->RemoveChildAt(nodeOffset, PR_TRUE); + return aParent->RemoveChildAt(nodeOffset, aNotify); remove an element with 'removeelement' attribute just from the childNodes, but not from the mElementMap: aNotify is false, resulting in ContentRemoved *not* to be called by nsXULElement::RemoveChildAt [nsXULElement.cpp(1176)]: if (aNotify && doc) { doc->ContentRemoved(this, oldKid, aIndex); } Then GetElementById still finds the node in its map, but the node itself has no parent anymore ==> BOOM!
A possible solution.
Attachment #177957 - Flags: superreview?(bryner)
Attachment #177957 - Flags: review?(bugs)
Assignee: bugs → nobody
Assignee: nobody → bugs
Status: NEW → ASSIGNED
Flags: blocking-aviary1.1? → blocking-aviary1.1+
Priority: -- → P1
Um... We're doing RemoveElementAt() while the DOM is in an inconsistent state (some content has been added, but not notified on yet)? And using DOM methods while we're in that state? Both sound like really bad ideas to me... (that is, all the code involved looks really suspect; while this patch probably fixes the immediate crash, the code in general probably has other issues).
Be it as it may (I'm really absolutely no expert in this area of code), crashing on basic XUL overlay features like deleting and replacing elements for four weeks now is just intolerable. So we either should get at least the crash fixed or the patch of bug 282103 backed out, IMO. Blocking aviary 1.1 doesn't really help, because that may leave the trunk broken for some more months... Ben?
Summary: Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge] → Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge] (remove and replace broken in XUL overlays)
This needs to block 1.8b2, actually...
Flags: blocking1.8b2?
Actually if i look at the policy for the zt4newcrash keyword (this crash here is #2 on topcrasher list), the checkin from Bug 282103 needs to be backed out now.
I'll look at this this week.
ben is working on this today
I believe the solution to be pretty much what Karsten posted. The reason why is that the change to using aNotify for RemoveElement was probably a mistake. When implementing the Dynamic Overlay patch, I used aNotify to allow the various sections of code that would otherwise not notify the FE of their creation to do so so that when overlays were loaded after the document was displayed that the dynamic modifications would be shown. Basically, aNotify was used to allow PR_TRUE to be passed in some cases, where PR_FALSE had been supplied exclusively in the past. With RemoveElement, PR_TRUE was the only value ever passed. An alternative is to do the RemoveSubtreeFromDocument call manually in OverlayForwardReference::Resolve when aNotify is PR_FALSE. That may be more desirable in the pre-construction case where we may not want to notify (bz? -- the implementation of "removeelement" has worked like this for years...) At any rate, rolling back the use of aNotify in this instance seems to fix the bug.
Attachment #177957 - Attachment is obsolete: true
Attachment #179224 - Flags: superreview?(bryner)
Attachment #179224 - Flags: review?(bzbarsky)
> bz? -- the implementation of "removeelement" has worked like this for years... Arguably, so has XUL in general, and I can probably point you to a half-dozen crashers in core XUL without any effort. Just because people usually don't do something, doesn't mean assuming they never will is a good idea.... I'd definitely like to see a followup bug on reconciling what this code is trying to do (which then needs documenting) with what invariants need to be maintained to keep the DOM from crashing.
Comment on attachment 179224 [details] [diff] [review] more complete patch r=bzbarsky, I guess....
Attachment #179224 - Flags: review?(bzbarsky) → review+
Attachment #179224 - Flags: superreview?(bryner) → superreview+
Attachment #177957 - Flags: superreview?(bryner)
Attachment #177957 - Flags: review?(bugs)
So this topcrasher has sufficient review flags to be checked in for a halfweek now - anything else missing to get the trunk usable again? ;-)
I can confirm the crash with current trunk builds of Thunderbird (Linux). OS => ALL.
OS: Windows 2000 → All
(In reply to comment #21) > I can confirm the crash with current trunk builds of Thunderbird (Linux). No wonder, cause nobody check in the existing patch. Do we realy need to wait until Ben do that? Rue -just asking- diger
Attachment #179224 - Flags: approval1.8b2?
Comment on attachment 179224 [details] [diff] [review] more complete patch Approved for 1.8b2 trunk landing -- Ben, please do this ASAP. /be
Attachment #179224 - Flags: approval1.8b2? → approval1.8b2+
checked in.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Flags: blocking1.8b2?
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla1.8beta2
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: xptoolkit.widgets
Crash Signature: [@ nsXULDocument::OverlayForwardReference::Merge]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: