Closed Bug 284330 Opened 20 years ago Closed 20 years ago

Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge] (remove and replace broken in XUL overlays)

Categories

(Core :: XUL, defect, P1)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla1.8beta2

People

(Reporter: mnyromyr, Assigned: bugs)

References

Details

(Keywords: crash, regression, topcrash+)

Crash Data

Attachments

(1 file, 1 obsolete file)

[already 18 similar reports in SM/TB/FF since 2005-02-24:
<http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=contains&searchfor=nsXULDocument%3A%3AOverlayForwardReference%3A%3AMerge&vendor=All&product=All&platform=All&buildid=&sdate=&stime=&edate=&etime=&sortby=bbid>]

Mozilla 1.8b2 [Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8b2)
Gecko/20050228] crashes when started by "mozilla -mail", e.g. with the Mnenhy
extension is installed. I could debug the crash up to this piece of code where
an <overlay> child is applied to the document tree
[http://lxr.mozilla.org/mozilla/source/content/xul/document/src/nsXULDocument.cpp##3963]:

3963             nsCOMPtr<nsIDOMNode> nodeParent;
3964             rv = nodeInDocument->GetParentNode(getter_AddRefs(nodeParent));
3965             if (NS_FAILED(rv)) return rv;
3966             nsCOMPtr<nsIDOMElement>
                     elementParent(do_QueryInterface(nodeParent));
3967 
3968             nsAutoString parentID;
3969             elementParent->GetAttribute(NS_LITERAL_STRING("id"), parentID);

GetParentNode in 3964 sets nodeParent to null, but does not set rv into an error
state. QIing doesn't help, ergo BOOM in 3969...




JFTR: In the case of Mnenhy, the triggering XUL code is this
[http://www.mozdev.org/source/browse/mnenhy/src/bin/chrome/mnenhy/content/mnenhy/headers/mnenhy-headers-msgHdrViewOverlay.xul?rev=1.9&content-type=text/x-cvsweb-markup]:

<overlay xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
         id   ="mnenhyHeadersMsgHdrViewOverlay" >
  <script type="application/x-javascript"
src="chrome://mnenhy/content/mnenhy-loader.js"/>
  <script type="application/x-javascript"
src="chrome://mnenhy-headers/content/mnenhy-headers-msgHdrViewOverlay-loader.js"/>
  <script type="application/x-javascript">
  <![CDATA[
[... skipped a formerly valid skript ...]
  ]]>
  </script>
  <!-- expanded view -->
  <hbox id="expandedHeaderView">
[... skipped more formerly valid XUL ...]
  </hbox>
</overlay>
Wrt the example XUL code: the crash appears when checking the hbox wit
id="expandedHeaderView".
> GetParentNode in 3964 sets nodeParent to null, but does not set rv into an
> error state

That's the right behavior for GetParentNode.

It looks like the only crash on 2005-02-24 was with a Firefox 1.0 build.  All
the other crashes with this stack are from builds with bug 282103 fixed...
Karsten can you confirm that the crash starts when that patch is checked in?  Or
was it something else?
Assignee: nobody → bugs
The nightly Mozilla 1.8b2 [Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
rv:1.8b2) Gecko/20050225; BuildID=2005022504] does *not* crash, but
BuildID=2005022605 does.

This period does indeed contain the checkins to bug 282103:
<http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-02-24+05%3A00%3A00&maxdate=2005-02-25+07%3A00%3A00&cvsroot=%2Fcvsroot>

Backing out attachment 175054 [details] [diff] [review] fixes the crash.
Blocks: 282103
Keywords: topcrash
Fixing summary and adding a few sets of recent Trunk Talkback data for future
reference:

Count   Offset    Real Signature
[ 17   nsXULDocument::OverlayForwardReference::Merge 27de07fa -
nsXULDocument::OverlayForwardReference::Merge ]
 
     Crash date range: 01-MAR-05 to 28-FEB-05
     Min/Max Seconds since last crash: 1 - 4953
     Min/Max Runtime: 3 - 5008
 
     Count   Platform List 
     13   Windows XP [Windows NT 5.1 build 2600] 
     4   Windows 2K [Windows NT 5.0 build 2195] 
 
     Count   Build Id List 
     7   2005022705
     3   2005030305
     3   2005030105
     2   2005022805
     1   2005030405
     1   2005022504
 
     No of Unique Users         6
 
 Stack trace(Frame) 

	 nsXULDocument::OverlayForwardReference::Merge
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 3969] 
	 nsXULDocument::OverlayForwardReference::Resolve
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 3831] 
	 nsXULDocument::ResolveForwardReferences
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 1352] 
	 nsXULDocument::ResumeWalk
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 3155] 
	 nsXULDocument::EndLoad
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 742] 
	 XULContentSinkImpl::DidBuildModel
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/content/xul/document/src/nsXULContentSink.cpp
 line 407]  
 
     (4101645)	Comments: Start MailNews after installing mnenhy-0.7.1.10005.xpi.
     (4045178)	Comments: mnenhy + mail news
     (4044933)	Comments: mnenhy + mail news
     (4035880)	Comments: try to open mail/news
     (4035772)	Comments: Crashed while try to open mail/news
     (4016548)	Comments: -mail +mnenhy
     (3934753)	Comments: Swtch from browser to mailnews (mnenhy installed as
sole extension)
 
====================================================================================================
     Count   Offset    Real Signature
[ 2   nsXULDocument::OverlayForwardReference::Merge() 90c92935 -
nsXULDocument::OverlayForwardReference::Merge() ]
 
     Crash date range: 01-MAR-05 to 01-MAR-05
     Min/Max Seconds since last crash: 0 - 0
     Min/Max Runtime: 1 - 1
 
     Count   Platform List 
     2   [Linux 2.6.8.1-10mdk]      
 
     Count   Build Id List 
     2   2005022801
 
     No of Unique Users         1
 
 Stack trace(Frame) 

	 nsXULDocument::OverlayForwardReference::Merge()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 848] 
	 nsXULDocument::OverlayForwardReference::Resolve()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 3831] 
	 nsXULDocument::ResolveForwardReferences()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 1352] 
	 nsXULDocument::ResumeWalk()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 3154] 
	 nsXULDocument::OnPrototypeLoadDone()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 768] 
	 nsXULPrototypeDocument::NotifyLoadDone()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULPrototypeDocument.cpp
 line 752] 
	 nsXULDocument::EndLoad()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULDocument.cpp
 line 733] 
	 XULContentSinkImpl::DidBuildModel()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/content/xul/document/src/nsXULContentSink.cpp
 line 713] 
	 nsExpatDriver::DidBuildModel()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsExpatDriver.cpp
 line 713] 
	 nsParser::DidBuildModel()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsParser.cpp
 line 842] 
	 nsParser::ResumeParse()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsParser.cpp
 line 1994] 
	 nsParser::OnStopRequest()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/parser/htmlparser/src/nsParser.cpp
 line 830] 
	 nsJARChannel::OnStopRequest()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/modules/libjar/nsJARChannel.cpp
 line 713] 
	 nsInputStreamPump::OnStateStop()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/netwerk/base/src/nsInputStreamPump.cpp
 line 713] 
	 nsInputStreamPump::OnInputStreamReady()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/netwerk/base/src/nsInputStreamPump.cpp
 line 343] 
	 nsInputStreamReadyEvent::EventHandler()  
	 PL_HandleEvent()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/xpcom/threads/plevent.c
 line 698] 
	 PL_ProcessPendingEvents()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/xpcom/threads/plevent.c
 line 633] 
	 nsEventQueueImpl::ProcessPendingEvents()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/xpcom/threads/nsEventQueue.cpp
 line 417] 
	 event_processor_callback()
[/builds/tinderbox/SeaMonkey-Release/Linux_2.4.20-28.8_Depend/mozilla/widget/src/gtk2/nsAppShell.cpp
 line 71]  
 
This is a topcrasher for both Mozilla and Thunderbird Trunks. Here is a link to
all current Talkback data for this crash:
http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=contains&searchfor=nsXULDocument%3A%3AOverlayForwardReference%3A%3AMerge&vendor=All&product=All&platform=All&buildid=&sdate=&stime=&edate=&etime=&sortby=bbid

Looks like a regression that happened around 2/25-2/27...adding regression and
zt4newcrash keywords. Ben had a checkin around the area of the crash on 2/25:
1.647 <ben@bengoodger.com> 2005-02-25 01:06
282103 - dynamic overlays: implement "loadOverlay" method on nsIDOMXULDocument
which loads and merges an overlay into a XUL document. r=jst sr=bryner
Summary: Crash in nsXULDocument::OverlayForwardReference::Merge [@27de07fa] → Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge]
I'm building my current nightlys applying the patch from
http://www.triffids.de/pub/mozilla/patch/patch-mnenh_crash which reverts
https://bugzilla.mozilla.org/attachment.cgi?id=175054.

The crash is gone. Couldn't wait longer because mozilla without mnenhy is, eh,
... urks. ;)
Flags: blocking-aviary1.1?
Keywords: topcrashtopcrash+
So its seems, that we all have to wait until you're back in town...

Is there nobody else 'in town' how can fix this bug?
After some more digging down, it seems that the "killing factor" here is this
now broken XUL construction in the overlay:

  <hbox id="expandedHeaderView">
    <!-- exchange the expandedHeaders vbox to avoid mismatching item types -->
    <vbox id="expandedHeaders" removeelement="true"/>
    <vbox id="expandedHeaders" persist="MnenhyState" flex="1" position="2">
      ...

The nodeInDocument (= the element to be removed in the actual document) in line
3964 does not have a parent...

If I remove the "removeelement" attribute, no crash appears (here, but at the
next "removeelement" attribute :-/)...

tbc...
So, I guess I found what's causing the problem:

The changes

         if (attr == nsXULAtoms::removeelement &&
             value.EqualsLiteral("true")) {
 
-            rv = RemoveElement(aTargetNode->GetParent(), aTargetNode);
+            rv = RemoveElement(aTargetNode->GetParent(), aTargetNode, aNotify);
             if (NS_FAILED(rv)) return rv;

and

-nsXULDocument::RemoveElement(nsIContent* aParent, nsIContent* aChild)
+nsXULDocument::RemoveElement(nsIContent* aParent, nsIContent* aChild, PRBool
aNotify)
 {
     PRInt32 nodeOffset = aParent->IndexOf(aChild);
 
-    return aParent->RemoveChildAt(nodeOffset, PR_TRUE);
+    return aParent->RemoveChildAt(nodeOffset, aNotify);

remove an element with 'removeelement' attribute just from the childNodes, but
not from the mElementMap: aNotify is false, resulting in ContentRemoved *not* to
be called by nsXULElement::RemoveChildAt [nsXULElement.cpp(1176)]:

    if (aNotify && doc) {
        doc->ContentRemoved(this, oldKid, aIndex);
    }

Then GetElementById still finds the node in its map, but the node itself has no
parent anymore ==> BOOM!
A possible solution.
Attachment #177957 - Flags: superreview?(bryner)
Attachment #177957 - Flags: review?(bugs)
Assignee: bugs → nobody
Assignee: nobody → bugs
Status: NEW → ASSIGNED
Flags: blocking-aviary1.1? → blocking-aviary1.1+
Priority: -- → P1
Um...  We're doing RemoveElementAt() while the DOM is in an inconsistent state
(some content has been added, but not notified on yet)?  And using DOM methods
while we're in that state?

Both sound like really bad ideas to me... (that is, all the code involved looks
really suspect; while this patch probably fixes the immediate crash, the code in
general probably has other issues).
Be it as it may (I'm really absolutely no expert in this area of code), crashing
on basic XUL overlay features like deleting and replacing elements for four
weeks now is just intolerable.

So we either should get at least the crash fixed or the patch of bug 282103
backed out, IMO.

Blocking aviary 1.1 doesn't really help, because that may leave the trunk broken
for some more months...

Ben?
Summary: Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge] → Trunk crash [@ nsXULDocument::OverlayForwardReference::Merge] (remove and replace broken in XUL overlays)
This needs to block 1.8b2, actually...
Flags: blocking1.8b2?
Actually if i look at the policy for the zt4newcrash keyword (this crash here is
#2 on topcrasher list), the checkin from Bug 282103 needs to be backed out now.
I'll look at this this week. 
ben is working on this today
I believe the solution to be pretty much what Karsten posted. 

The reason why is that the change to using aNotify for RemoveElement was
probably a mistake. When implementing the Dynamic Overlay patch, I used aNotify
to allow the various sections of code that would otherwise not notify the FE of
their creation to do so so that when overlays were loaded after the document was
displayed that the dynamic modifications would be shown. Basically, aNotify was
used to allow PR_TRUE to be passed in some cases, where PR_FALSE had been
supplied exclusively in the past. 

With RemoveElement, PR_TRUE was the only value ever passed. An alternative is to
do the RemoveSubtreeFromDocument call manually in
OverlayForwardReference::Resolve when aNotify is PR_FALSE. That may be more
desirable in the pre-construction case where we may not want to notify (bz? --
the implementation of "removeelement" has worked like this for years...) 

At any rate, rolling back the use of aNotify in this instance seems to fix the bug.

Attachment #177957 - Attachment is obsolete: true
Attachment #179224 - Flags: superreview?(bryner)
Attachment #179224 - Flags: review?(bzbarsky)
> bz? -- the implementation of "removeelement" has worked like this for years...

Arguably, so has XUL in general, and I can probably point you to a half-dozen
crashers in core XUL without any effort.  Just because people usually don't do
something, doesn't mean assuming they never will is a good idea.... 

I'd definitely like to see a followup bug on reconciling what this code is
trying to do (which then needs documenting) with what invariants need to be
maintained to keep the DOM from crashing.
Comment on attachment 179224 [details] [diff] [review]
more complete patch

r=bzbarsky, I guess....
Attachment #179224 - Flags: review?(bzbarsky) → review+
Attachment #179224 - Flags: superreview?(bryner) → superreview+
Attachment #177957 - Flags: superreview?(bryner)
Attachment #177957 - Flags: review?(bugs)
So this topcrasher has sufficient review flags to be checked in for a halfweek
now  - anything else missing to get the trunk usable again? ;-)
I can confirm the crash with current trunk builds of Thunderbird (Linux).

OS => ALL.
OS: Windows 2000 → All
(In reply to comment #21)
> I can confirm the crash with current trunk builds of Thunderbird (Linux).

No wonder, cause nobody check in the existing patch. Do we realy need to wait
until Ben do that?

Rue -just asking- diger
Attachment #179224 - Flags: approval1.8b2?
Comment on attachment 179224 [details] [diff] [review]
more complete patch

Approved for 1.8b2 trunk landing -- Ben, please do this ASAP.

/be
Attachment #179224 - Flags: approval1.8b2? → approval1.8b2+
checked in. 
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Flags: blocking1.8b2?
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla1.8beta2
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: xptoolkit.widgets
Crash Signature: [@ nsXULDocument::OverlayForwardReference::Merge]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: