Closed Bug 380872 Opened 17 years ago Closed 17 years ago

Compose (e.g. Right click -> send page) crashes [@ FindBodyContent()]

Categories

(Core :: XUL, defect)

x86
All
defect
Not set
blocker

Tracking

()

RESOLVED FIXED

People

(Reporter: bes.wll, Assigned: sicking)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(5 files, 4 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a5pre) Gecko/20070516 SeaMonkey/1.5a
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a5pre) Gecko/20070516 SeaMonkey/1.5a

Open any URL. Right click-> send page. Crash!

Reproducible: Always

Steps to Reproduce:
1. Open any URL
2. Right click and then click send page
3. Crash
Actual Results:  
Crash 

Expected Results:  
Composer window should open and no crash should occur

This happens with the nightly build 2007051601. See talkback incidence TB32193615Q
Keywords: crash
Summary: Right click -> send page causes crash → Right click -> send page causes crash [@ FindBodyContent()]
I'm assuming that this is a recent regression?

Incident ID: 32193615
Stack Signature	FindBodyContent() 30452eec
Product ID	MozillaTrunk
Build ID	2007051601
Trigger Time	2007-05-16 06:27:24.0
Platform	LinuxIntel
Operating System	Linux 2.6.20-1.2948.fc6
Module	libgklayout.so + (0022b003)
URL visited	any page
User Comments	Right click -> send page. Crash!
Since Last Crash	0 sec
Total Uptime	0 sec
Trigger Reason	SIGSEGV: Segmentation Fault: (signal 11)
Source File, Line No.	/builds/tinderbox/SeaMonkey-Trunk/Linux_2.6.9-42.ELsmp_Depend/mozilla/objdir/layout/xul/base/src//builds/tinderbox/SeaMonkey-Trunk/Linux_2.6.9-42.ELsmp_Depend/mozilla/layout/xul/base/src/nsListBoxObject.cpp, line 172
Stack Trace 	
FindBodyContent()   FindBodyContent()   nsListBoxObject::GetListBoxBody()   nsListBoxObject::GetIndexOfFirstVisibleRow()   NS_InvokeByIndex_P()
XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)()   XPC_WN_CallMethod()   js_Invoke()   js_Interpret()   js_Execute()   JS_EvaluateUCScriptForPrincipals()   nsJSContext::EvaluateString()   nsGlobalWindow::RunTimeout()   nsGlobalWindow::TimerCallback()   nsTimerImpl::Fire()   nsTimerEvent::Run()   nsThread::ProcessNextEvent()   NS_ProcessNextEvent_P()  [mozilla/objdir/xpcom/build/nsThreadUtils.cpp, line 227]
nsBaseAppShell::Run()   nsAppStartup::Run()   main1()   main()   libc.so.6 + 0x15f2c (0x00991f2c)
Assignee: general → nobody
Component: General → XP Toolkit/Widgets: XUL
Product: Mozilla Application Suite → Core
QA Contact: general → xptoolkit.xul
Version: unspecified → Trunk
Keywords: regression
Attached patch wip (obsolete) — Splinter Review
Not sure what the right fix is here, but this doesn't crash at least.
Status: UNCONFIRMED → NEW
Ever confirmed: true
So I'm a little confused.  frame->GetContent()->GetDocument() is not null?  But the document of one of the xbl child nodes _is_ null?  How can that happen?

For what it's worth, I think it would make the most sense to get the binding manager off the ownerdocument, but the above inconsistency seriously worries me.

I also wonder whether this would fix the non-crash parts of bug 380866.
Blocks: 53901
Flags: blocking1.9?
Simply opening a blank message compose window in Thunderbird crashes with the same stack. Changing OS to all and severity to blocker since composing messages is now completely disabled.
Severity: critical → blocker
OS: Linux → All
Should this be moved to MailNews:Composition since the "Send Page" thing is not necessary to reproduce this, and it happens in Thunderbird as well? 
Investigating...
Assignee: nobody → mats.palmgren
Priority: -- → P1
I'm pretty sure I know what the problem is and I'm working on a patch.
Flags: blocking1.9? → blocking1.9+
Ok, thanks.
Just to answer Boris' questions (comment 5) from memory:
Yes. Yes. Don't know.  As I recall, changing FindBodyContent()
to use GetOwnerDocument() also fixed it.
Assignee: mats.palmgren → jonas
Priority: P1 → --
This bug is also for thunderbird standalone operating with windows 2000 professional and with trunk version 20070517
This should call BindToTree on all anonymous children that we need to do it on. I don't have a thunderbird build so I haven't been able to reproduce the problem, would be great if someone else could try it out.
Attachment #265066 - Attachment is obsolete: true
Comment on attachment 265323 [details] [diff] [review]
Will hopefully do it.

bz: so I think what is happening is that we have anonymous content bound to one of these evil clones, and then the clone is attached to the tree and rendered.

This used to work fine since the evil clone had in-doc set, so the in-doc flag propagated to the anonymous content.

However now that in-doc is not set on the evil clone, it won't be on the anonymous content either. When the clone is inserted into the tree we don't call BindToTree on the anonymous content, and so it remains "out of doc".

I hope I got the patch right, it feels like the right thing to do either way.

Would be especially great if you could confirm that the parameters given to BindToTree makes sense.
Attachment #265323 - Flags: superreview?(bzbarsky)
Attachment #265323 - Flags: review?(bzbarsky)
I can confirm that the patch fixes the crash in SeaMonkey for me.
I get an assertion on exit though, although it might be unrelated:

###!!! ASSERTION: attempt to remove an element that was never added: 'hep != nsnull && *hep != nsnull', file nsElementMap.cpp, line 281
I need to do some digging to review this; with any luck I can do it by Sunday.
Stops Thunderbird from crashing, which is  nice, but the address autocomplete
widget is pretty completely broken.

Opening compose gives

Error: uncaught exception: [Exception... "Component returned failure code:
0x80004005 (NS_ERROR_FAILURE) [nsIDOMXULElement.boxObject]"  nsresult:
"0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame ::
chrome://global/content/bindings/textbox.xml ::  :: line 117"  data: no]

Error: uncaught exception: [Exception... "Component returned failure code:
0x80004005 (NS_ERROR_FAILURE) [nsIDOMXULElement.boxObject]"  nsresult:
"0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame ::
chrome://global/content/autocomplete.xml :: set_view :: line 1497"  data: no]

Error: uncaught exception: [Exception... "Component returned failure code:
0x80004005 (NS_ERROR_FAILURE) [nsIDOMXULElement.boxObject]"  nsresult:
"0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame ::
chrome://global/content/bindings/menulist.xml :: <TOP_LEVEL> :: line 125" 
data: no]

and the address widget's only maybe a quarter of the window's width. Start
typing an address, and you get an empty autocomplete popup the same too-small
width, and mousing over the empty space gives a stream of 

Error: this.textbox.view.treeBoxObject has no properties
Source File: chrome://global/content/autocomplete.xml
Line: 1667
erh...

With latest updated version (20 of may), my thunderbird is *still* crashing when clicking on "new message" icon on toolbar.

Seeing it with Windows Vista.

So it is a windows "crappysta" bug ?!
It's effecting Linux, Windows 2000 and Vista. I think its safe to say its effecting all OS. See previous comments.
this patch stops the crash, and you can see the address widget, but the display of address you type or autocomplete to is clipped after about 20 chars. Maybe that's just something wrong with my build, but it's something to look for.
Yeah, i know that problem is still there. I'll land this patch tomorrow (assuming i get review) and look into what that is. If I can't fix the remaining problem pretty quickly I'll back out the original cause of these regressions.
I notice from the comments above that there seem to be several separate problems here (a) it crashes when you try to write a message,(b) there is nothing in the address area, (c) even when the above is fixed there is a problem with the address area being limited to about 20 characters.  

Can't whatever changes were made a few days ago be regressed until the people who made them have working code that doesn't cause the system not to work?  I know the trunk nightly is an Alpha, but how did an upgrade which caused something as basic as composing messages not to work even get into Alpha?
IMHO, the crash fix should go in fast (I hope bz comes around to reviewing it soon), then there should be a followup filed or multiple followups filed for remaining issues, if they are not fixed by the fix here.

Peter:
Trunk is not even Alpha, it's bleeding-edge current development code. Neither Thunderbird nor SeaMonkey have released an Alpha based on Gecko 1.9 trunk yet.
Maybe we're at cross purposes here.  My Thunderbird says it is "version 3 alpha 1" followed by a version/date number in brackets.
(In reply to comment #29)
> My Thunderbird says it is "version 3 alpha
> 1" followed by a version/date number in brackets.

As Robert wrote, it is not officially _released_:
the _nightly_, which I guess you use, only tells you what (goal) is being worked on...
In Firefox (aka Minefield) they are using Minefield 3.0a5pre, which means the goal is Alpha 5.

Maybe we should have the same thing in Thunderbird? I'm surprised Thunderbird is not using this already. Although this is a separate issue. Should I file a new bug for this?
(Yes, it's TB (Trunk) issue. Searching/Filing a bug should do no harm.)
Summary: Right click -> send page causes crash [@ FindBodyContent()] → Compose (e.g. Right click -> send page) crashes [@ FindBodyContent()]
Comment on attachment 265323 [details] [diff] [review]
Will hopefully do it.

>Index: content/base/src/nsGenericElement.cpp
> nsGenericElement::BindToTree(nsIDocument* aDocument, nsIContent* aParent,

>+        nsCOMPtr<nsIContent> anonRoot = contBinding->GetAnonymousContent();

I assume the strong ref is just in case?  Seems reasonable.

>+      // ...then check if we have content in insertion points
>+      if (aBindingParent) {
>+        nsXBLBinding* binding = bmgr->GetBinding(aBindingParent);
>+        NS_ASSERTION(binding, "huh, no binding?");

I'm not sure you can make this assertion.  For example, |this| could be a Text node in a textarea, and aBindingParent would be the anonymous <div> inside the textarea, no?

>+                  nsCOMPtr<nsIContent> child = insertRoot->GetChildAt(j);

Again, I assume this is a just in case?

This looks reasonable... I should have told you to test tbird with the other patch; I recall now that the composition UI is a continuous XBL trouble spot.  :(  Sorry about that.
Attachment #265323 - Flags: superreview?(bzbarsky)
Attachment #265323 - Flags: superreview+
Attachment #265323 - Flags: review?(bzbarsky)
Attachment #265323 - Flags: review+
New nightly update 20070521 09 still crashed when trying to write a message
(In reply to comment #32)
> (Yes, it's TB (Trunk) issue. Searching/Filing a bug should do no harm.)
> 

https://bugzilla.mozilla.org/show_bug.cgi?id=381418 Bug filed and patch should appear in the next nightly.

Michael
(In reply to comment #37)
> New nightly update 20070521 09 still crashed when trying to write a message
> 

It landed after the nightly (20070521) was released. It should work with the latest hourly and the next nightly build (0522)
Attached patch addressingWidget patch (obsolete) — Splinter Review
I just found this bug report. Was running into the same issues described above - composing a new message would crash; replying to a message would work but the addressingWidget wouldn't display. The error console showed 3 uncaught exceptions (already noted in above comments) because in CompFields2Recipients() the newListBoxNode didn't have a valid boxObject.

I came up with this quick hack of appending newListBoxNode to the listbox's parent first to avoid the exceptions. Then popping it off again so it doesn't get in there twice at the end. Just for reference; if the above patch works you can ignore this.
There still seems to be some nodes that don't get there IsInDoc flag set as needed, so lets try to fix that before creating workarounds all over. Feel free to file separate bugs on unrelated issues that you find though, of course.
Oh, hmm.   I guess GetBoxObjectFor() bails out if GetCurrentDoc() != this.  We might need to hack this if the XBL bit is set, perhaps, if XBL constructors/fields/whatever sue the box object.  And we'd need a way to deal with bug 330181, maybe.  But it would be no worse than when nodes lied about their document, so...
Status: NEW → ASSIGNED
If there is a <children> as a direct child of <content> the bound element can have insertion points in its binding. This should take care of such insertion points as well.

I'm still getting some layout weirdness, and some assertions about the XUL-document element map, but at least with this patch we get into a dogfoodable state while I keep looking for the remaining regressions.
Attachment #265596 - Flags: superreview?(bzbarsky)
Attachment #265596 - Flags: review?(bzbarsky)
Comment on attachment 265596 [details] [diff] [review]
Fix insertion points directly below <content>

>Index: content/base/src/nsGenericElement.cpp
>+BindNodesInInsertPoints(nsXBLBinding* aBinding, nsIContent* aInsertParent,
>+                        nsIDocument* aDocument, PRBool aAllowScripts)

I'd get rid of the aAllowScripts arg and just use aBinding->AllowScripts().  If you really feel this is faster or something (which I doubt), at least assert that the two are equal.

Also, assert that aBinding and aInsertParent are not null?

r+sr=bzbarsky with that.
Attachment #265596 - Flags: superreview?(bzbarsky)
Attachment #265596 - Flags: superreview+
Attachment #265596 - Flags: review?(bzbarsky)
Attachment #265596 - Flags: review+
Yeah, I at first used aBinding->AllowScripts, but the implementation of AllowScripts looked big and bulky. But I suspect you're right that it's fast enough so I'll use that.
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a5pre) Gecko/2007052204 SeaMonkey/1.5a] (suiterunner, tinderbox-builds) (W2Ksp4)

Simple confirmation that the Compose crash part is fixed.
tested the new nightly update 2007052204.  Thunderbird standalone does not crash but in the compose area still the address space filling has problems. In reply to all the detailed addresses sometime are cut in half missing the last past.  The same in the reply where the address maybe cut or missing a piece of it.
When writing a message, the address preestablished do not appear complete, the box now is too narrow to contain the full lists of address that match with the first letters introduced. Should I fill a new bug, or it not necessary?
tested the new nightly update 2007052204.  Thunderbird standalone does not crash but in the compose area still the address space filling has problems. In reply to all the detailed addresses sometime are cut in half missing the last past.  The same in the reply where the address maybe cut or missing a piece of it.
When writing a message, the address preestablished do not appear complete, the box now is too narrow to contain the full lists of address that match with the first letters introduced. Should I fill a new bug, or it not necessary?
Fernando,
that is dependent bug 381553 !
Attached patch Fix more issues (obsolete) — Splinter Review
The new AddSubtreeToDocument call fixes some assertions I was seeing and seems like the right thing to do.

The GetBoxObject changes fixes some warnings, not sure if they are strictly needed or not, but lets do this until we've fixed all regressions at least.

The nsNodeUtils.cpp change fixes the sizing problem of the address widget. It seems really unfortunate, but I'd like to investigate why it's needed after we've fixed the regressions.

Still having issues with the dropdown, but this should at least get us to a better state.
Attachment #265866 - Flags: superreview?(bzbarsky)
Attachment #265866 - Flags: review?(bzbarsky)
Attached patch Fix more issuesSplinter Review
Fixes a compile error in previous patch.
Attachment #265867 - Flags: superreview?(bzbarsky)
Attachment #265867 - Flags: review?(bzbarsky)
Attachment #265866 - Attachment is obsolete: true
Attachment #265866 - Flags: superreview?(bzbarsky)
Attachment #265866 - Flags: review?(bzbarsky)
Comment on attachment 265867 [details] [diff] [review]
Fix more issues

>Index: content/base/src/nsDocument.cpp
>+  nsIDocument* doc = content->HasFlag(NODE_FORCE_XBL_BINDINGS) ?
>+    content->GetOwnerDoc() : content->GetCurrentDoc();

We should file a followup bug to sort out how box objects should work... again.  We made this use the current document because of security issues in the first place; of course those issues remained with XUL.  :(

>Index: content/base/src/nsGenericElement.cpp

>+            xulDoc->AddSubtreeToDocument(child);

Nice catch.

>Index: content/base/src/nsNodeUtils.cpp
>+      /* For elements that have the NODE_FORCE_XBL_BINDINGS flag  \
>+         set we need to notify the document */                    

Ugh.  Again, followup bug?  I _really_ want to know what depends on this!

Thanks for digging into this, Jonas!

r+sr=bzbarsky
Attachment #265867 - Flags: superreview?(bzbarsky)
Attachment #265867 - Flags: superreview+
Attachment #265867 - Flags: review?(bzbarsky)
Attachment #265867 - Flags: review+
The only remaining issue that I'm seeing is that the dropdown with completations is only a single row high. This issue is fixed by Howards patch in attachment 265569 [details] [diff] [review].

Howard, can you explain what that patch does. Do you know why it's needed? What fails if we don't do that?
With the patch the dropdown on autocompete does show more than one row.
Winxp sp2 but horizontal and vertical scrollbars are rendered.
See: http://img406.imageshack.us/img406/6907/addpanemu5.gif
The patch I was referring to was not Howard's but just with your's Jonas.
test via houry Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a5pre) Gecko/20070524 Thunderbird/3.0a1pre ID:2007052417 [cairo]


I just filed bug 381826 which might be a connected to this bug. (Mail sorting and filtering not working)
No longer blocks: 381826
Depends on: 381826
No longer depends on: 381826
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a5pre) Gecko/20070525 SeaMonkey/1.5a] (nightly) (W2Ksp4)

(In reply to comment #55)
> With the patch the dropdown on autocompete does show more than one row.

I don't know which patch...
But I confirm that the dropdown (now) shows multiple addresses :-)

> Winxp sp2 but horizontal and vertical scrollbars are rendered.

Vertical sb is expected.
Horizontal sb is a regression :-(
(In reply to comment #58)
> [Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a5pre) Gecko/20070525
> SeaMonkey/1.5a] (nightly) (W2Ksp4)
> 
> (In reply to comment #55)
> > With the patch the dropdown on autocompete does show more than one row.
> 
> I don't know which patch...
> But I confirm that the dropdown (now) shows multiple addresses :-)
> 
> > Winxp sp2 but horizontal and vertical scrollbars are rendered.
> 
> Vertical sb is expected.
> Horizontal sb is a regression :-(
> 


Submitted bug 382083 to track this last remaining regression.
No longer blocks: 380913, 381553, 381723
Depends on: 380913, 381553, 381723, 382083
Attached file t (obsolete) —
Attachment #266342 - Attachment filename: toad.txt → sdsdsds
Attachment #266342 - Attachment is obsolete: true
Comment on attachment 265569 [details] [diff] [review]
addressingWidget patch

This patch from Howard makes the horizontal scrollbar in the popup go away for me. Jonas and I were talking about this patch and we'd like to take it.

I think Neil knows this code best though, asking him for a review.
Attachment #265569 - Flags: review?(neil)
Comment on attachment 265569 [details] [diff] [review]
addressingWidget patch

>@@ -213,6 +213,8 @@
>     var msgRandomHeaders = msgCompFields.otherRandomHeaders;
>     var msgNewsgroups = msgCompFields.newsgroups;
>     var msgFollowupTo = msgCompFields.followupTo;
>+    var parent = listbox.parentNode;
>+    parent.appendChild(newListBoxNode);

Do you want to make this
parent.appendBefore(newListBoxNode, listbox);

>     // dump("replacing child in comp fields 2 recips \n");
>-    var parent = listbox.parentNode;
>+    parent.removeChild(newListBoxNode);
>     parent.replaceChild(newListBoxNode, listbox);
>     awFitDummyRows(2);

... and this parent.removeChild(listbox) to save yourself some shuffling of the nodes?
(In reply to comment #40)
> Created an attachment (id=265569) [details]
> addressingWidget patch
> 
> I came up with this quick hack of appending newListBoxNode to the listbox's
> parent first to avoid the exceptions. Then popping it off again so it doesn't

Is that (still) the right fix ?
The 3 exceptions have been fixed already.
Now, doesn't this look too much like a hack rather than a fix ?

NB: While there, could you add a space after '//' ?
Comment on attachment 265569 [details] [diff] [review]
addressingWidget patch

>+    var parent = listbox.parentNode;
>+    parent.appendChild(newListBoxNode);

>-    var parent = listbox.parentNode;
>+    parent.removeChild(newListBoxNode);
>     parent.replaceChild(newListBoxNode, listbox);
This is way too complicated. Just move both lines.
Also I'd prefer them moved to just after the var templateNode line.
Attachment #265569 - Flags: review?(neil) → review-
Flags: in-testsuite?
Neil: Does this look better?

Re comment 63: I don't believe this is a hack, XUL generally don't work when used outside of the document since XBL only binds to nodes inside the document, this is a known problem and has always existed. The fact that this code ever worked is due to a big and problematic hack involving cloned XUL nodes.

When I first fixed (part of) that hack we got some scary inconsistencies that I've fixed in the initial patches in this bug.

With that fixed cloned XUL elements now behave more like normal XUL elements. And since this widget seems to be the only one affected by this so far, I'd rather change the widget than try to add back more hacks into XUL.
Attachment #265569 - Attachment is obsolete: true
Attachment #266662 - Flags: review?(neil)
(All right, I understand.
The new patch "looks" much better than the previous, though.)
Comment on attachment 266662 [details] [diff] [review]
Fix review comments

Got sr=bienvenu over irl
Attachment #266662 - Flags: superreview+
Waiting to get Neils official r before closing this one.
[Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9a5pre) Gecko/2007053102
SeaMonkey/2.0a1pre] (nightly) (W2Ksp4)

You can count this very comment as a
"V.Fixed, as all blockers are now fixed."
when you resolve this bug.
Attachment #266662 - Flags: review?(neil) → review+
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Depends on: 386854
Depends on: 398492
> The nsNodeUtils.cpp change fixes the sizing problem of the address widget. It
> seems really unfortunate, but I'd like to investigate why it's needed after
> we've fixed the regressions.

Bug 398492 is a crash regression from this change.  It has some thoughts on why it was maybe needed and how to get rid of it.
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: xptoolkit.xul → xptoolkit.widgets
Crash Signature: [@ FindBodyContent()]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: