Closed Bug 96911 Opened 23 years ago Closed 23 years ago

app crash when bringing up composition window (for both new or reply to msgs) -- gcc -O2

Categories

(Core :: DOM: Editor, defect)

x86
Linux
defect
Not set
blocker

Tracking

()

VERIFIED FIXED
mozilla0.9.4

People

(Reporter: cls, Assigned: cls)

References

Details

(Keywords: crash, regression, Whiteboard: possible fix attached; need testing)

Attachments

(6 files)

I'm sure this is a dupe but I can't seem to find the previous bugs that I've
seen on this topic.  The entire app crashes whenever I try to "Reply To" an
existing mail message.  It also crashes when I try to compose a new message. 
First noticed in a custom build from 1am.  I repulled at 12:30pm and clobbered.
 I do not see the problem in the egcs nightly build but the gcc 2.95.2 -O2
nightly build crashes as well.  My local builds are gcc 2.95.2 -O2 & -O3.
Attached file stdout log + backtrace
Keywords: crash, smoketest
I am on vacation. Reassign to varada.
Assignee: ducarroz → varada
Ok, after backing out jfrancis' checkin from 8/21 22:32, the crash went away.
Reassigning.
Assignee: varada → brade
Component: Composition → Editor
Product: MailNews → Browser
QA Contact: sheelar → sujay
Moving to jfrancis.
Assignee: brade → jfrancis
doh!  Investigating.  
Status: NEW → ASSIGNED
Any other platforms seeing this?  Is there a known problem with functions that 
return nsCOMPtrs on gcc?  From the stack trace I can only assume gcc is doing the 
wrong thing in this situation.  I am unable to reproduce this on mac.

Scott, do you know of gcc probs with functions that return nsCOMPtrs?
Afaik, it's only gcc and only with a higher optimization level (-O2 or higher).
 I thought it was similar to bug 80988 and bug 61501 but I can't seem to find a
quick fix for this one.  The previous bugs were caused by gcc mis-optimizing
nsCOMPtrs when they were declared inside a do-while loop.  This problem appears
to be fixed in the gcc 3.x releases but we haven't upgraded to them.

Summary: app crash when bringing up composition window (for both new or reply to msgs) → app crash when bringing up composition window (for both new or reply to msgs) -- gcc -O2
adding regression keyword, and making "the -O[2,3] bug" dependent on this...
Blocks: O2
Keywords: regression
Well... maybe a stupid idea... what about using |volatile| to tell the compiler
to turn all optimisations off for that variable (assuming that gcc listens to
|volatile| ...) ?
It's a return value.  Is
volatile nsCOMPtr<nsIFoo> Bar();
a valid function declaration?  I haven't seen volatile used that way before, nor 
do I know what it would mean there.
Does anyone have a better stack-trace on this?  With arguments, perhaps?  And
where in nsSupportsArray::AppendElement it is?  This may very well be the
compiler bug, but I can't be sure from the stack trace.

Even better, a disasm of the code?
The problem won't be seen by looking at disasm of AppendElement.  The real 
problem is inside nsEditor::GetNextNode().  It is calling a routine in there that 
returns an nsCOMPtr.  Before the comptr was an outParam of the function (ie, an *
nsCOMPtr).  Now it's the return value. GCC seems to be destroying the comptr 
prematurely somewhere along the way - perhaps the temporary generated by the 
compiler is getting destroyed before it is used to set the returned value.
Note that
http://mozilla.org/projects/xpcom/nsCOMPtr.html#guide_nsCOMPtr_in_APIs
says that |nsCOMPtr|s should not be used as return values of functions.
This is not seen in the release builds for today, removing smoketest keyword. 
Keywords: smoketest
In an effort to get the tree open quickly, couldn't we just change GetNextNode() 
and GetPriorNode() to assign into COMPtrs when 
calling GetLeftmostChild()/GetRightmostChild()?

I believe right now they are assigning directly into raw pointers, which could 
be why things are getting released prematurely?
Ok, I misread the code ... it's been a long day already ... disregard my 
previous statements above. I have a clue now. ;-)
I disagree with the guideline DBaron mentions.  I wouldn't disagree if the
"already addrefed" return type worked correctly, but kin experienced problems
wen trying to use it.  In the meantime, comptrs are the lesser of evils. 
Non-comptrs have greater opportunity for usage error (not agreeing on ownership)
than comptrs (assigning into non-comptrs, which I never do anyway).
is there a way to crank down gcc opts in the souce with pragmas?
fix in hand; need reviews, and need a= if we want for 094
Whiteboard: fix in hand; need r/sr/a
Target Milestone: --- → mozilla0.9.4
also, if a linux gcc user could verify that this fixes things that would be nice.
oops, didn't diff all the files, new patch coming shortly
After the patch, it still crashes with the same stack.  I've tried compiling a
couple of the files with -g to get some extra debugger info but that seems to
make the crash go away. *sigh*

I created a new profile and started up Mail/News. I hit the "New Msg" button and
the composer window opened fine along with the account creation wizard, in which
I created a imap account instance. When I hit finish, Mozilla core dumped. Here
are the last few lines of gdb's output. I'll attach the full output separately.

Loaded symbols for
/export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgimap.so
Reading symbols from
/export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgdb.so...done.
Loaded symbols for
/export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgdb.so
#0  0x40b1a0a5 in nsDocument::~nsDocument ()
   from /export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libgkcontent.so
(gdb) 

I did all of this WITH the latest patch in this bug thread.
*** Bug 97081 has been marked as a duplicate of this bug. ***
attaching possible fix from scc and kin
Whiteboard: fix in hand; need r/sr/a → possible fix attached; need testing
The latest patch fails on compile (Linux/gcc2.95.3) with the following:

c++ -o nsEditor.o -c -DENABLE_EDITOR_API_LOG=1 -DOSTYPE=\"Linux2.4\"
-DOSARCH=\"Linux\" -DOJI   -I../../dist/include -I../../dist/include
-I/export/src/mozilla/builds/mozilla-trunk/dist/include/nspr     
-I/usr/X11R6/include   -fPIC  -I/usr/X11R6/include -fno-rtti -fno-exceptions
-Wall -Wconversion -Wpointer-arith -Wbad-function-cast -Wcast-align
-Woverloaded-virtual -Wsynth -pedantic -Wno-long-long -pipe -pthread -O2
-march=i686  -DNDEBUG -DTRIMMED -I/usr/X11R6/include -DMOZILLA_CLIENT -include
../../config-defs.h -Wp,-MD,.deps/nsEditor.pp
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp: In method `struct
already_AddRefed<nsIDOMNode> nsEditor::GetRightmostChild(nsIDOMNode *, int = 0)':
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3398: conversion from
`nsCOMPtr<nsIDOMNode>' to non-scalar type `already_AddRefed<nsIDOMNode>' requested
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3388: warning: unused
variable `nsresult result'
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp: In method `struct
already_AddRefed<nsIDOMNode> nsEditor::GetLeftmostChild(nsIDOMNode *, int = 0)':
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3425: conversion from
`nsCOMPtr<nsIDOMNode>' to non-scalar type `already_AddRefed<nsIDOMNode>' requested
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3415: warning: unused
variable `nsresult result'
gmake[3]: *** [nsEditor.o] Error 1
gmake[3]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk/editor/base'
gmake[2]: *** [install] Error 2
gmake[2]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk/editor'
gmake[1]: *** [install] Error 2
gmake[1]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk'
gmake: *** [build] Error 2
With the 8/28 23:00 gcc295 nightly, I'm not seeing this problem any longer.  I'm
also not seeing it in my local builds any longer.  Not sure what "fixed" it.  Is
anyone else still seeing the problem?

Hmm, my build from 2001-08-28, between 12:00 and 13:00 PST (I think, it was
around 21:30 MEST here when I pulled) does work OK as well though (I think)
about 24 hours earlier I saw this still happening. What has happened here?
I pulled the latest changes at 2:30 and did a depend build. A relatively new
profile (the one that caused the crash in my message of 2001-08-27 17:58) was
able to start up composer and send an email just fine.

The same build, with my normal profile (which had been imported from NN4.7 into
Mozilla 0.9 or so) freezes with the same symptoms it has been having for the
last week or so: whole app freezes and I have to Control-C to kill it.

Just to be clear: I only had an actual core-dump once. Every other time it has
just hung the whole app.

Here's my .mozconfig, in case that might help:

ac_add_options --disable-logging
ac_add_options --disable-editor-api-log
ac_add_options --disable-tests
ac_add_options --disable-debug
ac_add_options --enable-crypto
ac_add_options --enable-optimize="-O2 -march=i686"
ac_add_options --with-jpeg
ac_add_options --with-zlib
ac_add_options --with-png
ac_add_options --with-gtk
Ok, I verified that it was kin's checkin to nsEditor.{h,cpp} yesterday that
"fixed" the problem.  I also noticed, that with a pre-kin version of
nsEditor.cpp, if I move the declaration of node outside of the do-while loops in
GetPriorNode() & GetNextNode(), then the app doesn't crash when the window comes
up.  It crashes when the first character is typed in the window.  

Ok, I found the pre-kin fix for the original crash problem that supported my
original hypothesis of not declaring nsCOMPtrs inside of do-while loops. 
cool beans.  I'm much happier knowing that Chris has isolated the problem rather 
than have me keep guessing at it.  r=jfrancis
sr=kin@netscape.com

Chris, go ahead and check that in, if you can get approval.
Assignee: jfrancis → cls
Status: ASSIGNED → NEW
a=blizzard on behalf of drivers for 0.9.4
Patch checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
thanks to chris, kin, and scott for al lthe help on this.
*** Bug 97585 has been marked as a duplicate of this bug. ***
Christopher, can you verify this fix and mark this verified-fixed?

thanks.
Setting QA Contact to sheelar for verification.
Status: RESOLVED → VERIFIED
QA Contact: sujay → sheelar
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: