app crash when bringing up composition window (for both new or reply to msgs) -- gcc -O2

VERIFIED FIXED in mozilla0.9.4

Status

()

Core
Editor
--
blocker
VERIFIED FIXED
17 years ago
17 years ago

People

(Reporter: cls, Assigned: cls)

Tracking

({crash, regression})

Trunk
mozilla0.9.4
x86
Linux
crash, regression
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: possible fix attached; need testing)

Attachments

(6 attachments)

(Assignee)

Description

17 years ago
I'm sure this is a dupe but I can't seem to find the previous bugs that I've
seen on this topic.  The entire app crashes whenever I try to "Reply To" an
existing mail message.  It also crashes when I try to compose a new message. 
First noticed in a custom build from 1am.  I repulled at 12:30pm and clobbered.
 I do not see the problem in the egcs nightly build but the gcc 2.95.2 -O2
nightly build crashes as well.  My local builds are gcc 2.95.2 -O2 & -O3.
(Assignee)

Comment 1

17 years ago
Created attachment 47073 [details]
stdout log + backtrace
(Assignee)

Updated

17 years ago
Keywords: crash, smoketest
I am on vacation. Reassign to varada.
Assignee: ducarroz → varada
(Assignee)

Comment 3

17 years ago
Ok, after backing out jfrancis' checkin from 8/21 22:32, the crash went away.
Reassigning.
Assignee: varada → brade
Component: Composition → Editor
Product: MailNews → Browser
QA Contact: sheelar → sujay
(Assignee)

Comment 4

17 years ago
Moving to jfrancis.
Assignee: brade → jfrancis

Comment 5

17 years ago
doh!  Investigating.  
Status: NEW → ASSIGNED

Comment 6

17 years ago
Any other platforms seeing this?  Is there a known problem with functions that 
return nsCOMPtrs on gcc?  From the stack trace I can only assume gcc is doing the 
wrong thing in this situation.  I am unable to reproduce this on mac.

Scott, do you know of gcc probs with functions that return nsCOMPtrs?
(Assignee)

Comment 7

17 years ago
Afaik, it's only gcc and only with a higher optimization level (-O2 or higher).
 I thought it was similar to bug 80988 and bug 61501 but I can't seem to find a
quick fix for this one.  The previous bugs were caused by gcc mis-optimizing
nsCOMPtrs when they were declared inside a do-while loop.  This problem appears
to be fixed in the gcc 3.x releases but we haven't upgraded to them.

Summary: app crash when bringing up composition window (for both new or reply to msgs) → app crash when bringing up composition window (for both new or reply to msgs) -- gcc -O2

Comment 8

17 years ago
adding regression keyword, and making "the -O[2,3] bug" dependent on this...
Blocks: 53486
Keywords: regression

Comment 9

17 years ago
Well... maybe a stupid idea... what about using |volatile| to tell the compiler
to turn all optimisations off for that variable (assuming that gcc listens to
|volatile| ...) ?

Comment 10

17 years ago
It's a return value.  Is
volatile nsCOMPtr<nsIFoo> Bar();
a valid function declaration?  I haven't seen volatile used that way before, nor 
do I know what it would mean there.
Does anyone have a better stack-trace on this?  With arguments, perhaps?  And
where in nsSupportsArray::AppendElement it is?  This may very well be the
compiler bug, but I can't be sure from the stack trace.

Even better, a disasm of the code?

Comment 12

17 years ago
The problem won't be seen by looking at disasm of AppendElement.  The real 
problem is inside nsEditor::GetNextNode().  It is calling a routine in there that 
returns an nsCOMPtr.  Before the comptr was an outParam of the function (ie, an *
nsCOMPtr).  Now it's the return value. GCC seems to be destroying the comptr 
prematurely somewhere along the way - perhaps the temporary generated by the 
compiler is getting destroyed before it is used to set the returned value.
Note that
http://mozilla.org/projects/xpcom/nsCOMPtr.html#guide_nsCOMPtr_in_APIs
says that |nsCOMPtr|s should not be used as return values of functions.

Comment 14

17 years ago
This is not seen in the release builds for today, removing smoketest keyword. 
Keywords: smoketest

Comment 15

17 years ago
In an effort to get the tree open quickly, couldn't we just change GetNextNode() 
and GetPriorNode() to assign into COMPtrs when 
calling GetLeftmostChild()/GetRightmostChild()?

I believe right now they are assigning directly into raw pointers, which could 
be why things are getting released prematurely?

Comment 16

17 years ago
Ok, I misread the code ... it's been a long day already ... disregard my 
previous statements above. I have a clue now. ;-)

Comment 17

17 years ago
I disagree with the guideline DBaron mentions.  I wouldn't disagree if the
"already addrefed" return type worked correctly, but kin experienced problems
wen trying to use it.  In the meantime, comptrs are the lesser of evils. 
Non-comptrs have greater opportunity for usage error (not agreeing on ownership)
than comptrs (assigning into non-comptrs, which I never do anyway).

Comment 18

17 years ago
is there a way to crank down gcc opts in the souce with pragmas?

Comment 19

17 years ago
Created attachment 47257 [details] [diff] [review]
diffs to work around gcc prob

Comment 20

17 years ago
fix in hand; need reviews, and need a= if we want for 094
Whiteboard: fix in hand; need r/sr/a
Target Milestone: --- → mozilla0.9.4

Comment 21

17 years ago
also, if a linux gcc user could verify that this fixes things that would be nice.

Comment 22

17 years ago
oops, didn't diff all the files, new patch coming shortly

Comment 23

17 years ago
Created attachment 47259 [details] [diff] [review]
more complete diffs
(Assignee)

Comment 24

17 years ago
After the patch, it still crashes with the same stack.  I've tried compiling a
couple of the files with -g to get some extra debugger info but that seems to
make the crash go away. *sigh*

Comment 25

17 years ago
I created a new profile and started up Mail/News. I hit the "New Msg" button and
the composer window opened fine along with the account creation wizard, in which
I created a imap account instance. When I hit finish, Mozilla core dumped. Here
are the last few lines of gdb's output. I'll attach the full output separately.

Loaded symbols for
/export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgimap.so
Reading symbols from
/export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgdb.so...done.
Loaded symbols for
/export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libmsgdb.so
#0  0x40b1a0a5 in nsDocument::~nsDocument ()
   from /export/src/mozilla/builds/mozilla-trunk/dist/bin/components/libgkcontent.so
(gdb) 

I did all of this WITH the latest patch in this bug thread.

Comment 26

17 years ago
Created attachment 47297 [details]
stack trace from an editor crash in a new profile

Comment 27

17 years ago
*** Bug 97081 has been marked as a duplicate of this bug. ***

Comment 28

17 years ago
attaching possible fix from scc and kin
Whiteboard: fix in hand; need r/sr/a → possible fix attached; need testing

Comment 29

17 years ago
Created attachment 47480 [details] [diff] [review]
patch for editor base

Comment 30

17 years ago
The latest patch fails on compile (Linux/gcc2.95.3) with the following:

c++ -o nsEditor.o -c -DENABLE_EDITOR_API_LOG=1 -DOSTYPE=\"Linux2.4\"
-DOSARCH=\"Linux\" -DOJI   -I../../dist/include -I../../dist/include
-I/export/src/mozilla/builds/mozilla-trunk/dist/include/nspr     
-I/usr/X11R6/include   -fPIC  -I/usr/X11R6/include -fno-rtti -fno-exceptions
-Wall -Wconversion -Wpointer-arith -Wbad-function-cast -Wcast-align
-Woverloaded-virtual -Wsynth -pedantic -Wno-long-long -pipe -pthread -O2
-march=i686  -DNDEBUG -DTRIMMED -I/usr/X11R6/include -DMOZILLA_CLIENT -include
../../config-defs.h -Wp,-MD,.deps/nsEditor.pp
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp: In method `struct
already_AddRefed<nsIDOMNode> nsEditor::GetRightmostChild(nsIDOMNode *, int = 0)':
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3398: conversion from
`nsCOMPtr<nsIDOMNode>' to non-scalar type `already_AddRefed<nsIDOMNode>' requested
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3388: warning: unused
variable `nsresult result'
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp: In method `struct
already_AddRefed<nsIDOMNode> nsEditor::GetLeftmostChild(nsIDOMNode *, int = 0)':
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3425: conversion from
`nsCOMPtr<nsIDOMNode>' to non-scalar type `already_AddRefed<nsIDOMNode>' requested
/export/src/mozilla/mozilla/editor/base/nsEditor.cpp:3415: warning: unused
variable `nsresult result'
gmake[3]: *** [nsEditor.o] Error 1
gmake[3]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk/editor/base'
gmake[2]: *** [install] Error 2
gmake[2]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk/editor'
gmake[1]: *** [install] Error 2
gmake[1]: Leaving directory `/export/src/mozilla/builds/mozilla-trunk'
gmake: *** [build] Error 2
(Assignee)

Comment 31

17 years ago
With the 8/28 23:00 gcc295 nightly, I'm not seeing this problem any longer.  I'm
also not seeing it in my local builds any longer.  Not sure what "fixed" it.  Is
anyone else still seeing the problem?

Comment 32

17 years ago
Hmm, my build from 2001-08-28, between 12:00 and 13:00 PST (I think, it was
around 21:30 MEST here when I pulled) does work OK as well though (I think)
about 24 hours earlier I saw this still happening. What has happened here?

Comment 33

17 years ago
I pulled the latest changes at 2:30 and did a depend build. A relatively new
profile (the one that caused the crash in my message of 2001-08-27 17:58) was
able to start up composer and send an email just fine.

The same build, with my normal profile (which had been imported from NN4.7 into
Mozilla 0.9 or so) freezes with the same symptoms it has been having for the
last week or so: whole app freezes and I have to Control-C to kill it.

Just to be clear: I only had an actual core-dump once. Every other time it has
just hung the whole app.

Here's my .mozconfig, in case that might help:

ac_add_options --disable-logging
ac_add_options --disable-editor-api-log
ac_add_options --disable-tests
ac_add_options --disable-debug
ac_add_options --enable-crypto
ac_add_options --enable-optimize="-O2 -march=i686"
ac_add_options --with-jpeg
ac_add_options --with-zlib
ac_add_options --with-png
ac_add_options --with-gtk
(Assignee)

Comment 34

17 years ago
Ok, I verified that it was kin's checkin to nsEditor.{h,cpp} yesterday that
"fixed" the problem.  I also noticed, that with a pre-kin version of
nsEditor.cpp, if I move the declaration of node outside of the do-while loops in
GetPriorNode() & GetNextNode(), then the app doesn't crash when the window comes
up.  It crashes when the first character is typed in the window.  

(Assignee)

Comment 35

17 years ago
Created attachment 47565 [details] [diff] [review]
pre-kin fix for composition crash
(Assignee)

Comment 36

17 years ago
Ok, I found the pre-kin fix for the original crash problem that supported my
original hypothesis of not declaring nsCOMPtrs inside of do-while loops. 

Comment 37

17 years ago
cool beans.  I'm much happier knowing that Chris has isolated the problem rather 
than have me keep guessing at it.  r=jfrancis

Comment 38

17 years ago
sr=kin@netscape.com

Chris, go ahead and check that in, if you can get approval.
Assignee: jfrancis → cls
Status: ASSIGNED → NEW
a=blizzard on behalf of drivers for 0.9.4
(Assignee)

Comment 40

17 years ago
Patch checked in.
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED

Comment 41

17 years ago
thanks to chris, kin, and scott for al lthe help on this.

Comment 42

17 years ago
*** Bug 97585 has been marked as a duplicate of this bug. ***

Comment 43

17 years ago
Christopher, can you verify this fix and mark this verified-fixed?

thanks.

Comment 44

17 years ago
Setting QA Contact to sheelar for verification.
Status: RESOLVED → VERIFIED
QA Contact: sujay → sheelar
You need to log in before you can comment on or make changes to this bug.