Closed Bug 52808 Opened 20 years ago Closed 20 years ago

freeze exiting composer with File|Close, File|Quit or Cmd/Ctrl+W

Categories

(Core :: DOM: Navigation, defect, P1, critical)

x86
All
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: doronr, Assigned: sfraser_bugs)

References

Details

(Whiteboard: [rtm++][p:1])

Attachments

(5 files)

Exiting composer causes all 3 platforms to crash. This is a smoketest blocker.
cc: cmanske

Putting this in editor for triange.
keyword magic. nominating for nsbeta3
File | Close causes browser and composer to freeze....not File | Quit...

happens in today's 9/15 build...
adding: hewitt@netscape.com,nbhatla@netscape.com,hyatt@netscape.com, who have
changed composer stuff in the past 24 hours.
on win32, I freeze also when I close composer as well. Could be a different bug
though
File|Quit causes a freeze for me too.  If I File|Quit, the console is left open
with the last message in EditorShutdown, just like the File|Close menu item.
Is this all platforms? Adding sfraser and kin they're very good at figuring out 
this type of problem
I see it in windows - it's stuck in editor shutdown at:
NTDLL! 77f6829b()
KERNEL32! 77f04f41()
_PR_WaitCondVar(PRThread * 0x036c9510, PRCondVar * 0x02faec40, PRLock * 
0x02faecf0, unsigned int 4294967295) line 185 + 23 bytes
PR_Wait(PRMonitor * 0x02faeda0, unsigned int 4294967295) line 155 + 29 bytes
nsAutoMonitor::Wait(unsigned int 4294967295) line 197 + 17 bytes
nsThreadPool::GetRequest(nsIThread * 0x036c9680) line 458 + 10 bytes
nsThreadPoolRunnable::Run(nsThreadPoolRunnable * const 0x036c96d0) line 685 + 27 
bytes
nsThread::Main(void * 0x036c9680) line 84 + 26 bytes
_PR_NativeRunThread(void * 0x036c9510) line 399 + 13 bytes
Updating summary and lowering severity to Critical.  Workarounds for win32 are 
to use the X or couble-click on the mozilla icon in the composer titlebar or 
right click on the icon in the windows taskbar and chose Close.  Workaround on 
Linux and Mac are similar.  Note, Alt+F4 on win32 seems to work for me too but 
Command+w on Mac and Ctrl+w on linux and win32 do not seem to work.
Severity: blocker → critical
Summary: exiting composer causes app to freezee → freeze exiting composer with File|Close, File|Quit or Cmd/Ctrl+W
I disagree: Using "X" on my Windows debug build hangs.
ok, I killed my profile, and clicking the x is now closing it correctly.
I traced through the editor closing code:
After exiting correctly from nsEditorShell::Shutdown(), 
we freeze in:
NS_IMETHODIMP nsWebShell::Destroy()
{
  nsresult rv = NS_OK;

  //Fire unload event before we blow anything away.
  rv = FireUnloadEvent();
*** WE FREEZE HERE:
  nsDocShell::Destroy();

  SetContainer(nsnull);

  return NS_OK;
}
So we have a reference to a docshell that's not released? Doesn't this sound
like last night's problem?
More specifically, in  nsDocShell::Destroy(),
we freeze at:
mScriptContext->SetOwner(nsnull);
same thing that freezes when tring to close "Search mail/News messages" dialog?
Yes, exactly the same problem.
Kicking over to XPFE team since it doesn't seem to be app-specific.
Assignee: beppe → trudelle
Docshell problem.
Component: Editor → Embedding: Docshell
etc.
Assignee: trudelle → adamlock
QA Contact: sujay → adamlock
*** Bug 53015 has been marked as a duplicate of this bug. ***
This sounds pretty ugly, but it also sounds like there are work-arounds. I could 
go minus, but it sounds like the work-arounds are hard to discover, and cost at 
least several crashes :-/
Marking dogfood-plus, but we could change if the work-arounds were considered 
obvious by folks.
This "freeze" is close enough to "crash" that I think this should get a higher 
priority than P3 during beta3 triage.  I'm adding the "crash" keyword.
Keywords: crash
Whiteboard: [dogfood+]
I think this is an editor bug.

The freeze appears to be occurring in the JS garbage collection happening at 
this line:

http://lxr.mozilla.org/seamonkey/source/js/src/jsapi.c#730

Step over the garbage collection line and the exit continues.

The stack trace:

JS_BeginRequest(JSContext * 0x032e1660) line 730
DoPreScriptEvaluated(JSContext * 0x032e1660) line 49 + 10 bytes
nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJSClass * const 0x033ab350, 
nsXPCWrappedJS * 0x033ab2c0, unsigned short 0x0004, const nsXPTMethodInfo * 
0x010e50d4, nsXPTCMiniVariant * 0x0012d508) line 543 + 9 bytes
nsXPCWrappedJS::CallMethod(nsXPCWrappedJS * const 0x033ab2c0, unsigned short 
0x0004, const nsXPTMethodInfo * 0x010e50d4, nsXPTCMiniVariant * 0x0012d508) line 
319
PrepareAndDispatch(nsXPTCStubBase * 0x033ab2c0, unsigned int 0x00000004, 
unsigned int * 0x0012d5bc, unsigned int * 0x0012d5a8) line 100 + 31 bytes
SharedStub() line 125

The method being invoked by this thread is "NotifyDocumentWillBeDestroyed". A 
search for implementations of this method suggests the call is trying to call 
through to the JS implementation of  DocumentStateListener in editor.js. 

Could it be that the object is not there any more, in which case:

A. Shouldn't it be removed from the state listener list when the document is 
destroyed? editor.js should call UnregisterDocumentStateListener during the 
unload.

B. Why doesn't XPC know it's not there any more and return an error of some 
sort? Waiting forever for garbage collection on another thread doesn't seem safe 
somehow.


That patch might prevent doc listeners from getting the 'will be destroyed' call 
at all. We should probably move the place where we are firing this notification 
(currently in the dtor of nsEditorShell, I think). I'll take this.
Assignee: adamlock → sfraser
What's happenning here is this:
Composer is being torn down, with the last nsEditorShell reference going away as 
a result of GC. This in turn released the nsEditor, whose dtor calls 
DocumentStateListeners to notify them that the document is going away. And we 
have a doc state listener implmeneted in JS, which lives in the JS context that 
is being killed right now.

I can fix this in editor, but I wonder if it's a code pattern that XPConnect/JS 
needs to handle better?
Status: NEW → ASSIGNED
sfraser, adamlock: what's the main thread doing while the other thread is
blocking in JS_BeginRequest?  Did the main thread crash, but not the whole app
(process)?

If this is a deadlock, and the main crash is waiting on some other lock, let's
identify both halves of the deadly embrace.

sfraser: finalization order is random, but if native code is holding weak refs
(unrooted pointer) into the JS GC heap, it's cruising for a bruising.  Or else I
am, because I don't understand this bug (but you need to say more).

/be

Brendan,

This is the main thread. No others are doing anything interesting. I can't see 
down the stack in MSVC past SharedStub, but I bet there is a nested gc and the 
gc/Request logic is broken
Ah, I didn't recognize that thread as main precisely because its stack didn't go
down through main.

The gc/request logic is not broken-as-defined, but we are badly breaking it with
finalizers that try to run within requests.  Argh!  Perhaps we can extend the
request model to make JS_BeginRequest aware of when the GC is running on the
current thread.  Patch coming up.

/be
r=jband

Looks right and works for me.
I have patches to fix teh bad editor logic here too, which I'd like to get in. So 
by all means fix the JS engine, but I'll be fixing at a higher level.
jsapi.c fix checked in.

/be
*** Bug 53185 has been marked as a duplicate of this bug. ***
setting milestone
Target Milestone: --- → M18
marking fixed, 2000091905 win98 and linux. Someone on a mac please verify.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
I need to use this bug to fix bad editor shutdown behaviour as well.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I need to use this bug to fix bad editor shutdown behaviour, including the 
following problems
1. The JS doc state listener's NotifyDocumentDestroyed method will get called too
   late, while the editor shell is being destroyed. Therefore implmeentations
   can't do anything with the editor shell -- if they try, it could cause bad
   things to happen.

2. I noticed that although mailcompose is using <editor>, they are also making
   an editorShell in JS, which means there are *two* editorShells there; this may
   be causing bugs

3. AIM needs to change from using <iframe> to <editor>, with associcated minor
   JS changes.

This stuff needs to be fixed for rtm.
Whiteboard: [dogfood+]
setting to rtm+ and p1 per review
Priority: P3 → P1
Whiteboard: [rtm+][p:1]
Simon, please include the required information per the rtm checkin rules
Whiteboard: [rtm+][p:1] → [rtm+ NEED INFO][p:1]
m19
Target Milestone: M18 → M19
Patch coming up, that will require review from syd, ducarroz, and a super review
I attach diffs (see the second set of mozilla diffs), which are long, but pretty 
mundane. These diffs do the following:

1. Ensure that all users of <editor> get their editorShell from the 
nsEditorBoxObject, rather than making one of their own in JS. This should reduce 
bloat, and the potential for bugs where the wrong editorShell is being addressed.

2. Ensure that editorShell teardown occurs early enough to fire the "document 
will be destroyed" notification on observers (this is the notification that 
caused the original bug, since it came too late), and that teardown is the same 
between composer/text widgets/mail/aim.

3. Move the responsibility for calling editorShell->Init from JS to the 
nsEditorBoxObject (with some double-call protection in nsEditorShell::Init()).

Incidental fixes are:
1. Fix nsEditorBoxObject to use the correct NS_IMPL_ISUPPORTS_INHERITED macro
2. Remove some redundant code from nsGfxTextControlFrame2::Destroy()

Changes per file:
nsIEditor.h, nsEditor.h, nsEditor.cpp: Add a PreDestroy() method that is called 
before editor destruction.

nsGfxTextControlFrame2.cpp: call editor->PreDestroy(), and remove some redundant 
code.

nsEditorShell.h, nsEditorShell.cpp: add protection against Init() being called 
twice, and call editor->PreDestroy() from the Shutdown() method.

nsEditorBoxObject.cpp: convert to using NS_IMPL_ISUPPORTS_INHERITED, since this 
inherits from nsBoxObject. Override nsBoxObject's Init() method, and create the 
editor shell there (instead of on the first call to GetEditorShell). Call 
editorShell->Shutdown() from the SetDocument() call, when we know the editor is 
going away.

editor.js: No longer need to call editorShell.Shutdown(), since the 
editorBoxObject does this now.

MsgComposeCommands.js: remove the explicit creation of the editorShell, instead 
getting it from the editorElement. Remove the editorShell.Init() call since this 
is now in C++.

The AIM diffs are similar to the MsgComposeCommands.js changes: replace explicit 
editorShell creation with code to get it from the editor element, and remove 
.Init() calls.
Status: REOPENED → ASSIGNED
Just peeking, a few nits:

>   rv = mEditorShell->Init();
>   if (NS_FAILED(rv)) return rv;
>   
>   return NS_OK;

Isn't this better said via:

>   return mEditorShell->Init();

or (if it makes for easier maintenance):

>   rv = mEditorShell->Init();
>   return rv;

provided mEditorShell->Init never returns a success code other than NS_OK?

I see hard tabs in MsgComposeCommands.js, in violation of the sacred Emacs 
modeline comment.  Can you persuade CW to expand to the proper number of spaces? 
Thanks.  r=brendan@mozilla.org FWIW.

/be
Re: returning errors. Yes, I can do return mEditorShell->Init();
Re: hard tabs in JS files -- mailnews (and AIM) XUL and JS is full of tabs.
I was trying to do as the Romans.
Romans didn't do that hard-tab injection in violation of the modeline and of 
Roman Law -- Vandals did.  The use of tabs is spotty anyway, and violates the 
covenant of the modeline, which keeps the peace in the old Republic.  I say fix 
incrementally, esp. if adding code.

/be
I am the culprit for the hard-tab in MsgComposeCommands.js. I have changed the 
setting of all my IDE to use 2 spaces instead and try to convert it every time I 
have to change some code in it. Sorry for the mess!
I have applied the patch on Windows and tested message compose. Sofar so good, 
no problem.
Joe, could you apply the AIM patches and vishy, could you super-review the AIM 
portion? I'd do this but will be away till Monday.
for the message compose side, r=ducarroz
Do I need to apply all the patches to test the AIM fix for the double editor
problem? If I apply just the AIM set of patches I get an error, that is causing
the screenname field not be automatically filled in:
JavaScript error:
 line 0: uncaught exception: [Exception... "Component returned failure code:
0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsIEditorShell.GetString]"  nsresult:
"0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame ::
chrome://editor/content/editor.js :: GetString :: line 301"  data: no]

and then later,
JavaScript error:
 line 0: uncaught exception: [Exception... "Component returned failure code:
0x80004005 (NS_ERROR_FAILURE) [nsPIAimIM.OnWindowUnload]"  nsresult: "0x80004005
(NS_ERROR_FAILURE)"  location: "JS frame :: chrome://aim/content/IM.js ::
AimIMOnWndUnload :: line 318"  data: no]


Yes, you have to apply all the patches, otherwise editorShell.Init will never be 
called.
a=hyatt.
r=kin@netscape.com for editor portions of the diff.
I'm pulling a new tree now to test this for IM. don't check this in until I test
it. Thanks.
r=jelwell for the AIM portion. checked on linux and mac branch pulls from 10/09/2000
Whiteboard: [rtm+ NEED INFO][p:1] → [rtm NEED INFO][p:1]
removing + per pdt sw rules
rtm++ since this patch has had lots of reviews, and ducarroz has been running
with it for a while.
Whiteboard: [rtm NEED INFO][p:1] → [rtm++][p:1]
Fixes are in, trunk and branch.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago20 years ago
Resolution: --- → FIXED
sujay - pls verify for composer.
QA Contact: adamlock → sujay
verified in 10/16 build on all 3 platforms.
Status: RESOLVED → VERIFIED
has this been verified on the trunk? am not sure, so am reopening, re-resolving
so that people using the trunk could test it out (unless it already has been
tested).
Status: VERIFIED → REOPENED
Keywords: vtrunk
Resolution: FIXED → ---
re-resolving. pls verify using trunk bits. thx!
Status: REOPENED → RESOLVED
Closed: 20 years ago20 years ago
Resolution: --- → FIXED
verified on trunk 10/17 build.
Status: RESOLVED → VERIFIED
Removing vtrunk keyword to pull this off the needs verifying on trunk radar (and
becasue these keywords will go away soon)
also Verified Fixed on Mozilla trunk builds
linux 101908 RedHat 6.2
win32 101904 NT 4
mac 101904 Mac OS9
Keywords: vtrunk
You need to log in before you can comment on or make changes to this bug.