Closed Bug 119034 (anxioususer) Opened 23 years ago Closed 22 years ago

Crash in XPCOM:EventReceiver or nothing happens if you start Moz quickly after closing it

Categories

(Core :: XPCOM, defect)

x86
Windows NT
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: jonasj, Assigned: blythe)

References

Details

(Keywords: crash)

Attachments

(2 files, 4 obsolete files)

Reproducable: Always. Start Mozilla. Close it again. Attempt to start it again
within 0.4 (or so) seconds. Nothing happens.
Mozilla isn't close if you select close and all windows are closed.
You try to start mozilla again while the old task is on shutdown...

(you can also see a crash if you do this)

wontfix ?

Jonas, what build are you using, and exactly how are you closing and starting? 
Does it matter what window you are closing?
> You try to start mozilla again while the old task is on shutdown...

Yes, but it's still annoying.

> (you can also see a crash if you do this)

Yeah, that has happened for me a couple of times. I forgot to mention it.

> Jonas, what build are you using, and exactly how are you closing and starting? 

Happens on any build. I close it by clicking the X in the upper right corner, by
pressing Alt+F4, or by pressing Ctrl+W. I start it again by clicking a shortcut
in the Quick Launch toolbar.

> Does it matter what window you are closing?

No.
->law. if you still get a crash in this scenario, please supply a TalkBack ID
and *exact* steps to reproduce.  Otherwise, this is a minor delay in a very
unusual scenario, and will have to wait for a future release.
Assignee: trudelle → law
I can't get a Talkback report, I think it is crashing after talkback is shut 
off.  I am attaching a .bmp showing the alert I get from Windows.  It says the 
error is in XPCOM:EventReceiver.

Setting component and reassigning.
Assignee: law → dougt
Component: XP Apps → XPCOM
QA Contact: sairuh → scc
Talkback IDs:
TB2119333Z
TB2119307K
TB2119268X

> this is a minor delay in a very unusual scenario

The scenario isn't that unusual for me - I often have only the mail window open,
close it, and try to open a browser window. However, since the crash only occurs
when there are no windows open, no data gets lost, so I agree that this bug can
wait.
Adding crash keyword. (But leaving severity as normal.)
Keywords: crash
The stacks look like:

0x00345b6d
NS_GetGlobalComponentManager
[d:\builds\seamonkey\mozilla\xpcom\components\nsComponentManager.cpp, line 3323]
nsComponentManager::CreateInstance
[d:\builds\seamonkey\mozilla\xpcom\components\nsComponentManagerObsolete.cpp,
line 101]
nsNativeAppSupportWin::GetCmdLineArgs
[d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsNativeAppSupportWin.cpp, line 1599]
nsNativeAppSupportWin::HandleRequest
[d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsNativeAppSupportWin.cpp, line 1328]
MessageWindow::WindowProc
[d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsNativeAppSupportWin.cpp, line 818]
USER32.DLL + 0x2e98 (0x77e12e98)
USER32.DLL + 0x39a3 (0x77e139a3)
USER32.DLL + 0x7b2e (0x77e17b2e)
ntdll.dll + 0x2032f (0x77fa032f)
USER32.DLL + 0x3cd2 (0x77e13cd2)
ole32.dll + 0x257b0 (0x77a757b0)
ole32.dll + 0x25609 (0x77a75609)
ole32.dll + 0x4d67 (0x77a54d67)
ole32.dll + 0x4610 (0x77a54610)
0xffffffff 
Bill, could this at all be related to DDE?
Not DDE, but it involves the same area of code.  It looks like what's happening
is this:

1. First process starts to shut down.
2. Second process starts up, sees first process is still running.
3. First process shuts down to a certain point, but is still capable of
receiving requests from other processes.
4. Second process sends request to first process.
5. First process tries to handle request but the system is is an unstable state.

Where things go bad will vary depending entirely on the timing.

The fix requires that we set some state at point 1. so that the second process
doesn't erroneously think that process 1 is still ready to field requests.  I
put in place a mutex semaphore to try to deal with that but it is not foolproof
(obviously).  As I recall, the biggest obstacle was that there was no easy way
to get notification in nsNativeAppSupportWin that we had commenced the shutdown.

Maybe there's a solution to that problem now.  What we need to do is close the
"MessageWindow" at the point where we're no longer in a position to receive
requests.

There would remain a small window where there could be a queueud request from a
second process that gets processed after we've made the decision to shut down. 
The best way to fix that is to make the shutdown decision while holding the same
mutex semaphore that the startup code in the second process is using, and, to
check for pending incoming requests before letting the shutdown proceed.  The
plan was for nsINativeAppSupport::Stop to be used for that purpose.  See the
comment at
http://lxr.mozilla.org/seamonkey/source/xpfe/appshell/public/nsINativeAppSupport.idl#86

Note that nsNativeAppSupportWin::Stop is not implemented up to that standard, as
of yet.  It is premised on the old DDE-based ipc mechanism, not the current
"message window WM_COPYDATA" mechanism.  In addition to checking for DDE
conversations being open, it needs to peek for WM_COPYDATA events in the queue.
 It also must close the message window when it decides shutdown is OK.  If we
fix that, and call nsINativeAppSupport::Stop at the point where we're deciding
to shutdown, then that should solve the problem.

Another solution might lie in the "prevent two processes from accessing the same
profile" fix that Conrad has done (in some other bug).  That code is for
protecting access to profiles, though, and may not protect against access to
this "message window."
over to bill.
Assignee: dougt → law
Component: XPCOM → XP Apps
It might also suffice to implement a partial solution that simply blocks the
incoming requests where we receive the WM_COPYDATA message (at
http://lxr.mozilla.org/seamonkey/source/xpfe/bootstrap/nsNativeAppSupportWin.cpp#809)
and we know we're shutting down.

That still requires some notification that we've begun the shutdown process to
get to nsNativeAppSupportWin.

Also, it may be that we don't actually need to check for queued WM_COPYDATA
requests in Stop().  If we just go ahead and close the message window, then
those queue requests might get discarded.
Component: XP Apps → XPCOM
Target Milestone: --- → mozilla1.0
Keywords: nsbeta1
*** Bug 124365 has been marked as a duplicate of this bug. ***
*** Bug 127412 has been marked as a duplicate of this bug. ***
nsbeta1- per ADT triage team
Keywords: nsbeta1nsbeta1-
Target Milestone: mozilla1.0 → mozilla1.2
*** Bug 123595 has been marked as a duplicate of this bug. ***
*** Bug 107350 has been marked as a duplicate of this bug. ***
*** Bug 130511 has been marked as a duplicate of this bug. ***
Summary: Can't start Mozilla again right after closing last open window → Crash in XPCOM:EventReceiver or nothing happens if you start Moz quickly after closing it
I had a crash about 3 minutes ago.

I send talkback data. I discovered so that bug 130511 is a duplicate of this one.

Hope my talkback data could help a little !
*** Bug 135216 has been marked as a duplicate of this bug. ***
Just to confirm... I have been seeing this for a while on Win2K. I can easily
and consistently reproduce it. It still exists on build 2002040103.

*** Bug 138390 has been marked as a duplicate of this bug. ***
*** Bug 140203 has been marked as a duplicate of this bug. ***
*** Bug 140790 has been marked as a duplicate of this bug. ***
*** Bug 144235 has been marked as a duplicate of this bug. ***
saw this with 1.0RC2, Windows 2000

(XPCOM:EventReceiver:Mozilla.exe - Application Error - the instruction at
"0x00235c8f" referenced memory at "0xffffffff" the memory could not be "read")

this could be worth a mention in the Release Notes
personally i'm not going to release note this. it's an interesting bug, but i'd 
hope that our average user has better things to do than to quit mozilla and 
then try to restart it.
Actually, since certain preference changes or installs require closing the 
browser for them to take effect, it is quite likely that someone who makes that 
change, closes the browser so it will take effect, and then immediately double 
clicks the desktop icon or clicks a shortcut bar icon, will see this error. In 
fact, as I recall, that's how I initially encountered this issue. (On a K6-
2/300 w/ 256 meg ram the delay is long enough that it is easy to accidentally 
restart within the time this error occurs. Perhaps on a faster machine one 
would have to be doing something unusual to trigger it.)
Two comments:
1. I have run into this many times even though I have a reasonably fast machine,
P3-850, 256 MB, 7200 rpm hard-drive, so I think it is more (not less) likely to
happen as users change their settings and try to restart the app.

2. I have this problem on WinNT, not W2K.  As I understand it, the OS setting is
supposed to reflect the older OS-version that exhibits the problem.  So, the OS
option for this bug should be changed from "Windows 2000" to "Windows NT" (if
not lower).  I don't have access rights (and it's not my bug) so I leave for
others to change!
OS: Windows 2000 → Windows NT
*** Bug 146221 has been marked as a duplicate of this bug. ***
*** Bug 146197 has been marked as a duplicate of this bug. ***
*** Bug 146481 has been marked as a duplicate of this bug. ***
Some more talkbacks: TB7017545E and TB6856455Y.

I also see this on a fairly fast machine (PIII 1Ghz, 512MB) but see it on Win2K
but NOT on WinXP Home.
*** Bug 149403 has been marked as a duplicate of this bug. ***
Blocks: 1.1b
*** Bug 154012 has been marked as a duplicate of this bug. ***
*** Bug 155729 has been marked as a duplicate of this bug. ***
*** Bug 146258 has been marked as a duplicate of this bug. ***
*** Bug 134348 has been marked as a duplicate of this bug. ***
Status: The status remains pretty much as I described in comment 11 and comment
13.  I haven't done any work on fixing this one and no fix is imminent (from me).
Blocks: 1.1
No longer blocks: 1.1b
*** Bug 158413 has been marked as a duplicate of this bug. ***
Alias: nervoususer
*** Bug 160163 has been marked as a duplicate of this bug. ***
If no fix is imminent (at least from Bill), and this bug is targetted at v1.2a,
then why is it a blocker for 1.1 (Bug #157592)?

Yes, I realise that crashes aren't very nice, but from reading this bug there is
no easy solution. To paraphrase (and use some different wording), what we have
here is an application starting up that is trying to communicate with an
application that is shutting down, and then getting upset because it can't
communicate with it. (Replace "application" with "process" if that makes you
feel better).
about target being 1.2a: there is currently no way to set a target of mozilla
1.1 (or 1.2, for that matter). so most bugs actually targeted for 1.1 end up
being targeted to 1.1b or 1.2a.
rpotts, are you able to help us with this while law is busy with other things?
if not, then who? This is clearly one of our most frequent crashers (don't let
the lack of talkback fool you; I think that talkback is missing this one).
Garrett, can you help?
Re Comment #44. So what is "mozilla1.1final" for?
Why is this so tough? It seems to me that all you need to do is create a shared
lock object, which you have to PRLock before starting the process shutdown and
before doing the communication during process startup. As the descending
process, you then unlock it after you're sufficently dead that it won't bother
the new process.
Law says no time and Garret says he'll take a look at this. Reassigning.
Assignee: law → blythe
Attached patch NULL check (obsolete) — Splinter Review
request for r,sr,a

A NULL check avoids a crash in the process shutting down.

This is not the real fix, as this does not solve the cause of the problem. 
Agree with comment #48.  I need more time to study the existing convolution;  I
am new to all things xpfe (nsNativeAppSupportWin.cpp).	Having the second
process do the right thing (load up) is not critical but could be solved by any
real fix.
Attached patch Destroy IPC message window (obsolete) — Splinter Review
Request for r,sr,a

Destroy our new window request window in nsNativeAppSupportWin::Quit().

nsAppShellService::Quit() calls nsNativeAppSupportWin::Quit(), and an
nsAppShellService comment explicitly states:  "Shutdown native app support;
doing this first will prevent new requests to open additional windows coming
in."
So we should do that.
Comment on attachment 94641 [details] [diff] [review]
Destroy IPC message window

r=timeless
fwiw dbradley and mjudge dislike the messagewindow class (esp the overloaded
operator).
Attachment #94641 - Flags: review+
Comment on attachment 94641 [details] [diff] [review]
Destroy IPC message window

Also please change the spacing of your if to match the file's.	When in Rome...

    if ( (HWND)mw ) {
Attached patch Destroy IPC message window (v2) (obsolete) — Splinter Review
format correct
Attachment #94641 - Attachment is obsolete: true
Comment on attachment 94637 [details] [diff] [review]
NULL check

Checked in NULL check, should fix crash.
sr=darin, a=asa

Still waiting on sr from law on MessageWindow destroy for real fix.
Attachment #94637 - Attachment is obsolete: true
Attached patch Destroy IPC message window (v3) (obsolete) — Splinter Review
Properly NULL out mHandle upon window Destroy, suggestions from #mozilla
Attachment #94714 - Attachment is obsolete: true
Garret, this never landed on the 1.1 branch (that I can tell) Can you get at
least the nullcheck into the 1.1 branch? THanks.
Comment on attachment 94795 [details] [diff] [review]
Destroy IPC message window (v3)

This fix isn't quite right.  The problem is that the code inserted into
nsNativeAppSupportWin::Quit can destroy another process's MessageWindow. 
Normally, that function will not be called in a process that doesn't "own" the
system-wide singleton instance of such a window, but that's not always the
case.

Two exceptions that I can think of off the top of my head:
a. A restarting QL scenario where we destroy the message window then restart
another process that will create a new one.
b. Cases involving MOZ_NO_REMOTE.

The simplest fix might be to check the process that owns the resulting window
and not destroy it if it's not the currently running process.
Attachment #94795 - Flags: needs-work+
Thanks for the review Bill.  We talked about the possibility of destroying
another apps MessageWindow, and DestroyWindow (win32 sdk) will only destroy a
window that was created from the same thread, so it is safe.

However, during our talk, we decided that a lookup lock should be employed as
elsewhere in the file, and that I should comment the DestroyWindow behavior.

revised patch forthcoming....

Asa, on comment #57, after I looked into this I find that my NULL check is a
trunk only fix as there has already been drift from the branch in that file.

I think it is best if we proceed with the IPC patch that is forthcoming....
Finale (we hope)

Request for r,sr,a
Attachment #94795 - Attachment is obsolete: true
Comment on attachment 94957 [details] [diff] [review]
Destroy IPC message window (v4)

r=pavlov
Attachment #94957 - Flags: review+
Comment on attachment 94957 [details] [diff] [review]
Destroy IPC message window (v4)

I just got Asa's voicemail about this today - this looks fine..sr=alecf
though for this line:
+    Mutex mutexLock = Mutex(mMutexName);

we like this convention:
+    Mutex mutexLock(mMutexName);
Attachment #94957 - Flags: superreview+
Comment on attachment 94957 [details] [diff] [review]
Destroy IPC message window (v4)

a=asa (on behalf of drivers) for checkin to the  1.1 branch. Garret, can you
get this in quickly? Time is short. Thanks.
Attachment #94957 - Flags: approval+
checked in trunk and branch, closing
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
i'm mailing drivers to ask about 1.0.1, i spent today using n7p1 (mostly for
spellchecker and skin switching games, i really liked using "modern the default
skin for Netscape" in Netscape7). I insist that I'm an *anxious* user, not a
nervous user.
Alias: nervoususer → anxioususer
Keywords: nsbeta1-mozilla1.0.1
*** Bug 190958 has been marked as a duplicate of this bug. ***
verified with recent trunk build.
Status: RESOLVED → VERIFIED
*** Bug 261529 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: