Closed Bug 119034 (anxioususer) Opened 23 years ago Closed 23 years ago

Crash in XPCOM:EventReceiver or nothing happens if you start Moz quickly after closing it

Categories

(Core :: XPCOM, defect)

x86
Windows NT
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: jonasj, Assigned: blythe)

References

Details

(Keywords: crash)

Attachments

(2 files, 4 obsolete files)

Reproducable: Always. Start Mozilla. Close it again. Attempt to start it again within 0.4 (or so) seconds. Nothing happens.
Mozilla isn't close if you select close and all windows are closed. You try to start mozilla again while the old task is on shutdown... (you can also see a crash if you do this) wontfix ?
Jonas, what build are you using, and exactly how are you closing and starting? Does it matter what window you are closing?
> You try to start mozilla again while the old task is on shutdown... Yes, but it's still annoying. > (you can also see a crash if you do this) Yeah, that has happened for me a couple of times. I forgot to mention it. > Jonas, what build are you using, and exactly how are you closing and starting? Happens on any build. I close it by clicking the X in the upper right corner, by pressing Alt+F4, or by pressing Ctrl+W. I start it again by clicking a shortcut in the Quick Launch toolbar. > Does it matter what window you are closing? No.
->law. if you still get a crash in this scenario, please supply a TalkBack ID and *exact* steps to reproduce. Otherwise, this is a minor delay in a very unusual scenario, and will have to wait for a future release.
Assignee: trudelle → law
I can't get a Talkback report, I think it is crashing after talkback is shut off. I am attaching a .bmp showing the alert I get from Windows. It says the error is in XPCOM:EventReceiver. Setting component and reassigning.
Assignee: law → dougt
Component: XP Apps → XPCOM
QA Contact: sairuh → scc
Talkback IDs: TB2119333Z TB2119307K TB2119268X > this is a minor delay in a very unusual scenario The scenario isn't that unusual for me - I often have only the mail window open, close it, and try to open a browser window. However, since the crash only occurs when there are no windows open, no data gets lost, so I agree that this bug can wait.
Adding crash keyword. (But leaving severity as normal.)
Keywords: crash
The stacks look like: 0x00345b6d NS_GetGlobalComponentManager [d:\builds\seamonkey\mozilla\xpcom\components\nsComponentManager.cpp, line 3323] nsComponentManager::CreateInstance [d:\builds\seamonkey\mozilla\xpcom\components\nsComponentManagerObsolete.cpp, line 101] nsNativeAppSupportWin::GetCmdLineArgs [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsNativeAppSupportWin.cpp, line 1599] nsNativeAppSupportWin::HandleRequest [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsNativeAppSupportWin.cpp, line 1328] MessageWindow::WindowProc [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsNativeAppSupportWin.cpp, line 818] USER32.DLL + 0x2e98 (0x77e12e98) USER32.DLL + 0x39a3 (0x77e139a3) USER32.DLL + 0x7b2e (0x77e17b2e) ntdll.dll + 0x2032f (0x77fa032f) USER32.DLL + 0x3cd2 (0x77e13cd2) ole32.dll + 0x257b0 (0x77a757b0) ole32.dll + 0x25609 (0x77a75609) ole32.dll + 0x4d67 (0x77a54d67) ole32.dll + 0x4610 (0x77a54610) 0xffffffff
Bill, could this at all be related to DDE?
Not DDE, but it involves the same area of code. It looks like what's happening is this: 1. First process starts to shut down. 2. Second process starts up, sees first process is still running. 3. First process shuts down to a certain point, but is still capable of receiving requests from other processes. 4. Second process sends request to first process. 5. First process tries to handle request but the system is is an unstable state. Where things go bad will vary depending entirely on the timing. The fix requires that we set some state at point 1. so that the second process doesn't erroneously think that process 1 is still ready to field requests. I put in place a mutex semaphore to try to deal with that but it is not foolproof (obviously). As I recall, the biggest obstacle was that there was no easy way to get notification in nsNativeAppSupportWin that we had commenced the shutdown. Maybe there's a solution to that problem now. What we need to do is close the "MessageWindow" at the point where we're no longer in a position to receive requests. There would remain a small window where there could be a queueud request from a second process that gets processed after we've made the decision to shut down. The best way to fix that is to make the shutdown decision while holding the same mutex semaphore that the startup code in the second process is using, and, to check for pending incoming requests before letting the shutdown proceed. The plan was for nsINativeAppSupport::Stop to be used for that purpose. See the comment at http://lxr.mozilla.org/seamonkey/source/xpfe/appshell/public/nsINativeAppSupport.idl#86 Note that nsNativeAppSupportWin::Stop is not implemented up to that standard, as of yet. It is premised on the old DDE-based ipc mechanism, not the current "message window WM_COPYDATA" mechanism. In addition to checking for DDE conversations being open, it needs to peek for WM_COPYDATA events in the queue. It also must close the message window when it decides shutdown is OK. If we fix that, and call nsINativeAppSupport::Stop at the point where we're deciding to shutdown, then that should solve the problem. Another solution might lie in the "prevent two processes from accessing the same profile" fix that Conrad has done (in some other bug). That code is for protecting access to profiles, though, and may not protect against access to this "message window."
over to bill.
Assignee: dougt → law
Component: XPCOM → XP Apps
It might also suffice to implement a partial solution that simply blocks the incoming requests where we receive the WM_COPYDATA message (at http://lxr.mozilla.org/seamonkey/source/xpfe/bootstrap/nsNativeAppSupportWin.cpp#809) and we know we're shutting down. That still requires some notification that we've begun the shutdown process to get to nsNativeAppSupportWin. Also, it may be that we don't actually need to check for queued WM_COPYDATA requests in Stop(). If we just go ahead and close the message window, then those queue requests might get discarded.
Component: XP Apps → XPCOM
Target Milestone: --- → mozilla1.0
Keywords: nsbeta1
*** Bug 124365 has been marked as a duplicate of this bug. ***
*** Bug 127412 has been marked as a duplicate of this bug. ***
nsbeta1- per ADT triage team
Keywords: nsbeta1nsbeta1-
Target Milestone: mozilla1.0 → mozilla1.2
*** Bug 123595 has been marked as a duplicate of this bug. ***
*** Bug 107350 has been marked as a duplicate of this bug. ***
*** Bug 130511 has been marked as a duplicate of this bug. ***
Summary: Can't start Mozilla again right after closing last open window → Crash in XPCOM:EventReceiver or nothing happens if you start Moz quickly after closing it
I had a crash about 3 minutes ago. I send talkback data. I discovered so that bug 130511 is a duplicate of this one. Hope my talkback data could help a little !
*** Bug 135216 has been marked as a duplicate of this bug. ***
Just to confirm... I have been seeing this for a while on Win2K. I can easily and consistently reproduce it. It still exists on build 2002040103.
*** Bug 138390 has been marked as a duplicate of this bug. ***
*** Bug 140203 has been marked as a duplicate of this bug. ***
*** Bug 140790 has been marked as a duplicate of this bug. ***
*** Bug 144235 has been marked as a duplicate of this bug. ***
saw this with 1.0RC2, Windows 2000 (XPCOM:EventReceiver:Mozilla.exe - Application Error - the instruction at "0x00235c8f" referenced memory at "0xffffffff" the memory could not be "read") this could be worth a mention in the Release Notes
personally i'm not going to release note this. it's an interesting bug, but i'd hope that our average user has better things to do than to quit mozilla and then try to restart it.
Actually, since certain preference changes or installs require closing the browser for them to take effect, it is quite likely that someone who makes that change, closes the browser so it will take effect, and then immediately double clicks the desktop icon or clicks a shortcut bar icon, will see this error. In fact, as I recall, that's how I initially encountered this issue. (On a K6- 2/300 w/ 256 meg ram the delay is long enough that it is easy to accidentally restart within the time this error occurs. Perhaps on a faster machine one would have to be doing something unusual to trigger it.)
Two comments: 1. I have run into this many times even though I have a reasonably fast machine, P3-850, 256 MB, 7200 rpm hard-drive, so I think it is more (not less) likely to happen as users change their settings and try to restart the app. 2. I have this problem on WinNT, not W2K. As I understand it, the OS setting is supposed to reflect the older OS-version that exhibits the problem. So, the OS option for this bug should be changed from "Windows 2000" to "Windows NT" (if not lower). I don't have access rights (and it's not my bug) so I leave for others to change!
OS: Windows 2000 → Windows NT
*** Bug 146221 has been marked as a duplicate of this bug. ***
*** Bug 146197 has been marked as a duplicate of this bug. ***
*** Bug 146481 has been marked as a duplicate of this bug. ***
Some more talkbacks: TB7017545E and TB6856455Y. I also see this on a fairly fast machine (PIII 1Ghz, 512MB) but see it on Win2K but NOT on WinXP Home.
*** Bug 149403 has been marked as a duplicate of this bug. ***
Blocks: 1.1b
*** Bug 154012 has been marked as a duplicate of this bug. ***
*** Bug 155729 has been marked as a duplicate of this bug. ***
*** Bug 146258 has been marked as a duplicate of this bug. ***
*** Bug 134348 has been marked as a duplicate of this bug. ***
Status: The status remains pretty much as I described in comment 11 and comment 13. I haven't done any work on fixing this one and no fix is imminent (from me).
Blocks: 1.1
No longer blocks: 1.1b
*** Bug 158413 has been marked as a duplicate of this bug. ***
Alias: nervoususer
*** Bug 160163 has been marked as a duplicate of this bug. ***
If no fix is imminent (at least from Bill), and this bug is targetted at v1.2a, then why is it a blocker for 1.1 (Bug #157592)? Yes, I realise that crashes aren't very nice, but from reading this bug there is no easy solution. To paraphrase (and use some different wording), what we have here is an application starting up that is trying to communicate with an application that is shutting down, and then getting upset because it can't communicate with it. (Replace "application" with "process" if that makes you feel better).
about target being 1.2a: there is currently no way to set a target of mozilla 1.1 (or 1.2, for that matter). so most bugs actually targeted for 1.1 end up being targeted to 1.1b or 1.2a.
rpotts, are you able to help us with this while law is busy with other things? if not, then who? This is clearly one of our most frequent crashers (don't let the lack of talkback fool you; I think that talkback is missing this one).
Garrett, can you help?
Re Comment #44. So what is "mozilla1.1final" for?
Why is this so tough? It seems to me that all you need to do is create a shared lock object, which you have to PRLock before starting the process shutdown and before doing the communication during process startup. As the descending process, you then unlock it after you're sufficently dead that it won't bother the new process.
Law says no time and Garret says he'll take a look at this. Reassigning.
Assignee: law → blythe
Attached patch NULL check (obsolete) — Splinter Review
request for r,sr,a A NULL check avoids a crash in the process shutting down. This is not the real fix, as this does not solve the cause of the problem. Agree with comment #48. I need more time to study the existing convolution; I am new to all things xpfe (nsNativeAppSupportWin.cpp). Having the second process do the right thing (load up) is not critical but could be solved by any real fix.
Attached patch Destroy IPC message window (obsolete) — Splinter Review
Request for r,sr,a Destroy our new window request window in nsNativeAppSupportWin::Quit(). nsAppShellService::Quit() calls nsNativeAppSupportWin::Quit(), and an nsAppShellService comment explicitly states: "Shutdown native app support; doing this first will prevent new requests to open additional windows coming in." So we should do that.
Comment on attachment 94641 [details] [diff] [review] Destroy IPC message window r=timeless fwiw dbradley and mjudge dislike the messagewindow class (esp the overloaded operator).
Attachment #94641 - Flags: review+
Comment on attachment 94641 [details] [diff] [review] Destroy IPC message window Also please change the spacing of your if to match the file's. When in Rome... if ( (HWND)mw ) {
Attached patch Destroy IPC message window (v2) (obsolete) — Splinter Review
format correct
Attachment #94641 - Attachment is obsolete: true
Comment on attachment 94637 [details] [diff] [review] NULL check Checked in NULL check, should fix crash. sr=darin, a=asa Still waiting on sr from law on MessageWindow destroy for real fix.
Attachment #94637 - Attachment is obsolete: true
Attached patch Destroy IPC message window (v3) (obsolete) — Splinter Review
Properly NULL out mHandle upon window Destroy, suggestions from #mozilla
Attachment #94714 - Attachment is obsolete: true
Garret, this never landed on the 1.1 branch (that I can tell) Can you get at least the nullcheck into the 1.1 branch? THanks.
Comment on attachment 94795 [details] [diff] [review] Destroy IPC message window (v3) This fix isn't quite right. The problem is that the code inserted into nsNativeAppSupportWin::Quit can destroy another process's MessageWindow. Normally, that function will not be called in a process that doesn't "own" the system-wide singleton instance of such a window, but that's not always the case. Two exceptions that I can think of off the top of my head: a. A restarting QL scenario where we destroy the message window then restart another process that will create a new one. b. Cases involving MOZ_NO_REMOTE. The simplest fix might be to check the process that owns the resulting window and not destroy it if it's not the currently running process.
Attachment #94795 - Flags: needs-work+
Thanks for the review Bill. We talked about the possibility of destroying another apps MessageWindow, and DestroyWindow (win32 sdk) will only destroy a window that was created from the same thread, so it is safe. However, during our talk, we decided that a lookup lock should be employed as elsewhere in the file, and that I should comment the DestroyWindow behavior. revised patch forthcoming.... Asa, on comment #57, after I looked into this I find that my NULL check is a trunk only fix as there has already been drift from the branch in that file. I think it is best if we proceed with the IPC patch that is forthcoming....
Finale (we hope) Request for r,sr,a
Attachment #94795 - Attachment is obsolete: true
Comment on attachment 94957 [details] [diff] [review] Destroy IPC message window (v4) r=pavlov
Attachment #94957 - Flags: review+
Comment on attachment 94957 [details] [diff] [review] Destroy IPC message window (v4) I just got Asa's voicemail about this today - this looks fine..sr=alecf though for this line: + Mutex mutexLock = Mutex(mMutexName); we like this convention: + Mutex mutexLock(mMutexName);
Attachment #94957 - Flags: superreview+
Comment on attachment 94957 [details] [diff] [review] Destroy IPC message window (v4) a=asa (on behalf of drivers) for checkin to the 1.1 branch. Garret, can you get this in quickly? Time is short. Thanks.
Attachment #94957 - Flags: approval+
checked in trunk and branch, closing
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
i'm mailing drivers to ask about 1.0.1, i spent today using n7p1 (mostly for spellchecker and skin switching games, i really liked using "modern the default skin for Netscape" in Netscape7). I insist that I'm an *anxious* user, not a nervous user.
Alias: nervoususer → anxioususer
Keywords: nsbeta1-mozilla1.0.1
*** Bug 190958 has been marked as a duplicate of this bug. ***
verified with recent trunk build.
Status: RESOLVED → VERIFIED
*** Bug 261529 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: