Closed Bug 53353 Opened 24 years ago Closed 24 years ago

Crash on browser/installer exit on win9x

Categories

(Core :: XPCOM, defect, P1)

x86
Windows 98
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: doronr, Assigned: waterson)

References

Details

(Keywords: crash, regression, Whiteboard: [nsbeta3++][dogfood-] FIX IN HAND)

Attachments

(3 files)

Possibly related to bug 45842.

Win98 2000092005 crashes on exit of browser, the last line in console is
Pref_Cleanup().

Can anyone else reproduce this?
*** Bug 53359 has been marked as a duplicate of this bug. ***
windows commercial build 2000-09-20-06-M18

closing browser with "X" or File | Close or File | Exit crashes with the error

message "This program has performed an illegal operation and will be shut down"

and:

Runtime error!

Program: c:\ *

R6016
- not enough space for thread data
Talkback data? unable to reproduce on NT with 092005 build.
i'm also seeing this one
drwatson on 98 gives me this info:

Remote Procedure Call DLL performed an invalid memory access.

Module Name: RPCRT4.DLL
Description: Remote Procedure Call DLL
Version: 4.71.2900
Product: Microsoft(R) Windows NT(TM) Operating System
Manufacturer: Microsoft Corporation

Application Name: Mozilla.exe
and now without drwatson:

MOZILLA caused an invalid page fault in
module RPCRT4.DLL at 017f:7fb9181c.
Registers:
EAX=00000000 CS=017f EIP=7fb9181c EFLGS=00010246
EBX=81998794 SS=0187 ESP=0068fd8c EBP=0068fdc0
ECX=d82db5b0 DS=0187 ESI=7fb90000 FS=6f07
EDX=c003094c ES=0187 EDI=00000000 GS=0000
Bytes at CS:EIP:
ff 70 28 ff 15 78 d0 bd 7f c7 05 bc c0 bd 7f 01 
Stack dump:
00000000 00000000 7fb90000 81998794 00000000 16670246 bff741f7 0068fd90 0068fbbc 
0068ff78 7fb953e8 7fbd4a70 ffffffff 0068ff88 bff7ddd6 7fb90000 

talkback does not seem to kick in. i'm running mozilla with -console, and to be 
able to close the console after the crash i'm forced to end the winold task using 
the ctrl-alt-del trick

i can attach a more detailed drwatson report if needed
talkback is shut down before the crash. annoying, and it makes it harder to locate
the culprit
over to XPCOM for investigation.
Component: Browser-General → XPCOM
reassigning because I forgot.  I'm not sure this is XPCOM but the only lxr ref
to that windows RPCRT4.DLL is in xpcom/tests and the microsoft literature talks
about it in idl and com docs.
Assignee: asa → rayw
QA Contact: doronr → rayw
Since the windows installer uses xpcom, it also crashes with the same result at 
the end of setup.exe.
adding dp and scc
Thus far, I have been unable to duplicate this.  I tried on several independent 
occasions before the bug disappeared and after it reappeared in my scope.  I do 
not have a Windows 98 platform to test on, but it does not appear on my WNT 4.0 
SP 6 build using a straight Mozilla build.  While the original bug wasn't logged 
against a commercial build, at least one reproduction was on a commercial build, 
so that is my next attempt.
this is a win9x only bug!  Still in 2000092108.  Nominating for nsbeta3, as
win9x is a very widespread OS.  Updating summary to make it clearer where we crash.
Keywords: crash, nsbeta3
Summary: Crash on exit → Crash on browser/isntaller exit on win9x
adding myself to the CC: list.
Not sure if anyone else sees this, but the crash created by this bug creates a
persistent crash window that doesn't go away on my Win98 computer, forcing me to
reboot to get rid of it.

If I'm not the only one experiencing this, then this should give this bug a
little more priority.
Why isn't tinderbox catching this?
*** Bug 53666 has been marked as a duplicate of this bug. ***
RE versions: This problem is reported in Win95 and Win98 only, not NT or 2k.
http://bugscape.netscape.com/show_bug.cgi?id=2415

Worse, this isn't plussed yet.

Does someone need to go to PDT to explain this needs to be fixed? We can't ship 
anything that has an installer that crashes at the end, even if it installs 
correctly.

If this sounds like I'm volunteering, I volunteer.

I've attached a stack trace of the assertions that are thrown when exiting the 
browser.  I thought they might be useful.

I'm going to try to get a stack trace during the exit of the installer now.
I would plus it and accept it as assigned if I could dup it. I suggest that 
someone who can dup it look at it and figure out what is causing the problem.  
If required, I will set up Win98 and the surrounding development platform, 
but I doubt I will have the bug troubleshooted by Monday when I leave for 
Boston.  From all the assertions, it would appear that there is non-thread-safe 
stuff happening on timers.

While XPCOM as a model is the root of many of this type of problem, XPCOM 
registers no timers I am aware of and does not do RPC's by itself during 
shutdown.  I could be wrong, but that is my belief.
I'm afraid I do not have the needed debug tools to test this.  However, 100% of
win9x people I have asked see this.
I have this problem on a laptop. I'm game to letting people work on it.
If anyone needs the proper debugging environment to debug this problem, let me 
know.  I have everything set up to build and debug this win98 bug in my cube.

My win98 system has VC6 with yesterday's debug build on it that reproduces this 
problem consistently.
per PDT, this is upgraded to Priority P2, and is now nsbeta3+
re-assigned to ssu
Assignee: rayw → ssu
Priority: P3 → P1
Whiteboard: [nsbeta3+]
I am not the right person to look at this bug.  I am not that familiar with 
timers and xpcom to be looking at this.  I just have a win98 system that can be 
used to debug this problem.
Reassigning to dougt as possibly more appropriate to deal with Win9x threading 
problem. This is not an install issue.

CC'ing valeski because he's probably going to object :-)
Assignee: ssu → dougt
*** Bug 53817 has been marked as a duplicate of this bug. ***
*** Bug 53890 has been marked as a duplicate of this bug. ***
*** Bug 53962 has been marked as a duplicate of this bug. ***
I see this R6016 error with a daily on my Win98 machine.

I don't see a link to the MSDN search here that would give us clues. It is:
Go to http://search.microsoft.com/us/dev/ and type in R6016

somewhat useful:
http://msdn.microsoft.com/library/devprods/vs6/visualc/vccore/r6016.htm

suggested user workaround that did not work for me:
http://support.microsoft.com/support/kb/articles/Q193/9/03.ASP

I can see about getting a debug build on my win98 machine to see if I can get 
further clues. Otherwise I don't have any special insight.
This might be helpful.  I found an article in MSDN6 and MSDN Online, Article ID: 
Q126709:
   PRB: Error on Win32s: R6016 - not enough space for thread data
   http://support.microsoft.com/support/kb/articles/Q126/7/09.asp
Keep in mind, though, that article is concerning an old version of the Win32s
software for Windows 3.x
Yes. That article was one of the three found in the search I showed above. I 
think the other two are more interesting.

I see this behavior in win98 with the debug build too. I also see it with viewer 
and winEmbed. I *don't* see it with xpcshell, testxpc (which also init and exit 
xpcom).

The threadsafety asserts attached by ssu@netscape.com are telling. I 
see them too. taskinfo2000  - http://www.iarsn.com/download.html#TaskInfo - 
shows that at the time of these asserts, and of the crash, there is only one 
thread left running in the process and it is *not* the original main thread. 
This is very odd.
I'm seeing a variation on my win98 pc.

MOZILLA caused an invalid page fault in
module KERNEL32.DLL at 017f:bff9db61.
Registers:
EAX=c00309c4 CS=017f EIP=bff9db61 EFLGS=00010212
EBX=0068ff78 SS=0187 ESP=0058ff4c EBP=005901e8
ECX=00000000 DS=0187 ESI=00000000 FS=68c7
EDX=bff76855 ES=0187 EDI=bff79198 GS=0000
Bytes at CS:EIP:
53 8b 15 e4 9c fc bf 56 89 4d e4 57 89 4d dc 89 
Stack dump:

Also I'm getting the persistent crash window.

Bug is in bin\xpcom.dll. I copied this file from the 0919 build to the 0924 
build. 0924 shuts down normally with the 0919 xpcom.dll file.
>>Bug is in bin\xpcom.dll. I copied this file from the 0919 build to the 0924
>>build. 0924 shuts down normally with the 0919 xpcom.dll file.

We need to find out who checked in XPCOM stuff after 0919 and before 0921.

Interesting is, that if I have the console open, the crash causes it to not
close, and freezes my win98. Adding regression/dogfood keywords to hopefully get
more attention.
Keywords: dogfood, regression
Good idea, Doron. I see only three people checking into XPCOM in that time: 
warren, waterson, and jband.

warren only changed some chrome jar makefile stuff

jband touched xpt error checking, looks safe enough.

waterson *did* touch XPCOM shutdown, including a fix that says "Add memory 
flusher thread." Bingo, I think we have a winner.
I reproduced this also on my Win95 machine at home with 9/23 build
Yes, you are right.  I knew someone was putting in a memory flusher on a timer 
thread.  I just hadn't figured out who it was.
I just tried running debug with waterson's flusher thread disabled using 
 #undef NS_MEMORY_FLUSHER_THREAD
 
I still get all the threadsafe assertions, but not the R6016 error and crash.

I don't think that just disabling the flusher thread is the right thing to do. 
We need to understand why it is doing this cleanup work on some other thread. 
These asserts are warnings we should not ignore.
*** Bug 54045 has been marked as a duplicate of this bug. ***
Summary: Crash on browser/isntaller exit on win9x → Crash on browser/installer exit on win9x
MS Windows 95 4.00.950a french version.  
NN4 default browser, IE 5 5.50.4134.0600 implemented. 
Same R6016 error and crash on 2000092520 setup. 

Same crash on M18 exit with these details : 

MOZILLA caused an invalid page fault in
module RPCRT4.DLL at 0147:70101e19.
Registers:
EAX=00000000 CS=0147 EIP=70101e19 EFLGS=00010246
EBX=8165ec90 SS=014f ESP=0068fd80 EBP=0068fdb4
ECX=c757ecc4 DS=014f ESI=8165ecd4 FS=3c8f
EDX=c0020ed8 ES=014f EDI=70100000 GS=0000
Bytes at CS:EIP:
ff 70 28 ff 15 78 e0 14 70 c7 05 6c d0 14 70 01 
Stack dump:
00000000 70100000 8165ecd4 8165ec90 0068ff80 00000001 bff74277 0068fd84 0068fbac 
0068ff70 7010a6dc 70146488 ffffffff 0068ff80 bff7b9b5 70100000 

followed by R6016 error and : 

MOZILLA caused an invalid page fault in
module KERNEL32.DLL at 0147:bff9a08c.
Registers:
EAX=0068febc CS=0147 EIP=bff9a08c EFLGS=00000246
EBX=8165ec90 SS=014f ESP=0068feb8 EBP=0068ff0c
ECX=80002f48 DS=014f ESI=00000000 FS=3c8f
EDX=80005f80 ES=014f EDI=780025ff GS=0000
Bytes at CS:EIP:
5e 8b e5 5d c2 10 00 64 a1 00 00 00 00 55 8b ec 
Stack dump:
78037130 c0000005 00000000 00000000 bff9a08c 00000000 5c3a4520 474f5250 204d4152 
454c4946 45535c53 4e4f4d41 5c59454b 495a4f4d 2e414c4c 0a455845 
pulling off of dougt's plate.
Assignee: dougt → valeski
reassigning to waterson (per jband's comments) and raising to nsbeta3++, we need
this fixed on the branch.
Assignee: valeski → waterson
Whiteboard: [nsbeta3+] → [nsbeta3++]
Still seeing this bug on 2000092508, Win98 with IE5.5 installed, on all closes

Message:

Visual C++ Runtime Library

-R6016
Not enough space for thread data

Status: NEW → ASSIGNED
Recently upgraded my work box (win98) from an earlier m18 nightly to the
25/09/00 vers: fine! wonderful! scrumdiddlydumptious!
Installed the Green & Black skin: no wukkers!
Installed the latest Aphrodite nightly (from
http://aphrodite.mozdev.org/installation.html): BOOM!
NOW I'm getting the same MS VisC++ RL "R6016 - not enough space for thread data"
Runtime Error! every time I close Mozilla.
Tried removing the obvious Mozilla related bits and doing a clean reinstall, but
it's still there so I must have missed something (either that or the VisC++
library has been screwed over).
Anybody else seeing an Aphrodite install as a trigger for this bug?
This bug doesn't exist in 2000091908, but appeared in 2000092008 and has
persisted through to current (2000092608). The bug occurs in all 3 windows
binary builds under the original 95 through to 98 SE.
Having neither a native nor a cross-compiler for Windows, however, I haven't been
able to test builds from source.
there is no need to report "still seeing this in build x" and such, this is 100%
reproducible on win9x.
OK, so we won't say it still exists.  The question, however, is when is someone 
likely to FIX it?  This is a week of being broken . . . 

Beker@cnpr.org
Attached patch: proposed fix
The above patch fixes one problem, which is factoring out the "startup" and
"shutdown" of the memory service from its creation and destruction. Turns out
nsMemoryImpl is created well before XPCOM initialization, and is re-created
after XPCOM shutdown (doing memory management for nsString's, both times). This
was causing the memory flusher thread to be *re-created* after XPCOM shutdown;
certainly not something I expected to happen!

With this patch, XPCOM startup calls nsMemoryImpl::Startup(), which'll start the
memory flusher thread. As before XPCOM shutdown calls nsMemoryImpl::Shutdown()
to spin down the memory flusher thread. But, I made XPCOM shutdown call
nsMemoryImpl::Shutdown() *before* calling nsThread::Shutdown(). (Since
nsMemoryImpl's Shutdown is doing thread tinkering.)
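
The split described above can be sketched roughly as follows. This is only an illustration in standard C++, not the actual patch: MemoryService, Runtime, and all of their members are hypothetical stand-ins for nsMemoryImpl and the XPCOM init/shutdown paths, and std::thread is used in place of the NSPR threading primitives.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for a memory service (NOT the real nsMemoryImpl).
class MemoryService {
public:
    // Constructing the service never starts a thread, so an instance that is
    // created before runtime init, or re-created after runtime shutdown,
    // cannot bring the flusher thread back to life.
    MemoryService() = default;

    // Safety net only: a std::thread must be joined before it is destroyed.
    ~MemoryService() { Shutdown(); }

    // Start the flusher thread. Calling it twice is a no-op.
    // Startup/Shutdown are assumed to be called from one controlling thread.
    void Startup() {
        std::lock_guard<std::mutex> guard(mLock);
        if (mThread.joinable()) return;
        mStopRequested = false;
        mThread = std::thread([this] { Run(); });
    }

    // Ask the flusher thread to stop and wait for it. Also a no-op if the
    // thread is not running.
    void Shutdown() {
        std::thread worker;
        {
            std::lock_guard<std::mutex> guard(mLock);
            if (!mThread.joinable()) return;
            mStopRequested = true;
            worker = std::move(mThread);  // mThread is now non-joinable
        }
        mWake.notify_all();
        worker.join();
    }

    bool IsRunning() {
        std::lock_guard<std::mutex> guard(mLock);
        return mThread.joinable();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mLock);
        while (!mStopRequested) {
            // A real flusher would release memory here; the sketch only waits
            // until it is woken or the timeout expires.
            mWake.wait_for(lock, std::chrono::milliseconds(50));
        }
    }

    std::mutex mLock;
    std::condition_variable mWake;
    bool mStopRequested = false;
    std::thread mThread;
};

// Hypothetical runtime showing the ordering: the memory service is stopped
// before the thread subsystem it depends on is torn down.
struct Runtime {
    MemoryService memory;
    bool threadSubsystemUp = false;

    void Init() {
        threadSubsystemUp = true;
        memory.Startup();
    }
    void Shutdown() {
        memory.Shutdown();           // first: spin down the flusher thread
        assert(!memory.IsRunning());
        threadSubsystemUp = false;   // then: tear down threading
    }
};
```

In this sketch a MemoryService constructed after Runtime::Shutdown() stays idle unless something explicitly calls Startup() again, which is the behavior the patch is after.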

With this patch, I still see the assertions on exit. Here's what appears to be
happening with those: the last thread to exit the app appears to be the Winsock
thread (the code is from WS2_32.DLL), and apparently since it's the last thread,
it gets to run all the app's static dtors. The half a dozen or so static
nsCOMPtr's being clobbered on this thread (static nsCOMPtr's are a no-no,
remember?) are each asserting because the objects that they hold were created on
the main thread, not the Winsock(?) thread.
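
The assertions discussed here come from objects checking that they are only touched by the thread that created them. A rough sketch of that kind of check in standard C++ follows; OwningThreadChecker, Widget, and ReleaseOnOtherThread are made-up names for illustration and are not Mozilla's actual thread-safety machinery. The sketch reports the violation through a return value where a debug build would assert.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: remembers which thread constructed it.
class OwningThreadChecker {
public:
    OwningThreadChecker() : mOwner(std::this_thread::get_id()) {}

    bool IsOnOwningThread() const {
        return std::this_thread::get_id() == mOwner;
    }

private:
    std::thread::id mOwner;
};

// Hypothetical object with a non-thread-safe reference count.
struct Widget {
    OwningThreadChecker checker;
    int refs = 1;

    // Returns false (and leaves the count alone) when called from a thread
    // other than the one that created the object.
    bool Release() {
        if (!checker.IsOnOwningThread()) return false;
        --refs;
        return true;
    }
};

// Simulates a static destructor running on "the wrong" thread: the release
// happens on a freshly spawned thread instead of the creating thread.
bool ReleaseOnOtherThread(Widget& w) {
    bool ok = true;
    std::thread t([&] { ok = w.Release(); });
    t.join();
    return ok;
}
```

An object created on the main thread and released via ReleaseOnOtherThread() fails the check, which corresponds to the situation where the last surviving thread ends up running the static dtors.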

Why the existence of the memory flusher thread affects whether or not Winsock(?)
exits properly is beyond me. Maybe there is some funky startup ordering problem?
cc'ing wtc & rpotts, who may have insight into what WS2_32.DLL is. Also cc'ing
warren for some r= on the proposed fix.
Whiteboard: [nsbeta3++] → [nsbeta3++] FIX IN HAND
*** Bug 54429 has been marked as a duplicate of this bug. ***
approval keyword.
Keywords: approval
*** Bug 54484 has been marked as a duplicate of this bug. ***
*** Bug 54487 has been marked as a duplicate of this bug. ***
Extremely informative email from wan-teh...recording for posterity's sake.

Chris Waterson wrote:

> Hey, if you get a chance, could you look at my last couple of comments
> for bug 53353? I think I've got a fix for the bug, but we're still
> assert-botching like crazy on exit. It appears that what's happening
> is that a bunch of static dtors are running on a thread other than the
> main thread. This is befuddling to me. First, why would that happen?
> Is the "last DLL to exit" the lucky winner that gets to run the static
> dtors?

I don't know why the static dtors are running on
a thread other than the main thread.  My experience
is that as soon as the main() function returns,
all the other threads get instant death.  At least
it appears to be that way.  However, if your main()
function calls _endthreadex() or ExitThread(), the
main thread terminates but the process does not
terminate until all the other threads terminate.

Does Mozilla's main() function call _endthreadex()
or ExitThread()?

Do you need the assumption that the static dtors
are running on the main thread?


> Second, why would the WS2_32.DLL's thread (Winsock?) be
> lingering after shutdown? Are we failing to clean up winsock
> correctly?

PR_Cleanup() calls WSACleanup(), but PR_Cleanup() is
a dangerous function to call in a program as complicated
as Mozilla.  (I will spare you the details.)  So very
likely Mozilla is not calling WSACleanup().

Wan-Teh
Well, as far as I can tell, the patch does exactly what you say it does in your 
comment on the 27th; and that sounds like the right thing to me.  Naturally, I'm 
concerned about these other threads continuing to live beyond the main thread.  
_That_ seems wrong.  Was this the case before any of the memory flusher 
modifications?  Is this something to worry about?  Have we filed a separate bug 
to get rid of the static |nsCOMPtr|s?

If these other problems warrant a separate bug, and/or are of significantly less 
importance than this bug --- and I believe they are --- then r=scc on this last 
patch (09/27/00 20:16 'proposed fix').
I recall hearing about this, and I think that the crash is ugly, but has no evil
side effects (i.e., the install works).  IF the crash blocks the install, then
this is a dogfood-plus.
Believing this is just an ugly crash (and noting we want it for beta3), I just
don't think it would stop internal folks from using the product.
marking dogfood-minus

IS this going to land today on the branch? I think we're now on our final respin
plan for Beta3 on friday AM.
Whiteboard: [nsbeta3++] FIX IN HAND → [nsbeta3++][dogfood-] FIX IN HAND
Some minor cleanup per warren's suggestions: fix race condition with Stop() and
testing mRunning; use nsAutoLock to detect deadlocks. Re-testing on Win98 now...
fix checked in, tip & branch.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Adding myself, roberts, and tpringle to cc: list.
*** Bug 54654 has been marked as a duplicate of this bug. ***
*** Bug 54609 has been marked as a duplicate of this bug. ***
*** Bug 54653 has been marked as a duplicate of this bug. ***
Still crashes in build 92808, Win98SE
verified fixed on windows build 2000-09-29-08-M18
Status: RESOLVED → VERIFIED
I still see this with branch build 2000092908 win98. However, other people who
have seen it say it is gone (win98se only though).  Is anyone still seeing this
other than me? Not going to reopen yet
WFM with branch build 092908 on Win 98SE
looks like my computer was acting up, after a reboot, installing the new build
fixes this.  way to go warren!
*** Bug 7799 has been marked as a duplicate of this bug. ***
As the designated techno-laggard for install testing ... this does not crash on 
2000-09-29-08-MN6 with Win 95 Debut (the original 95).
Thanks, Chris.
I am still seeing this on 2000100108 Win98.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
After restarting, this is no longer visible. Sorry for the spam.
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
And spam #3.
Status: RESOLVED → VERIFIED