Closed Bug 94752 Opened 24 years ago Closed 24 years ago

Mozilla page faults at end of download if no browser windows are open

Categories

(Core :: XPConnect, defect)

x86
Windows 98
defect
Not set
major

Tracking

()

VERIFIED FIXED

People

(Reporter: phil, Assigned: jband_mozilla)

References

()

Details

(Keywords: crash)

Attachments

(5 files)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.3+) Gecko/20010809 BuildID: 2001080904 Start download of above url, and close all browser windows. Mozilla pagefaults at end of download. Reproducible: Always Steps to Reproduce: 1. Start download 2. Close all open windows 3. Actual Results: Crashes with page fault. Expected Results: Mozilla should close gracefully Doesn't crash if a browser window is open
Confirmed using w2k build 2001081003, same deal for both ftp and http downloads.
Status: UNCONFIRMED → NEW
Ever confirmed: true
If you are crashing in Mozilla the best thing you can do to help the developers fix your bug is to attach a stacktrace. If you're not building yourself you are not out of luck. Mozilla (thanks to a very cool donation from Netscape) releases nightly and milestone builds with Full Circle's Talkback. Talkback should catch most crashes and offer to send in a crash report. I can retrieve that crash report and attach it to your bug report if you provide either the Incident ID (you can get it by running the talkback program from /components/talkback/) or you can let me know the email address you used to submit the report and the time of sending. Thanks for your help in testing Mozilla and reporting bugs.
Incident IDs: TB33966723E TB33966719Y TB33966712K (win32 build 2001081003)
Phil, is this jseng? Incident ID 33966723 Stack Signature 0x01100101 1a30294a Bug ID Trigger Time 2001-08-11 06:58:08 Email Address r@burlco.org User Comments Build ID 2001081010 Product ID MozillaTrunk Platform ID Win32 Trigger Reason Access violation Stack Trace 0x01100101 js_GetSlotThreadSafe [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 511] JS_InstanceOf [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 1856] JS_GetInstancePrivate [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 1896] args_enumerate [d:\builds\seamonkey\mozilla\js\src\jsfun.c, line 472] js_PutArgsObject [d:\builds\seamonkey\mozilla\js\src\jsfun.c, line 249] js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 1407] js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 825] nsXPCWrappedJSClass::CallMethod [d:\builds\seamonkey\mozilla\js\src\xpconnect\src\xpcwrappedjsclass.cpp, line 1025] nsXPCWrappedJS::CallMethod [d:\builds\seamonkey\mozilla\js\src\xpconnect\src\xpcwrappedjs.cpp, line 427] PrepareAndDispatch [d:\builds\seamonkey\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcstubs.cpp, line 102] SharedStub [d:\builds\seamonkey\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcstubs.cpp, line 124] nsDocLoaderImpl::FireOnStateChange [d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 1095] nsDocLoaderImpl::doStopURLLoad [d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 700] nsDocLoaderImpl::OnStopRequest [d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 553] nsLoadGroup::RemoveRequest [d:\builds\seamonkey\mozilla\netwerk\base\src\nsLoadGroup.cpp, line 517] nsHttpChannel::OnStopRequest [d:\builds\seamonkey\mozilla\netwerk\protocol\http\src\nsHttpChannel.cpp, line 2156] nsOnStopRequestEvent::HandleEvent [d:\builds\seamonkey\mozilla\netwerk\base\src\nsRequestObserverProxy.cpp, line 161] PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 591] PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 524] _md_EventReceiverProc [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 1072] KERNEL32.DLL + 0x2222f (0xbff9222f) 0x00688b60
Assignee: asa → rogerl
Component: Browser-General → Javascript Engine
QA Contact: doronr → pschwartau
Here are extracts from all three Talkback reports, from the lower halves of the stack: SharedStub nsDocLoaderImpl::FireOnStateChange nsDocLoaderImpl::doStopURLLoad nsDocLoaderImpl::OnStopRequest nsLoadGroup::RemoveRequest nsHttpChannel::OnStopRequest nsOnStopRequestEvent::HandleEvent PL_HandleEvent SharedStub nsDocLoaderImpl::FireOnStateChange nsDocLoaderImpl::doStopURLLoad nsDocLoaderImpl::OnStopRequest nsLoadGroup::RemoveRequest nsHttpChannel::OnStopRequest nsOnStopRequestEvent::HandleEvent SharedStub nsStreamXferOp::OnStopRequest nsStreamListenerTee::OnStopRequest nsHttpChannel::OnStopRequest nsOnStopRequestEvent::HandleEvent PL_HandleEvent
Keywords: crash
The reporter says the crash does not occur if the browser windows are open; only if they are all closed. I would guess that has to be checked for by the Networking and DocLoader code at the bottom of the stack traces. I don't think this has anything to do with JS Engine. Let me reassign this to the Installer component to see if they know whether this bug should belong to DocShell, Networking, or themselves -
Assignee: rogerl → ssu
Component: Javascript Engine → Installer
QA Contact: pschwartau → gemal
FWIW I've been able to reproduce this in a debug build, although with a stack trace different from any of the previous. All three cases, and in my case, it appears the stack has been corrupted. It appears that what ever code is getting triggered by this event is trashing the stack and then as the program returns up the stack it's crashing at rand spots, probably depending on what part of the stack gets trashed. I'll attach my stack trace. It has a bit more information than what TalkBack has.
since this happens with downloading all files (I just tried .exe files) this is *not* installer. btw also crash on Win2k perhaps xp apps? Correct me if I'm wrong...
Assignee: ssu → pchen
Component: Installer → XP Apps
QA Contact: gemal → sairuh
This seems to be a general download issue, not an installation issue. I'm able to consistently on any file I ftp. Also I just got another crash, with a slightly different stack. This occured while another thread was in the middle of GlobalWindowImpl::Cleanup call. This last crash occured when I clicked Save Link As, instead of just clicking on the file to download it. NTDLL! 77f827e8() KERNEL32! 77e86a3d() _PR_WaitCondVar(PRThread * 0x03930508, PRCondVar * 0x00cfaa58, PRLock * 0x00cfc5a0, unsigned int 0xffffffff) line 185 + 23 bytes PR_WaitCondVar(PRCondVar * 0x00cfaa58, unsigned int 0xffffffff) line 532 + 23 bytes nsThreadPool::GetRequest(nsIThread * 0x0391d800) line 633 + 15 bytes nsThreadPoolRunnable::Run(nsThreadPoolRunnable * const 0x0391d7b8) line 865 + 27 bytes nsThread::Main(void * 0x0391d800) line 105 + 26 bytes _PR_NativeRunThread(void * 0x03930508) line 399 + 13 bytes _threadstartex(void * 0x039caab8) line 212 + 13 bytes KERNEL32! 77e8758a()
benc/mscott, should this belong in networking or xp apps?
A couple more observations. I've seen two distinct behaviors from similar operations. I think both are a manifistation of the same problem, though. Specifically I seem one type of crash when I just click on a link to download a file and another type of crash when I right click and choose "Save Link As". The Save Link As crash seems to be more telling. What I've found is that the Global Window is getting destroyed twice and/or things accessing the global window during it's destruction. Also I noticed that if I clicked on a file, it displays a dialog asking me what to do, run it, save, etc. If I close all the windows while this dialog remained opened and then chose an option, everything just shutdown. In my opinion it's realistic to expect a user to kick off a long download and might just close the windows before setting this process off. It appears apparently that the application thinks that since this last window is closing it's time to shutdown. So it would appear that the problem may rest with this shutdown logic. I suspect in both cases the closing of the last visible window is trigging some of the shutdown code, at least triggering the global window to be destroyed. There should be a way for these windows to disappear without actually shutting the process down until everything is finished.
Are we nesting an event loop at a bad spot? Cc'ing jst and danm, who love these kinds of bugs! /be
Better than a thorough beating, anyway. It's always good to think of event queues when things go wrong, but in this case it can't be: none are pushed during ftp download. I can't reproduce the problem on Win2k. I can't even Save Link As... (I get a JS error ("contextMenu.saveLink is not a function")). But doing it the old fashioned way, just clicking on an ftp file, on those occasions when the download doesn't stall, I have no problems. (But man does UI responsiveness take a stiff dive.) Tough for me to help debug it, then.
Something the original reporter didn't mention, and I meant to mention in my subsequent investigation, is the fact that you have to click that little check box that says to leave window open after operation. If it's checked the window stays open and there isn't a problem. It has to be cleared so the window closes after the operation.
OK. I'm going to claim that this is an XPConnect (and DOM) bug. I get the same crash stack as dbradley. I think the talkbacks reports are the same problem, but the debug builds simply catch the problem closer to the source. The problem is that the DOM system has called JS_DestroyContext while xpconnect still has pending calls using the JSContext. I can show this by snapshoting the stack of the JS_DestroyContext which happens while we are in XPCWrappedNative::CallMethod. That bit of code wraps the call to XPTC_InvokeByIndex with an auto object that does a JS_SuspendRequest before and JS_ResumeRequest after the call out to XPTC_InvokeByIndex. In this case the JS_DestoryContext call happens before XPTC_InvokeByIndex returns and our call to JS_ResumeRequest gets a garbage JSContext. Dealing with this locally won't help because we still have a nsXPCWrappedJS::CallMethod call pending on the stack - i.e. we are still running code in the JS engine on the JSContext that has been destroyed. The XPConnect and DOM systems have worked together at cross purposes here to break the expectations of the JS engine. I think that what we have to do is bite the bullet to bring still more of the DOM interaction with the JS engine into xpconnect itself. In this case we should have the DOM not call JS_DestroyContext itself, but instead it should inform xpconnect that it is done using the JSContext. And then xpconnect can be responsible for actually destoying the JSContext (either immediately or at its leisure). XPConnect has the info to manage this. The XPCCallContext chain for the current thread tracks the uses (or lack thereof) of the JSContext. So, we could add XPConnect::DoneWithJSContext(in JSContextPtr cx) (or whatever we want to call it) that will check the current thread's XPCCallContext chain for the given JSContext. If not found then the the JSContext will be immediately destroyed. If it is found then the *deepest* user of the JSContext in that chain will be marked to destroy the JSContext upon its own destruction. This just costs us one flag per XPCCallContext and a quick check in the XPCCallContext dtor. That dtor code should also ASSERT that the JSContext is not present in the threadjscontextstack. I believe that it would be an error for the JSContext to appear in that stack any deeper than the frame (optionally) pushed by the XPCCallContext. So a debug ASSERT should be sufficient. Does this seem like the right solution to others?
Assignee: pchen → dbradley
Component: XP Apps → XPConnect
QA Contact: sairuh → pschwartau
Note that the stack I just posted shows that in this case the JSContext in use by the running JS code is not the same one being destroyed (of course). So, my arguments about the insuffientcy of a local fix might not be fully supported. Still, a more local fix (say to the use of AutoJSSuspendRequest to detect if the JSContext had been whacked while we were suspended) would require us to patch in a different JSContext into the current XPCCallContext (and probably the top frame of the threadjscontextstack). This would be *way* ugly. Letting XPConnect decide on the right time for the call to JS_DestroyContext has got to be the way to go.
jband: sounds right on the non-local solution. Even the classic codebase had to be careful not to destroy a JSContext (one per window) until it was sure no one was using it (http://lxr.mozilla.org/classic/source/lib/libmocha/lm_win.c#3310) /be
Sounds reasonable to me, calling into XPConnect in stead of calling JS_DestroyContext() should be straight forward in the DOM code, but we might wanto add a flag to the call that lets the DOM code indirectly destroy the context w/o forcing a GC (i.e. XPConnect would call JS_DestroyContextNoGC() if the flag is true). This is probably not an issue if the context destruction is delayed, but being able to destroy context w/o causing a GC is something that should be done (the current code doesn't do this) to speed up teardown of [i]frame-intesive pages.
The attached patch makes it not crash for me. It does not address jst's 'no-gc' request. I had to add the kungFuDeathGrip to nsStreamXferOp.cpp because that object was releasing its reference to the observer before the call to the observer returned. This is, of course, a no-no. But I'm concerned that we might have other stuff getting tripped up in this case where the window is going away while we are still making JS calls on it. Note that I took out the bit where we were clearing the global object of the JSContext in nsJSEnvironment because doing this there would allow the window object to get gc'd and die before this all unwound. I'm out of my depth here as far as understanding the wider implications. This certainly needs more testing.
s/Destory/Destroy/g :-)
Thanks Brendan. I'd already fixed the typos you may have noticed in the printfs. I have yet to see my fix cause a problem during my browsing. But that is hardly an exhautive test. I can't help thinking that there might be a better DOM fix to keep us from needing to defer the JSContext destruction in this case. We should also figure out if there is any code left that will barf on a context global with null private data. We're introducing a new (though rare) state here.
I applied the patch and ran with it under Windows and Linux. I haven't been able to reproduce the problem, nor have I seen any new problems crop up. I'll continue running with it. I've played with composer, chat, mail, the browser buster site, etc. Still not an exhaustive test, but looking pretty good. There is still another issue that I think needs to be addressed, but in another bug. That's the fact that if all windows are closed and the only thing left is the dialog that appears when you single click on a ftp file link, the browser process terminates instead of allow you to continue with the download process.
Status: NEW → ASSIGNED
I saw an RFE that asked for a warning box to appear when you cut off a download in progress. (Can't find the bug though). Is one possible solution hooking up a new window to stave off the shutdown event?
Reassigning to me. Asking for r= and sr=. The above patch is ready to go as far as I'm concerned. Johnny says that he is going to look into all the places in the code that might be unhappy with a null context private. We decided that that effort does not block this bug since we only will only be creating that state in cases that would have likely crashed without this patch anyway.
Assignee: dbradley → jband
Status: ASSIGNED → NEW
r=dbradley
sr=jst
fix checked in
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
phil@rebee.clara.co.uk: if you get a chance, could you try the latest binary after this check-in and verify that the problem has gone away? If so, please mark this bug "Verified"; otherwise, you can reopen it - thanks
No longer seeing it with build 2001081504
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: