Closed Bug 31847 Opened 25 years ago Closed 24 years ago

Need to solve problem of JS roots for event handler funcs [@ js_LockScope1] [@ js_GetSlotWhileLocked]

Categories

(Core :: DOM: Core & HTML, defect, P1)

x86
Windows NT
defect

Tracking

()

RESOLVED FIXED
mozilla0.9

People

(Reporter: shrir, Assigned: joki)

References

Details

(Keywords: crash, dom0, topcrash, Whiteboard: [hit in 10/30 builds][rtm-])

Crash Data

Attachments

(3 files)

I tried installing today's windows build 2000-03-14-12-M15 I crashed just after the installation completed. Clicking on the Netscape 6 icon also crashes and the browser does not start.
Stack trace: ntdll.dll + 0xce0c (0x77f6ce0c) ntdll.dll + 0x7546 (0x77f67546) js_LockScope1 [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 649] js_LockObj [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 720] js_GetSlotWhileLocked [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 276] js_ValueToFunction [d:\builds\seamonkey\mozilla\js\src\jsfun.c, line 1663] JS_ValueToFunction [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 493] nsJSContext::CallEventHandler [d:\builds\seamonkey\mozilla\dom\src\base\nsJSEnvironment.cpp, line 542] nsJSDOMEventListener::HandleEvent [d:\builds\seamonkey\mozilla\dom\src\events\nsJSDOMEventListener.cpp, line 97] nsEventListenerManager::HandleEventSubType [d:\builds\seamonkey\mozilla\layout\events\src\nsEventListenerManager.cpp, line 701] nsEventListenerManager::HandleEvent [d:\builds\seamonkey\mozilla\layout\events\src\nsEventListenerManager.cpp, line 1256] GlobalWindowImpl::HandleDOMEvent [d:\builds\seamonkey\mozilla\dom\src\base\nsGlobalWindow.cpp, line 374] nsWebShell::OnEndDocumentLoad [d:\builds\seamonkey\mozilla\webshell\src\nsWebShell.cpp, line 2380] nsDocLoaderImpl::FireOnEndDocumentLoad [d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 592] nsDocLoaderImpl::DocLoaderIsEmpty [d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 494] nsDocLoaderImpl::DocLoaderIsEmpty [d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 466] nsDocLoaderImpl::OnStopRequest [d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 438] nsLoadGroup::RemoveChannel [d:\builds\seamonkey\mozilla\netwerk\base\src\nsLoadGroup.cpp, line 545] nsInputStreamChannel::OnStopRequest [d:\builds\seamonkey\mozilla\netwerk\base\src\nsInputStreamChannel.cpp, line 367] nsOnStopRequestEvent::HandleEvent [d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 288] nsStreamListenerEvent::HandlePLEvent [d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 98] PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 564] PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 527] _md_EventReceiverProc [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 1020] USER32.dll + 0x1820 (0x77e71820)
Summary: Crash trying to install browser → Browser crashes after installation and trying to launch
*** Bug 31841 has been marked as a duplicate of this bug. ***
*** Bug 31841 has been marked as a duplicate of this bug. ***
Keywords: smoketest
Target Milestone: M15
reassigning to dp, cc selmer. I don't see this on the mozilla build.
Assignee: ssu → dp
Win32 (2000-03-14-12-M15) commercial I see the crash in the netscape build too.
As can be seen in the console log attached to the bug, profile manager has completed ALL of it's activities. (Look for those inconspicuous Begin and End comments in the log :-)
Looking at what this could potentially be....
From the stack looks like jsdom or webshell. I am cluless as to what this is. We will need a commercial tree to test this out. Travis any clue ?
Assignee: dp → travis
I get the crash, but can still run my tests for AIm if I use command line option -aim when I launch Seamonkey. I hear the same is true for Mail and other apps outside of the browser also.
Hmmmm, it's gonna be tough for Travis to fix this seeing as how he's not in the office (yet) today.
Putting on dogfood, PDT+ radar.
Keywords: dogfood
Whiteboard: [PDT+]
Only reason webshell is in the stack is because it is the end of the document load and it is firing the OnEndDocumentLoad so JS does the onLoad handlers. Hyatt changed a bunch of skin stuff last night that probably needs to be applied to the commercial tree.
Assignee: travis → hyatt
yeah yeah yeah... stupid fucking installer files
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Wait. I was right the first time. There's no problem with the installer files.
reopening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
removing installer red herring from summary
Summary: Browser crashes after installation and trying to launch → Browser crashes on startup
The crash occurs when global window is handling the onload events. Something goes wrong with the attempt to execute the event handler. Seems to be some sort of window/webshell timing thing. I do not believe this is my bug. Reassigning to travis. I have a commercial build and am trying to investigate, but I'm baffled. All of the data looks sane.
Assignee: hyatt → travis
Status: REOPENED → NEW
Travis says, "Yeah, I am seeing this too now with my build that finally finished. This is dying in NT DLLs from deep inside js. I think we need some JS experts to help look at this."
Priority: P3 → P1
Whiteboard: [PDT+] → [PDT+] NEED HELP FROM JS GURU!!!
Removing PDT+, this is only in the tip, not in the beta branch.
Whiteboard: [PDT+] NEED HELP FROM JS GURU!!! → NEED HELP FROM JS GURU!!!
We don't need a JS guru, we need a DOM guru... I see this in my comercial build too. The JSObject is garbage. The nsJSDOMEventListener's mHandler is a JSObject that has probably been previously gc'd. Now why would that be? Somewhere there is probably a call to NS_NewScriptEventListener that passes in a JSObject that just isn't around anymore by the time that nsWebShell::OnEndDocumentLoad gets called.
If it helps... The document that finished loading was navigator.xul.
I did a tiny bit of digging in navigator.xul etc. without finding anything incriminating. I have a few thoughts... It looks to me like nsJSDOMEventListener is doing a dangerous thing by hanging on to JSObjects for future use without rooting them. There is an implied assumption that such objects must be rooted in JS via some other means. But that can break down. For instance: One might set function 'foo' as the onLoad handler and then later set foo = null (or some other function). This would leave the function object held in the nsJSDOMEventListener vulnerable to be gc'd. Right? Also, an onLoad function that lives in window 'A' which targets the completion of loading window 'B' is broken if window 'A' is destroyed before window 'B' finished loading. Perhaps there are other scenerios where this scheme breaks down. With limited knowlege of how our chrome content is structured (especially in the comercial tree), I didn't find instances of such things. But, I wouldn't bet against them lurking there. It seems to me that the short term issue is to find any JS/xul/html code that might be setting a vulnerable function as an event handler. The longer term issue is to look at using JS_AddNamedRoot in nsJSDOMEventListener (or thereabouts). This is not a no-brainer because any roots that we add increase the likelyhood of erroneously uncollectable graphs in JS (and spreading into the native realm). Does this shake anything loose for anyone?
adding jst@netscape.com (our virtual vidur) to cc list.
jband, I noticed you made gc changes over the weekend. Does this problem go away rolling those back? That would answer when it entered in. But from your description it sounds like it just exposed an existing problem.
Win32 (2000-03-15-09 m15) this problem still exists in today's build
Additional info: Removed mozregistry.dat and the build shows the splash screen, but never opens a browser window.
The changes I made to jsgc were just to protect from gc reentrance and should have no effect here. Just to be sure I did a test of backing them out and this crash remains. The fact is that there were a *lot* of checkins over the weekend. I still believe that some bit of chrome (or possibly native code) was checked in that trips over this weakspot in nsJSDOMEventListener.
This could be totally unrelated but here goes... I just did a clobber build on NT 4 of todays mozilla tree and I'm seeing an nspr assert on startup, if I ignore the assert I hit a few more asserts in NS_CheckThreadSafe() and mozilla exits, no real crash AFAIK! Seems like giving an url as a command line argument to mozilla makes mozilla start normally but nothing is loaded, entering a *file* url in the url bar loads the file but loading a http url crashes mozilla (this could possibly be what's causing the crash (or whatever it really is) during a normal startup)! The stack trace at the first nspr assert I get is: NTDLL! DbgBreakPoint@0 address 0x77f76148 PR_Assert(const char * 0x0026b824 ??_C@_0BJ@LHIK@?$CB?$CCI?1O?5method?5is?5invalid?$CC?$AA@, const char * 0x0026b840 ??_C@_0L@KLDB@priometh?4c?$AA@, int 72) line 448 _PR_InvalidInt16() line 72 + 35 bytes _PR_MD_PR_POLL(PRPollDesc * 0x014535d0, int 2, unsigned int 2610055) line 131 + 42 bytes PR_Poll(PRPollDesc * 0x014535d0, int 2, unsigned int 2610055) line 115 + 17 bytes nsSocketTransportService::Run(nsSocketTransportService * const 0x01455ea4) line 414 + 24 bytes nsThread::Main(void * 0x01454230) line 83 + 26 bytes _PR_NativeRunThread(void * 0x01453360) line 399 + 13 bytes _threadstartex(void * 0x01454550) line 212 + 13 bytes KERNEL32! BaseThreadStart@8 + 81 bytes As I said, this could be unrelated, I've never been able to reproduce the event handler crash...
I get the following text when I startup (after the profile manager and after xpcom registration): Loading page specified via openDialog frame: Box[id=] (00F11F38) style: 03E86200 { http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul|* weight: 0 { display: 1[0x1]enum } } Wrong parent style context: style: 03E86770 { http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul|* weight: 0 { display: 1[0x1]enum } } should be using: style: 03F66940 { http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul|* weight: 0 { display: 1[0x1]enum } } Opening file cookies.txt failed Opening file cookperm.txt failed
The crash jst reports is actually something different I think. Warren I think owns those, adding him on the CC list.
The channel gets removed from the same load group twice. Once in the listener on StopRequest and then again by assigning null. Here is the code from nsInputStreamChannel.cpp: nsInputStreamChannel::OnStopRequest(nsIChannel* transportChannel, nsISupports* context, nsresult aStatus, const PRUnichar* aMsg) { nsresult rv; rv = mRealListener->OnStopRequest(this, context, aStatus, aMsg); if (0 && mLoadGroup) { if (NS_SUCCEEDED(rv)) { mLoadGroup->RemoveChannel(this, context, aStatus, aMsg); } } // Release the reference to the consumer stream listener... mRealListener = null_nsCOMPtr(); mFileTransport = null_nsCOMPtr(); return rv; } Gagan, is this what you would expect to have happen?
From a fresh pull done 2 hours ago, this is what I see, 1) mozilla and commercial builds come up, no crash, but no url is loaded upon statup. 2) Sidebar is also empty. Clicking on bookmarks on toolbars loads the page. 3) Cliking on links in a page load it. 4) typing in a url in the urlbar gives the message, "BrowserAppCore has not been initialized". 5) I get a lot of the following assertions... JavaScript Error: uncaught exception: [Exception... "Factory not registered" co de: "-2147221164" nsresult: "0x80040154 (NS_ERROR_FACTORY_NOT_REGISTERED)" loca tion: "http://home.netscape.com/h.js Line: 3"] ###!!! ASSERTION: NS_ENSURE_TRUE(NS_SUCCEEDED(service->Notify(removeme, topic.Ge tUnicode(), aData))) failed: '(!((service->Notify(removeme, topic.GetUnicode(), aData)) & 0x80000000))', file d:\0315\mozilla\xpfe\appshell\src\nsXULWindow.cpp, line 1105 Opening D:\0315\ns\dist\WIN32_D.OBJ\bin\chrome\navigator\content\default\NavSecu rityOverlay.xul failed *** Failed to load overlay chrome://navigator/content/NavSecurityOverlay.xul ###!!! ASSERTION: element not in tree: 'mDocument != nsnull', file d:\0315\mozil la\rdf\content\src\nsXULElement.cpp, line 3460 ###!!! ASSERTION: unable to create widget item: 'NS_SUCCEEDED(rv)', file d:\0315 \mozilla\rdf\content\src\nsRDFGenericBuilder.cpp, line 1009 ###!!! ASSERTION: NS_ENSURE_TRUE(NS_SUCCEEDED(mXULWindow->NotifyObservers(status Name.GetUnicode(), status.GetUnicode()))) failed: '(!((mXULWindow->NotifyObserve rs(statusName.GetUnicode(), status.GetUnicode())) & 0x80000000))', file d:\0315\ mozilla\xpfe\appshell\src\nsContentTreeOwner.cpp, line 203 Looking in to why BrowserAppcore is not initialized....
We talked with joki about this. He said it sounded like a bug we had before when some doc loading ordering changed such that onload might be getting called when the doc in the window had changed. So, the onload handler getting called was associated with the *old* content of the window and had already been gc'd. He has a build going and is going to look into it.
Joki has some ideas about what might be happening here, so I am flipping it over to him as he is going to play around with this for a while.
Assignee: travis → joki
I have a fix for this...
Alex, My god! What is it?!?
I see that BrowserInstance is actually initialised during shutdown. Is Onload handler being called during unload?
If you look at ns/netwerk/security/browser/NavSecurityOverlay.xul, there is a reference to a NavSecurityOverlay.css file. Moving the reference out of the overlay xul tag and to the top of the file fixes this problem. So if you want to be able to repro this case so you can fix the crash, just move the <?xml-stylesheet ... > reference back inside the overlay tag. I have checked in the fix that removes the problem as a blocker so I'm clearing the keywords and lowering the severity.
Severity: blocker → critical
Component: Installer → Javascript Engine
Keywords: dogfood, smoketest
Whiteboard: NEED HELP FROM JS GURU!!!
Actually, the problem was with the js file that was referenced, not the css file. In NavSecurityUI.js, there was a top level call to: window.addEventListener("load", SetSecurityButton, false); Commenting this out lets us run correctly. Sorry for the confusion. Has anyone changed the addEventListener call recently?
I just wanted to point out that this *did* turn out to be the kind of problem that I was suggesting. Namely that the code was adding a JS function - which was rooted because it is a property of the global window object - as an event listener. And then later some code comes along and defines some other function with the same name that displaces the first function in its slot as a property of the global window object. The first object is then in gc limbo. This is very dangerous. We have a short term fix for this specific instance of the problem. But this is a trap that we could well fall into again. Holding a JS function in a native object for future use without rooting it in one way or another is evil.
Component: Javascript Engine → DOM Level 0
Mass-moving bugs out of M15 that I won't get to. Will refit individual milestones after moving them.
Target Milestone: M15 → M16
This is not a current crashing bug, moving to M17
Status: NEW → ASSIGNED
Summary: Browser crashes on startup → Need to solve problem of JS roots for event handler funcs
Target Milestone: M16 → M17
reassigning QA contact
QA Contact: gbush → desale
Updating Milestone to M18.
Target Milestone: M17 → M18
Per heikki, saari, ckritzer: This bug has been marked "future" because the original netscape engineer working on this is over-burdened. If you feel this is an error, that you or another known resource will be working on this bug,or if it blocks your work in some way -- please attach your concern to the bug for reconsideration.
Target Milestone: M18 → Future
Blocks: 58551
This is serious, see blocked bug. Please unfuture this. nisheeth: please allow joki to work on this bug. It is critical.
Keywords: crash, mozilla0.9, rtm
Whiteboard: [hit in 10/30 builds]
I talked with jband and nisheeth about this earlier today. Unfortunately fixing this bug has some pretty scary repercussions. The basic issue is that if we simply root the js functions we'll create circular references which will keep the native content from being released. But we also can't easily figure out a way to know when a JSFunc goes away. I believe what it comes down to is that any quick fix for rtm would be scarier than this serious but low frequency, and generally workaround-able, bug. Per nisheeth I'm going to rtm- this. But we'll keep looking into it and see if we can find something we've missed.
Whiteboard: [hit in 10/30 builds] → [hit in 10/30 builds][rtm-]
JS GC can handle cycles within its heap just fine -- a garbage cycle will be collected. Since JS functions don't hold strong XPCOM refs directly, the only problem could be an indirect XPCOM strong ref, perhaps from handler function through its parent link to the window object, and then through the window's strong XPCOM ref in its private data slot. That problem exists in spades in the DOM, and it's handled using out-of-band SetDocument signalling. Can we propagate SetDocument calls into the listeners, and have them remove their roots? Do we do that already? If SetDocument had the effect of removing each listener's handler-root, we would still want to clear that root when the handler function object is finalized. One step at a time.... /be
*** Bug 58551 has been marked as a duplicate of this bug. ***
Adding topcrash keyword and [@ js_LockScope1] [@ js_GetSlotWhileLocked] for tracking. These are the top 2 crashers with the latest RTM builds.
Summary: Need to solve problem of JS roots for event handler funcs → Need to solve problem of JS roots for event handler funcs [@ js_LockScope1] [@ js_GetSlotWhileLocked]
This doesn't look like a topcrasher anymore, but there are still a few crashes in the official RTM build that have 3 similar lines at the top of the stack. Here is the latest stack trace and one entry from talkback: Incident ID 22054384 KERNEL32.DLL + 0x2a3c0 (0xbff9a3c0) KERNEL32.DLL + 0x98af (0xbff798af) js_LockScope1 [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 668] js_LockObj [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 739] js_GetSlotWhileLocked [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 295] JS_GetParent [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 1854] nsJSUtils::nsGetStaticScriptGlobal [d:\builds\seamonkey\mozilla\dom\src\base\nsJSUtils.cpp, line 685] nsJSUtils::nsConvertObjectToJSVal [d:\builds\seamonkey\mozilla\dom\src\base\nsJSUtils.cpp, line 217] nsJSUtils::nsConvertObjectToJSVal [d:\builds\seamonkey\mozilla\dom\src\base\nsJSUtils.cpp, line 217] WindowInternalOpen [d:\builds\seamonkey\mozilla\dom\src\base\nsJSWindow.cpp, line 4203] js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 822] js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 2623] js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 838] js_InternalInvoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 910] JS_CallFunctionValue [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 3195] handleTriggerEvent [d:\builds\seamonkey\mozilla\xpinstall\src\nsXPITriggerInfo.cpp, line 146] PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 581] PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 517] _md_EventReceiverProc [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 1051] ------------ CrashDate: 2000-11-30 UptimeMinutes: 2 Total: 260 OS: Windows 95 4.0 build 67109975 Incident ID: http://climate/reports/stackcommentemail.cfm?dynamicBBID=22054384
jpatel: your stack looks like a symptom of bug 58573, not of this one. /be
I've just attached a HTML file that contains JS code that will crash mozilla by explictly invoking this bug. (At least I think this is the relevant bug). The file needs to be invoked from a javascript: url. To test it, save the attached file as Picker.html. Then load some other interesting html file into mozilla (use a local file: url or the javascript: url won't work.) Then, enter a javascript: url like the following into the location field: javascript:open("/tmp/Picker.html") This will open a new window and load the attached file into it. The JS code in this new window registers a mouseover event handler on your main window. Move the mouse over your original window and watch the contents of the new window change. When you're tired of doing that, dismiss the new window, which destroys the event handler, then move the mouse over your original window. It triggers the now non-existant event handler, and mozilla crashes. I'm using mozilla build id 2000120521
Changing crasher bug milestone to mozilla0.9.
Target Milestone: Future → mozilla0.9
*** Bug 67691 has been marked as a duplicate of this bug. ***
Keywords: dom0
*** Bug 69017 has been marked as a duplicate of this bug. ***
*** Bug 49980 has been marked as a duplicate of this bug. ***
*** Bug 60785 has been marked as a duplicate of this bug. ***
Blocks: 31426
*** Bug 70361 has been marked as a duplicate of this bug. ***
Using Brendan's patch, I don't see any more leaks than usual. (I leaked 19K after 10 minutes or so of browsing, leaked the usual amount (I think) with './mozilla -mail', etc.) I haven't seen a single nsJSDOMEventListener leaked. I'm not exactly sure why we're not leaking. Assuming (!) that not every call to AddEventListener that I"m seeing is matched by a RemoveEventListener and that when the event listener is rooted the window object can't be GCed because of the event listener's parent pointer, I don't see how there would be leaks that I'm not seeing. I still don't see what's breaking the cycle, though...
Not surprisingly, this patch dramatically increases the leaks on a test page where we already leak nsJSDOMEventListener objects without the patch: http://www.people.fas.harvard.edu/~dbaron/dom/test/two-events/EventTarget Since we already leak nsJSDOMEventListener objects on that page without the patch, I'm still looking for a good leak test...
This fixes 71610, which is a smoke-test blocker. If someone can review it (sr=shaver) I'll check it in ASAP.
r=dbaron, since I don't want to hold up the blocker over theoretical leaks that can't be demonstrated. I still don't completely understand why it doesn't leak, but it seems like it's OK.
I've checked this in, to get the tree green again. Hope nobody gets mad at little old me!
Status: ASSIGNED → RESOLVED
Closed: 25 years ago24 years ago
Resolution: --- → FIXED
*** Bug 69172 has been marked as a duplicate of this bug. ***
Crash Signature: [@ js_LockScope1] [@ js_GetSlotWhileLocked]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: