Closed Bug 31847 Opened 24 years ago Closed 23 years ago

Need to solve problem of JS roots for event handler funcs [@ js_LockScope1] [@ js_GetSlotWhileLocked]

Categories

(Core :: DOM: Core & HTML, defect, P1)

x86
Windows NT
defect

Tracking

()

RESOLVED FIXED
mozilla0.9

People

(Reporter: shrir, Assigned: joki)

References

Details

(Keywords: crash, dom0, topcrash, Whiteboard: [hit in 10/30 builds][rtm-])

Crash Data

Attachments

(3 files)

I tried installing today's windows build 2000-03-14-12-M15 

I crashed just after the installation completed. Clicking on the Netscape 6 icon 
also crashes and the browser does not start.
Stack trace:

   
   ntdll.dll + 0xce0c (0x77f6ce0c) 
                                              
     
   ntdll.dll + 0x7546 (0x77f67546) 
                                              
     
   js_LockScope1 
                                             
[d:\builds\seamonkey\mozilla\js\src\jslock.c, line 649]
     
   js_LockObj 
                                             
[d:\builds\seamonkey\mozilla\js\src\jslock.c, line 720]
     
   js_GetSlotWhileLocked 
                                             
[d:\builds\seamonkey\mozilla\js\src\jslock.c, line 276]
     
   js_ValueToFunction 
                                             
[d:\builds\seamonkey\mozilla\js\src\jsfun.c, line 1663]
     
   JS_ValueToFunction 
                                             
[d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 493]
     
   nsJSContext::CallEventHandler 
                                             
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSEnvironment.cpp, line 542]
     
   nsJSDOMEventListener::HandleEvent 
                                             
[d:\builds\seamonkey\mozilla\dom\src\events\nsJSDOMEventListener.cpp, line 97]
     
   nsEventListenerManager::HandleEventSubType
                                             
[d:\builds\seamonkey\mozilla\layout\events\src\nsEventListenerManager.cpp, line 
701]
     
   nsEventListenerManager::HandleEvent 
                                             
[d:\builds\seamonkey\mozilla\layout\events\src\nsEventListenerManager.cpp, line 
1256]
     
   GlobalWindowImpl::HandleDOMEvent 
                                             
[d:\builds\seamonkey\mozilla\dom\src\base\nsGlobalWindow.cpp, line 374]
     
   nsWebShell::OnEndDocumentLoad 
                                             
[d:\builds\seamonkey\mozilla\webshell\src\nsWebShell.cpp, line 2380]
     
   nsDocLoaderImpl::FireOnEndDocumentLoad 
                                             
[d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 592]
     
   nsDocLoaderImpl::DocLoaderIsEmpty 
                                             
[d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 494]
     
   nsDocLoaderImpl::DocLoaderIsEmpty 
                                             
[d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 466]
     
   nsDocLoaderImpl::OnStopRequest 
                                             
[d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 438]
     
   nsLoadGroup::RemoveChannel 
                                             
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsLoadGroup.cpp, line 545]
     
   nsInputStreamChannel::OnStopRequest 
                                             
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsInputStreamChannel.cpp, line 
367]
     
   nsOnStopRequestEvent::HandleEvent 
                                             
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 
288]
     
   nsStreamListenerEvent::HandlePLEvent 
                                             
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 
98]
     
   PL_HandleEvent 
                                             
[d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 564]
     
   PL_ProcessPendingEvents 
                                             
[d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 527]
     
   _md_EventReceiverProc 
                                             
[d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 1020]
     
   USER32.dll + 0x1820 (0x77e71820) 
Summary: Crash trying to install browser → Browser crashes after installation and trying to launch
*** Bug 31841 has been marked as a duplicate of this bug. ***
*** Bug 31841 has been marked as a duplicate of this bug. ***
Keywords: smoketest
Target Milestone: M15
reassigning to dp, cc selmer.

I don't see this on the mozilla build.
Assignee: ssu → dp
Win32 (2000-03-14-12-M15) commercial
I see the crash in the netscape build too.
As can be seen in the console log attached to the bug, profile manager has 
completed ALL of it's activities.  (Look for those inconspicuous Begin and End 
comments in the log :-)
Looking at what this could potentially be....
From the stack looks like jsdom or webshell. I am cluless as to what this is. 
We will need a commercial tree to test this out.

Travis any clue ?
Assignee: dp → travis
I get the crash, but can still run my tests for AIm if I use command line option 
-aim when I launch Seamonkey.
I hear the same is true for Mail and other apps outside of the browser also.
Hmmmm, it's gonna be tough for Travis to fix this seeing as how he's not in the
office (yet) today.
Putting on dogfood, PDT+ radar.
Keywords: dogfood
Whiteboard: [PDT+]
Only reason webshell is in the stack is because it is the end of the document 
load and it is firing the OnEndDocumentLoad so JS does the onLoad handlers.  
Hyatt changed a bunch of skin stuff last night that probably needs to be 
applied to the commercial tree.
Assignee: travis → hyatt
yeah yeah yeah... stupid fucking installer files

Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Wait.  I was right the first time.  There's no problem with the installer files.
reopening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
removing installer red herring from summary
Summary: Browser crashes after installation and trying to launch → Browser crashes on startup
The crash occurs when global window is handling the onload events.  Something 
goes wrong with the attempt to execute the event handler.  Seems to be some sort 
of window/webshell timing thing.

I do not believe this is my bug.  Reassigning to travis.

I have a commercial build and am trying to investigate, but I'm baffled.  All of 
the data looks sane.
Assignee: hyatt → travis
Status: REOPENED → NEW
Travis says, "Yeah, I am seeing this too now with my build that finally
finished.  This is dying in NT DLLs from deep inside js.  I think we need some
JS experts to help look at this."
Priority: P3 → P1
Whiteboard: [PDT+] → [PDT+] NEED HELP FROM JS GURU!!!
Removing PDT+, this is only in the tip, not in the beta branch.
Whiteboard: [PDT+] NEED HELP FROM JS GURU!!! → NEED HELP FROM JS GURU!!!
We don't need a JS guru, we need a DOM guru...

I see this in my comercial build too. The JSObject is garbage. The 
nsJSDOMEventListener's mHandler is a JSObject that has probably been previously 
gc'd. Now why would that be? Somewhere there is probably a call to 
NS_NewScriptEventListener that passes in a JSObject that just isn't around 
anymore by the time that nsWebShell::OnEndDocumentLoad gets called.
If it helps... The document that finished loading was navigator.xul.
I did a tiny bit of digging in navigator.xul etc. without finding anything 
incriminating. I have a few thoughts...

It looks to me like nsJSDOMEventListener is doing a dangerous thing by hanging 
on to JSObjects for future use without rooting them. There is an implied 
assumption that such objects must be rooted in JS via some other means. But that 
can break down. For instance: One might set function 'foo' as the onLoad handler 
and then later set foo = null (or some other function). This would leave the 
function object held in the nsJSDOMEventListener vulnerable to be gc'd. Right?

Also, an onLoad function that lives in window 'A' which targets the completion 
of loading window 'B' is broken if window 'A' is destroyed before window 'B' 
finished loading.

Perhaps there are other scenerios where this scheme breaks down.

With limited knowlege of how our chrome content is structured (especially in the 
comercial tree), I didn't find instances of such things. But, I wouldn't bet 
against them lurking there.

It seems to me that the short term issue is to find any JS/xul/html code that 
might be setting a vulnerable function as an event handler. The longer term 
issue is to look at using JS_AddNamedRoot in nsJSDOMEventListener (or 
thereabouts). This is not a no-brainer because any roots that we add increase 
the likelyhood of erroneously uncollectable graphs in JS (and spreading into 
the native realm).

Does this shake anything loose for anyone?
adding jst@netscape.com (our virtual vidur) to cc list.
jband, I noticed you made gc changes over the weekend.  Does this problem go 
away rolling those back?  That would answer when it entered in.  But from your 
description it sounds like it just exposed an existing problem.
Win32 (2000-03-15-09 m15) 
this problem still exists in today's build
Additional info:
Removed mozregistry.dat and the build shows the splash screen, but never opens a 
browser window.
The changes I made to jsgc were just to protect from gc reentrance and should 
have no effect here. Just to be sure I did a test of backing them out and this 
crash remains. The fact is that there were a *lot* of checkins over the weekend. 
I still believe that some bit of chrome (or possibly native code) was checked in 
that trips over this weakspot in nsJSDOMEventListener.
This could be totally unrelated but here goes...

I just did a clobber build on NT 4 of todays mozilla tree and I'm seeing an nspr
assert on startup, if I ignore the assert I hit a few more asserts in
NS_CheckThreadSafe() and mozilla exits, no real crash AFAIK! Seems like giving
an url as a command line argument to mozilla makes mozilla start normally but
nothing is loaded, entering a *file* url in the url bar loads the file but
loading a http url crashes mozilla (this could possibly be what's causing the
crash (or whatever it really is) during a normal startup)!

The stack trace at the first nspr assert I get is:

NTDLL! DbgBreakPoint@0 address 0x77f76148
PR_Assert(const char * 0x0026b824 
??_C@_0BJ@LHIK@?$CB?$CCI?1O?5method?5is?5invalid?$CC?$AA@, const char * 
0x0026b840 ??_C@_0L@KLDB@priometh?4c?$AA@, int 72) line 448
_PR_InvalidInt16() line 72 + 35 bytes
_PR_MD_PR_POLL(PRPollDesc * 0x014535d0, int 2, unsigned int 2610055) line 131 + 
42 bytes
PR_Poll(PRPollDesc * 0x014535d0, int 2, unsigned int 2610055) line 115 + 17 
bytes
nsSocketTransportService::Run(nsSocketTransportService * const 0x01455ea4) line 
414 + 24 bytes
nsThread::Main(void * 0x01454230) line 83 + 26 bytes
_PR_NativeRunThread(void * 0x01453360) line 399 + 13 bytes
_threadstartex(void * 0x01454550) line 212 + 13 bytes
KERNEL32! BaseThreadStart@8 + 81 bytes

As I said, this could be unrelated, I've never been able to reproduce the event
handler crash...
I get the following text when I startup (after the profile manager and after 
xpcom registration):

Loading page specified via openDialog
frame: Box[id=] (00F11F38) style: 03E86200 {
  http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul|* weight: 0 { 
display: 1[0x1]enum }
}
Wrong parent style context:  style: 03E86770 {
  http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul|* weight: 0 { 
display: 1[0x1]enum }
}
should be using:  style: 03F66940 {
  http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul|* weight: 0 { 
display: 1[0x1]enum }
}

Opening file cookies.txt failed
Opening file cookperm.txt failed
The crash jst reports is actually something different I think.  Warren I think 
owns those, adding him on the CC list.
The channel gets removed from the same load group twice. Once in the listener on 
StopRequest and then again by assigning null. Here is the code from 
nsInputStreamChannel.cpp:

nsInputStreamChannel::OnStopRequest(nsIChannel* transportChannel, nsISupports* 
context,
                                    nsresult aStatus, const PRUnichar* aMsg)
{
    nsresult rv;

    rv = mRealListener->OnStopRequest(this, context, aStatus, aMsg);

    if (0 && mLoadGroup) {
        if (NS_SUCCEEDED(rv)) {
            mLoadGroup->RemoveChannel(this, context, aStatus, aMsg);
        }
    }

    // Release the reference to the consumer stream listener...
    mRealListener = null_nsCOMPtr();
    mFileTransport = null_nsCOMPtr();
    return rv;
}

Gagan, is this what you would expect to have happen?
From a fresh pull done 2 hours ago, this is what I see,

1) mozilla and commercial builds come up, no crash, but no url is loaded upon 
statup.
2) Sidebar is also empty. Clicking on bookmarks on toolbars loads the page.
3) Cliking on links in a page load it.
4) typing in a url in the urlbar gives the message,
"BrowserAppCore has not been initialized".

5) I get a lot of the following assertions...

JavaScript Error: uncaught exception: [Exception... "Factory not registered"  co
de: "-2147221164" nsresult: "0x80040154 (NS_ERROR_FACTORY_NOT_REGISTERED)"  loca
tion: "http://home.netscape.com/h.js Line: 3"]

###!!! ASSERTION: NS_ENSURE_TRUE(NS_SUCCEEDED(service->Notify(removeme, topic.Ge
tUnicode(), aData))) failed: '(!((service->Notify(removeme, topic.GetUnicode(),
aData)) & 0x80000000))', file d:\0315\mozilla\xpfe\appshell\src\nsXULWindow.cpp,
 line 1105

Opening D:\0315\ns\dist\WIN32_D.OBJ\bin\chrome\navigator\content\default\NavSecu
rityOverlay.xul failed
*** Failed to load overlay chrome://navigator/content/NavSecurityOverlay.xul

###!!! ASSERTION: element not in tree: 'mDocument != nsnull', file d:\0315\mozil
la\rdf\content\src\nsXULElement.cpp, line 3460
###!!! ASSERTION: unable to create widget item: 'NS_SUCCEEDED(rv)', file d:\0315
\mozilla\rdf\content\src\nsRDFGenericBuilder.cpp, line 1009
###!!! ASSERTION: NS_ENSURE_TRUE(NS_SUCCEEDED(mXULWindow->NotifyObservers(status
Name.GetUnicode(), status.GetUnicode()))) failed: '(!((mXULWindow->NotifyObserve
rs(statusName.GetUnicode(), status.GetUnicode())) & 0x80000000))', file d:\0315\
mozilla\xpfe\appshell\src\nsContentTreeOwner.cpp, line 203

Looking in to why BrowserAppcore is not initialized....
We talked with joki about this. He said it sounded like a bug we had before when  
some doc loading ordering changed such that onload might be getting called when 
the doc in the window had changed. So, the onload handler getting called was 
associated with the *old* content of the window and had already been gc'd. He 
has a build going and is going to look into it.
Joki has some ideas about what might be happening here, so I am flipping it 
over to him as he is going to play around with this for a while.
Assignee: travis → joki
I have a fix for this...
Alex, My god!  What is it?!?
I see that BrowserInstance is actually initialised during shutdown. Is Onload
handler being called during unload?
If you look at ns/netwerk/security/browser/NavSecurityOverlay.xul, there is a 
reference to a NavSecurityOverlay.css file.  Moving the reference out of the 
overlay xul tag and to the top of the file fixes this problem.

So if you want to be able to repro this case so you can fix the crash, just move 
the <?xml-stylesheet ... > reference back inside the overlay tag.

I have checked in the fix that removes the problem as a blocker so I'm clearing 
the keywords and lowering the severity.
Severity: blocker → critical
Component: Installer → Javascript Engine
Keywords: dogfood, smoketest
Whiteboard: NEED HELP FROM JS GURU!!!
Actually, the problem was with the js file that was referenced, not the css file.

In NavSecurityUI.js, there was a top level call to:

window.addEventListener("load", SetSecurityButton, false);

Commenting this out lets us run correctly.   Sorry for the confusion.

Has anyone changed the addEventListener call recently?
I just wanted to point out that this *did* turn out to be the kind of problem 
that I was suggesting. Namely that the code was adding a JS function - which was 
rooted because it is a property of the global window object - as an event 
listener. And then later some code comes along and defines some other function 
with the same name that displaces the first function in its slot as a property 
of the global window object. The first object is then in gc limbo. This is very 
dangerous. We have a short term fix for this specific instance of the problem. 
But this is a trap that we could well fall into again. Holding a JS function in 
a native object for future use without rooting it in one way or another is evil.
Component: Javascript Engine → DOM Level 0
Mass-moving bugs out of M15 that I won't get to.  Will refit individual 
milestones after moving them.
Target Milestone: M15 → M16
This is not a current crashing bug, moving to M17
Status: NEW → ASSIGNED
Summary: Browser crashes on startup → Need to solve problem of JS roots for event handler funcs
Target Milestone: M16 → M17
reassigning QA contact
QA Contact: gbush → desale
Updating Milestone to M18.
Target Milestone: M17 → M18
Per heikki, saari, ckritzer:
This bug has been marked "future" because the original netscape engineer working 
on this is over-burdened. If you feel this is an error, that you or another
known resource will be working on this bug,or if it blocks your work in some way 
-- please attach your concern to the bug for reconsideration.
Target Milestone: M18 → Future
Blocks: 58551
This is serious, see blocked bug. Please unfuture this.

nisheeth: please allow joki to work on this bug. It is critical.
Keywords: crash, mozilla0.9, rtm
Whiteboard: [hit in 10/30 builds]
I talked with jband and nisheeth about this earlier today.  Unfortunately fixing 
this bug has some pretty scary repercussions.  The basic issue is that if we 
simply root the js functions we'll create circular references which will keep 
the native content from being released.  But we also can't easily figure out a 
way to know when a JSFunc goes away.

I believe what it comes down to is that any quick fix for rtm would be scarier 
than this serious but low frequency, and generally workaround-able, bug.  Per 
nisheeth I'm going to rtm- this.  But we'll keep looking into it and see if we 
can find something we've missed.
Whiteboard: [hit in 10/30 builds] → [hit in 10/30 builds][rtm-]
JS GC can handle cycles within its heap just fine -- a garbage cycle will be 
collected.  Since JS functions don't hold strong XPCOM refs directly, the only 
problem could be an indirect XPCOM strong ref, perhaps from handler function 
through its parent link to the window object, and then through the window's 
strong XPCOM ref in its private data slot.

That problem exists in spades in the DOM, and it's handled using out-of-band 
SetDocument signalling.  Can we propagate SetDocument calls into the listeners, 
and have them remove their roots?  Do we do that already?

If SetDocument had the effect of removing each listener's handler-root, we would 
still want to clear that root when the handler function object is finalized.  
One step at a time....

/be
*** Bug 58551 has been marked as a duplicate of this bug. ***
Adding topcrash keyword and [@ js_LockScope1] [@ js_GetSlotWhileLocked]
 for tracking.  These are the top 2 crashers with the latest RTM builds.
Summary: Need to solve problem of JS roots for event handler funcs → Need to solve problem of JS roots for event handler funcs [@ js_LockScope1] [@ js_GetSlotWhileLocked]
This doesn't look like a topcrasher anymore, but there are still a few crashes
in the official RTM build that have 3 similar lines at the top of the stack.
Here is the latest stack trace and one entry from talkback:

Incident ID 22054384
KERNEL32.DLL + 0x2a3c0 (0xbff9a3c0)
KERNEL32.DLL + 0x98af (0xbff798af)
js_LockScope1 [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 668]
js_LockObj [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 739]
js_GetSlotWhileLocked [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 295]
JS_GetParent [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 1854]
nsJSUtils::nsGetStaticScriptGlobal
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSUtils.cpp, line 685]
nsJSUtils::nsConvertObjectToJSVal
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSUtils.cpp, line 217]
nsJSUtils::nsConvertObjectToJSVal
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSUtils.cpp, line 217]
WindowInternalOpen [d:\builds\seamonkey\mozilla\dom\src\base\nsJSWindow.cpp,
line 4203]
js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 822]
js_Interpret [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 2623]
js_Invoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 838]
js_InternalInvoke [d:\builds\seamonkey\mozilla\js\src\jsinterp.c, line 910]
JS_CallFunctionValue [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 3195]
handleTriggerEvent
[d:\builds\seamonkey\mozilla\xpinstall\src\nsXPITriggerInfo.cpp, line 146]
PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 581]
PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c,
line 517]
_md_EventReceiverProc [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line
1051]
------------
 CrashDate: 2000-11-30 UptimeMinutes: 2  Total: 260 OS: Windows 95  4.0 build
67109975
	 Incident ID: http://climate/reports/stackcommentemail.cfm?dynamicBBID=22054384
jpatel: your stack looks like a symptom of bug 58573, not of this one.

/be
I've just attached a HTML file that contains JS code that will crash mozilla
by explictly invoking this bug.  (At least I think this is the relevant bug).

The file needs to be invoked from a javascript: url.  To test it, save the
attached file as Picker.html.  Then load some other interesting html file into
mozilla (use a local file: url or the javascript: url won't work.)  Then, enter
a javascript: url like the following into the location field:

	javascript:open("/tmp/Picker.html")

This will open a new window and load the attached file into it.  The JS code in
this new window registers a mouseover event handler on your main window.  Move
the mouse over your original window and watch the contents of the new window
change.  When you're tired of doing that, dismiss the new window, which destroys
the event handler, then move the mouse over your original window.  It triggers
the now non-existant event handler, and mozilla crashes.

I'm using mozilla build id 2000120521
Changing crasher bug milestone to mozilla0.9.
Target Milestone: Future → mozilla0.9
*** Bug 67691 has been marked as a duplicate of this bug. ***
Keywords: dom0
*** Bug 69017 has been marked as a duplicate of this bug. ***
*** Bug 49980 has been marked as a duplicate of this bug. ***
*** Bug 60785 has been marked as a duplicate of this bug. ***
Blocks: 31426
*** Bug 70361 has been marked as a duplicate of this bug. ***
Using Brendan's patch, I don't see any more leaks than usual.  (I leaked 19K
after 10 minutes or so of browsing, leaked the usual amount (I think) with
'./mozilla -mail', etc.)  I haven't seen a single nsJSDOMEventListener leaked.

I'm not exactly sure why we're not leaking.  Assuming (!) that not every call
to AddEventListener that I"m seeing is matched by a RemoveEventListener and
that when the event listener is rooted the window object can't be GCed because
of the event listener's parent pointer, I don't see how there would be leaks
that I'm not seeing.  I still don't see what's breaking the cycle, though...
Not surprisingly, this patch dramatically increases the leaks on a test page
where we already leak nsJSDOMEventListener objects without the patch:

http://www.people.fas.harvard.edu/~dbaron/dom/test/two-events/EventTarget

Since we already leak nsJSDOMEventListener objects on that page without the
patch, I'm still looking for a good leak test...
This fixes 71610, which is a smoke-test blocker.  If someone can review it
(sr=shaver) I'll check it in ASAP.
r=dbaron, since I don't want to hold up the blocker over theoretical leaks that
can't be demonstrated.  I still don't completely understand why it doesn't leak,
but it seems like it's OK.
I've checked this in, to get the tree green again.  Hope nobody gets mad at
little old me!
Status: ASSIGNED → RESOLVED
Closed: 24 years ago23 years ago
Resolution: --- → FIXED
*** Bug 69172 has been marked as a duplicate of this bug. ***
Crash Signature: [@ js_LockScope1] [@ js_GetSlotWhileLocked]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: