Closed Bug 58551 Opened 24 years ago Closed 24 years ago

crash on startup -- 10/30 branch build [@ js_GetSlotWhileLocked]

Categories

(Core :: DOM: Core & HTML, defect, P3)

VERIFIED DUPLICATE of bug 31847

People

(Reporter: jrgmorrison, Assigned: joki)

References

Details

(Keywords: smoketest, topcrash, Whiteboard: [rtm++])

Crash Data

Attachments

(3 files)

Posted this to the hook and n.p.m.builds. (This may not be JS per se but 
I don't have any other clues right now).

I'm assuming that the builds in 
  ftp://sweetlou/products/client/seamonkey/windows/32bit/x86/2000-10-30-20-MN6/
should be good to go, right? 

On win2k and win98, I am crashing on startup with this talkback provided
trace (the top few lines vary somewhat, but always in jslock.c). Sometimes,
with a new profile, I can start OK, but the second run will crash, either
just as the browser window starts to come up, or on one occasion just after
it had finished coming up.

I'm filing a bug now.

John 

js_GetSlotWhileLocked [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 272] 
JS_GetPrivate [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 1799] 
nsScriptSecurityManager::GetFunctionObjectPrincipal 
[d:\builds\seamonkey\mozilla\caps\src\nsScriptSecurityManager.cpp, line 842] 
nsScriptSecurityManager::CheckFunctionAccess 
[d:\builds\seamonkey\mozilla\caps\src\nsScriptSecurityManager.cpp, line 610] 
nsJSContext::CallEventHandler 
[d:\builds\seamonkey\mozilla\dom\src\base\nsJSEnvironment.cpp, line 911] 
nsJSDOMEventListener::HandleEvent 
[d:\builds\seamonkey\mozilla\dom\src\events\nsJSDOMEventListener.cpp, line 89] 
nsEventListenerManager::HandleEventSubType 
[d:\builds\seamonkey\mozilla\layout\events\src\nsEventListenerManager.cpp, line 
789] 
nsEventListenerManager::HandleEvent 
[d:\builds\seamonkey\mozilla\layout\events\src\nsEventListenerManager.cpp, line 
1372] 
GlobalWindowImpl::HandleDOMEvent 
[d:\builds\seamonkey\mozilla\dom\src\base\nsGlobalWindow.cpp, line 512] 
DocumentViewerImpl::LoadComplete 
[d:\builds\seamonkey\mozilla\layout\base\src\nsDocumentViewer.cpp, line 676] 
nsWebShell::OnEndDocumentLoad 
[d:\builds\seamonkey\mozilla\docshell\base\nsWebShell.cpp, line 954] 
nsDocLoaderImpl::FireOnEndDocumentLoad 
[d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 818] 
nsDocLoaderImpl::DocLoaderIsEmpty 
[d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 620] 
nsDocLoaderImpl::OnStopRequest 
[d:\builds\seamonkey\mozilla\uriloader\base\nsDocLoader.cpp, line 555] 
nsLoadGroup::RemoveChannel 
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsLoadGroup.cpp, line 583] 
nsJARChannel::OnStopRequest 
[d:\builds\seamonkey\mozilla\netwerk\protocol\jar\src\nsJARChannel.cpp, line 
709] 
nsOnStopRequestEvent::HandleEvent 
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 
302] 
nsStreamListenerEvent::HandlePLEvent 
[d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 
106] 
PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 581] 
PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, 
line 517] 
_md_EventReceiverProc [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, 
line 1051] 
KERNEL32.DLL + 0x24407 (0xbff94407) 
0x00658b52
adding putterman.  he's seeing js crashes too.
There are 25 reports already for this from Netscape6 Builds, including 3 from 
me.
(about 8 were from me, on three machines. [Hey, I wanted to be sure]).

But note: I'm not crashing in the mozilla build on the branch from the same 
build time. (Is it a build thing, or from the commercial tree ...)

*** Bug 58550 has been marked as a duplicate of this bug. ***
The other top of stack that talkback provides for this crash is 

ntdll.dll + 0xed61 (0x77f8ed61) 
ntdll.dll + 0xecf1 (0x77f8ecf1) 
js_LockScope1 [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 668] 
js_LockObj [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 739] 
js_GetSlotWhileLocked [d:\builds\seamonkey\mozilla\js\src\jslock.c, line 295] 
JS_GetPrivate [d:\builds\seamonkey\mozilla\js\src\jsapi.c, line 1799] 
nsScriptSecurityManager::GetFunctionObjectPrincipal 
[d:\builds\seamonkey\mozilla\caps\src\nsScriptSecurityManager.cpp, line 842] 
nsScriptSecurityManager::CheckFunctionAccess 
[d:\builds\seamonkey\mozilla\caps\src\nsScriptSecurityManager.cpp, line 610] 
   ... snip ...

I'm getting the same crash on Linux.
Yeah, I got one of the JS_LockScope1 crashes as well.
Just one more data point: I just pulled fresh commercial NS6 bits from sweetlou
and things work just fine on Win2k here...
This 'xxx-20' build would be a build made in the middle of the day today after 
most of the limbo fixes went in, right?  The 'xxx-08' build doesn't have this 
problem, right?

I'm not seeing problems with my branch debug build from this afternoon on NT. 
I'll update again and build the commercial part too.
I'm upgrading severity.  This is the build (from afternoon, evening) which the 
release team made after all the Limbo bugs were checked in.
Severity: normal → critical
My crash window from the OS said kernel32.dll.  Here are some talkback 
IDs: TB20130288H, TB20130227E.  Am I having this bug or a different one?  (I'm 
also on the limbo 1 build.)
Raise severity to blocker, add a couple random keywords
Severity: critical → blocker
Keywords: rtm, smoketest
Steve, yes, both of yours were js_LockScope1 crashes.  
*** Bug 58606 has been marked as a duplicate of this bug. ***
I crash on startup with debug build. The one I downloaded from sweetlou seems to
work fine except I crash 100% on Help > About Netscape. The crash happens in
some js DLL, so I am assuming it could be the same as this. This is on NT.
seen on branch builds:
windows 2000-10-31-09-MN6
linux 2000-10-31-09-MN6
mac 2000-10-31-08-MN6

note: trunk builds from this morning worked fine
Here's my first guess: Bug #53849 looks very suspicious to me. I haven't talked
to a developer who's seen this crash in a debug build yet. Only in release
builds. Our debug builds don't have JAVA so that would make sense.

One problem with my theory is that OJI shouldn't be invoked on startup right?
Unless the plugin manager is doing some funky pre-loading of some things....


I also crash when doing Ctrl+n to open a new window.
mscott and I are about to test out his theory. As soon as I moved the java 
plugins into my build directory I crashed at this spot.
OK. I can get this in the debugger on my branch commercial build. I have not 
seen mozilla.exe crash in this way.

This is a garbage JSObject that the code assumes is a function object. It is 
non-null but not a valid JSObject. It looks like pretty random memory contents. 
I'll dig further.
I see this crash every time with an existing profile. However, not that
frequently with a *new* profile!!!
Here are the files we are backing out right now if others want to try too:

cvs update -j1.19.4.1 -j1.19 mozilla/js/src/liveconnect/jsj_JSObject.c
cvs update -j1.14.30.1 -j1.14 mozilla/js/src/liveconnect/jsjava.h
cvs update -j1.28.2.1 -j1.28 mozilla/modules/oji/src/lcglue.cpp
cvs update -j1.3.54.1 -j1.3 mozilla/modules/oji/src/lcglue.h
cvs update -j1.8.2.1 -j1.8 mozilla/modules/oji/src/nsCSecurityContext.cpp
cvs update -j1.4.4.1 -j1.4 mozilla/modules/oji/src/nsCSecurityContext.h

You may need to copy JAVA over to your plugins directory in your debug build...
From my point of view I want to know why that particular nsJSDOMEventListener 
object has an mHandler that is non-null and not a valid JSObject. Was there a 
rooting problem? Or what?
mscott, I'm seeing this on a build w/o Java. I'm not clear on your rationale.
when I backed out the files mscott mentions, the crash went away.

My crash didn't get into js_GetSlotWhileLocked, but it did get into here.

 JS_GetPrivate(JSContext * 0x030081d0, JSObject * 0x00d88ca0) line 1797 + 10 
bytes
nsScriptSecurityManager::GetFunctionObjectPrincipal(nsScriptSecurityManager * 
const 0x0213b050, JSContext * 0x030081d0, JSObject * 0x00d88ca0, nsIPrincipal * 
* 0x0012ee68) line 841 + 14 bytes
nsScriptSecurityManager::CheckFunctionAccess(nsScriptSecurityManager * const 
0x0213b050, JSContext * 0x030081d0, void * 0x00d88ca0, void * 0x00d17150) line 
609 + 44 bytes
jband: mHandler is not rooted.  That's reported in bug 31847, which was futured 
(!).

/be
jband, can you revert the files I listed above and see if you still crash? Your
stack trace is slightly different from the one we are fixing in this bug so
maybe there's another problem somewhere?

That is true.  We have not fixed the rooting problem.  But in general we have 
not been running into it in normal usage.  The cases where we have run into it 
have generally been tracked back to someone doing something they shouldn't have 
been doing.  That was the rationale for futuring.
I'm crashing 75% of the time on mac, in PR_Lock called from js_InitContextForLocking

setting platform to all. existing profile, 10/31/00-08 branch build.
OS: Windows 2000 → All
Hardware: PC → All
mscott: you can get crashes in several places in the JS engine by feeding its 
API a bogus JSObject pointer.  That's what will happen in some cases (likelier 
now with Jeff Dyer's fix to bug 53849, although I don't have a complete theory 
for why this isn't showing up in trunk builds) due to bug 31847.

I don't think any content has to misbehave to result in a crash due to 31847; I 
think we futured that bug by making a wish based on the low frequency observed 
at the time.  We have high frequency now!  I'm betting there is no "other bug" 
here, just Jeff's valid security fix to 53849 in conjunction with 31847.

/be
mscott, mine is not so different. js_CompareAndSwap is inlined and has asm. I 
wouldn't expect to see it in talkback. If we have a bad handler, then various 
events could hit it.

I fear voodoo in backing out changes that *may* have little to do with the real 
problem. I'm also not certain how predictably I can reproduce this.
If bug 31847 is biting, then the new problem could be in xul or js files. 
We should look for multiple event handlers for the same event.
Anyone with strict JS warnings enabled see warnings about "redeclaration of 
function foo"?  That could tickle 31847.

/be
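
[Editor's illustration] The rooting argument above can be sketched with a toy collector. This is not SpiderMonkey code: `ToyHeap`, its `alloc`/`gc` methods, the `scope` object and the slot-recycling behavior are all hypothetical names and assumptions chosen for illustration, and the real engine's GC differs in detail. The sketch only shows how a raw, unrooted copy of a function "pointer" can end up referring to unrelated memory once the function is redeclared and a collection runs.

```javascript
// Toy collector; every name here is hypothetical, not a SpiderMonkey API.
class ToyHeap {
  constructor() {
    this.slots = []; // a slot index stands in for a raw JSObject pointer
    this.roots = []; // callbacks that each report one live slot index
  }
  alloc(value) {
    let i = this.slots.indexOf(undefined); // recycle a freed slot if there is one
    if (i === -1) i = this.slots.length;
    this.slots[i] = value;
    return i;
  }
  gc() {
    // Anything no root reports is freed (no tracing of references in this toy).
    const live = new Set(this.roots.map((report) => report()));
    this.slots = this.slots.map((v, i) => (live.has(i) ? v : undefined));
  }
}

function run(rootTheHandler) {
  const heap = new ToyHeap();
  const scope = {}; // stands in for the window's named properties
  heap.roots.push(() => scope.foo); // the property keeps its current value alive
  scope.foo = heap.alloc({ kind: "function", name: "foo#1" });

  const listener = { handler: scope.foo }; // raw copy of the "pointer"
  if (rootTheHandler) heap.roots.push(() => listener.handler);

  // Redeclaration: the property now refers to a new function object.
  scope.foo = heap.alloc({ kind: "function", name: "foo#2" });
  heap.gc();
  heap.alloc({ kind: "string", value: "unrelated" }); // may recycle a freed slot

  return heap.slots[listener.handler]; // what the listener would try to call
}

const unrooted = run(false);
const rooted = run(true);
console.log(unrooted.kind, rooted.kind, rooted.name);
```

In the unrooted run the listener ends up looking at an object that is non-null but not a function, which is the shape of the symptom described earlier in this bug. In the rooted run the original function survives the collection.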
Per instructions from PDT, I've backed these files out on the branch for a respin
so QA can get a build that doesn't crash. I'm going to leave this open because
it sounds like jeff's code isn't at fault, it's just opened us up to seeing
31847 more often than we were. (Tell me if I'm misunderstanding that part)....

FWIW, The jar entry that I see loading last before the event is for 
"skin/classic/global/print.gif"

I can't figure out how to tell which file the errant event handler is in.
I'm still building my commercial branch build but I notice ben's checkin into
ns/xpfe/browser/resources/content/keywords.js has

addEventListener("load", toggleKeywordMenu, true);

in it.  Could someone try that?

Ugh! I didn't dig through the ns tree changes!

The case I'm seeing *is* an NS_PAGE_LOAD event
Ya joki!

With that line commented out it starts up fine 3 times in a row. With the line 
in it crashes three times. We have a winner!
I say you guys that saw the Java change do something should try putting that 
back in and commenting out this setting of the event handler and see what you 
get.
<jgrm via email re: jband@netscape.com  2000-10-31 01:26> 
But those crashes were with the 2000102709 trunk build (all from 
timeless@bemail.org -- perhaps he could fill us in on how he was doing 
this (something about a panel from scc "nsEngineer"?)).

Interesting bug. per doron: Changing component, reassigning. Per /All fixing 
summary.
Assignee: rogerl → dveditz
Component: Javascript Engine → Installer: XPInstall Engine
QA Contact: pschwartau → jimmylee
Summary: crash on startup -- 10/30 branch build on win32. → crash on startup -- 10/30 branch build
erm sorry, half of my changes were for the wrong bug. *undo*
Assignee: dveditz → rogerl
Component: Installer: XPInstall Engine → Javascript Engine
QA Contact: jimmylee → pschwartau
I backed out the changes to js and modules/oji, but that did not help in my case
- still crash on startup. When I also commented out:

ns/xpfe/browser/resources/content/keywords.js

addEventListener("load", toggleKeywordMenu, true);

I can start ok. I get two assertions (one I had before).

I can also open Help > About Netscape and I can open a new window doing Ctrl+n.

Next I will try putting back the changes in js and modules/oji.
ok, I think the oji code might have been a red herring though I swear that 
backing it out made it work for me.

Backing out the keywords line also made it work for me. We probably backed out 
the wrong code. Sorry about that.
linux respin 2000-10-31-12-MN6

still crashes
Okay, I guess we jumped on a red herring. Jonathan, you might as well stop the
respins since they still have the problem. Looks like jband and joki have the
right fix. Do you want them to check it in right now and restart the process?
I have now tested this on NT & Linux. The only thing that has effect on my
builds is the one line in keywords.js.
I'm noting the dependency, even if a workaround is to back out ben's xpfe change 
(or modify it somehow to dodge bug 31847).

/be
Depends on: 31847
It's really probably more than a dependency.  Other than the location where it 
happened it's really a dupe of the exact same issue.  I agree we should be able 
to modify ben's fix to not cause this but I'm not familiar enough with the xul 
hierarchy to know which functions may override.
joki: agreed, but I didn't want to resolve it as a dup and make it drop off the 
radar.  Cc'ing ben, who has missed out on all the fun.

/be
Sorry I'm late to the scene of the crime. Now that the mystery is solved, what 
is the plan for putting the fix for 53849 back into the branch? I'm happy to do 
it.
-jd
What bug number is for ben's one-line xpfe change that is referred to in 
previous comments? Thanks.
Lisa: the bug # for that is 54782.

Jeff, I'll check your changes back into the branch again once PDT starts taking
in limbo bugs again. 
I'm confused: shouldn't Jeff's changes go back in for any re-spin that also 
"fixes" Ben's xpfe code in order to work around 31847?  Or do Jeff's changes 
have to wait for a later respin, and risk not getting into rtm, just because 
they seemed to be associated with this crash, at first?

/be
adding topcrash keyword and [@ js_GetSlotWhileLocked] for tracking.  sorry for 
the spam.
Keywords: topcrash
Summary: crash on startup -- 10/30 branch build → crash on startup -- 10/30 branch build [@ js_GetSlotWhileLocked]
For accuracy's sake, what is the proper component for this bug? It isn't 
JS Engine - should it be DOM 0, for instance, or one of the XP components?
Making this a virtual dup of 31847.

/be
Assignee: rogerl → joki
Component: Javascript Engine → DOM Level 0
jrgm and I looked at this and we found that the crash was caused by keywords.js 
being loaded twice (once in navigator.xul, and once in navExtraOverlay.xul on 
the commercial side) and hence the load listener added twice. This doesn't 
happen on all machines. I've reproduced it on my HP Kayak as has John, but I 
could not reproduce this on either my dell P210 or my Inspiron notebook. (W2K)

keywords.js shouldn't ever be loaded by any mozilla chrome, so the fix for this 
particular crash symptom for all source trees is to remove reference to 
keywords.js from navigator.xul. I am preparing a patch (it's a one-liner ;) 
right now.
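
[Editor's illustration] A minimal sketch of why sourcing the same script twice registers the load listener twice. It assumes a runtime with global `EventTarget` and `Event` (current browsers, or Node 15+), not the 2000-era Mozilla event code. `sourceKeywordsScript`, `sourceGuarded` and the `keywordsLoaded` flag are hypothetical stand-ins. The guard is shown only for contrast: the patch described above removes the extra reference to keywords.js from navigator.xul and does not add a guard.

```javascript
// Hypothetical stand-in for evaluating keywords.js once against a window.
function sourceKeywordsScript(win, log) {
  // Each evaluation creates a fresh function object, so addEventListener
  // does not treat the second registration as a duplicate of the first.
  function toggleKeywordMenu() {
    log.push("load handler fired");
  }
  win.addEventListener("load", toggleKeywordMenu, true);
}

// Same script, but it refuses to run a second time on the same window.
function sourceGuarded(win, log) {
  if (win.keywordsLoaded) return; // hypothetical guard flag
  win.keywordsLoaded = true;
  sourceKeywordsScript(win, log);
}

function fireLoad(source) {
  const win = new EventTarget(); // stands in for the browser window
  const log = [];
  source(win, log); // e.g. referenced from navigator.xul
  source(win, log); // e.g. referenced again from navExtraOverlay.xul
  win.dispatchEvent(new Event("load"));
  return log.length; // number of times the handler ran
}

const unguarded = fireLoad(sourceKeywordsScript);
const guarded = fireLoad(sourceGuarded);
console.log(unguarded, guarded);
```

Without the guard the handler runs twice for one load event; with it, once.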
Ben, any idea on how keywords.js was being loaded twice?  Sounds like a race in 
XUL content/document code to fetch a <script src=> from a main XUL file and one 
of its overlays.  Cc'ing waterson.

/be
it probably doesn't mean anything, but r=blake
I wrote:
>Ben, any idea on how keywords.js was being loaded twice?

which is a stupid question -- what I meant was: any idea on how that file was 
sourced once in some cases, twice in others?  Were the machines where it seemed 
to load once all faster processors?  Dual processors?  Etc.

jrgm, feel free to chime in.  Thanks,

/be
Dell system upon which this works:

Precision 210 Workstation, Pentium III-500MHz, 256MB RAM
Dell Inspiron 5000 Workstation, Pentium III-500MHz, 128Mb RAM

fails on:
Hewlett Packard Kayak, Pentium III-500MHz, 128Mb RAM. 
Macintosh G3-450, 128Mb RAM, OS8.6
going to do some more investigation to see exactly what happens and when...
It would be sourced twice in all cases for a commercial build (or should be), 
just the timing would differ.

My Kayak is the same MHz/MB as ben's, but my Precision 210 for linux is the 
same MHz, but 128MB memory (if that is somehow a factor). 

At any rate, r=jrgm for the effectiveness of the patch; works for me, and that 
line was redundant in the commercial build. (But why is 'keywords.js' in the 
mozilla tree if it's not supposed to be part of the mozilla build?).
Uh, to be clear: I crashed on both my Linux Precision 210, and Win2k HP Kayak
(both single processor, 500MHz, 128MB).
for the record, it crashed on startup for me on my Dell Dimension 4100 PIII-
933MHz, 256MB RAM without this patch.
Some more notes:

I'm seeing this now in optimised builds on my /P210/ (which still doesn't show 
it in *debug* builds)... still not seeing it in either on my notebook. 

Adding dump() statements to my code shows this:

*** added load listener  (this is me)
creating new nsJSAimChatRendezvous
*** added load listener  (this is me)
Net2Phone.js has been interpreted...
(talkback window appears)
*** firing load handler  (this is me)
*** firing load handler  (this is me)
(Application error dialog appears)

the sequencing of the talkback window appearing there seemed weird to me, but 
that could just be the way it works...
58693 filed on the keywords.js in the mozilla trunk issue. I'll cvs remove the 
file from the trunk when I get a chance.
I don't see this anymore, since using 10-31-14 MN6 candidate build on Win98.
this is because my change to allow the keyword menu to be removed was backed out. 
The patch here will allow my keyword menu stuff to be landed again. 
a=hyatt
and r=ben (obviously)
rtm++, please checkin ASAP so we can build today.
Whiteboard: [rtm++]
the startup bug has been worked around on the branch (checked in my patch to
remove the duplicate file loading). The real bug still exists however. 
The real bug is bug 31847, so can we now DUP this against that bug?  Will that 
mess with anyone's verification procedures?

/be
the symptom described in this bug is actually fixed by my checkin. See
dependency for the real issue. duping, as you wish. 

*** This bug has been marked as a duplicate of 31847 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
Verifying as duplicate -
Status: RESOLVED → VERIFIED
Crash Signature: [@ js_GetSlotWhileLocked]