Closed Bug 57096 Opened 25 years ago Closed 25 years ago

Opening PSM crashes browser [@ js_GC]

Categories

(Core Graveyard :: Security: UI, defect, P1)

1.0 Branch
x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jband_mozilla, Assigned: brendan)

Details

(Keywords: crash, topcrash, Whiteboard: [rtm++])

Crash Data

Attachments

(3 files)

The browswer crashes in js_GC when I choose the menu item: Tasks :: Privacy and Security :: Personal Security Manager Forcing the DOM to do a full GC makes this more reproducable: Index: nsJSEnvironment.cpp =================================================================== RCS file: /cvsroot/mozilla/dom/src/base/nsJSEnvironment.cpp,v retrieving revision 1.117 diff -u -r1.117 nsJSEnvironment.cpp --- nsJSEnvironment.cpp 2000/10/14 02:26:36 1.117 +++ nsJSEnvironment.cpp 2000/10/18 02:54:49 @@ -1297,13 +1297,15 @@ mTerminationFunc = nsnull; } + /* mNumEvaluations++; if (mNumEvaluations > 20) { mNumEvaluations = 0; ::JS_MaybeGC(mContext); } - +*/ + GC(); mBranchCallbackCount = 0; return NS_OK; I'll attach a stack.
Attached file stack of crash
brendan, The unmatched JS_PushArguments is the one at: http://lxr.mozilla.org/seamonkey/source/extensions/psm-glue/src/nsPSMUICallbacks .cpp#118 It ends up at the place where we crash before the C stack returns out to this frame and gets to the JS_PopArguments call; i.e. we're servicing an event from the event loop. Also note that this code decides at runtime whether to use a modal or non-modal dialog. Even If I force the use of the modal dialog I get the crash if I click anywhere. I'll attach a file containing the outer and inner stacks.
Summary: Opening PSM crashed browser → Opening PSM crashes browser
Attached file outer and inner stacks
Some data points... - By disabling the GC calls I confirmed that it does in fact eventually wind out and call JS_PopArguments as expected. We didn't blow past the C stack or anything odd like that. - Even though it calls ParentWindow->Open (which calls GlobalWindowImpl::Open then OpenInternal with the PR_FALSE 3rd arg to indicate non-modal I think), inside OpenInternal it sets the modal flag and creates a modal dialog. I'm not sure why this happens. But non-modality does not seem to be part of the problem. - There are other calls to nsJSContext::GC before this unwinds out of the modal dialog. I'm not clear why our call from xpconnect is the problem. The explicit GC calls all have cx->fp = null but cx->stackHeaders == null too. - When running in the debugger from the place the JS_PushArguments is called until the actual call to ShowModal inside OpenInternal I see the main window come to the front and repaint. So, something is allowing at least some Windows messages though in some intervening call where I'd think it wouldn't. This is before the modeal dialogs nested event loop. This could be bad. - The xul window code that actually runs the nested event loop pushes null onto the thread context stack before processing events. - The args got pushed onto the same JSContext that we later crash on. - The state of the AutoPushCompatibleJSContext in xpconnect indicates that we got the JSContext from GetSafeJSContext (i.e. the hidden window) rather than finding a JSContext by peeking on the context stack. Brendan, I can reproduce this at will on the machine in my cube. We can debug if you are going to be around on Wed (assuming you haven't already solved this :)
Keywords: crash
Priority: P3 → P1
Thanks for digging into this after I had to leave last night, jband. This is another old bug tickled by the big change to fix bug 49816. JS_GC should not be freeing all arenas pooled in cx->stackPool if (!cx->fp) -- cx->stackHeaders may point into the first arena, due to the "lower" native call to js_AllocStack. Patch coming up. /be
Assignee: ddrinan → brendan
Cc'ing code buddies and jpatel, who may be able to help add the topcrash keyword to this bug, based on jband's stacktraces. /be
Status: NEW → ASSIGNED
Seth, this is the "this is it" bug I mentioned in bug 57070, in reply to your comments there. r= and a= solicitation, with a bullet. /be
a=jband Oh yeah! Looks right. Fixes my crash. I like the idea of testing with gc getting forced after every script eval. I'm going to set up an #ifdef and leave that on in my builds for testing.
Looking to give r=... but - could you give me/point me to a little tutorial on how we allocate stack slots? (Why an arena? If we do lots of up/down twiddling to conserve stack space, why are we using arenas?)
Yeah, I can see how this would cause us stress. Seems like this code didn't anticipate JS_PushArguments-style stack allocation from outside of function calls proper. r=shaver
Why an arena? To amortize mallocs. There is a high price paid to malloc lots of little pieces of memory, both in cycles and in per-allocation heap overhead. The JS engine allocates stack for a script it's about to interpret by doubling the script's depth, but the resulting number of slots is still small on average. Going through malloc for each such allocation is bad on its face. Then there is the XPConnect case seen here: jband and I noticed 3 slots being allocated to call an XPCOM interface method implemented in JS, 2 for [fun, this] and 1 for the single argument. So, we amortize malloc costs with arenas. And the GC is exact, so it finds parts of stack containing roots by consulting the JSStackFrame list in each context. Each JSStackFrame contains bounded pointers (pointers and counts, generally) into the stack arenas, for argv, vars, and operands (the active part of the "upper" half of that doubled-depth allocation done by js_Interpret that I mentioned above). But what if native code (say, XPConnect, or a TryValueOf call internal to the engine), wants to call into the JS engine and allocate some stack to hold [fun, this, argv...]? It uses the "friend" API, js_AllocStack. But there is no JSStackFrame pushed for the GC to inspect to find the would-be roots. Using JS_AddRoot again is too costly in terms of mallocs and cycles. So an "extra" stack segment is allocated to hold not only [fun, this argv...] but a stack header that describes the three or more slots allocated (JSStackHeader). The JSStackHeader push-down list is maintained by the top-most stack header pointer, cx->stackHeaders, in each context. There is yet another optimization here, to avoid wasting 2 slots for every 3 (assume most calls have only 1 arg; you will find more, and I haven't measured the mean and variance, but it's small enough that burning 2 out of 4 to 2 out of 7 slots total on JSStackHeaders did not seem good). And that is to notice, in js_AllocStack, when cx->stackHeaders points to slots on the very top of the stack, by looking at the top stack arena. In that case, an existing stack segment is extended, rather than a new header and segment being allocated (if possible -- one may still be too close to the end of the arena!). js_FreeStack has matching logic to retract an extended segment rather than to pop it off cx->stackHeaders. /be
shaver: the JS_GC code did not get updated to know about cx->stackHeaders owning stack space in addition to cx->fp (when I name those pointers here, I mean also the headers and frames linked via down pointers from them). But rather than test if (!cx->fp && !cx->stackHeaders), I thought it better and clearer to test that the stackPool arena pool had been released so that its current pointer was set to its empty (first) header-arena. /be
Fix in trunk, adding [rtm+]. Dollars to donuts, this is the cause of topcrash #6 (in js_GC, loading from a cx->stackHeaders pointer that dangles at free or recycled memory). /be
Keywords: topcrash
Whiteboard: [rtm+]
rtm++
Whiteboard: [rtm+] → [rtm++]
Brendan - thanks for the good tutorial. Arenas make sense here, I was just more used to thinking about them as sources of small fixed-size objects. Next question - why does the supplied patch fix it? Is there something special about being in the first stack pool? How does being in the first pool indicate arena-doneness, won't this change based on the (arbitrary?) number we picked for arena pool size?
Arenas are good for variable length allocations too, at some cost due to waste (internal fragmentation) when requests leave an unusably small free span at the end of an arena. "Recyclers" are fixed-sized allocators, which may or may not be LIFO. Arena pools are LIFO with a vengeance -- the last item allocated is freed first, but you can free everything (from "first" to last, see below) in one fell swoop, if you want to. JS uses them both ways: stacks that mark and release; compiler temporaries that are freed en masse. JSArenaPool.first is a dummy arena, allocated inside of JSArenaPool, and having zero net (payload) length. If current points to it, no busy arenas are in LIFO use (although many idle ones may be chained off of first.next, waiting to be reused; hence JS_FinishArenaPool, which frees them all back to the malloc heap). In this bug, the "lower" js_AllocStack marked the end of cx->stackPool.first (or its beginning, if you prefer, as it is zero-net-length), and the matching lower js_FreeStack will free all arenas and reset current to &first, but only after we've unwound from the "upper" XPConnect call, which calls JS_GC -- and that JS_GC will find current != &first (with the fix-patch applied), and will not zap stackPool out from under the "lower" call that's still pointing into it via cx->stackHeaders. Fix in branch too. /be
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Verified fixed in the 10/23 commercial branch WinNT build.
Status: RESOLVED → VERIFIED
Summary: Opening PSM crashes browser → Opening PSM crashes browser [@ js_GC]
Product: PSM → Core
Version: psm1.01 → 1.0 Branch
Crash Signature: [@ js_GC]
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: