Closed Bug 57096 Opened 24 years ago Closed 24 years ago

Opening PSM crashes browser [@ js_GC]

Categories

(Core Graveyard :: Security: UI, defect, P1)

1.0 Branch
x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jband_mozilla, Assigned: brendan)

Details

(Keywords: crash, topcrash, Whiteboard: [rtm++])

Crash Data

Attachments

(3 files)

The browswer crashes in js_GC when I choose the menu item: 

Tasks :: Privacy and Security :: Personal Security Manager

Forcing the DOM to do a full GC makes this more reproducable:

Index: nsJSEnvironment.cpp
===================================================================
RCS file: /cvsroot/mozilla/dom/src/base/nsJSEnvironment.cpp,v
retrieving revision 1.117
diff -u -r1.117 nsJSEnvironment.cpp
--- nsJSEnvironment.cpp 2000/10/14 02:26:36     1.117
+++ nsJSEnvironment.cpp 2000/10/18 02:54:49
@@ -1297,13 +1297,15 @@
     mTerminationFunc = nsnull;
   }

+  /*
   mNumEvaluations++;

   if (mNumEvaluations > 20) {
     mNumEvaluations = 0;
     ::JS_MaybeGC(mContext);
   }
-
+*/
+  GC();
   mBranchCallbackCount = 0;

   return NS_OK;


I'll attach a stack.
Attached file stack of crash
brendan,

The unmatched JS_PushArguments is the one at:
http://lxr.mozilla.org/seamonkey/source/extensions/psm-glue/src/nsPSMUICallbacks
.cpp#118

It ends up at the place where we crash before the C stack returns out to this 
frame and gets to the JS_PopArguments call; i.e. we're servicing an event 
from the event loop.

Also note that this code decides at runtime whether to use a modal or non-modal 
dialog. Even If I force the use of the modal dialog I get the crash if I click 
anywhere.

I'll attach a file containing the outer and inner stacks.
Summary: Opening PSM crashed browser → Opening PSM crashes browser
Attached file outer and inner stacks
Some data points...

- By disabling the GC calls I confirmed that it does in fact eventually wind out 
and call JS_PopArguments as expected. We didn't blow past the C stack or 
anything odd like that.

- Even though it calls ParentWindow->Open (which calls GlobalWindowImpl::Open 
then OpenInternal with the PR_FALSE 3rd arg to indicate non-modal I think), 
inside OpenInternal it sets the modal flag and creates a modal dialog. I'm not 
sure why this happens. But non-modality does not seem to be part of the problem.

- There are other calls to nsJSContext::GC before this unwinds out of the modal 
dialog. I'm not clear why our call from xpconnect is the problem. The explicit 
GC calls all have cx->fp = null but cx->stackHeaders == null too.

- When running in the debugger from the place the JS_PushArguments is called 
until the actual call to ShowModal inside OpenInternal I see the main window 
come to the front and repaint. So, something is allowing at least some Windows 
messages though in some intervening call where I'd think it wouldn't. This 
is before the modeal dialogs nested event loop. This could be bad.

- The xul window code that actually runs the nested event loop pushes null onto 
the thread context stack before processing events.

- The args got pushed onto the same JSContext that we later crash on.

- The state of the AutoPushCompatibleJSContext in xpconnect indicates that we 
got the JSContext from GetSafeJSContext (i.e. the hidden window) rather than 
finding a JSContext by peeking on the context stack.

Brendan, I can reproduce this at will on the machine in my cube. We can debug if 
you are going to be around on Wed (assuming you haven't already solved this :)
Keywords: crash
Priority: P3 → P1
Thanks for digging into this after I had to leave last night, jband.  This is
another old bug tickled by the big change to fix bug 49816.  JS_GC should not be
freeing all arenas pooled in cx->stackPool if (!cx->fp) -- cx->stackHeaders may
point into the first arena, due to the "lower" native call to js_AllocStack. 
Patch coming up.

/be
Assignee: ddrinan → brendan
Cc'ing code buddies and jpatel, who may be able to help add the topcrash keyword
to this bug, based on jband's stacktraces.

/be
Status: NEW → ASSIGNED
Seth, this is the "this is it" bug I mentioned in bug 57070, in reply to your
comments there.

r= and a= solicitation, with a bullet.

/be
a=jband

Oh yeah! Looks right. Fixes my crash.

I like the idea of testing with gc getting forced after every script eval. I'm 
going to set up an #ifdef and leave that on in my builds for testing.
Looking to give r=... but - could you give me/point me to a little tutorial on
how we allocate stack slots?

(Why an arena?  If we do lots of up/down twiddling to conserve stack space, why
are we using arenas?)
Yeah, I can see how this would cause us stress.  Seems like this code didn't
anticipate JS_PushArguments-style stack allocation from outside of function
calls proper.

r=shaver
Why an arena?  To amortize mallocs.  There is a high price paid to malloc lots
of little pieces of memory, both in cycles and in per-allocation heap overhead. 
The JS engine allocates stack for a script it's about to interpret by doubling
the script's depth, but the resulting number of slots is still small on average.
Going through malloc for each such allocation is bad on its face.  Then there is
the XPConnect case seen here: jband and I noticed 3 slots being allocated to
call an XPCOM interface method implemented in JS, 2 for [fun, this] and 1 for
the single argument.

So, we amortize malloc costs with arenas.  And the GC is exact, so it finds
parts of stack containing roots by consulting the JSStackFrame list in each
context.  Each JSStackFrame contains bounded pointers (pointers and counts,
generally) into the stack arenas, for argv, vars, and operands (the active part
of the "upper" half of that doubled-depth allocation done by js_Interpret that I
mentioned above).

But what if native code (say, XPConnect, or a TryValueOf call internal to the
engine), wants to call into the JS engine and allocate some stack to hold [fun,
this, argv...]?  It uses the "friend" API, js_AllocStack.  But there is no
JSStackFrame pushed for the GC to inspect to find the would-be roots.  Using
JS_AddRoot again is too costly in terms of mallocs and cycles.  So an "extra"
stack segment is allocated to hold not only [fun, this argv...] but a stack
header that describes the three or more slots allocated (JSStackHeader).

The JSStackHeader push-down list is maintained by the top-most stack header
pointer, cx->stackHeaders, in each context.  There is yet another optimization
here, to avoid wasting 2 slots for every 3 (assume most calls have only 1 arg;
you will find more, and I haven't measured the mean and variance, but it's small
enough that burning 2 out of 4 to 2 out of 7 slots total on JSStackHeaders did
not seem good).  And that is to notice, in js_AllocStack, when cx->stackHeaders
points to slots on the very top of the stack, by looking at the top stack arena.
In that case, an existing stack segment is extended, rather than a new header
and segment being allocated (if possible -- one may still be too close to the
end of the arena!).

js_FreeStack has matching logic to retract an extended segment rather than to
pop it off cx->stackHeaders.

/be
shaver: the JS_GC code did not get updated to know about cx->stackHeaders owning
stack space in addition to cx->fp (when I name those pointers here, I mean also
the headers and frames linked via down pointers from them).  But rather than
test if (!cx->fp && !cx->stackHeaders), I thought it better and clearer to test
that the stackPool arena pool had been released so that its current pointer was
set to its empty (first) header-arena.

/be
Fix in trunk, adding [rtm+].  Dollars to donuts, this is the cause of topcrash
#6 (in js_GC, loading from a cx->stackHeaders pointer that dangles at free or
recycled memory).

/be
Keywords: topcrash
Whiteboard: [rtm+]
rtm++
Whiteboard: [rtm+] → [rtm++]
Brendan - thanks for the good tutorial.  Arenas make sense here, I was just more
used to thinking about them as sources of small fixed-size objects.

Next question - why does the supplied patch fix it?  Is there something special
about being in the first stack pool?  How does being in the first pool indicate
arena-doneness, won't this change based on the (arbitrary?) number we picked for
arena pool size?
Arenas are good for variable length allocations too, at some cost due to waste
(internal fragmentation) when requests leave an unusably small free span at the
end of an arena.  "Recyclers" are fixed-sized allocators, which may or may not
be LIFO.  Arena pools are LIFO with a vengeance -- the last item allocated is
freed first, but you can free everything (from "first" to last, see below) in
one fell swoop, if you want to.  JS uses them both ways: stacks that mark and
release; compiler temporaries that are freed en masse.

JSArenaPool.first is a dummy arena, allocated inside of JSArenaPool, and having
zero net (payload) length.  If current points to it, no busy arenas are in LIFO
use (although many idle ones may be chained off of first.next, waiting to be
reused; hence JS_FinishArenaPool, which frees them all back to the malloc heap). 

In this bug, the "lower" js_AllocStack marked the end of cx->stackPool.first (or
its beginning, if you prefer, as it is zero-net-length), and the matching lower
js_FreeStack will free all arenas and reset current to &first, but only after
we've unwound from the "upper" XPConnect call, which calls JS_GC -- and that
JS_GC will find current != &first (with the fix-patch applied), and will not zap
stackPool out from under the "lower" call that's still pointing into it via
cx->stackHeaders.

Fix in branch too.

/be
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Verified fixed in the 10/23 commercial branch WinNT build.
Status: RESOLVED → VERIFIED
Summary: Opening PSM crashes browser → Opening PSM crashes browser [@ js_GC]
Product: PSM → Core
Version: psm1.01 → 1.0 Branch
Crash Signature: [@ js_GC]
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: