Closed Bug 40757 Opened 24 years ago Closed 20 years ago

JS GC safety model (requests, newborns, local roots) hard to use

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla1.8alpha4

People

(Reporter: brendan, Assigned: brendan)

References

Details

Attachments

(1 file, 7 obsolete files)

"Users" below means jsapi.h users.

Users must hand off newborns to local roots.  I've seen too many forget, or do 
it after some amount of code that might call another constructor and overwrite 
the newborn, then call the GC.

Users must allocate and use local roots.

Users must use JS_Begin and JS_EndRequest around maximal non-blocking sequences 
of JS API and native code.  Again, it has proven hard for some users to spot the 
preemption points.

A better model would auto-allocate local roots each time there is a new birth, 
obviating the need for newborns.  This is what JNI, based on warren's JRI, does. 
Then you have to worry about local root bloat (say you are constructing 100,000 
objects in a loop, storing each in a rooted object as you go), so you need some 
DeleteLocalRoot primitive.

This model could be used for unrooted atoms too, inside the engine.  Then we 
wouldn't need to keep the GC from running when any thread is in a JS API call 
that atomizes, and in particular in the compiler.

/be
Status: NEW → ASSIGNED
Keywords: js1.5
Target Milestone: --- → M16
The Request API doesn't detect that the inner request should not try to wait for
the GC to finish, but it should -- otherwise you can easily deadlock cuz the GC
may be waiting for the outer request to end.

But this just points out that nested requests, one or more contexts, one thread
are unnecessary.  Once you know a context is in a request on this thread, you
can safely call the API using any context.  That's because requests are required
only to keep the GC at bay, so any context's request will do if you're in one.

scullin@geocast.com already bore some brunt here, so I think we should fix
JS_BeginRequest to detect whether any context is in a request on this thread. 
We could search the runtime's context list for other contexts with the same
thread-id in cx->thread, and with non-zero cx->requestDepth.  The GC does that
already, in an attempt to ignore such nested requests when the innermost one
calls into the GC.  But the GC can afford linear search and quadratic growth
rates over the number of contexts -- can JS_BeginRequest?

/be
Moving out, will fix soon. I hope js1.5 wraps up in M17.

/be
Target Milestone: M16 → M17
Priority: P3 → P1
Blocks: 39125
I think eliminating the request model implies either stop-all-threads as an NSPR
primitive, or else incremental GC.  I don't think NSPR supports stop-all-threads
any longer, it was not portable (where it was claimed to work, it was buggy in
ways that were unfixable without more OS changes).  Wtc, do you remember more of
this history from the old days of the netscape.com Java VM?

So incremental GC is the holy grail: then, with a write-barrier inside the
set-slot primitive for objects, we could let the GC do some work while scripts
ran and mutated objects not in the generation being scanned.  The GC would have
to respect object locks by not taking them (else it could deadlock, unless the
GC lock was never needed by a thread while that thread held an object locked). 
More in a bit.

/be
PR_SuspendAll and PR_ResumeAll are no longer supported.
Pthreads doesn't have functions to suspend and resume a
thread.  One would need to use either non-portable API
or a technique based on signals to suspend and resume a
pthread.
Not likely to happen in M18 timeframe.

/be
Target Milestone: M17 → M20
Target Milestone: M20 → mozilla1.0
Well, js1.5 needs to go out soon (almost there), but this bug is targeted at
mozilla1.0, which is next year.  So removing js1.5.  If someone needs this in
js1.5, mozilla0.9, etc. please set the keyword back.

/be
Keywords: js1.5
Lowering priority in light of other needs.

/be
Priority: P1 → P3
I can't take the guilt any more.

(And I'm working on incremental/generational GC.)
Assignee: brendan → shaver
Status: ASSIGNED → NEW
Priority: P3 → P2
*** Bug 77636 has been marked as a duplicate of this bug. ***
The debugger isn't going to do much of anything (except crash) until something
happens here, or we check in the bug 77636's patch.
I don't think I want to change the local-root model for 1.0, even if I thought I
would be able to get it done.
Target Milestone: mozilla1.0 → mozilla1.1
Taking for 1.5.

/be
Assignee: shaver → brendan
Target Milestone: mozilla1.1alpha → mozilla1.5alpha
Status: NEW → ASSIGNED
Target Milestone: mozilla1.5alpha → mozilla1.6alpha
Target Milestone: mozilla1.6alpha → mozilla1.7alpha
I'm not sure this is worth hacking on, as opposed to Narcissus.  Someone else
should feel free to step up.  I'll advise, given a righteous hacker who wants to
do this.

/be
Target Milestone: mozilla1.7alpha → Future
Needed for E4X.  Patch in a moment.

/be
Target Milestone: Future → mozilla1.8alpha4
Attached patch scoped local roots (obsolete) — Splinter Review
Shaver, here it is, broken out of the impending gigundo e4x patch.

/be
Notice that JS_LeaveLocalRootScope eagerly frees cx->localRootStack, so there is
the possibility of malloc churn here.  We'd need an lrs->entryCount member to
know when an allocated, empty stack was active, implying js_AllocGCThing should
push on it rather than set cx->newborn.

/be
Attachment #156388 - Flags: review?(shaver)
The entryCount malloc churn fix I mentioned would add bloat, of course (264
bytes per context, net of malloc overhead).  Pick your poison.

Note that the patch contains a relocation of StartResolving and StopResolving
from jsobj.c to jscntxt.c, where they belong.  This anticipates the e4x patch
and helps reduce stress in that bug.

/be
Comment on attachment 156388 [details] [diff] [review]
scoped local roots

This looks pretty good, though I want to re-read in the AM.  How should we use
this to properly solve the atom-reclamation issue from bug 77636?

I also wonder about using an arena and resizing vector of marks, rather than
the individually-allocated lrs frames, but I'll ponder that some more.
I started using an arena pool, but the lack of strong typing made it more pain
than it was worth.  With the ad-hoc structure, you can count on the compiler to
do the alignment (no slop overallocated from malloc), you can index into
lrc->roots, and the result is much more readable.

More in a bit on atoms.  Note that last-ditch GCs from the engine itself do not
collect atoms (part of the patch in bug 39125), so it's only the debugger hook
call-outs that might run a full GC (bug 77636).

/be
Comment on attachment 156388 [details] [diff] [review]
scoped local roots

New patch coming.

/be
Attachment #156388 - Flags: review?(shaver)
Also, I fixed some comment typos and infelicitous wordings.  This looks good,
I'd like to get it into 1.8a4 when the trunk opens.

/be
Attachment #156388 - Attachment is obsolete: true
Also, commented js_LeaveLocalRootScope better.	Ready for review!  Oh, did I
say that before? ;-)  Looking for bugs myself is no fun, I want a code buddy.

/be
Attachment #156448 - Attachment is obsolete: true
Namely, this one.

/be
Attachment #156451 - Attachment is obsolete: true
Attachment #156452 - Attachment description: argh, last patch was a stale copy of the righ patch → argh, last patch was a stale copy of the right patch
Attachment #156452 - Flags: review?(shaver)
When mark != 0 but m == 0 by the bottom of the function.  This case is where a
scope just happens to claim the first element in a non-initial chunk (malloc'd
separately from the stack).  When leaving that scope, the saved lrs->scopeMark
stored at that lrc->roots[0] will be nulled, but without this patch's fix (at
the bottom of js_LeaveLocalRootScope), lrc would be wasted: the next push would
see that (lrs->rootCount & JSLRS_CHUNK_MASK) == 0, and lrs->rootCount != 0, so
it would allocate a new lrc and push it on top of the empty one.

Whew.  Clearly, I was trying to combine cases that can't be combined, given the
lrs->firstChunk special case, mixed with some off-by-one bugs in earlier
drafts.

/be
Attachment #156452 - Attachment is obsolete: true
Attachment #156452 - Flags: review?(shaver)
If m == 0, n != 0, and top != v, we are forgetting a root that's not on top of
the stack, so we have to search downward (top != v).  But (m == 0) we can't
start in the top chunk -- we must go down one.	Hence the lrc2 = (m == 0) ?
lrc->down : lrc statement before the while (--i > mark) loop.

I tightened up a few unsigned comparisons to use > instead of !=, which is safe
in these instances because predecessor code checks and excludes the <= case.

/be
Attachment #156454 - Attachment is obsolete: true
These patches all contain some unrelated cleanup, btw: JSStackFrame.objAtomMap
got in a while ago, and its #if 0'd code was removed or fixed for bug 169559,
but for some reason I forgot to yank the struct member, its initialization to
NULL, and its GC code.  Bleh.  Glad to be rid of it, finally.

/be
1.  Move the empty scope check in js_ForgetLocalRoot (called directly from the
JS_ForgetLocalRoot API) out of the 'if (top != v)'
non-topmost-value-being-popped hard case, so an API caller can't possibly trick
the code into popping a scope mark(!).

2.  Combine the 'lrc2 = (m == 0) ? lrc->down : lrc' statement before the loop
with the 'if (j == 0) lrc2 = lrc2->down' in the loop body, for tighter,
prettier code.	Diff the last patch against this one to see this more clearly.

/be
Attachment #156459 - Attachment is obsolete: true
Ok, this is testing well for me now, never mind mental execution.  Any more
easter egg bugs can come out in the wash.  It won't be used by any code till
e4x lands, so it's not gonna bust anyone.

/be
Attachment #156463 - Attachment is obsolete: true
Blocks: e4x
Comment on attachment 156469 [details] [diff] [review]
one-liner: fix js_EnterLocalRootScope to set cx->localRootStack (duh!)

OK, I finally ran through it all in my head.  Sorry for the overnight delay,
but I was too tired to do a good job with it last night when I got back.
r=shaver
Attachment #156469 - Flags: review+
It's unusual for the text of a JSMSG to end with a period.  Also, the NB in the
doc for JS_ForgetLocalRoot is a little misleading.  If you forget on an old
root, your recently allocated GC-things also become expensive to find due to the
swap before pop.
nallen: thanks, I don't know where that period came from, honest!  I purged a
few that crept into js.msg a while ago.  Looks like I missed the one at the end
of JSMSG_BAD_INDIRECT_CALL's string too.

Yeah, swapping instead of sliding down would preserve the strictly greater
efficiency of forgetting more recent local roots, but I'm not worried -- there
ought not be too many for O(n^2) problems to show up, or someone is failing to
root as they go (build a tree top down).  I'll change the comment to say more
about this, although the subtleties are probably lost in jsapi.h

/be
> Yeah, swapping instead of sliding down would preserve the strictly greater

Er, "sliding down instead of swapping ...."

Checked in.

/be
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Flags: testcase-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: