JS GC has O(bad) mark behaviour

VERIFIED FIXED in M18

Status

()

Core
JavaScript Engine
P3
normal
VERIFIED FIXED
18 years ago
7 years ago

People

(Reporter: alla, Assigned: brendan)

Tracking

({perf})

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(13 attachments)

35.62 KB, patch
Details | Diff | Splinter Review
30.38 KB, patch
Details | Diff | Splinter Review
2.53 KB, patch
Details | Diff | Splinter Review
40.45 KB, patch
Details | Diff | Splinter Review
34.33 KB, patch
Details | Diff | Splinter Review
34.64 KB, patch
Details | Diff | Splinter Review
8.92 KB, patch
Details | Diff | Splinter Review
63.34 KB, patch
Details | Diff | Splinter Review
60.74 KB, patch
Details | Diff | Splinter Review
65.90 KB, patch
Details | Diff | Splinter Review
63.61 KB, patch
Details | Diff | Splinter Review
102.19 KB, patch
Details | Diff | Splinter Review
103.31 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

18 years ago
When profiling the JavaScript GC gc_find_flags() shows up like a huge blimp on
the radar. It always take most of the GC time. 

Looking at this function shows it does linear searches in both the gc and the
flags arena, and this is a very sensitive function that is called for all live
objects in each GC call. Opening up a lot of windows and frames I've seen the
number of arenas go up to 50. Traversing on average 50 elements of a linked list
(25 for the gc arenas, 25 for the flags) for each kills the GC performance
totally.

There are several opportunities for optimization here. First the simple one. By
upping the GC_ARENA_SIZE from 8k to 64k I lowered the GC time on a frame
intensive page from about 300 to 135 miliseconds. This might waste extra memory,
and is mostly a coverup anyway, but it does make a real difference.

The real fix, which I'll be looking into is making the arena pools use arrays
instead of linked lists. These arrays will then be kept sorted which means
binary search can be used to look up from the gc arena pool. It will also make
it easier to make the mapping from gc arena to flags arena more explicit. This
means gc_find_flags can immediately map from a gc arena to the corresponding
flags arena, without any search at all.

Comments?
(Reporter)

Comment 1

18 years ago
Added brendan to CC list. I'd like to discuss this with him.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: perf
(Assignee)

Comment 2

18 years ago
JSArenas are the same (more or less) as PLArenas from NSPR.  Rather than change
them to use a sorted array instead of a linked list, which is not useful for all
the other JSArenas in the engine, I would favor a GC-only data structure change.
If you want to give this bug to me, I'll take it.  If you want to do the patch
development, I'll advise and code review.  Either way is good, I'm just coming
off bug 46703 and have some time.  Thanks,

/be
(Assignee)

Comment 3

18 years ago
My "you" in the last comment referred to Alex.  In any case, I'll relieve rogerl
of this bug for now.

/be
Assignee: rogerl → brendan
(Assignee)

Updated

18 years ago
OS: Linux → All
Target Milestone: --- → M18
(Assignee)

Updated

18 years ago
Status: NEW → ASSIGNED
I had thought that Alex's intent was to maintain the gcArenaPool and gcFlagsPool
arenas in a GC-specific array, rather than using the built-in chaining.  I could
be wrong, though.
(Assignee)

Comment 5

18 years ago
Alex wrote "making the arena pools use arrays instead of linked lists", and I
took him at his word, but maybe too literally.  My apologies if he was proposing
a GC-only solution, not a change to the JSArenaPool data structure.

The linear scan of the GC thing arena list (gcArenaPool) is necessary for a mark
and sweep GC.  The only optimizations here are (a) making the mark phase be
iterative rather than recursive (which would win on small-stack architectures);
and (b) moving to an incremental GC.

So I think it's gcFlagsPool that should change to use a sorted array of JSArenas
rather than being a vanilla JSArenaPool, or something like that.  Sound right?

/be
Is the linear scan in gc_find_flags necessary, though?  We've already identified
the thing whose flags we want, so it would seem like a with to be able to binary
search the gcArenaPool arenas -- and thereby identify the gcFlagsPool arena in
question without another linear scan as well.

If we had gcArenaArray and gcFlagsArray such that gcArenaArray[n] used
gcFlagsArray[n] for its flags, then we could quickly binary search gcArenaArray
to find n and offset-with-arena, and use that to quickly calculate flagp
(offsets will correlate between thing-in-thing-arena and flag-in-flag-arena,
right?).

No?  Iterative or incremental stuff would be sexy too, but this seems like a
relatively easy win right now.
(Assignee)

Comment 7

18 years ago
Shaver: you're right, I was hoping for some better way to organize things that 
avoids the sorted insertion cost (smallish, only for each new gcArenaPool arena 
allocated), and that beats binary search.  So far all I can think of is to page 
align arenas, and to burn a JSGCThing and a flag at the front of each kind of 
arena to hold a pointer from the thing-arena to its flags-arena.

How easy is it to allocate an aligned page on modern OSes?  I know about Irix, 
but it's dead.

/be
(Reporter)

Comment 8

18 years ago
What I would like to do is to combine the gcArenaPool and gcFlagsPool into *one*
(dynamically growing) array of structures. These structure would basically
contain two JSArenas (the gcthing one and the flags one). These structures would
be sorted on the gcthing arena base pointer. Then you can do a binary search on
the array to find both arenas in O(log(n)) time. Of course binary search
insertion makes new arena insertion slowed O(log(n)) to find where to insert and
O(n) to move the tail one step back in the array.

Unfortunately it seems I've got bound up with some device driver work. I don't
know how long it will take, but i guess some time. And since brendan knows this
code and seem to have time over for it I'll let him have it for now.
On Unixes, you can do it with mmap:
  mmap(NULL, n * getpagesize(), prot, MAP_ANON|MAP_PRIVATE, 0, 0);
should be page-aligned, I'm led to believe.

On the Mac and Windows, we might have to hack some slab allocator atop malloc(),
but I'm firing blind.  Platform experts?

Do we actually care about getpagesize()-alignment, or just that our arena is
naturally aligned, so that we can find the start of the arena in question by
masking off some bits?

(If we can pull this trick here, I wonder if there are other cases where we
could use arenas for objects that are numerous, small, and share common state by
stashing it in the top of an arena.  Fear the ghost of bina-arenas, but
still...)
(Reporter)

Comment 10

18 years ago
Ok, I'm a loser. I didn't understand what brendan planned, but shaver showed me
the light. You'd really want GC_ARENA_SIZE aligned arenas though, or you would
need to waste one gcthing per page. That would be very fast.

I almost came up with a trick that meant aligned arenas weren't needed, but it
broke on a corner case, and i can't fix it.
(Assignee)

Comment 11

18 years ago
Alex: hey, you're the winner who found this bug, and *I'm* the loser who
inflicted O(bad) on the GC way back when, when there were only four or five
arenas in each pool for a typical GC run in the "classic" Netscape browser (ah,
memories).

That reminds me: what benchmark did you run to cause so many JS GC things to be
allocated?  I thought Mozilla's DOM was lazy, and didn't make a JS object for
each DOM element.  Perhaps the advent of XBL is upping the ante.  When you say
"lots of windows and frames", can you give a number to "lots"?  Any particular
pages loaded in those windows and frames?  Thanks,

/be
(Reporter)

Comment 12

18 years ago
When I say many frames I mean many. There is a bug around that says something
about tearing down frames (or webshells, don't remember) taking to much time.
That bug contains a attached html page with 100 frames. I downloaded that to
disk and changed all frames to contain about:blank.
(Assignee)

Comment 13

18 years ago
Created attachment 13368 [details] [diff] [review]
prerequisite cleanup for js1.5 GC bugs (bug 38942 also)
(Assignee)

Comment 14

18 years ago
Created attachment 13369 [details] [diff] [review]
diff -wu format for code reviewers
(Assignee)

Comment 15

18 years ago
Could I have an r= on this cleanup patch?  It's been sitting in my tree, it
fixes layering bugs in the engine, and it includes jsgc.[ch], so I would like to
get it in before tackling this bug and bug 38942.  Thanks,

/be
+/* XXXbe begin move to jsapi.h ?! */
+extern void
+js_MarkGCThing(JSContext *cx, void *thing, void *arg);

I'm not sure about the debugging macros, whose publication could constrain us
later, but I think JS_MarkGCThing or a moral equivalent needs to be in jsapi.h,
if the mark JSObject op is to be useful to anyone.


+#define JSSCRIPT_FIND_CATCH_START(script, pc, catchpc)                        

What's this used for?
(Assignee)

Comment 17

18 years ago
I want to noodle on those XXXbe move to jsapi.h ?! comments.  Note well that
jsgc.h can always be included by those JSObjectOps implementors who implement
mark rather than stub it.  Note also that current JSObjectOps implementors must
use "friend" APIs, anyway, for the most part.  I will fix this, but not right
now (or before fixing this bug or bug 38942).

JSSCRIPT_FIND_CATCH_START hoists inline code out of jsinterp.c for sharing via
this macro between that file and jsjit.c (yay!).  Remember?

/be
(Assignee)

Comment 18

18 years ago
Shaver, jband: do you use any object with XPConnect-flavored JSObjectOps as a
global object in a context where JS_ClearScope would be called on that object?

/be
I do currently use an XPC object as a global, but I don't currently call
JS_ClearScope on it.
(Assignee)

Comment 20

18 years ago
Created attachment 13386 [details] [diff] [review]
xpcwrappednativejsops.cpp patch so mark and clear work on "native" objects that don't use js_ObjectOps

Comment 21

18 years ago
r=jband on the xpcwrappednativejsops.cpp patch (assuming that mark and clear get 
added). I have no code that uses xpcwrapped natives as globals. Callers could do 
that and they might expect to be able to call JS_ClearScope
(Assignee)

Comment 22

18 years ago
Shaver, jband: how about some r= love on the prerequisite patch in full?  Jband,
if you like the xpconnect sub-patch, you must like the mark and clear changes
elsewhere, at least a little.

/be
(Assignee)

Comment 23

18 years ago
Created attachment 13434 [details] [diff] [review]
new consolidated patch, with JS_MarkGCThing API
(Assignee)

Comment 24

18 years ago
Created attachment 13435 [details] [diff] [review]
diff -wu format version of last patch
(Assignee)

Comment 25

18 years ago
I added JS_MarkGCThing(cx, obj, name, arg) to the API, to be called by
implementors of JSObjectOps.mark who want to use only "jsapi.h".  The name is
some ASCII string that lives as long as the call to JS_MarkGCThing, used only by
GC_MARK_DEBUG code.  Likewise, arg is used only by GC_MARK_DEBUG but can be
passed blindly (without #ifdef code in the caller) from JSObjectOps.mark's third
same-named arg.

Anyone care to r=?

/be
(Assignee)

Comment 26

18 years ago
The essence of the attached patch is to fix code layering violations in the JS
engine: places where the JSObjectOps-independent layer had to know that it was
dealing with js_ObjectOps (via OBJ_IS_NATIVE) or similar ops.  It adds two ops:
mark and clear, and for the native ops that provide a simpler, lower-level table
of hooks called JSClass, JSClass.mark.

Mark allows each object ops-type or class to have private data containing GC
roots that it marks, without having to register those roots via JS_AddRoot or
JS_AddNamedRoot, and clean them up in finalize via JS_RemoveRoot.

Clear should remove all properties, enumerable or not, from the object, ensuring
that a subsequent has returns false and get returns undefined.  If a JSObjectOps
implementation doesn't provide a clear op, JS_ClearScope does nothing on an
instace having that ops-type.

To fix the real bug (O(n**3) or whatever it is), once this cleanup patch is in,
I'm going to take advantage of some numerology: with a GC-thing 8 bytes on most
platforms, and its flags 1 byte, you can allocate sizeof(JSArena)+8192+1024 as
an arena, and find the 1024-byte aligned 8k boundary within the 9k net payload.
(Watch out for a 1024-byte boundary in the middle of the JSArena header!)

Say that the boundary happens to be 640 bytes into the arena's payload.  You
therefore use the first 640 bytes as flags, then skip over 8k of space for the
GC-things, and have 384 bytes of flags.  Note that compared to the current code,
we're saving 16 bytes of arena header (the current scheme uses a parallel flags
arena for two total; here we use just one arena to allocate things and flags).

How do you find a thing's flags given its address?  You must burn one JSGCThing
sized record, call it a JSGCPageInfo, for every 1024 bytes of "thing page" in
the 1024-byte-aligned 8k thing sub-arena:

struct JSGCPageInfo {
    JSArena     *arena;		/* page's enclosing arena */
    uint8	*flags;		/* flags for this page */
};

Then given a valid void *thing, and a GC_PAGE_SIZE of 1024 and a GC_PAGE_MASK of
1023, you compute:

    JSGCPageInfo *pi;
    JSArena *a;
    uint8 *flagp;

    pi = (JSGCPageInfo *)((jsuword)thing & ~GC_PAGE_MASK);
    a = pi->arena;
    flagp = pi->flags + ((jsuword)thing & GC_PAGE_MASK);

    /* Do we have to skip over the 8k of things in the middle of a? */
    if (flagp >= (uint8 *)(((jsuword)a + GC_PAGE_MASK) & ~GC_PAGE_MASK))
	flagp += GC_ARENA_SIZE;

    return flagp;

That's the new, O(1) version of gc_find_flags.  It costs us one JSGCPageInfo
struct (8 bytes) every 1024 bytes of things, and of course we have to waste the
corresponding flag byte too.  There are 128 flag bytes per page of things.  So
we are wasting (16+8*8+8*1)/(16+9k) or .95% on overhead.  This compares to .34%
for the current two-arena scheme.

Comments?

/be
(Assignee)

Comment 27

18 years ago
Erratum:

    flagp = pi->flags + ((jsuword)thing & GC_PAGE_MASK) / sizeof(JSGCThing);

The divide of an unsigned int by a sizeof constant should be strength-reduced to
a shift right by 3 (for 8-byte JSGCThings).

I was typing this all into bugzilla's tiny textarea, but you get the idea.

/be

Comment 28

18 years ago
I like this scheme. The 1% overhead seems a small price to pay for O(1) lookup. 
Isn't this a type of card marking scheme?
(Reporter)

Comment 29

18 years ago
Nice work brendan. You're the man.
(Assignee)

Comment 30

18 years ago
Card-marking, IIRC, is a technique for implementing a write barrier in an
incremental GC, where you map the GC heap onto a smaller set of cards, somewhat
like my "thing page" idea.  You then set the card's bit (or byte if faster and
you can stand the space increase) for any write to a thing in the card.  This
helps find newly-written refs in old things (things in a previous increment or
generation) to otherwise unref'd new things that might wrongly be collected. 
You have to scan all things in the card, of course.

Hope I got that right from memory, pre-caffeine.

/be
(Assignee)

Comment 31

18 years ago
Ok, patch is in -- thanks to shaver for the r=.

On to the real fix, and other GC fixes.

/be
(Assignee)

Comment 32

18 years ago
Created attachment 13974 [details] [diff] [review]
big change for O(1) gc_find_flags, please review
(Assignee)

Comment 33

18 years ago
The js shell still passes the testsuite with this most recent patch.

The change to mark only JS stack from fp->spbase up to fp->sp should therefore
fix bug 27924, but I'm afraid it will reopen the unreported bug mentioned by
mccabe's and my comments in bug 39125, where the interpreter homes a "post-pop"
sp into fp while still using GC-things referenced by the popped stack values. 
I'm going to look hard at the interpreter's stack usage again.

/be
(Assignee)

Comment 34

18 years ago
Oops, firstpage computation in the DEBUG-only gc_root_marker code is broken in
that last patch.  It should be

            firstpage = (a->base + GC_FLAGS_SIZE) & ~GC_PAGE_MASK;

I've made a FIRST_THING_PAGE macro to consolidate this rounding bit magic among
the several places that use it in the patch.  I'll hold off attaching another
patch until after addressing the popped-stack problem, but wanted to pass this
fix along in case anyone is trying the last patch (which works pretty well --
the only time the above bug matters is when the arena payload just happens to
align 0 mod 1024 -- unlikely).

/be
(Assignee)

Comment 35

18 years ago
Created attachment 13980 [details] [diff] [review]
full patch, makes jsinterp.c avoid premature popping
(Assignee)

Comment 36

18 years ago
Created attachment 13981 [details] [diff] [review]
argh, that last patch was a stale file -- this is the one
(Assignee)

Comment 37

18 years ago
Created attachment 13982 [details] [diff] [review]
diff -wu for easier (generally) code review
(Assignee)

Comment 38

18 years ago
I went over every damn POP* in the interpreter and changed all that might have
left an unrooted value dangling use the new FETCH_OPND(-n) or STORE_OPND(-n,v)
macros.  These minimize adds and subtracts, provided the target architecture
supports load-immediate and store-immediate where the immediate is a small
negative number (-4, -8, rarely -12 for four-byte jsvals).

Anyone dare to review this?

/be
(Assignee)

Updated

18 years ago
Blocks: 27924
(Assignee)

Comment 39

18 years ago
Created attachment 13988 [details] [diff] [review]
latest and greatest; this patch also fixes bug 38942
(Assignee)

Updated

18 years ago
Blocks: 38942
(Assignee)

Comment 40

18 years ago
Created attachment 13989 [details] [diff] [review]
diff -wu version of last patch
(Reporter)

Comment 41

18 years ago
Ok, i looked a bit at the patch. I can really only comment about the arena/flags
stuff, since i don't know the other code.

One thing i noted was that the comment says "then find the first 0 mod 1024
boundary within that payload", but the code seems to use the last possible 0 mod
1024 boundary.

Also, what is the reason the gc_find_flags calls are moved inside the GC lock?

Except that, it looks good.

I don't know about the allocation policy inside the js code, but there is this
JS_malloc() call. Shouldn't it be used instead of malloc() to get the finalizer
vector?
(Assignee)

Comment 42

18 years ago
> One thing i noted was that the comment says "then find the first 0 mod 1024
> boundary within that payload", but the code seems to use the last possible 0
> mod 1024 boundary.

No, it finds the first 0 mod 1024 boundary after the beginning of the payload,
and assuming GC_FLAGS_SIZE is 1024 -- so the comment is a bit off.  I'll fix it
to say "the first 0 mod 1024 byte boundary in the middle of, or at the end of,
the first flags span" or something better (ay caramba, that sentence sucked!).

(Note that (p+1024)&~1023 is ((p + 1024) / 1024) * 1024 or the greatest multiple
of 1024 less than or equal to p + 1024.  So either all 1024 flag bytes fit in
the first span, and there is no second span -- or we have to split.)

> Also, what is the reason the gc_find_flags calls are moved inside the GC lock?

Those were in js_LockGCThing and js_UnlockGCThing, right?  Good question.  Even
in the old scheme, there was no thread safety bug (another thread could add new
arenas at the end of the thing or flags pool, but not remove any arenas leaving
pointers dangling transiently).  I'll move 'em back, shorten those critical
sections a bit.  Thanks.

> I don't know about the allocation policy inside the js code, but there is
> this JS_malloc() call. Shouldn't it be used instead of malloc() to get the
> finalizer vector?

JS_malloc is guaranteed to be malloc, inside the engine (see other malloc calls)
and you need a cx to call JS_malloc.  Here, we're initializing the GC in JS_Init
aka JS_NewRuntime, before the first context can be created (you need an rt to
make a cx).  So no worries.

/be
(Assignee)

Comment 43

18 years ago
I'm testing and finding several things to fix.  News from my fishbowl:

JS_DefineConstDoubles and other places "knew" that they could tag the address of
a static or heap-allocated jsdouble and let the GC see it, because the old GC
would bounds-check the untagged pointer against its thing arenas.  That's not
true any longer (good thing!) but it leads to wild pointer loads from the
non-existent JSGCPageInfo and equally unreal flags array "below" the static or
heap-allocated jsdouble.

Another problem shows up opening mail.  Somehow, a native method deep below a
bunch of event handling has its fp->sp 4 jsvals beyond fp->spbase.  I'm tracking
that down now.

/be
(Assignee)

Comment 44

18 years ago
D'oh -- forgot to free rt->gcFinalVec in js_FinishGC.  New patch coming up after
I nail this mail-window bringup problem.

/be
(Assignee)

Comment 45

18 years ago
Argh, the old GC was just too lenient.  Now I need to make every discontiguous
js_AllocStack allocation available for scanning by the GC's mark phase.  There
are well-known stack allocations already available via fp->argv and fp->vars,
and I added fp->spbase, bounded by the fp->sp fencepost.  But js_InternalInvoke
and other js_AllocStack callers may cause a new allocation, leaving
uninitialized memory at the top of the old spbase.  That's what is killing
mail/news startup.

I'll probably use a header per discontiguous allocation in the stackPool arena
and let the GC follow that thread.

/be
(Assignee)

Comment 46

18 years ago
Created attachment 14315 [details] [diff] [review]
here it is, the big one you've been waiting for
(Assignee)

Comment 47

18 years ago
[My proposed checkin log message:]

Fixes to make JS GC truly exact:

- All jsvals for which JSVAL_IS_GCTHING evaluates to true must contain tagged
pointers into the GC heap -- therefore jsapi.c's JS_DefineConstDoubles cannot
"cheat" by tagging addresses of static jsdoubles to avoid js_NewNumberValue.

- Finalization is now interleaved with the Sweep phase, to avoid allocating
memory for finalization records while sweeping.  Instead, the JSRuntime holds a
preallocated JSGCThing vector (gcFinalVec) that the Sweep phase fills and
flushes via gc_finalize_phase, repeatedly.

This means that finalizers cannot allocate a new GC thing, an incompatible but
plausible change.  js_AllocGCThing asserts and then checks whether it is called
while rt->gcLevel is non-zero, and fails the allocation attempt if so.  But this
fixes bug 38942, where the old sweep-then-finalize with a sweep => malloc
dependency could lead to memory exhaustion.

- Instead of scanning whole stackPool arenas, which led to UMRs (bug 27924) and
sometimes to gross over-scanning that depended on the GC bounds-checking all
thing pointers against its heap, we scan exactly those stack slots in use:
  - arguments reachable from fp->argv;
  - variables reachable from fp->vars;
  - operands now reachable from fp->spbase, bounded above by the lesser of
    fp->sp or fp->spbase + fp->script->depth for an interpreted frame; if the
    latter, fp->sp has advanced logically above the operand budget, in order to
    call a native method, and all unused slots from fp->sp up to depth slots
    above fp->spbase must be set to JSVAL_VOID;
  - stack segments pushed when calling native methods, prefixed by JSStackHeader
    structs and linked from cx->stackSegments through each header.
The stack segment headers help the GC avoid scanning unused portions of the
stack: the generating pc slots running depth slots below fp->spbase, and slots
at the end of an arena that aren't sufficient to satisfy a contiguous allocation
for more args, vars, or operands.

- Exact GC means the stack pointer must remain above live operands until the
interpreter is done with them, so jsinterp.c got heavily whacked.  Instead of
POPs of various kinds followed by a PUSH for binary operators (e.g.), we use
FETCH and STORE macros that index by -1 and -2 from sp, and minimize adjustments
to sp.  When sp is homed to fp->sp, this allows js_DecompileValueGenerator to
find the value reliably, and if possible its generating pc.

- Finally, the O(n**2) growth rate of gc_find_flags has been fixed, using the
scheme sketched in bug 49816 and documented in a new major comment in jsgc.c. 
Briefly, by allocating flags and things from one arena, we can align things on
1024-byte "thing page" boundaries, and use JSGCPageInfo headers in each page to
find a given thing's flags in O(1) time.

/be
> This means that finalizers cannot allocate a new GC thing, an incompatible but
> plausible change.  js_AllocGCThing asserts and then checks whether it is
> called while rt->gcLevel is non-zero, and fails the allocation attempt if so.

It also means that JSMarkOp markers can't allocate a new GC thing, if I read
that correctly.  Do we care about that?
(Assignee)

Comment 49

18 years ago
It would never work (interleaved sweep and finalize phases, or not) to allocate
a new GC-thing while marking, because you might allocate "behind" the mark depth
first search, leaving a newborn without a mark bit who would be viciously swept
up as garbage.

So how's the rest of the patch look?

/be
I don't see why a JSMarkOp couldn't ``allocate black'' by calling JS_MarkGCThing
on the newborn.  That would cause pain for whatever was relying on cx->newborn
to preserve its stuff, of course, unless we didn't fill cx->newborn[TYPE] for
allocations during GC.

I'm not saying it's a good idea, I just want to make sure that we're consciously
forbidding it. (In which case we should clearly document it in the
JS_MarkGCThing/JSMarkOp comments.)
(Assignee)

Comment 51

18 years ago
(Shaver, you boning up on GC for the SpiderMonkey incremental GC work? ;-)

We need documentation and publicity about the new finalize=>js_AllocGCThing
failure case, and I agree that we should not that JSMarkOp implementors face the
same restriction.  I'll post to the newsgroup today.

/be
(Assignee)

Comment 52

18 years ago
Anyone had time to try the patch?  Bruce, any purify results?  Bueller?  Anyone?

/be

Comment 53

18 years ago
I'm in a crunch at work.  I'm starting to feel that this crunch is not my 
problem, so I'll hopefully find the time to try your patch under Purify tomorrow 
evening.

Comment 54

18 years ago
X:\seamonkey\mozilla\js\src\jsgc.c(897) : warning C4244: '=' : conversion from '
unsigned long ' to 'unsigned char ', possible loss of data

Hey, you already checked in part of the jsparse.c part of the patch? Applying 
this patch undoes that.

|Index: jsparse.c
|===================================================================
|RCS file: /cvsroot/mozilla/js/src/jsparse.c,v
|retrieving revision 3.43
|diff -u -r3.43 jsparse.c
|--- jsparse.c  2000/09/09 05:52:59     3.43
|+++ jsparse.c  2000/09/09 07:44:25
--------------------------
Patching file jsparse.c using Plan A...
Reversed (or previously applied) patch detected!  Assume -R? [y] Hunk #1 succeed
ed at 299.
Hunk #2 succeeded at 2151.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:

Comment 55

18 years ago
yes, all of the jsparse.c diff is already in. right?

Running xpc tests. Haven't looked at the code yet.

No more js meter?

Comment 56

18 years ago
Don't expect a thorough code review from me (unless no one else steps up). I 
started to read and had to stiffle a panic attack. Maybe some of our real 
Computer Scientist friends will review it?

I tried to run under Purufy on NT after just rebuilding in js/src and got what 
looked like Purify internal errors - it looked pretty confused.

I have that machine doing a depend build and I'll give it a try later.

Comment 57

18 years ago
I ran mozilla under Purify on NT with this patch. chofmann's browser buster has 
gone through 23 pages without any problems attributable to JSGC. I opened the 
mail window and used the editor a bit too. I didn't actually read or send mail.
(Assignee)

Comment 58

18 years ago
Created attachment 14559 [details] [diff] [review]
latest version of the big one (no jsparse.c; avoid extra SAVE_SPs in CACHED_[GS]ET in jsinterp.c)
(Reporter)

Comment 59

18 years ago
God damn. Thats an awful lot of code to review. I'll just give a couple of
comments on the stuff i've read through, and then i'll continue with the rest
tomorrow.

In struct JSStackHeader the nslots member is of type uintN which is not
guaranteed to be 32bit (or rather, the same size as jsval). This means that the
+2 calculation in JS_STACK_SEGMENT might be off.
In fact, a comment next to the JSUintn typedef (which uintN is typedeffed to)
says "These types are never valid for fields of a structure."

There is a gratious change:
-    SAVE_SP(fp);
+    fp->sp = sp;
That gives the same code. I don't know which is more easy to read, but it is
contrary to the style of the rest of the code.
(Assignee)

Comment 60

18 years ago
Damn, I was hoping no one would complain about that ((jsval *)sh + 2) hack, or 
my use of uintN (which I do use in structs, contrary to the NSPR-derived comment 
in jstypes.h).

uintN and intN are sugar (or salt) for unsigned int and int, respectively.  If 
you can be sure that you don't need more than 16 bits, but you don't want to 
pack and waste cycles extending and chopping from 32 to 16 bits, then I claim 
you should use uintN rather than uint16.  Of course, if you need exactly 16 bits 
and no more, then say so.

It turns out that you can be sure sh starts on a jsval boundary, at least on all 
architectures I know of, because the code casts from a jsval-aligned address to 
sh (in js_AllocStack).  It may be that sizeof(JSStackHeader) < 2*sizeof(jsval), 
but that just means there's a wasted 16-bit gap (or 32-bit on alpha) after sh 
and before the first jsval in the stack segment.

I seriously did consider (and started to hack) code that rounds up sizeof *sh to 
a sizeof(jsval) multiple, and does the byte address arithmetic, but it offended 
me and I reckoned (recap above) that it was unnecessary.  Am I wrong?

The fp->sp = sp; change was motivated by similar restore code just below.  The 
SAVE_SP macro is a little nasty (it uses sp from its lexical environment), and 
in this case I judged that inappropriate because asymmetric with the oldsp 
restore code.

Thanks for looking at this.  It seems pure by jband's lights (still hoping that 
Bruce gets a chance to try it).  I'm hoping to check in today or tomorrow.

/be
(Reporter)

Comment 61

18 years ago
Can you be sure that sizeof(uintN) <= sizeof(jsval) always?

For example, on Alpha windows, pointers are 32bit, but registers are 64bit (ie.
good size for local vars). Then it could possibly happen that uintN is 64bit but
jsval is 32. This is a corner case though, and i don't expect it to be a
problem. (You just make jsval 64bit.)
(Assignee)

Comment 62

18 years ago
A jsval has to be big enough to hold a pointer.  A uintN should be a native int 
that's at least 16 bits in precision and unsigned.  Does Alpha really have a 
type model where pointers are 32 bits and ints and unsigneds are 64?  Crazy, and 
bound to break old (pre-ANSI-C) code, so I'm skeptical.

Anyway, a jsval is a jsword is a JSWord (this used to be cleaner before the late 
great NSPR unification, when JSRef did things more directly), and a JSWord is a 
long.  So the question is, might sizeof(long) < sizeof(unsigned) ever be true?  
I don't think so -- that would violate the C standard.

/be

Comment 63

18 years ago
I've built and played with this on NT; I'm intimidated by the scope but haven't 
seen anything that would be an issue. Not sure that counts as an r=...
(Assignee)

Comment 64

18 years ago
I checked in.  It felt like giving birth.  Don't be calling my baby ugly now!

/be
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 65

18 years ago
Marking Verified -
Status: RESOLVED → VERIFIED

Updated

7 years ago
Depends on: 620942
You need to log in before you can comment on or make changes to this bug.