Closed Bug 674251 Opened 13 years ago Closed 13 years ago

Make JSScripts gc-things

Categories

(Core :: JavaScript Engine, defect)

Not set
normal

Tracking


RESOLVED FIXED
mozilla9

People

(Reporter: jorendorff, Assigned: igor)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 8 obsolete files)

A simplification. The current model of who owns JSScripts makes it a pain for the debugger to hold references to them.

#ifdef DEBUG_jorendorff
    It also seems error-prone to me. I don't know why this code is safe:

        script = Compiler::compileScript(...);
        if (!script)
            return NULL;
        JSObject *scriptObj = js_NewScriptObject(cx, script);

    js_NewScriptObject can perform last-ditch GC. Atoms are never collected
    during a last-ditch GC, but what about proto-functions and scope objects
    in the script?
#endif

Currently, JS_Evaluate*Script* and eval directly call js_DestroyScript. What if we dropped that? All the atoms, scope objects, functions, function scripts, and other things allocated by compilation are garbage anyway. If necessary, we can still blow away any jit code held by the script as we exit the frame.
(In reply to comment #0)
> A simplification. The current model of who owns JSScripts makes it a pain
> for the debugger to hold references to them.

So what about always allocating a JSScript object and moving most fields there for faster access, with the variable-size bytecode and other arrays allocated on the heap?
If you make it finalizable in the background there shouldn't be any additional GC pause time regression.
(In reply to comment #2)
> If you make it finalizable in the background there shouldn't be any
> additional GC pause time regression.

Finalizing scripts in the background would be nice as it would also allow finalizing functions in the background, but there is a lot of code in DestroyScript that is single-threaded, like those jit-related release methods. Some of it, I guess, can be replaced with a sweeping loop over all existing scripts that releases things that refer to soon-to-be-finalized scripts, but that requires extra complexity in the code. So let's turn JSScript into a GC thing first and then see how it would affect the GC pause.
dvander and bhackett chimed in on IRC that this would help stuff they're doing too.
(In reply to comment #3)
> (In reply to comment #2)
> > If you make it finalizable in the background there shouldn't be any
> > additional GC pause time regression.
> 
> Finalizing scripts i background would be nice as it would also allow to
> finalize functions in background, but there are a lot of code in
> DestroyScript that is single-threaded, like those jit-related release
> methods. Some of it I guess can be replaced with a sweeping loop over all
> existing scripts that releases things that refers to soon-to-be-finalized
> scripts, but that requires extra complexity in the code. So lets turn
> JSScript into GC things first and then see how it would affect the GC pause.

With type inference enabled, all method JIT code is now released on every GC, so DestroyScript should not have to consider issues with releasing memory from executable allocators.  This does not hold with TI disabled, though (to do this we need to be able to throw all active frames into the interpreter during GC, which we can't do with the base method JIT since it does not fully sync the stack).

This bug would help TI a lot, as the current fact that DestroyScript can be called on scripts outside of GC means we can't analyze types in such scripts, and can't compile them (type constraints generated on a script will refer to that script's contents, and cannot be freed until the next GC).
Just a note about the DEBUG_jorendorff aside:
js_NewScriptObject creates an AutoScriptRooter on the script you pass in. So it looks like this is actually non-buggy. However, we clearly need a less hacky way to deal with this because it's quite bug-prone.
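The rooting pattern being discussed can be illustrated with a minimal RAII sketch. AutoScriptRooterSketch, Script, and gRoots are invented stand-ins for this example, not the real SpiderMonkey API:

```cpp
#include <cassert>
#include <vector>

struct Script { int id; };              // stand-in for JSScript

static std::vector<Script *> gRoots;    // stand-in for the context's root list

// Constructor registers the script as a GC root; destructor unregisters
// it. While the rooter is alive, a GC triggered inside the guarded
// region (e.g. a last-ditch GC in js_NewScriptObject) cannot collect
// the script.
class AutoScriptRooterSketch {
    Script *script;
  public:
    explicit AutoScriptRooterSketch(Script *s) : script(s) {
        gRoots.push_back(s);
    }
    ~AutoScriptRooterSketch() {
        assert(gRoots.back() == script);  // rooters must nest LIFO
        gRoots.pop_back();
    }
};
```

The scope of the rooter object is what makes this safe: the script stays reachable for exactly the region where a GC could otherwise collect it.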
A note to whoever does this:  the "scripts" memory reporter in xpcjsruntime.cpp (e.g. see GetCompartmentScriptsSize()) will need to be changed.  I'm happy to help with that.
Assignee: general → igor
Attached patch v1 (obsolete) — Splinter Review
The patch implements the idea of splitting JSScript into a fixed-size JSScript part allocated on the GC heap and a variable-length part allocated using malloc. When we are able to allocate variable-length GC things, this split should be reconsidered, but for now it solves rooting issues, as the embedding can use the normal rooting API and rely on the conservative stack scanner. It also allows eliminating the JSObject wrappers for scripts that are currently created to root scripts, but that is for another bug, as it would require changing the JS API.
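The split can be sketched roughly as follows. All names here (ScriptHeader, NewScript, FinalizeScript) are illustrative stand-ins, not the real SpiderMonkey types, and plain malloc stands in for GC allocation of the header:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch of the split: a fixed-size header (which would be
// a GC thing) pointing at one malloc'ed block holding the
// variable-length parts (bytecode, source notes, ...).
struct ScriptHeader {
    uint8_t  *data;     // single malloc block for all variable-length arrays
    uint32_t  length;   // bytecode length in bytes

    uint8_t *code() { return data; }  // bytecode sits at the start of |data|
};

static ScriptHeader *NewScript(uint32_t codeLength) {
    // Plain malloc stands in here for allocating the header on the GC heap.
    ScriptHeader *script = static_cast<ScriptHeader *>(malloc(sizeof(ScriptHeader)));
    if (!script)
        return nullptr;
    script->length = codeLength;
    script->data = static_cast<uint8_t *>(calloc(1, codeLength));
    if (!script->data) {
        free(script);
        return nullptr;
    }
    return script;
}

// The finalizer frees only the malloc'ed part; in the real scheme the
// GC sweeps the fixed-size header itself.
static void FinalizeScript(ScriptHeader *script) {
    free(script->data);
    free(script);
}
```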

With the patch I preserved calling the script destroy hooks when it is known that the script is no longer used. Is this necessary in view of the debugger changes?
Attachment #551541 - Flags: review?(wmccloskey)
Attachment #551541 - Flags: review?(jorendorff)
Comment on attachment 551541 [details] [diff] [review]
v1

The try server shows a consistent orange so the patch needs more work.
Attachment #551541 - Attachment is obsolete: true
Attachment #551541 - Flags: review?(wmccloskey)
Attachment #551541 - Flags: review?(jorendorff)
Igor, I have a bunch of related work in bug 665167 that's partly reviewed. Could you take a look?
(In reply to Jason Orendorff [:jorendorff] from comment #10)
> Igor, I have a bunch of related work in bug 665167 that's partly reviewed.
> Could you take a look?

Well, with this patch we could store eval scripts in the weak hash directly and there would be no need to create any script objects.
Attached patch v2 (obsolete) — Splinter Review
The new version enables background finalization for functions and closures and fixes the try server issues. Now the try server is green.
Attachment #551757 - Flags: review?(wmccloskey)
Attachment #551757 - Flags: review?(jorendorff)
Should I wait to review this until bug 672829 lands, or will it not be affected?
Bill, don't wait. This "conflicts" with jsdbg2, but in a good way: I think the merge will involve simplifying the Debugger.Script stuff in jsdbg2 considerably.

I'll finish reviewing this tomorrow. I'm only about 30% done now.
Igor:  I'd like to see some measurements of this change on Gregor's MemBench test (like the ones in bug 671702) before it lands, please!
Comment on attachment 551757 [details] [diff] [review]
v2

Review of attachment 551757 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great to me! It looks like sometimes you need to call the DestroyScriptHook eagerly, and other times you wait until finalization. Hopefully Jason understands that better than me; I didn't check it too carefully.

::: js/src/jsfun.cpp
@@ +1692,3 @@
>       */
> +    if (fun != obj && fun->isFlatClosure())
> +        Foreground::free_(obj->getFlatClosureUpvarsOrNull());

getFlatClosureUpvarsOrNull seems kinda hacky to me. Are there any bits left in JSFunction::flags that we could use to store whether there are upvars?

Either way, I would prefer to have a hasFlatClosureUpvars test rather than the OrNull thing. I think it might make the isDouble test (or the flag check) a little easier to understand.
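The suggestion could look roughly like this; the flag bit and all names are hypothetical, invented for the example:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: record in the function's flags whether
// flat-closure upvars were allocated and expose a boolean predicate
// instead of a get-or-null accessor.
struct FunSketch {
    static const uint16_t HAS_UPVARS = 0x4000;  // hypothetical spare bit

    uint16_t flags;
    void *upvars;

    bool hasFlatClosureUpvars() const { return (flags & HAS_UPVARS) != 0; }
};
```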

::: js/src/jsgcmark.cpp
@@ +839,5 @@
> +        JSConstArray *constarray = script->consts();
> +        MarkValueRange(trc, constarray->length, constarray->vector, "consts");
> +    }
> +
> +    if (script->u.object) {

Maybe put a comment here about the eval cache being cleared first, so that you never get the evalCacheLink member out.

::: js/src/jsparse.cpp
@@ +180,5 @@
>      pn_arity = PN_NULLARY;
>      pn_parens = false;
>  }
>  
> +Parser::Parser(JSContext *cx, StackFrame *cfp, bool foldConstants)

I don't understand the principals changes. Are they related to this patch, or just a cleanup?

::: js/src/jsscript.cpp
@@ +1333,1 @@
>          gsnCache->purge();

We already purge all GSN caches earlier in the GC (through ThreadData::purge), so we should be able to remove this.
Attachment #551757 - Flags: review?(wmccloskey) → review+
Attached patch v3 (obsolete) — Splinter Review
The new version addresses the comments above (including removal of the unrelated principal changes) and fixes one corner case that I missed when enabling background finalization of closures and JSFunction instances.

Currently fun_finalize uses getFunctionPrivate() to access the JSFunction * referenced by the closure. But that implies that the closure must always be finalized before JSFunction instances. This is ensured via the finalization ordering in the GC.

The issue with the background finalization of functions is that currently we add arena lists to be finalized using the fallible Vector::append and do immediate finalization if that fails. If that happens while we are adding JSFunction arenas, we will finalize them before the closures.

The patch fixes that by reserving space in the background finalization vector so the code can use Vector::infallibleAppend, ensuring the finalization order. JSObject::finalizeUpvarsIfFlatClosure(), which replaced getFlatClosureUpvarsOrNull() from the previous patch, contains detailed comments about this.
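The reserve-then-infallibleAppend pattern can be sketched with std::vector standing in for js::Vector; FinalizeQueue is an invented name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the ordering fix: reserving capacity up front means the
// later appends cannot fail, so closure arenas are always queued (and
// hence finalized) before JSFunction arenas.
struct FinalizeQueue {
    std::vector<int> items;             // stand-in for arena-list entries

    // Fallible step: grow capacity before queuing anything.
    bool reserve(size_t extra) {
        items.reserve(items.size() + extra);
        return true;                    // js::Vector::reserve can report failure
    }

    // Infallible step: capacity was already reserved, so this cannot fail.
    void infallibleAppend(int arena) {
        assert(items.size() < items.capacity());
        items.push_back(arena);
    }
};
```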
Attachment #551757 - Attachment is obsolete: true
Attachment #551757 - Flags: review?(jorendorff)
Attachment #552334 - Flags: review?(jorendorff)
Here are the memory stats for http://gregor-wagner.com/tmp/mem, which opens 150+ tabs. The columns show about:memory after browser startup, after opening the benchmark window with all the tabs, after closing the tabs using the button, and after closing the window. In all cases I repeatedly pressed the GC and CC buttons until js-gc-heap-chunk-clean-unused becomes 0 and the other numbers stabilize.

Before the patch:  

                             initial     all tabs       no tabs     no window

heap-allocated               35.21 MB   1,872.72 MB     863.38 MB    120.95 MB
heap-committed               38.00 MB	1,949.00 MB   1,948.00 MB  1,510.00 MB
heap-dirty                    0.64 MB	    2.90 MB       2.50 MB      3.69 MB
heap-unallocated              2.79 MB	   76.28 MB   1,084.62 MB  1,389.05 MB
js-compartments-system              2	          2             2            2
js-compartments-user                3	        251           148            1
js-gc-heap                    8.00 MB	  462.00 MB     469.00 MB     48.00 MB
js-gc-heap-arena-unused       0.21 MB	   80.04 MB      71.24 MB      3.14 MB
js-gc-heap-chunk-clean-unused 0.00 MB	    0.00 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused 1.53 MB	   10.33 MB     305.56 MB     37.65 MB
js-gc-heap-unused-fraction     21.70%	     19.56%        80.34%       84.99%
page-faults-hard                    1	         56            56           56
page-faults-soft               19,509	  2,220,516     2,521,828    2,565,070
resident                     68.61 MB	2,110.54 MB   1,426.49 MB    324.61 MB
vsize                       483.94 MB	2,688.40 MB   2,599.85 MB  2,038.41 MB

After the patch:

                             initial     all tabs       no tabs     no window

heap-allocated               38.16 MB   1,895.62 MB     938.15 MB    137.65 MB
heap-committed               44.00 MB	1,969.00 MB   1,972.00 MB  1,457.00 MB
heap-dirty                    1.78 MB	    3.25 MB       2.80 MB      3.46 MB
heap-unallocated              5.84 MB	   73.38 MB   1,033.85 MB  1,319.35 MB
js-compartments-system              2	          2             2            2
js-compartments-user                1	        253           150            2
js-gc-heap                   10.00 MB	  555.00 MB     562.00 MB     65.00 MB
js-gc-heap-arena-unused       1.14 MB	   93.05 MB      81.30 MB      4.92 MB
js-gc-heap-chunk-clean-unused 0.00 MB	    0.00 MB       0.00 MB      0.00 MB
js-gc-heap-chunk-dirty-unused 2.03 MB	   10.53 MB     367.49 MB     50.46 MB
js-gc-heap-unused-fraction     31.69%	     18.66%        79.85%       85.19%
page-faults-hard                    3	          5             5            5
page-faults-soft               22,371	  2,147,691     2,580,174    2,694,574
resident                     74.95 MB	2,134.36 MB   1,457.65 MB    331.14 MB
vsize                       474.15 MB	2,737.87 MB   2,630.24 MB  1,976.43 MB

As the patch replaces each malloc-allocated script with a fixed-size GC-allocated header plus a variable-length malloc allocation, I expected some regression. But it does not look bad IMO. In particular, the GC fragmentation in the last column has not increased.

Also, the increases in RSS and heap-allocated can be explained by moving memory from malloc into the JS heap, which does not use madvise/decommit for its unused fragmented parts. That is, an extra initial 2 MB got multiplied into 15 due to high fragmentation in the final state. I suppose bug 671702 would help to offset that.
That looks good! Do you have any numbers about impacts for marking and sweeping and GC pause time in general?
(In reply to Gregor Wagner from comment #19)
> Do you have any numbers about impacts for marking and
> sweeping and GC pause time in general?

The MOZ_GCTIMER stats for v8.googlecode.com/svn/data/benchmarks/v6/run.html :

base                                    patch

Total     Mark     Sweep                Total     Mark     Sweep
8.784000 6.562000 1.236000              4.262000 2.584000 0.567000  
11.261000 7.036000 2.142000		3.874000 2.503000 0.428000  
4.154000 2.616000 0.435000		3.921000 2.508000 0.471000  
4.065000 2.499000 0.436000		3.913000 2.424000 0.502000  
4.059000 2.514000 0.431000		4.233000 2.472000 0.602000  
4.146000 2.469000 0.535000		4.186000 2.475000 0.588000  
4.323000 2.506000 0.509000		4.221000 2.522000 0.573000  
4.266000 2.528000 0.461000		4.243000 2.513000 0.567000  
4.233000 2.469000 0.465000		4.193000 2.492000 0.579000  
4.204000 2.465000 0.456000		4.143000 2.456000 0.576000  
4.229000 2.490000 0.457000		30.899000 2.733000 26.543000
4.279000 2.522000 0.467000		32.242000 2.841000 27.808000
37.499000 2.505000 33.142000		32.327000 2.789000 27.932000
37.641000 2.563000 33.310000		32.465000 2.826000 28.031000
37.755000 2.641000 33.287000		32.291000 2.885000 27.813000
37.788000 2.787000 33.239000		32.482000 2.927000 27.957000
37.736000 2.761000 33.193000		32.394000 2.942000 27.855000
37.891000 2.917000 33.194000		32.432000 3.000000 27.823000
37.835000 2.889000 33.214000		32.362000 2.964000 27.790000
37.904000 2.878000 33.267000		32.465000 2.980000 27.820000
37.439000 2.526000 33.130000		26.006000 2.377000 22.033000
38.354000 2.615000 33.986000		7.356000 5.733000 0.496000  
8.137000 3.163000 3.314000		9.060000 7.463000 0.479000  
8.475000 6.631000 0.499000		4.453000 2.710000 0.457000  
9.771000 7.869000 0.483000		7.816000 6.190000 0.483000  
4.683000 2.789000 0.466000		8.937000 7.314000 0.478000  
8.277000 6.347000 0.496000		4.301000 2.624000 0.462000  
9.467000 7.631000 0.503000		7.611000 5.998000 0.481000  
4.783000 2.869000 0.487000		8.999000 7.372000 0.494000  
8.181000 6.318000 0.501000		4.424000 2.719000 0.464000  
9.440000 7.612000 0.493000		7.639000 6.012000 0.479000  
4.708000 2.847000 0.484000		8.996000 7.388000 0.483000  
8.146000 6.294000 0.498000		4.402000 2.690000 0.479000  
9.392000 7.489000 0.495000		7.642000 5.964000 0.529000  
4.631000 2.771000 0.486000		9.057000 7.444000 0.482000  
8.029000 6.095000 0.500000		4.377000 2.649000 0.462000  
9.406000 7.544000 0.498000		7.561000 5.961000 0.484000  
4.712000 2.853000 0.480000		9.009000 7.406000 0.482000  
7.969000 6.055000 0.496000		4.329000 2.649000 0.467000  
9.384000 7.512000 0.539000		7.606000 6.001000 0.477000  
4.737000 2.870000 0.490000		9.010000 7.408000 0.483000  
4.341000 2.524000 0.469000		23.911000 8.172000 13.123000
23.885000 8.476000 13.019000		23.106000 8.223000 12.911000
24.247000 8.660000 13.197000		24.156000 8.283000 13.885000
25.253000 8.584000 14.231000		24.254000 8.280000 13.996000
25.283000 8.556000 14.301000		54.588000 50.425000 2.332000
55.527000 51.218000 2.246000		69.078000 66.995000 0.592000
69.253000 66.903000 0.641000		71.240000 69.166000 0.593000

The same after opening and fully loading http://gregor-wagner.com/tmp/mem and then running the GC via about:memory

base                                    patch 
646.503000 427.993000 99.133000         578.887000 425.409000 75.855000
649.060000 428.511000 100.787000        577.070000 425.982000 74.095000
645.343000 428.569000 98.480000	        581.235000 425.988000 76.872000

With the V8 results I suppose there is a small regression in the max timing. But with tabs it looks like there is a clear win during the sweep phase. I guess the background finalization of functions plays a role here.
Just to clarify:  this is a good change primarily because it simplifies various things in the JS engine, right?  And the performance effects (GC time, heap size) are minor?  Seems reasonable.
Thanks for the numbers! Looks great!
Did you disable the Arena::finalize(JSContext *cx) memset instrumentation when you collected the numbers?

                if (!newFreeSpanStart)
                    newFreeSpanStart = thing;
                t->finalize(cx);
                memset(t, JS_FREE_PATTERN, sizeof(T));

The sweeping cost is heavily influenced by this memset (at least on Mac).
I remember that the GC triggers are very tight so that we don't perform any GC during SunSpider. I don't think this patch will change that, but we should check it at least.
Blocks: 662704
(In reply to Gregor Wagner from comment #22)
> Did you disable the Arena::finalize(JSContext *cx) memset instrumentation
> when you collected the numbers?
> 
>                 if (!newFreeSpanStart)
>                     newFreeSpanStart = thing;
>                 t->finalize(cx);
>                 memset(t, JS_FREE_PATTERN, sizeof(T));

No, I have kept that as is.

> I remember that the GC triggers are very tight so that we don't perform any
> GC during sunspider. I don't think this patch will change it but we should
> check it at least.

I see no differences in SS.
Blocks: 678830
Attached patch v4 (obsolete) — Splinter Review
The new version is a rebase on top of jsdbg2 changes. The patch does not touch the code that creates JSScript object holders. That is for bug 678830.

The ugliest part of the merge was the need to add gcpad and gcpad2 to JSScript to ensure that sizeof(JSScript) is a multiple of 8 bytes on 32-bit platforms. Initially I tried to hide that with some helper classes, but those just added a few lines of code to hide the memory layout inefficiency.
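The padding constraint can be illustrated with a hypothetical struct; ScriptCell and its fields are invented, and the static_assert captures the invariant that the gcpad fields maintain:

```cpp
#include <cassert>
#include <cstdint>

// Illustration: GC things are allocated in cell-size multiples (8 bytes
// here), so on 32-bit targets a pad word keeps the struct size a
// multiple of 8. gcpad plays the role of the gcpad/gcpad2 fields above.
struct ScriptCell {
    void     *data;     // 4 bytes on a 32-bit target, 8 on 64-bit
    uint32_t  length;
    uint32_t  flags;
    uint32_t  gcpad;    // keeps sizeof(ScriptCell) % 8 == 0 on 32-bit
};

static_assert(sizeof(ScriptCell) % 8 == 0,
              "GC things must be a multiple of the cell size");
```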
Attachment #552334 - Attachment is obsolete: true
Attachment #552334 - Flags: review?(jorendorff)
Attachment #553008 - Flags: review?(jorendorff)
Attached patch v5 (obsolete) — Splinter Review
Attachment #553120 - Flags: review?
(In reply to Igor Bukanov from comment #25)
> Created attachment 553120 [details] [diff] [review]
> v5

The new version adds only one pad field in JSScript conditioned on all the necessary cases.
Attachment #553008 - Flags: review?(jorendorff)
Comment on attachment 553120 [details] [diff] [review]
v5

I couldn't find a single thing wrong with this! r=me.

This patch continues calling the destroyScript hook in the exact same
places it was being called before, except in some of the error paths.
That's a good decision, I think, but do you know of any reason why we
couldn't later change it, so we just fire the event when the script
actually gets destroyed? I had planned to do so in bug 674164.

In jscompartment.h, struct JSCompartment:
>+    JSScript                     *evalScriptHash[JS_EVAL_CACHE_SIZE];

Call it evalCache please.

In jscompartment.cpp, JSCompartment::purge:
>+    /*
>+     * Clear the hash and reset all evalHashLink to null before the GC. This
>+     * way MarkChildren(trc, JSScript *) can assume that JSScript::u.object is
>+     * not null when we have script owned by an object and not from the eval
>+     * hash.
>+     */

I think "clear the eval cache" is clearer than "clear the hash".

It's unclear to me what assumption MarkChildren is making exactly. I
think you mean "can assume that if JSScript::u is non-null, it points to
a JSObject, not another script."

In jsscript.h, struct JSScript:
>-        JSScript    *nextToGC;  /* next to GC in rt->scriptsToGC list */
>+        JSScript    *evalHashLink;

Comment it: /* Hash table chaining for JSCompartment::evalCache. */

In MarkKind:
>         case JSTRACE_OBJECT:
>             Mark(trc, reinterpret_cast<JSObject *>(thing));
>             break;
>         case JSTRACE_STRING:
>             MarkString(trc, reinterpret_cast<JSString *>(thing));
>             break;
>         case JSTRACE_SHAPE:
>             Mark(trc, reinterpret_cast<Shape *>(thing));
>             break;
>+        case JSTRACE_SCRIPT:
>+            Mark(trc, reinterpret_cast<JSScript *>(thing));
>+            break;

Nit: As a separate changeset, maybe, I think all these should be static_casts.

In jsinterp.cpp, Execute:
>-    AutoScriptRooter root(cx, script);

Huh. Why was this rooted? Aren't we guaranteed that the StackFrame contains a
pointer to the script? It doesn't matter now, I guess.

>-    Parser(JSContext *cx, JSPrincipals *prin = NULL, StackFrame *cfp = NULL, bool fold = true);
>+    Parser(JSContext *cx, StackFrame *cfp = NULL, bool fold = true);

Add explicit while you're in here, please.

>-    Compiler(JSContext *cx, JSPrincipals *prin = NULL, StackFrame *cfp = NULL);
>+    Compiler(JSContext *cx, StackFrame *cfp = NULL);

Here too.

In jsscript.cpp:
> 
>+
>+void
> CheckScriptOwner(JSScript *script, JSObject *owner)

Style nit: Just one blank line here.

In jsscript.cpp, JSScript::finalize:
>-    memset(script, 0xdb, script->totalSize());

Suggest keeping this for script->data, making it DEBUG-only.

The GC will blit JS_FREE_PATTERN over the JSScript itself, right? If
not, that's a showstopper for me.

In xpconnect/src/nsXPConnect.cpp, nsXPConnect::Traverse:
>             static const char trace_types[JSTRACE_LIMIT][7] = {
>                 "Object",
>                 "String",
>+                "Shape",
>+                "Script",
>                 "Xml"
>             };

Huh. It seems like this must have been wrong before; "Shape" was
missing. To prevent that from happening again, write:
>             static const char trace_types[][7] = {
and static-assert JS_ARRAY_LENGTH(trace_types) == JSTRACE_LIMIT.
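A sketch of the suggested hardening, with the outer array bound inferred by the compiler and statically checked against a stand-in JSTRACE_LIMIT (C++11 static_assert substitutes for JS_STATIC_ASSERT here):

```cpp
#include <cassert>
#include <cstddef>

// Letting the compiler infer the array bound and checking it against
// the trace-kind count makes a missing name a compile error.
// JSTRACE_LIMIT is a stand-in value for this example.
enum { JSTRACE_LIMIT = 5 };

static const char trace_types[][7] = {
    "Object",
    "String",
    "Shape",
    "Script",
    "Xml"
};

static_assert(sizeof(trace_types) / sizeof(trace_types[0]) == JSTRACE_LIMIT,
              "every trace kind must have a name");
```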

In xpconnect/src/xpcjsruntime.cpp, ReportCompartmentStats:
>     ReportMemoryBytes0(MakeMemoryReporterPath(pathPrefix, stats.name,
>+                                              "script-data"),
>+                       nsIMemoryReporter::KIND_HEAP, stats.scriptData,
>+    "Memory allocated for JSScript  byte-code and various variable-length "
>+    "tables.",
>                        callback, closure);

Style nit:    "JSScript bytecode", with a single space and without the hyphen.
Attachment #553120 - Flags: review? → review+
Regarding the memset to 0xdb, I think it makes sense to poison script->data eventually (using the new JS_POISON). For now, I'd rather we just poison the script itself, since it relates to some stuff I'm trying to figure out in bug 670072.
(In reply to Jason Orendorff [:jorendorff] from comment #27)
> This patch continues calling the destroyScript hook in the exact same
> places it was being called before, except in some of the error paths.
> That's a good decision, I think, but do you know of any reason why we
> couldn't later change it, so we just fire the event when the script
> actually gets destroyed?

I wrote it this way to keep the eval cache transparent for the debugger as it is currently now. I do not know reasons why we did it in the first place.

> The GC will blit JS_FREE_PATTERN over the JSScript itself, right? If
> not, that's a showstopper for me.

Yes, the finalization code uses JS_POISON after calling the finalizer.
Attachment #553008 - Attachment is obsolete: true
Backed out to investigate a Tdhtml regression on OSX 10.5.8 (5.62%) and 10.6 (5.93%), I already backed out the other bug in the range but nothing changed, will report back on results.
Whiteboard: [inbound]
Graph confirms this patch was the culprit. I'm surprised it affects only Mac though.
If you meet up and decide the regression is ignorable, please state so when pushing next time, so we know it should be ignored.
Given that the regression is Mac-specific, where we do not use jemalloc, a possible explanation would be the increased number of small allocations for script bytecode. Previously the bytecode was allocated together with the bigger script, but with the patch the allocations are done separately. So the patch replaced allocations of 220_or_so_bytes_of_script + bytecode_memory with just bytecode_memory; the latter may be allocated in a different malloc zone and interfere with Tdhtml.

To check, I will try a patch with a small inline buffer for the bytecode.
Blocks: 680182
Attached patch inline buffer (obsolete) — Splinter Review
This is an extra patch to add an inline buffer for scripts. Let's see what the try server will show.
I was doing some measurements yesterday and saw that the most common script-data size was 2 bytes;  IIRC roughly 25% of all scripts had that.  jemalloc is able to satisfy a 2-byte request without rounding up (likewise for 4-byte requests), but the Mac allocator rounds up to 8 or 16 (not sure which).

In your patch it looks like you are adding 40 or 44 bytes of inline storage; I think that's much more than necessary.  Some more detailed measurements would be informative.

An alternative to inline storage would be to work out why 2 bytes is such a common case, and see if it can be reduced to zero.
(In reply to Nicholas Nethercote [:njn] from comment #35)
> I was doing some measurements yesterday and saw that the most common
> script-data size was 2 bytes;  IIRC roughly 25% of all scripts had that.

We create an empty script per compartment for each Function.prototype instance, that is, once per global. As the script bytecode must be mutable to allow debugging, these empty scripts cannot be shared. I am surprised that this is that common.
 
> jemalloc is able to satisfy a 2-byte request without rounding up (likewise
> for 4-byte requests), but the Mac allocator rounds up to 8 or 16 (not sure
> which).

The inline data storage has not helped the benchmarks. Another theory of mine is that on Mac small allocations may not be 8-byte aligned unless the allocation size itself is a multiple of 8. That could lead to misaligned access of jsvals stored in the script data.

> In your patch it looks like you are adding 40 or 44 bytes of inline storage,
> I think that's much more than necessary.  Some more detailed measurement
> would be informative.

Well, the JSScript size is 224 bytes on 64 bit without the inline buffer, and on 32 bit it is around 140. So 40 bytes is not that big in comparison. On the other hand, it allows fitting many small scripts like return Math.sin(this.x). So I have chosen it to investigate the performance implications.
> Well, the JSScript size is 224 bytes on 64 bit without the inline buffer and
> on 32 bit it is around 140. So 40 bytes is not that big given. On another
> hand it allows to fit may small scripts like return Math.sin(this.x). So I
> have chosen it to investigate the performance implication.

Well, measurements would be good.  We get a *lot* of JSScripts in practice.
To test the number of scripts and their data sizes I patched NewScript in jsscript.cpp to print the script size and then ran http://gregor-wagner.com/tmp/mem, which opens 150+ new tabs. From browser startup until that test is loaded, about 400,000 scripts were created and only about 9,300 were empty. Moreover, just 3.6% of scripts had a size not exceeding 8 bytes, and the ratio of scripts that exceeded 64 bytes was 70%.

As such, using an extra inline buffer that is unused when the allocation cannot fit in it just wastes memory unless it significantly offsets the fragmentation. As far as I can see, the memory allocation is dominated by other GC things, so I assume that effect is negligible.

This is sort of confirmed by the try server runs, where different inline sizes and data alignments had no influence on the benchmarks. That still leaves open the question of what exactly causes the regression on Mac.
Attached patch inline buffer v2 (obsolete) — Splinter Review
This patch uses the extra 4-byte pad that is necessary on 32 bits to make sizeof(JSScript) a multiple of gc::Cell::CellSize to store inlined empty and similar scripts. It also reorganizes the JSScript fields to use a more efficient memory layout on 64-bit CPUs. That shrinks sizeof(JSScript) on those platforms from 200 down to 184 bytes.

Another change is that the const jsval array is now placed at the start of the memory allocation for the data array. This eliminates the need for the extra padding that is necessary without the patch.
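The layout point can be illustrated with a small offset calculation; Layout and its fields are invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// If the most strictly aligned section (the 8-byte jsval constants)
// comes first in the single data allocation, the following byte-aligned
// bytecode needs no padding before it.
struct Layout {
    size_t nconsts;     // number of 8-byte constants
    size_t codeLen;     // bytecode bytes

    size_t constsOffset() const { return 0; }  // aligned by construction
    size_t codeOffset() const { return nconsts * sizeof(uint64_t); }
    size_t totalSize() const { return codeOffset() + codeLen; }
};
```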

As I wrote in the previous comment, this still shows the same 5% Tdhtml regression on OS X. Does the simplification of the debugger and script management that this bug provides justify that platform-specific regression? Or should more effort be spent to avoid it?
Attachment #554171 - Attachment is obsolete: true
Attachment #554635 - Flags: review?
Your r? lacks a requestee.
Attachment #554635 - Flags: review? → review?(jorendorff)
From the try server run I noticed that the regression became slightly smaller on OS X 64-bit with the patch from comment 39. As that patch reduces the size of JSScript on 64 bits, it means that for some reason Tdhtml on OS X is very sensitive to the JS heap size increase from JSScript allocations being moved off the malloc heap.

The tests are structured like:

function foo() {

    do_something();
    setTimeout("f()", 0);
}

This implies that a new function is compiled on each iteration, so the test creates a few JSScript instances corresponding to those functions. So a possible explanation for the regression is an extra GC that this triggers. The reason why this has not been observed on other OSes could be that the JS heap size before the tests and the GC heuristics depend on the OS and CPU speed.
(In reply to Igor Bukanov from comment #38)
> To test the amount of scripts and their data sizes I patched NewScript from
> jsscript.cpp to print script size and then run
> http://gregor-wagner.com/tmp/mem that opens 150+ new tabs. From the browser
> startup until that test is loaded about 400_000 scripts where created and
> only about 9300 were empty. Moreover, there were just 3,6% of scripts with
> size not exceeding 8 and the ratio of scripts that exceeded 64 bytes where
> 70%.

I remeasured as well, you're right.  Here are the 10 most frequent dataSizes when opening 5 gmail tabs:

( 1)    361 ( 5.9%,  5.9%): dataSize = 348
( 2)    288 ( 4.7%, 10.7%): dataSize = 460
( 3)    180 ( 3.0%, 13.6%): dataSize = 240
( 4)    156 ( 2.6%, 16.2%): dataSize = 128
( 5)    153 ( 2.5%, 18.7%): dataSize = 328
( 6)     79 ( 1.3%, 20.0%): dataSize = 1028
( 7)     72 ( 1.2%, 21.2%): dataSize = 592
( 8)     69 ( 1.1%, 22.3%): dataSize = 568
( 9)     65 ( 1.1%, 23.4%): dataSize = 372
(10)     56 ( 0.9%, 24.3%): dataSize = 352

The top 10 only account for 24.3% of the total, so it's a very flat distribution, and there isn't a single dataSize less than 100 bytes.  Sorry for the bad info, I must have been measuring something else and mixed it up with dataSize.
Igor: given that this hasn't landed, would you consider folding in the patch from bug 680182?
(In reply to Igor Bukanov from comment #41)
> So a possible explanation for the regression is an extra GC that this triggers.
> The reason why this has not been observed on other OS could be that our the
> JS heap size before the tests and GC heuristics depends on the OS and CPU
> speed.

That sounds plausible. How sure are you that the inline buffer will help? Do you still want review for that? Or do you plan to retune the GC heuristics (or try some other tack) instead?
(In reply to Jason Orendorff [:jorendorff] from comment #44)
> That sounds plausible. How sure are you that the inline buffer will help?

According to the try server the inline buffer has not helped. 

> Do
> you still want review for that? 

Yes - the change saves 4 bytes on average from the malloc data allocated for the script, minimizes JSScript size on 64-bit CPUs, and avoids the need to allocate the inline buffer for empty scripts on 32-bit by reusing the pad word that is necessary to make sizeof(JSScript) divisible by 8. This makes the regression slightly smaller.
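To make the padding point concrete, here is a minimal sketch (hypothetical field names, not the real JSScript layout) of how an explicit pad word can keep the structure size divisible by 8 - the GC cell-size requirement mentioned later in this bug - while doubling as inline storage on 32-bit:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified layout. On 32-bit the fields below would leave the
// struct 4 bytes short of an 8-byte multiple, so the padding the compiler
// would insert anyway is declared explicitly and reused as a tiny inline
// buffer for empty scripts.
struct ScriptSketch {
    uint8_t  *code;       // 4 bytes on 32-bit, 8 on 64-bit
    uint32_t  length;     // 4
    uint16_t  nfixed;     // 2
    uint16_t  flags;      // 2
    uint32_t  inlinePad;  // explicit pad word, doubles as inline storage
};

// The GC requires cell sizes divisible by 8; the explicit pad word satisfies
// that on 32-bit without wasting the space.
static_assert(sizeof(ScriptSketch) % 8 == 0,
              "GC cell size must be a multiple of 8");
```

The static assert holds on both 32-bit (4+4+2+2+4 = 16) and 64-bit (8+4+2+2+4 = 20, padded to 24 by pointer alignment).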

> Or do you plan to retune the GC heuristics
> (or try some other tack) instead?

I plan to retune the GC after bug 665007; for now my proposal is to accept the regression.
(In reply to Steve Fink [:sfink] from comment #43)
> Igor: given that this hasn't landed, would you consider folding in the patch
> from bug 680182?

Sure.
(In reply to Igor Bukanov from comment #45)
> I plan to retune the GC after the bug 665007, for now my proposal is to
> accept the regression.

If you can confirm that the cause of regression is a greater number of GC cycles on Mac, that sounds fine to me.
(In reply to Igor Bukanov from comment #45)
> 
> I plan to retune the GC after the bug 665007, for now my proposal is to
> accept the regression.

How does bug 665007 influence the GC heuristics?
(In reply to Gregor Wagner from comment #48)
> How does bug 665007 influence the GC heuristics?

Sorry, I meant bug 671702. As that bug affects GC heap fragmentation, some retuning is necessary there.
(In reply to Jason Orendorff [:jorendorff] from comment #47)
> If you can confirm that the cause of regression is a greater number of GC
> cycles on Mac, that sounds fine to me.

I can see that the benchmark is clearly influenced by the GC params. When I made the last-ditch GC run less often, the regression *increased* by another 4-5%. But I could not remove the regression. When I tried to make the GC run more frequently, the benchmark time dropped by 0.5%, but that could be due to noise.

Does it sound like a confirmation?
(In reply to Igor Bukanov from comment #50)
> (In reply to Jason Orendorff [:jorendorff] from comment #47)
> > If you can confirm that the cause of regression is a greater number of GC
> > cycles on Mac, that sounds fine to me.
> 
> I can see that the benchmark is clearly influenced by the GC params. When I
> made the last-ditch GC run less often, the regression *increased* by
> another 4-5%. But I could not remove the regression. When I tried to make
> the GC run more frequently, the benchmark time dropped by 0.5%, but
> that could be due to noise.
> 
> Does it sound like a confirmation?

Well, no. There's no data there about the number of GC cycles. And the results are somewhat unexpected! I don't know what to think about that, except that it doesn't seem consistent with the proposition that we can make up the regression by tuning GC heuristics. It certainly doesn't support the hypothesis in comment 41:

> So a possible explanation for the regression is an extra GC that this triggers.
> The reason why this has not been observed on other OS could be that our the JS
> heap size before the tests and GC heuristics depends on the OS and CPU speed.

The more I think about that, the more skeptical I am, but I could still be convinced. If that hypothesis is correct, then the number of GC cycles that occur during this benchmark should look something like this:

            m-c tip   m-c tip plus the patch
               |       |
    Linux      4       4      No extra GCs, no regression
      Mac      4       6      Extra GCs caused the regression

If you run with MOZ_GCTIMER, you could also confirm that the absolute amount of the regression is very close to the duration of the extra GC cycles. That would be strong confirmation. (You could omit the Linux runs, and I might still be convinced if the numbers came out right.)

An alternative would be to run the benchmark, with and without the patch, in a Shark-enabled build and look at the profiles.

I want this patch in, but a 5% regression is a big deal. We can't blow it off.
I just noticed this in the first patch:

In jsscript.cpp,
>+     * To esnure jsval alignment for the const array we calculate the size

Typo: "ensure".
Comment on attachment 554635 [details] [diff] [review]
inline buffer v2

r=me as far as the code is concerned, but of course the regression still needs to be fixed.

In jsscript.cpp, JSScript::NewScript:
>+     * To esnure jsval alignment for the const array we calculate the size

Oh, the typo in this patch. Sorry, I thought I was looking at the first patch.

>+    /* Constants must go before the data pointer to ensure jsval alignment. */
>+    if (nconsts != 0) {
>+        JS_ASSERT(reinterpret_cast<jsuword>(data) % sizeof(jsval) == 0);
>+        script->consts()->length = nconsts;
>+        script->consts()->vector = reinterpret_cast<Value *>(data - constSize);
>+    }

I don't understand. This seems like just as big a hack as constPadding, maybe
even more egregious. What was wrong with constPadding? Could we just put the
constant array first?

The changes to Snarf in js.cpp seem unrelated.
Attachment #554635 - Flags: review?(jorendorff) → review+
(In reply to Jason Orendorff [:jorendorff] from comment #53)
> >+    /* Constants must go before the data pointer to ensure jsval alignment. */
> >+    if (nconsts != 0) {
> >+        JS_ASSERT(reinterpret_cast<jsuword>(data) % sizeof(jsval) == 0);
> >+        script->consts()->length = nconsts;
> >+        script->consts()->vector = reinterpret_cast<Value *>(data - constSize);
> >+    }
> 
> I don't understand. This seems like just as big a hack as constPadding, maybe
> even more egregious. What was wrong with constPadding? 

constPadding leads to losing 4 bytes on average per JSScript instance with const values. That was a consequence of deviating from the initial layout of the script, where the arrays and structures with bigger alignment requirements came first, eliminating any need for padding. The patch restores that layout.

>Could we just put the constant array first?

Our offsets to get the various JSObjectArray-like structures must stay within 254 bytes of the data pointer. Putting the const array in front can break that. To avoid that, the patch makes the data pointer point into the middle of the allocation, where those structures start.
(In reply to Jason Orendorff [:jorendorff] from comment #51)
> Well, no. There's no data there about the number of GC cycles. And the
> results are somewhat unexpected! I don't know what to think about that,
> except that it doesn't seem consistent with the proposition that we can make
> up the regression by tuning GC heuristics. It certainly doesn't support the
> hypothesis in comment 41:
> 
> > So a possible explanation for the regression is an extra GC that this triggers.
> > The reason why this has not been observed on other OS could be that our the JS
> > heap size before the tests and GC heuristics depends on the OS and CPU speed.

The tests are structured like this:

function f() {
    do_something_and_return_when_done();
    setTimeout("f()", 0);
}

window.onload = f;

where do_something_and_return_when_done() does light DOM manipulation.

As setTimeout is called with a string argument, on each timer firing we evaluate the string. Without the patch the corresponding script is destroyed immediately, but with the patch we must wait for the GC. Now, as the minimal timeout is 4 ms and we run the test for 300 seconds, we effectively create about 300 / 0.004, or 75,000, scripts. With each script taking about 190 bytes (on a 64-bit CPU) that translates into about 15 MB of data to allocate with the GC. The regression is on the scale of 10 seconds, so we have about 1.5 MB per second of regression. As GC allocation and mark-and-sweep are at least 100× faster than that, the regression cannot come from the extra GC overhead, and the Windows/Linux data confirms that.
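A quick sanity check of the arithmetic above (the 190-byte figure is the approximate per-script cost quoted in the comment):

```cpp
#include <cassert>

// Back-of-the-envelope check: a 4 ms minimal timeout over a 300 s run,
// at roughly 190 bytes per script, bounds how much garbage the patch
// defers to the GC instead of freeing eagerly.
constexpr long kRunMs = 300 * 1000;    // 300 s test run
constexpr long kTimeoutMs = 4;         // minimal setTimeout delay
constexpr long kBytesPerScript = 190;  // approximate 64-bit JSScript cost

constexpr long scriptsCreated = kRunMs / kTimeoutMs;                 // 75,000
constexpr double totalMB = scriptsCreated * kBytesPerScript / 1e6;   // ~14.25

static_assert(scriptsCreated == 75000, "300 s / 4 ms timer firings");
```

The total comes out to about 14 MB, in line with the rough 15 MB estimate, which is indeed far too little data for GC throughput to explain a regression measured in seconds.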

So indeed my initial hypothesis regarding the increased number of GC cycles is wrong; we do not have enough allocations to influence that. At worst we would get an extra GC cycle, and that would take at most 1 s in the absolute worst case given the rest of the test structure. I wish I had calculated this before starting to speculate about the origin of the regression.

I will try to experiment with disabling the DestroyScript call, effectively leaking the script, and see how that compares with the patch numbers.
(In reply to Igor Bukanov from comment #54)
> constPadding leads to loosing 4 bytes on average per each JSScript instance
> with const values.

2 bytes on average, right? And only on 32-bit platforms?

> >Could we just put the constant array first?
> 
> Our offsets to get various JSObjectArray-like structures must stay within
> 254 bytes from the data pointer. Putting the const array in front can break
> that.

I didn't mean at data[0]. I just mean before the other array data. It can go after the JSObjectArray-like things. Those all have sizes that are multiples of 8 bytes, right?
(In reply to Jason Orendorff [:jorendorff] from comment #56)
> (In reply to Igor Bukanov from comment #54)
> > constPadding leads to loosing 4 bytes on average per each JSScript instance
> > with const values.
> 
> 2 bytes on average, right? And only on 32-bit platforms?

sizeof(jsval) == 8 so the waste is on average 4 bytes on 32 bit platforms.

> I didn't mean at data[0]. I just mean before the other array data. It can go
> after the JSObjectArray-like things. Those all have sizes that are multiples
> of 8 bytes, right?

You are right, we can put the const vector right after the data array and enforce their sizes with static asserts. I will update the patch with that.
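The agreed layout can be sketched as follows. The struct names are simplified stand-ins, not the real SpiderMonkey types, but they show why no padding is needed once every JSObjectArray-like header has a size that is a multiple of 8: a jsval vector placed after them stays 8-byte aligned automatically.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the JSObjectArray-like headers that sit at the
// start of a script's data allocation, within one-byte-offset reach of the
// data pointer.
struct ObjectArray { uint32_t length; void **vector; };
struct UpvarArray  { uint32_t length; uint32_t *vector; };

// Stand-in for jsval: 8 bytes on both 32- and 64-bit.
struct ConstValue  { uint64_t bits; };

// If every header size is a multiple of 8, the const vector placed right
// after them keeps jsval alignment with no padding - the invariant the
// static asserts mentioned above would enforce.
static_assert(sizeof(ObjectArray) % 8 == 0, "header keeps jsval alignment");
static_assert(sizeof(UpvarArray) % 8 == 0, "header keeps jsval alignment");
static_assert(sizeof(ConstValue) == 8, "jsval-sized constant slot");
```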
Updates on the TDHTML regression. When I commented out js_DestroyScript in EvaluateUCScriptForPrincipalsCommon, effectively leaking all evaluated scripts, I got the same kind of regression that is observed with the patch. That is, with the patch the regression is a time increase from 310 to 320, but with the leaked scripts the timing went from 310 to 322. So the issue here is that for some reason the tests on OS X are very sensitive to the number of allocated script instances.

I also tried to schedule a GC after every 1000 created scripts to minimize the memory held by scripts, but that did not help. Another option I considered was to recognize, in the setTimeout implementation in dom/base/nsGlobalWindow.cpp, a pattern like setTimeout("name()") and replace EvaluateScript there with CallFunctionByName. However, for some reason that regressed the benchmarks even more on Mac OS X while slightly helping on Windows/Linux.
It looks like the type inference patch caused an even bigger TDHTML regression on Mac than the patches here, as the time went from 310 to 324.
For TI all script destruction had to move into the GC --- analysis information can freely refer to scripts, and is only purged on GC.  So it made a similar change to the pain point you've identified for this test, with a similar perf change.

I think it's pretty clear from comment 58 that this regression is due to an artificial feature of this test; the patch should just go in without any TDHTML-specific tweaking.
The "artificial feature of the test" is actually a very common way for people to write code on the web, if you mean passing a string argument to setTimeout.  And at one point I seem to recall we just had algorithms in the JS engine that linearly walked all live JSScripts or something.  Have we gotten rid of those?

And we still have no idea why this is a problem on Mac and not on Linux/Windows, right?  (Or rather less of a problem on Linux/Windows; there's a regression from TI on those too.)

One thing I should note: one particular test (test_scrolling) accounts for most of the regression from the TI landing on Tdhtml.  Does that tell us anything?
(In reply to Boris Zbarsky (:bz) from comment #61)
> One thing I should note: one particular test (test_scrolling) accounts for
> most of the regression from the TI landing on Tdhtml.  Does that tell us
> anything?

Here for references the test in question: http://hg.mozilla.org/build/talos/file/1128691728d8/page_load_test/dhtml/scrolling.html
Another thought.... would a cache similar to the eval cache allow us to not allocate as many JSScripts here and sidestep the problem?  I seem to recall there being talk about maybe getting to it sometime, but if JSScript allocations are getting more expensive then maybe we should look into it more urgently.

In any case, we should still check for algorithms that walk live scripts.
(In reply to Boris Zbarsky (:bz) from comment #63)
> Another thought.... would a cache similar to the eval cache allow us to not
> allocate as many JSScripts here and sidestep the problem?  I seem to recall
> there being talk about maybe getting to it sometime, but if JSScript
> allocations are getting more expensive then maybe we should look into it
> more urgently.

We do not need to cache JSScripts to help with the benchmark; it is sufficient to recognize a common pattern like setTimeout("f()", t) and turn that into a call to f. However, when I tried to hack that it in fact regressed the benchmarks even more. I have no idea why.
This bug is currently blocking bug 676732 -- both bugs modify some of the memory reporters in xpcjsruntime.cpp so there are conflicts.  I fixed the conflicts when this patch originally landed, and I didn't want to re-fix them in reverse when the patch went out, so I was planning to wait until this landed again.  But it sounds like that might not happen soon, so maybe I should re-fix and land bug 676732?
I think this can probably land, at least in terms of Tdhtml; we've already taken the perf hit from TI landing...

We should file and fix (for mozilla9) a followup bug on that, though.
Attached patch v6 (obsolete) — Splinter Review
The new patch is merged with the inference changes plus the following 3 changes:

1. JSScript is reorganized to get denser packing and to ensure 8-byte alignment in view of the extra inference-related fields.

2. I have added a JSGCTraceKind enum and used it in the internal API that takes a GC tracing kind. This way GCC warns about switches with missing cases, which allowed me to fix a couple of places where I had missed adding JSTRACE_SCRIPT.

3. The patch removes JSScript::links and instead uses the GC cell enumeration code to loop over all scripts in a compartment. To make those iterations more convenient I have added iterator classes like CellIter and CellIterUnderGC in addition to ForEachArenaAndCell. These classes allow keeping the body of the current script iteration loops intact and add extra checks to ensure that no GC or allocation is possible when iterating over GC things outside the GC.

These changes are in jsapi.*, jscompartment.cpp, jsgc*, and jsscript.*. The rest of the files are either unchanged from v5 or contain trivial extra changes like turning script->compartment into script->compartment().
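A simplified, self-contained model of the CellIter idea (not the actual SpiderMonkey classes): an iterator over the live cells across a compartment's arenas, which lets callers keep an ordinary for-loop body instead of the callback style of ForEachArenaAndCell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for GC cells and arenas; in the real engine the cells would
// be JSScript (or other GC thing) slots inside fixed-size arenas.
struct Cell  { bool live; int id; };
struct Arena { std::vector<Cell> cells; };

// Iterates over every live cell in every arena, skipping dead cells, so the
// caller can write: for (CellIter i(arenas); !i.done(); i.next()) { ... }
class CellIter {
    const std::vector<Arena> &arenas;
    size_t arena = 0, cell = 0;

    // Advance past dead cells and exhausted arenas to the next live cell.
    void settle() {
        while (arena < arenas.size() &&
               (cell >= arenas[arena].cells.size() ||
                !arenas[arena].cells[cell].live)) {
            if (cell >= arenas[arena].cells.size()) {
                ++arena;
                cell = 0;
            } else {
                ++cell;
            }
        }
    }

  public:
    explicit CellIter(const std::vector<Arena> &a) : arenas(a) { settle(); }
    bool done() const { return arena >= arenas.size(); }
    const Cell &get() const { return arenas[arena].cells[cell]; }
    void next() { ++cell; settle(); }
};
```

The real classes additionally assert that no GC or allocation can happen while the iterator is live, which this sketch omits.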
Attachment #553120 - Attachment is obsolete: true
Attachment #554635 - Attachment is obsolete: true
Attachment #557107 - Flags: review?(bhackett1024)
Attached patch v7Splinter Review
The new version fixes the static assert failure about sizeof(JSScript) % CellSize == 0 on 32-bit CPUs in optimized builds. I had missed that JSScript::id is debug-only. Now it should compile again.
Attachment #557107 - Attachment is obsolete: true
Attachment #557107 - Flags: review?(bhackett1024)
Attachment #557114 - Flags: review?(bhackett1024)
Blocks: 683471
Comment on attachment 557114 [details] [diff] [review]
v7

Review of attachment 557114 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgcinlines.h
@@ +365,5 @@
> +#endif
> +        if (lists)
> +            lists->clearInArenas(thingKind);
> +    }
> +};

CellIter looks great.  I think ForEachArenaAndCell can be removed entirely (maybe with an ArenaIter), though no need to do that now.

::: js/src/jsgcmark.cpp
@@ +864,5 @@
> +        script->jitNormal->trace(trc);
> +    if (script->jitCtor)
> +        script->jitCtor->trace(trc);
> +#endif
> +}

It might be worth adding a ScanScript variant of this function; doing this was helpful for type objects, and there are more scripts created than types.  This would need to unpack Bindings::trace, which is a MarkShape wrapper.

::: js/src/jsscript.h
@@ +560,5 @@
>           * - Temporary scripts created by obj_eval, JS_EvaluateScript, and
>           *   similar functions never have these objects; such scripts are
>           *   explicitly destroyed by the code that created them.
>           */
>          JSObject    *object;

Can't script->u.object and ScriptClass be removed entirely now that the script's lifetime is managed by the GC?  Or is this intended as a followup (not sure what dependencies other parts of the code base e.g. debugger have on script objects existing).
Attachment #557114 - Flags: review?(bhackett1024) → review+
Igor: are you planning to land this soon?  If so, I'll just hold off on landing Bug 683256, as you are fixing that here anyways, and that will save a little bit of rebasing.
(In reply to Andrew McCreight [:mccr8] from comment #71)
> Igor: are you planning to land this soon?  If so, I'll just hold off on
> landing Bug 683256, as you are fixing that here anyways, and that will save
> a little bit of rebasing.

I plan to land it in a couple of hours if the try server is green.
Some of the JS memory reporters were screwed up by this change -- we ended up with two "shapes" reporters, one under "gc-heap" and one not.  And "type-objects" were labelled as "shapes".  I will fix these when I land bug 676732 tomorrow.
http://hg.mozilla.org/mozilla-central/rev/de4425a74643
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla9
Version: Other Branch → Trunk
(In reply to Nicholas Nethercote [:njn] from comment #75)
> Some of the JS memory reporters were screwed up by this change -- we ended
> up with two "shapes" reporters, one under "gc-heap" and one not.  And
> "type-objects" were labelled as "shapes". 

Sorry for not checking that *carefully* in about:memory output.
Blocks: 683862
Blocks: 683863
Blocks: 684010
Reading symbols from /home/clever/mozilla-central/objdir-netbook/dist/bin/js...done.
(gdb) run
js> quit();

*** glibc detected *** /home/clever/mozilla-central/objdir-netbook/dist/bin/js: free(): invalid pointer: 0xf6c050b4 ***
...

(gdb) bt
...
#11 JSScript::finalize (this=0x0, cx=0xffffc8c0) at /home/clever/mozilla-central/js/src/jsscript.cpp:1382
#12 0x080d66d8 in finalize<JSScript> (this=0x83ef170, cx=0x83de268) at /home/clever/mozilla-central/js/src/jsgc.cpp:236
#13 FinalizeArenas<JSScript> (this=0x83ef170, cx=0x83de268) at /home/clever/mozilla-central/js/src/jsgc.cpp:279

looks like this patch has broken the js shell and xpcshell
it seems to be finalizing a null JSScript*
(In reply to clever from comment #79)
> looks like this patch has broken the js and xpcshells
> it seems to be finalizing a null JSScript*

I cannot reproduce the bug locally. Could you run this under valgrind?
can't get valgrind to work right now, i'll check the backtrace more closely in gdb to see what's going on
commit de4425a74643 doesn't have the fault i found, i'll do a bisect and see if i can find the cause
odd, the problem vanished after a few recompiles
This is what I use to fix about:memory reports until bug 676732 is landed
No longer blocks: 684796
Depends on: 684796
Blocks: 687966