Last Comment Bug 778724 - Allow purging analysis-temporary while retaining jitcode
: Allow purging analysis-temporary while retaining jitcode
Status: RESOLVED FIXED
[games:p2]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Other Branch
: x86 Mac OS X
: -- normal (vote)
: mozilla17
Assigned To: Brian Hackett (:bhackett)
:
Mentors:
Depends on: 780653 781657 781941 785543 785595 786491 788822 841956
Blocks: gecko-games 781767 775994 781849 785905
  Show dependency treegraph
 
Reported: 2012-07-30 07:36 PDT by Brian Hackett (:bhackett)
Modified: 2013-11-06 14:06 PST (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
part 1 (090fd1585e34) (70.97 KB, patch)
2012-07-30 13:20 PDT, Brian Hackett (:bhackett)
luke: review+
Details | Diff | Review
part 2 (previous) (3.81 KB, patch)
2012-07-30 13:51 PDT, Brian Hackett (:bhackett)
luke: review+
Details | Diff | Review
part 3 WIP (previous) (53.06 KB, patch)
2012-07-31 12:27 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
part 3 WIP (previous) (62.79 KB, patch)
2012-08-01 07:50 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
analysis, compilation, purging details for a 100MB cap (3.06 MB, text/plain)
2012-08-01 07:55 PDT, Brian Hackett (:bhackett)
no flags Details
part 3 WIP (previous) (87.43 KB, patch)
2012-08-01 15:25 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
analysis, compilation, purging details for a 100MB cap (1.30 MB, patch)
2012-08-01 15:27 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
part 3 WIP (previous) (80.89 KB, patch)
2012-08-01 16:17 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
part 3 (previous) (99.32 KB, patch)
2012-08-06 18:47 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
part 3 (previous) (164.74 KB, patch)
2012-08-09 17:50 PDT, Brian Hackett (:bhackett)
luke: review+
Details | Diff | Review
part 4 (previous) (110.39 KB, patch)
2012-08-10 14:47 PDT, Brian Hackett (:bhackett)
luke: review+
Details | Diff | Review
partial revert of part 1 (1bc0e4eac6e5) (58.79 KB, patch)
2012-08-24 19:26 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review

Description Brian Hackett (:bhackett) 2012-07-30 07:36:23 PDT
TI used to work this way, up to the aborted initial landing attempt last summer.  This was cut and replaced with the discard-everything-on-GC strategy, mainly due to a ton of extra time taken during GC while condensing the constraints generated during TI/compilation.  This was not a great design, and it shouldn't be hard to improve so that the compiler itself computes the condensed constraints (which don't refer to any temporary type sets), and jitcode-retaining purging can be done by just copying those condensed constraints and the contents of persistent type sets / objects into a new arena allocator.

This would be helpful for pages with an enormous amount of code and memory used during analysis, e.g. bug 775994.
Comment 1 Brian Hackett (:bhackett) 2012-07-30 13:20:36 PDT
Created attachment 647272 [details] [diff] [review]
part 1 (090fd1585e34)

This separates the freeze constraints added during compilation into implicit and explicit portions.  All type sets which describe stack values (SSA values, locals, args) are implicitly frozen, and whenever they change the associated script (and anything it was inlined into) will be recompiled.  Type sets for object properties must be explicitly frozen, along with constraints describing the properties or objects they are associated with.

This split avoids the need for condensed constraints.  Instead, when purging analysis-temporary we can just regenerate the implicit constraints for the stack from scratch, and copy over the freeze constraints for the heap and certain TI-generated constraints for property reads and call returns.

I haven't benchmarked this patch but I doubt it affects ss/v8/kraken much.  There is a potential for more recompilation but generally in oddball scenarios like arguments or 'this' which are never accessed at all.  Of more concern is the effect on chunked compilation, as any change in the stack types for a script will cause all its jitcode to be discarded, not just that for chunks which used those types.  That is not a huge problem I think: chunked compilation benefits more from incremental compilation than incremental discarding, and for code which is working on typed arrays (i.e. almost everything that benefits from chunked compilation) propagating types more eagerly should totally mitigate this issue.  Will port that code over from bug 767223 in a future patch.
Comment 2 Brian Hackett (:bhackett) 2012-07-30 13:51:24 PDT
Created attachment 647285 [details] [diff] [review]
part 2 (previous)

Propagate types more eagerly for reads from typed arrays and from singleton objects (globals) or objects where the property can be found on the prototype chain.  Also, don't generate type constraints at all for accesses on typed arrays.
Comment 3 Brian Hackett (:bhackett) 2012-07-31 12:27:26 PDT
Created attachment 647638 [details] [diff] [review]
part 3 WIP (previous)

Allow purging analysis-temporary outside of a GC.  Currently this is only triggered via a new JS_GC_ZEAL setting, but jit-tests -mna pass when purging analysis state every 100 or 200 allocations.  Next step is to stick in some heuristics and see how effective / performant this is on code-intensive demos.
Comment 4 Brian Hackett (:bhackett) 2012-08-01 07:50:52 PDT
Created attachment 647951 [details] [diff] [review]
part 3 WIP (previous)

Trigger purging when a compartment's analysis-temporary is above N bytes after compiling (specified with PURGE_TRIGGER_BYTES env var).  Also adds a bunch of spew.  Without the spew, if I set the cap at 150MB then things run fine, 100MB is a little choppy, 50MB very choppy.  150MB saves about 300MB over peak memory usage, but I think we should be able to do better.

Going through the spew I get for 100MB (will attach in a minute), 78% of the compilations being done are recompilations without any intervening GC.  If this is right, these are almost all going to be due to changing type information, which shouldn't be happening for this typed array heavy code if part 2 is doing its job.  So something still needs to be fixed, will investigate.

If these recompilations are fixed, things should be in much better shape, with much less need for reanalysis as well and less purging.  The number of analysis bytes after a purge is never above 30MB, so it might be possible to get a 50MB cap to work.

Also of note is that 87% of the compilations being done are not in the first chunk of a script, and 49% of compilations are at chunk 20 or higher.  So a lot of time is being spent in scripts which are very large, and chunked compilation will be important for IM on pages like this.  It also seems like we only compile a few chunks out of these large scripts, e.g. 13 chunks out of a script with more than 600, and chunking things at the bytecode level will allow incremental analysis and compilation throughout the pipeline.
Comment 5 Brian Hackett (:bhackett) 2012-08-01 07:55:14 PDT
Created attachment 647953 [details]
analysis, compilation, purging details for a 100MB cap

Spew when running the demo with a 100MB cap.
Comment 6 Brian Hackett (:bhackett) 2012-08-01 15:25:45 PDT
Created attachment 648108 [details] [diff] [review]
part 3 WIP (previous)

Fix various issues that were triggering recompilations for no good reason, some due to this patch series and some outstanding.  This wipes out nearly all of the recompilations, except for those triggered when inlining calls in hot scripts and after a GC.

vs. the previous patch, with a 100MB cap I get 10218 compilations instead of 30329, and 24 purges vs. 65.  The purges are still a little choppy --- they take about 50ms on my machine, about 30ms of which is freeing up the analysis arena in the foreground (moving this to the GC background thread should wipe this out, but the background thread needs a little refactoring first).

This still has profiling spew in it, will post a clean patch in a few minutes.
Comment 7 Brian Hackett (:bhackett) 2012-08-01 15:27:53 PDT
Created attachment 648110 [details] [diff] [review]
analysis, compilation, purging details for a 100MB cap

Update spew for the most recent patch.
Comment 8 Brian Hackett (:bhackett) 2012-08-01 16:17:38 PDT
Created attachment 648134 [details] [diff] [review]
part 3 WIP (previous)

Patch with profiling spew removed.  Vlad, if you want to check out how this behaves on the demo, go ahead.  Various issues:

- The demo does occasionally crash when accessing type objects, due to what looks like a dangling reference to the old type allocator.  Haven't had time to debug this yet.

- The default cap is 100MB, which can be changed with the PURGE_TRIGGER_BYTES env var or by hand editing js/src/methodjit/Compiler.cpp.  The final patch will make this a config option I think.

- Pauses from purging aren't logged anywhere, in the final patch they'll be logged to the console in the same way as GC timing info.
Comment 9 Andrew McCreight [:mccr8] 2012-08-01 16:20:25 PDT
(In reply to Brian Hackett (:bhackett) from comment #8)
> - Pauses from purging aren't logged anywhere, in the final patch they'll be
> logged to the console in the same way as GC timing info.
It would also be nice to have telemetry for that, in a followup bug if you prefer.  If you have it at the point where it is being logged to the console, anybody would be able to hook up the telemetry.
Comment 10 Brian Hackett (:bhackett) 2012-08-06 18:47:50 PDT
Created attachment 649516 [details] [diff] [review]
part 3 (previous)
Comment 11 Brian Hackett (:bhackett) 2012-08-09 17:50:40 PDT
Created attachment 650738 [details] [diff] [review]
part 3 (previous)

Deeper restructuring of type information to greatly reduce the cost of analysis purges.  On the demo, reduces the typical cost of purges for me from 50ms to 5ms.  This is done by splitting a new analysisLifoAlloc off from typeLifoAlloc, where the former contains all script analysis data and is (nearly) totally cleared during analysis purges, and the latter contains retained type data about object properties etc. and is completely left alone during purges.

FWIW, this patch series also seems to improve my startup times for the demo too, though I haven't measured anything.  Potentially the savings from fewer page faults and swapping more than makes up for the cost of doing reanalysis.
Comment 12 Luke Wagner [:luke] 2012-08-10 10:53:25 PDT
Comment on attachment 647272 [details] [diff] [review]
part 1 (090fd1585e34)

Review of attachment 647272 [details] [diff] [review]:
-----------------------------------------------------------------

I was giving a quick skim before digging in, and I had one high level request:

This patch classifies TypeSets as being for stack locations or not and partitions TypeSet members into operating on stack TypeSets, !stack TypeSets or either.  This would suggest adding a StackTypeSet and NonStackTypeSet (HeapTypeSet would be better (match HeapPtr), as long as there isn't some excluded middle) and giving them the appropriate operations.  Then, if you had some TypeSet *t, you could write t->asStack().getKnownTypeTag() (or perhaps t->asStackTypeSet()).

::: js/src/frontend/BytecodeEmitter.cpp
@@ +2642,5 @@
>          return false;
> +
> +    /*
> +     * Always end the script with a JSOP_STOP. Some other parts of the codebase
> +     * depend on this opcode, e.g. js_InternalInterpret.

Good idea to put a comment here.  Could you also include FrameRegs::setToEndOfScript as a less esoteric example?
Comment 13 Brian Hackett (:bhackett) 2012-08-10 11:05:27 PDT
(In reply to Luke Wagner [:luke] from comment #12)
> Comment on attachment 647272 [details] [diff] [review]
> part 1 (090fd1585e34)
> 
> Review of attachment 647272 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I was giving a quick skim before digging in, and I had one high level
> request:
> 
> This patch classifies TypeSets as being for stack locations or not and
> partitions TypeSet members into operating on stack TypeSets, !stack TypeSets
> or either.  This would suggest adding a StackTypeSet and NonStackTypeSet
> (HeapTypeSet would be better (match HeapPtr), as long as there isn't some
> excluded middle) and giving them the appropriate operations.  Then, if you
> had some TypeSet *t, you could write t->asStack().getKnownTypeTag() (or
> perhaps t->asStackTypeSet()).

This would be fine to do, but better as a followup I think.  part 3 changes this stuff again to a more operational typeset->constraintsPurged().
Comment 14 Luke Wagner [:luke] 2012-08-10 11:17:09 PDT
(In reply to Brian Hackett (:bhackett) from comment #13)
Follow-ups don't get done.  Let's introduce the code in a good state.  Part 3 can build on it if it adds something else type-y.
Comment 15 Brian Hackett (:bhackett) 2012-08-10 11:38:09 PDT
This is a pretty minor thing, and strongly asserted as is.  It will be easier for me to do as a followup than to change an early patch in the series and need to rebase everything else top of it.
Comment 16 Luke Wagner [:luke] 2012-08-10 12:50:39 PDT
Alright; so you'll include that patch in this bug?
Comment 17 Brian Hackett (:bhackett) 2012-08-10 12:59:07 PDT
Yeah, I'll have it ready later today.
Comment 18 Luke Wagner [:luke] 2012-08-10 13:37:25 PDT
Comment on attachment 647272 [details] [diff] [review]
part 1 (090fd1585e34)

Review of attachment 647272 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsinfer.cpp
@@ +1808,5 @@
> +    {}
> +
> +    void newType(JSContext *cx, TypeSet *source, Type type)
> +    {
> +        AddPendingRecompile(cx, script, NULL);

TypeConstraintFreeze has a 'typedAdded' flag; why doesn't this constraint need it?

@@ +3975,5 @@
> +    if (!script->hasFreezeConstraints) {
> +        /*
> +         * Freeze type sets for arguments, locals and monitored type sets. This
> +         * includes all type sets in the TypeScript except the script's return
> +         * value types.

This comment explains what the code below does (which is pretty easy to see); it would be useful if the comment explained why and the general scheme being used.

::: js/src/jsinfer.h
@@ +500,5 @@
>       */
>      bool needsBarrier(JSContext *cx);
>  
>      /* The type set is frozen if no barrier is needed. */
>      bool propertyNeedsBarrier(JSContext *cx, jsid id);

this can be removed (or, better, replaced with something that describes the query)
Comment 19 Luke Wagner [:luke] 2012-08-10 14:07:33 PDT
Comment on attachment 647285 [details] [diff] [review]
part 2 (previous)

Review of attachment 647285 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good

::: js/src/jsinfer.cpp
@@ +986,5 @@
> +            return Type::UnknownType();
> +
> +        const Shape *shape = obj->nativeLookup(cx, id);
> +        if (shape && shape->hasDefaultGetter() && shape->hasSlot()) {
> +            Value v = obj->getSlot(shape->slot());

use HasDataProperty instead

@@ +1039,5 @@
> +     * of defined global variables or from the prototype of the object. This
> +     * reduces the need to monitor cold code as it first executes.
> +     */
> +    if (!assign) {
> +        JSObject *singleton = object->singleton ? object->singleton : object->proto;

So this code will eagerly merge in the type of any property on the proto chain into the type set of the get, ignoring the own properties of the object?  Could you document that we are speculating that it is not common to shadow a prototype property with an own property of a different type?

@@ +1040,5 @@
> +     * reduces the need to monitor cold code as it first executes.
> +     */
> +    if (!assign) {
> +        JSObject *singleton = object->singleton ? object->singleton : object->proto;
> +        if (singleton) {

if (JSObject *singleton = ...) {
Comment 20 Luke Wagner [:luke] 2012-08-10 14:17:46 PDT
LifoAlloc seems to be a twitch wrt performance.  Your change makes sense; but can you split off those changes into a separate bug blocking this that I'll review quickly and we can land independently?
Comment 21 Brian Hackett (:bhackett) 2012-08-10 14:47:38 PDT
Created attachment 651005 [details] [diff] [review]
part 4 (previous)

Subclass StackTypeSet and HeapTypeSet from TypeSet.  All methods for adding a constraint on a type set (other than TypeSet::add(), which is only callable from jsinfer.cpp) are in one of these subclasses.  Nicely, uses of the type sets partition such that no dynamic downcast methods are needed, and only a couple short methods are on both subclasses (e.g. addSubset).
Comment 22 Brian Hackett (:bhackett) 2012-08-10 14:51:56 PDT
(In reply to Luke Wagner [:luke] from comment #18)
> Comment on attachment 647272 [details] [diff] [review]
> part 1 (090fd1585e34)
> 
> Review of attachment 647272 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsinfer.cpp
> @@ +1808,5 @@
> > +    {}
> > +
> > +    void newType(JSContext *cx, TypeSet *source, Type type)
> > +    {
> > +        AddPendingRecompile(cx, script, NULL);
> 
> TypeConstraintFreeze has a 'typedAdded' flag; why doesn't this constraint
> need it?

This ties into the comment above TypeConstraintFreezeStack.  TypeConstraintFreeze only applies to one compilation, so once a type is added that invalidates the compilation, no more invalidation is needed.  (This is not actually necessary anymore given pierron's changes to how invalidation works, I'll remove it).  TypeConstraintFreezeStack applies to all compilations of the script; any time type information within the script changes, all its jitcode is discarded.  This means the constraint can fire multiple times, and shouldn't disable itself after the first time it invalidates any code.
Comment 23 Luke Wagner [:luke] 2012-08-10 15:21:05 PDT
Comment on attachment 651005 [details] [diff] [review]
part 4 (previous)

Review of attachment 651005 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/jsanalyze.cpp
@@ +291,3 @@
>              } else {
>                  JS_ASSERT(nTypeSets == UINT16_MAX);
> +                code->observedTypes = (types::StackTypeSet *) &typeArray[nTypeSets - 1];

Can you add a TypeSet::asStackTypeSet() { JS_ASSERT(isStack()); return this; } restricted assert and use it here to avoid the cast-hammer?

::: js/src/jsinfer.h
@@ +449,5 @@
> +{
> +  public:
> +
> +    /*
> +     * Make an type set with the specified debugging name, not embedded in

a

::: js/src/jsinferinlines.h
@@ +595,4 @@
>  TypeScript::SlotTypes(JSScript *script, unsigned slot)
>  {
>      JS_ASSERT(slot < js::analyze::TotalSlots(script));
> +    return (StackTypeSet *) script->types->typeArray() + script->nTypeSets + slot;

Ditto above for all these above C-style casts.
Comment 24 Luke Wagner [:luke] 2012-08-13 14:03:04 PDT
Comment on attachment 650738 [details] [diff] [review]
part 3 (previous)

Review of attachment 650738 [details] [diff] [review]:
-----------------------------------------------------------------

This approach is pretty nice; I like that you set it up so that you can basically just clear one LifoAlloc.

To land this, it seems like we need some actual measurements here.  If you browser around random JS-heavy sites, how often are there purges and how big are the pauses?  For the demo and for BananaBread, how much time is spent in purges and how is compile-time affected (as you pointed out and as the optimizations in this patch suggest, it was impacted negatively)?

::: js/src/jsanalyze.h
@@ +610,5 @@
> +        Lifetime *lifetimes;
> +    };
> +
> +    LifetimeVariable *variables() {
> +        return (LifetimeVariable *) this;

Could this be variable(size_t i) { JS_ASSERT(i < ... some length); return ((LifetimeVariable *)this)[i]; } ?

::: js/src/jsinfer.cpp
@@ +1910,5 @@
> +    }
> +#endif
> +
> +    if (!jitsOK)
> +        cx->compartment->types.addPendingRecompile(cx, script, pc);

IIUC, the whole point of this new blob is to avoid recompiling when the given PC is already barriered/monitored and the caller indicates that is all they want.  The structure (and lack of comments) make this a bit confusing.  It seems like this could be simplified by having functions:

  bool AlreadyMonitored(script, pc)
  bool AlreadyBarriered(script, pc)

and then have the callers write:

  if (!AlreadyMonitored(script, pc))
    AddPendingRecompile(script, pc)

Stepping back further, though, the real problem seems to be that Bytecode::monitoredTypes/typeBarriers are incorrect when they are recreated after being purged.  Could they not be set correctly upon recreation?

@@ +5067,5 @@
> +    jsbytecode *ignorePC = pc + GetBytecodeLength(pc);
> +    if (*ignorePC == JSOP_INT8 && GET_INT8(ignorePC) == -1) {
> +        ignorePC += JSOP_INT8_LENGTH;
> +        if (*ignorePC != JSOP_BITAND)
> +            ignorePC = NULL;

This is embarrassingly special-cased.  If the performance was intolerable before, it seems like there is a real problem that needs to be investigated.

@@ +6070,5 @@
> +    AutoEnterTypeInference enter(cx);
> +
> +    uint64_t start = PRMJ_Now();
> +
> +    /* Make sure all JITScripts have a ScriptLiveness attached for GC compat. */

"GC compat" is rather mysterious...

::: js/src/jsinfer.h
@@ +425,3 @@
>  
> +    bool purged() { return !!(flags & TYPE_FLAG_PURGED); }
> +    void setPurged() { flags |= TYPE_FLAG_PURGED | TYPE_FLAG_CONSTRAINTS_PURGED; }

I think isStack()/TYPE_FLAG_STACK was a better name (and matches StackTypeSet better in the next patch).

::: js/src/jsinferinlines.h
@@ +1357,5 @@
> +             */
> +            unsigned count = getPropertyCount();
> +            for (unsigned i = 0; i < count; i++) {
> +                Property *prop = getProperty(i);
> +                if (prop)

if (Property *prop = ...)

::: js/src/methodjit/Compiler.cpp
@@ +337,5 @@
>                  okay = false;
>                  break;
>              }
>  
> +            types::TypeSet::WatchObjectStateChange(cx, fun->getType(cx));

Could you comment as to why this is necessary?

@@ +485,5 @@
>  mjit::Compiler::performCompilation()
>  {
> +    char buf[100];
> +    snprintf(buf, sizeof(buf), "performCompilation:%d,%d,%d",
> +             (int) isConstructing, (int) cx->compartment->needsBarrier(), (int) chunkIndex);

rm

@@ +1003,5 @@
>          status = cc.compile();
>      }
>  
> +    /* Check if we have hit the threshold for purging analysis data. */
> +    cx->compartment->types.maybePurgeAnalysis(cx);

It seems strange to put this in the jit.  Is there a more natural place to put it?  E.g., the !compartment->activeInference branch of ~AutoEnterTypeInference comes to mind.

::: js/src/methodjit/Compiler.h
@@ +430,5 @@
>      js::Vector<DoublePatch, 16, CompilerAllocPolicy> doubleList;
>      js::Vector<JSObject*, 0, CompilerAllocPolicy> rootedTemplates;
>      js::Vector<RegExpShared*, 0, CompilerAllocPolicy> rootedRegExps;
> +    js::Vector<uint32_t> monitoredBytecodes;
> +    js::Vector<uint32_t> typeBarrierBytecodes;

Can you write a comment explaining what these lists mean and what should/shouldn't be added?

::: js/src/methodjit/MethodJIT.h
@@ +679,5 @@
>      js::mjit::CallSite *callSites() const;
>      JSObject **rootedTemplates() const;
>      RegExpShared **rootedRegExps() const;
> +    uint32_t *monitoredBytecodes() const;
> +    uint32_t *typeBarrierBytecodes() const;

Needs a comment explaining how it is used and what all needs to be included.

::: js/src/vm/Stack.cpp
@@ +649,5 @@
> +
> +    if (analysis && !analysis->ranLifetimes())
> +        analysis = NULL;
> +
> +    JS_ASSERT(analysis || liveness);

This whole blob could use a comment as, on first glance, what is happening is non-obvious.  It's rather weird that we have these two representations of liveness; one might naively hope that was only 1.  Is the reason that most analyzed scripts will live and die without getting purged and therefore the ScriptLiveness::create call would usually be a waste?

Also, will ScriptLiveness go away with the recent bug filed about flushing everything to the stack so GC doesn't need liveness info (can't find bug # atm)?

If 'yes' to the above: can we name it something gross like PurgedScriptGCLiveness/JITScript::purgedScriptGCLiveness to make this all more clear and discourage other uses from popping up?

Also, how much memory, after purging (in say the demo) is spent on ScriptLiveness?
Comment 25 Brian Hackett (:bhackett) 2012-08-13 22:18:19 PDT
(In reply to Luke Wagner [:luke] from comment #24)
> To land this, it seems like we need some actual measurements here.  If you
> browser around random JS-heavy sites, how often are there purges and how big
> are the pauses?  For the demo and for BananaBread, how much time is spent in
> purges and how is compile-time affected (as you pointed out and as the
> optimizations in this patch suggest, it was impacted negatively)?

I haven't seen any website besides this demo which uses enough analysis memory to need purging at the default 100MB point.  Per comment 11 purges are about 5ms each, and I get about 10 when running the demo.  I instrumented the demo with times in analysis phases / compilation and get this before and after the entire patch series.  Numbers are total ms.

Before:

analyzeBytecode: 3320
analyzeLifetimes: 254
analyzeSSA: 2936
analyzeTypes: 5979
performCompilation: 19608

After:

analyzeBytecode: 2903
analyzeLifetimes: 587
analyzeSSA: 2308
analyzeTypes: 2109
performCompilation: 3550

I suspect much of the difference in performCompilation is due to the changes made to reduce recompilations, but the amount of time spent in other phases has improved significantly.  Even with the patch series applied, performance is better with a 100MB purge threshold than a much higher one.

> IIUC, the whole point of this new blob is to avoid recompiling when the
> given PC is already barriered/monitored and the caller indicates that is all
> they want.  The structure (and lack of comments) make this a bit confusing. 
> It seems like this could be simplified by having functions:
> 
>   bool AlreadyMonitored(script, pc)
>   bool AlreadyBarriered(script, pc)
> 
> and then have the callers write:
> 
>   if (!AlreadyMonitored(script, pc))
>     AddPendingRecompile(script, pc)
> 
> Stepping back further, though, the real problem seems to be that
> Bytecode::monitoredTypes/typeBarriers are incorrect when they are recreated
> after being purged.  Could they not be set correctly upon recreation?

I wanted to do this, but this approach doesn't work out with the analysis design.  The opcodes which have type barriers and/or are monitored depends on type information, and are determined during inference.type analysis.  If we reanalyze the types in a script, we would need to know that the opcodes being monitored/barriered are the same as in the previous incarnation of the type information, which we can't know because type information may have changed since the last time the script was analyzed.

> @@ +5067,5 @@
> > +    jsbytecode *ignorePC = pc + GetBytecodeLength(pc);
> > +    if (*ignorePC == JSOP_INT8 && GET_INT8(ignorePC) == -1) {
> > +        ignorePC += JSOP_INT8_LENGTH;
> > +        if (*ignorePC != JSOP_BITAND)
> > +            ignorePC = NULL;
> 
> This is embarrassingly special-cased.  If the performance was intolerable
> before, it seems like there is a real problem that needs to be investigated.

Eh, we pattern match bytecode in plenty of places, and for stupider reasons (s/RegExp.exec/RegExp.test/).  Performance isn't intolerable without this, but this cuts a fair number of recompilations and it doesn't make sense to leave the gain on the table.  int32 coercions generated by autotranslators are easy enough to pattern match.

> It seems strange to put this in the jit.  Is there a more natural place to
> put it?  E.g., the !compartment->activeInference branch of
> ~AutoEnterTypeInference comes to mind.

We do this a bunch, every time type information is queried or changed, and checking whether to purge requires counting the number of bytes in the allocator.

> Also, will ScriptLiveness go away with the recent bug filed about flushing
> everything to the stack so GC doesn't need liveness info (can't find bug #
> atm)?

Yeah, this would go away with bug 781657.  The complexity cost here is the only really significant one, we still don't create that much jitcode and liveness info is much smaller.
Comment 26 Luke Wagner [:luke] 2012-08-14 09:43:52 PDT
(In reply to Brian Hackett (:bhackett) from comment #25)
> We do this a bunch, every time type information is queried or changed, and
> checking whether to purge requires counting the number of bytes in the
> allocator.

There is no other good pinch point for catching analysis-purge in analysis?  If not, file a bug to make sure we add maybePurge to IM in the right place.
Comment 27 Luke Wagner [:luke] 2012-08-14 09:47:18 PDT
Comment on attachment 650738 [details] [diff] [review]
part 3 (previous)

There were several other changes requested.
Comment 28 Brian Hackett (:bhackett) 2012-08-15 05:13:51 PDT
I see a bunch of nits and requests for comments.  I can incorporate those into the finished push without needing a review, right?  Can you explain specifically what you are looking for?
Comment 29 Luke Wagner [:luke] 2012-08-15 10:30:17 PDT
Comment on attachment 650738 [details] [diff] [review]
part 3 (previous)

Oops, I thought there was something else outstanding.
Comment 30 Brian Hackett (:bhackett) 2012-08-22 05:32:39 PDT
Landing this with purging disabled (see TypeCompartment::maybePurgeAnalysis) until bug 781657 lands.  See bug 781657 comment 5.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d62929fa4325
Comment 31 Ed Morley [:emorley] 2012-08-22 06:55:08 PDT
Backed out for talos crashes:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4c0af103f0f8
Comment 32 Brian Hackett (:bhackett) 2012-08-22 11:30:59 PDT
Ack, didn't show up with a -t none try run.  TypeSet::addTypesToConstraint didn't cope correctly with the case when the constraint triggered type changes on the original type set.

https://hg.mozilla.org/integration/mozilla-inbound/rev/bf07c6253287
Comment 33 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-08-22 14:59:03 PDT
This caused a bunch of Talos metrics to jump in various directions, fwiw (SunSpider 0.9.1 regression, V8 version 7 improvement, V8 regression, all on Mac):
https://groups.google.com/forum/?fromgroups=#!topic/mozilla.dev.tree-management/xFIgt-1tTzU
Comment 34 Ed Morley [:emorley] 2012-08-23 03:49:13 PDT
https://hg.mozilla.org/mozilla-central/rev/bf07c6253287
Comment 35 Paul Wright 2012-08-23 08:26:56 PDT
FWIW, it appears this may have caused some significant regressions on AWFY as well.
Comment 36 Brian Hackett (:bhackett) 2012-08-23 08:52:26 PDT
Link?  I didn't see any difference on benchmarks when testing this.
Comment 37 Paul Wright 2012-08-23 09:52:07 PDT
I'm looking at the 64-bit MBP numbers on AWFY.  Something between revs f27460e427cc and 9209d9af04d4 caused a regression on V8.  There are also some clear differences in the "Assorted Tests".  I cannot say, for sure, that any of these regressions were caused by this bug (so I apologize if this bug isn't the culprit), but something in the range certainly changed things.  I do not have the ability to bisect.
Comment 38 David Anderson [:dvander] 2012-08-23 20:08:51 PDT
I bisected the AWFY regression to this patch. I can reproduce it in the shell, the easiest way is running v8/run-richards.js which regresses from a score of 9900-10000 to 9500-9800. The tree is closed right now but this should be backed out once it's reopened.
Comment 39 Vladimir Vukicevic [:vlad] [:vladv] 2012-08-23 20:26:47 PDT
We probably should, but please make sure that we consider & balance that regression against the huge memory usage win (which we don't really track/report on!).
Comment 40 Brian Hackett (:bhackett) 2012-08-23 20:30:34 PDT
I tested this a lot before pushing and didn't see any SS or v8 regression on my machine.  v8-richards is very cache sensitive as it allocates a small number of objects and accesses them over and over again.  Please don't back things out just because you see a shell regression, we've been down this road a million times before.
Comment 41 David Anderson [:dvander] 2012-08-23 21:57:18 PDT
(In reply to Brian Hackett (:bhackett) from comment #40)
> I tested this a lot before pushing and didn't see any SS or v8 regression on
> my machine.  v8-richards is very cache sensitive as it allocates a small
> number of objects and accesses them over and over again.  Please don't back
> things out just because you see a shell regression, we've been down this
> road a million times before.

richards was an example of how to reproduce a regression. In fact, this regressed the following benchmarks on AWFY:

 SunSpider:
  crypto-aes
  bitops-bits-in-byte
  3d-raytrace

 V8:
  deltablue
  raytrace
  richards

 Kraken:
  beat-detection
  crypto-ccm
  crypto-sha256-iterative

We need to back this out before it's too late - we've been down *that* road before very recently (see: 7/24 e-mail from Dave).
Comment 42 David Anderson [:dvander] 2012-08-23 22:00:56 PDT
I'll test this in-browser shortly, I guess.
Comment 43 Brian Hackett (:bhackett) 2012-08-24 19:26:34 PDT
Created attachment 655255 [details] [diff] [review]
partial revert of part 1 (1bc0e4eac6e5)

This partially reverts the changes in part 1 so that freeze constraints generated during compilation remain the same as before; implicit freezing of all stack contents only occurs after a compartment's analysis data has been purged.  If anyone can reproduce an in-browser regression it would be good to check if this patch fixes it.

Putting this up as I was working on this for bug 785358 before realizing the problem was something else.  Would rather not commit this if no one can confirm it helps anything, as it adds back some complexity I'd been hoping to remove.

Note You need to log in before you can comment on or make changes to this bug.