Closed Bug 1966773 Opened 10 months ago Closed 1 month ago

Pretenure thisObject allocations for scripted JSOp::New

Categories

(Core :: JavaScript: GC, task, P2)

task

Tracking

()

RESOLVED FIXED
149 Branch
Tracking Status
firefox149 --- fixed

People

(Reporter: iain, Assigned: iain)

References

(Blocks 1 open bug)

Details

(Whiteboard: [js-perf-next])

Attachments

(4 files)

In bug 1894012, Jon added logging to help find cases where we might want to add allocation sites. I enabled the logging for a run of JS3 to track how often we had missing allocation sites that would have been marked as LongLived if they existed.

Looking at the results and bucketing by op, a clear pattern emerges:

      1  RecreateLexi
      2  AsyncResolve
      2  GlobalOrEval
      4  Generator   
      5  NewContent  
      8  FunWithProto
      8  ObjWithProto
     19  SpreadSuperC
     24  PushLexicalE
     47  AsyncAwait  
     97  Call        
    169  SuperCall   
   1284  New         

The most frequent op is JSOp::New, making up 3/4 of all candidates on its own. The next most frequent is JSOp::SuperCall, which is effectively the same thing. Together, they make up 87% of missing-but-pretenurable alloc sites.

We currently don't use alloc sites when allocating the this object for scripted constructors. When we do so in baseline, it's part of a complicated dance where we generate CacheIR that looks like this:

... (guards omitted)
        MetaScriptedThisShape         thisShapeOffset 40
        CallScriptedFunction          calleeId 1, argcId 0, flags (format 1, isConstructing, isSameRealm), argcFixed 0

CallScriptedFunction is generic, and is used by all sorts of calls (including non-constructor calls). If the isConstructing flag is set, then it will call CreateThisFromIC. MetaScriptedThisShape doesn't generate any code in baseline ICs, but tells the CacheIR transpiler what shape to use. IIRC, this was all part of an attempt to avoid bloating the set of fields for CallScriptedFunction.

We could probably simplify this by splitting out a CallKnownScriptedConstructor op with a shape field, and eliminating MetaScriptedThisShape. Then we could put an alloc site field in that op as well, and use it in BaselineCacheIRCompiler::createThis.

Based on the log I've attached, it looks like typescript has the most missing pretenuring. If I run a local comparison report on that benchmark, I do see minor GC as the most important bucket, with a 40x difference in time. WSL also seems to have quite a bit, and a comparison report there again shows that minor GC is the top bucket with a 20x difference.

We probably won't close the entire gap by pretenuring, but that's still enough evidence to mark this as perf-next.

js-perf-next: Add alloc sites to scripted constructor calls in baseline, and use this information to pretenure them. Bug 1898270 is a recent example of adding alloc sites for lambdas. As described above, the tricky part with this patch is that the thisObject is allocated by this code, which is inside shared code used by a variety of different calls. (In particular, both CallScriptedFunction and CallBoundScriptedFunction can reach this point.) Right now, information about the this object is stored in a preceding MetaScriptedThisShape op that generates no code, but is used in the transpiler to allocate this.

I think there are two reasonable options for making this work. One, as described above, is to add a new CallKnownScriptedConstructor op that would replace the MetaScriptedThisShape/CallScriptedFunction pair. (This might also need a CallKnownBoundScriptedConstructor op to handle the bound function case?) The second option, easier but more of a hack, would be to add an alloc site to MetaScriptedThisShape (and maybe rename it), rewrite CacheIRCompiler::emitMetaScriptedThis to store an allocSite offset internally in the CacheIRCompiler, and then use that allocsite offset if it exists in BaselineCacheIRCompiler::createThis to load the alloc site out of the stub fields and use it.

Priority: -- → P2
Whiteboard: [js-perf-next]

(Additional evidence of value: Jon's SP3 numbers here show that JSOp::New was the second most common op for missed pretenuring, second only to JSOp::Lambda (which has already been fixed).

Severity: -- → N/A

I previously started work on this in bug 1740277 but got stuck.

See Also: → 1740277

Looking at that patch, I think the two options I described in comment 2 should both avoid the problem of doing a VM call too early.

I based the pretenuring parts of this on bug 1898270.

The name of the CacheIR op is no longer entirely accurate. I rename it in the next patch in preparation for adding more information to allow allocating this in jitcode.

Assignee: nobody → iireland
Status: NEW → ASSIGNED

These patches show a significant improvement on the splay benchmark in the shell due to greatly reduced minor GC time.

macOS: +8% splay, +17% splay latency
Linux: +5% splay, +11% splay latency

Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 149 Branch
Regressions: 2014624
Regressions: 2015100
Regressions: 2015133
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: