Closed Bug 606631 Opened 14 years ago Closed 13 years ago

Simple constructors should be inlined

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: billm, Assigned: billm)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

The v8-earley-boyer benchmark makes a lot of calls to this constructor:

  function sc_Pair(car, cdr) {
      this.car = car;
      this.cdr = cdr;
  }

If you make a trivial change (adding an increment operation), the score for the V8 engine on this benchmark drops from 21149 to 15530. Adding additional increments decreases the score by only a little bit more.

It turns out that V8 pattern matches on constructors that just perform simple property assignments to this. They inline the whole thing. We should do something like this.
This will also help access-binary-trees. V8 becomes 1.5/1.6x slower if I add a "var x" statement to TreeNode (had to modify the test to do more iterations to get meaningful numbers)
Blocks: JaegerSpeed
It would be nice to go the more general route and inline whole functions, not just simple pattern matched constructors.  The trickiest problem with inlining is reconstructing the stack if the function throws, but I don't think this pattern matching avoids that problem.  Object.prototype could have a scripted setter on 'car' which does who-knows-what.

It would also be nice to know how much of this gain for V8 comes from avoiding the call, and how much comes from pregenerating the shape and avoiding an addprop IC.  In that case, pattern matching like this would ensure no other code can observe the new value before the end of the constructor.
I guess I misread the code. They're not actually inlining the constructor at the call site. They still set up and tear down a stack frame. But from that point on, they execute specialized constructor code.

So that means that all the win comes from addprop optimizations.
This could be done using a similar approach to bug 606477.  That handles {a:0,b:1} by pregenerating the shape and then doing direct slot writes.  This would also need a single shape check on the prototypes --- object initializers do not invoke prototype setters, but assignments in constructors do.

Marking blocked on bug 606477 because this would need some of the machinery there, like letting the Compiler and JITScripts pin objects/shapes against GC.
Depends on: 606477
(In reply to comment #4)
> Marking blocked on bug 606477 because this would need some of the machinery
> there, like letting the Compiler and JITScripts pin objects/shapes against GC.

I was just going to generate a stub that would get invalidated by a GC. I don't think any pinning is needed.
Yeah, using a PIC the script doesn't need to pin anything.
No longer depends on: 606477
Attached patch patch attempt (obsolete) — Splinter Review
This patch adds a new kind of call IC stub for these kinds of simple constructors. It identifies them by bytecode inspection. The stub that gets generated inlines everything, including popping an object off the freelist.

To select the right object size, it starts with the number of fields that get set in the constructor. Before actually compiling a stub, it manually constructs an object and returns it. The first one of these objects is saved. When it finally decides to compile, it checks if this object has had any fields added in the interim. The compiled code makes sure to reserve enough space for these additional slots. This handles a case in v8-splay.

It's hard to microbenchmark this, because performance depends so heavily on the GC. Even with the constructor fast path, we're still a lot slower than V8 because we have a larger heap (to avoid GC) and so we take a lot more cache misses.

The patch improves access-binary-trees by about 2ms, with no other SS changes. On v8-bench, our score goes from 3114 to 3290. EarleyBoyer goes from 5203 to 6869. Splay goes from 2985 to 3598. No other changes. I was hoping for improvements in RayTrace, but we're not able to inline its constructor because it uses the weird this.initialize.apply trick. (All of these are measured methodjit-only).

I was hoping for a bigger win, but I think that the cache miss problem is really killing us. We take 4.25 times as many cache misses as V8. We also touch 1.6 times as much memory, maybe because our values are fatter. I think we also have more fields in our objects.
Assignee: general → wmccloskey
Status: NEW → ASSIGNED
Attachment #488327 - Flags: review?(lw)
This might make a difference for bug 590379 -- realtime raytracer.  That code create massive amounts of vector objects via a constructor:

var V3 = function(x, y, z) {
    this.x = x;
    this.y = y;
    this.z = z;
}
> it uses the weird this.initialize.apply trick. (All of these are measured

Luke is fixing that -- or has fixed already. Or is the problem that you can't see through the layering and consider the initialize code to be "part of the ctor"?

/be
(In reply to comment #9)
> > it uses the weird this.initialize.apply trick. (All of these are measured
> 
> Luke is fixing that -- or has fixed already. Or is the problem that you can't
> see through the layering and consider the initialize code to be "part of the
> ctor"?

Right, it's just a simple syntactic check. There may be something we could do about this, but I'm a little skeptical. I guess Luke would know best though.
I'll have to look at how your patch works more closely, but I think it would be difficult to achieve the "create with pre-computed shape and assign into pre-computed slots" combo with an apply in the middle.

OOC Bill, can you observe v8 doing anything significantly better than us (now that the apply-ic has landed) for this constructor/apply pattern? 

It seems like two optimizations we've previously discussed could come together to speed up constructors in v8-raytrace.  One is bug 547327, to allocate right-sized objects.  The second, which we discussed once at lunch, is to recognize simple linear sequences of property assignments:
 o.x = ...
 o.y = ...
 o.z = ...
where the ... is sufficiently side-effect free (constants, vars, params) and generate a single composite ic that would guard on o's incoming shape, take o directly to its final shape, and then store directly into slots.  Actually, this sounds a lot like bhackett's INITPROP patch (bug 606477)...
Comment on attachment 488327 [details] [diff] [review]
patch attempt

Mostly high level questions/requests:

>     /* Minimum size for dynamically allocated slots. */
>-    static const uint32 SLOT_CAPACITY_MIN = 8;
>+    static const uint32 SLOT_CAPACITY_MIN = 2;

IIUC, this patch only does the num-slots-predicting trick for the simple constructors that you can pattern-match so other constructors still create an object with SLOT_CAPACITY_MIN.  So my question is: assuming SLOT_CAPACITY_MIN was set to 8 for a reason, isn't it a bit premature to set it back to 2?

Next, regarding isSimpleConstructor, it seems like the pattern could be generalized a bit:
 - match additional constants: JSOP_INT32, JSOP_UINT24, JSOP_VOID, JSOP_DOUBLE, JSOP_STRING, JSOP_THIS, JSOP_TRUE, JSOP_FALSE
 - match 'return this' as the last statement

Also, since the constructor is effectively being inlined, would it be hard to tryUpdateSimpleConstructor even when ic.argc != fun->nargs (in CallCompiler::update).

Regarding manualNew and tryUpdateSimpleConstructor: there is a lot of delicate object/property manipulation that makes me uncomfortable.  What I would like to see is some functions factored out of methodjit/ and put in jsobj.{h,cpp} that have a simple explicit interface (say, GetOwnConstructorDataProperty and CreateThisWithGivenConstantProperties) and try as hard as possible to minimize duplication of object/property logic.  An example (albeit a simpler one) is how HasNativeMethod was factored out.  This could be done with a preliminary patch that could be reviewed carefully by brendan or jorendorff.  I only make such a fuss since this is so delicate.  For example, I think there might be a bug atm in tryUpdateSimpleConstructor with using callee->getSlot(protoShape->slot) without checking whether protoShape is a data property (but I could be wrong).

>+/* This function cancels a call from one of the functions above. */
>+void InlineReturn(VMFrame &f);

Should this be JS_FASTCALL ?
Attachment #488327 - Flags: review?(lw)
Thanks, Luke. I'm posting an updated version in two patches. This first patch adds code to jsinterp.cpp and jsobj.cpp to support the inlining. It mostly deals with property access and such. Besides going over this patch, Brendan, could you say something about when slotSpan is well-defined for an object? I'm now using it in the other patch and I'm not sure it's correct.
Attachment #488327 - Attachment is obsolete: true
Attachment #490615 - Flags: review?(brendan)
Attachment #488327 - Flags: review?(brendan)
Attached patch new patchSplinter Review
This patch actually has most of the constructor inlining changes. I think I fixed everything you asked for.

InlineReturn isn't actually a stub, so I don't think it needs to be FASTCALL.
Attachment #490617 - Flags: review?(lw)
Talking this over with Luke, this probably isn't the right time to do this optimization. It adds some pretty tricky, specialized code and I don't think the performance benefit is worth it.

Once we get a better GC, I think a patch like this may become more important. As our cache footprint go down, reducing instruction count, as this patch does, will become more significant.
It's also worth looking at type inference for this (not for 4.0, of course).  Things which needed to be complicated inline caches to handle things like prototype setters can instead generate their code directly, having statically guarded against setters in the prototype.  The script marks its dependencies on this lack of setters, and should the prototype dynamically get a setter in the future, the script will be recompiled.  Examples are jsop_getelem_dense and jsop_setelem_dense in the JM branch (only ICs that have been supplemented with type info), which generate faster code and do so in a more straightforward manner than the ICs.
Brian already fixed this in TI.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: