Open Bug 834614 Opened 11 years ago Updated 2 years ago

Optimize object representation of old objects when using a moving GCs.

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect

Tracking

()

People

(Reporter: nbp, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Attached file Minimal test case.
The attached test case produces 2 identical objects with different shapes.  Instrumenting Bug 813425's attachment 706268 [details] [diff] [review] to highlight the state of the bytecode shape show that we are switching from monomorphic when a flow in the load function, to polymorphic when b flow into the load function.

The strangest thing is that both path are exercising the same code, except that the first run is done in the interpreter and the second one is done under JM / Ion (when setting the constant to 20000).

This might cause us to produce code to handle polymorphic cases when only a monomorphic case should be enough.
This is a TI based optimization where the size class of 'new' objects is adjusted based not just on how many properties they are assigned in the constructor, but also on any properties added after the constructor finishes.  If, after the constructor finishes, we try to grow the slots vector on the 'new' object, we'll reshape the shape for 'new' objects that used the same constructor so that they have more fixed slots.  See the '!obj->hasLazyType() && !oldCount && obj->type()->newScript' branch in JSObject::growSlots.

var ROM = function() {
    this.mapperName = 0;
};
ROM.prototype.load = function() {
    this.header = 0;
    this.vromCount = 50;
    for (var i = 0; i < this.vromCount; i++) ;
};

This optimization is pretty important for both benchmarks and real web pages, in large part because growing slots requires a malloc which is extremely expensive.  It's possible that this will not be as important when we can allocate slot vectors from the nursery, but I think we'll still want it --- nursery slot vectors will still be more expensive to deal with than inline slots, and property accesses on inline slots are more efficient even if a shape guard is required.

We could keep the optimization *and* allow purely monomorphic accesses when the object gets predictable properties shortly after being constructed, once we have a generational GC.  How this could work:

- Allocate the first object in the nursery with fewer fixed slots than required.
- When we try to grow the object's slots, check if the object is in the nursery.
- If so, do a nursery collection, but when promoting the object to the major heap give it a new size class and the same shape as the newScript's updated shape.

This might need some fiddling (it won't work if the first constructed object's slot vector grows multiple times, for example), but if nursery collections are cheap enough then paying that cost one time for the constructor might be worth getting purely monomorphic objects out of it.
Oh, another, simpler approach: if only a few objects are being constructed but they are each accessed many times, we might want to skip the optimization and leave the objects with monomorphic shape.  The slot accesses would still be out of line but the cost of dealing with the dynamic slot vectors would be minimal.  This could be done without needing a moving GC, by adding a counter to TypeObject::newScript and only reshaping the 'new' objects once enough have had their slot vectors grown.  This has risks though in that if we *do* end up reshaping 'new' objects then there will be more objects of the original shape floating around and caches could be more polymorphic as a result.
A third approach is to beef up the definite properties analysis so that it can cope with properties added after the object is constructed.  This would not be so robust so would need to be done in concert with the existing TI optimization, but would see that e.g. load() is called on the ROM objects after construction and before the object escapes to the heap.  Any properties added to the 'this' object during load() are thus definite properties of the object and we'd get the correct slot vector size off the bat, with no reshaping later and (as a potentially large added bonus, see bug 832329 for an example) would not need shape checks at all for accessing those properties.

Which constructor in pdf.js is causing this problem?  crypto.js?  (bug 807391 seems related)  Also, it seems like these shape guard bugs all keep hitting JSNES regressions so we could be having similar problems there.  Do we know which objects the shape guards are important for there?
(In reply to Brian Hackett (:bhackett) from comment #3)
> Which constructor in pdf.js is causing this problem?  crypto.js?  (bug
> 807391 seems related)  Also, it seems like these shape guard bugs all keep
> hitting JSNES regressions so we could be having similar problems there.  Do
> we know which objects the shape guards are important for there?

This test case is a minimized version of JS NES, but based on the monoparent optim that I made for PdfJS, and the poor result that I got on all other benchmarks, I won't be surprised if these are identical issues.  I noticed the same thing on a micro benchmark, where JM forgot the original shape after its IC get purged.

I was thinking of adding a flag on the old shape to flag that any object flowing in with an old shape should be reshaped too.  AFAIU, the current optimization is only working with only one allocation site, but if slots are in identical orders (which is easier to check with shape equality), then we should probably reshape all TypeObject flowing at the same spot, as we don't want to handle non-existing polymorphism.

I don't know if this is a duplicate of bug 807391, but with Bug 813425, this encodes a polymorphic use case which produce a get/set property IC, and thus provide bad performances on Crypto.  Handling multiple properties would be part of another Bug.  Here I just want to address that we have non-existing polymorphism.

I will just add some instrumentation to see if crypto is just a special case of this Bug or if we have a few unknown objects flowing in.

I will open a meta bug for optimizing property accesses, and I will attach my micro benchmark to it.
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #4)
> I don't know if this is a duplicate of bug 807391, but with Bug 813425, this
> encodes a polymorphic use case which produce a get/set property IC, and thus
> provide bad performances on Crypto.  Handling multiple properties would be
> part of another Bug.  Here I just want to address that we have non-existing
> polymorphism.

It is not a dup of 807391. Bug 807391 is not about a two objects that have exactly the same layout.
The following code emulates the problem.

function Test() {
   this.s = 1;
}

function test() {
  var a, b = new Test();
  b.t = 1;

  ...

  for (var i=0; i < 10000000; i++) {
     if (i % 10240 == 0)
       a = new Test();
     foo(a, b);
  }
}

function foo(a, b) { // v8-crypto:431
    a.s++;
    a.t = b.t
    ...
}

In this example variable "a" is only important. And the problem is visible in function "foo". It starts as shape with {s:1} in the interpreter. After the first call to foo, "a.t" gets set. Now the shape is {s:1, t:1}. We compile in JM and see only 1 shape. Eventually we compile with IM and looking at the caches in JM we decide it is a monomorphic call. Just as we enter IM, i%10240 == 0 and a becomes {s:1} and will fail on the shapeguard in the "foo" call we will invalidate...

Solution:
1) Disable looking at JM caches for hints on a per getprop/setprop bases instead of per function bases. The function in v8-crypto is much longer and the body after is a lot bigger and everything in the function can get inlined, because that is truly monomorhpic. It is only the first "a.s++" that ain't.

2) Don't remove MJIT caches and add the entry in the cache... This sounds really painful and eventuelly baseline jit will keep it's information. So this problem should get resolved by baseline jit or at least make it easier to fix.

3) Don't invalidate, but bail the first times this happens. Here we need a metric to decide when we would rather invalidate instead of bailing

4) Instead of disabling MJIT monomorhpic path when first time we see a different shape, we could also create an opcode that does "MShapeGuard + LoadSlot" and that has an OOL path to do the different shape. That would make sense because in MJIT we haven't seen the second shape for 10000 useCount. So it is a path that doesn't happen often...
Blocks: 835306
Checking with the shape::matches function, I obtain the following results where monoparent indicates the number of locations which have changed from a monomorphic case to a monoparent case (so they are still sharing they parent, this is weird as match check that the order is identical), and also the number of locations which have changed from a monoparent to a polymorphic case (which is likely to be caused by changes of of the number of fixed size slots)

Richards: monoparent: 6 polymorph: 6
DeltaBlue: monoparent: 14 polymorph: 14
Crypto: monoparent: 0 polymorph: 0
RayTrace: monoparent: 69 polymorph: 12
EarleyBoyer: monoparent: 0 polymorph: 0
NavierStokes: monoparent: 2 polymorph: 2
PdfJS: monoparent: 4 polymorph: 7
Mandreel: monoparent: 0 polymorph: 0
Gameboy: monoparent: 422 polymorph: 895 <-- Whoa!
CodeLoad: monoparent: 0 polymorph: 0
Box2D: monoparent: 20 polymorph: 7
Score monoparent: 0 polymorph: 0

Sunspider is not affected at all by this reshape case, I haven't checked with kraken, but I don't expect it to appear here.
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #6)
> Checking with the shape::matches function, […]

Doing the same on JSNES, it appear that the second warm-up cause everything to be recompiled with polymorphic accesses instead of monomorphic one.  At this point I will just mentioned that the harness of the JSNES benchmark is badly made as it re-execute init functions which are doing this much trouble by producing polymorphic cases where none are present.
I checked the results obtained if I guard the latest monomorphic shape added, checking the slot names to determine if the property is still monomorphic or not.  It appears that doing so regress Richard & DeltaBlue by a factor of 5, and increase Gameboy by 40%.  Richards and DeltaBlue regression probably means that the old & new objects are still flowing into functions causing recompilations of the shape guards.

I think the best solution for this bug is to avoid heuristics and to use the moving GC to do a re-shape of old objects, such as we can discard old shapes as soon as we get rid of the code which handle these.

In the mean time, I will focus on the optimization of polymorphic shapes, which will be needed and will provide a temporary solution for the reshape issue.

Other solutions are:
- Pushing the counter used to decide if a reshape is needed.
  The problem is that this will increase the number of objects having a non-optimized shape, and if we later do a reshape of objects we should minimize the number of object with a non-optimal shape.

- Using some static analysis ahead of time to estimate the shape-size.
  The problem here would be the complexity of the solution against something which would be optimized over time.  With the future of JavaScript we should expect huge pieces of JavaScript, adding such pre-processing on the entire content of JavaScript files goes against the idea of parsing lazily any JavaScript, as such optimization need to look at every function to be more efficient.  The reshape technique should be good enough, and I don't think we need such extra complexity in our parser.

- Disable hint on a per-function basis.
  Doing so will cause us to use GetProp and SetProp ICs.  These caches they are 4-5 times slower than a shape guard IC (no idea why for the moment) and the fallback is at least 20 times slower than the shape guard fallback.

- Don't remove MJIT cache entry.
  This is more-less what Bug 813425 is doing, except that this emulates what the baseline will provide. The current implementation og Bug 813425 only have one Shape entry.

  Knowing that we can have identical slots with different object representations, I think I will modify Bug 813425 to account for more than a one shape cache such as we can handle reshape of objects.  So this will look a bit similar to the input we should expect from the baseline.

- Bail then invalidate.
  This is again the same question: define « often » (aka Bug 825268).  Currently this is too complex in Ion and this will only solve crypto's issue.  In addition, Bug 813425 alreay remove the eager invalidation caused by shape guards, and it only invalidate if the shape is not a parent of the current shape.

- GuardShape + LoadSlot instruction with OOL for others.
  This sounds like a GetProp IC.
Summary: One unconditional construction path can produce different shapes. → Optimize object representation of old objects when using a moving GCs.
This also bites v8-richards with baseline. See TaskControlBlock:

function TaskControlBlock(link, id, priority, queue, task) {
  this.link = link;
  this.id = id;
  this.priority = priority;
  this.queue = queue;
  this.task = task;
  if (queue == null) {
    this.state = STATE_SUSPENDED;
  } else {
    this.state = STATE_SUSPENDED_RUNNABLE;
  }
}

There are TaskControlBlock objects with two shapes, so the caches are not monomorphic. We can inline polymorphic GETPROPs with a small number of shapes directly, but having a single shape will be faster as we can hoist the shape guard.

Working around this by adding "this.state = 0;" before the "if (queue == null)" statement gives:

m-c before: 14300
m-c after : 16100

baseline before: 14000
baseline after : 16700-17000
Blocks: 807188
Assignee: nicolas.b.pierron → nobody
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: