Closed
Bug 561506
Opened 15 years ago
Closed 14 years ago
JM: optimize adding properties with PIC
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: dmandelin, Assigned: bhackett1024)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file, 7 obsolete files)
21.39 KB,
patch
|
dmandelin
:
review+
|
Details | Diff | Splinter Review |
Currently, JM adds properties a bit slower than the interpreter, because it has to use the slow path for that (which means doing what the interpreter does, plus a bit of overhead). It should instead do pretty much what the tracer does for that case.
I've started this out, and reached the point of a WIP that can run a simple microbenchmark and show a speedup there. It doesn't look too hard, but it does require a cross-platform method for calling functions with 3 (or maybe more) arguments, which we don't have yet (we just have our standard stub calls and 1-argument calls).
Reporter | ||
Comment 1•15 years ago
|
||
This looks a little tricky to do well, so I slowed down to take some measurements. First, I measured how long it takes to do a property add in various systems:
JM 63 ns
JM+TM 25 ns
SM 50 ns
JSC 10 ns
V8 47 ns
JM is slower than SM because right now we do a pure slow stub call, after trying the PIC, so it's basically SM+overhead. Note that JSC is super-fast somehow. I think they are doing roughly what we do.
Second, I measured how many slow stub calls we do. I think most of the ones left are for adds, although I don't know for sure. In SunSpider, there are 125k left, basically all in access-binary trees. In V8, there are about 3M left, 2M in earley-boyer, 1M in raytrace, .7M in deltablue, and .2M in splay.
So, if we can make this as fast in JM as on trace, by doing basically the same thing, then we have left about 5 ms of win on SS and 120 ms on V8. I think this is fairly important to get done at some point, but it doesn't have to be high priority, as nothing depends on it and the perf win is modest and predictable. Also, it will be easier and more effective if bug 558451 is done first.
Depends on: 558451
Reporter | ||
Comment 2•15 years ago
|
||
Now some stats on which paths are taken through the fast-case add. I did this by instrumenting the fast case in jsops.cpp. I measured two dimensions: bump slot vs. realloc slots, and extend scope vs. do AddProperty.
slot allocation scope update
program bump realloc extend addprop total
access-binary-trees 126213 0 126213 0 126213
v8-deltablue 537997 109287 647284 0 647284
v8-earley-boyer 2126231 0 2126231 0 2126231
v8-raytrace 1154537 9005 1163542 0 1163542
v8-splay 60350 24471 84821 0 84821
If we just do the common case, we are only leaving 5ms on V8 of win behind, and the fast case should be simpler and faster.
Reporter | ||
Comment 3•15 years ago
|
||
This version fully inlines the common path. It still has some bugs.
Attachment #441226 -
Attachment is obsolete: true
Comment 4•15 years ago
|
||
Great work. We need to get property trees per thread (bug 511591), as adding a property involves taking a lock, which probably accounts for much of our slowdown compared to JSC (confirmation needed).
/be
Comment 5•15 years ago
|
||
Do we need to take a lock in the predictable-shape-evolution case? It seems like, for ST objects, no shared state needs to mutate.
Comment 6•15 years ago
|
||
The lock is for adding to the property tree. We tolerate races to lookup and find nothing, then add, so we tolerate dups, but adding a new tree node itself requires a lock or you can race off the end of a kids-chunk list.
So the GC lock is taking for shortish critical sections when adding, but not when looking. If the property tree node already exists, no locking required. So maybe the locking cost isn't the problem -- hard to say without more data.
/be
Comment 7•15 years ago
|
||
(In reply to comment #6)
> The lock is for adding to the property tree. We tolerate races to lookup and
> find nothing, then add, so we tolerate dups, but adding a new tree node itself
> requires a lock or you can race off the end of a kids-chunk list.
And add two kids chunks list at the same next link, leaking one.
> So the GC lock is tak[en] for shortish critical sections when adding, but not
> when looking. If the property tree node already exists, no locking required. So
> maybe the locking cost isn't the problem -- hard to say without more data.
Dave, can you profile and find out why add costs us, in detail? Thanks,
/be
Reporter | ||
Comment 8•15 years ago
|
||
(In reply to comment #7)
> Dave, can you profile and find out why add costs us, in detail? Thanks,
My stats are for basic shell builds, so locking doesn't apply. Shark says that of the total time within js_AddProperty, 40% of it is for js_IdIsIndex (via js_AddProperty -> JSScope::extend -> JSScope::updateFlags). Otherwise, the time seems to be distributed across the active parts of the function.
There do seem to be a lot of opportunities for upfront specialization for the hot paths. My WIP 2 still has bugs, so it may be slower when it is done, but currently it is showing a time of about 4ns per add.
Another relevant issue here is that js_GetMutableScope can't be practically inlined into JIT code, so I do the slow path for the case where the base object has a shared mutable scope. Of course, we could still get some speedup by calling out to a simplified builtin, but I think it would be nicer to do after bug 558451.
Reporter | ||
Comment 9•15 years ago
|
||
OK, this is it. It passes shell tests. My laptop is no good for perf testing, which I want to do before landing.
Attachment #441675 -
Attachment is obsolete: true
Reporter | ||
Comment 10•15 years ago
|
||
I'm still getting a few test failures on the v8-v4 benchmarks.
Reporter | ||
Comment 11•15 years ago
|
||
Forgot to update the shape in the previous version. Strangely, that version actually passed all of our existing shell tests.
Attachment #441950 -
Attachment is obsolete: true
Reporter | ||
Comment 12•15 years ago
|
||
Attachment #442460 -
Attachment is obsolete: true
Reporter | ||
Comment 13•15 years ago
|
||
After further analysis, I think I'm going to put this on hold for now. Roughly, it looks like this could be done the easy way for a puny <50ms win on V8, or the hard way for a win of up to 150ms. 50ms is hardly big enough to care about, and 150ms for a lot of effort doesn't seem worth it right now. And it appears that there is essentially no effect on SunSpider no matter how we do this.
I'll document what I learned for when we get back to this. In the interpreter, a property add takes about 50 ns:
do prop cache stuff 50 ns
In the tracer, it's about 25 ns, which breaks down something like this:
actually do property add 7 ns
function call overhead 8 ns
call js_IdIsIndex 10 ns
So we can take 3 steps to speed things up:
1. Do it the tracer way instead of the interpreter way. This is relatively easy; it just means compiling two guards and a call to js_AddProperty.
One thing to note is that the tracer guards on JSRuntime::protoHazardShape, but this seems to be kind of unfortunate. In v8-earley-boyer (in a test with PICs), protoHazardShape changes after only 2000 adds or so (out of 2M), invalidating the PIC. It's probably better to check the shapes up the prototype chain instead.
2. Don't call js_IdIsIndex. This call can be pushed up to PIC generation time; basically we can just not generate a fast path if it returns true.
3. Inline js_AddProperty for non-empty, non-table scopes. The case for table scopes can't be inlined, but that case is rare (nonexistent in the benchmarks). Empty scopes are common (1M of the 2M adds in earley-boyer). It might help if we started objects with a mutable scope if they are created for a constructor that does sets (or we otherwise know statically at object creation point that they will be mutated). Then that case could be inlined as well. Even inlining the easy case is a bunch of work--it requires calls to lock/unlock functions and/or js_AllocSlots in the slow cases.
Assignee: dmandelin → general
Reporter | ||
Comment 14•14 years ago
|
||
We lose about 7ms on adding properties in binary-trees. This should become easier once the scope changes land.
Blocks: JaegerSpeed
Comment 15•14 years ago
|
||
This should be pretty easy now. Brian, you game?
/be
Assignee: general → bhackett1024
Assignee | ||
Comment 16•14 years ago
|
||
Sure. Word of warning, I'll be gone next week (last trip of the year). Assuming TM merges into JM today, this should be done and stable before then.
Assignee | ||
Comment 17•14 years ago
|
||
This patch gives me 4ms on SS, and 150ms on V8. Existing tests and trace-tests pass, though this adds another test which breaks, and depends on bug 592412 to figure out the proper fix.
Attachment #442467 -
Attachment is obsolete: true
Comment 18•14 years ago
|
||
I put up a patch for bug 592412, but it changes only the set-not-add (i.e., set of a pre-existing property) path. Is that relevant here?
/be
Assignee | ||
Comment 19•14 years ago
|
||
The trick here is that this PIC makes addprop look like setprop with some extra tests and assignments. Since setprop is currently broken for the joined functions, I don't know yet what the right fix for addprop is. I suspect that both addprop and setprop should slowpath direct assignments of 'function() {}', which will use your patch in bug 592412. But I'm still working through how the code works, e.g. I don't know why this is correct:
function f() {
var x, y;
for (var i = 0; i < 2; i++) {
x = function() { return 0; };
if (i == 0)
y = x;
}
print(x + " " + y);
}
f();
Assignee | ||
Comment 20•14 years ago
|
||
er, make that print(x == y), which is 'false' in TM tip (correctly so).
Is the setprop IC broken for joined functions?
Assignee | ||
Comment 22•14 years ago
|
||
Yes, and so is setprop for the interpreter and tracer.
Comment 23•14 years ago
|
||
Comment on attachment 470981 [details] [diff] [review]
addprop patch
>diff --git a/js/src/jsobj.h b/js/src/jsobj.h
>--- a/js/src/jsobj.h
>+++ b/js/src/jsobj.h
>@@ -325,18 +325,23 @@ struct JSObject {
> OWN_SHAPE = 0x80
> };
>
> enum {
> JS_NSLOTS_BITS = 24,
> JS_NSLOTS_LIMIT = JS_BIT(JS_NSLOTS_BITS)
> };
>
>- uint32 flags: 32-JS_NSLOTS_BITS, /* flags */
>- freeslot: JS_NSLOTS_BITS; /* next free slot in abstract slot space */
>+ union {
>+ struct {
>+ uint32 flags: 32-JS_NSLOTS_BITS, /* flags */
>+ freeslot: JS_NSLOTS_BITS; /* next free slot in abstract slot space */
>+ };
>+ uint32 flagsAndFreeslot;
>+ };
> uint32 objShape;
Nit: line up declarators if possible, at least flagsAndFreeslot and objShape.
Does the union and struct wrapping lose packing in 64 bits on any targets? If so, I say use big fields and bit twiddling. BUT:
I want to get rid of freeslot. I'm trying to file a bug on it (Minefield crashed and lost my work :-(). There, bug 592556. Want me to prioritize it and make this bug depend on it?
/be
dmandelin came up with a solution for the flags/freeslot bitfield in the shape merge. see JSObject::flagsOffset()
Assignee | ||
Comment 25•14 years ago
|
||
(In reply to comment #24)
> dmandelin came up with a solution for the flags/freeslot bitfield in the shape
> merge. see JSObject::flagsOffset()
Wow, that is something.
Getting rid of freeslot would be good, especially if it's not just an Imm32 write to update the flags/freeslot (i.e. two objects with the same shape have the same flags/freeslot. I know this doesn't hold for DELEGATE, just realized that is partly broken in this patch. Are there other JSObject::flags which could differ between two objects with the same shape?)
I'd be surprised if the flagsAndFreeSlot union changed the struct layout on a 64-bit target (union is between two 32-bit quantities), but I haven't tested.
Comment 26•14 years ago
|
||
JSObject::flagsOffset is clearly IS_LITTLE_ENDIAN-only -- please ifdef it with a #error in the #else clause.
DELEGATE and SYSTEM look to me (and I should know :-) like the flag bits that can change without any necessary shape change. However, SYSTEM may be saved because of disjoint system/non-system prototypes, and I hope SYSTEM is going away with compartments (mrbkap would know).
/be
Comment 27•14 years ago
|
||
(In reply to comment #19)
> The trick here is that this PIC makes addprop look like setprop with some extra
> tests and assignments. Since setprop is currently broken for the joined
> functions, I don't know yet what the right fix for addprop is.
I would separate cases since a program location is usually either adding or setting, rarely doing both depending on object -- and if so, the object shapes for the two cases will be disjoint sets.
> I suspect that
> both addprop and setprop should slowpath direct assignments of 'function() {}',
> which will use your patch in bug 592412. But I'm still working through how the
> code works, e.g. I don't know why this is correct:
>
> function f() {
> var x, y;
> for (var i = 0; i < 2; i++) {
> x = function() { return 0; };
> if (i == 0)
> y = x;
> }
> print(x + " " + y);
> }
> f();
The joined function object optimization applies only to setprop (not to assignment to an unqualified name, whether SETLOCAL or other) and initprop. See the JSOP_SETMETHOD and JSOP_INITMETHOD bytecodes. This tier of specialization is done by the compiler, but other tiers (funobj is a null closure with the right parent, obj->canHaveMethodBarrier()) await runtime.
/be
Assignee | ||
Comment 28•14 years ago
|
||
This fixes the addprop IC to work correctly on SETMETHOD, and disables the setprop IC for SETMETHOD on a non-method shape, so that when bug 592412 is fixed JM should be fixed too. The addprop IC is also fixed to always guard on the flagsAndFreeslot value, to catch delegate/system objects slipping in (this guard could be removed if making an object delegate/system were to change its shape).
I tested the flagsAndFreeslot thing and found no effect on the offsets/size of JSObject in either x86 or x64 on OS X. JSObject::flagsOffset isn't enough for this patch, as the addprop IC needs to be able to read the combined value for guarding on.
I also tested the effect of purging PICs at every GC (as this patch does). On V8 this affects the time by maybe 2-4ms, i.e. not worth bothering to improve right now.
Attachment #470981 -
Attachment is obsolete: true
Attachment #471201 -
Flags: review?(dmandelin)
Reporter | ||
Comment 29•14 years ago
|
||
Comment on attachment 471201 [details] [diff] [review]
updated patch
>diff --git a/js/src/jsobj.h b/js/src/jsobj.h
>--- a/js/src/jsobj.h
>+++ b/js/src/jsobj.h
>- uint32 flags: 32-JS_NSLOTS_BITS, /* flags */
>- freeslot: JS_NSLOTS_BITS; /* next free slot in abstract slot space */
>+ union {
>+ struct {
>+ uint32 flags: 32-JS_NSLOTS_BITS, /* flags */
>+ freeslot: JS_NSLOTS_BITS; /* next free slot in abstract slot space */
>+ };
>+ uint32 flagsAndFreeslot;
>+ };
> uint32 objShape; /* copy of lastProp->shape, or override if different */
This is clearly better than the search-for-the-bit address computation. Can you change that function to use offsetof(JSObject, flagsAndFreeslot) and see if the getelem stub generator is OK with that?
>diff --git a/js/src/methodjit/Compiler.cpp b/js/src/methodjit/Compiler.cpp
>--- a/js/src/methodjit/Compiler.cpp
>+++ b/js/src/methodjit/Compiler.cpp
>- if (pics[i].kind == ic::PICInfo::SET) {
>+ if (pics[i].kind == ic::PICInfo::SET ||
>+ pics[i].kind == ic::PICInfo::SETMETHOD) {
I think the repeated checks for a kind of SET or SETMETHOD would be better as a new "isSet" or "isSetKind" method on the pic class. But I don't want to delay this patch over such a minor issue, so change that only if it is quick.
Also, the generateStub function is getting too long (probably was too long before this patch anyway), but I don't want to delay over that either.
>diff --git a/js/src/methodjit/PolyIC.cpp b/js/src/methodjit/PolyIC.cpp
>--- a/js/src/methodjit/PolyIC.cpp
>+++ b/js/src/methodjit/PolyIC.cpp
>+ for (size_t ind = 0; ind < chainLength; ind++)
>+ masm.loadPtr(Address(pic.shapeReg, offsetof(JSObject, proto)), pic.shapeReg);
|i| is the SM canonical variable name here.
>+ if (pic.kind == ic::PICInfo::SETMETHOD) {
>+ /*
>+ * Guard that the value is equal to the shape's method.
>+ * We already know it is a function, so test the payload.
>+ * REVIEW: is this necessary? This mismatch does not seem to
>+ * occur anywhere in tests/trace-tests.
>+ */
>+ JS_ASSERT(shape->isMethod());
>+ JSObject *funobj = &shape->methodObject();
>+ if (pic.u.vr.isConstant) {
>+ JS_ASSERT(funobj == &Valueify(pic.u.vr.u.v).toObject());
>+ } else {
>+ Jump mismatchedFunction =
>+ masm.branchPtr(Assembler::NotEqual, pic.u.vr.u.s.data, ImmPtr(funobj));
>+ if (!slowExits.append(mismatchedFunction))
>+ return false;
>+ }
>+ }
I'm not entirely sure, but I think JSOP_SETMETHOD is generated only for statements of the form:
[object expr].m = [function expr]
where the function doesn't need to be cloned (at least not right away). So, I think isConstant will always be true, so all we really need is the assertion.
Might want to check with brendan on that.
>+ /* Check capacity. */
>+ masm.load32(Address(pic.shapeReg, -sizeof(Value)), pic.shapeReg);
>+ Jump overCapacity = masm.branch32(Assembler::LessThanOrEqual, pic.shapeReg,
>+ Imm32(shape->slot));
>+ if (!slowExits.append(overCapacity))
>+ return false;
I think here you need to use loadPrivate instead of load32: the capacity value is a private uint32.
>+ /* REVIEW: object locking not necessary for natives, right? */
>+ if (!obj->ensureClassReservedSlots(f.cx))
>+ return false;
Correct.
Attachment #471201 -
Flags: review?(dmandelin)
Comment 30•14 years ago
|
||
(In reply to comment #29)
> |i| is the SM canonical variable name here.
Thanks -- canonical names are important for grep'ability and consistency. Squash any other |ind| deviations, rs=me.
> >+ if (pic.kind == ic::PICInfo::SETMETHOD) {
> >+ /*
> >+ * Guard that the value is equal to the shape's method.
> >+ * We already know it is a function, so test the payload.
> >+ * REVIEW: is this necessary? This mismatch does not seem to
> >+ * occur anywhere in tests/trace-tests.
> >+ */
> >+ JS_ASSERT(shape->isMethod());
> >+ JSObject *funobj = &shape->methodObject();
> >+ if (pic.u.vr.isConstant) {
> >+ JS_ASSERT(funobj == &Valueify(pic.u.vr.u.v).toObject());
> >+ } else {
> >+ Jump mismatchedFunction =
> >+ masm.branchPtr(Assembler::NotEqual, pic.u.vr.u.s.data, ImmPtr(funobj));
> >+ if (!slowExits.append(mismatchedFunction))
> >+ return false;
> >+ }
> >+ }
>
> I'm not entirely sure, but I think JSOP_SETMETHOD is generated only for
> statements of the form:
>
> [object expr].m = [function expr]
>
> where the function doesn't need to be cloned (at least not right away).
Correct, although runtime checking is required since [object expr] can't be proven to be well-behaved (have no magic setter, e.g., that copies the joined function object).
> So, I think isConstant will always be true, so all we really need is the
> assertion.
The function is definitely constant -- it comes from JSOP_LAMBDA and it has not been unjoined (cloned), as proven by shape->isMethod().
/be
Assignee | ||
Comment 31•14 years ago
|
||
(In reply to comment #30)
> Correct, although runtime checking is required since [object expr] can't be
> proven to be well-behaved (have no magic setter, e.g., that copies the joined
> function object).
The shape guards should cover this, right? For addprop we know there's no such property in the object, and no scripted setter in the prototype chain.
Comment 32•14 years ago
|
||
(In reply to comment #31)
> (In reply to comment #30)
> > Correct, although runtime checking is required since [object expr] can't be
> > proven to be well-behaved (have no magic setter, e.g., that copies the joined
> > function object).
>
> The shape guards should cover this, right? For addprop we know there's no such
> property in the object, and no scripted setter in the prototype chain.
Yes, sorry I didn't confirm this. My comment was about setprop vs. initprop. We won't fill the relevant cache with the joined function object in the first place if the runtime checks fail, we'll clone and fill differently (different shapes). So shape guards suffice.
/be
Assignee | ||
Comment 33•14 years ago
|
||
This removes the check on the incoming object in SETMETHOD adds. It also adds a few more disables in update().
Attachment #471201 -
Attachment is obsolete: true
Attachment #471280 -
Flags: review?(dmandelin)
Assignee | ||
Comment 34•14 years ago
|
||
(In reply to comment #29)
> I think here you need to use loadPrivate instead of load32: the capacity value
> is a private uint32.
Private uint32s and pointers are represented differently on 32 and 64 bit systems. On 64 bits there is an extra shift for pointers, but not for uint32s. loadPrivate is doing the pointer version, so I think load32 is correct here (there needs to be a payloadOf though, will fix that).
Reporter | ||
Updated•14 years ago
|
Attachment #471280 -
Flags: review?(dmandelin) → review+
Reporter | ||
Comment 35•14 years ago
|
||
(In reply to comment #34)
> (In reply to comment #29)
> > I think here you need to use loadPrivate instead of load32: the capacity value
> > is a private uint32.
>
> Private uint32s and pointers are represented differently on 32 and 64 bit
> systems. On 64 bits there is an extra shift for pointers, but not for uint32s.
> loadPrivate is doing the pointer version, so I think load32 is correct here
> (there needs to be a payloadOf though, will fix that).
I'm confused about the question now, because the updated patch does use loadPrivate. In any case, the only important considerations are:
- That there is some abstraction layer involved in reading out the capacity in dslots[-1] to protect against changes, and
- The code for doing that in the addprop stubs uses the same idiom as the code that reads dslots[-1] in JSOP_GETFCSLOT in Compiler.cpp.
Anything with those properties is good.
Assignee | ||
Comment 36•14 years ago
|
||
(In reply to comment #35)
> I'm confused about the question now, because the updated patch does use
> loadPrivate. In any case, the only important considerations are:
>
> - That there is some abstraction layer involved in reading out the capacity in
> dslots[-1] to protect against changes, and
> - The code for doing that in the addprop stubs uses the same idiom as the code
> that reads dslots[-1] in JSOP_GETFCSLOT in Compiler.cpp.
>
> Anything with those properties is good.
Oops, that shouldn't be loadPrivate. I don't see anywhere GETFCSLOT loads from dslots[-1], nor anywhere else in JM. Capacity checks on dense arrays are accessing a private uint32, and those just use masm.payloadOf.
Reporter | ||
Comment 37•14 years ago
|
||
(In reply to comment #36)
> (In reply to comment #35)
> > I'm confused about the question now, because the updated patch does use
> > loadPrivate. In any case, the only important considerations are:
> >
> > - That there is some abstraction layer involved in reading out the capacity in
> > dslots[-1] to protect against changes, and
> > - The code for doing that in the addprop stubs uses the same idiom as the code
> > that reads dslots[-1] in JSOP_GETFCSLOT in Compiler.cpp.
> >
> > Anything with those properties is good.
>
> Oops, that shouldn't be loadPrivate. I don't see anywhere GETFCSLOT loads from
> dslots[-1], nor anywhere else in JM. Capacity checks on dense arrays are
> accessing a private uint32, and those just use masm.payloadOf.
You're right. GETFCSLOT is doing something else; the dense array check is the right thing to match.
Assignee | ||
Comment 38•14 years ago
|
||
Tried landing this but backed out as compilers that aren't gcc-4.2 or msvc don't like anonymous structures. So back to the old JSObject::flagsOffset (at least until bug 592556 lands). There were a couple assert failures too, so retrying on the try server.
Assignee | ||
Comment 39•14 years ago
|
||
Last tryserver push didn't crash but timed out on a few mochitests. Disabling the pic when overriding a prototype property fixes things locally so will tryserver that, but this should be revisited
(10ms or so on v8-splay).
Assignee | ||
Comment 40•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/fc7630d987f8
Fixed a stupid bug in the patch where objects with addProperty hooks weren't filtered out, and a trickier bug in existing code where resetting a PIC could use the wrong slow path address with insufficient stack syncing, leading to later incorrect behavior.
For JM on AWFY this looks like 7ms on SS and 240ms on V8, for JM+TM 3ms and 180ms.
Assignee | ||
Comment 41•14 years ago
|
||
Backing this out because of intermittent Talos failures (don't show up on tryserver) which I don't know how to debug. If someone wants to pick this up feel free, otherwise I'll look at it again next week.
Relanded as http://hg.mozilla.org/tracemonkey/rev/9a8b156c7396 and seems to have stuck. I suspect bug 593918 was the problem.
Whiteboard: fixed-in-tracemonkey
Comment 43•14 years ago
|
||
Yay! Now, JSObject::freeslot is *mine* (bug 592556).
/be
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•