Closed
Bug 599214
Opened 15 years ago
Closed 15 years ago
JM: Object equality tests are slow
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: billm, Assigned: billm)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(2 files, 4 obsolete files)
|
902 bytes,
application/javascript
|
Details | |
|
46.77 KB,
patch
|
n.nethercote
:
review+
|
Details | Diff | Splinter Review |
When we do |o1 == o2|, where both vars are objects, we take a stub call in JM. However, the stub just does a pointer comparison. The microbenchmark that is attached exhibits this behavior. V8 runs it about 2x faster than JM. This is important because there are a lot of these tests in the Kraken ai-astar benchmark.
| Assignee | ||
Comment 1•15 years ago
|
||
This patch adds object equality as an out-of-line fast path. It speeds up the benchmark by about 95%.
The patch has two problems. First, it doesn't check obj->getClass()->ext.equality, which is required for correctness. Second, it would be much better to do this as a PIC. I'll keep working on this.
Assignee: general → wmccloskey
Status: NEW → ASSIGNED
Updated•15 years ago
|
Assignee: wmccloskey → general
Status: ASSIGNED → NEW
Updated•15 years ago
|
Assignee: general → wmccloskey
Nice. JSC does not appear to fast-path this case, and spends 70% of its time in cti_op_eq.
v8 inlines the integer case. If this fails, it also generates an OOL path at runtime that effectively inlines the entire stub call (see CompareStub::Generate in ia32/code-stubs-ia32.cc). This approach also seems reasonable, and would definitely be faster than calling our StubEqualityOp.
That might be more appealing if a full-blown PIC is too complicated.
Comment 3•15 years ago
|
||
Is the speed-up TM-only? Presumably in a TM+JM run all the time is spent in trace-generated code, since astar is dominated by a single small loop.
| Assignee | ||
Comment 4•15 years ago
|
||
(In reply to comment #3)
> Is the speed-up TM-only? Presumably in a TM+JM run all the time is spent in
> trace-generated code, since astar is dominated by a single small loop.
The speedup is JM-only (I think that's what you meant). The trace tuning heuristics make the "wrong" decision here and say that the loop will not trace well. This is because of all the branchiness in the main loop (see the neighbors method). Rather than fudge this somehow, it's easier just to fix the performance problem in JM. With this patch, the two JITs are close to even.
Comment 5•15 years ago
|
||
This seems like basically a perfect case for tracing, and there are ways to make it faster still on trace (see bug 579285), so it seems like longer term we want to make the JIT selection pick up these sorts of hot loops.
Comment 6•15 years ago
|
||
(In reply to comment #4)
>
> The speedup is JM-only (I think that's what you meant).
Yes, that's what I meant, sorry for any confusion.
> The trace tuning
> heuristics make the "wrong" decision here and say that the loop will not trace
> well. This is because of all the branchiness in the main loop (see the
> neighbors method).
The benchmark is totally dominated by the performance of the loop in findGraphNode(). If you have one or more calls to a function containing a very tight loop, and those calls are within an outer loop that is very branchy, I would have thought that the inner loops would definitely be traced and the outer loop maybe not...
Comment 7•15 years ago
|
||
Bill, what's the neighbors method you mention?
| Assignee | ||
Comment 8•15 years ago
|
||
(In reply to comment #7)
> Bill, what's the neighbors method you mention?
I was looking at the main loop, where neighbors is called. I missed the inner loop, because it was getting blacklisted due to a low initial iteration count. I'm not entirely convinced that the inner loop should be traced, but this bug probably isn't the right place to discuss that. I guess I should have posted this patch independent of trace tuning, since it's really just an improvement to the method JIT.
Regarding the trace tuning, I'll post an updated patch for that in Bug 580468 soon. Then we can talk about what to do for A*.
| Assignee | ||
Comment 9•15 years ago
|
||
Attachment #478847 -
Flags: review?(dvander)
| Assignee | ||
Comment 10•15 years ago
|
||
Comment on attachment 478847 [details] [diff] [review]
new patch
I fixed the equality case. I didn't have time for the PIC. I think it makes sense to get this in and then do additional work later, if it seems necessary.
Comment 11•15 years ago
|
||
I just tested this benchmark, extracted from astar, with the new patch:
function objEq(a, b) {
for (var i = 0; i < 4000; ++i) {
if (a == b)
break;
}
}
var t0 = new Date;
var a = {}, b = {};
for (var i = 0; i < 10000; ++i)
objEq(a, b);
var t1 = new Date;
print('objEq ' + (t1-t0));
Before the patch, the methodjit is 7x slower than the tracer. After the patch, it is still 2x slower. If I change |==| to |===|, then the methodjit is as fast as the tracer was on ==. (The tracer gets much slower when I do that--should I file a bug on that?) I didn't look at this patch in detail, but it seems there is still room to make it faster.
Updated•15 years ago
|
Blocks: JaegerSpeed
| Assignee | ||
Comment 12•15 years ago
|
||
(In reply to comment #11)
> I didn't look at this patch in detail, but it seems there
> is still room to make it faster.
Okay, maybe it's time to do the PIC. I'll look into this.
| Assignee | ||
Comment 13•15 years ago
|
||
Okay, this version uses an inline cache. It always generates the int path. The object and string paths are generated as needed. To avoid issues with TRACE MICs, I made the decision that we should only use inline caching for forward jumps. Otherwise, things would get really complex. And anyway, I doubt there are many loops whose condition is an object or string comparison. (Loops of the form |while (x!=null)| don't count, I think, because null is its own type.)
The patch also adds a guard for the equality extension to the tracer, since it was missing.
The performance of the IC version on A* doesn't really improve at all over the non-IC version. I guess this isn't too surprising, since we really only avoid one extra branch for the string case. Nevertheless, the code is cleaner, I think. And it will be a lot easier to add additional cases, if needed.
I don't see too big a difference between the tracer and the method JIT on Dave's microbenchmark: 113ms for tracer versus 138ms for the method JIT. Something very strange is going on, though. The equality extension guard has no performance impact on the tracer. However, if I take it out of the method JIT, performance on the microbenchmark goes down to 113ms---exactly parallel with the tracer. I tried generating exactly the same assembly ops in the two, but it has no effect. Weird.
Attachment #478152 -
Attachment is obsolete: true
Attachment #478847 -
Attachment is obsolete: true
Attachment #481656 -
Flags: review?(dvander)
Attachment #478847 -
Flags: review?(dvander)
Comment 14•15 years ago
|
||
What about doing this for strict object equality too? V8's earley-boyer calls stubs::StrictEq about 1 million times. It's called 500.000 times with both lhs and rhs objects.
Comment on attachment 481656 [details] [diff] [review]
inline cache version
Sorry for the late review.
As discussed IRL, we should move the HAS_EQUALITY flag set into XML object creation, to avoid adding overhead to the object allocation path.
>+void
>+mjit::Compiler::passEqualityICAddress(EqualityGenInfo &eic)
>+{
>+ eic.addrLabel = stubcc.masm.moveWithPatch(ImmPtr(NULL), Registers::ArgReg1);
>+}
This only gets used in one place, I'd just inline it.
Also a general comment, I've been removing "mic" and "pic" in favor of just "ic" as the catch-all canonical name for inline caches.
>-mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slowOne, Jump *slowTwo)
>+mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slow)
Yay.
>+void JS_FASTCALL
>+ic::Equality(VMFrame &f, ic::EqualityICInfo *eic)
>+{
>+ EqualityCompiler cc(f, *eic);
>+ if (!cc.update())
>+ THROW();
>+}
It doesn't look like the IC ever disables itself. We probably want to do that.
Patch looks great otherwise, should probably review the final changes though.
Attachment #481656 -
Flags: review?(dvander) → review+
Comment on attachment 481656 [details] [diff] [review]
inline cache version
Sorry for the late review.
As discussed IRL, we should move the HAS_EQUALITY flag set into XML object creation, to avoid adding overhead to the object allocation path.
>+void
>+mjit::Compiler::passEqualityICAddress(EqualityGenInfo &eic)
>+{
>+ eic.addrLabel = stubcc.masm.moveWithPatch(ImmPtr(NULL), Registers::ArgReg1);
>+}
This only gets used in one place, I'd just inline it.
Also a general comment, I've been removing "mic" and "pic" in favor of just "ic" as the catch-all canonical name for inline caches.
>-mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slowOne, Jump *slowTwo)
>+mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slow)
Yay.
>+void JS_FASTCALL
>+ic::Equality(VMFrame &f, ic::EqualityICInfo *eic)
>+{
>+ EqualityCompiler cc(f, *eic);
>+ if (!cc.update())
>+ THROW();
>+}
It doesn't look like the IC ever disables itself. We probably want to do that.
Patch looks great otherwise, should probably review the final changes though.
| Assignee | ||
Comment 17•15 years ago
|
||
This resolves the equality extension issue in the way we discussed. It's slightly tricky ensuring that the bit gets set in the right places, but I think I got them all.
I also added the ability to disable the IC.
Attachment #481656 -
Attachment is obsolete: true
Attachment #482730 -
Flags: review?(dvander)
Updated•15 years ago
|
Attachment #482730 -
Flags: review?(dvander) → review+
| Assignee | ||
Comment 18•15 years ago
|
||
Attachment #482730 -
Attachment is obsolete: true
Attachment #484111 -
Flags: review?(nnethercote)
Comment 19•15 years ago
|
||
Comment on attachment 484111 [details] [diff] [review]
updated patch
>@@ -112,36 +113,37 @@ static const nanojit::AccSet ACCSET_CX
> static const nanojit::AccSet ACCSET_EOS = (1 << 4);
> static const nanojit::AccSet ACCSET_ALLOC = (1 << 5);
> static const nanojit::AccSet ACCSET_FRAMEREGS = (1 << 6);
> static const nanojit::AccSet ACCSET_STACKFRAME = (1 << 7);
> static const nanojit::AccSet ACCSET_RUNTIME = (1 << 8);
>
> // Nb: JSObject::{lastProp,map,flags} don't have an AccSet because they are never accessed on trace
Nit: remove ",flags" from that comment.
The rest of the tracer-related changes look fine. I didn't look at the methodjit-related changes, I assume previous reviews covered them.
Attachment #484111 -
Flags: review?(nnethercote) → review+
Comment 20•15 years ago
|
||
Please remember to set the whiteboard and post the changeset to the bug(s) when pushed. Thanks.
| Assignee | ||
Comment 21•15 years ago
|
||
Whiteboard: fixed-in-tracemonkey
Comment 22•15 years ago
|
||
Alas, this broke ARM's trace tests. I'm going to back this change out for the time being so we can work on PICs (backout link to follow).
Failing trace tests:
-m basic/bug524826-2.js
-m basic/bug524826.js
-m basic/equalInt.js
-m basic/strings.js
-m basic/truthies.js
-m sunspider/check-date-format-tofte.js
-m sunspider/check-string-tagcloud.js
-m v8-v5/check-crypto.js
-m v8-v5/check-deltablue.js
This patch blocks tracer tuning, which we want to land tomorrow. It might be better to #ifdef POLYIC the new code. The ARM ICs are in-flux anyway.
Comment 24•15 years ago
|
||
(In reply to comment #23)
> It might be
> better to #ifdef POLYIC the new code. The ARM ICs are in-flux anyway.
I don't think I understand what you're saying we should ifdef out -- this causes a regression on ARM with a clean tip and methodjit enabled. I can investigate it, but we shouldn't leave the tree broken on a platform, right?
Comment 25•15 years ago
|
||
Oh, maybe you meant to disable JS_MONOIC on ARM for the time being -- I'll try that and see if everything is copacetic.
Comment 26•15 years ago
|
||
Okay, disabled MONOIC for ARM: http://hg.mozilla.org/tracemonkey/rev/b7c0fd2370c6
See bug 605415 for ARM debugging progress.
Comment 27•15 years ago
|
||
I'm confused about how |ic.jumpToStub.executableAddress()| can be safely used unconditionally in |EqualityCompiler::linkForIC| when it is conditionally initialized in |Compiler::jsop_equality_int_string| and, subsequently, |Compiler::finishThisUp|. I might figure it out as I probe more, but having a recommended comment to throw into my commit would be appreciated in any case!
| Assignee | ||
Comment 28•15 years ago
|
||
Sorry about this, Chris. I ran the tests you mentioned through valgrind and it comes up clean, so it'll be hard for me to debug it.
I'm pretty sure the jumpToStub thing is safe, although admittedly not very clear. Here's the rationale. Inside jsop_equality_int_string, the only way jumpToStub won't be set is if both operands are known integers. In this case, there's no chance we'll ever enter the IC, so linkForIC is never called.
Comment 29•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•