599214 - JM: Object equality tests are slow

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Description

•

15 years ago

Attached file object equality benchmark — Details

When we do |o1 == o2|, where both vars are objects, we take a stub call in JM. However, the stub just does a pointer comparison. The microbenchmark that is attached exhibits this behavior. V8 runs it about 2x faster than JM. This is important because there are a lot of these tests in the Kraken ai-astar benchmark.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 1

•

15 years ago

Attached patch WIP patch (obsolete) — Details — Splinter Review

This patch adds object equality as an out-of-line fast path. It speeds up the benchmark by about 95%. The patch has two problems. First, it doesn't check obj->getClass()->ext.equality, which is required for correctness. Second, it would be much better to do this as a PIC. I'll keep working on this.

Assignee: general → wmccloskey

Status: NEW → ASSIGNED

Ryan VanderMeulen [:RyanVM]

Updated

•

15 years ago

Assignee: wmccloskey → general

Status: ASSIGNED → NEW

Ryan VanderMeulen [:RyanVM]

Updated

•

15 years ago

Assignee: general → wmccloskey

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 2

•

15 years ago

Nice. JSC does not appear to fast-path this case, and spends 70% of its time in cti_op_eq. v8 inlines the integer case. If this fails, it also generates an OOL path at runtime that effectively inlines the entire stub call (see CompareStub::Generate in ia32/code-stubs-ia32.cc). This approach also seems reasonable, and would definitely be faster than calling our StubEqualityOp. That might be more appealing if a full-blown PIC is too complicated.

Nicholas Nethercote [inactive]

Comment 3

•

15 years ago

Is the speed-up TM-only? Presumably in a TM+JM run all the time is spent in trace-generated code, since astar is dominated by a single small loop.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 4

•

15 years ago

(In reply to comment #3) > Is the speed-up TM-only? Presumably in a TM+JM run all the time is spent in > trace-generated code, since astar is dominated by a single small loop. The speedup is JM-only (I think that's what you meant). The trace tuning heuristics make the "wrong" decision here and say that the loop will not trace well. This is because of all the branchiness in the main loop (see the neighbors method). Rather than fudge this somehow, it's easier just to fix the performance problem in JM. With this patch, the two JITs are close to even.

Mike Shaver (:shaver emeritus)

Comment 5

•

15 years ago

This seems like basically a perfect case for tracing, and there are ways to make it faster still on trace (see bug 579285), so it seems like longer term we want to make the JIT selection pick up these sorts of hot loops.

Nicholas Nethercote [inactive]

Comment 6

•

15 years ago

(In reply to comment #4) > > The speedup is JM-only (I think that's what you meant). Yes, that's what I meant, sorry for any confusion. > The trace tuning > heuristics make the "wrong" decision here and say that the loop will not trace > well. This is because of all the branchiness in the main loop (see the > neighbors method). The benchmark is totally dominated by the performance of the loop in findGraphNode(). If you have one or more calls to a function containing a very tight loop, and those calls are within an outer loop that is very branchy, I would have thought that the inner loops would definitely be traced and the outer loop maybe not...

Boris Zbarsky [:bzbarsky]

Comment 7

•

15 years ago

Bill, what's the neighbors method you mention?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 8

•

15 years ago

(In reply to comment #7) > Bill, what's the neighbors method you mention? I was looking at the main loop, where neighbors is called. I missed the inner loop, because it was getting blacklisted due to a low initial iteration count. I'm not entirely convinced that the inner loop should be traced, but this bug probably isn't the right place to discuss that. I guess I should have posted this patch independent of trace tuning, since it's really just an improvement to the method JIT. Regarding the trace tuning, I'll post an updated patch for that in Bug 580468 soon. Then we can talk about what to do for A*.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 9

•

15 years ago

Attached patch new patch (obsolete) — Details — Splinter Review

Attachment #478847 - Flags: review?(dvander)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 10

•

15 years ago

Comment on attachment 478847 [details] [diff] [review] new patch I fixed the equality case. I didn't have time for the PIC. I think it makes sense to get this in and then do additional work later, if it seems necessary.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Updated

•

15 years ago

Blocks: 580468

David Mandelin [:dmandelin]

Comment 11

•

15 years ago

I just tested this benchmark, extracted from astar, with the new patch: function objEq(a, b) { for (var i = 0; i < 4000; ++i) { if (a == b) break; } } var t0 = new Date; var a = {}, b = {}; for (var i = 0; i < 10000; ++i) objEq(a, b); var t1 = new Date; print('objEq ' + (t1-t0)); Before the patch, the methodjit is 7x slower than the tracer. After the patch, it is still 2x slower. If I change |==| to |===|, then the methodjit is as fast as the tracer was on ==. (The tracer gets much slower when I do that--should I file a bug on that?) I didn't look at this patch in detail, but it seems there is still room to make it faster.

David Mandelin [:dmandelin]

Updated

•

15 years ago

Blocks: JaegerSpeed

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 12

•

15 years ago

(In reply to comment #11) > I didn't look at this patch in detail, but it seems there > is still room to make it faster. Okay, maybe it's time to do the PIC. I'll look into this.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 13

•

15 years ago

Attached patch inline cache version (obsolete) — Details — Splinter Review

Okay, this version uses an inline cache. It always generates the int path. The object and string paths are generated as needed. To avoid issues with TRACE MICs, I made the decision that we should only use inline caching for forward jumps. Otherwise, things would get really complex. And anyway, I doubt there are many loops whose condition is an object or string comparison. (Loops of the form |while (x!=null)| don't count, I think, because null is its own type.) The patch also adds a guard for the equality extension to the tracer, since it was missing. The performance of the IC version on A* doesn't really improve at all over the non-IC version. I guess this isn't too surprising, since we really only avoid one extra branch for the string case. Nevertheless, the code is cleaner, I think. And it will be a lot easier to add additional cases, if needed. I don't see too big a difference between the tracer and the method JIT on Dave's microbenchmark: 113ms for tracer versus 138ms for the method JIT. Something very strange is going on, though. The equality extension guard has no performance impact on the tracer. However, if I take it out of the method JIT, performance on the microbenchmark goes down to 113ms---exactly parallel with the tracer. I tried generating exactly the same assembly ops in the two, but it has no effect. Weird.

Attachment #478152 - Attachment is obsolete: true

Attachment #478847 - Attachment is obsolete: true

Attachment #481656 - Flags: review?(dvander)

Attachment #478847 - Flags: review?(dvander)

Jan de Mooij [:jandem]

Comment 14

•

15 years ago

What about doing this for strict object equality too? V8's earley-boyer calls stubs::StrictEq about 1 million times. It's called 500.000 times with both lhs and rhs objects.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Updated

•

15 years ago

Blocks: 596364

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 15

•

15 years ago

Comment on attachment 481656 [details] [diff] [review] inline cache version Sorry for the late review. As discussed IRL, we should move the HAS_EQUALITY flag set into XML object creation, to avoid adding overhead to the object allocation path. >+void >+mjit::Compiler::passEqualityICAddress(EqualityGenInfo &eic) >+{ >+ eic.addrLabel = stubcc.masm.moveWithPatch(ImmPtr(NULL), Registers::ArgReg1); >+} This only gets used in one place, I'd just inline it. Also a general comment, I've been removing "mic" and "pic" in favor of just "ic" as the catch-all canonical name for inline caches. >-mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slowOne, Jump *slowTwo) >+mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slow) Yay. >+void JS_FASTCALL >+ic::Equality(VMFrame &f, ic::EqualityICInfo *eic) >+{ >+ EqualityCompiler cc(f, *eic); >+ if (!cc.update()) >+ THROW(); >+} It doesn't look like the IC ever disables itself. We probably want to do that. Patch looks great otherwise, should probably review the final changes though.

Attachment #481656 - Flags: review?(dvander) → review+

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 16

•

15 years ago

Comment on attachment 481656 [details] [diff] [review] inline cache version Sorry for the late review. As discussed IRL, we should move the HAS_EQUALITY flag set into XML object creation, to avoid adding overhead to the object allocation path. >+void >+mjit::Compiler::passEqualityICAddress(EqualityGenInfo &eic) >+{ >+ eic.addrLabel = stubcc.masm.moveWithPatch(ImmPtr(NULL), Registers::ArgReg1); >+} This only gets used in one place, I'd just inline it. Also a general comment, I've been removing "mic" and "pic" in favor of just "ic" as the catch-all canonical name for inline caches. >-mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slowOne, Jump *slowTwo) >+mjit::Compiler::jumpAndTrace(Jump j, jsbytecode *target, Jump *slow) Yay. >+void JS_FASTCALL >+ic::Equality(VMFrame &f, ic::EqualityICInfo *eic) >+{ >+ EqualityCompiler cc(f, *eic); >+ if (!cc.update()) >+ THROW(); >+} It doesn't look like the IC ever disables itself. We probably want to do that. Patch looks great otherwise, should probably review the final changes though.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 17

•

15 years ago

Attached patch updated patch (obsolete) — Details — Splinter Review

This resolves the equality extension issue in the way we discussed. It's slightly tricky ensuring that the bit gets set in the right places, but I think I got them all. I also added the ability to disable the IC.

Attachment #481656 - Attachment is obsolete: true

Attachment #482730 - Flags: review?(dvander)

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

15 years ago

Attachment #482730 - Flags: review?(dvander) → review+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 18

•

15 years ago

Attached patch updated patch — Details — Splinter Review

Attachment #482730 - Attachment is obsolete: true

Attachment #484111 - Flags: review?(nnethercote)

Nicholas Nethercote [inactive]

Comment 19

•

15 years ago

Comment on attachment 484111 [details] [diff] [review] updated patch >@@ -112,36 +113,37 @@ static const nanojit::AccSet ACCSET_CX > static const nanojit::AccSet ACCSET_EOS = (1 << 4); > static const nanojit::AccSet ACCSET_ALLOC = (1 << 5); > static const nanojit::AccSet ACCSET_FRAMEREGS = (1 << 6); > static const nanojit::AccSet ACCSET_STACKFRAME = (1 << 7); > static const nanojit::AccSet ACCSET_RUNTIME = (1 << 8); > > // Nb: JSObject::{lastProp,map,flags} don't have an AccSet because they are never accessed on trace Nit: remove ",flags" from that comment. The rest of the tracer-related changes look fine. I didn't look at the methodjit-related changes, I assume previous reviews covered them.

Attachment #484111 - Flags: review?(nnethercote) → review+

Paul Wright

Comment 20

•

15 years ago

Please remember to set the whiteboard and post the changeset to the bug(s) when pushed. Thanks.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 21

•

15 years ago

http://hg.mozilla.org/tracemonkey/rev/81d0ca612cc8

Whiteboard: fixed-in-tracemonkey

Chris Leary [:cdleary] (not checking bugmail)

Comment 22

•

15 years ago

Alas, this broke ARM's trace tests. I'm going to back this change out for the time being so we can work on PICs (backout link to follow). Failing trace tests: -m basic/bug524826-2.js -m basic/bug524826.js -m basic/equalInt.js -m basic/strings.js -m basic/truthies.js -m sunspider/check-date-format-tofte.js -m sunspider/check-string-tagcloud.js -m v8-v5/check-crypto.js -m v8-v5/check-deltablue.js

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 23

•

15 years ago

This patch blocks tracer tuning, which we want to land tomorrow. It might be better to #ifdef POLYIC the new code. The ARM ICs are in-flux anyway.

Chris Leary [:cdleary] (not checking bugmail)

Comment 24

•

15 years ago

(In reply to comment #23) > It might be > better to #ifdef POLYIC the new code. The ARM ICs are in-flux anyway. I don't think I understand what you're saying we should ifdef out -- this causes a regression on ARM with a clean tip and methodjit enabled. I can investigate it, but we shouldn't leave the tree broken on a platform, right?

Chris Leary [:cdleary] (not checking bugmail)

Comment 25

•

15 years ago

Oh, maybe you meant to disable JS_MONOIC on ARM for the time being -- I'll try that and see if everything is copacetic.

Chris Leary [:cdleary] (not checking bugmail)

Comment 26

•

15 years ago

Okay, disabled MONOIC for ARM: http://hg.mozilla.org/tracemonkey/rev/b7c0fd2370c6 See bug 605415 for ARM debugging progress.

Chris Leary [:cdleary] (not checking bugmail)

Comment 27

•

15 years ago

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 28

•

15 years ago

Sorry about this, Chris. I ran the tests you mentioned through valgrind and it comes up clean, so it'll be hard for me to debug it. I'm pretty sure the jumpToStub thing is safe, although admittedly not very clear. Here's the rationale. Inside jsop_equality_int_string, the only way jumpToStub won't be set is if both operands are known integers. In this case, there's no chance we'll ever enter the IC, so linkForIC is never called.

Robert Sayre

Comment 29

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/81d0ca612cc8

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

object equality benchmark 15 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 902 bytes, application/javascript		Details
WIP patch 15 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 12.77 KB, patch		Details \| Diff \| Splinter Review
new patch 15 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 16.27 KB, patch		Details \| Diff \| Splinter Review
inline cache version 15 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 35.27 KB, patch	dvander : review+	Details \| Diff \| Splinter Review
updated patch 15 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 39.45 KB, patch	dvander : review+	Details \| Diff \| Splinter Review
updated patch 15 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 46.77 KB, patch	n.nethercote : review+	Details \| Diff \| Splinter Review