Heuristics make us fall into JM on the attached testcases, where it's over 3x slower than tracing

Status

RESOLVED FIXED
Component: Core :: JavaScript Engine

People

(Reporter: bz, Assigned: billm)

Tracking

(Blocks: 1 bug, {regression})

Version: Trunk
Platform: x86 Mac OS X
Keywords: regression
Points: ---
Firefox Tracking Flags: blocking2.0 betaN+
Whiteboard: [jsperf] fixed-in-tracemonkey

Attachments

(3 attachments)

The testcases linked to from bug 606648 end up in JM with profiling turned on.  But they're much faster when traced.
These are apparently based on some Peacekeeper code.
Blocks: 499198
Hmm.  So we apparently blacklist the hot loops in processBuffers because we think the inner one has a very high branching factor.  Which it technically does, because each get() call has an if/else in it.  But in practice, the loop conditions on that loop make sure only one of the branches is taken so this traces well.
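A hypothetical sketch of the pattern described above (not the actual testcase; `makeBuffer`, `get`, and `processBuffers` here are illustrative names): the `if/else` inside `get()` inflates the inner loop's apparent branching factor, but the caller's loop bounds guarantee that only one side of the branch is ever taken, so the loop traces as effectively straight-line code.

```javascript
// Illustrative only: get() contains an if/else, so a naive branch
// count makes the inner loop look highly branchy.
function makeBuffer(n) {
  const data = new Array(2 * n).fill(0);
  function get(i) {
    if (i < n) {
      return data[i];       // only this side runs in the loop below
    } else {
      return data[i - n];
    }
  }
  return { get, n };
}

function processBuffers(buf) {
  let sum = 0;
  // The loop condition restricts i to [0, n), so get() always takes
  // the first branch; in practice the trace never diverges.
  for (let i = 0; i < buf.n; i++) {
    sum += buf.get(i);
  }
  return sum;
}

const buf = makeBuffer(4);
console.log(processBuffers(buf)); // 0: the buffer is zero-filled
```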

Note that tracer is 1.5-2x faster than V8 on these testcases, btw; JM is about 1.5-2x slower than V8...
blocking2.0: --- → ?
Keywords: regression
Whiteboard: [jsperf]
Created attachment 485863 [details] [diff] [review]
patch

This patch tries to treat the return value of typeof as an approximate constant, meaning that we expect it to be the same most of the time. Then whenever there's a branch on an approximate constant, the profiler ignores it.
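A hypothetical example of the kind of code this heuristic targets (`sumValues` is an illustrative name, not from the patch): in hot monomorphic code, typeof yields the same string on nearly every iteration, so a branch on it almost always goes the same way and shouldn't count against the loop's branchiness.

```javascript
// Illustrative: this branch looks like a real two-way split to a naive
// branch counter, but if arr holds only numbers, typeof v is "number"
// on every iteration, so the branch is approximately constant.
function sumValues(arr) {
  let sum = 0;
  for (const v of arr) {
    if (typeof v === "number") {
      sum += v;
    } else {
      sum += Number(v);
    }
  }
  return sum;
}

console.log(sumValues([1, 2, 3]));   // 6
console.log(sumValues([1, "2", 3])); // 6
```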

I also cleaned up the short loop tracking to use this new code.

And I fixed a small problem with handling of JSOP_AND/JSOP_OR, where the profiler was finding more branchiness than was actually there.

With the patch, the unmodified Peacekeeper benchmark is still a little slower than with -j alone. I'm not sure if this is profiling overhead or if some other loop ought to be traced. Regardless, we beat v8, and performance is pretty satisfactory.

I'm going to test these changes on SS/V8 to make sure there are no regressions.
Assignee: general → wmccloskey
Status: NEW → ASSIGNED
Attachment #485863 - Flags: review?(bzbarsky)
No regressions.
Comment on attachment 485863 [details] [diff] [review]
patch

I'm not sure I'm competent to review this, at least with quick turnaround; dmandelin may be a better bet, since he reviewed bug 580468.

That said, passing "false" as the third arg of the StackValue ctor seems wrong.  

Also, maybe there should actually be two ctors (one taking just isConst for no value and one taking isConst and the value for cases with a value; setting hasValue can just be done based on which ctor is used).
Attachment #485863 - Flags: review?(bzbarsky) → review?(dmandelin)
We may also, of course, want a JM bug on it being slower than V8 here.  Davids, let me know if you want me to file that.
Hmm.  I just tried the attached patch, and I see 300ms and 545ms on my machine on the two testcases in this bug, respectively.

If I disable JM or profiling, I see 230ms and 190ms.

So there's pretty noticeable slowdown from jm+profiling on the first testcase, and a very large slowdown on the second.  In fact, the 545ms time is only a little bit faster than the pure-JM time on the second testcase....

It's good to hear that the original benchmark is fast now, but do we need a separate bug on the second testcase here?
(In reply to comment #9)
> It's good to hear that the original benchmark is fast now, but do we need a
> separate bug on the second testcase here?

Bill, how do you want to do it? Do you want to check that out before I review this patch, or do you want me to go ahead and review this now?
(In reply to comment #10)
> (In reply to comment #9)
> > It's good to hear that the original benchmark is fast now, but do we need a
> > separate bug on the second testcase here?
> 
> Bill, how do you want to do it? Do you want to check that out before I review
> this patch, or do you want me to go ahead and review this now?

Sorry, I should have been more clear that the patch is intended to fix only the first testcase. The second one (which is not part of any benchmark suite, IIUC) isn't addressed, and it would be harder to fix.

I think it's ready for review, Dave, although I'll probably add a separate constructor as Boris suggested.
OK, I filed bug 607201 on the second testcase (which I think we should try not to regress, if we can: it's closer to how people actually write code than that benchmark is!) and bug 607202 on JM being Really Slow here in general.
Comment on attachment 485863 [details] [diff] [review]
patch

The patch looks fine to me, modulo the separate constructor. The only thing I'm not sure about is that this kind of duplicates the bytecode analyzer. But we throw away the results from that after compilation, and recomputing it might be too expensive. Let's just keep that in mind going forward each time we consider beefing up this particular analysis.

r+ with constructors fixed.
Attachment #485863 - Flags: review?(dmandelin) → review+
http://hg.mozilla.org/tracemonkey/rev/81fa7d0b8f00
Whiteboard: [jsperf] → [jsperf] fixed-in-tracemonkey

http://hg.mozilla.org/mozilla-central/rev/81fa7d0b8f00
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

blocking2.0: ? → betaN+
Depends on: 683998