Closed Bug 606650 Opened 14 years ago Closed 14 years ago

Heuristics make us fall into JM on the attached testcases, where it's over 3x slower than tracing

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

blocking2.0

---

betaN+

People

(Reporter: bzbarsky, Assigned: billm)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [jsperf] fixed-in-tracemonkey)

Attachments

(3 files)

Testcase 1 14 years ago Boris Zbarsky [:bzbarsky] 5.22 KB, text/html		Details
Testcase 2 14 years ago Boris Zbarsky [:bzbarsky] 5.04 KB, text/html		Details
patch 14 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) 8.76 KB, patch	dmandelin : review+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

14 years ago

The testcases linked to from bug 606648 end up in JM with profiling turned on.  But they're much faster when traced.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

14 years ago

Attached file Testcase 1 — Details

This is from http://www.peacekeeper.therichins.net/test10b.html

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

14 years ago

Attached file Testcase 2 — Details

This is from http://www.peacekeeper.therichins.net/test10c.html

Boris Zbarsky [:bzbarsky]

Reporter

Comment 3

•

14 years ago

These are apparently based on some peacekeeper code.

Blocks: peacekeeper

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

14 years ago

Hmm.  So we apparently blacklist the hot loops in processBuffers because we think the inner one has a very high branching factor.  Which it technically does, because each get() call has an if/else in it.  But in practice, the loop conditions on that loop make sure only one of the branches is taken so this traces well.

Note that tracer is 1.5-2x faster than V8 on these testcases, btw; JM is about 1.5-2x slower than V8...

blocking2.0: --- → ?

Keywords: regression

David Mandelin [:dmandelin]

Updated

•

14 years ago

Whiteboard: [jsperf]

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 5

•

14 years ago

Attached patch patch — Details — Splinter Review

This patch tries to treat the return value of typeof() as an approximate constant, meaning that we expect it to be the same most of the time. Then whenever there's a branch on an approximate constant, it ignores it.

I also cleaned up the short loop tracking to use this new code.

And I fixed a small problem with handling of JSOP_AND/JSOP_OR, where the profiler was finding more branchiness than was actually there.

With the patch, the unmodified Peacekeeper benchmark is still a little slower than with -j alone. I'm not sure if this is profiling overhead or if some other loop ought to be traced. Regardless, we beat v8, and performance is pretty satisfactory.

I'm going to test these changes on SS/V8 to make sure there are no regressions.

Assignee: general → wmccloskey

Status: NEW → ASSIGNED

Attachment #485863 - Flags: review?(bzbarsky)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 6

•

14 years ago

No regressions.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 7

•

14 years ago

Comment on attachment 485863 [details] [diff] [review]
patch

I'm not sure I'm competent to review this, at least with quick turnaround; dmandelin may be a better bet, since he reviewed bug 580468.

That said, passing "false" as the third arg of the StackValue ctor seems wrong.  

Also, maybe there should actually be two ctors (one taking just isConst for no value and one taking isConst and the value for cases with a value; setting hasValue can just be done based on which ctor is used).

Attachment #485863 - Flags: review?(bzbarsky) → review?(dmandelin)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 8

•

14 years ago

We may also, of course, want a JM bug on it being slower than V8 here.  Davids, let me know if you want me to file that.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 9

•

14 years ago

Hmm.  I just tried the attached patch, and I see 300ms and 545ms on my machine on the two testcases in this bug, respectively.

If I disable JM or profiling, I see 230ms and 190ms.

So there's pretty noticeable slowdown from jm+profiling on the first testcase, and a very large slowdown on the second.  In fact, the 545ms time is only a little bit faster than the pure-JM time on the second testcase....

It's good to hear that the original benchmark is fast now, but do we need a separate bug on the second testcase here?

David Mandelin [:dmandelin]

Comment 10

•

14 years ago

(In reply to comment #9)
> It's good to hear that the original benchmark is fast now, but do we need a
> separate bug on the second testcase here?

Bill, how do you want to do it? Do you want to check that out before I review this patch, or do you want me to go ahead and review this now?

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 11

•

14 years ago

(In reply to comment #10)
> (In reply to comment #9)
> > It's good to hear that the original benchmark is fast now, but do we need a
> > separate bug on the second testcase here?
> 
> Bill, how do you want to do it? Do you want to check that out before I review
> this patch, or do you want me to go ahead and review this now?

Sorry, I should have been more clear that the patch is intended to fix only the first testcase. The second one (which is not part of any benchmark suite, IIUC) isn't addressed, and it would be harder to fix.

I think it's ready for review Dave, although I'll probably add a separate constructor as Boris suggested.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 12

•

14 years ago

OK, I filed bug 607201 on the second testcase (which I think we should try not to regress, if we can: it's closer to how people actually write code than that benchmark is!) and bug 607202 on JM being Really Slow here in general.

David Mandelin [:dmandelin]

Comment 13

•

14 years ago

Comment on attachment 485863 [details] [diff] [review]
patch

The patch looks fine to me, modulo the separate constructor. The only thing I'm not sure about is that this kind of duplicates the bytecode analyzer. But we throw away the results from that after compilation, and recomputing it might be too expensive. Let's just keep that in mind going forward each time we consider beefing up this particular analysis.

r+ with constructors fixed.

Attachment #485863 - Flags: review?(dmandelin) → review+

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 14

•

14 years ago

http://hg.mozilla.org/tracemonkey/rev/81fa7d0b8f00

Whiteboard: [jsperf] → [jsperf] fixed-in-tracemonkey

Robert Sayre

Comment 15

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/81fa7d0b8f00

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Robert Sayre

Updated

•

14 years ago

blocking2.0: ? → betaN+

Daniel Holbert [:dholbert]

Updated

•

13 years ago

Depends on: 683998

You need to log in before you can comment on or make changes to this bug.