Status: RESOLVED FIXED
Component: Core :: JavaScript Engine
People: (Reporter: njn, Assigned: billm)
Firefox Tracking Flags: (Not tracked)
Whiteboard: [jsperf] [fixed-in-tracemonkey]
Attachments: (3 attachments)

Here are the instruction counts for ai-astar in the different shell modes:

  -j:       3237M
  -m:       4553M
  -j -m:    4285M
  -j -m -p: 4559M

ai-astar spends about 98% of its time in this function's loop:

  Array.prototype.findGraphNode = function(obj) {
      for(var i=0;i<this.length;i++) {
          if(this[i].pos == obj.pos) { return this[i]; }
      }
      return false;
  };

This should be traced.
Whiteboard: [jsperf]
I'd like to do this, but it could be some work. Right now this loop is being classified as a "short loop" because, when we profile it, it executes fewer than 16 iterations. However, it executes a lot more later on. The short loop check is pretty important for 3d-raytrace, so we can't take it out. However, if a loop is classified as short, we could decide to re-profile it later on. This requires some changes to the method jit, though.

We already do pretty well on Kraken, so I'd like to put this off for now.
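
For context, the short-loop check described above amounts to something like the following. This is an illustrative C++ sketch, not the actual SpiderMonkey code; the 16-iteration threshold comes from this comment, and all names are hypothetical.

  // Hypothetical sketch of the one-shot short-loop classification described
  // above: the profiler watches a single pass over the loop and never
  // revisits its decision.
  enum LoopDecision { TraceLoop, KeepInMethodJIT };

  static const unsigned SHORT_LOOP_THRESHOLD = 16;

  LoopDecision classifyLoop(unsigned itersSeenWhileProfiling)
  {
      // ai-astar's findGraphNode loop runs fewer than 16 iterations the
      // first time it is profiled but many thousands of times later on, so
      // this one-shot decision misclassifies it as "short".
      if (itersSeenWhileProfiling < SHORT_LOOP_THRESHOLD)
          return KeepInMethodJIT;
      return TraceLoop;
  }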
Bill, I thought the short loop detection was using an average over a bunch of runs... is that something different?
(In reply to comment #2)
> Bill, I thought the short loop detection was using an average over a bunch of
> runs... is that something different?

That would be really nice, but right now it just uses what it sees for the one iteration that's profiled. It's really not robust at all.
Ok, at least we know now what the issue is.
Hmm, why are loops with few iterations not traced -- because the compile-time overhead is too high?
(In reply to comment #5)
> Hmm, why are loops with few iterations not traced -- because the compile-time
> overhead is too high?

Compilation is probably a factor. However, even if a short trace is used many times, it's usually still faster to go with the method jit. I think the problem is the cost of entering and exiting the trace.
For what it's worth, this does cause ai-astar to run about 45% slower than it used to over here....  On my machine the difference is about 8% of the total Kraken runtime.
Assignee: general → wmccloskey
Created attachment 489336 [details] [diff] [review]
patch to profile a maybeShort loop twice

This patch changes the behavior for maybeShort loops (ones where the iteration count is too low at profiling time). Rather than blacklisting them immediately, it runs them in the methodjit for 5000 iterations and then profiles again. If they're still maybeShort, it blacklists them; otherwise it traces them.

A side benefit of this patch is that it adds a little code to the jumpAndTrace stub so that we can wait k iterations before calling into the tracer (for some k). Previously we always called into the tracer or never called into it (for a blacklisted loop). We can also use this code to avoid uselessly calling into the tracer HOTLOOP times before the loop gets hot. Consequently, it speeds up SunSpider by a few milliseconds (spread pretty evenly across all the benchmarks).
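
A rough sketch of the mechanism described here, in C++: the field names loopCounter/loopCounterStart and the 5000-iteration re-profile figure come from this bug, but the surrounding structure is hypothetical and this is not the real jumpAndTrace stub.

  #include <cstdint>

  // Illustrative sketch of the counter-gated jumpAndTrace behaviour.
  struct TraceICSketch {
      uint32_t loopCounter;      // loop edges left before we act
      uint32_t loopCounterStart; // value the counter is reset to
      bool maybeShort;           // profiler saw too few iterations last time
  };

  enum StubAction { StayInMethodJIT, Reprofile, CallIntoTracer };

  StubAction onLoopEdge(TraceICSketch &tic)
  {
      // Cheap counter check in the stub: stay in the method JIT until the
      // loop has executed enough iterations to be worth the tracer's time.
      if (tic.loopCounter > 0) {
          tic.loopCounter--;
          return StayInMethodJIT;
      }

      if (tic.maybeShort) {
          // A maybeShort loop has now run ~5000 more iterations in the
          // method JIT, so profile it again instead of blacklisting it.
          tic.loopCounter = tic.loopCounterStart;
          return Reprofile;
      }
      return CallIntoTracer;
  }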
Attachment #489336 - Flags: review?(dmandelin)
Created attachment 489339 [details] [diff] [review]
patch to make ai-astar trace

This patch actually makes Kraken's ai-astar trace. It also requires the patch in the previous comment. This patch does several things:

1. It factors reads of dense arrays into the goodOps calculation. I found that reads are faster in the tracer, while writes are faster in the methodjit. So only reads get counted.

2. Equality operations are also factored in, since these are faster in the tracer.

3. There is a check called isCompilationProfitable. It's designed to avoid tracing short loops with branches. Mainly, it saves us a few msecs on bitops-bits-in-byte. Unfortunately, it was causing problems for ai-astar. I hacked up the check a bit so that it factors in goodOps when deciding whether a loop is unprofitable. I'm not terribly happy with this solution, but I couldn't think of another way to make both benchmarks go fast.
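
For anyone following along, the goodOps accounting described in points 1-3 looks roughly like this. This is a hypothetical C++ sketch: the read-vs-write asymmetry and the equality rule come from this comment, but the names, opcodes, and the profitability condition are invented for illustration.

  // Hypothetical opcode enum standing in for the relevant SpiderMonkey ops.
  enum Op { OP_DENSE_ARRAY_READ, OP_DENSE_ARRAY_WRITE, OP_EQUALITY, OP_OTHER };

  struct LoopProfile {
      unsigned goodOps = 0;     // ops expected to run faster on trace
      unsigned numSelfOps = 0;  // all ops seen in the loop body
      unsigned branches = 0;    // used by the isCompilationProfitable check
  };

  void countOp(LoopProfile &prof, Op op)
  {
      prof.numSelfOps++;
      switch (op) {
        case OP_DENSE_ARRAY_READ:  // reads measured faster in the tracer
        case OP_EQUALITY:          // equality ops also faster in the tracer
          prof.goodOps++;
          break;
        case OP_DENSE_ARRAY_WRITE: // writes measured faster in the methodjit,
          break;                   // so they are deliberately not counted
        default:
          break;
      }
  }

  // Point 3 above: a short, branchy loop is normally judged unprofitable to
  // trace, but a sufficiently high goodOps count can override that verdict.
  // (The exact condition here is made up; the real check is in the patch.)
  bool isCompilationProfitable(const LoopProfile &prof)
  {
      if (prof.branches <= 1)
          return true;
      return 2 * prof.goodOps > prof.numSelfOps;
  }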
Attachment #489339 - Flags: review?(dmandelin)
(In reply to comment #9)
> 
> 1. It factors reads of dense arrays into the goodOps calculation. I found that
> reads are faster in the tracer, while writes are faster in the methodjit. So
> only reads get counted.

Huh.  Any idea why writes are faster in the methodjit?

And thanks for looking into this!
Comment on attachment 489336 [details] [diff] [review]
patch to profile a maybeShort loop twice

Great ideas. Just a few nits, r+ with those fixed.

>+uint32
>+GetHotloop(JSContext *cx)

This makes me think it's going to return a loop bytecode offset or something. How about |GetHotLoopCount| or |GetHotLoopThreshold| or some such?

>+            scriptTICs[i].loopCounterStart = GetHotloop(cx);
>+            scriptTICs[i].loopCounter = scriptTICs[i].loopCounterStart;

I think it's better like this:

  scriptTICs[i].loopCounter = scriptTICs[i].loopCounterStart = GetHotloop(cx);

>     /* This data is used by the tracing JIT. */
>     void *traceData;
>     uintN traceEpoch;
>+    uint32 loopCounter;
>+    uint32 loopCounterStart;
> 
>     bool hasSlowTraceHint : 1;

Please add 1-line comments to these items.
Attachment #489336 - Flags: review?(dmandelin) → review+
Attachment #489339 - Flags: review?(dmandelin) → review+
> while writes are faster in the methodjit.

Were these writes to holes?  If so, were you hitting bug 610583?  It's worth remeasuring with current TM tip plus the patch I'm about to post in that bug.
http://hg.mozilla.org/tracemonkey/rev/d2dd2db256ce
Whiteboard: [jsperf] → [jsperf] fixed-in-tracemonkey
Bill, if you post the test you used to measure read/write speed, I'm happy to give it a whirl now that bug 610583 is fixed in TM.
Created attachment 490020 [details]
array benchmark

Here's the array microbenchmark. For -j I get:
and 11
read 16
write 33

And for -m I get:
and 10
read 19
write 28

The & test is just to show that the array stuff is what's being measured (hopefully).

The differences are not big, but I've also noticed that JSNES gets slower if I count array writes as being faster in the tracer. So I'd prefer to keep things this way for now.
That's fine, but might be worth filing a bug to make TM array writes faster...
Attachment #490020 - Attachment is patch: false
Whiteboard: [jsperf] fixed-in-tracemonkey → [jsperf]

Comment 18

http://hg.mozilla.org/mozilla-central/rev/d2dd2db256ce
Status: NEW → RESOLVED
Resolution: --- → FIXED

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 580468
No longer depends on: 580468

Comment 19

What needs to be done for this to be resolved?
(In reply to comment #19)
> What needs to be done for this to be resolved?

The patch in this bug that factors array access into goodOps in the profiler makes us slower on SunSpider, because it causes us to trace some SunSpider benchmarks that shouldn't be traced. We could try to find a different way to avoid tracing those SunSpider benchmarks, or we could just accept the hit.
Note that there are some benchmarks (including, say, roc's tracer thingie) that get faster if we count array ops as good for tracing...  :(
(In reply to comment #17)
> That's fine, but might be worth filing a bug to make TM array writes faster...

Was this ever investigated?
(In reply to comment #22)
> (In reply to comment #17)
> > That's fine, but might be worth filing a bug to make TM array writes faster...
> 
> Was this ever investigated?

Array reads and writes are extremely common. They've already improved quite a lot since Firefox 3.6 -- they used to involve a call to a C++ function, whereas now they just involve inline code (in the common cases), and CSE has improved enough that array writes that follow array reads can often avoid duplicated checks (e.g. the "is this an array?" check).

So yes, it's been extensively investigated, and filing a bug "make TM array writes faster" is likely to be useless without some concrete suggestions on how to make them faster.

Type inference (bug 619423) should help some more; I plan to integrate type inference into TM as soon as Firefox 4.0 is done.

Comment 24

(In reply to comment #20)
> (In reply to comment #19)
> > What needs to be done for this to be resolved?
> 
> The patch in this bug that factors array access into goodOps in the profiler
> makes us slower on SunSpider, because it causes us to trace some SunSpider
> benchmarks that shouldn't be traced. We could try to find a different way to
> avoid tracing those SunSpider benchmarks, or we could just accept the hit.

I was confused by comment 18 because it didn't also mention http://hg.mozilla.org/mozilla-central/rev/cb76b2d61096 so I thought that the patches had landed on mozilla-central. Thanks for the explanation.
I'm against accepting perf hits in the current competitive market setting. I'm also Captain Kirk: I don't believe in the no-win scenario. Reprogram the sim if you have to (cheat). Kidding aside, this does not seem like a no-win situation.

/be
I checked in part of this patch. The rest will follow.

http://hg.mozilla.org/tracemonkey/rev/b76f25a97aeb
Bill, you landed these three patches together:

# 94c6d2a90ae6
Bill McCloskey – Bug 607539 - Change how profiler aborts work (r=dmandelin)

# 471a47358232
Bill McCloskey – Bug 606890 - Factor array reads and equality into profiler decision (r=dmandelin)

# 4391ed5db11f
Bill McCloskey – Bug 606890 - Change how profiler recognizes "expensive" inner loops (r=dmandelin)

Together they regressed Kraken by 15.2% and SunSpider by 2.4%, according to AWFY2.  Not good!  I'm not sure which particular tests were affected; AFAICT AWFY2 doesn't have test-by-test results (dvander, is that right?)
I don't know what's going on here. I expected a small SunSpider regression, although I didn't think it would be quite so large.

The Kraken regression is bizarre. On my machine, these patches make Kraken 4.5% faster, since they fix ai-astar. A few of the Kraken benchmarks get slower, but not by much.

I'd like to leave these in until tomorrow; they don't really affect any other code. Perhaps if there's some way to recover individual test data from AWFY2, I can back them out individually and get a better sense of what went wrong.
dvander said on IRC that he'll try to add the individual test data to AWFY2 tomorrow morning.
Ok, this is weird.  I ran Cachegrind to get instruction counts for 60021:ae34ea3455f2 (just before your patches) and 60026:0ccd15d19393 (which includes your patches plus two tiny other ones that won't affect perf).  Kraken results:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|  4216.451  3522.743 (1.197x) |  4046.779  3330.801 (1.215x) | ai-astar
|  1999.057  1817.284 (1.100x) |  1357.544  1323.573 (1.026x) | audio-beat-det
|  1368.181  1368.180 (------) |  1170.662  1170.662 (------) | audio-dft
|  1721.461  1721.058 (------) |  1248.983  1248.349 (1.001x) | audio-fft
|  2608.416  2445.879 (1.066x) |  1790.584  1626.967 (1.101x) | audio-oscillat
|  5607.396  5607.396 (------) |  4756.206  4756.205 (------) | imaging-gaussi
|  1971.938  1971.935 (------) |   721.081   721.081 (------) | imaging-darkro
|  4376.700  4376.704 (------) |  3533.216  3533.216 (------) | imaging-desatu
|   641.743   641.743 (------) |     9.560     9.560 (------) | json-parse-fin
|   434.922   434.922 (------) |     4.406     4.406 (------) | json-stringify
|  1125.539  1125.538 (------) |   727.912   727.912 (------) | stanford-crypt
|   671.762   671.734 (------) |   370.102   370.101 (------) | stanford-crypt
|  1171.979  1171.962 (------) |   597.351   597.351 (------) | stanford-crypt
|   528.166   528.148 (------) |   200.365   200.364 (------) | stanford-crypt
-------
| 28443.717 27405.231 (1.038x) | 20534.758 19620.555 (1.047x) | all

ie. it's a clear improvement, which matches the timing results on your machine.

As for SunSpider, I saw 2% worse counts for fannkuch and tofte and everything else was unchanged.  We need those individual results!
Looks like ai-astar improved by 250ms, gaussian-blur worsened by 1300ms, and the others didn't change much.

Comment 33

ss-nbody regressed 100% by 7.9ms -- this was tracing (hovering near the orange line) but is not any more (hovering near the black line).