Closed Bug 606890 Opened 14 years ago Closed 14 years ago

ai-astar should trace

Categories

(Core :: JavaScript Engine, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: billm)

References

Details

(Whiteboard: [jsperf] [fixed-in-tracemonkey])

Attachments

(3 files)

Here are the instruction counts for ai-astar in the different shell modes:

  -j:       3237M
  -m:       4553M
  -j -m:    4285M
  -j -m -p: 4559M

ai-astar spends about 98% of its time in this function's loop:

  Array.prototype.findGraphNode = function(obj) {
      for (var i = 0; i < this.length; i++) {
          if (this[i].pos == obj.pos) {
              return this[i];
          }
      }
      return false;
  };

This should be traced.
Depends on: 580468
Whiteboard: [jsperf]
I'd like to do this, but it could be some work. Right now this loop is being classified as a "short loop" because, when we profile it, it executes fewer than 16 iterations. However, it executes a lot more later on. The short loop check is pretty important for 3d-raytrace, so we can't take it out. However, if a loop is classified as short, we could decide to re-profile it later on. This requires some changes to the method jit, though. We already do pretty well on Kraken, so I'd like to put this off for now.
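For illustration only, a minimal C++ sketch of the classification described above (this is not the actual SpiderMonkey profiler code; the struct and names are invented, and only the 16-iteration threshold comes from this comment):

    #include <cstdint>

    enum class LoopKind { Unknown, MaybeShort, Traceable };

    // Threshold mentioned above: a profiled pass with fewer iterations
    // than this marks the loop as a "short loop" candidate.
    constexpr uint32_t SHORT_LOOP_ITERS = 16;

    struct LoopProfile {
        uint32_t itersSeen = 0;            // iterations seen during the one profiled pass
        LoopKind kind = LoopKind::Unknown;
    };

    // Called when the profiled pass of the loop finishes. A single short
    // pass says little about later behaviour (ai-astar's loop is short
    // early on and much longer later), which is why re-profiling is
    // attractive.
    void classifyLoop(LoopProfile &p) {
        p.kind = (p.itersSeen < SHORT_LOOP_ITERS) ? LoopKind::MaybeShort
                                                  : LoopKind::Traceable;
    }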
Bill, I thought the short loop detection was using an average over a bunch of runs... is that something different?
(In reply to comment #2)
> Bill, I thought the short loop detection was using an average over a bunch of
> runs... is that something different?

That would be really nice, but right now it just uses what it sees for the one iteration that's profiled. It's really not robust at all.
Ok, at least we know now what the issue is.
Hmm, why are loops with few iterations not traced -- because the compile-time overhead is too high?
(In reply to comment #5)
> Hmm, why are loops with few iterations not traced -- because the compile-time
> overhead is too high?

Compilation is probably a factor. However, even if a short trace is used many times, it's usually still faster to go with the method jit. I think the problem is the cost of entering and exiting the trace.
For what it's worth, this does cause ai-astar to run about 45% slower than it used to over here.... On my machine the difference is about 8% of the total Kraken runtime.
Assignee: general → wmccloskey
This patch changes the behavior for maybeShort loops (ones where the iteration count is too low at profiling time). Rather than blacklist them, it runs them in the methodjit for 5000 iterations and then profiles again. If they're still maybeShort, it blacklists. Otherwise it traces them. A side benefit of this patch is that it adds a little code to the jumpAndTrace stub so that we can wait k iterations before calling into the tracer (for some k). Previously we always called into the tracer or never called into it (for a blacklisted loop). We can also use this code to avoid uselessly calling into the tracer HOTLOOP times before the loop gets hot. Consequently, it speeds up SunSpider by a few milliseconds (spread pretty evenly across all the benchmarks).
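A rough C++ sketch of the counter mechanism this comment describes (hypothetical code, not the patch itself; the real logic lives in the jumpAndTrace stub and the per-script trace ICs, and names such as RE_PROFILE_ITERS and HOTLOOP_ITERS are invented here):

    #include <cstdint>

    constexpr uint32_t HOTLOOP_ITERS = 8;        // assumed initial hotness threshold
    constexpr uint32_t RE_PROFILE_ITERS = 5000;  // methodjit iterations before a second profile

    struct TraceIC {
        uint32_t loopCounter = HOTLOOP_ITERS;       // iterations left before calling the tracer
        uint32_t loopCounterStart = HOTLOOP_ITERS;  // value the counter is reset to
        bool blacklisted = false;
    };

    // Loop-edge stub: returns true when we should call into the profiler/tracer.
    // While the counter is nonzero we stay in the method JIT, which is what
    // avoids the useless calls into the tracer before a loop gets hot.
    bool onLoopEdge(TraceIC &ic) {
        if (ic.blacklisted)
            return false;
        if (ic.loopCounter > 0) {
            --ic.loopCounter;
            return false;
        }
        ic.loopCounter = ic.loopCounterStart;
        return true;
    }

    // Profiler verdict for a loop whose iteration count was too low.
    void onMaybeShort(TraceIC &ic, bool secondProfile) {
        if (secondProfile)
            ic.blacklisted = true;              // still short after 5000 iterations: give up
        else
            ic.loopCounter = RE_PROFILE_ITERS;  // run in the methodjit, then profile again
    }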
Attachment #489336 - Flags: review?(dmandelin)
This patch actually makes Kraken's ai-astar trace. It also requires the patch in the previous comment. This patch does several things:

1. It factors reads of dense arrays into the goodOps calculation. I found that reads are faster in the tracer, while writes are faster in the methodjit, so only reads get counted.

2. Equality operations are also factored in, since these are faster in the tracer.

3. There is a check called isCompilationProfitable. It's designed to avoid tracing short loops with branches; mainly, it saves us a few msecs on bitops-bits-in-byte. Unfortunately, it was causing problems for ai-astar. I hacked up the check a bit so that it factors in goodOps when deciding whether a loop is unprofitable. I'm not terribly happy with this solution, but I couldn't think of another way to make both benchmarks go fast.
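For illustration, a hedged C++ sketch of the kind of bookkeeping described in points 1-3 (the weights, names, and thresholds below are invented; only the shape -- count dense-array reads and equality ops as goodOps, and let a high goodOps count override the short-and-branchy rejection -- follows this comment):

    #include <cstdint>

    struct ProfileCounters {
        uint32_t denseArrayReads = 0;   // counted as good for tracing
        uint32_t denseArrayWrites = 0;  // measured faster in the methodjit, so not counted
        uint32_t equalityOps = 0;       // also faster on trace
        uint32_t branches = 0;
        uint32_t iters = 0;
    };

    double goodOps(const ProfileCounters &c) {
        // Hypothetical weights; the real profiler uses its own tuned values.
        return 1.0 * c.denseArrayReads + 1.0 * c.equalityOps;
    }

    bool isCompilationProfitable(const ProfileCounters &c) {
        // Original intent: reject short loops with branches (helps
        // bitops-bits-in-byte). The tweak lets a loop with plenty of
        // goodOps through anyway, so ai-astar still traces.
        bool shortAndBranchy = c.iters < 16 && c.branches > 2;
        return !shortAndBranchy || goodOps(c) > 2.0 * c.iters;
    }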
Attachment #489339 - Flags: review?(dmandelin)
(In reply to comment #9)
> 1. It factors reads of dense arrays into the goodOps calculation. I found that
> reads are faster in the tracer, while writes are faster in the methodjit. So
> only reads get counted.

Huh. Any idea why writes are faster in the methodjit?

And thanks for looking into this!
Comment on attachment 489336 [details] [diff] [review]
patch to profile a maybeShort loop twice

Great ideas. Just a few nits, r+ with those fixed.

>+uint32
>+GetHotloop(JSContext *cx)

This makes me think it's going to return a loop bytecode offset or something. How about |GetHotLoopCount| or |GetHotLoopThreshold| or some such?

>+        scriptTICs[i].loopCounterStart = GetHotloop(cx);
>+        scriptTICs[i].loopCounter = scriptTICs[i].loopCounterStart;

I think it's better like this:

    scriptTICs[i].loopCounter = scriptTICs[i].loopCounterStart = GetHotloop(cx);

>     /* This data is used by the tracing JIT. */
>     void *traceData;
>     uintN traceEpoch;
>+    uint32 loopCounter;
>+    uint32 loopCounterStart;
>
>     bool hasSlowTraceHint : 1;

Please add 1-line comments to these items.
Attachment #489336 - Flags: review?(dmandelin) → review+
Attachment #489339 - Flags: review?(dmandelin) → review+
> while writes are faster in the methodjit.

Were these writes to holes? If so, were you hitting bug 610583? It's worth remeasuring with current TM tip plus the patch I'm about to post in that bug.
Whiteboard: [jsperf] → [jsperf] fixed-in-tracemonkey
Bill, if you post the test you used to measure read/write speed, I'm happy to give it a whirl now that bug 610583 is fixed in TM.
Attached file array benchmark
Here's the array microbenchmark. For -j I get:

  and   11
  read  16
  write 33

And for -m I get:

  and   10
  read  19
  write 28

The & test is just to show that the array stuff is what's being measured (hopefully). The differences are not big, but I've also noticed that JSNES gets slower if I count array writes as being faster in the tracer. So I'd prefer to keep things this way for now.
That's fine, but might be worth filing a bug to make TM array writes faster...
Attachment #490020 - Attachment is patch: false
Whiteboard: [jsperf] fixed-in-tracemonkey → [jsperf]
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 580468
No longer depends on: 580468
Blocks: 608733
What needs to be done for this to be resolved?
(In reply to comment #19)
> What needs to be done for this to be resolved?

The patch in this bug that factors array access into goodOps in the profiler makes us slower on SunSpider, because it causes us to trace some SunSpider benchmarks that shouldn't be traced. We could try to find a different way to avoid tracing those SunSpider benchmarks, or we could just accept the hit.
Note that there are some benchmarks (including, say, roc's tracer thingie) that get faster if we count array ops as good for tracing... :(
(In reply to comment #17)
> That's fine, but might be worth filing a bug to make TM array writes faster...

Was this ever investigated?
(In reply to comment #22)
> (In reply to comment #17)
> > That's fine, but might be worth filing a bug to make TM array writes faster...
>
> Was this ever investigated?

Array reads and writes are extremely common. They've already improved quite a lot since Firefox 3.6 -- they used to involve a call to a C++ function, now they just involve inline code (in the common cases), and CSE has improved enough that array writes that follow array reads can often avoid duplicated checks (eg. the "is this an array?" check). So yes, it's been extensively investigated, and filing a bug "make TM array writes faster" is likely to be useless without some concrete suggestions on how to make them faster. Type inference (bug 619423) should help some more; I plan to integrate type inference into TM as soon as Firefox 4.0 is done.
(In reply to comment #20)
> (In reply to comment #19)
> > What needs to be done for this to be resolved?
>
> The patch in this bug that factors array access into goodOps in the profiler
> makes us slower on SunSpider, because it causes us to trace some SunSpider
> benchmarks that shouldn't be traced. We could try to find a different way to
> avoid tracing those SunSpider benchmarks, or we could just accept the hit.

I was confused by comment 18 because it didn't also mention http://hg.mozilla.org/mozilla-central/rev/cb76b2d61096 so I thought that the patches had landed on mozilla-central. Thanks for the explanation.
I'm against accepting perf hits in the current competitive market setting. I'm also Captain Kirk: I don't believe in the no-win scenario. Reprogram the sim if you have to (cheat). Kidding aside, this does not seem like a no-win situation. /be
I checked in part of this patch. The rest will follow. http://hg.mozilla.org/tracemonkey/rev/b76f25a97aeb
Bill, you landed these three patches together:

# 94c6d2a90ae6  Bill McCloskey – Bug 607539 - Change how profiler aborts work (r=dmandelin)
# 471a47358232  Bill McCloskey – Bug 606890 - Factor array reads and equality into profiler decision (r=dmandelin)
# 4391ed5db11f  Bill McCloskey – Bug 606890 - Change how profiler recognizes "expensive" inner loops (r=dmandelin)

Together they regressed Kraken by 15.2% and SunSpider by 2.4%, according to AWFY2. Not good! I'm not sure which particular tests were affected; AFAICT AWFY2 doesn't have test-by-test results (dvander, is that right?)
I don't know what's going on here. I expected a small SunSpider regression, although I didn't think it would be quite so large. The Kraken regression is bizarre. On my machine, these patches make Kraken 4.5% faster, since they fix ai-astar. A few of the Kraken benchmarks get slower, but not by much. I'd like to leave these in until tomorrow; they don't really affect any other code. Perhaps if there's some way to recover individual test data from AWFY2, I can back them out individually and get a better sense of what went wrong.
dvander said on IRC that he'll try to add the individual test data to AWFY2 tomorrow morning.
Ok, this is weird. I ran Cachegrind to get instruction counts for 60021:ae34ea3455f2 (just before your patches) and 60026:0ccd15d19393 (which includes your patches plus two tiny other ones that won't affect perf). Kraken results:

  ---------------------------------------------------------------
  |            millions of instructions executed                 |
  |           total               |  compiled (may overestimate) |
  ---------------------------------------------------------------
  |  4216.451   3522.743 (1.197x) |  4046.779   3330.801 (1.215x) | ai-astar
  |  1999.057   1817.284 (1.100x) |  1357.544   1323.573 (1.026x) | audio-beat-det
  |  1368.181   1368.180 (------) |  1170.662   1170.662 (------) | audio-dft
  |  1721.461   1721.058 (------) |  1248.983   1248.349 (1.001x) | audio-fft
  |  2608.416   2445.879 (1.066x) |  1790.584   1626.967 (1.101x) | audio-oscillat
  |  5607.396   5607.396 (------) |  4756.206   4756.205 (------) | imaging-gaussi
  |  1971.938   1971.935 (------) |   721.081    721.081 (------) | imaging-darkro
  |  4376.700   4376.704 (------) |  3533.216   3533.216 (------) | imaging-desatu
  |   641.743    641.743 (------) |     9.560      9.560 (------) | json-parse-fin
  |   434.922    434.922 (------) |     4.406      4.406 (------) | json-stringify
  |  1125.539   1125.538 (------) |   727.912    727.912 (------) | stanford-crypt
  |   671.762    671.734 (------) |   370.102    370.101 (------) | stanford-crypt
  |  1171.979   1171.962 (------) |   597.351    597.351 (------) | stanford-crypt
  |   528.166    528.148 (------) |   200.365    200.364 (------) | stanford-crypt
  -------
  | 28443.717  27405.231 (1.038x) | 20534.758  19620.555 (1.047x) | all

ie. it's a clear improvement, which matches the timing results on your machine. As for SunSpider, I saw 2% worse counts for fannkuch and tofte and everything else was unchanged. We need those individual results!
Looks like ai-astar improved by 250ms, gaussian-blur worsened by 1300ms, others didn't change much.
ss-nbody regressed 100% by 7.9ms -- this was tracing (hovering near the orange line) but is not any more (hovering near the black line).