Status: RESOLVED FIXED
Component: Core :: JavaScript Engine
People: (Reporter: njn, Assigned: billm)
Firefox Tracking Flags: (Not tracked)
Whiteboard: [jsperf] [fixed-in-tracemonkey]
Attachments: (3 attachments)

Here are the instruction counts for ai-astar in the different shell modes:

  -j:       3237M
  -m:       4553M
  -j -m:    4285M
  -j -m -p: 4559M

ai-astar spends about 98% of its time in this function's loop:

  Array.prototype.findGraphNode = function(obj) {
      for(var i=0;i<this.length;i++) {
          if(this[i].pos == obj.pos) { return this[i]; }
      }
      return false;
  };

This should be traced.
Whiteboard: [jsperf]
I'd like to do this, but it could be some work. Right now this loop is being classified as a "short loop" because, when we profile it, it executes fewer than 16 iterations. However, it executes a lot more later on. The short loop check is pretty important for 3d-raytrace, so we can't take it out. However, if a loop is classified as short, we could decide to re-profile it later on. This requires some changes to the method jit, though.

We already do pretty well on Kraken, so I'd like to put this off for now.
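
For context, the short-loop check described above amounts to something like the following. This is an illustrative C++ sketch, not the actual SpiderMonkey code; the 16-iteration threshold comes from this comment, and all names are hypothetical.

  // Hypothetical sketch of the one-shot short-loop classification described
  // above: the profiler watches a single pass over the loop and never
  // revisits its decision.
  enum LoopDecision { TraceLoop, KeepInMethodJIT };

  static const unsigned SHORT_LOOP_THRESHOLD = 16;

  LoopDecision classifyLoop(unsigned itersSeenWhileProfiling)
  {
      // ai-astar's findGraphNode loop runs fewer than 16 iterations the
      // first time it is profiled but many thousands of times later on, so
      // this one-shot decision misclassifies it as "short".
      if (itersSeenWhileProfiling < SHORT_LOOP_THRESHOLD)
          return KeepInMethodJIT;
      return TraceLoop;
  }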
Bill, I thought the short loop detection was using an average over a bunch of runs... is that something different?
(In reply to comment #2)
> Bill, I thought the short loop detection was using an average over a bunch of
> runs... is that something different?

That would be really nice, but right now it just uses what it sees for the one iteration that's profiled. It's really not robust at all.
Ok, at least we know now what the issue is.
Hmm, why are loops with few iterations not traced -- because the compile-time overhead is too high?
(In reply to comment #5)
> Hmm, why are loops with few iterations not traced -- because the compile-time
> overhead is too high?

Compilation is probably a factor. However, even if a short trace is used many times, it's usually still faster to go with the method jit. I think the problem is the cost of entering and exiting the trace.
For what it's worth, this does cause ai-astar to run about 45% slower than it used to over here....  On my machine the difference is about 8% of the total Kraken runtime.
Assignee: general → wmccloskey
Created attachment 489336 [details] [diff] [review]
patch to profile a maybeShort loop twice

This patch changes the behavior for maybeShort loops (ones where the iteration count is too low at profiling time). Rather than blacklisting them immediately, it runs them in the methodjit for 5000 iterations and then profiles again. If they're still maybeShort, it blacklists them; otherwise it traces them.

A side benefit of this patch is that it adds a little code to the jumpAndTrace stub so that we can wait k iterations before calling into the tracer (for some k). Previously we always called into the tracer or never called into it (for a blacklisted loop). We can also use this code to avoid uselessly calling into the tracer HOTLOOP times before the loop gets hot. Consequently, it speeds up SunSpider by a few milliseconds (spread pretty evenly across all the benchmarks).
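
A rough sketch of the mechanism described here, in C++: the field names loopCounter/loopCounterStart and the 5000-iteration re-profile figure come from this bug, but the surrounding structure is hypothetical and this is not the real jumpAndTrace stub.

  #include <cstdint>

  // Illustrative sketch of the counter-gated jumpAndTrace behaviour.
  struct TraceICSketch {
      uint32_t loopCounter;      // loop edges left before we act
      uint32_t loopCounterStart; // value the counter is reset to
      bool maybeShort;           // profiler saw too few iterations last time
  };

  enum StubAction { StayInMethodJIT, Reprofile, CallIntoTracer };

  StubAction onLoopEdge(TraceICSketch &tic)
  {
      // Cheap counter check in the stub: stay in the method JIT until the
      // loop has executed enough iterations to be worth the tracer's time.
      if (tic.loopCounter > 0) {
          tic.loopCounter--;
          return StayInMethodJIT;
      }

      if (tic.maybeShort) {
          // A maybeShort loop has now run ~5000 more iterations in the
          // method JIT, so profile it again instead of blacklisting it.
          tic.loopCounter = tic.loopCounterStart;
          return Reprofile;
      }
      return CallIntoTracer;
  }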
Attachment #489336 - Flags: review?(dmandelin)
Created attachment 489339 [details] [diff] [review]
patch to make ai-astar trace

This patch actually makes Kraken's ai-astar trace. It also requires the patch in the previous comment. This patch does several things:

1. It factors reads of dense arrays into the goodOps calculation. I found that reads are faster in the tracer, while writes are faster in the methodjit. So only reads get counted.

2. Equality operations are also factored in, since these are faster in the tracer.

3. There is a check called isCompilationProfitable. It's designed to avoid tracing short loops with branches. Mainly, it saves us a few msecs on bitops-bits-in-byte. Unfortunately, it was causing problems for ai-astar. I hacked up the check a bit so that it factors in goodOps when deciding whether a loop is unprofitable. I'm not terribly happy with this solution, but I couldn't think of another way to make both benchmarks go fast.
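
For anyone following along, the goodOps accounting described in points 1-3 looks roughly like this. This is a hypothetical C++ sketch: the read-vs-write asymmetry and the equality rule come from this comment, but the names, opcodes, and the profitability condition are invented for illustration.

  // Hypothetical opcode enum standing in for the relevant SpiderMonkey ops.
  enum Op { OP_DENSE_ARRAY_READ, OP_DENSE_ARRAY_WRITE, OP_EQUALITY, OP_OTHER };

  struct LoopProfile {
      unsigned goodOps = 0;     // ops expected to run faster on trace
      unsigned numSelfOps = 0;  // all ops seen in the loop body
      unsigned branches = 0;    // used by the isCompilationProfitable check
  };

  void countOp(LoopProfile &prof, Op op)
  {
      prof.numSelfOps++;
      switch (op) {
        case OP_DENSE_ARRAY_READ:  // reads measured faster in the tracer
        case OP_EQUALITY:          // equality ops also faster in the tracer
          prof.goodOps++;
          break;
        case OP_DENSE_ARRAY_WRITE: // writes measured faster in the methodjit,
          break;                   // so they are deliberately not counted
        default:
          break;
      }
  }

  // Point 3 above: a short, branchy loop is normally judged unprofitable to
  // trace, but a sufficiently high goodOps count can override that verdict.
  // (The exact condition here is made up; the real check is in the patch.)
  bool isCompilationProfitable(const LoopProfile &prof)
  {
      if (prof.branches <= 1)
          return true;
      return 2 * prof.goodOps > prof.numSelfOps;
  }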
Attachment #489339 - Flags: review?(dmandelin)
(In reply to comment #9)
> 
> 1. It factors reads of dense arrays into the goodOps calculation. I found that
> reads are faster in the tracer, while writes are faster in the methodjit. So
> only reads get counted.

Huh.  Any idea why writes are faster in the methodjit?

And thanks for looking into this!
Comment on attachment 489336 [details] [diff] [review]
patch to profile a maybeShort loop twice

Great ideas. Just a few nits, r+ with those fixed.

>+uint32
>+GetHotloop(JSContext *cx)

This makes me think it's going to return a loop bytecode offset or something. How about |GetHotLoopCount| or |GetHotLoopThreshold| or some such?

>+            scriptTICs[i].loopCounterStart = GetHotloop(cx);
>+            scriptTICs[i].loopCounter = scriptTICs[i].loopCounterStart;

I think it's better like this:

  scriptTICs[i].loopCounter = scriptTICs[i].loopCounterStart = GetHotloop(cx);

>     /* This data is used by the tracing JIT. */
>     void *traceData;
>     uintN traceEpoch;
>+    uint32 loopCounter;
>+    uint32 loopCounterStart;
> 
>     bool hasSlowTraceHint : 1;

Please add 1-line comments to these items.
Attachment #489336 - Flags: review?(dmandelin) → review+
Attachment #489339 - Flags: review?(dmandelin) → review+
> while writes are faster in the methodjit.

Were these writes to holes?  If so, were you hitting bug 610583?  It's worth remeasuring with current TM tip plus the patch I'm about to post in that bug.
http://hg.mozilla.org/tracemonkey/rev/d2dd2db256ce
Whiteboard: [jsperf] → [jsperf] fixed-in-tracemonkey
Bill, if you post the test you used to measure read/write speed, I'm happy to give it a whirl now that bug 610583 is fixed in TM.
Created attachment 490020 [details]
array benchmark

Here's the array microbenchmark. For -j I get:
and 11
read 16
write 33

And for -m I get:
and 10
read 19
write 28

The & test is just to show that the array stuff is what's being measured (hopefully).

The differences are not big, but I've also noticed that JSNES gets slower if I count array writes as being faster in the tracer. So I'd prefer to keep things this way for now.
That's fine, but might be worth filing a bug to make TM array writes faster...
Attachment #490020 - Attachment is patch: false
Whiteboard: [jsperf] fixed-in-tracemonkey → [jsperf]

Comment 18

http://hg.mozilla.org/mozilla-central/rev/d2dd2db256ce
Status: NEW → RESOLVED
Resolution: --- → FIXED

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 580468
No longer depends on: 580468

Comment 19

What needs to be done for this to be resolved?
(In reply to comment #19)
> What needs to be done for this to be resolved?

The patch in this bug that factors array access into goodOps in the profiler makes us slower on SunSpider, because it causes us to trace some SunSpider benchmarks that shouldn't be traced. We could try to find a different way to avoid tracing those SunSpider benchmarks, or we could just accept the hit.
Note that there are some benchmarks (including, say, roc's tracer thingie) that get faster if we count array ops as good for tracing...  :(
(In reply to comment #17)
> That's fine, but might be worth filing a bug to make TM array writes faster...

Was this ever investigated?
(In reply to comment #22)
> (In reply to comment #17)
> > That's fine, but might be worth filing a bug to make TM array writes faster...
> 
> Was this ever investigated?

Array reads and writes are extremely common. They've already improved quite a lot since Firefox 3.6 -- they used to involve a call to a C++ function, whereas now they just involve inline code (in the common cases), and CSE has improved enough that array writes that follow array reads can often avoid duplicated checks (e.g. the "is this an array?" check).

So yes, it's been extensively investigated, and filing a bug "make TM array writes faster" is likely to be useless without some concrete suggestions on how to make them faster.

Type inference (bug 619423) should help some more; I plan to integrate type inference into TM as soon as Firefox 4.0 is done.

Comment 24

(In reply to comment #20)
> (In reply to comment #19)
> > What needs to be done for this to be resolved?
> 
> The patch in this bug that factors array access into goodOps in the profiler
> makes us slower on SunSpider, because it causes us to trace some SunSpider
> benchmarks that shouldn't be traced. We could try to find a different way to
> avoid tracing those SunSpider benchmarks, or we could just accept the hit.

I was confused by comment 18 because it didn't also mention http://hg.mozilla.org/mozilla-central/rev/cb76b2d61096 so I thought that the patches had landed on mozilla-central. Thanks for the explanation.
I'm against accepting perf hits in the current competitive market setting. I'm also Captain Kirk: I don't believe in the no-win scenario. Reprogram the sim if you have to (cheat). Kidding aside, this does not seem like a no-win situation.

/be
I checked in part of this patch. The rest will follow.

http://hg.mozilla.org/tracemonkey/rev/b76f25a97aeb
Bill, you landed these three patches together:

# 94c6d2a90ae6
Bill McCloskey – Bug 607539 - Change how profiler aborts work (r=dmandelin)

# 471a47358232
Bill McCloskey – Bug 606890 - Factor array reads and equality into profiler decision (r=dmandelin)

# 4391ed5db11f
Bill McCloskey – Bug 606890 - Change how profiler recognizes "expensive" inner loops (r=dmandelin)

Together they regressed Kraken by 15.2% and SunSpider by 2.4%, according to AWFY2.  Not good!  I'm not sure which particular tests were affected; AFAICT AWFY2 doesn't have test-by-test results (dvander, is that right?)
I don't know what's going on here. I expected a small SunSpider regression, although I didn't think it would be quite so large.

The Kraken regression is bizarre. On my machine, these patches make Kraken 4.5% faster, since they fix ai-astar. A few of the Kraken benchmarks get slower, but not by much.

I'd like to leave these in until tomorrow; they don't really affect any other code. Perhaps if there's some way to recover individual test data from AWFY2, I can back them out individually and get a better sense of what went wrong.
dvander said on IRC that he'll try to add the individual test data to AWFY2 tomorrow morning.
Ok, this is weird.  I ran Cachegrind to get instruction counts for 60021:ae34ea3455f2 (just before your patches) and 60026:0ccd15d19393 (which includes your patches plus two tiny other ones that won't affect perf).  Kraken results:

---------------------------------------------------------------
| millions of instructions executed                           |
| total                        | compiled (may overestimate)  |
---------------------------------------------------------------
|  4216.451  3522.743 (1.197x) |  4046.779  3330.801 (1.215x) | ai-astar
|  1999.057  1817.284 (1.100x) |  1357.544  1323.573 (1.026x) | audio-beat-det
|  1368.181  1368.180 (------) |  1170.662  1170.662 (------) | audio-dft
|  1721.461  1721.058 (------) |  1248.983  1248.349 (1.001x) | audio-fft
|  2608.416  2445.879 (1.066x) |  1790.584  1626.967 (1.101x) | audio-oscillat
|  5607.396  5607.396 (------) |  4756.206  4756.205 (------) | imaging-gaussi
|  1971.938  1971.935 (------) |   721.081   721.081 (------) | imaging-darkro
|  4376.700  4376.704 (------) |  3533.216  3533.216 (------) | imaging-desatu
|   641.743   641.743 (------) |     9.560     9.560 (------) | json-parse-fin
|   434.922   434.922 (------) |     4.406     4.406 (------) | json-stringify
|  1125.539  1125.538 (------) |   727.912   727.912 (------) | stanford-crypt
|   671.762   671.734 (------) |   370.102   370.101 (------) | stanford-crypt
|  1171.979  1171.962 (------) |   597.351   597.351 (------) | stanford-crypt
|   528.166   528.148 (------) |   200.365   200.364 (------) | stanford-crypt
-------
| 28443.717 27405.231 (1.038x) | 20534.758 19620.555 (1.047x) | all

ie. it's a clear improvement, which matches the timing results on your machine.

As for SunSpider, I saw 2% worse counts for fannkuch and tofte and everything else was unchanged.  We need those individual results!
Looks like ai-astar improved by 250ms, gaussian-blur worsened by 1300ms, and the others didn't change much.

Comment 33

ss-nbody regressed 100% by 7.9ms -- this was tracing (hovering near the orange line) but is not any more (hovering near the black line).