Closed
Bug 604905
Opened 14 years ago
Closed 14 years ago
Significant javascript performance regression since ff4b6
Categories
(Core :: JavaScript Engine, defect)
Tracking
RESOLVED FIXED
blocking2.0: betaN+
People
(Reporter: cers, Assigned: dvander)
References
(Blocks 2 open bugs)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(3 files)
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101016 Firefox/4.0b8pre
Build Identifier: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101016 Firefox/4.0b8pre

Testing out an image processing algorithm I've recently ported from C, I noticed a significant difference in performance between b6 and recent nightlies. There appears to be a slowdown on the order of 1000% in the nightlies compared to b6. Here are representative scores from b6 and Gecko/20101016 Firefox/4.0b8pre, using the same machine and profile.

b6:
(10x10) = 2ms
(50x50) = 3ms
(100x100) = 3ms
(500x500) = 91ms
(1000x1000) = 415ms

b8pre:
(10x10) = 3ms
(50x50) = 7ms
(100x100) = 23ms
(500x500) = 912ms
(1000x1000) = 5108ms

Reproducible: Always

Steps to Reproduce:
1. Go to http://www.geeksbynature.dk/projects/distance/test.html
2. Compare the results with those from beta6

Actual Results: Large slowdown.
Expected Results: Slight speedup.
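Since the testcase lives at an external URL, here is a minimal sketch of the kind of timing harness it uses. The `transform` function is a hypothetical stand-in for the actual distance-transform kernel in distance.js, not the real code; only the "(NxN) = Nms" reporting shape is taken from the report above.

```javascript
// Hypothetical stand-in for the ported distance-transform kernel:
// it just touches every pixel, so the work scales with image area.
function transform(data, cols, rows) {
  let acc = 0;
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      acc += data[r * cols + c];
    }
  }
  return acc;
}

// Time one image size, mirroring the "(NxN) = Nms" lines reported above.
function bench(size) {
  const data = new Float64Array(size * size).fill(1);
  const t0 = Date.now();
  transform(data, size, size);
  return "(" + size + "x" + size + ") = " + (Date.now() - t0) + "ms";
}

for (const s of [10, 50, 100, 500, 1000]) {
  console.log(bench(s));
}
```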
I'm seeing the exact same issue on Windows XP: Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101016 Firefox/4.0b8pre ID:20101016082013 ~B
Updated•14 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 2•14 years ago
Regression range in tracemonkey tree (on Windows build):
Fast: 20100909043955 (f5f47e8fbc15)
Slow: 20100910044908 (a1f43f4ef565)
Pushlog: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=f5f47e8fbc15&tochange=a1f43f4ef565
Comment 3•14 years ago
The code causing the slowness seems to be line 140 of distance.js:

row_copy = this.data.slice(this.lut[r], this.lut[r+1]);

In my environment, it takes about 2.4 sec (in the 1000x1000 case).
Comment 4•14 years ago
Regression range in TM:
Fast: http://hg.mozilla.org/tracemonkey/rev/f5f47e8fbc15
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b6pre) Gecko/20100909 Firefox/4.0b6pre ID:20100909043955
Slow: http://hg.mozilla.org/tracemonkey/rev/a1f43f4ef565
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b6pre) Gecko/20100910 Firefox/4.0b6pre ID:20100910044908
Pushlog in TM:
http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=f5f47e8fbc15&tochange=a1f43f4ef565
blocking2.0: --- → ?
Comment 5•14 years ago
In a local build, the following changeset causes the performance issue:

1b55ec0c7aee David Anderson — Fix various bugs in tracer integration (bug 593532, r=dmandelin).
Blocks: 593532
Comment 6•14 years ago
dvander, can you take a look at this regression? It looks like your changes are blamed.
Comment 7•14 years ago
So the upshot here is twofold:

1) -j -m is a major slowdown over -j on this testcase.
2) -j is a slowdown over _interp_ on this testcase.

Both seem like problems. #1 may be addressed by heuristics tweaking, but #2 seems like we should fix it anyway (maybe in a separate bug?).

Data:

mozilla% foreach args ( "" "-j" "-m" "-m -j" )
foreach? echo "args: $args"
foreach? ./js/src/opt-obj/js $args ~/test.js
foreach? end
args:
(10x10) = 0ms
(50x50) = 6ms
(100x100) = 22ms
(500x500) = 566ms
(1000x1000) = 2330ms
args: -j
(10x10) = 2ms
(50x50) = 2ms
(100x100) = 4ms
(500x500) = 59ms
(1000x1000) = 233ms
args: -m
(10x10) = 0ms
(50x50) = 5ms
(100x100) = 27ms
(500x500) = 687ms
(1000x1000) = 2641ms
args: -m -j
(10x10) = 2ms
(50x50) = 4ms
(100x100) = 15ms
(500x500) = 485ms
(1000x1000) = 1585ms
Comment 8•14 years ago
Er, I meant: 2) -m is a slowdown over interp on this testcase of course.
Comment 9•14 years ago
I did a Shark profile of the 1000x1000 -m case:

22% in mjit-generated code
26% under js_AtomizeString (called from stubs::GetElem)
17% under js_ValueToString (called from stubs::GetElem)
13% under TypedArrayTemplate<double>::obj_getProperty
9% in stubs::GetElem itself
8% under stubs::Interrupt (doing gc)
2% under stubs::SetElem
2% under stubs::Equal
Comment 10•14 years ago
That string/atom gunk is under js_InternNonIntElementId, and it happens because rref.isDouble(). The double's values are things like 218191, 216191, etc. For each value we hit the string gunk several times (at least 5; I stopped counting after that). This seems like problem #3 here...

Problem #2 seems like bug 594247 might help (but why are we slower than interp? And why are those values doubles?). For problem #1 I'd love to try the patch in bug 580468, but I can't find a tracemonkey revision it actually applies to.
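To make the mechanism concrete, here is a small JS-level illustration (mine, not from the testcase) of why a Float64Array-sourced index ends up on the string path: the value read back is a double, and a property lookup with a key the engine cannot prove to be an int32 is defined in terms of the key's string form — which is what js_InternNonIntElementId has to atomize.

```javascript
// Indices stored in a Float64Array come back as doubles.
const lut = new Float64Array([0, 218191, 216191]);
const idx = lut[1]; // integer-valued, but carried as a double internally

// Semantically a[idx] is a[String(idx)], so an engine that cannot prove
// idx is an int32 must atomize the string "218191" before the lookup.
const a = { "218191": "hit" };
console.log(a[idx]); // "hit" -- same property as the string key
```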
Comment 11•14 years ago
In fact, if I change the testcase to use regular arrays instead of Float64 ones (by just throwing in the try blocks there), I get these numbers:

mozilla% foreach args ( "" "-j" "-m" "-m -j" )
foreach? echo "args: $args"
foreach? ./js/src/opt-obj/js $args ~/test.js
foreach? end
args:
(10x10) = 0ms
(50x50) = 5ms
(100x100) = 19ms
(500x500) = 437ms
(1000x1000) = 1740ms
args: -j
(10x10) = 2ms
(50x50) = 3ms
(100x100) = 3ms
(500x500) = 56ms
(1000x1000) = 268ms
args: -m
(10x10) = 1ms
(50x50) = 1ms
(100x100) = 4ms
(500x500) = 84ms
(1000x1000) = 356ms
args: -m -j
(10x10) = 2ms
(50x50) = 1ms
(100x100) = 3ms
(500x500) = 28ms
(1000x1000) = 130ms

Reporter, are you possibly using values out of those arrays as array indices themselves?
Comment 12•14 years ago
fwiw, the v8 shell times (using untyped arrays) are:

(10x10) = 1ms
(50x50) = 1ms
(100x100) = 2ms
(500x500) = 53ms
(1000x1000) = 191ms

So -m -j beats it, but -m alone is 2x slower. For -m alone on untyped arrays, 91% of the time is in mjit-generated code.
Comment 13•14 years ago
Which may be another (4th?) bug we need here.
Comment 14•14 years ago
Note: tracer also handles double args to JSOP_GETELEM via the "convert it to string" slow path, as far as I can tell. So presumably tracer is NOT seeing doubles there for some reason???
Reporter
Comment 15•14 years ago
(In reply to comment #11)
> In fact, if I change the testcase to use regular arrays instead of Float64 ones
> [snip]
> Reporter, are you possibly using values out of those arrays as array indices
> themselves?

Well spotted - I've changed the relevant arrays to Uint32Array and the scores are much improved. I am still seeing a ~100% slowdown though, compared to the original b6 results, and a ~400% slowdown compared with the new version in b6.

@@ -32,7 +32,7 @@
 function ImgPUInt32(im, c, r) {
   try {
     this.data = Float64Array(im);
-    this.lut = Float64Array(r+1);
+    this.lut = Uint32Array(r+1);
   } catch (e) {
     this.data = im;
     this.lut = Array(r+1);
@@ -103,9 +103,9 @@
 var MAX = -1 >>> 1;

 try {
-  s = Float64Array(cols);
-  t = Float64Array(cols);
-  row_copy = Float64Array(cols);
+  s = Uint32Array(cols);
+  t = Uint32Array(cols);
+  row_copy = Uint32Array(cols);
 } catch (e) {
   s = Array(cols);
   t = Array(cols);
Reporter
Comment 16•14 years ago
(In reply to comment #15)
> @@ -32,7 +32,7 @@
>  function ImgPUInt32(im, c, r) {
>    try {
>      this.data = Float64Array(im);
> -    this.lut = Float64Array(r+1);
> +    this.lut = Uint32Array(r+1);
>    } catch (e) {
>      this.data = im;
>      this.lut = Array(r+1);
> @@ -103,9 +103,9 @@
>  var MAX = -1 >>> 1;
>
>  try {
> -  s = Float64Array(cols);
> -  t = Float64Array(cols);
> -  row_copy = Float64Array(cols);
> +  s = Uint32Array(cols);
> +  t = Uint32Array(cols);
> +  row_copy = Uint32Array(cols);
>  } catch (e) {
>    s = Array(cols);
>    t = Array(cols);

Might in fact make sense to keep row_copy as Float64 - it's not used as an index, and float seems faster. (Also, the data copied into it using that diff would be coming from a Float64Array, but changing both to Uint32Array was slightly slower.)
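Putting comments 15 and 16 together, the constructor ends up allocating roughly as below. This is a sketch, not the real distance.js: it keeps the plain-Array fallback path, and it adds `new` to the typed-array constructor calls, which the original diff omits (modern engines require `new` for typed arrays).

```javascript
// Sketch of ImgPUInt32 after the fixes discussed in comments 15/16:
// index-bearing arrays become Uint32Array, pixel data stays Float64Array.
function ImgPUInt32(im, c, r) {
  try {
    this.data = new Float64Array(im);  // pixel values: keep floating point
    this.lut = new Uint32Array(r + 1); // row offsets: used as array indices
  } catch (e) {
    this.data = im;                    // fallback for engines without typed arrays
    this.lut = new Array(r + 1);
  }
}
```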
Comment 17•14 years ago
If I make the changes suggested in the combination of comment 15 and comment 16, then I get these numbers:

args:
(10x10) = 0ms
(50x50) = 7ms
(100x100) = 21ms
(500x500) = 538ms
(1000x1000) = 2183ms
args: -j
(10x10) = 2ms
(50x50) = 1ms
(100x100) = 2ms
(500x500) = 38ms
(1000x1000) = 131ms
args: -m
(10x10) = 1ms
(50x50) = 3ms
(100x100) = 11ms
(500x500) = 252ms
(1000x1000) = 1027ms
args: -j -m
(10x10) = 2ms
(50x50) = 3ms
(100x100) = 10ms
(500x500) = 180ms
(1000x1000) = 729ms

A profile of the -m case shows that the double-to-atom stuff is all gone; what remains seems to be mostly bug 594247.

One interesting thing here is that I just did profiles for the -j case for both the original code (where we take 230ms) and the new code (where we take 130ms). The former definitely shows some double-to-atom conversions, totaling about 25% of the time. There are also stub calls to js_GetPropertyByName, etc. So it's doing things similar to what -m is doing in that case, I think. The question is why it's still so much faster with -j.

Given comment 10 paragraph 2, and this code in the testcase:

var tmp = ( (u*u - (s[q]*s[q]) + this.data[u+img_row] - this.data[s[q]+img_row]) / (2*(u - s[q])) );

I wonder whether the tracejit is just managing to CSE the property accesses... But I wouldn't have thought those would be CSE-able...
Comment 18•14 years ago
And in fact, if I just count calls to js_AtomizeString while running the 1000x1000 case of the original code, I get 9367611 under methodjit and 777501 under tracejit. Not sure whether this is also worth investigating. dvander, do you want me to try to file separate bugs blocking this one for the various things that might be issues here, or just shut up and let you take a look too?
Comment 19•14 years ago
(In reply to comment #10)
> Problem #2 seems like bug 594247 might help (but why are we slower than interp?
> and why are those values doubles?).

I would guess bug 594247 is the major malfunction here. It will probably be a bunch easier to figure out what else might be hurting us after we get that fixed.

> For problem #1 I'd love to try the patch in bug 580468, but I can't find a
> tracemonkey revision it actually applies to.

We're going to land that one to TM very soon.
Comment 20•14 years ago
Agreed. Let's get those two bugs fixed (which might well just fix the -j -m case) and then remeasure.
Updated•14 years ago
blocking2.0: ? → betaN+
Comment 21•14 years ago
Assignee
Comment 22•14 years ago
With the test case from comment #21, and bug 594247 applied:

dvander@dvander:~/mozilla/tracemonkey/js/src$ ./Opt32/js -m -j -p 2k.js
(10x10) = 1ms
(50x50) = 2ms
(100x100) = 3ms
(500x500) = 64ms
(1000x1000) = 259ms

The original test case:

dvander@dvander:~/mozilla/tracemonkey/js/src$ ./Opt32/js -m -j -p k.js
(10x10) = 1ms
(50x50) = 3ms
(100x100) = 9ms
(500x500) = 265ms
(1000x1000) = 1086ms

dvander@dvander:~/mozilla/tracemonkey/js/src$ ./Opt32/js -j k.js
(10x10) = 2ms
(50x50) = 1ms
(100x100) = 2ms
(500x500) = 51ms
(1000x1000) = 230ms

I'm hesitant to block on bug 608479, since adding an IC to convert double indices to integers is a little complicated, and an unusual use case (which here appears to have been accidental).
Comment 23•14 years ago
David, I think the main question here is why there are so many more double-to-string conversions with -m than with -j on the original testcase and whether we can easily fix that.....
Assignee
Comment 24•14 years ago
Thanks, good point. The atomization only happens in GETELEM, and only because GETELEM, unlike other key-is-value opcodes, does not call ValueFitsInInt32. That seems worth fixing. That brings the slow version down to:

dvander@dvander:~/mozilla/tracemonkey/js/src$ ./Opt32/js -m bz.js
(10x10) = 1ms
(50x50) = 2ms
(100x100) = 5ms
(500x500) = 139ms
(1000x1000) = 558ms

Which is a big improvement - but obviously, way better to use an integer array in the first place, which will IC.
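The logic of the fix — normalize an integer-valued double key to an int32 id before falling back to atomization — can be mimicked at the JS level. This is a sketch of the idea only, not the actual SpiderMonkey patch; the function name is hypothetical.

```javascript
// Mimic the int32 normalization the GETELEM fix performs: an exact
// int32-valued double takes the integer fast path; anything else falls
// back to the (atomized) string key.
function normalizeIndex(v) {
  if ((v | 0) === v) return v; // already an int32
  if (typeof v === "number" && Number.isInteger(v) &&
      v >= -0x80000000 && v <= 0x7fffffff) {
    return v | 0; // integer-valued double -> int32 id
  }
  return String(v); // slow path: string (atomized) key
}

console.log(normalizeIndex(218191)); // 218191  (fast path)
console.log(normalizeIndex(1.5));    // "1.5"   (slow path)
```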
Comment 25•14 years ago
Comment on attachment 489755 [details] [diff] [review]
fix

Because ValueFitsInInt32 will only be picking up the double case, and I can't see us adding any other value types to the set of "fits in int32", it looks like you could write:

} else if (v.isDouble() && JSDOUBLE_IS_INT32(v.toDouble(), &i) && INT_FITS_IN_JSID(i)) {
    id = INT_TO_JSID(i);
}

r+ either way.
Attachment #489755 - Flags: review?(lw) → review+
Comment 26•14 years ago
Yep, I can confirm that with the attached patch -m on the first testcase ends up 3x faster (and in fact ends up about as fast as on the second testcase). If I also apply the patch in bug 594247, then JM ends up within a factor of 2.5 of TM on the testcases here. Best yet, with the patch for bug 608480 I see -j -m -p being about as good as just -j on the attached testcases (and much faster than -j -m), for total win. ;)
Assignee
Comment 27•14 years ago
http://hg.mozilla.org/tracemonkey/rev/9bff531727b2
Whiteboard: fixed-in-tracemonkey
Comment 28•14 years ago
http://hg.mozilla.org/mozilla-central/rev/9bff531727b2
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED