Closed Bug 691314 Opened 14 years ago Closed 9 years ago

Regression in Peacekeeper 2.0 between Fx7 and Fx10

Categories

(Core Graveyard :: Tracking, defect)

x86
macOS
defect
Not set
normal

Tracking

(firefox9+)

RESOLVED INCOMPLETE
Tracking Status
firefox9 + ---

People

(Reporter: zbraniecki, Unassigned)

References

Details

(Keywords: regression)

Spin off of bug 499198 comment 20. In the recently updated Peacekeeper, Fx10 is ~20% slower in most tests than Fx7. https://docs.google.com/spreadsheet/ccc?key=0Ah9TBa-qpKojdHBDNmFDNUduOHFnLUk3XzRvX0swaFE&hl=en_US#gid=0
Can we get a regression range here? I just tried turning off TI on tip and that does NOT get us back to the Fx7 numbers, for what it's worth... Then again, I'm not sure whether that actually caused the tracer to run or not. :(
Also, can we get numbers on beta and Aurora so we have an idea of which branches we need to track this on?
Keywords: regression
(In reply to Boris Zbarsky (:bz) from comment #1) > Then again, I'm not sure whether that actually caused the tracer to run or not. > :( It should, the tracer still works (and is used in, e.g., mobile chrome).
That's strange. I just ran stringFilter (http://peacekeeper.futuremark.com/run.action?repeat=1&forceSuiteName=string&forceTestName=stringstringFilter) and FF was close to Chrome (124000 vs. 140000). Running the test in the shell, I find us slightly faster than v8. So I'm not sure how we are so pathetically worse in comment 0.
I ran this on Aurora 9.0a2 (2011-10-03) and I'm seeing a slow down. My numbers are: render renderGrid01 57.81 56.91 render renderGrid02 63.36 59.69 render renderGrid03 4.4 4.75 render renderPhysics 21.6 24.68 experimental experimentalRipple01 11.58 16.67 experimental experimentalRipple02 4.44 6.38 array arrayCombined 12540.5 29007 array arrayWeighted 38008.5 46855 dom domGetElements 243153.5 361713.5 dom domDynamicCreationCreateElement 10846 11597 dom domDynamicCreationInnerHTML 16898.5 16139 dom domJQueryAttributeFilters 1956 2370 dom domJQueryBasicFilters 631 813 dom domJQueryBasics 1309.5 1617.5 dom domJQueryContentFilters 815 1103 dom domJQueryHierarchy 2423.5 2982.5 dom domQueryselector 11990 12864.5 string stringChat 28339 38985 string stringDetectBrowser 108291 143724.5 string stringFilter 2206.5 2159 The first number is from Aurora and the second number is from release. This is on Windows XP.
What about beta?
Here's with beta. The first column is Aurora, the second column is Beta, and the third is release: render renderGrid01 57.81 57.18 56.91 render renderGrid02 63.36 56.84 59.69 render renderGrid03 4.4 4.73 4.75 render renderPhysics 21.6 24.94 24.68 experimental experimentalRipple01 11.58 16.6 16.67 experimental experimentalRipple02 4.44 6.28 6.38 array arrayCombined 12540.5 27723.5 29007 array arrayWeighted 38008.5 50005 46855 dom domGetElements 243153.5 359478 361713.5 dom domDynamicCreationCreateElement 10846 12005 11597 dom domDynamicCreationInnerHTML 16898.5 15626.5 16139 dom domJQueryAttributeFilters 1956 2365.5 2370 dom domJQueryBasicFilters 631 840 813 dom domJQueryBasics 1309.5 1613.5 1617.5 dom domJQueryContentFilters 815 1083.5 1103 dom domJQueryHierarchy 2423.5 2941.5 2982.5 dom domQueryselector 11990 12800.5 12864.5 string stringChat 28339 35114.5 38985 string stringDetectBrowser 108291 148330 143724.5 string stringFilter 2206.5 2120 2159 A few tests have a small regression in beta, but the biggest regression is from beta to aurora.
(In reply to Luke Wagner [:luke] from comment #4) > That's strange. I just ran stringFilter > (http://peacekeeper.futuremark.com/run. > action?repeat=1&forceSuiteName=string&forceTestName=stringstringFilter) and > FF was close to Chrome (124000 vs. 140000). Running the test in the shell, > I find us slightly faster than v8. So I'm not sure how we are so > pathetically worse in comment 0. stringFilter is one of the few tests that didn't regress.
(In reply to developer@ckiweb.com from comment #8) Yeah, I know it's technically off-topic but 13x slower is huge (esp. when we beat them when the test is run in isolation (on my MBP just now); something weird is happening). Boris do you know any way to snapshot the test so I can edit it to stop/start shark during stringFilter?
Mats used to have an hg repo of individual tests for the old peacekeeper with shark hooks. Not sure how hard it would be to update that to the new tests... I hate benchmarks that hide the tests. Requesting tracking for 9 based on comment 7.
One more testing request (from comment 1): javascript.options.typeinference=false. (This requires a full browser restart for the change to have effect.)
I tried to figure out which nightly had the regression. It looks like it is between 8/29 and 8/30 for the Windows XP nightly. For two tests (stringDetectBrowser and domJQueryBasicFilters), I have the following numbers: 08/29/11 08/30/11 stringDetectBrowser 808 671 domJQueryBasicFilters 135000 118195
Here's the numbers with typeinference=false and after a full browser restart. render renderGrid01 57.73 render renderGrid02 63.15 render renderGrid03 4.64 render renderPhysics 24.03 experimental experimentalRipple01 13.09 experimental experimentalRipple02 4.94 array arrayCombined 20707 array arrayWeighted 44688 dom domGetElements 282188.5 dom domDynamicCreationCreateElement 11415 dom domDynamicCreationInnerHTML 16847 dom domJQueryAttributeFilters 2099 dom domJQueryBasicFilters 738 dom domJQueryBasics 1448.5 dom domJQueryContentFilters 917.54 dom domJQueryHierarchy 2677 dom domQueryselector 12371 string stringChat 33738 string stringDetectBrowser 120291.5 string stringFilter 2047.5
(In reply to Luke Wagner [:luke] from comment #9) I have a copy of the tests over at peacekeeper.therichins.net. stringFilter is test26.html. Also, bug 503107 deals with the stringFilter test. Its attachment (https://bugzilla.mozilla.org/attachment.cgi?id=387471) looks to be the test in a single javascript file.
(In reply to developer@ckiweb.com from comment #14) Thanks! Using that file I can repro the massive discrepancy and it looks like we are just missing a simple optimization: bug 691797. It seems like the link in comment 4 was just running totally different js.
(In reply to developer@ckiweb.com from comment #13) > Here's the numbers with typeinference=false and after a full browser restart. Thanks much for running these tests. Collecting, filtering, and rounding a bit, I see: Fx9/noTI Fx9 Fx8 Fx7 array arrayCombined 20700 12540 27723 29000 array arrayWeighted 44700 38000 50000 47000 dom domGetElements 282200 243150 360000 360000 string stringChat 34000 28000 35000 39000 string stringDetect 120000 108000 148000 143000 * It would help to know a typical variance here. E.g., is the stringChat regression from 7 to 8 real, or noise? - All of the above have big regressions from 8 to 9 (I picked them that way). - All of the regressions are mitigated by turning off TI. Only the stringChat regression is eliminated entirely by turning off TI. My first (rather obvious) questions are: 1. How does turning TI off help here? I.e., exactly what's going wrong with the TI optimizations, assuming that is the problem. 2. What's the remaining regression after TI is disabled? It could be from the type objects and other adaptations that enable TI. But it could also be from other sources. It would be good to get that aspect bisected or otherwise diagnosed.
> * It would help to know a typical variance here. E.g., is the stringChat > regression from 7 to 8 real, or noise? For stringChat, the lowest that I've seen in 7 is about 3700 but all the rest are above 3800 (I did about 10 refreshes). In 8, the highest I've seen is a little over 3500 (about 10 refreshes). So, I would think that 7 to 8 was a regression.
> 1. How does turning TI off help here? Presumably by turning TM back on, right? Some of the deps of bug 499198 were actually analyzed and fixed in TM at one point....
I haven't confirmed with profiling, but looked at the array/string tests and these are simple benchmarks which pound on array, string and regexp natives (not jitcode) over and over. The array natives are the main concern: these got slower with the TI landing because the VM does the simple, slow, dumb thing of updating type information whenever it modifies an array. e.g. a.concat(b) will cause the VM to explicitly update types in the result for every element in 'a' and 'b' (type updates in the VM are super slow; one of the main fixes for dromaeo DOM perf was moving these into jitcode for monitoring the results of property access stubs). Plan now is to do specialized natives for these. For the natives Dromaeo uses (concat/slice/splice/reverse/sort), this will be faster and easier to do than self hosted (main benefit for self hosted is things like forEach I think). The compiler knows the contents of the arrays and whether they are packed, so can update the types of modified arrays ahead of time (we already do this when inlining Array.push) and can call a native which knows its input array is packed. A lot of the array natives get simpler when the inputs are known to be packed, e.g. concat/slice/splice can be implemented as a memcpy. arrayCombine repeatedly sorts a 500 element packed array of integers. Every time it does this, we make a shadow array with every single element converted to a string, and sort the array using string comparison. We should be able to leverage knowledge about the array's contents to avoid this, using a comparator that sorts integer arrays lexicographically. stringChat calls String.replace over and over. I don't know why this would be affected by the TI merge. stringDetectBrowser does /regexp/.test(string) over and over. I noticed with the dromaeo DOM stuff that /regexp/ cloning got slower with the TI merge, haven't looked into why. The cloning can be avoided entirely for this pattern, though, since the compiler knows a regexp native is being called which won't let its 'this' value escape. Just bake in the parser-created regexp. The microbenchmarky allocate-things-in-a-loop nature of many of these tests means we can't catch Chrome without a generational GC.
Edit: 'For the natives Dromaeo uses ...' => 'For the natives Peacekeeper uses'
(In reply to Brian Hackett from comment #19) > The microbenchmarky allocate-things-in-a-loop nature of many of these tests > means we can't catch Chrome without a generational GC. I used to strongly hold this opinion, but it is becoming less clear. Try timing: var o; for (var i = 0; i < 1000000; ++i) o = {x:i}; js -m -n 32-bit is faster than v8. Ok, v8 does poorly in global code, so put it in a closure. Then we are at parity. Ok, this is unrealistic since it only stores, so add an access to o.x; SM is faster. I know TI is winning here and covering for losses due to additional cache misses due to lack of generational GC, but I found this result surprising as it seems to deemphasize generational GC's importance.
Depends on: 683804
Depends on: 692657
Depends on: 692847
Depends on: 692960
One other note. This was mentioned at http://forums.mozillazine.org/viewtopic.php?p=11198075#p11198075 about a month before this bug got filed. Too bad no bug ever got reported for that. :(
I am unable to complete Peacekeeper 2.0 benchmark on Nightly. It gives me error on String test.
Ahmad, can you please file a separate bug on that? When did the problem appear?
^OK, I am not very much expert in filing bug, if any stuff here and other occur please fix them please. I think it occur when Brian landed some performance patches for making Data benchmark faster.
(In reply to Ahmad Saleem (zlip) from comment #24) > I am unable to complete Peacekeeper 2.0 benchmark on Nightly. It gives me > error on String test. It WFM in the 11/1 nightly on Windows.
Marking all tracking bugs which haven't been updated since 2014 as INCOMPLETE. If this bug is still relevant, please reopen it and move it into a bugzilla component related to the work being tracked. The Core: Tracking component will no longer be used.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.