Closed Bug 691314 Opened 13 years ago Closed 8 years ago

Regression in Peacekeeper 2.0 between Fx7 and Fx10

Categories

(Core Graveyard :: Tracking, defect)

x86
macOS
defect
Not set
normal

Tracking

(firefox9+)

RESOLVED INCOMPLETE
Tracking Status
firefox9 + ---

People

(Reporter: zbraniecki, Unassigned)

References

Details

(Keywords: regression)

Spin off of bug 499198 comment 20.

In the recently updated Peacekeeper, Fx10 is ~20% slower in most tests than Fx7.

https://docs.google.com/spreadsheet/ccc?key=0Ah9TBa-qpKojdHBDNmFDNUduOHFnLUk3XzRvX0swaFE&hl=en_US#gid=0
Can we get a regression range here?  I just tried turning off TI on tip and that does NOT get us back to the Fx7 numbers, for what it's worth...  Then again, I'm not sure whether that actually caused the tracer to run or not.  :(
Also, can we get numbers on beta and Aurora so we have an idea of which branches we need to track this on?
Keywords: regression
(In reply to Boris Zbarsky (:bz) from comment #1)
> Then again, I'm not sure whether that actually caused the tracer to run or not. 
> :(

It should, the tracer still works (and is used in, e.g., mobile chrome).
That's strange.  I just ran stringFilter (http://peacekeeper.futuremark.com/run.action?repeat=1&forceSuiteName=string&forceTestName=stringstringFilter) and FF was close to Chrome (124000 vs. 140000).  Running the test in the shell, I find us slightly faster than v8.  So I'm not sure how we are so pathetically worse in comment 0.
I ran this on Aurora 9.0a2 (2011-10-03) and I'm seeing a slow down.  My numbers are:

render	renderGrid01	57.81	56.91
render	renderGrid02	63.36	59.69
render	renderGrid03	4.4	4.75
render	renderPhysics	21.6	24.68
experimental	experimentalRipple01	11.58	16.67
experimental	experimentalRipple02	4.44	6.38
array	arrayCombined	12540.5	29007
array	arrayWeighted	38008.5	46855
dom	domGetElements	243153.5	361713.5
dom	domDynamicCreationCreateElement	10846	11597
dom	domDynamicCreationInnerHTML	16898.5	16139
dom	domJQueryAttributeFilters	1956	2370
dom	domJQueryBasicFilters	631	813
dom	domJQueryBasics	1309.5	1617.5
dom	domJQueryContentFilters	815	1103
dom	domJQueryHierarchy	2423.5	2982.5
dom	domQueryselector	11990	12864.5
string	stringChat	28339	38985
string	stringDetectBrowser	108291	143724.5
string	stringFilter	2206.5	2159

The first number is from Aurora and the second number is from release.  This is on Windows XP.
What about beta?
Here's with beta.  The first column is Aurora, the second column is Beta, and the third is release:

render	renderGrid01	57.81	57.18	56.91
render	renderGrid02	63.36	56.84	59.69
render	renderGrid03	4.4	4.73	4.75
render	renderPhysics	21.6	24.94	24.68
experimental	experimentalRipple01	11.58	16.6	16.67
experimental	experimentalRipple02	4.44	6.28	6.38
array	arrayCombined	12540.5	27723.5	29007
array	arrayWeighted	38008.5	50005	46855
dom	domGetElements	243153.5	359478	361713.5
dom	domDynamicCreationCreateElement	10846	12005	11597
dom	domDynamicCreationInnerHTML	16898.5	15626.5	16139
dom	domJQueryAttributeFilters	1956	2365.5	2370
dom	domJQueryBasicFilters	631	840	813
dom	domJQueryBasics	1309.5	1613.5	1617.5
dom	domJQueryContentFilters	815	1083.5	1103
dom	domJQueryHierarchy	2423.5	2941.5	2982.5
dom	domQueryselector	11990	12800.5	12864.5
string	stringChat	28339	35114.5	38985
string	stringDetectBrowser	108291	148330	143724.5
string	stringFilter	2206.5	2120	2159

A few tests have a small regression in beta, but the biggest regression is from beta to aurora.
(In reply to Luke Wagner [:luke] from comment #4)
> That's strange.  I just ran stringFilter
> (http://peacekeeper.futuremark.com/run.
> action?repeat=1&forceSuiteName=string&forceTestName=stringstringFilter) and
> FF was close to Chrome (124000 vs. 140000).  Running the test in the shell,
> I find us slightly faster than v8.  So I'm not sure how we are so
> pathetically worse in comment 0.

stringFilter is one of the few tests that didn't regress.
(In reply to developer@ckiweb.com from comment #8)
Yeah, I know it's technically off-topic but 13x slower is huge (esp. when we beat them when the test is run in isolation (on my MBP just now); something weird is happening).  Boris do you know any way to snapshot the test so I can edit it to stop/start shark during stringFilter?
Mats used to have an hg repo of individual tests for the old peacekeeper with shark hooks.  Not sure how hard it would be to update that to the new tests...

I hate benchmarks that hide the tests.

Requesting tracking for 9 based on comment 7.
One more testing request (from comment 1): javascript.options.typeinference=false.  (This requires a full browser restart for the change to have effect.)
I tried to figure out which nightly had the regression.  It looks like it is between 8/29 and 8/30 for the Windows XP nightly.  For two tests (stringDetectBrowser and domJQueryBasicFilters), I have the following numbers:

                        08/29/11	08/30/11
stringDetectBrowser     808	   671
domJQueryBasicFilters   135000	   118195
Here's the numbers with typeinference=false and after a full browser restart.

render	renderGrid01	57.73
render	renderGrid02	63.15
render	renderGrid03	4.64
render	renderPhysics	24.03
experimental	experimentalRipple01	13.09
experimental	experimentalRipple02	4.94
array	arrayCombined	20707
array	arrayWeighted	44688
dom	domGetElements	282188.5
dom	domDynamicCreationCreateElement	11415
dom	domDynamicCreationInnerHTML	16847
dom	domJQueryAttributeFilters	2099
dom	domJQueryBasicFilters	738
dom	domJQueryBasics	1448.5
dom	domJQueryContentFilters	917.54
dom	domJQueryHierarchy	2677
dom	domQueryselector	12371
string	stringChat	33738
string	stringDetectBrowser	120291.5
string	stringFilter	2047.5
(In reply to Luke Wagner [:luke] from comment #9)

I have a copy of the tests over at peacekeeper.therichins.net.  stringFilter is test26.html.  Also, bug 503107 deals with the stringFilter test.  Its attachment (https://bugzilla.mozilla.org/attachment.cgi?id=387471) looks to be the test in a single javascript file.
(In reply to developer@ckiweb.com from comment #14)
Thanks!  Using that file I can repro the massive discrepancy and it looks like we are just missing a simple optimization: bug 691797.  It seems like the link in comment 4 was just running totally different js.
(In reply to developer@ckiweb.com from comment #13)
> Here's the numbers with typeinference=false and after a full browser restart.

Thanks much for running these tests. Collecting, filtering, and rounding a bit, I see:

                         Fx9/noTI     Fx9        Fx8        Fx7
array   arrayCombined    20700      12540      27723      29000
array   arrayWeighted    44700      38000      50000      47000
dom     domGetElements  282200     243150     360000     360000
string  stringChat       34000      28000      35000      39000
string  stringDetect    120000     108000     148000     143000

* It would help to know a typical variance here. E.g., is the stringChat regression from 7 to 8 real, or noise?

- All of the above have big regressions from 8 to 9 (I picked them that way).

- All of the regressions are mitigated by turning off TI. Only the stringChat regression is eliminated entirely by turning off TI.

My first (rather obvious) questions are:

1. How does turning TI off help here? I.e., exactly what's going wrong with the TI optimizations, assuming that is the problem.

2. What's the remaining regression after TI is disabled? It could be from the type objects and other adaptations that enable TI. But it could also be from other sources. It would be good to get that aspect bisected or otherwise diagnosed.
> * It would help to know a typical variance here. E.g., is the stringChat
> regression from 7 to 8 real, or noise?

For stringChat, the lowest that I've seen in 7 is about 3700 but all the rest are above 3800 (I did about 10 refreshes).  In 8, the highest I've seen is a little over 3500 (about 10 refreshes).  So, I would think that 7 to 8 was a regression.
> 1. How does turning TI off help here?

Presumably by turning TM back on, right?  Some of the deps of bug 499198 were actually analyzed and fixed in TM at one point....
I haven't confirmed with profiling, but looked at the array/string tests and these are simple benchmarks which pound on array, string and regexp natives (not jitcode) over and over.  The array natives are the main concern: these got slower with the TI landing because the VM does the simple, slow, dumb thing of updating type information whenever it modifies an array.  e.g. a.concat(b) will cause the VM to explicitly update types in the result for every element in 'a' and 'b' (type updates in the VM are super slow; one of the main fixes for dromaeo DOM perf was moving these into jitcode for monitoring the results of property access stubs).

Plan now is to do specialized natives for these.  For the natives Dromaeo uses (concat/slice/splice/reverse/sort), this will be faster and easier to do than self hosted (main benefit for self hosted is things like forEach I think).  The compiler knows the contents of the arrays and whether they are packed, so can update the types of modified arrays ahead of time (we already do this when inlining Array.push) and can call a native which knows its input array is packed.  A lot of the array natives get simpler when the inputs are known to be packed, e.g. concat/slice/splice can be implemented as a memcpy.  arrayCombine repeatedly sorts a 500 element packed array of integers.  Every time it does this, we make a shadow array with every single element converted to a string, and sort the array using string comparison.  We should be able to leverage knowledge about the array's contents to avoid this, using a comparator that sorts integer arrays lexicographically.

stringChat calls String.replace over and over.  I don't know why this would be affected by the TI merge.

stringDetectBrowser does /regexp/.test(string) over and over.  I noticed with the dromaeo DOM stuff that /regexp/ cloning got slower with the TI merge, haven't looked into why.  The cloning can be avoided entirely for this pattern, though, since the compiler knows a regexp native is being called which won't let its 'this' value escape.  Just bake in the parser-created regexp.

The microbenchmarky allocate-things-in-a-loop nature of many of these tests means we can't catch Chrome without a generational GC.
Edit: 'For the natives Dromaeo uses ...' => 'For the natives Peacekeeper uses'
(In reply to Brian Hackett from comment #19)
> The microbenchmarky allocate-things-in-a-loop nature of many of these tests
> means we can't catch Chrome without a generational GC.

I used to strongly hold this opinion, but it is becoming less clear.  Try timing:

  var o;
  for (var i = 0; i < 1000000; ++i)
    o = {x:i};

js -m -n 32-bit is faster than v8.  Ok, v8 does poorly in global code, so put it in a closure.  Then we are at parity.  Ok, this is unrealistic since it only stores, so add an access to o.x; SM is faster.  I know TI is winning here and covering for losses due to additional cache misses due to lack of generational GC, but I found this result surprising as it seems to deemphasize generational GC's importance.
Depends on: 683804
Depends on: 692657
Depends on: 692847
Depends on: 692960
One other note.  This was mentioned at http://forums.mozillazine.org/viewtopic.php?p=11198075#p11198075 about a month before this bug got filed.  Too bad no bug ever got reported for that.  :(
I am unable to complete Peacekeeper 2.0 benchmark on Nightly. It gives me error on String test.
Ahmad, can you please file a separate bug on that?  When did the problem appear?
^OK, I am not very much expert in filing bug, if any stuff here and other occur please fix them please. I think it occur when Brian landed some performance patches for making Data benchmark faster.
(In reply to Ahmad Saleem (zlip) from comment #24)
> I am unable to complete Peacekeeper 2.0 benchmark on Nightly. It gives me
> error on String test.

It WFM in the 11/1 nightly on Windows.
Marking all tracking bugs which haven't been updated since 2014 as INCOMPLETE.
If this bug is still relevant, please reopen it and move it into a bugzilla component related to the work
being tracked. The Core: Tracking component will no longer be used.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.