Bug 499198 - (peacekeeper) [meta] Tracking performance on Peacekeeper benchmark
Status: NEW
Keywords: meta
Product: Core
Classification: Components
Component: General
Version: Trunk
Platform: All All
Importance: -- normal with 17 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
QA Contact:
Mentors:
URL: http://peacekeeper.futuremark.com/
Depends on: 592786 601176 606734 608648 608880 609704 609835 626165 692009 919992 920508 920659 171262 200505 484614 499199 499201 499235 503107 503141 504920 506813 540985 544477 553342 553348 562034 603872 605385 606648 606650 609212 609229 609296 610077 617136 691797 697343 723481 743615 916851 917247 917258 917814 917839 918746 922016 922018 922048 922053 922063 922071 929087 945737 1004923 1024132
Blocks: WBGP
Reported: 2009-06-18 13:46 PDT by Boris Zbarsky [:bz]
Modified: 2016-07-05 11:40 PDT
CC List: 52 users


Attachments
collected results (1.14 KB, text/plain)
2010-01-20 13:00 PST, nemo
jquery-1.3.1 (53.98 KB, application/x-javascript)
2010-11-02 20:54 PDT, Trev
assetmanager.js (2.96 KB, application/x-javascript)
2010-11-02 21:05 PDT, Trev
Peacekeeper 2.0 beta (2011-10-03) results on Linux x86-64 (no HWA) (7.33 KB, text/html)
2011-10-03 05:56 PDT, Mats Palmgren (vacation)
Peacekeeper tests in one file (23.92 KB, text/html)
2011-10-05 21:47 PDT, Trev
jquery-1.6.5.min.js (89.52 KB, application/x-javascript)
2011-10-17 20:37 PDT, Trev
Peacekeeper tests in one file (185.84 KB, text/html)
2011-10-17 20:43 PDT, Trev

Description Boris Zbarsky [:bz] 2009-06-18 13:46:19 PDT
This is a bug to track our performance on the peacekeeper benchmark.  Mats rocks and has created a locally-runnable version; you can get it from http://hg.mozilla.org/users/mpalmgren_mozilla.com/peacekeeper
Comment 1 Robert Sayre 2009-07-07 07:12:13 PDT
The fix for bug 200505 has improved our performance on the "data" segment of the test from 2328 (3.5.1pre) to 4300 (3.6a1) on my machine.
Comment 2 nemo 2010-01-19 13:28:38 PST
Peacekeeper tests seem to have gotten slower over the past few months.
I assumed it was some temporary thing, but it hasn't gone away.
My highest Minefield score was back in 2009-09 / 2009-10 - 2636

Since then I've only seen declines in overall score, although the annoying interface made it hard to see what was dropping until I created a new test set.

Downloading the offline test suite and comparing a nightly from 2009-10-05 for Windows against the latest Minefield using a fresh test run, I get:
                    [2010-01-19]    [2009-10-05]        [% drop in latest]
Total:              2458            2544                 3.4%
Rendering:          1860            2268                18.0%
Social networking:  2639            2204               -19.7% (dramatic improvement)
Complex Graphics:   3923            4158                 5.7%
Data:               4580            4614                 0.7%
DOM operations:     1970            1987                 0.9%
Text parsing:       2026            2326                12.9%

Now, if I download the standalone version and try a few from the categories above that lost the most ground:
                    [2010-01-19]    [2009-10-05]        [% drop in latest]
renderChart         38.577ops       58.509ops           34.1%
renderPhotoZoom     12.309ops       56.743ops           78.3%
stringRegexpEmail   98039.216ops    129870.130ops       24.5%
stringValidateForm  58139.535ops    87719.298ops        33.7%

This machine is a dual core laptop, kinda surprises me things have gotten slower.  I thought Firefox was using more threading which would have an advantage on dual core, no?

Other oddity.
Running stringConcat I got 384615.385 ops in 2009-10-05 on every single run.
In 2010-01-19 that exact same test resulted in...  384615.385 on first run, 400000.000 on second.  The results continued to be one of those two, seemingly random.
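
One possible explanation for the two alternating values (an assumption, not confirmed anywhere in this bug): if the harness runs a fixed number of operations and times them with a millisecond clock, the reported rate can only take a few discrete values. With a hypothetical 10000 operations, elapsed times of 26 ms and 25 ms give figures of exactly this shape:

```javascript
// Hypothetical model of the harness: a fixed operation count divided by an
// elapsed time that is only known to the nearest millisecond.
function opsPerSecond(operations, elapsedMs) {
  return (operations / elapsedMs) * 1000;
}

// The operation count of 10000 is an assumption chosen to fit the figures.
console.log(opsPerSecond(10000, 26).toFixed(3));
console.log(opsPerSecond(10000, 25).toFixed(3));
```

Under that model a one-millisecond difference in elapsed time is enough to flip between the two results, so the variation would be timer granularity rather than a real performance change.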
Comment 3 nemo 2010-01-19 13:35:41 PST
Er. Sorry. Editing fail.
The first set of figures are from the online peacekeeper.  I downloaded nightly build 2009-10-05 for Windows and ran it against the online peacekeeper.  I then repeated this in the 2010-01-19 nightly.

The 2nd set are using the downloaded harness.
Comment 4 Boris Zbarsky [:bz] 2010-01-19 13:38:34 PST
It really depends...  but I bet what you're seeing has pretty much nothing to
do with threading.

Would you be willing to take, say, renderPhotoZoom and try some nightlies to
narrow down when it regressed for you?
Comment 5 nemo 2010-01-19 15:51:50 PST
Well. This is irritating.
Completely unable to reproduce the figures I was getting repeatedly for 2009-10-05 just hours ago.

So. Starting from scratch, the string variation was consistently reproducible, but I can't reproduce the rendering variation.

And that's on the exact same profile / Firefox.  Since I was switching between one and the other while rendering, some background process load seems unlikely.

Let me get back to you on this tomorrow.  Might only be the strings...
Comment 6 nemo 2010-01-20 12:53:40 PST
Alright. I still have no idea how I got my results from yesterday, where 2009-10-05 was doing so dramatically better in rendering in both fresh runs of the online version and the offline variant.  And I'm particularly puzzled as to how I got 2636 at some point in the past.  Perhaps it was a fleeting thing in just one nightly.
At any rate, I'm unable to reproduce after downloading a bunch of builds from 2009-08 on and being sure to test in fresh profiles with nothing else running.  The only loss is in the string tests, and that is recent.  After 2010-01-10 but before 2010-01-19 / 2010-01-20.

Collected results below.  Apart from recent drop in String, probably what is visible in the individual tests above, only other thing I see of any interest is dropoff in "Social networking" (encrypt / xml parse / filter / sort / scroll) between 2010-01-10 and 2010-01-20 - that and Data fluctuating quite a bit.

                    2009-08-01  2009-09-01  2009-09-15  2009-10-05  2009-10-12  2009-10-20  2009-11-05  2009-12-05  2009-12-20  2010-01-10  2010-01-20
Total               2118        2190        2110        2417        2246        2420        2454        2498        2542        2576        2459
Rendering           1610        1631        1619        1804        1797        1799        1817        1841        1826        1858        1838
Social Networking   2009        2194        2125        2066        2078        2142        2170        2532        2550        2646        2482
Complex Graphics    2716        3547        3598        3821        3809        3859        3834        3880        3862        3834        3938
Data                4468        4829        3919        4726        4627        4803        4807        4621        4680        4641        4862
DOM operations      1918        1830        2008        2010        1833        1930        1995        1942        2081        2117        1991
Text parsing        1539        1594        1547        2334        1808        2325        2354        2329        2342        2352        2038


For comparison purposes on this machine, Chromium latest buildbot.
Total               3845
Rendering           3106
Social Networking   3540
Complex Graphics    5770
Data                3087
DOM operations      4586
Text parsing        5401


http://service.futuremark.com/peacekeeper/results.action?key=2mqt - 2009-08-01 - 2118
http://service.futuremark.com/peacekeeper/results.action?key=2mn3 - 2009-09-01 - 2190
http://service.futuremark.com/peacekeeper/results.action?key=2mn7 - 2009-09-15 - 2110
http://service.futuremark.com/peacekeeper/results.action?key=2mnA - 2009-10-05 - 2417
http://service.futuremark.com/peacekeeper/results.action?key=2mnK - 2009-10-12 - 2246
http://service.futuremark.com/peacekeeper/results.action?key=2mnV - 2009-10-20 - 2420
http://service.futuremark.com/peacekeeper/results.action?key=2mnk - 2009-11-05 - 2454
http://service.futuremark.com/peacekeeper/results.action?key=2mpu - 2009-12-05 - 2498
http://service.futuremark.com/peacekeeper/results.action?key=2mqE - 2009-12-20 - 2542
http://service.futuremark.com/peacekeeper/results.action?key=2mqb - 2010-01-10 - 2576
http://service.futuremark.com/peacekeeper/results.action?key=2mnt - 2010-01-20 / Chromium LATEST - 2459
Comment 7 nemo 2010-01-20 13:00:36 PST
Created attachment 422600 [details]
collected results

Sorry. One more bug spam.  I see Bugzilla chose to hard-wrap the text.  Same results as above, but more readable.
Comment 8 Boris Zbarsky [:bz] 2010-01-20 13:41:21 PST
nemo, do you want to narrow down the text thing to one day?  Or tell me how to sanely reproduce without running all of peacekeeper?
Comment 9 nemo 2010-01-20 13:57:33 PST
Fetch and unpack:
http://hg.mozilla.org/users/mpalmgren_mozilla.com/peacekeeper/archive/1f813e4a8ff5.tar.bz2

Open run.html.

select string: stringRegexpEmail and run selected.

2010-01-10: 126582.278
2010-01-20: 105263.158

I'll download some more builds though.
Comment 10 nemo 2010-01-20 14:57:01 PST
Testing stringValidateForm:
2010-01-11 84745.763
2010-01-12 56179.775

Windows XP 32 in a clean profile.
Comment 11 Boris Zbarsky [:bz] 2010-01-20 17:04:15 PST
Filed bug 540985 on that regression.  I can totally reproduce here on Mac.
Comment 12 nemo 2010-03-18 11:33:26 PDT
Regressed from 12/20 build:
domDynamicCreationCreateElement - ~5-6%
domDynamicCreationInnerHTML - ~5-6%

Still Regressed:
stringValidateForm - ~25%

Given response in bug #540985, will file a new bug
Comment 13 nemo 2010-09-01 11:24:41 PDT
For the record, this is:
Mozilla/5.0 (X11; Linux x86_64; rv:2.0b6pre) Gecko/20100901 Firefox/4.0b6pre
vs 
Google Chrome 7 for Linux
On the array tests in the harness.

bz asked me to file a bug on splice, but I figured I might as well also dump the results here.
runs x5
                Chrome7         FF nightly
arrayCombined  | 480044.662     | 281234.251
arrayConcat    |2406417.114     | 377674.562
arrayJoin      |   7808.000     |   9220.000
arrayPop       |7380952.383     |6785714.287
arrayPush      |8095238.097     |8095238.097
arrayReverse   |3791208.790     |7857142.859
arrayShift     |7857142.859     |1250000.000
arraySlice     |6964285.716     |5833333.333
arraySort      |3125000.000     |2631578.945
arraySplice    |7142857.145     | 191278.406
arrayWeighted  |1331813.577     | 369424.078

bz suggested increasing testOperationLimit to 10000000 - did not seem to significantly alter results.
No idea if any of these are regressions.
Comment 14 Brendan Eich [:brendan] 2010-09-01 11:35:59 PDT
We kick ass on reverse! ;-)

Truly, we've doted on join and it shows (good, just need more of that elsewhere).

/be
Comment 15 David (Yavatar) 2010-09-15 22:51:11 PDT
Hi all,

I've been tracking Peacekeeper since Beta 4. I have two different sets below, and I apologize in advance that they aren't all updated and in sync.


Here's the original set I started when I hopped onto the betas. I was looking for some benchmarks just to see how the different browsers compared. (Peacekeeper is nice in that you can compare the best overall scores across the different browsers all in one place. Also, you can SEE something happening and not just have something running and get a bunch of numbers at the end.)

http://clients.futuremark.com/peacekeeper/results.action?key=4Edz

(BTW, I was testing the Chrome engine on IE8 and I didn't clear out the cookies first in Chrome before running the test... so don't freak, IE8 just got on steroids, that's all.) ;)


It seemed like it was generally moving in the right direction of better scores.


I was curious just how the early betas had been progressing so I went back and installed the earlier betas and created a new set of benchmarks starting at Beta 1.

http://clients.futuremark.com/peacekeeper/results.action?key=4Jf3


Beta 4 is showing best OVERALL now. I have some older WinXP machines at work with integrated Intel graphics. It seems like Beta 4 is showing better overall than the newer betas and nightlies. However, I haven't had a chance to test as in-depth as at home.

I haven't been able to get a benchmark on Peacekeeper for pre7 because it crashes after the 2nd or 3rd test (rendering stage). It appears that is still being worked on. I threw in my two cents worth of crash logs on that too.


It SEEMS that I tend to get my best test scores right after a new version is put in, although it seems random sometimes that I get a better score later on. Beyond that, I can't figure out anything definitive that results in better scores.

Cheers,

Yav
Comment 16 Trev 2010-11-02 20:54:09 PDT
Created attachment 487818 [details]
jquery-1.3.1

I'm attaching the version of jquery as used by Peacekeeper so I can attach test cases that depend on it.
Comment 17 Trev 2010-11-02 21:05:49 PDT
Created attachment 487822 [details]
assetmanager.js

assetmanager.js used in peacekeeper tests
Comment 18 Trev 2010-11-04 13:47:29 PDT
Here are the tests that I've looked into so far and which bugs were filed for them:

Test 1 - bug #606734
Test 2, 3, 4 - bug #608648
Test 5 - bug #608880
Test 6 - bug #609212
Test 7 - none needed
Test 8 - none needed
Test 9 - none needed
Test 10, 11 - bug #606648, bug #601176
Test 12 - bug #609704, bug #601176
Comment 19 Trev 2011-04-10 10:39:10 PDT
Here are the scores from the latest FF Nightly vs Chrome 10.0.648.204.

Test Name                                   FF      Chrome      Perc
renderChart                            80.8283    130.3281    -61.24%
renderGrid01                          185.0346    185.1259     -0.05%
renderGrid02                           99.5801    147.2470    -47.87%
renderGrid03                            3.7599     11.6364   -209.49%
renderPhysics                          52.3281     75.5584    -44.39%
community01Encrypt                    131.2163    161.1744    -22.83%
community02ParseXML                    24.2258     18.1502     25.08%
community03Filter                      90.6977    103.9522    -14.61%
community04Sort                        77.8661     76.8719      1.28%
experimentalRipple01                   15.1147     26.2141    -73.43%
experimentalRipple02                    5.8192     10.3360    -77.62%
experimentalMovie                      78.7228     70.4573     10.50%
arrayCombined                       62111.8012 128205.1282   -106.41%
arrayWeighted                       82644.6281 370370.3704   -348.15%
domGetElements                    1111111.1111 909090.9091     18.18%
domDynamicCreationCreateElement     13280.2125  10493.1794     20.99%
domDynamicCreationInnerHTML         16366.6121   7782.1012     52.45%
domJQueryAttributeFilters            2816.9014   3866.9760    -37.28%
domJQueryBasicFilters                 939.7613   1919.7543   -104.28%
domJQueryBasics                      2523.9778   4506.5345    -78.55%
domJQueryContentFilters               215.5962    446.4684   -107.09%
domJQueryHierarchy                    506.0985   1555.6938   -207.39%
stringChat                          67114.0940  47619.0476     29.05%
stringDetectBrowser                303030.3030 333333.3333    -10.00%
stringFilter                         2712.9680  38022.8137  -1301.52%
stringValidateForm                 151515.1515 909090.9091   -500.00%
stringWeighted                      84745.7627  57471.2644     32.18%
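
The Perc column appears to be (FF - Chrome) / FF. That is inferred from the numbers rather than stated anywhere, and it means large negative values are easier to read as a speed ratio. A small sketch with made-up helper names:

```javascript
// Perc as it appears to be computed in the table above (inferred).
function perc(ff, chrome) {
  return ((ff - chrome) / ff) * 100;
}

// How many times faster Chrome is than Firefox for a given Perc value.
// Values below 1 mean Firefox is the faster one.
function chromeSpeedRatio(percValue) {
  return 1 - percValue / 100;
}

const stringFilterPerc = perc(2712.9680, 38022.8137);
console.log(stringFilterPerc.toFixed(2));
console.log(chromeSpeedRatio(stringFilterPerc).toFixed(1) + "x");
```

Read that way, the stringFilter row says Chrome is roughly fourteen times faster, while a positive Perc such as community02ParseXML means Firefox is ahead.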
Comment 20 Zibi Braniecki [:gandalf][:zibi] 2011-10-03 04:32:41 PDT
Peacekeeper is releasing a new version. My results on Mac OS X (MacBook Air, 1.8GHz Core i7):

I compared Fx7, Fx10 and Chrome 16. It seems that we have a lot of regressions between fx7 and 10 on this test - https://docs.google.com/spreadsheet/ccc?key=0Ah9TBa-qpKojdHBDNmFDNUduOHFnLUk3XzRvX0swaFE&hl=en_US

Should we file bugs?
Comment 21 Boris Zbarsky [:bz] 2011-10-03 04:50:48 PDT
Yes, absolutely.  Worth testing whether those are due to type inference and mentioning that when filing....
Comment 22 Mats Palmgren (vacation) 2011-10-03 05:56:41 PDT
Created attachment 564168 [details]
Peacekeeper 2.0 beta (2011-10-03) results on Linux x86-64 (no HWA)

I have marked the tests where we are exceptionally slow compared
to Chrome in pink color.

Note that you can run single tests, for example:
http://peacekeeper.futuremark.com/run.action?repeat=1&forceSuiteName=string&forceTestName=stringstringFilter

The names to use (with some mangling) are:

all
html5-videoVideoSupport
html5-videoSubtitleSupport
html5-videoPosterSupport
html5-videoCodecH264
html5-videoCodecMP4
html5-videoCodecTheora
html5-videoCodecWebM
html5-webglSphere
html5-gamingSpitfire
html5-workerContrast01
html5-workerContrast02
html5-workerContrast03
render-renderGrid01
render-renderGrid02
render-renderGrid03
render-renderPhysics
experimental-experimentalRipple02
experimental-experimentalMovie
array-arrayCombined
array-arrayWeighted
dom-domGetElements
dom-domDynamicCreationCreateElement
dom-domDynamicCreationInnerHTML
dom-domJQueryAttributeFilters
dom-domJQueryBasicFilters
dom-domJQueryBasics
dom-domJQueryContentFilters
dom-domJQueryHierarchy
dom-domQueryselector
string-stringChat
string-stringDetectBrowser
string-stringFilter
string-stringValidateForm
string-stringWeighted
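
Going only by the single example URL above, the mangling looks like this: the part before the hyphen is the suite, and the test name is the list entry with the hyphen removed. That rule is a guess and may not hold for every entry (the html5-* names in particular). A sketch with a hypothetical helper:

```javascript
// Build a single-test URL from a list entry such as "string-stringFilter".
// The mangling rule is inferred from one example and is an assumption.
function singleTestUrl(entry, repeat = 1) {
  const hyphen = entry.indexOf("-");
  if (hyphen < 0) {
    throw new Error("expected an entry of the form suite-testName");
  }
  const suite = entry.slice(0, hyphen);
  const test = suite + entry.slice(hyphen + 1);
  return "http://peacekeeper.futuremark.com/run.action" +
    "?repeat=" + repeat +
    "&forceSuiteName=" + suite +
    "&forceTestName=" + test;
}

console.log(singleTestUrl("string-stringFilter"));
```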
Comment 23 Zibi Braniecki [:gandalf][:zibi] 2011-10-03 06:40:52 PDT
The major regression components are:
 - HTML5 Canvas
 - Data
 - DOM Operations
 - String parsing

Because of that my initial guess is that it's related to our JS engine.
Comment 24 Zibi Braniecki [:gandalf][:zibi] 2011-10-03 07:10:29 PDT
filed bug 691314 for the regression
Comment 25 Trev 2011-10-04 20:08:53 PDT
They've released how they calculate the results in their faq (http://peacekeeper.futuremark.com/faq.action).  It is the geometric mean of the main sections (render, experimental, data/array, dom, string) which are in turn the geometric mean of each individual test.

So, to increase the score, the render and experimental sections are pretty much useless.  The data/array, dom, and string are the best ones to focus on.  And considering that Chrome is destroying Firefox in data/array and string, those would be the best two areas.
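
As a rough sketch of that scoring scheme (the section contents and numbers below are placeholders, not real Peacekeeper data, and any scaling the FAQ applies is left out):

```javascript
// Geometric mean computed through logarithms to avoid overflow on
// large per-test scores.
function geometricMean(values) {
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Overall score: geometric mean of the per-section geometric means.
function overallScore(sections) {
  return geometricMean(Object.values(sections).map(geometricMean));
}

// Placeholder numbers purely for illustration.
const exampleSections = {
  data: [100, 400],
  string: [50, 200, 800],
};
console.log(overallScore(exampleSections));
```

Because every level is a geometric mean, multiplying any one test result by a constant factor scales the total by a fixed power of that factor, no matter how large or small the test's raw number is.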
Comment 26 Boris Zbarsky [:bz] 2011-10-04 20:22:32 PDT
> the render and experimental sections are pretty much useless.  

Why?  The tests with the highest weights in the final score are in the "data/array" and "experimental" sections (since those have the fewest tests).
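
To make the weighting concrete: with a geometric mean of per-section geometric means, each test enters the total with exponent 1 / (number of sections * tests in that section). The section sizes below are counted from the list in comment 22 and are only an assumption about what is actually scored:

```javascript
// Exponent with which a single test contributes to the overall score.
function testWeights(sectionSizes) {
  const sectionCount = Object.keys(sectionSizes).length;
  const weights = {};
  for (const [name, size] of Object.entries(sectionSizes)) {
    weights[name] = 1 / (sectionCount * size);
  }
  return weights;
}

// Counts taken from the test names in comment 22 (assumed, not verified).
const sectionSizes = { render: 4, experimental: 2, array: 2, dom: 9, string: 5 };
const perTestWeights = testWeights(sectionSizes);
console.log(perTestWeights);

// Doubling one test's result multiplies the total by 2 ** weight.
console.log((2 ** perTestWeights.experimental).toFixed(3));
```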

In any case, we're definitely way slower than Chrome on the data and string sections, and those are pure JS.  In dom we're doing OK-ish, comparatively.

Luke, would you be willing to look into those array and string tests?  Some of the array subtests have preexisting bugs, but they might have changed the tests...
Comment 27 Trev 2011-10-04 20:56:06 PDT
(In reply to Boris Zbarsky (:bz) from comment #26)
> > the render and experimental sections are pretty much useless.  

Both the render and the experimental sections have a maximum of 60 (possibly a little over 60) because they now use requestAnimationFrame.  The other tests are potentially unlimited.  But yes, if you go by the effect of a single test, then experimental is actually the best one because both of the tests are identical except for the canvas size.
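
A simplified model of that cap, assuming a 60 Hz display and ignoring jitter and frame-skipping (the function names are made up for illustration):

```javascript
// Frames per second when each frame needs `workMs` of time and callbacks
// fire at most once per display refresh (assumed 60 Hz by default).
function cappedFps(workMs, refreshHz = 60) {
  return Math.min(1000 / workMs, refreshHz);
}

// How much faster a test must get before the cap starts to matter.
function speedupToHitCap(currentFps, refreshHz = 60) {
  return refreshHz / currentFps;
}

console.log(cappedFps(5));  // plenty of headroom, limited by the refresh rate
console.log(cappedFps(50)); // limited by the work itself
console.log(speedupToHitCap(15));
```

In this model a test only stops rewarding optimization once its per-frame work drops below one refresh interval.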

As for the array and string tests, all of them are identical to the previous version except for arrayCombined.  They tried to fix the bug where it worked on an empty array but they botched it.  The only difference is that instead of working on a variable called "this.data500", it is working on a local variable called "data500".
Comment 28 Boris Zbarsky [:bz] 2011-10-04 21:18:43 PDT
> Both the render and the experimental have a maximum of 60 

Ah, ok.  Going by the numbers in the spreadsheet cited in bug 691314 comment 0, we have to get 2-8x faster on those tests before we need to worry about that cap.  Though that was on Mac; maybe on Windows we're closer...

> They tried to fix the bug where it worked on an empty array but they botched it.

Lovely.  ;)

Thanks for checking on the rest of it.  Sounds like the various bugs blocking this one still apply, then.  Good.

dmandelin, do you think you can scare up some jseng resources to pick the low-hanging fruit here (e.g. bug 609896) at least?
Comment 29 Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-05 02:20:09 PDT
Bug 609896 can be folded into bug 688852, so I think that one should be considered taken care of.
Comment 30 Trev 2011-10-05 21:47:39 PDT
Created attachment 565130 [details]
Peacekeeper tests in one file

I'm attaching an html file that includes all the array and string tests.  Just pick the test and run it to get the results.  I've also added a way to run shark from it as well.  I hope to add the dom, render and experimental as well in a later update.
Comment 31 Trev 2011-10-17 20:37:35 PDT
Created attachment 567664 [details]
jquery-1.6.5.min.js

I'm attaching an updated version of jquery as used by Peacekeeper so I can attach test cases that depend on it.
Comment 32 Trev 2011-10-17 20:43:28 PDT
Created attachment 567665 [details]
Peacekeeper tests in one file

Peacekeeper has updated a few of its tests.  arrayCombined is one of them and it no longer runs over a bunch of empty arrays.  It actually works correctly.

I'm attaching an updated single file with the updates.  This file also contains the dom tests as well.
Comment 33 Steve Fink [:sfink] [:s:] 2011-10-18 14:30:13 PDT
I wrote up a simple harness to run only the JS shell tests, and compared a run of js -m -j -p to js -m -n and v8. Higher is better. These numbers are the number of iterations in 3 seconds.

                      opt-mjp.csv  opt-mn.csv        opt-d8.csv
arrayCombined               20866       16282  -21%       44087  +111%
arrayWeighted              299827      260542  -13%     1131964  +277%
stringChat                 265637      245946   -7%      249335    -6%
stringDetectBrowser        895283      610040  -31%     1387736   +55%
stringFilter                10834       11018   +1%      155878 +1338%
stringValidateForm        1510407      901291  -40%     2727262   +80%
stringWeighted             326920      251073  -23%      600624   +83%

Those are just the ones that are listed in comment 22 and appear to be runnable directly from the shell. I also have results for everything accessible via wget -r that I'll attach.

I haven't looked into all of these. stringFilter is bug 503107, but should be mostly fixed by bug 691797. I don't know what's up with the regressions from TM/JM (-m -j -p) to JM+TI (-m -n). Comment 29 addresses at least part of the array stuff.
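
For anyone wanting to reproduce this kind of measurement, a harness that counts iterations in a fixed time window can be sketched roughly as follows. This is not the harness used above; the names are hypothetical, and the clock is injectable only so the sketch can be checked deterministically:

```javascript
// Run `testFn` repeatedly until `durationMs` has elapsed and return how
// many times it ran. `now` must return a time in milliseconds.
function countIterations(testFn, durationMs, now = Date.now) {
  const deadline = now() + durationMs;
  let iterations = 0;
  while (now() < deadline) {
    testFn();
    iterations++;
  }
  return iterations;
}

// Example: count how many times a small string test runs in 100 ms.
const sampleRuns = countIterations(() => "a,b,c".split(",").join("-"), 100);
console.log(sampleRuns + " iterations");
```

Checking the clock on every iteration adds overhead to very cheap tests, so a real harness would probably batch several calls between clock reads.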
Comment 34 Yani 2014-06-23 18:12:27 PDT
Peacekeeper test fails totally on DOM Tree load. Memory leaks to beyond 2 GB and it crashes. Surely this is known, but it matches a real-world problem with TwitterDeck, which I suspect is the same bug.
Comment 35 Yani 2014-06-23 18:28:09 PDT
The issue doesn't occur in safe mode. Surely the plugin system should disable a runaway process.
Comment 36 Jan de Mooij [:jandem] (PTO until July 31) 2015-08-28 01:26:58 PDT
We made a lot of progress here but I'm not working on this right now.
