Open Bug 499198 (peacekeeper) Opened 15 years ago Updated 2 years ago

[meta] Tracking performance on Peacekeeper benchmark

Categories

(Core :: General, defect)

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 9 open bugs, Blocks 1 open bug)

Details

(Keywords: meta)

Attachments

(6 files, 1 obsolete file)

This is a bug to track our performance on the peacekeeper benchmark.  Mats rocks and has created a locally-runnable version; you can get it from http://hg.mozilla.org/users/mpalmgren_mozilla.com/peacekeeper
Depends on: 499199
Depends on: 499201
Alias: peacekeeper
Depends on: 499235
Depends on: 200505
The fix for bug 200505 has improved our performance on the "data" segment of the test from 2328 (3.5.1pre) to 4300 (3.6a1) on my machine.
Depends on: 503107
Depends on: 503141
Depends on: 504920
Peacekeeper scores seem to have dropped over the past few months.
I assumed it was some temporary thing, but it hasn't gone away.
My highest Minefield score, 2636, was back in 2009-09 / 2009-10.

Since then I've only seen declines in overall score, although the annoying interface made it hard to see what was dropping until I created a new test set.

Downloading the offline test suite and comparing a nightly from 2009-10-05 for Windows against the latest Minefield, using a fresh test run, I get:
                    [2010-01-19]    [2009-10-05]        [% drop in latest]
Total:              2458            2544                 3.4%
Rendering:          1860            2268                18.0%
Social networking:  2639            2204               -19.7% (dramatic improvement)
Complex Graphics:   3923            4158                 5.7%
Data:               4580            4614                 0.7%
DOM operations:     1970            1987                 0.9%
Text parsing:       2026            2326                12.9%
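The "% drop in latest" column can be reproduced from the two score columns. A minimal sketch, assuming the drop is measured relative to the older (2009-10-05) score; the function name is made up for illustration, and the scores are copied from the table above:

```python
def percent_drop(old_score, new_score):
    """Percentage by which new_score is lower than old_score (negative means a gain)."""
    return (old_score - new_score) / old_score * 100.0

# (category, 2010-01-19 score, 2009-10-05 score), copied from the table above
rows = [
    ("Total", 2458, 2544),
    ("Rendering", 1860, 2268),
    ("Social networking", 2639, 2204),
    ("Text parsing", 2026, 2326),
]

for name, latest, older in rows:
    print(f"{name:<20}{percent_drop(older, latest):6.1f}%")
```

A negative value, as for Social networking, means the newer build scored higher.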

Now, if I download the standalone version and try a few from the categories above that lost the most ground:
                    [2010-01-19]    [2009-10-05]        [% drop in latest]
renderChart         38.577ops       58.509ops           34.1%
renderPhotoZoom     12.309ops       56.743ops           78.3%
stringRegexpEmail   98039.216ops    129870.130ops       24.5%
stringValidateForm  58139.535ops    87719.298ops        33.7%

This machine is a dual core laptop, kinda surprises me things have gotten slower.  I thought Firefox was using more threading which would have an advantage on dual core, no?

Other oddity.
Running stringConcat I got 384615.385 ops in 2009-10-05 on every single run.
In 2010-01-19 that exact same test resulted in...  384615.385 on first run, 400000.000 on second.  The results continued to be one of those two, seemingly random.
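One possible reading of the two stringConcat values (an assumption, not something confirmed in this bug): if the harness runs a fixed number of operations and times them with a whole-millisecond clock, the reported ops/sec can only land on a few discrete values. The operation count of 10000 below is hypothetical; it was picked only because it reproduces both figures with elapsed times one millisecond apart.

```python
def ops_per_second(operations, elapsed_ms):
    """Convert an operation count and a whole-millisecond elapsed time to ops/sec."""
    return operations * 1000.0 / elapsed_ms

OPERATIONS = 10000  # hypothetical fixed operation count, not taken from the harness

for elapsed_ms in (25, 26):
    print(elapsed_ms, "ms ->", round(ops_per_second(OPERATIONS, elapsed_ms), 3))
```

Under that assumption, 25 ms gives 400000.0 and 26 ms gives 384615.385, so flipping between the two results would just mean the run sometimes finished one timer tick sooner.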
Er. Sorry. Editing fail.
The first set of figures are from the online peacekeeper.  I downloaded nightly build 2009-10-05 for Windows and ran it against the online peacekeeper.  I then repeated this in the 2010-01-19 nightly.

The 2nd set are using the downloaded harness.
It really depends...  but I bet what you're seeing has pretty much nothing to
do with threading.

Would you be willing to take, say, renderPhotoZoom and try some nightlies to
narrow down when it regressed for you?
Assignee: nobody → sayrer
Well. This is irritating.
Completely unable to reproduce figures I was getting repeatedly for 2009-10-05 just hours ago.

So. Starting from scratch, the string variation was consistently reproducible, but I can't reproduce the rendering variation.

And this is on the exact same profile and Firefox builds.  Since I was switching between one build and the other while running the rendering tests, background process load seems an unlikely explanation.

Let me get back to you on this tomorrow.  Might only be the strings...
Alright. I still have no idea how I got my results from yesterday, where 2009-10-05 was doing so dramatically better in rendering in fresh runs of both the online version and the offline variant.  And I'm particularly puzzled as to how I got 2636 at some point in the past.  Perhaps it was a fleeting thing in just one nightly.
At any rate, I'm unable to reproduce after downloading a bunch of builds from 2009-08 on and being sure to test in fresh profiles with nothing else running.  The only loss is in the string tests, and that is recent: it appeared after 2010-01-10 but before 2010-01-19 / 2010-01-20.

Collected results below.  Apart from the recent drop in Text parsing (probably what is visible in the individual string tests above), the only other things of interest are a dropoff in "Social networking" (encrypt / xml parse / filter / sort / scroll) between 2010-01-10 and 2010-01-20, and Data fluctuating quite a bit.

                    2009-08-01  2009-09-01  2009-09-15  2009-10-05  2009-10-12  2009-10-20  2009-11-05  2009-12-05  2009-12-20  2010-01-10  2010-01-20
Total               2118        2190        2110        2417        2246        2420        2454        2498        2542        2576        2459
Rendering           1610        1631        1619        1804        1797        1799        1817        1841        1826        1858        1838
Social Networking   2009        2194        2125        2066        2078        2142        2170        2532        2550        2646        2482
Complex Graphics    2716        3547        3598        3821        3809        3859        3834        3880        3862        3834        3938
Data                4468        4829        3919        4726        4627        4803        4807        4621        4680        4641        4862
DOM operations      1918        1830        2008        2010        1833        1930        1995        1942        2081        2117        1991
Text parsing        1539        1594        1547        2334        1808        2325        2354        2329        2342        2352        2038


For comparison purposes on this machine, Chromium latest buildbot.
Total               3845
Rendering           3106
Social Networking   3540
Complex Graphics    5770
Data                3087
DOM operations      4586
Text parsing        5401


http://service.futuremark.com/peacekeeper/results.action?key=2mqt - 2009-08-01 - 2118
http://service.futuremark.com/peacekeeper/results.action?key=2mn3 - 2009-09-01 - 2190
http://service.futuremark.com/peacekeeper/results.action?key=2mn7 - 2009-09-15 - 2110
http://service.futuremark.com/peacekeeper/results.action?key=2mnA - 2009-10-05 - 2417
http://service.futuremark.com/peacekeeper/results.action?key=2mnK - 2009-10-12 - 2246
http://service.futuremark.com/peacekeeper/results.action?key=2mnV - 2009-10-20 - 2420
http://service.futuremark.com/peacekeeper/results.action?key=2mnk - 2009-11-05 - 2454
http://service.futuremark.com/peacekeeper/results.action?key=2mpu - 2009-12-05 - 2498
http://service.futuremark.com/peacekeeper/results.action?key=2mqE - 2009-12-20 - 2542
http://service.futuremark.com/peacekeeper/results.action?key=2mqb - 2010-01-10 - 2576
http://service.futuremark.com/peacekeeper/results.action?key=2mnt - 2010-01-20 / Chromium LATEST - 2459
Attached file collected results —
Sorry, one more bit of bug spam.  I see Bugzilla chose to hard-wrap the text.  The attachment has the same results as above, in a more readable form.
nemo, do you want to narrow down the text thing to one day?  Or tell me how to sanely reproduce without running all of peacekeeper?
Fetch and unpack:
http://hg.mozilla.org/users/mpalmgren_mozilla.com/peacekeeper/archive/1f813e4a8ff5.tar.bz2

Open run.html.

select string: stringRegexpEmail and run selected.

2010-01-10: 126582.278
2010-01-20: 105263.158

I'll download some more builds though.
Testing stringValidateForm:
2010-01-11 84745.763
2010-01-12 56179.775

Windows XP 32 in a clean profile.
Filed bug 540985 on that regression.  I can totally reproduce here on Mac.
Depends on: 540985
Depends on: 544477
Regressed from 12/20 build:
domDynamicCreationCreateElement - ~5-6%
domDynamicCreationInnerHTML - ~5-6%

Still Regressed:
stringValidateForm - ~25%

Given response in bug #540985, will file a new bug
Depends on: 553342
Blocks: 553348
For the record, this is:
Mozilla/5.0 (X11; Linux x86_64; rv:2.0b6pre) Gecko/20100901 Firefox/4.0b6pre
vs 
Google Chrome 7 for Linux
On the array tests in the harness.

bz asked me to file a bug on splice, but I figured I might as well also dump the results here.
runs x5
                Chrome7         FF nightly
arrayCombined  | 480044.662     | 281234.251
arrayConcat    |2406417.114     | 377674.562
arrayJoin      |   7808.000     |   9220.000
arrayPop       |7380952.383     |6785714.287
arrayPush      |8095238.097     |8095238.097
arrayReverse   |3791208.790     |7857142.859
arrayShift     |7857142.859     |1250000.000
arraySlice     |6964285.716     |5833333.333
arraySort      |3125000.000     |2631578.945
arraySplice    |7142857.145     | 191278.406
arrayWeighted  |1331813.577     | 369424.078

bz suggested increasing testOperationLimit to 10000000 - did not seem to significantly alter results.
No idea if any of these are regressions.
We kick ass on reverse! ;-)

Truly, we've doted on join and it shows (good, just need more of that elsewhere).

/be
Depends on: 592786
Hi all,

I've been tracking Peacekeeper since Beta 4. I have two different sets below, and I apologize in advance that they aren't all updated and in sync.


Here's the original set I started when I hopped onto the betas. I was looking for some benchmarks just to see how the different browsers compared. (Peacekeeper is nice in that you can compare the best overall scores across the different browsers all in one place. Also, you can SEE something happening and not just have something running and get a bunch of numbers at the end.)

http://clients.futuremark.com/peacekeeper/results.action?key=4Edz

(BTW, I was testing the Chrome engine on IE8 and I didn't clear out the cookies in Chrome first before running the test... so don't freak, IE8 just got on steroids, that's all.) ;)


It seemed like it was generally moving in the right direction of better scores.


I was curious just how the early betas had been progressing so I went back and installed the earlier betas and created a new set of benchmarks starting at Beta 1.

http://clients.futuremark.com/peacekeeper/results.action?key=4Jf3


Beta 4 is showing best OVERALL now. I have some older WinXP machines at work with integrated Intel graphics. It seems like Beta 4 is showing better overall than the newer betas and nightlies. However, I haven't had a chance to test as in-depth as at home.

I haven't been able to get a benchmark on Peacekeeper for pre7 because it crashes after the 2nd or 3rd test (rendering stage). It appears that is still being worked on. I threw in my two cents worth of crash logs on that too.


It SEEMS that I tend to get my best test scores right after a new version is installed, although sometimes, seemingly at random, I get a better score later on. Beyond that, I can't figure out anything definitive that results in better scores.

Cheers,

Yav
Depends on: 606650
Depends on: 606648
Depends on: 603872
Depends on: 608648
Depends on: 608880
Attached file jquery-1.3.1 —
I'm attaching the version of jquery as used by Peacekeeper so I can attach test cases that depend on it.
Attached file assetmanager.js —
assetmanager.js used in peacekeeper tests
Depends on: 609212
Depends on: 609296
No longer depends on: 609296
Depends on: 601176
Depends on: 609704
Here are the tests that I've looked into so far and the bugs that were filed for them:

Test 1 - bug #606734
Test 2, 3, 4 - bug #608648
Test 5 - bug #608880
Test 6 - bug #609212
Test 7 - none needed
Test 8 - none needed
Test 9 - none needed
Test 10, 11 - bug #606648, bug #601176
Test 12 - bug #609704, bug #601176
Depends on: 609835
Depends on: 171262
Depends on: 610077
Depends on: 506813
Depends on: 609229
Depends on: 605385
Depends on: 562034
Depends on: 617136
Here are the scores from the latest FF Nightly vs Chrome 10.0.648.204.  Perc is (FF - Chrome) / FF, so a negative value means Firefox scored lower than Chrome.

Test Name                                   FF      Chrome      Perc
renderChart                            80.8283    130.3281    -61.24%
renderGrid01                          185.0346    185.1259     -0.05%
renderGrid02                           99.5801    147.2470    -47.87%
renderGrid03                            3.7599     11.6364   -209.49%
renderPhysics                          52.3281     75.5584    -44.39%
community01Encrypt                    131.2163    161.1744    -22.83%
community02ParseXML                    24.2258     18.1502     25.08%
community03Filter                      90.6977    103.9522    -14.61%
community04Sort                        77.8661     76.8719      1.28%
experimentalRipple01                   15.1147     26.2141    -73.43%
experimentalRipple02                    5.8192     10.3360    -77.62%
experimentalMovie                      78.7228     70.4573     10.50%
arrayCombined                       62111.8012 128205.1282   -106.41%
arrayWeighted                       82644.6281 370370.3704   -348.15%
domGetElements                    1111111.1111 909090.9091     18.18%
domDynamicCreationCreateElement     13280.2125  10493.1794     20.99%
domDynamicCreationInnerHTML         16366.6121   7782.1012     52.45%
domJQueryAttributeFilters            2816.9014   3866.9760    -37.28%
domJQueryBasicFilters                 939.7613   1919.7543   -104.28%
domJQueryBasics                      2523.9778   4506.5345    -78.55%
domJQueryContentFilters               215.5962    446.4684   -107.09%
domJQueryHierarchy                    506.0985   1555.6938   -207.39%
stringChat                          67114.0940  47619.0476     29.05%
stringDetectBrowser                303030.3030 333333.3333    -10.00%
stringFilter                         2712.9680  38022.8137  -1301.52%
stringValidateForm                 151515.1515 909090.9091   -500.00%
stringWeighted                      84745.7627  57471.2644     32.18%
Peacekeeper is releasing a new version. My results are from Mac OS X (MacBook Air, 1.8GHz Core i7).

I compared Fx7, Fx10 and Chrome 16. It seems that we have a lot of regressions between Fx7 and Fx10 on this test - https://docs.google.com/spreadsheet/ccc?key=0Ah9TBa-qpKojdHBDNmFDNUduOHFnLUk3XzRvX0swaFE&hl=en_US

Should we file bugs?
Yes, absolutely.  Worth testing whether those are due to type inference and mentioning that when filing....
I have marked the tests where we are exceptionally slow compared to Chrome in pink.

Note that you can run single tests, for example:
http://peacekeeper.futuremark.com/run.action?repeat=1&forceSuiteName=string&forceTestName=stringstringFilter

The names to use (with some mangling) are:

all
html5-videoVideoSupport
html5-videoSubtitleSupport
html5-videoPosterSupport
html5-videoCodecH264
html5-videoCodecMP4
html5-videoCodecTheora
html5-videoCodecWebM
html5-webglSphere
html5-gamingSpitfire
html5-workerContrast01
html5-workerContrast02
html5-workerContrast03
render-renderGrid01
render-renderGrid02
render-renderGrid03
render-renderPhysics
experimental-experimentalRipple02
experimental-experimentalMovie
array-arrayCombined
array-arrayWeighted
dom-domGetElements
dom-domDynamicCreationCreateElement
dom-domDynamicCreationInnerHTML
dom-domJQueryAttributeFilters
dom-domJQueryBasicFilters
dom-domJQueryBasics
dom-domJQueryContentFilters
dom-domJQueryHierarchy
dom-domQueryselector
string-stringChat
string-stringDetectBrowser
string-stringFilter
string-stringValidateForm
string-stringWeighted
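The single-test URL above can be built from an entry in this list. A minimal sketch, assuming the "mangling" is what the one example suggests: the part before the hyphen is forceSuiteName, and forceTestName is the suite name followed by the test name. That rule is inferred from the string-stringFilter example only and may not hold for other entries such as the html5 ones; the helper name is made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://peacekeeper.futuremark.com/run.action"

def single_test_url(entry, repeat=1):
    """Build a run URL from a 'suite-testName' entry.

    Assumed mangling (inferred from one example): forceTestName = suite + testName.
    """
    suite, sep, test = entry.partition("-")
    if not sep or not test:
        raise ValueError(f"expected 'suite-testName', got {entry!r}")
    query = urlencode({
        "repeat": repeat,
        "forceSuiteName": suite,
        "forceTestName": suite + test,
    })
    return f"{BASE_URL}?{query}"

print(single_test_url("string-stringFilter"))
```

For string-stringFilter this yields the same URL as the example given above the list.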
The major regression components are:
 - HTML5 Canvas
 - Data
 - DOM Operations
 - String parsing

Because of that my initial guess is that it's related to our JS engine.
filed bug 691314 for the regression
Depends on: 691797
Depends on: 692009
They've published how they calculate the results in their FAQ (http://peacekeeper.futuremark.com/faq.action).  The overall score is the geometric mean of the main sections (render, experimental, data/array, dom, string), each of which is in turn the geometric mean of its individual tests.

So, to increase the score, the render and experimental sections are pretty much useless.  The data/array, dom, and string are the best ones to focus on.  And considering that Chrome is destroying Firefox in data/array and string, those would be the best two areas.
> the render and experimental sections are pretty much useless.  

Why?  The tests with the highest weights in the final score are in the "data/array" and "experimental" sections (since those have the fewest tests).
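The scoring rule from the FAQ and the point about per-test weights can be sketched as follows. This is an illustration under stated assumptions, not Futuremark's code: the function names are made up, and the per-section test counts are taken from the list of test names earlier in this bug (html5 entries left out, since the FAQ summary names only five sections).

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def overall_score(sections):
    """Geometric mean over sections of each section's geometric mean."""
    return geometric_mean(geometric_mean(tests) for tests in sections.values())

def per_test_weight(test_counts):
    """Exponent one test carries in the overall score: 1 / (sections * tests in its section)."""
    n_sections = len(test_counts)
    return {name: 1.0 / (n_sections * count) for name, count in test_counts.items()}

# Assumed counts, read off the list of test names earlier in this bug.
TEST_COUNTS = {"render": 4, "experimental": 2, "array": 2, "dom": 9, "string": 5}

for section, weight in per_test_weight(TEST_COUNTS).items():
    print(f"{section:<13}{weight:.4f}")
```

With these counts, each array or experimental test carries an exponent of 1/10 in the final score, while each dom test carries 1/45, which is why the sections with the fewest tests matter most per test.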

In any case, we're definitely way slower than Chrome on the data and string sections, and those are pure JS.  In dom we're doing OK-ish, comparatively.

Luke, would you be willing to look into those array and string tests?  Some of the array subtests have preexisting bugs, but they might have changed the tests...
(In reply to Boris Zbarsky (:bz) from comment #26)
> > the render and experimental sections are pretty much useless.  

Both the render and the experimental have a maximum of 60 (possibly a little over 60) because they now use requestAnimationFrame.  The other tests are potentially unlimited.  But yes, if you go by the effect of a single test, then experimental is actually the best one because both of the tests are identical except for the canvas size.

As for the array and string tests, all of them are identical to the previous version except for arrayCombined.  They tried to fix the bug where it worked on an empty array but they botched it.  The only difference is that instead of working on a variable called "this.data500", it is working on a local variable called "data500".
> Both the render and the experimental have a maximum of 60 

Ah, ok.  Going by the numbers in the spreadsheet cited in bug 691314 comment 0, we'd have to get 2-8x faster on those tests before we need to worry about that cap.  Though that was on Mac; maybe on Windows we're closer...

> They tried to fix the bug where it worked on an empty array but they botched it.

Lovely.  ;)

Thanks for checking on the rest of it.  Sounds like the various bugs blocking this one still apply, then.  Good.

dmandelin, do you think you can scare up some jseng resources to pick the low-hanging fruit here (e.g. bug 609896) at least?
Bug 609896 can be folded into bug 688852, so I think that one should be considered taken care of.
Attached file Peacekeeper tests in one file (obsolete) —
I'm attaching an html file that includes all the array and string tests.  Just pick the test and run it to get the results.  I've also added a way to run shark from it as well.  I hope to add the dom, render and experimental as well in a later update.
Attachment #565130 - Attachment mime type: text/plain → text/html
Attached file jquery-1.6.5.min.js —
I'm attaching an updated version of jquery as used by Peacekeeper so I can attach test cases that depend on it.
Attachment #567664 - Attachment mime type: application/octet-stream → application/x-javascript
Attached file Peacekeeper tests in one file —
Peacekeeper has updated a few of its tests.  arrayCombined is one of them and it no longer runs over a bunch of empty arrays.  It actually works correctly.

I'm attaching an updated single file with the updates.  This file also contains the dom tests as well.
Attachment #565130 - Attachment is obsolete: true
Attachment #567665 - Attachment mime type: text/plain → text/html
I wrote up a simple harness to run only the JS shell tests, and compared a run of js -m -j -p to js -m -n and v8. Higher is better. These numbers are the number of iterations in 3 seconds.

                     opt-mjp.csv      opt-mn.csv         opt-d8.csv
arrayCombined              20866     16282  -21%       44087  +111%
arrayWeighted             299827    260542  -13%     1131964  +277%
stringChat                265637    245946   -7%      249335    -6%
stringDetectBrowser       895283    610040  -31%     1387736   +55%
stringFilter               10834     11018   +1%      155878 +1338%
stringValidateForm       1510407    901291  -40%     2727262   +80%
stringWeighted            326920    251073  -23%      600624   +83%

Those are just the ones that are listed in comment 22 and appear to be runnable directly from the shell. I also have results for everything accessible via wget -r that I'll attach.
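The "iterations in 3 seconds" measurement can be sketched like this. It is a Python stand-in for the idea only, not the JS shell harness used for the numbers above; the function name and the sample workload are made up for illustration.

```python
import time

def iterations_in_window(fn, seconds=3.0):
    """Call fn repeatedly and return how many calls completed within the time window."""
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        fn()
        count += 1
    return count

def sample_workload():
    # Hypothetical stand-in for one benchmark iteration.
    return ",".join(str(i) for i in range(100))

if __name__ == "__main__":
    print("iterations in 3 seconds:", iterations_in_window(sample_workload))
```

Counting iterations in a fixed window keeps the run time the same for every engine, at the cost of one clock read per iteration, which matters for very cheap workloads.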

I haven't looked into all of these. stringFilter is bug 503107, but should be mostly fixed by bug 691797. I don't know what's up with the regressions from TM/JM (-m -j -p) to JM+TI (-m -n). Comment 29 addresses at least part of the array stuff.
Depends on: 697343
Depends on: 723481
Depends on: 743615
Depends on: 916851
Depends on: 917247
Depends on: 917258
Depends on: 917814
Depends on: 917839
Depends on: 918746
Depends on: 919992
Depends on: 920508
Assignee: sayrer → jdemooij
Status: NEW → ASSIGNED
Depends on: 920659
Depends on: 922016
Depends on: 922018
Depends on: 922048
Depends on: 922053
Depends on: 922063
Depends on: 922071
Depends on: 929087
Blocks: WBGP
No longer blocks: 553348
Depends on: 553348
Depends on: 626165
Depends on: 609296
Depends on: 1004923
The Peacekeeper test fails completely on the DOM tree load. Memory use grows beyond 2 GB and then it crashes. Surely this is known, but it matches a real-world problem with TwitterDeck, which I suspect is the same bug.
The issue doesn't occur in safe mode. Surely the plugin system should disable a runaway process.
Depends on: 1024132
We made a lot of progress here but I'm not working on this right now.
Assignee: jdemooij → nobody
Status: ASSIGNED → NEW
Component: Tracking → General
QA Contact: chofmann
Depends on: 1410624
Depends on: 1493420
Severity: normal → S3