Open Bug 499198 (peacekeeper) Opened 11 years ago Updated Last year
[meta]Tracking performance on Peacekeeper benchmark
This is a bug to track our performance on the peacekeeper benchmark. Mats rocks and has created a locally-runnable version; you can get it from http://hg.mozilla.org/users/mpalmgren_mozilla.com/peacekeeper
The fix for bug 200505 has improved our performance on the "data" segment of the test from 2328 (3.5.1pre) to 4300 (3.6a1) on my machine.
Peacekeeper scores seem to have gotten slower over the past few months. I assumed it was some temporary thing, but it hasn't gone away. My highest Minefield score, 2636, was back in 2009-09 / 2009-10. Since then I've only seen declines in the overall score, although the annoying interface made it hard to see what was dropping until I created a new test set.

Downloading the offline test suite and comparing a Windows nightly from 2009-10-05 against the latest Minefield using a fresh test run, I get:

                     [2010-01-19]  [2009-10-05]  [% drop in latest]
Total:                   2458          2544            3.4%
Rendering:               1860          2268           18.0%
Social networking:       2639          2204          -19.7% (dramatic improvement)
Complex Graphics:        3923          4158            5.7%
Data:                    4580          4614            0.7%
DOM operations:          1970          1987            0.9%
Text parsing:            2026          2326           12.9%

Now, if I download the standalone version and try a few tests from the categories above that lost the most ground:

                     [2010-01-19]     [2009-10-05]     [% drop in latest]
renderChart              38.577ops        58.509ops        34.1%
renderPhotoZoom          12.309ops        56.743ops        78.3%
stringRegexpEmail     98039.216ops    129870.130ops        24.5%
stringValidateForm    58139.535ops     87719.298ops        33.7%

This machine is a dual-core laptop, so it kind of surprises me that things have gotten slower. I thought Firefox was using more threading, which would be an advantage on dual core, no?

Another oddity: running stringConcat, I got 384615.385 ops in 2009-10-05 on every single run. In 2010-01-19 that exact same test resulted in 384615.385 on the first run and 400000.000 on the second. The results continued to be one of those two, seemingly at random.
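One possible reading of the stringConcat oddity, offered as an assumption and not something confirmed in this bug: both values are what you get when a fixed number of operations is divided by an elapsed time measured in whole milliseconds. The Python sketch below assumes 10000 operations per run (a hypothetical figure; the harness's real count is not stated here) and shows that 25 ms and 26 ms produce exactly the two reported scores, so 1 ms of timing jitter would be enough to flip between them.

```python
def ops_per_second(operations: int, elapsed_ms: int) -> float:
    """Throughput when the elapsed time is only known in whole milliseconds."""
    return operations * 1000 / elapsed_ms


# Assumption: 10000 operations per run. This number is not given in the bug;
# it is simply the smallest round count that fits both reported scores.
OPERATIONS = 10_000

for elapsed_ms in (25, 26):
    print(f"{elapsed_ms} ms -> {ops_per_second(OPERATIONS, elapsed_ms):.3f} ops/s")
```

Under the same assumption, other figures in the comment also land on whole milliseconds (for example 129870.130 corresponds to 77 ms), which is why small differences between runs should be read with care.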
Er. Sorry. Editing fail. The first set of figures are from the online peacekeeper. I downloaded nightly build 2009-10-05 for Windows and ran it against the online peacekeeper. I then repeated this in the 2010-01-19 nightly. The 2nd set are using the downloaded harness.
It really depends... but I bet what you're seeing has pretty much nothing to do with threading. Would you be willing to take, say, renderPhotoZoom and try some nightlies to narrow down when it regressed for you?
Well. This is irritating. I'm completely unable to reproduce the figures I was getting repeatedly for 2009-10-05 just hours ago. So, starting from scratch: the string variation was consistently reproducible, but I can't reproduce the rendering variation. And that's on the exact same profile / Firefox. And since I was switching between one and the other while rendering, some background process load seems unlikely. Let me get back to you on this tomorrow. Might only be the strings...
Alright. I still have no idea how I got my results from yesterday, where 2009-10-05 was doing so dramatically better in rendering in fresh runs of both the online version and the offline variant. And I'm particularly puzzled as to how I got 2636 at some point in the past. Perhaps it was a fleeting thing in just one nightly. At any rate, I'm unable to reproduce it after downloading a bunch of builds from 2009-08 on and being sure to test in fresh profiles with nothing else running.

The only loss is in the string tests, and that is recent: after 2010-01-10 but before 2010-01-19 / 2010-01-20. Collected results below. Apart from the recent drop in String (probably what is visible in the individual tests above), the only other thing I see of any interest is the dropoff in "Social networking" (encrypt / xml parse / filter / sort / scroll) between 2010-01-10 and 2010-01-20, plus Data fluctuating quite a bit.

Build        Total  Rendering  Social Networking  Complex Graphics  Data  DOM operations  Text parsing
2009-08-01   2118   1610       2009               2716              4468  1918            1539
2009-09-01   2190   1631       2194               3547              4829  1830            1594
2009-09-15   2110   1619       2125               3598              3919  2008            1547
2009-10-05   2417   1804       2066               3821              4726  2010            2334
2009-10-12   2246   1797       2078               3809              4627  1833            1808
2009-10-20   2420   1799       2142               3859              4803  1930            2325
2009-11-05   2454   1817       2170               3834              4807  1995            2354
2009-12-05   2498   1841       2532               3880              4621  1942            2329
2009-12-20   2542   1826       2550               3862              4680  2081            2342
2010-01-10   2576   1858       2646               3834              4641  2117            2352
2010-01-20   2459   1838       2482               3938              4862  1991            2038

For comparison purposes on this machine, Chromium latest buildbot:

Build        Total  Rendering  Social Networking  Complex Graphics  Data  DOM operations  Text parsing
Chromium     3845   3106       3540               5770              3087  4586            5401

http://service.futuremark.com/peacekeeper/results.action?key=2mqt - 2009-08-01 - 2118
http://service.futuremark.com/peacekeeper/results.action?key=2mn3 - 2009-09-01 - 2190
http://service.futuremark.com/peacekeeper/results.action?key=2mn7 - 2009-09-15 - 2110
http://service.futuremark.com/peacekeeper/results.action?key=2mnA - 2009-10-05 - 2417
http://service.futuremark.com/peacekeeper/results.action?key=2mnK - 2009-10-12 - 2246
http://service.futuremark.com/peacekeeper/results.action?key=2mnV - 2009-10-20 - 2420
http://service.futuremark.com/peacekeeper/results.action?key=2mnk - 2009-11-05 - 2454
http://service.futuremark.com/peacekeeper/results.action?key=2mpu - 2009-12-05 - 2498
http://service.futuremark.com/peacekeeper/results.action?key=2mqE - 2009-12-20 - 2542
http://service.futuremark.com/peacekeeper/results.action?key=2mqb - 2010-01-10 - 2576
http://service.futuremark.com/peacekeeper/results.action?key=2mnt - 2010-01-20 / Chromium LATEST - 2459
Sorry. one more bug spam. I see bugzilla chose to hard wrap the text. Same results as above, more readable.
nemo, do you want to narrow down the text thing to one day? Or tell me how to sanely reproduce without running all of peacekeeper?
Fetch and unpack: http://hg.mozilla.org/users/mpalmgren_mozilla.com/peacekeeper/archive/1f813e4a8ff5.tar.bz2
Open run.html, select string: stringRegexpEmail, and run selected.

2010-01-10: 126582.278
2010-01-20: 105263.158

I'll download some more builds though.
Testing stringValidateForm:

2010-01-11  84745.763
2010-01-12  56179.775

Windows XP 32 in a clean profile.
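Narrowing a regression to a single nightly, as done above, amounts to a binary search over build dates. Here is a minimal Python sketch, assuming the regression persists once it lands. `first_bad_build` and `is_bad` are hypothetical names, and `is_bad` is only a stand-in that encodes the 2010-01-11 / 2010-01-12 boundary reported above; a real version would run stringValidateForm against each downloaded build and compare the score with a threshold.

```python
from datetime import date, timedelta
from typing import Callable


def first_bad_build(good: date, bad: date, is_bad: Callable[[date], bool]) -> date:
    """Binary-search daily builds for the first one where is_bad() holds.

    Assumes `good` is known good, `bad` is known bad, and that every build
    after the regression landed is also bad.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


def is_bad(build: date) -> bool:
    # Stand-in for "run the test on this nightly and check the score".
    # The boundary comes from the two measurements posted above.
    return build >= date(2010, 1, 12)


print(first_bad_build(date(2010, 1, 10), date(2010, 1, 20), is_bad))
```

With a ten-day window this needs at most four test runs instead of ten, which matters when each run means downloading a build and creating a clean profile.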
Regressed from 12/20 build:
domDynamicCreationCreateElement - ~5-6%
domDynamicCreationInnerHTML - ~5-6%

Still regressed:
stringValidateForm - ~25%

Given the response in bug #540985, I will file a new bug.
For the record, this is:
Mozilla/5.0 (X11; Linux x86_64; rv:2.0b6pre) Gecko/20100901 Firefox/4.0b6pre
vs Google Chrome 7 for Linux, on the array tests in the harness.
bz asked me to file a bug on splice, but I figured I might as well also dump the results here.

runs x5       |     Chrome7 |  FF nightly
arrayCombined |  480044.662 |  281234.251
arrayConcat   | 2406417.114 |  377674.562
arrayJoin     |    7808.000 |    9220.000
arrayPop      | 7380952.383 | 6785714.287
arrayPush     | 8095238.097 | 8095238.097
arrayReverse  | 3791208.790 | 7857142.859
arrayShift    | 7857142.859 | 1250000.000
arraySlice    | 6964285.716 | 5833333.333
arraySort     | 3125000.000 | 2631578.945
arraySplice   | 7142857.145 |  191278.406
arrayWeighted | 1331813.577 |  369424.078

bz suggested increasing testOperationLimit to 10000000; that did not seem to significantly alter the results. No idea if any of these are regressions.
We kick ass on reverse! ;-) Truly, we've doted on join and it shows (good, just need more of that elsewhere). /be
Hi all,

I've been tracking Peacekeeper since Beta 4. I have two different sets below, and I apologize in advance that they aren't all updated and in sync.

Here's the original set I started when I hopped onto the betas. I was looking for some benchmarks just to see how the different browsers compared. (Peacekeeper is nice in that you can compare the best overall scores across the different browsers all in one place. Also, you can SEE something happening rather than just having something run and getting a bunch of numbers at the end.)
http://clients.futuremark.com/peacekeeper/results.action?key=4Edz
(BTW, I was testing the Chrome engine on IE8 and I didn't clear out the cookies first in Chrome before running the test... so don't freak, IE8 just got on steroids, that's all.) ;)

It seemed like it was generally moving in the right direction of better scores. I was curious how the early betas had been progressing, so I went back and installed the earlier betas and created a new set of benchmarks starting at Beta 1.
http://clients.futuremark.com/peacekeeper/results.action?key=4Jf3
Beta 4 is showing best OVERALL now.

I have some older WinXP machines at work with integrated Intel graphics. It seems like Beta 4 is showing better overall than the newer betas and nightlies. However, I haven't had a chance to test as in-depth as at home.

I haven't been able to get a benchmark on Peacekeeper for pre7 because it crashes after the 2nd or 3rd test (rendering stage). It appears that is still being worked on. I threw in my two cents' worth of crash logs on that too.

It SEEMS that I tend to get my best test scores right after a new version is put in, although it seems random sometimes that I get a better score later on. Beyond that, I can't figure out anything definitive that results in better scores.

Cheers,
Yav
I'm attaching the version of jquery as used by Peacekeeper so I can attach test cases that depend on it.
assetmanager.js used in peacekeeper tests
Here are the tests that I've looked into so far and which bugs were filed for them:

Test 1 - bug #606734
Test 2, 3, 4 - bug #608648
Test 5 - bug #608880
Test 6 - bug #609212
Test 7 - none needed
Test 8 - none needed
Test 9 - none needed
Test 10, 11 - bug #606648, bug #601176
Test 12 - bug #609704, bug #601176
Here are the scores from the latest FF Nightly vs Chrome 10.0.648.204.

Test Name                                  FF         Chrome       Perc
renderChart                           80.8283       130.3281    -61.24%
renderGrid01                         185.0346       185.1259     -0.05%
renderGrid02                          99.5801       147.2470    -47.87%
renderGrid03                           3.7599        11.6364   -209.49%
renderPhysics                         52.3281        75.5584    -44.39%
community01Encrypt                   131.2163       161.1744    -22.83%
community02ParseXML                   24.2258        18.1502     25.08%
community03Filter                     90.6977       103.9522    -14.61%
community04Sort                       77.8661        76.8719      1.28%
experimentalRipple01                  15.1147        26.2141    -73.43%
experimentalRipple02                   5.8192        10.3360    -77.62%
experimentalMovie                     78.7228        70.4573     10.50%
arrayCombined                      62111.8012    128205.1282   -106.41%
arrayWeighted                      82644.6281    370370.3704   -348.15%
domGetElements                   1111111.1111    909090.9091     18.18%
domDynamicCreationCreateElement    13280.2125     10493.1794     20.99%
domDynamicCreationInnerHTML        16366.6121      7782.1012     52.45%
domJQueryAttributeFilters           2816.9014      3866.9760    -37.28%
domJQueryBasicFilters                939.7613      1919.7543   -104.28%
domJQueryBasics                     2523.9778      4506.5345    -78.55%
domJQueryContentFilters              215.5962       446.4684   -107.09%
domJQueryHierarchy                   506.0985      1555.6938   -207.39%
stringChat                         67114.0940     47619.0476     29.05%
stringDetectBrowser               303030.3030    333333.3333    -10.00%
stringFilter                        2712.9680     38022.8137  -1301.52%
stringValidateForm                151515.1515    909090.9091   -500.00%
stringWeighted                     84745.7627     57471.2644     32.18%
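The Perc column is not labelled, but the posted numbers are consistent with the difference taken relative to the Firefox score, which is why it can fall far below -100%. A short Python sketch that reproduces three of the rows; the function name `perc` is hypothetical, and the formula is inferred from the figures and not stated by the commenter.

```python
def perc(ff: float, chrome: float) -> float:
    """Difference relative to the Firefox score, in percent.

    Negative means Firefox scored lower than Chrome. Inferred from the
    table; the commenter did not state the formula.
    """
    return (ff - chrome) / ff * 100


# (FF, Chrome) pairs copied from the table above.
rows = {
    "renderChart": (80.8283, 130.3281),
    "community02ParseXML": (24.2258, 18.1502),
    "stringFilter": (2712.9680, 38022.8137),
}

for name, (ff, chrome) in rows.items():
    print(f"{name:22s} {perc(ff, chrome):10.2f}%   Chrome/FF = {chrome / ff:.2f}x")
```

Reading the column as a ratio is often easier: a Perc of -100% means Chrome scored twice as high, and the stringFilter row corresponds to Chrome scoring roughly 14 times higher.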
Peacekeeper is releasing a new version. Here are my results on Mac OS X (MacBook Air, 1.8 GHz Core i7), comparing Fx7, Fx10 and Chrome 16. It seems that we have a lot of regressions between Fx7 and Fx10 on this test: https://docs.google.com/spreadsheet/ccc?key=0Ah9TBa-qpKojdHBDNmFDNUduOHFnLUk3XzRvX0swaFE&hl=en_US
Should we file bugs?
Yes, absolutely. Worth testing whether those are due to type inference and mentioning that when filing....
I have marked the tests where we are exceptionally slow compared to Chrome in pink.

Note that you can run single tests, for example:
http://peacekeeper.futuremark.com/run.action?repeat=1&forceSuiteName=string&forceTestName=stringstringFilter

The names to use (with some mangling) are:

all
html5-videoVideoSupport
html5-videoSubtitleSupport
html5-videoPosterSupport
html5-videoCodecH264
html5-videoCodecMP4
html5-videoCodecTheora
html5-videoCodecWebM
html5-webglSphere
html5-gamingSpitfire
html5-workerContrast01
html5-workerContrast02
html5-workerContrast03
render-renderGrid01
render-renderGrid02
render-renderGrid03
render-renderPhysics
experimental-experimentalRipple02
experimental-experimentalMovie
array-arrayCombined
array-arrayWeighted
dom-domGetElements
dom-domDynamicCreationCreateElement
dom-domDynamicCreationInnerHTML
dom-domJQueryAttributeFilters
dom-domJQueryBasicFilters
dom-domJQueryBasics
dom-domJQueryContentFilters
dom-domJQueryHierarchy
dom-domQueryselector
string-stringChat
string-stringDetectBrowser
string-stringFilter
string-stringValidateForm
string-stringWeighted
The major regression components are:
- HTML5 Canvas
- Data
- DOM Operations
- String parsing

Because of that, my initial guess is that it's related to our JS engine.
filed bug 691314 for the regression
They've published how they calculate the results in their FAQ (http://peacekeeper.futuremark.com/faq.action). The overall score is the geometric mean of the main sections (render, experimental, data/array, dom, string), each of which is in turn the geometric mean of its individual tests. So, to increase the score, the render and experimental sections are pretty much useless. The data/array, dom, and string sections are the best ones to focus on. And considering that Chrome is destroying Firefox in data/array and string, those would be the best two areas.
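The two-level scoring described above can be sketched in a few lines of Python. The scores below are hypothetical values chosen to make the arithmetic easy, and any scaling constant the FAQ may apply on top is left out; this only illustrates the nested geometric mean.

```python
from math import prod


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    values = list(values)
    return prod(values) ** (1 / len(values))


def overall_score(sections):
    """Geometric mean of the per-section geometric means."""
    return geometric_mean(geometric_mean(tests) for tests in sections.values())


# Hypothetical per-test scores, not measurements from this bug.
sections = {
    "render": [4.0, 9.0],   # section mean: 6.0
    "string": [2.0, 8.0],   # section mean: 4.0
}
print(overall_score(sections))

# Doubling a single test (4.0 -> 8.0) in this two-section, two-test layout
# multiplies the overall score by 2 ** (1/4), about 1.19.
faster = {"render": [8.0, 9.0], "string": [2.0, 8.0]}
print(overall_score(faster) / overall_score(sections))
```

Because everything is multiplicative, a given speedup factor on a test helps the total by the same amount whether that test started out fast or slow.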
> the render and experimental sections are pretty much useless.

Why? The tests with the highest weights in the final score are in the "data/array" and "experimental" sections (since those have the fewest tests).

In any case, we're definitely way slower than Chrome on the data and string sections, and those are pure JS. In dom we're doing OK-ish, comparatively.

Luke, would you be willing to look into those array and string tests? Some of the array subtests have preexisting bugs, but they might have changed the tests...
(In reply to Boris Zbarsky (:bz) from comment #26)
> > the render and experimental sections are pretty much useless.

Both the render and the experimental sections have a maximum of 60 (possibly a little over 60) because they now use requestAnimationFrame. The other tests are potentially unlimited. But yes, if you go by the effect of a single test, then experimental is actually the best one because both of the tests are identical except for the canvas size.

As for the array and string tests, all of them are identical to the previous version except for arrayCombined. They tried to fix the bug where it worked on an empty array, but they botched it. The only difference is that instead of working on a variable called "this.data500", it is working on a local variable called "data500".
> Both the render and the experimental have a maximum of 60

Ah, ok. Going by the numbers in the spreadsheet cited in bug 691314 comment 0, we have to get 2-8x faster on those tests before we need to worry about that cap. Though that was on Mac; maybe on Windows we're closer...

> They tried to fix the bug where it worked on an empty array but they botched it.

Lovely. ;) Thanks for checking on the rest of it. Sounds like the various bugs blocking this one still apply, then. Good.

dmandelin, do you think you can scare up some jseng resources to pick the low-hanging fruit here (e.g. bug 609896) at least?
I'm attaching an html file that includes all the array and string tests. Just pick the test and run it to get the results. I've also added a way to run shark from it as well. I hope to add the dom, render and experimental as well in a later update.
I'm attaching an updated version of jquery as used by Peacekeeper so I can attach test cases that depend on it.
Peacekeeper has updated a few of its tests. arrayCombined is one of them and it no longer runs over a bunch of empty arrays. It actually works correctly. I'm attaching an updated single file with the updates. This file also contains the dom tests as well.
Attachment #565130 - Attachment is obsolete: true
I wrote up a simple harness to run only the JS shell tests, and compared a run of js -m -j -p to js -m -n and v8. Higher is better. These numbers are the number of iterations in 3 seconds.

                     opt-mjp.csv  opt-mn.csv        opt-d8.csv
arrayCombined              20866       16282  -21%       44087   +111%
arrayWeighted             299827      260542  -13%     1131964   +277%
stringChat                265637      245946   -7%      249335     -6%
stringDetectBrowser       895283      610040  -31%     1387736    +55%
stringFilter               10834       11018   +1%      155878  +1338%
stringValidateForm       1510407      901291  -40%     2727262    +80%
stringWeighted            326920      251073  -23%      600624    +83%

Those are just the ones that are listed in comment 22 and appear to be runnable directly from the shell. I also have results for everything accessible via wget -r that I'll attach.

I haven't looked into all of these. stringFilter is bug 503107, but should be mostly fixed by bug 691797. I don't know what's up with the regressions from TM/JM (-m -j -p) to JM+TI (-m -n). Comment 29 addresses at least part of the array stuff.
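For readers unfamiliar with the "iterations in a fixed time" style of measurement used above, here is a minimal Python sketch of the idea. It is not the harness from this comment (which ran in the JS shell and is not shown in the bug); `iterations_in` and the sample workload are hypothetical, and the clock is injectable so the counting logic can be checked deterministically.

```python
import time


def iterations_in(budget_seconds, workload, clock=time.perf_counter):
    """Count how many times `workload` completes before the time budget runs out."""
    deadline = clock() + budget_seconds
    count = 0
    while clock() < deadline:
        workload()
        count += 1
    return count


# Example use with a short budget and a stand-in workload.
print(iterations_in(0.05, lambda: "a,b,c".split(",")))
```

Unlike dividing a fixed operation count by elapsed time, this approach keeps the run time constant, so the resolution of the result depends on how cheap a single iteration is and not on the timer's granularity.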
The Peacekeeper test fails totally on DOM Tree load. Memory use grows beyond 2 GB and the browser crashes. Surely this is known, but it matches a real-world problem with TwitterDeck, which I suspect is the same bug.
The issue doesn't occur in safe mode. Surely the plugin system should disable a runaway process.
We made a lot of progress here but I'm not working on this right now.
Assignee: jdemooij → nobody
Status: ASSIGNED → NEW