Closed Bug 434085 Opened 12 years ago Closed 12 years ago
do talos run of moz2 builds on 1.9 hardware to compare moz2 perf vs 1.9 perf
...basically boils down to:
- For mozilla-central, we have talos numbers going back to 04apr2008, but the JS change from C -> C++ happened mid-2007.
- The mac mini hardware being used for moz2 is different from that used for 1.9 and 1.8.

To show that today's mozilla-central builds have the same performance as today's cvs-trunk builds, we have to reconfigure some staging 1.9 talos machines to look at mozilla-central. It will take time to reconfigure these, then collect several runs of data, then reconfigure the machines back.
Assignee: nobody → anodelman
Priority: -- → P1
The plan that I'm following is to do a day's worth of 1.9 runs, switch the machines over to mozilla-central and then do another day's worth of runs. The results will all be reported on the graph server. Once the tests are completed I will collect the links to relevant graphs for people to examine. The results for 1.9 and mozilla-central will be reported on the same graph line - thus the first half of the line will be 1.9 and the second mozilla-central. I'll keep track of the switchover time so that it will be easy to determine which result belongs to which browser. This setup should provide the easiest means of finding regressions, as you'll see a given line rise/fall instead of having to compare multiple lines.
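With both browsers on one graph line, spotting a regression is just a before/after comparison around the switchover time. A minimal sketch of that comparison, assuming "timestamp value" pairs in a file (the helper name and data format are illustrative, not part of the actual graph server):

```shell
#!/bin/sh
# Illustrative helper: split a "timestamp value" series at a known
# switchover time and report the mean of each half plus the percent change.
mean_shift() {
  # $1 = switchover timestamp, $2 = data file
  awk -v switch_t="$1" '
    $1 <  switch_t { pre  += $2; npre++  }
    $1 >= switch_t { post += $2; npost++ }
    END {
      printf "before: %.1f after: %.1f change: %.1f%%\n",
             pre / npre, post / npost,
             (post / npost - pre / npre) / (pre / npre) * 100
    }' "$2"
}
```

A large positive change at the switchover flags a regression without having to compare two separate graph lines.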
Summary: do talos run on 1.9 hardware to compare moz2 perf vs 1.9 perf → do talos run of moz2 builds on 1.9 hardware to compare moz2 perf vs 1.9 perf
The results are on graphs-stage.mozilla.org - all of the graphs of interest begin with the label "TEMP_". The first half of each graph represents results from ff3, the second half mozilla-central. Linux appears to be regressed. The most obvious graph is for tp:

http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273482&m1bl=0&m1avg=0

But it is also present for tsspider, tjss, tdhtml, etc:

http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273478&m1bl=0&m1avg=0
http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273474&m1bl=0&m1avg=0
http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273492&m1bl=0&m1avg=0
Are the builds using the same optimization options? (And profile-based stuff?)
Looks like there are differences in the build config.

1.9.0 build - http://mxr.mozilla.org/seamonkey/source/tools/tinderbox-configs/firefox/linux/mozconfig
1.9.1 build - http://hg.mozilla.org/build/buildbot-configs/index.cgi/file/tip/mozilla2/linux/mozconfig

The key differences are:
1. --enable-optimize="-Os" vs --enable-optimize
2. --disable-tests for 1.9.0
3. --enable-codesighs for 1.9.0

The first one looks like a prime suspect.
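A quick way to surface exactly this kind of difference is to diff the two mozconfigs after sorting them, so reordered-but-identical lines don't show up as noise. A sketch (the helper name and file paths are illustrative, not the real tinderbox-configs layout):

```shell
#!/bin/sh
# Illustrative helper: print only the options that genuinely differ between
# two mozconfig files ("<" = only in the first, ">" = only in the second),
# ignoring line ordering.
mozconfig_diff() {
  sort "$1" > /tmp/mozconfig_a.sorted
  sort "$2" > /tmp/mozconfig_b.sorted
  diff /tmp/mozconfig_a.sorted /tmp/mozconfig_b.sorted | grep '^[<>]'
}
```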
Comment on attachment 321629 [details] [diff] [review]
Use plain --enable-optimize on linux

plzfix.
Attachment #321629 - Flags: review?(bhearsum) → review+
Pushed that change to buildbot-configs.
I should have new test numbers later this afternoon. I'll repost relevant graph links then.
I got some results in and it doesn't appear to have fixed the regression. You can see on this graph:

http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211239380&bpst=Cursor&bpstart=1210966320&bpend=1211239380&m1tid=273482&m1bl=0&m1avg=0

The latest test results are on the far right - it is showing the same values as previously collected for mozilla-central.
Are we doing profiled builds (i.e., profile-guided optimization) on Linux nightlies on 1.9?
(In reply to comment #12)
> Are we doing profiled builds (i.e., profile-guided optimization) on Linux
> nightlies on 1.9?

No, |$ProfiledBuild = 1;| is only set in the windows tinder-config.pl. I'm not sure what else could be different between these machines - any ideas, Ben?
I can't find any other differences. I've confirmed that the same version of GCC is installed on both of the slaves, and they both have the appropriate CC= CXX= lines in their mozconfig.
I tested JS performance *outside* the browser on my Ubuntu Linux box and found no difference. First I just built the js shell both ways and tested them with SunSpider; no difference. Then I built the browser both ways and tested the two dist/bin/xpcshell executables with SunSpider.

Build one:
* start with revision e70e05d8eda2 (cvs-trunk-mirror tip)
* copy client.py from mozilla-central tip and run it to get NSS/NSPR/etc.
* use a vanilla mozconfig:

  . $topsrcdir/browser/config/mozconfig
  ac_add_options --disable-tests
  mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-ff-release
  ac_add_options --disable-oji
  ac_add_options --disable-xmlextras

* Run autoconf-2.13 manually.
* make -f client.mk build

Build two:
* start with revision 819a2c29e295 (mozilla-central tip)
* python client.py checkout
* Use an identical mozconfig, except without the --disable-xmlextras line (xmlextras isn't there anymore in mozilla-central).
* make -f client.mk build

SunSpider claims mozilla-central is 0.4% slower overall, faster on some benchmarks and slower on others.
So I did some diffing of Linux nightlies 2008-05-20-04-trunk and 2008-05-20-02-mozilla-central. There is no difference between the lists of files in the builds. A diff shows a bunch of differences in non-binary files, all related to:
* version number bump
* build date differences
* pathnames in .js file //@line comments

The binary files that differ include .so files, .jar files, browser.xpt, and libfreebl3.chk. Comparing the .so files, with:

  find 2008-05-20-04-trunk -name "*.so" | cut -d/ -f2- |
    while read FNAME; do
      diff -u 2008-05-20-04-trunk/$FNAME 2008-05-20-02-mozilla-central/$FNAME
    done |
    cut -d" " -f3 | cut -d/ -f2- |
    while read FNAME; do
      diff -u <(objdump -s -d 2008-05-20-04-trunk/$FNAME) \
              <(objdump -s -d 2008-05-20-02-mozilla-central/$FNAME)
    done

shows that:
* libsoftokn3.so - differences are only a date stamp in the file
* libmozjs.so - differences are substantial *****
* libnssckbi.so - differences are a date stamp inside the file
* libnullplugin.so - differences are substantial *****
* libjemalloc.so - differences are substantial *****
* libplc4.so - differences are a date stamp inside the file and one tiny diff
* libnss3.so - differences are a date stamp inside the file
* libnspr4.so - differences are substantial *****
* libxul.so - differences are substantial *****
* libplds4.so - differences are a date stamp inside the file and one tiny diff
* libssl3.so - differences are a date stamp inside the file
* libsmime3.so - differences are a date stamp inside the file

I'll look at the jars next. libfreebl3.chk is probably ok; I'm not sure how to diff browser.xpt.
The diffs inside the jars are:
* the version number bump
* pathnames in .js file //@line comments
* the following diff in content/global/buildconfig.html:

  - --enable-application=browser --enable-update-channel=nightly --enable-update-packaging --enable-optimize --disable-debug --disable-tests --enable-codesighs
  + --enable-application=browser --enable-optimize --enable-update-channel=nightly --enable-update-packaging --disable-debug

Could not having --disable-tests make a difference somehow? People might sometimes ifdef code based on ENABLE_TESTS...
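One way to check that suspicion is to look for code that is actually compiled conditionally on ENABLE_TESTS. A sketch (the helper name is made up; point it at whichever source tree you're comparing):

```shell
#!/bin/sh
# Illustrative helper: list source locations whose compilation depends on
# ENABLE_TESTS -- candidates for a perf difference between --enable-tests
# and --disable-tests builds. The regex matches both #ifdef and #ifndef.
find_tests_ifdefs() {
  grep -rn 'ifn*def ENABLE_TESTS' "$1"
}
```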
It's worth trying, yes
I think we should make the mozconfigs match (modulo cruft, which should be run by bsmedberg or myself) to rule out any weirdness.

dbaron: the .chk files are just checksums, so they will not match. The libmozjs.so differences are probably mostly due to the C->C++ switch. (Although it's certainly possible that there are other differences.)
(In reply to comment #11)
> I got some results in and it doesn't appear to have fixed the regression. You
> can see on this graph:
>
> http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211239380&bpst=Cursor&bpstart=1210966320&bpend=1211239380&m1tid=273482&m1bl=0&m1avg=0
>
> The latest test results are on the far right - it is showing the same values as
> previously collected for mozilla-central.

Actually, I'm finding this a little hard to believe. The build log for
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla2/1211187600.1211190233.1341.gz&fulltext=1
shows the options used to be just -Os, and the build log for
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla2/1211274000.1211276585.12695.gz&fulltext=1
shows that the new options are "-Os -freorder-blocks -fno-reorder-functions -finline-limit=50". That should have made *some* difference. Are you sure the right build got tested?
And the latter matches the optimization options in http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1211281860.1211285059.16088.gz&fulltext=1
Add --disable-tests, --enable-codesighs, and PROFILE_GEN_SCRIPT to linux mozconfig. I don't think either of the latter two could possibly do anything, but who knows *shrug*.
Attachment #321819 - Flags: review?(ted.mielczarek)
In response to comment #20: I'm pulling builds from (as an example - builds are dropped into unique dated directories):

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1211253826/firefox-3.1a1pre.en-US.linux-i686.tar.bz2

As long as the builders are dropping to that location then that's what I'm testing.
(In reply to comment #23)
> As long as the builders are dropping to that location then that's what I'm
> testing.

Confirmed, they are.
Comment on attachment 321819 [details] [diff] [review]
[checked in] sync 1.9 linux mozconfig -> mozilla-central

I agree, but we need to eliminate variables.
Attachment #321819 - Flags: review?(ted.mielczarek) → review+
Looking at the graphs in comment #11, there's a jump of approx 15% in Tp numbers (from 445ish -> 512ish). The lower number is from cvs-trunk. The higher number is from hg-mozilla-central.

Question: How do we ever confirm if this jump in Tp numbers is because of code changes, build setup changes, machine changes or a combination?

Suggestion: Set up a new clean, pure, unmodified code branch in hg, which exactly matches the contents of cvs-trunk, and then:
- point our existing mozilla2 builders to that clean, unmodified branch, and generate a build
- feed that new build to the borrowed talos machines Alice brought over
- feed that new build to the official moz2 talos machines also

If the new build on the borrowed talos machines returns *different* results from the cvs-trunk results, then we know something is wrong with the build/talos infrastructure. If the new build on the borrowed talos machines returns *similar* results to the cvs-trunk results, then we know the builds are configured and running ok. We would also have corresponding numbers on the official moz2 talos machines, so we could return the borrowed talos machines while still trying to figure out which code change regressed performance.
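As a sanity check on the size of that jump, plain percent-change arithmetic on the approximate plateau values gives:

```shell
# Percent change between the cvs-trunk plateau (~445) and the
# mozilla-central plateau (~512) on the Tp graph.
awk 'BEGIN { a = 445; b = 512; printf "%.1f%%\n", (b - a) / a * 100 }'
# prints 15.1%
```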
I'll be landing the mozconfig patch today, after mpt-vpn is fixed up.
joduinn, I have created a test repository at http://hg.mozilla.org/users/bsmedberg_mozilla.com/index.cgi/firefox-3.0rc1/ which you can point build automation at to implement comment 26.
Let's be careful about where any _nightly_ builds from bsmedberg's repo end up on the ftp server. Somewhere in firefox/nightly/experimental would be good. I'm not so worried about hourly builds - probably easier to leave them unchanged for Talos' sake.
Comment on attachment 321819 [details] [diff] [review]
[checked in] sync 1.9 linux mozconfig -> mozilla-central

Landed in changeset 80:944b645d1a06. I kicked off a linux dep build, too.
Attachment #321819 - Attachment description: sync 1.9 linux mozconfig -> mozilla-central → [checked in] sync 1.9 linux mozconfig -> mozilla-central
(In reply to comment #28)
> joduinn, I have created a test repository at
> http://hg.mozilla.org/users/bsmedberg_mozilla.com/index.cgi/firefox-3.0rc1/
> which you can point build automation at to implement comment 26.

Do you know what timestamp in CVS was used when creating this new mercurial branch? I'd like to know what numbers we should compare with.
It was created by unpacking the Firefox3RC1 tarball.
I've got bsmedberg's repo building on one of the production slaves right now. Alice says it should be picked up automatically by the Talos slaves. I'll queue up a second build so we get an additional run out of it.
Results are now showing up here:

http://graphs-stage.mozilla.org/graph.html#spst=range&spstart=0&spend=1211399040&bpst=cursor&bpstart=0&bpend=1211399040&m1tid=273482&m1bl=0&m1avg=0

Start of the graph is ff3-from-cvs, the middle part is mozilla-central-from-hg, and the last few data points are ff3-from-hg. As you can see, the perf numbers generated by ff3-from-hg are comparable to ff3-from-cvs (though I suspect there might still be a 1-2% regression; I want more data points to be sure). That notwithstanding, I'm comfortable saying that the majority of the regression is from the browser code, not from the build system.
The graph in comment #30 shows the borrowed 1.9 talos machines running against three different situations. The first few points (up to 07:13 17may2008) were against CVS-trunk for FF3.0rc1. The next points (up to 10:23 21may2008) were against hg mozilla-central. The last few points (from 12:44 21may2008) were against the clean hg firefox-3.0rc1 branch which bsmedberg created in comment #31.

To me, this looks like the build and talos infrastructure is building and testing correctly. It seems that code changes in mozilla-central are causing almost all of the ~15% performance regression.

(fyi: we are still investigating the 1% difference between before (440-445) and after (445-450). It's possible these numbers will settle down in a few more runs, or we may discover a 1% difference somewhere in our build/talos infrastructure. Stay tuned...)
(In reply to comment #34) > Start of the graph is ff3-from-cvs, middle part is mozilla-central-from-hg and > the last few data points are ff3-from-hg. I see four parts there: * a bunch of points in the 439-447 range * a bunch of points in the 506-513 range (with a pause in the middle) * a bunch of points in the 475-482 range * a bunch of points in the 445-451 range How do those four parts line up with the three parts you listed?
BTW, I think this says it's time to revert the patch to build JS in C++ rather than C to determine if that's the cause.
In response to comment #36 - yes, it looks like there are 4 parts. The third chunk is unexplained as of yet. It could have been caused by one of the changes that we made to the build system (though it doesn't seem to line up directly with any of them). Were there any check-ins to moz-central during that time, or were we fully locked down? Regardless, we've only collected matching numbers for ff3-in-cvs and ff3-in-hg; even with the improvement in the moz-central numbers they were still high. I will continue to attempt to track down the moz-central number change, though, just to make sure that everything gets examined.
Upon further examination of the graphs, the 2% regression might be more widespread.

Looks pretty clearly to be in winxp:
http://graphs-stage.mozilla.org/graph.html#spst=range&spstart=1210968540&spend=1211402340&bpst=Cursor&bpstart=1210968540&bpend=1211402340&m1tid=273554&m1bl=0&m1avg=0

Possibly in leopard:
http://graphs-stage.mozilla.org/graph.html#spst=range&spstart=1210966620&spend=1211423940&bpst=Cursor&bpstart=1210966620&bpend=1211423940&m1tid=273443&m1bl=0&m1avg=0

The vista numbers also look a little suspicious. This was overlooked initially because I was looking for more major regressions (as seen in linux). Noticing 1-2% can be a little difficult.
Given the previous comments I'm turning back on regular mozilla-central linux builds.
ok, I've created a repository which backs out JS-as-C++: http://hg.mozilla.org/users/bsmedberg_mozilla.com/index.cgi/mozilla-central-jscnotcxx/ Can we perftest this repository? I'd just submit it to the tryserver, but I don't think I could compare the results to anything meaningfully.
bsmedberg: I'd rather not go down the path of doing specific perftest runs on individual what-if hg branches. That's going to take a lot of our time, in addition to holding our temp-experiment machines for longer. Instead, can you just back out that change in mozilla-central? The moz2 infrastructure we set up is already building and testing mozilla-central. This will allow us to revert our special machine setups, while the hunt-for-regression work goes on.
From phone conversation with joduinn: 1) we no longer need the temp-experiment machines: we can find and correct regressions using the existing talos machines 2) we're going to try and use the tryserver and tryserver-talos to try these experimental backouts. I've kicked off three runs already: * mozilla-central * FF3RC1 * JS-as-C And will kick off a few more runs of the same builds to make sure that the numbers are stable and we can compare results meaningfully.
(In reply to comment #43)
> 2) we're going to try and use the tryserver and tryserver-talos to try these
> experimental backouts. I've kicked off three runs already:
>
> * mozilla-central
> * FF3RC1
> * JS-as-C
>
> And will kick off a few more runs of the same builds to make sure that the
> numbers are stable and we can compare results meaningfully.

Filed new bug #434267 to track finding the perf-losing-patch, and copied this comment there. Once we move the loaner talos hardware back to FF3, we can close this bug.
I've disconnected the custom moz-central vs. ff3 testing configuration on the staging environment.
I don't think 434267 is the correct bug number, as that seems to be an AwesomeBar bug?
So, the Talos results:

Tp:
* mozilla-central-stock: 480.74
* FF3RC1-stock: 483.73
* central-JSnotCXX: 483.94

Ts:
* mozilla-central-stock: 1495.84
* FF3RC1-stock: 1480.37
* central-JSnotCXX: 1495.11

So if these numbers are to be believed (and I don't have any reason not to), according to that talos run, mozilla-central-stock is a tiny bit faster on Tp, and a tiny bit slower on Ts. The tiny bit faster on Tp seems to be due to compiling spidermonkey as C++; the tiny bit slower on Ts is not related to spidermonkey-as-C++. (Also, those numbers are spookily precise.)
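For scale, those "tiny bit" gaps relative to FF3RC1-stock work out to well under a percent on Tp and about a percent on Ts:

```shell
# Relative difference of mozilla-central-stock vs. FF3RC1-stock,
# using the Talos means quoted above (lower is better for both tests).
awk 'BEGIN {
  printf "Tp: %+.2f%%\n", (480.74 - 483.73) / 483.73 * 100
  printf "Ts: %+.2f%%\n", (1495.84 - 1480.37) / 1480.37 * 100
}'
# prints Tp: -0.62% and Ts: +1.05%
```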
(In reply to comment #44)
> Filed new bug##434267 to track finding the perf-losing-patch, and copied this
> comment there.

(In reply to comment #46)
> I don't think 434267 is the correct bug number, as that seems to be an
> AwesomeBar bug?

Sorry, a typo on my part. Bug #435267 is tracking finding the perf-losing-patch.
dbaron: for xpt files you can use xpt_dump.
Given that work moved to bug 435267 (and that's closed now anyway), I think this is done.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering