Closed Bug 434085 Opened 12 years ago Closed 12 years ago
do talos run of moz2 builds on 1.9 hardware to compare moz2 perf vs 1.9 perf
...basically boils down to:
- For mozilla-central, we have talos numbers going back to 04apr2008, but the JS change from C -> C++ happened mid-2007.
- The mac mini hardware being used for moz2 is different from that used for 1.9 and 1.8.

To show that today's mozilla-central builds have the same performance as today's cvs-trunk builds, we have to reconfigure some staging 1.9 talos machines to look at mozilla-central. It will take time to reconfigure these, then collect several runs of data, then reconfigure the machines back.
Assignee: nobody → anodelman
Priority: -- → P1
The plan that I'm following is to do a day's worth of 1.9 runs, switch the machines over to mozilla-central and then do another day's worth of runs. The results will all be reported on the graph server. Once the tests are completed I will collect the links to relevant graphs for people to examine. The results for 1.9 and mozilla-central will be reported on the same graph line - thus the first half of the line will be 1.9 and the second mozilla-central. I'll keep track of the switchover time so that it will be easy to determine which result belongs to which browser. This setup should provide the easiest means of finding regressions, as you'll see a given line rise/fall instead of having to compare multiple lines.
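With both browsers on one graph line, spotting a regression is just a before/after comparison around the switchover time. A minimal sketch of that comparison, assuming "timestamp value" pairs in a file (the helper name and data format are illustrative, not part of the actual graph server):

```shell
#!/bin/sh
# Illustrative helper: split a "timestamp value" series at a known
# switchover time and report the mean of each half plus the percent change.
mean_shift() {
  # $1 = switchover timestamp, $2 = data file
  awk -v switch_t="$1" '
    $1 <  switch_t { pre  += $2; npre++  }
    $1 >= switch_t { post += $2; npost++ }
    END {
      printf "before: %.1f after: %.1f change: %.1f%%\n",
             pre / npre, post / npost,
             (post / npost - pre / npre) / (pre / npre) * 100
    }' "$2"
}
```

A large positive change at the switchover flags a regression without having to compare two separate graph lines.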
Summary: do talos run on 1.9 hardware to compare moz2 perf vs 1.9 perf → do talos run of moz2 builds on 1.9 hardware to compare moz2 perf vs 1.9 perf
The results are on graphs-stage.mozilla.org - all of the graphs of interest begin with the label "TEMP_". The first half of each graph represents results from ff3, the second half mozilla-central. Linux appears to be regressed. The most obvious graph is for tp:

http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273482&m1bl=0&m1avg=0

But it is also present for tsspider, tjss, tdhtml, etc:

http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273478&m1bl=0&m1avg=0
http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273474&m1bl=0&m1avg=0
http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211124180&bpst=cursor&bpstart=1210966320&bpend=1211124180&m1tid=273492&m1bl=0&m1avg=0
Are the builds using the same optimization options? (And profile-based stuff?)
Looks like there are differences in the build config.

1.9.0 build - http://mxr.mozilla.org/seamonkey/source/tools/tinderbox-configs/firefox/linux/mozconfig
1.9.1 build - http://hg.mozilla.org/build/buildbot-configs/index.cgi/file/tip/mozilla2/linux/mozconfig

The key differences are:
1. --enable-optimize="-Os" vs --enable-optimize
2. --disable-tests for 1.9.0
3. --enable-codesighs for 1.9.0

The first one looks like a prime suspect.
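A quick way to surface exactly this kind of difference is to diff the two mozconfigs after sorting them, so reordered-but-identical lines don't show up as noise. A sketch (the helper name and file paths are illustrative, not the real tinderbox-configs layout):

```shell
#!/bin/sh
# Illustrative helper: print only the options that genuinely differ between
# two mozconfig files ("<" = only in the first, ">" = only in the second),
# ignoring line ordering.
mozconfig_diff() {
  sort "$1" > /tmp/mozconfig_a.sorted
  sort "$2" > /tmp/mozconfig_b.sorted
  diff /tmp/mozconfig_a.sorted /tmp/mozconfig_b.sorted | grep '^[<>]'
}
```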
Comment on attachment 321629 [details] [diff] [review]
Use plain --enable-optimize on linux

plzfix.
Attachment #321629 - Flags: review?(bhearsum) → review+
Pushed that change to buildbot-configs.
I should have new test numbers later this afternoon. I'll repost relevant graph links then.
I got some results in and it doesn't appear to have fixed the regression. You can see on this graph:

http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211239380&bpst=Cursor&bpstart=1210966320&bpend=1211239380&m1tid=273482&m1bl=0&m1avg=0

The latest test results are on the far right - it is showing the same values as previously collected for mozilla-central.
Are we doing profiled builds (i.e., profile-guided optimization) on Linux nightlies on 1.9?
(In reply to comment #12)
> Are we doing profiled builds (i.e., profile-guided optimization) on Linux
> nightlies on 1.9?

No, |$ProfiledBuild = 1;| is only set in the windows tinder-config.pl. I'm not sure what else could be different between these machines - any ideas, Ben?
I can't find any other differences. I've confirmed that the same version of GCC is installed on both of the slaves, and they both have the appropriate CC= CXX= lines in their mozconfig.
I tested JS performance *outside* the browser on my Ubuntu Linux box and found no difference. First I just built the js shell both ways and tested them with SunSpider; no difference. Then I built the browser both ways and tested the two dist/bin/xpcshell executables with SunSpider.

Build one:
* start with revision e70e05d8eda2 (cvs-trunk-mirror tip)
* copy client.py from mozilla-central tip and run it to get NSS/NSPR/etc.
* use a vanilla mozconfig:

  . $topsrcdir/browser/config/mozconfig
  ac_add_options --disable-tests
  mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-ff-release
  ac_add_options --disable-oji
  ac_add_options --disable-xmlextras

* Run autoconf-2.13 manually.
* make -f client.mk build

Build two:
* start with revision 819a2c29e295 (mozilla-central tip)
* python client.py checkout
* Use an identical mozconfig, except without the --disable-xmlextras line (xmlextras isn't there anymore in mozilla-central).
* make -f client.mk build

SunSpider claims mozilla-central is 0.4% slower overall, faster on some benchmarks and slower on others.
So I did some diffing of Linux nightlies 2008-05-20-04-trunk and 2008-05-20-02-mozilla-central. There is no difference between the lists of files in the builds. A diff shows a bunch of differences in non-binary files, all related to:
* version number bump
* build date differences
* pathnames in .js file //@line comments

The binary files that differ include .so files, .jar files, browser.xpt, and libfreebl3.chk. Comparing the .so files, with:

  find 2008-05-20-04-trunk -name "*.so" | cut -d/ -f2- |
    while read FNAME; do
      diff -u 2008-05-20-04-trunk/$FNAME 2008-05-20-02-mozilla-central/$FNAME
    done |
    cut -d" " -f3 | cut -d/ -f2- |
    while read FNAME; do
      diff -u <(objdump -s -d 2008-05-20-04-trunk/$FNAME) \
              <(objdump -s -d 2008-05-20-02-mozilla-central/$FNAME)
    done

shows that:
* libsoftokn3.so - differences are only a date stamp in the file
* libmozjs.so - differences are substantial *****
* libnssckbi.so - differences are a date stamp inside the file
* libnullplugin.so - differences are substantial *****
* libjemalloc.so - differences are substantial *****
* libplc4.so - differences are a date stamp inside the file and one tiny diff
* libnss3.so - differences are a date stamp inside the file
* libnspr4.so - differences are substantial *****
* libxul.so - differences are substantial *****
* libplds4.so - differences are a date stamp inside the file and one tiny diff
* libssl3.so - differences are a date stamp inside the file
* libsmime3.so - differences are a date stamp inside the file

I'll look at the jars next. libfreebl3.chk is probably ok; I'm not sure how to diff browser.xpt.
The diffs inside the jars are:
* the version number bump
* pathnames in .js file //@line comments
* the following diff in content/global/buildconfig.html:

  - --enable-application=browser --enable-update-channel=nightly --enable-update-packaging --enable-optimize --disable-debug --disable-tests --enable-codesighs
  + --enable-application=browser --enable-optimize --enable-update-channel=nightly --enable-update-packaging --disable-debug

Could not having --disable-tests make a difference somehow? People might sometimes ifdef code based on ENABLE_TESTS...
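One way to check that suspicion is to look for code that is actually compiled conditionally on ENABLE_TESTS. A sketch (the helper name is made up; point it at whichever source tree you're comparing):

```shell
#!/bin/sh
# Illustrative helper: list source locations whose compilation depends on
# ENABLE_TESTS -- candidates for a perf difference between --enable-tests
# and --disable-tests builds. The regex matches both #ifdef and #ifndef.
find_tests_ifdefs() {
  grep -rn 'ifn*def ENABLE_TESTS' "$1"
}
```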
It's worth trying, yes
I think we should make the mozconfigs match (modulo cruft, which should be run by bsmedberg or myself) to rule out any weirdness.

dbaron: the .chk files are just checksums, so they will not match. The libmozjs.so differences are probably mostly due to the C->C++ switch. (Although it's certainly possible that there are other differences.)
(In reply to comment #11)
> I got some results in and it doesn't appear to have fixed the regression. You
> can see on this graph:
>
> http://graphs-stage.mozilla.org/#spst=range&spstart=1210966320&spend=1211239380&bpst=Cursor&bpstart=1210966320&bpend=1211239380&m1tid=273482&m1bl=0&m1avg=0
>
> The latest test results are on the far right - it is showing the same values as
> previously collected for mozilla-central.

Actually, I'm finding this a little hard to believe. The build log for
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla2/1211187600.1211190233.1341.gz&fulltext=1
shows the options used to be just -Os, and the build log for
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla2/1211274000.1211276585.12695.gz&fulltext=1
shows that the new options are "-Os -freorder-blocks -fno-reorder-functions -finline-limit=50". That should have made *some* difference. Are you sure the right build got tested?
And the latter matches the optimization options in http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1211281860.1211285059.16088.gz&fulltext=1
Add --disable-tests, --enable-codesighs, and PROFILE_GEN_SCRIPT to linux mozconfig. I don't think either of the latter two could possibly do anything, but who knows *shrug*.
Attachment #321819 - Flags: review?(ted.mielczarek)
In response to comment #20: I'm pulling builds from (as an example - builds are dropped into unique dated directories):

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1211253826/firefox-3.1a1pre.en-US.linux-i686.tar.bz2

As long as the builders are dropping to that location then that's what I'm testing.
(In reply to comment #23)
> As long as the builders are dropping to that location then that's what I'm
> testing.

Confirmed, they are.
Comment on attachment 321819 [details] [diff] [review]
[checked in] sync 1.9 linux mozconfig -> mozilla-central

I agree, but we need to eliminate variables.
Attachment #321819 - Flags: review?(ted.mielczarek) → review+
Looking at the graphs in comment #11, there's a jump of approx 15% in Tp numbers (from 445ish -> 512ish). The lower number is from cvs-trunk. The higher number is from hg-mozilla-central.

Question: How do we ever confirm if this jump in Tp numbers is because of code changes, build setup changes, machine changes or a combination?

Suggestion: Set up a new clean, pure, unmodified code branch in hg, which exactly matches the contents of cvs-trunk, and then:
- point our existing mozilla2 builders to that clean, unmodified branch, and generate a build
- feed that new build to the borrowed talos machines Alice brought over
- feed that new build to the official moz2 talos machines also

If the new build on the borrowed talos machines returns *different* results from the cvs-trunk results, then we know something is wrong with the build/talos infrastructure. If the new build on the borrowed talos machines returns *similar* results to the cvs-trunk results, then we know the builds are configured and running ok. We would also have corresponding numbers on the official moz2 talos machines, so we could return the borrowed talos machines while still trying to figure out which code change regressed performance.
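As a sanity check on the size of that jump, plain percent-change arithmetic on the approximate plateau values gives:

```shell
# Percent change between the cvs-trunk plateau (~445) and the
# mozilla-central plateau (~512) on the Tp graph.
awk 'BEGIN { a = 445; b = 512; printf "%.1f%%\n", (b - a) / a * 100 }'
# prints 15.1%
```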
I'll be landing the mozconfig patch today, after mpt-vpn is fixed up.
joduinn, I have created a test repository at http://hg.mozilla.org/users/bsmedberg_mozilla.com/index.cgi/firefox-3.0rc1/ which you can point build automation at to implement comment 26.
Let's be careful about where any _nightly_ builds from bsmedberg's repo end up on the ftp server. Somewhere in firefox/nightly/experimental would be good. I'm not so worried about hourly builds - probably easier to leave them unchanged for Talos' sake.
Comment on attachment 321819 [details] [diff] [review]
[checked in] sync 1.9 linux mozconfig -> mozilla-central

Landed in changeset 80:944b645d1a06. I kicked off a linux dep build, too.
Attachment #321819 - Attachment description: sync 1.9 linux mozconfig -> mozilla-central → [checked in] sync 1.9 linux mozconfig -> mozilla-central
(In reply to comment #28)
> joduinn, I have created a test repository at
> http://hg.mozilla.org/users/bsmedberg_mozilla.com/index.cgi/firefox-3.0rc1/
> which you can point build automation at to implement comment 26.

Do you know what timestamp in CVS was used when creating this new mercurial branch? I'd like to know what numbers we should compare with.
It was created by unpacking the Firefox3RC1 tarball.
I've got bsmedberg's repo building on one of the production slaves right now. Alice says it should be picked up automatically by the Talos slaves. I'll queue up a second build so we get an additional run out of it.
Results are now showing up here:

http://graphs-stage.mozilla.org/graph.html#spst=range&spstart=0&spend=1211399040&bpst=cursor&bpstart=0&bpend=1211399040&m1tid=273482&m1bl=0&m1avg=0

Start of the graph is ff3-from-cvs, the middle part is mozilla-central-from-hg, and the last few data points are ff3-from-hg. As you can see, the perf numbers generated by ff3-from-hg are comparable to ff3-from-cvs (though I suspect there might still be a 1-2% regression; I want more data points to be sure). That notwithstanding, I'm comfortable saying that the majority of the regression is from the browser code, not from the build system.
The graph in comment #30 shows the borrowed 1.9 talos machines running against three different situations. The first few points (up to 07:13 17may2008) were against CVS-trunk for FF3.0rc1. The next points (up to 10:23 21may2008) were against hg mozilla-central. The last few points (from 12:44 21may2008) were against the clean hg firefox-3.0rc1 branch which bsmedberg created in comment #31.

To me, this looks like the build and talos infrastructure is building and testing correctly. It seems that code changes in mozilla-central are causing almost all of the ~15% performance regression.

(fyi: we are still investigating the 1% difference between before (440-445) and after (445-450). It's possible these numbers will settle down in a few more runs, or we may discover a 1% difference somewhere in our build/talos infrastructure. Stay tuned...)
(In reply to comment #34) > Start of the graph is ff3-from-cvs, middle part is mozilla-central-from-hg and > the last few data points are ff3-from-hg. I see four parts there: * a bunch of points in the 439-447 range * a bunch of points in the 506-513 range (with a pause in the middle) * a bunch of points in the 475-482 range * a bunch of points in the 445-451 range How do those four parts line up with the three parts you listed?
BTW, I think this says it's time to revert the patch to build JS in C++ rather than C to determine if that's the cause.
In response to comment #36 - yes, it looks like there are 4 parts. The third chunk is unexplained as of yet. It could have been caused by one of the changes that we made to the build system (though it doesn't seem to line up directly with any of them). Were there any check-ins to moz-central during that time, or were we fully locked down? Regardless, we've only collected matching numbers for ff3-in-cvs and ff3-in-hg; even with the improvement in the moz-central numbers they were still high. I will continue to attempt to track down the moz-central number change, though, just to make sure that everything gets examined.
Upon further examination of the graphs, the 2% regression might be more widespread.

Looks pretty clearly to be in winxp:
http://graphs-stage.mozilla.org/graph.html#spst=range&spstart=1210968540&spend=1211402340&bpst=Cursor&bpstart=1210968540&bpend=1211402340&m1tid=273554&m1bl=0&m1avg=0

Possibly in leopard:
http://graphs-stage.mozilla.org/graph.html#spst=range&spstart=1210966620&spend=1211423940&bpst=Cursor&bpstart=1210966620&bpend=1211423940&m1tid=273443&m1bl=0&m1avg=0

The vista numbers also look a little suspicious. This was overlooked initially because I was looking for more major regressions (as seen in linux). Noticing 1-2% can be a little difficult.
Given the previous comments I'm turning back on regular mozilla-central linux builds.
ok, I've created a repository which backs out JS-as-C++: http://hg.mozilla.org/users/bsmedberg_mozilla.com/index.cgi/mozilla-central-jscnotcxx/ Can we perftest this repository? I'd just submit it to the tryserver, but I don't think I could compare the results to anything meaningfully.
bsmedberg: I'd rather not go down the path of doing specific perftest runs on individual what-if hg branches. That's going to take a lot of our time, in addition to holding our temp-experiment machines for longer. Instead, can you just back out that change in mozilla-central? The moz2 infrastructure we set up is already building and testing mozilla-central. This will allow us to revert our special machine setups, while the hunt-for-regression work goes on.
From phone conversation with joduinn: 1) we no longer need the temp-experiment machines: we can find and correct regressions using the existing talos machines 2) we're going to try and use the tryserver and tryserver-talos to try these experimental backouts. I've kicked off three runs already: * mozilla-central * FF3RC1 * JS-as-C And will kick off a few more runs of the same builds to make sure that the numbers are stable and we can compare results meaningfully.
(In reply to comment #43)
> 2) we're going to try and use the tryserver and tryserver-talos to try these
> experimental backouts. I've kicked off three runs already:
>
> * mozilla-central
> * FF3RC1
> * JS-as-C
>
> And will kick off a few more runs of the same builds to make sure that the
> numbers are stable and we can compare results meaningfully.

Filed new bug #434267 to track finding the perf-losing-patch, and copied this comment there. Once we move the loaner talos hardware back to FF3, we can close this bug.
I've disconnected the custom moz-central vs. ff3 testing configuration on the staging environment.
I don't think 434267 is the correct bug number, as that seems to be an AwesomeBar bug?
So, the Talos results:

Tp:
* mozilla-central-stock: 480.74
* FF3RC1-stock: 483.73
* central-JSnotCXX: 483.94

Ts:
* mozilla-central-stock: 1495.84
* FF3RC1-stock: 1480.37
* central-JSnotCXX: 1495.11

So if these numbers are to be believed (and I don't have any reason not to), according to that talos run, mozilla-central-stock is a tiny bit faster on Tp, and a tiny bit slower on Ts. The tiny bit faster on Tp seems to be due to compiling spidermonkey as C++; the tiny bit slower on Ts is not related to spidermonkey-as-C++. (Also, those numbers are spookily precise.)
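For scale, those "tiny bit" gaps relative to FF3RC1-stock work out to well under a percent on Tp and about a percent on Ts:

```shell
# Relative difference of mozilla-central-stock vs. FF3RC1-stock,
# using the Talos means quoted above (lower is better for both tests).
awk 'BEGIN {
  printf "Tp: %+.2f%%\n", (480.74 - 483.73) / 483.73 * 100
  printf "Ts: %+.2f%%\n", (1495.84 - 1480.37) / 1480.37 * 100
}'
# prints Tp: -0.62% and Ts: +1.05%
```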
(In reply to comment #44)
> Filed new bug##434267 to track finding the perf-losing-patch, and copied this
> comment there.

(In reply to comment #46)
> I don't think 434267 is the correct bug number, as that seems to be an
> AwesomeBar bug?

Sorry, a typo on my part. Bug #435267 is tracking finding the perf-losing-patch.
dbaron: for xpt files you can use xpt_dump.
Given that work moved to bug 435267 (and that's closed now anyway), I think this is done.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Product: mozilla.org → Release Engineering