Bug 435267 (Closed): opened 16 years ago, closed 16 years ago

identify and backout changes in mozilla-central which cause perf regression

Categories: Core :: General (defect)
Priority: Not set
Severity: normal

Status: RESOLVED WORKSFORME

Reporter: joduinn
Assignee: Unassigned

In bug 434085, we confirmed that, running Talos on the same hardware used for FF3, performance is 19% worse with mozilla-central than with cvs-trunk / 3.0rc1.

This bug is to track figuring out *which* landed change in mozilla-central is causing this regression and backing it out.
(copying forward bsmedberg's initial investigations from bug 434085, comment 43)

We're going to try using the tryserver and tryserver-talos to test these
experimental backouts. I've kicked off three runs already:

* mozilla-central
* FF3RC1
* JS-as-C

I will kick off a few more runs of the same builds to make sure that the
numbers are stable and we can compare the results meaningfully.
Copied from my comments in bug 434085, updated with bsmedberg's latest runs:

Talos results, Tp:

mozilla-central-stock: 480.74, 481.40, 483.96
FF3RC1-stock: 483.73
central-JSnotCXX: 483.94, 480.19

Ts:
m-c: 1495.84, 1475, 1534
FF3RC1: 1480.37
central-JSnotCXX: 1495.11, 1515

It all looks like noise; I don't think I see a regression here.  Where does this leave us?
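
To put the "looks like noise" call on slightly firmer footing, here is a minimal illustrative sketch (not part of the Talos tooling; the data are just the Tp numbers quoted above) that compares the spread across repeated runs of one build with the differences between builds:

from statistics import mean, pstdev

# Tp results copied from the comment above, one list per build.
tp_runs = {
    "mozilla-central-stock": [480.74, 481.40, 483.96],
    "FF3RC1-stock": [483.73],
    "central-JSnotCXX": [483.94, 480.19],
}

baseline = tp_runs["mozilla-central-stock"]
noise = pstdev(baseline)  # spread across repeated runs of the same build

for name, runs in tp_runs.items():
    delta = mean(runs) - mean(baseline)
    print(f"{name}: mean={mean(runs):.2f}, delta vs m-c={delta:+.2f}, "
          f"run-to-run noise ~{noise:.2f}")

If the between-build deltas are no larger than the run-to-run spread of a single build, there is nothing to attribute to a code change.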
Just to provide some additional information: when we briefly switched the mozilla-central builds over to test a fresh pull of FF3 code in hg, we reproduced the drop in Tp numbers on more than one machine. It also showed up in the production Talos Ubuntu mozilla-central triad. See here:

http://graphs.mozilla.org/#spst=range&spstart=1207339502&spend=1208211397&bpst=cursor&bpstart=1207339502&bpend=1208211397&m1tid=395166&m1bl=0&m1avg=0&m2tid=395125&m2bl=0&m2avg=0&m3tid=395135&m3bl=0&m3avg=0

The drop at 5/21 12:14 is when we switched to the fresh FF3 code, and the subsequent increase at 5/22 6:14 is when we switched back to a pull from the mozilla-central hg repository. That gives us four Talos machines in agreement that a regression exists.

I'm afraid this has mostly exposed a weakness in our tryserver Talos configuration.
Product: Firefox → Core
QA Contact: general → general
I've corrected an inconsistency in the --enable-optimize line of the Linux tryserver mozconfig and kicked off a new set of builds. I'll post results here when they all come out.
What do we lose by backing everything out to CVS trunk, getting numbers that work, and then working our way forward?
Latest numbers:
mozilla-central stock - 480.61
FF3RC1 - 482.66
mozilla-central w/ JSnotCXX patch - 482.73
Scratch that last comment. In my haste to get these builds going I forgot to cvs up on the master to get the new mozconfig. Re-running the builds now.
OK, here are the numbers from builds with a proper --enable-optimize line:

mozilla-central stock - 426.54
FF3RC1 - 431.04
mozilla-central w/ JSnotCXX patch - 427.06
This strongly indicates to me that we have a build config problem on the m-c build machines.
(In reply to comment #9)
> This strongly indicates to me that we have a build config problem on the m-c
> build machines.
> 

FWIW: I just did a manual comparison of the Linux mozconfigs and the only differences are as follows:
In the mozilla-central mozconfig, not in 1.9:
export MOZILLA_OFFICIAL=1 (this is set by tinderbox for 1.9)

In 1.9, not in mozilla-central:
mk_add_options MOZ_CO_MODULE="mozilla/tools/update-packaging mozilla/tools/codesighs" (this is inapplicable for mozilla-central).
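
A manual diff like the one above is easy to get wrong, so as a side note it could be scripted. A minimal sketch, assuming the two mozconfigs have been copied locally as mozconfig-central and mozconfig-1.9 (hypothetical filenames):

def significant_lines(path):
    # Non-blank, non-comment lines of a mozconfig, stripped of whitespace.
    with open(path) as f:
        return {line.strip() for line in f
                if line.strip() and not line.strip().startswith("#")}

central = significant_lines("mozconfig-central")  # hypothetical local copy
branch19 = significant_lines("mozconfig-1.9")     # hypothetical local copy

print("Only in mozilla-central:", *sorted(central - branch19), sep="\n  ")
print("Only in 1.9:", *sorted(branch19 - central), sep="\n  ")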


Here's a paranoid check of OS, kernel, and tool versions on the relevant VMs.
OS+kernel versions:
fx-linux-tbox: CentOS release 5 (Final) - 2.6.18-53.1.13.el5 #1 SMP Tue Feb 12 13:01:45 EST 2008 i686 i686 i386 GNU/Linux
moz2-linux-slave01: CentOS release 5 (Final) - 2.6.18-53.1.6.el5
moz2-linux-slave02: CentOS release 5 (Final) - 2.6.18-53.1.14.el5
moz2-linux-slave03: CentOS release 5 (Final) - 2.6.18-53.1.19.el5
try server slaves: CentOS release 5 (Final) - 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:36:49 EST 2008 i686 athlon i386 GNU/Linux

GCC Versions:
fx-linux-tbox: gcc (GCC) 4.1.2 20061011 (Red Hat 4.1.1-29)
moz2 linux slaves: gcc (GCC) 4.1.2 20061011 (Red Hat 4.1.1-29)
try server slaves: gcc (GCC) 4.1.2 20061011 (Red Hat 4.1.1-29)

Python versions:
fx-linux-tbox: Python 2.5.1
moz2 linux slaves: Python 2.5.1
try server slaves: Python 2.5.1

Perl versions:
fx-linux-tbox: This is perl, v5.8.8 built for i386-linux-thread-multi
moz2 linux slaves: This is perl, v5.8.8 built for i386-linux-thread-multi
try server slaves: This is perl, v5.8.8 built for i386-linux-thread-multi

Sed versions:
fx-linux-tbox: GNU sed version 4.1.5
moz2 linux slaves: GNU sed version 4.1.5
try server slaves: GNU sed version 4.1.5

Make versions:
fx-linux-tbox: GNU Make 3.81
moz2 linux slaves: GNU Make 3.81
try server slaves: GNU Make 3.81

ld versions:
fx-linux-tbox: GNU ld version 2.17.50.0.6-2.el5 20061020
moz2 linux slaves: GNU ld version 2.17.50.0.6-2.el5 20061020
try server slaves: GNU ld version 2.17.50.0.6-2.el5 20061020

nm versions:
fx-linux-tbox: GNU nm 2.17.50.0.6-2.el5 20061020
moz2 linux slaves: GNU nm 2.17.50.0.6-2.el5 20061020
try server slaves: GNU nm 2.17.50.0.6-2.el5 20061020
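
Collecting those versions by hand is tedious; a small script run on each slave would make the comparison repeatable. A minimal modern-Python sketch (the command list is an assumption, and it would not run on the Python 2.5.1 installed on the slaves themselves):

import subprocess

# Commands whose version banners we want to record on each machine.
VERSION_COMMANDS = [
    ["uname", "-r"],
    ["gcc", "--version"],
    ["python", "-V"],
    ["perl", "-v"],
    ["sed", "--version"],
    ["make", "--version"],
    ["ld", "--version"],
    ["nm", "--version"],
]

for cmd in VERSION_COMMANDS:
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
        # Some tools print their version banner to stderr rather than stdout.
        first_line = (result.stdout or result.stderr).splitlines()[0]
    except (OSError, IndexError):
        first_line = "not found"
    print(f"{cmd[0]}: {first_line}")

Diffing the output of this script across fx-linux-tbox, the moz2 slaves, and the try server slaves would give the same comparison as the table above.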
Yesterday we had the following plan of action:
* clobber slaves
* do a run of mozilla-central
* do a new run of mozilla-central with the JS-as-C++ patch backed out

As it turns out, after clobbering the slaves the numbers dropped back down and have stayed down ever since, both with the JS-as-C++ patch backed out and with it checked back in.

(where: qm-linux-trunk01, qm-linux-trunk02, qm-linux-trunk03)
Before clobber: 469, 461, 461
After clobber, before backout: 435, 424, 430
After backout (w/ a clobber): 429, 424, 433
After checking the JS-as-C++ patch back in: 427, 425, 433
Another run, no code changes: 430, 429, 434

Given this, it looks like performance in a depend build was somehow different from that in a clobber build. I should note that our dep builds are not currently clobbered on a regular basis, unlike those on 1.8 or 1.9.
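
For a rough sense of how large the clobber effect was, here is a purely illustrative calculation over the per-machine numbers quoted above:

from statistics import mean

before_clobber = [469, 461, 461]  # qm-linux-trunk01..03, depend builds
after_clobber = [435, 424, 430]   # same machines, first run after the clobber

drop = mean(before_clobber) - mean(after_clobber)
pct = 100 * drop / mean(before_clobber)
print(f"Depend builds were slower by ~{drop:.0f} points ({pct:.1f}%) on average.")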
WFM!

Bug 432236 is about regular/automatic clobbering. I think I may take a shot at it.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME