Upgrade qa and build machines to reflect latest linux runtime requirements

RESOLVED FIXED

Status

Product: Release Engineering
Component: General
Priority: P2
Severity: normal
Status: RESOLVED FIXED
Reported: 11 years ago
Modified: 4 years ago

People

(Reporter: rc, Assigned: preed)

Tracking

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment)

(Reporter)

Description

11 years ago
Vlad attempted to land an upgrade to the in-repository version of Cairo last night and saw some reftest errors due to platform incompatibility. We need to upgrade the machines to reflect the new platform requirements proposed at this URL:

http://wiki.mozilla.org/Linux/Runtime_Requirements
(Reporter)

Comment 1

11 years ago
Preed and Ben have raised good points about this hitting a number of extra machines. Perftest machines, for example. While it'll be fairly easy to upgrade qm-rhel02,3, the others might be a little trickier. There might be staging issues if we don't upgrade these at the same time.

Comment 2

11 years ago
Some notes extracted from email and previous discussions:

We tried to replace argo-vm (an undocumented install currently producing Firefox trunk nightlies) with fx-linux-tbox (the current Linux refplatform), but the builds produced by fx-linux-tbox were 5-10% slower than those from argo-vm when tested under identical conditions and identical configs.

I don't know how to diagnose this, so I was going to propose that we ignore it and switch to fx-linux-tbox anyway, because it's a known config and with --enable-libxul it's within a few % of argo-vm.

Now, as for switching to CentOS5 as a build refplatform: builds produced by CentOS5 will not run on CentOS4. This means that we would have to upgrade the performance test boxes to a newer runtime (I believe that the new perf farm is already running some modern version of Ubuntu, right?). If we decide to switch to CentOS5, we should switch the perf box first, and get some historical data. Then we can switch the build machine.

This doesn't really affect the unit-test boxes specifically, because they test their own builds; so you could theoretically upgrade them independently. But that means that our unit tests are testing an entirely different environment than our production builds, which isn't especially helpful.
(Reporter)

Comment 3

11 years ago
(In reply to comment #2)
> Now, as for switching to CentOS5 as a build refplatform: builds produced by
> CentOS5 will not run on CentOS4. This means that we would have to upgrade the
> performance test boxes to a newer runtime (I believe that the new perf farm is
> already running some modern version of Ubuntu, right?). If we decide to switch
> to CentOS5, we should switch the perf box first, and get some historical data.
> Then we can switch the build machine.

Yup. The linux perf boxes (qm-plinux01-05) are running Ubuntu Feisty, iirc.

> This doesn't really affect the unit-test boxes specifically, because they test
> their own builds; so you could theoretically upgrade them independently. But
> that means that our unit tests are testing an entirely different environment
> than our production builds, which isn't especially helpful.

Right. One nice option, since we've got a new machine on dedicated hardware, is that we can install CentOS5 on it, bring up a test box, and run it alongside qm-rhel02 for a couple of days to compare results. This might give us some useful data before we upgrade the build machines.
Blocks: 383960

Comment 4

11 years ago
I would love to stop using the old perf machine setup and use the new qm-plinux boxes instead, but I'm concerned about when they're going to be ready. Would it be faster to commission a new tinderbox-based perftest machine?
Flags: blocking1.9+
Target Milestone: --- → mozilla1.9alpha6

Comment 5

11 years ago
From offline discussions:

1) We've already moved from argo-vm (an undocumented install that was producing Firefox trunk nightlies) to fx-linux-tbox (the current Linux refplatform, same as Linux refplatform rel4).

2) We need to create a new refplatform rel5. Assigning to preed, based on yesterday's build meeting.

3) We need to roll this new refplatform rel5 out to build and QA machines to close this bug.
Assignee: nobody → build
Component: Build Config → Build & Release
Flags: blocking1.9+
Product: Core → mozilla.org
QA Contact: build-config → preed
Target Milestone: mozilla1.9alpha6 → ---
Version: Trunk → other
Assignee: build → preed

Comment 6

11 years ago
I noticed that fx-linux-tbox, from the current generation of the ref VM, didn't have TBOX_CLIENT_CVS_DIR set, so it wasn't updating its tinderbox code. This line needs to be added to ~cltbld/.bash_profile:
  export TBOX_CLIENT_CVS_DIR="/builds/tinderbox/mozilla/tools"

Bug 384626 might change that a little.
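
A minimal sketch of adding that line without duplicating it on repeated runs. The helper name ensure_profile_line is hypothetical (it is not part of tinderbox); only the file path and the export line come from the comment above.

```shell
# Hypothetical helper: append a line to a profile file only if an
# identical line is not already present.
ensure_profile_line() {
    profile="$1"
    line="$2"
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet.
    # A missing file is treated as "line not present".
    if ! grep -qxF -- "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Example call, left commented out so nothing is modified by accident:
# ensure_profile_line ~cltbld/.bash_profile \
#     'export TBOX_CLIENT_CVS_DIR="/builds/tinderbox/mozilla/tools"'
```

Because the check is for an exact line match, running it twice leaves a single copy of the line, which makes it reasonable to keep in a setup script for the ref VM.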
Blocks: 333126
(Assignee)

Comment 7

11 years ago
I'm actually working on this nowish, so resetting priority.
Status: NEW → ASSIGNED
Priority: -- → P1
(Assignee)

Comment 8

11 years ago
Created attachment 269265
tinder-config changes for fxnewref-linux-tbox
Attachment #269265 - Flags: review?(ccooper)

Updated

11 years ago
Attachment #269265 - Flags: review?(ccooper) → review+
(Assignee)

Comment 9

11 years ago
Alright, ref-vm is created. It does *NOT* (as of now) include the buildbot dependencies (I will add those on Monday).

It's reporting into the MozillaExperimental tinderbox page (http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaExperimental), and builds are being uploaded in the experimental directory under linux-newref.

We'll test this VM out for a few days, and then look at when it makes sense to make it the nightly Linux build machine for trunk (probably next week).

Updated

11 years ago
Blocks: 385911

Comment 10

11 years ago
It would probably be good for the reference build VM (if this is what we're going to be using for tinderboxes) to have some debuginfo packages, which help with generating stack traces and performance data when needed. The ones I have installed on my machine are:

cairo-debuginfo
expat-debuginfo
fontconfig-debuginfo
freetype-debuginfo
glib2-debuginfo
glibc-debuginfo
gnome-vfs2-debuginfo
gtk2-debuginfo
hal-debuginfo
libX11-debuginfo
libXft-debuginfo
libgnome-debuginfo
pango-debuginfo
scim-bridge-debuginfo
No longer blocks: 385911

Comment 11

11 years ago
gcc-debuginfo is also useful sometimes (it has symbols for libstdc++).

Comment 12

11 years ago
Based on some trace-malloc stacks I took recently, the following also show up in some stacks:

dbus-debuginfo libselinux-debuginfo gtk2-engines-debuginfo ORBit2-debuginfo libXcursor-debuginfo libXext-debuginfo libXfixes-debuginfo libXi-debuginfo libXinerama-debuginfo libXrender-debuginfo atk-debuginfo libbonobo-debuginfo dbus-glib-debuginfo GConf2-debuginfo popt-debuginfo gcc-debuginfo
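
For convenience, the package names from the comments above can be collected into one install command. This is only a sketch: it assumes yum is used on the ref VM and that a repository carrying the debuginfo packages is enabled there, neither of which is confirmed in this bug. The helper name print_debuginfo_install is hypothetical, and it prints the command rather than running it.

```shell
# Package names copied from the comments above (duplicates removed).
DEBUGINFO_PKGS="
cairo-debuginfo expat-debuginfo fontconfig-debuginfo freetype-debuginfo
glib2-debuginfo glibc-debuginfo gnome-vfs2-debuginfo gtk2-debuginfo
hal-debuginfo libX11-debuginfo libXft-debuginfo libgnome-debuginfo
pango-debuginfo scim-bridge-debuginfo gcc-debuginfo dbus-debuginfo
libselinux-debuginfo gtk2-engines-debuginfo ORBit2-debuginfo
libXcursor-debuginfo libXext-debuginfo libXfixes-debuginfo libXi-debuginfo
libXinerama-debuginfo libXrender-debuginfo atk-debuginfo libbonobo-debuginfo
dbus-glib-debuginfo GConf2-debuginfo popt-debuginfo
"

# Hypothetical helper: print the install command instead of executing it,
# so it can be reviewed before anyone runs it as root.
print_debuginfo_install() {
    # Unquoted on purpose: word splitting turns the newlines into spaces.
    echo yum -y install $DEBUGINFO_PKGS
}

print_debuginfo_install
```

Printing first keeps the step reviewable; the output can be pasted into a root shell once the repository question is settled.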
(Assignee)

Comment 13

11 years ago
Sorry for the bugspam; these are now P2 in the New View of the World (tm).
Priority: P1 → P2
(Reporter)

Comment 14

11 years ago
This is holding up trunk development, which was the original reason for filing this bug. If there's some way that we can build a later version of Cairo on the existing refplatform, then that's OK, but I understood this was blocking some important patches.

Vlad, Dbaron?
(Assignee)

Comment 15

11 years ago
(In reply to comment #14)
> This is holding up trunk development, which was the original reason for filing
> this bug. If there's some way that we can build a later version of Cairo on the
> existing refplatform, then that's OK, but I understood this was blocking some
> important patches.

robcee:

I'm planning on deploying this for nightlies on Thursday, 5 July. There was a question about whether we need to coordinate the deployment together, and I think the answer technically is no, but a) I could be wrong, and b) it doesn't actually solve the problem until reftest is running on this version.

So, will you have time on Thursday to deploy the new image?
(Reporter)

Comment 16

11 years ago
Hey preed,

I will be around all day Thursday and can help set this up under your expert tutelage. I'm not sure if we'll want to replace the existing machine (qm-rhel02, scary thought as it's still running the master) or install either a new VM with the reference image or install it on the mac mini we had set aside for this. We'll have to discuss.

have a good 4th!
(Assignee)

Comment 17

11 years ago
(In reply to comment #16)

> new VM with the reference image or install it on the mac mini we had set aside
> for this. We'll have to discuss.

The quickest/easiest way to do this is run it in a VM. The unit test/ref test doesn't have performance requirements, does it?

I was going to make the switch tonight (just now, actually), but I ran into a couple of problems getting all the extra packages people wanted. I also don't want to make the switch until you're around (and for others reading the bug, to be clear, robcee did bug me about it today, but I was distracted, looking at a couple of other fires).

robcee: let's do this on Friday during the day; you gonna be around?
(Assignee)

Comment 18

11 years ago
Update: bug 387128 requests cloning the new VM for the reftest.

I'll be switching the nightly tinderbox over this afternoon; we'll see what happens.
(Assignee)

Updated

11 years ago
Depends on: 387128
(Assignee)

Updated

11 years ago
No longer depends on: 387128

Updated

11 years ago
Depends on: 387181
(Assignee)

Comment 19

11 years ago
Update:

We attempted to switch over to the new refplatform for nightly builds on Friday; that went fine. However, when the performance testing machine (not unit test or ref test) tried to run the build, it failed to find a (pango) shared library.

So, I reverted us back to the older ref platform, since keeping us on the new refplatform would have meant no performance data.
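
A missing shared library like this can usually be spotted ahead of time by running ldd against the build on the test machine and looking for unresolved entries. A rough sketch follows; the helper name missing_libs is hypothetical and the binary path in the example is an assumption, not the actual path on the perf boxes.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the names of
# the libraries the dynamic linker could not resolve.
missing_libs() {
    # Unresolved entries look like "<libname> => not found";
    # the first field is the library name.
    awk '/=> not found/ { print $1 }'
}

# Example (path is an assumption), left commented out:
# ldd ./firefox-bin | missing_libs
```

An empty result means every listed dependency resolved on that machine; any names printed are the libraries that would need to be installed there before switching.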

Reed and rhelmer pulled some heroics on Friday to get them up and I just pointed the new test machine to the new ref-test builds; they're both reporting to MozillaExperimental:

http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaExperimental
(Reporter)

Updated

11 years ago
Depends on: 387676
(Reporter)

Comment 20

11 years ago
New machine's up and running, and reporting to:

http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox

Just waiting to check in configuration files.

Comment 21

11 years ago
Are there bugs filed on the failing tests on qm-centos5-01? I didn't look at the unit/chrome tests, but I did check out the reftests yesterday, and 3/4 of them were due to font kerning. The other one looked like it could have been rounding; it looked like a 1 pixel difference in the size of a rect.
(Reporter)

Comment 22

11 years ago
I blogged about them and roc saw it, does that count? ;)

I don't believe individual bugs have been filed yet. We should do that.
(Assignee)

Comment 23

11 years ago
So are we ready to make this switch again during Thursday's (12 July) nightly outage?

Or do we need to wait for something else?

I'm going to re-assign this bug to robcee, since I'm ready to go, to let him comment.

There are bugs and test results reporting into MozillaExperimental now (although they're currently down due to a VMware migration; they should be back up shortly):

http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaExperimental
(Assignee)

Updated

11 years ago
Assignee: preed → rcampbell
Status: ASSIGNED → NEW
(Reporter)

Comment 24

11 years ago
I believe we're ready to roll with this. I'll file individual bugs on the failures on qm-centos5-01 tomorrow if they haven't been filed already by then and send out pleas and bribes to try to get people looking at the errors.
(Reporter)

Updated

11 years ago
Assignee: rcampbell → preed

Updated

11 years ago
Depends on: 388054
(Assignee)

Comment 25

11 years ago
fxnewref-linux-tbox is reporting into the Firefox page; cycle times look good, test results look good, and I've posted to m.d.planning, m.d.a.firefox, and m.d.platform about its existence.

Bug 387167 tracks renaming the machine to fx-linux-tbox; bug 388054 tracks an issue with spikes in the Tp/Ts graphs, which rhelmer is dealing with/has addressed.
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering