384035 - Upgrade qa and build machines to reflect latest linux runtime requirements

Reporter

Description

•

17 years ago

Vlad attempted to land an upgrade to the in-repository version of Cairo last night and witnessed some reftest errors due to platform incompatibility. We need to upgrade the machines to reflect the new platform requirements proposed at this URL:

http://wiki.mozilla.org/Linux/Runtime_Requirements

Rob Campbell [:rc] (:robcee)

Reporter

Comment 1

•

17 years ago

Preed and Ben have raised good points about this hitting a number of extra machines. Perftest machines, for example. While it'll be fairly easy to upgrade qm-rhel02,3, the others might be a little trickier. There might be staging issues if we don't upgrade these at the same time.

Benjamin Smedberg

Comment 2

•

17 years ago

Some notes extracted from email and previous discussions:

We tried to replace argo-vm (an undocumented install currently producing Firefox trunk nightlies) with fx-linux-tbox (the current Linux refplatform), but the builds produced by fx-linux-tbox were 5-10% slower than those from argo-vm when tested under identical conditions and identical configs.

I don't know how to diagnose this, so I was going to propose that we ignore it and switch to fx-linux-tbox anyway, because it's a known config and with --enable-libxul it's within a few % of argo-vm.

Now, as for switching to CentOS5 as a build refplatform: builds produced by CentOS5 will not run on CentOS4. This means that we would have to upgrade the performance test boxes to a newer runtime (I believe that the new perf farm is already running some modern version of Ubuntu, right?). If we decide to switch to CentOS5, we should switch the perf box first, and get some historical data. Then we can switch the build machine.

This doesn't really affect the unit-test boxes specifically, because they test their own builds; so you could theoretically upgrade them independently. But that means that our unit tests are testing an entirely different environment than our production builds, which isn't especially helpful.

Rob Campbell [:rc] (:robcee)

Reporter

Comment 3

•

17 years ago

(In reply to comment #2)
> Now, as for switching to CentOS5 as a build refplatform: builds produced by
> CentOS5 will not run on CentOS4. This means that we would have to upgrade the
> performance test boxes to a newer runtime (I believe that the new perf farm is
> already running some modern version of Ubuntu, right?). If we decide to switch
> to CentOS5, we should switch the perf box first, and get some historical data.
> Then we can switch the build machine.

Yup. The linux perf boxes (qm-plinux01-05) are running Ubuntu Feisty, iirc.

> This doesn't really affect the unit-test boxes specifically, because they test
> their own builds; so you could theoretically upgrade them independently. But
> that means that our unit tests are testing an entirely different environment
> than our production builds, which isn't especially helpful.

Right. One nice option, since we've got a new machine on dedicated hardware, is that we can install CentOS5 on it, bring up a test box, and run it alongside qm-rhel02 for a couple of days to compare results. This might give us some useful data before we upgrade the build machines.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

17 years ago

Blocks: 383960

Benjamin Smedberg

Comment 4

•

17 years ago

I would love to stop using the old perf machine setup and use the new qm-plinux boxes instead, but I'm concerned about when they're going to be ready. Would it be faster to commission a new tinderbox-based perftest machine?

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

17 years ago

Flags: blocking1.9+

Target Milestone: --- → mozilla1.9alpha6

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 5

•

17 years ago

From offline discussions:

1) we've already moved from argo-vm (an undocumented install that was producing Firefox trunk nightlies) to fx-linux-tbox (the current Linux refplatform, same as Linux refplatform rel4).

2) we need to create a new refplatform rel5. Assigning to preed, based on yesterday's build meeting.

3) we need to roll this new refplatform rel5 out to build and QA machines to close this bug.

Assignee: nobody → build

Component: Build Config → Build & Release

Flags: blocking1.9+

Product: Core → mozilla.org

QA Contact: build-config → preed

Target Milestone: mozilla1.9alpha6 → ---

Version: Trunk → other

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Updated

•

17 years ago

Assignee: build → preed

Nick Thomas [:nthomas] (UTC+12)

Comment 6

•

17 years ago

I noticed that fx-linux-tbox, from the current generation of the ref vm, didn't have TBOX_CLIENT_CVS_DIR set, so it wasn't updating its tinderbox code. This line needs to be added to ~cltbld/.bash_profile:
  export TBOX_CLIENT_CVS_DIR="/builds/tinderbox/mozilla/tools"
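
One way to apply that change without duplicating the line on repeated runs is sketched here. The function name `ensure_tbox_cvs_dir` and its optional file argument are hypothetical, introduced only for illustration; the export line itself is the one quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper: append the export line to a profile file only if an
# identical line is not already there. Defaults to ~/.bash_profile of the
# current user (cltbld on the build machine, per the comment above).
ensure_tbox_cvs_dir() {
    profile="${1:-$HOME/.bash_profile}"
    line='export TBOX_CLIENT_CVS_DIR="/builds/tinderbox/mozilla/tools"'
    # -F fixed string, -x whole-line match, -q no output; a missing file
    # counts as "line not present".
    if ! grep -Fxq "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

Calling `ensure_tbox_cvs_dir` with no argument as the build user would touch that user's `.bash_profile`; passing a path allows trying it on a scratch file first.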

Bug 384626 might change that a little.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Updated

•

17 years ago

Blocks: 333126

J. Paul Reed [:preed]

Assignee

Comment 7

•

17 years ago

I'm actually working on this nowish, so resetting priority.

Status: NEW → ASSIGNED

Priority: -- → P1

J. Paul Reed [:preed]

Assignee

Comment 8

•

17 years ago

Attached patch tinder-config changes for fxnewref-linux-tbox — Details — Splinter Review

Attachment #269265 - Flags: review?(ccooper)

Chris Cooper [:coop] (he/him)

Updated

•

17 years ago

Attachment #269265 - Flags: review?(ccooper) → review+

J. Paul Reed [:preed]

Assignee

Comment 9

•

17 years ago

Alright, ref-vm is created. It does *NOT* (as of now) include the buildbot dependencies (I will add those on Monday).

It's reporting into the MozillaExperimental tinderbox page (http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaExperimental), and builds are being uploaded in the experimental directory under linux-newref.

We'll test this VM out for a few days, and then look at when it makes sense to switch it so it's the nightly Linux build machine for trunk (probably next week)?

bhearsum@mozilla.com (:bhearsum)

Updated

•

17 years ago

Blocks: 385911

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 10

•

17 years ago

It would probably be good for the reference build VM (if this is what we're going to be using for tinderboxes) to have some debuginfo packages, which help for generating stack traces and performance data when needed.  The ones I have installed on my machine are:

cairo-debuginfo
expat-debuginfo
fontconfig-debuginfo
freetype-debuginfo
glib2-debuginfo
glibc-debuginfo
gnome-vfs2-debuginfo
gtk2-debuginfo
hal-debuginfo
libX11-debuginfo
libXft-debuginfo
libgnome-debuginfo
pango-debuginfo
scim-bridge-debuginfo
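
For whoever sets up the VM, a rough sketch of turning that list into a single install command follows. It assumes a yum-based system (such as the CentOS image discussed in this bug) with a repository that actually carries the debuginfo packages, which is not confirmed here; the variable and function names are made up for illustration. The script only prints the command so it can be reviewed before anyone runs it.

```shell
#!/bin/sh
# Base package names copied from the list above.
BASE_PKGS="cairo expat fontconfig freetype glib2 glibc gnome-vfs2 gtk2 hal libX11 libXft libgnome pango scim-bridge"

# Hypothetical helper: print each argument with a -debuginfo suffix,
# space separated on a single line.
debuginfo_names() {
    out=""
    for p in "$@"; do
        out="${out:+$out }${p}-debuginfo"
    done
    printf '%s\n' "$out"
}

# Print (do not execute) the install command. Word splitting of
# $BASE_PKGS is intentional here.
echo "yum install $(debuginfo_names $BASE_PKGS)"
```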

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Updated

•

17 years ago

No longer blocks: 385911

Andrew Schultz

Comment 11

•

17 years ago

gcc-debuginfo is also useful sometimes (it has symbols for libstdc++)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 12

•

17 years ago

Based on some trace-malloc stacks I took recently, the following also show up in some stacks:

dbus-debuginfo libselinux-debuginfo gtk2-engines-debuginfo ORBit2-debuginfo libXcursor-debuginfo libXext-debuginfo libXfixes-debuginfo libXi-debuginfo libXinerama-debuginfo libXrender-debuginfo atk-debuginfo libbonobo-debuginfo dbus-glib-debuginfo GConf2-debuginfo popt-debuginfo gcc-debuginfo

J. Paul Reed [:preed]

Assignee

Comment 13

•

17 years ago

Sorry for the bugspam; these are now P2 in the New View of the World (tm).

Priority: P1 → P2

Rob Campbell [:rc] (:robcee)

Reporter

Comment 14

•

17 years ago

This is holding up trunk development, which was the original reason for filing this bug. If there's some way that we can build a later version of Cairo on the existing refplatform, then that's OK, but I understood this was blocking some important patches.

Vlad, Dbaron?

J. Paul Reed [:preed]

Assignee

Comment 15

•

17 years ago

(In reply to comment #14)
> This is holding up trunk development, which was the original reason for filing
> this bug. If there's some way that we can build a later version of Cairo on the
> existing refplatform, then that's OK, but I understood this was blocking some
> important patches.

robcee:

I'm planning on deploying this for nightlies on Thursday, 5 July. There was a question about whether we need to coordinate the deployment together, and I think the answer technically is no, but a) I could be wrong, and b) it doesn't actually solve the problem until reftest is running on this version.

So, will you have time on Thursday to deploy the new image?

Rob Campbell [:rc] (:robcee)

Reporter

Comment 16

•

17 years ago

Hey preed,

I will be around all day Thursday and can help set this up under your expert tutelage. I'm not sure if we'll want to replace the existing machine (qm-rhel02, scary thought as it's still running the master) or install either a new VM with the reference image or install it on the mac mini we had set aside for this. We'll have to discuss.

Have a good 4th!

J. Paul Reed [:preed]

Assignee

Comment 17

•

17 years ago

(In reply to comment #16)

> new VM with the reference image or install it on the mac mini we had set aside
> for this. We'll have to discuss.

The quickest/easiest way to do this is run it in a VM. The unit test/ref test doesn't have performance requirements, does it?

I was going to make the switch tonight (just now, actually), but I ran into a couple of problems getting all the extra packages people wanted. I also don't want to make the switch until you're around (and for others reading the bug, to be clear, robcee did bug me about it today, but I was distracted, looking at a couple of other fires).

robcee: let's do this on Friday during the day; you gonna be around?

J. Paul Reed [:preed]

Assignee

Comment 18

•

17 years ago

Update: bug 387128 requests cloning the new VM for the reftest.

I'll be switching the nightly tinderbox over this afternoon; we'll see what happens.

Adam Guthrie

Updated

•

17 years ago

Depends on: 387181

J. Paul Reed [:preed]

Assignee

Comment 19

•

17 years ago

Update:

We attempted to switch over to the new refplatform for nightly builds on Friday; that went fine. However, when the performance testing machine (not unit test or ref test) tried to run the build, it failed to find a (pango) shared library.

So, I reverted us back to the older ref platform, since keeping us on the new refplatform would have meant no performance data.
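
A quick way to spot this kind of missing-library failure on a test machine before switching is to filter `ldd` output for unresolved entries. This is a sketch only: the helper name `filter_missing` is hypothetical, and the binary path in the usage note is an assumption about where the build is unpacked.

```shell
#!/bin/sh
# Hypothetical helper: read ldd output on stdin and print the name of every
# shared library the dynamic loader reported as "not found".
filter_missing() {
    awk '/not found/ { print $1 }'
}
```

Usage would look like `ldd ./firefox/firefox-bin | filter_missing`; an empty result means every listed library resolved on that machine.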

Reed and rhelmer pulled some heroics on Friday to get them up and I just pointed the new test machine to the new ref-test builds; they're both reporting to MozillaExperimental:

http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaExperimental

Rob Campbell [:rc] (:robcee)

Reporter

Updated

•

17 years ago

Depends on: 387676

Rob Campbell [:rc] (:robcee)

Reporter

Comment 20

•

17 years ago

New machine's up and running, and reporting to:

http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox

Just waiting to check in configuration files.

(not currently active) Ted Mielczarek

Comment 21

•

17 years ago

Are there bugs filed on the failing tests on qm-centos5-01?  I didn't look at the unit/chrome tests, but I did check out the reftests yesterday, and 3/4 of them were due to font kerning.  The other one looked like it could have been rounding, it looked like a 1 pixel difference in the size of a rect.

Rob Campbell [:rc] (:robcee)

Reporter

Comment 22

•

17 years ago

I blogged about them and roc saw it, does that count? ;)

I don't believe individual bugs have been filed yet. We should do that.

J. Paul Reed [:preed]

Assignee

Comment 23

•

17 years ago

So are we ready to make this switch again during Thursday's (12 July) nightly outage?

Or do we need to wait for something else?

I'm going to re-assign this bug to robcee, since I'm ready to go, to let him comment.

There are bugs and test results reporting into MozillaExperimental now (although, they're currently down due to a VMware migration; should be back up shortly, though):

http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaExperimental

J. Paul Reed [:preed]

Assignee

Updated

•

17 years ago

Assignee: preed → rcampbell

Status: ASSIGNED → NEW

Rob Campbell [:rc] (:robcee)

Reporter

Comment 24

•

17 years ago

I believe we're ready to roll with this. I'll file individual bugs on the failures on qm-centos5-01 tomorrow if they haven't been filed already by then and send out pleas and bribes to try to get people looking at the errors.

Rob Campbell [:rc] (:robcee)

Reporter

Updated

•

17 years ago

Assignee: rcampbell → preed

Adam Guthrie

Updated

•

17 years ago

Depends on: 388054

J. Paul Reed [:preed]

Assignee

Comment 25

•

17 years ago

fxnewref-linux-tbox is reporting into the Firefox page, cycle times look good, test results look good, I've posted to m.d.planning, m.d.a.firefox, and m.d.platform about its existence.

Bug 387167 tracks renaming the machine to fx-linux-tbox; bug 388054 tracks an issue with spikes in the Tp/Ts graphs, which rhelmer is dealing with/has addressed.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: mozilla.org → Release Engineering