cerberus build machine isn't well (very slow)

RESOLVED FIXED

Status

P3
normal
RESOLVED FIXED
12 years ago
5 years ago

People

(Reporter: harald.langhammer, Assigned: nthomas)

Tracking

Firefox Tracking Flags

(Not tracked)

Attachments

(1 attachment)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.4pre) Gecko/20070415 Firefox/2.0.0.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.4pre) Gecko/20070415 Firefox/2.0.0.4pre

For about a week now the "cerberus" (cerberus-vm?) build machine has been missing from tinderbox:
http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mozilla1.8-l10n-de

As a result, localized win32 fx2.0 branch nightlies have not been built since April 15. Linux and Mac builds are fine, as seen at the URL given above.


Reproducible: Always

Steps to Reproduce:
1. Open Tinderbox http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mozilla1.8-l10n-de
or
2. Open target FTP folder ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla1.8-l10n/

Actual Results:  
Cerberus not present on tinderbox / No win32 builds since April 15 at target FTP folder

Expected Results:  
Cerberus building on tinderbox / New nightlies every day

Comment 1

12 years ago
I didn't find anything obvious on the console - box was sitting with a cygwin shell opened in:

cltbld@cerberus-vm /builds/repacks/firefox-1.5.0.11-rc1-google

Killed off a couple makes and re-started tinderbox.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → FIXED
Assignee: server-ops → mrz
OS: Windows XP → All
Hardware: PC → All
(Reporter)

Comment 2

12 years ago
Same happened again
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---

Comment 3

There was a "corrupted stack" error in the cygwin terminal (which is a new one to me). I opened a new terminal and restarted tinderbox.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 12 years ago → 12 years ago
Resolution: --- → FIXED
(Reporter)

Comment 4

12 years ago
Same happened again
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
(Assignee)

Comment 5

12 years ago
This box is currently having problems with the file system on its hard disk. I meant to log/reopen a bug on that yesterday, apologies.
Assignee: mrz → nrthomas
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P1

Comment 6

12 years ago
Sorry for the bugspam; these are now P2 in the New View of the World (tm).
Priority: P1 → P2
(Assignee)

Comment 7

12 years ago
Fixed three file system errors using chkdsk. Tinderbox restarted, it's doing Mozilla1.8 branch now.

Will leave this open for a day or so, to remind me to keep an eye on it.

Updated

12 years ago
Component: Server Operations: Tinderbox Maintenance → Build & Release
(Assignee)

Comment 8

12 years ago
Got as far as the Korean locale, then
  update-packaging/common.sh: fork: Resource temporarily unavailable

Tinderbox restarted.
(Assignee)

Comment 9

12 years ago
I updated VMware tools from build 32039 to 38803 (which matches boxes like fx-win32-tbox), which bumped the driver "VMware SCSI Controller" from v1.2.0.2 (1999-11-14) to v1.2.0.4 (2005-08-17). It seems snappier, but let's see how it goes.
(Assignee)

Comment 10

12 years ago
This time the Firefox/Mozilla1.8/l10n run finished, taking 8h 15m. The previous successful nightly run was on June 29th, and took 8h 35m. The machine had been rebooted for comment #9.

Then the Thunderbird/Mozilla1.8/l10n run started, getting as far as the complete update for cs before bombing out

 processing xpistub.dll
 /cygdrive/c/builds/tinderbox/Tb-Mozilla1.8-l10n/WINNT_5.2_Clobber/mozilla/tools/update-packaging/common.sh: fork: Resource temporarily unavailable
/cygdrive/c/builds/tinderbox/Tb-Mozilla1.8-l10n/WINNT_5.2_Clobber/mozilla/tools/update-packaging/common.sh: line 81: [: =: unary operator expected
 ignoring remove instruction for directory: components/mork.dll
 /cygdrive/c/builds/tinderbox/Tb-Mozilla1.8-l10n/WINNT_5.2_Clobber/mozilla/tools/update-packaging/common.sh:  fork: Resource temporarily unavailable

The call is to make_full_update.sh, which uses functions in common.sh
http://mxr.mozilla.org/mozilla1.8/source/tools/update-packaging/make_full_update.sh
http://mxr.mozilla.org/mozilla1.8/source/tools/update-packaging/common.sh

xpistub.dll is the last of the files to go into the mar, so we get to
 70 # Append remove instructions for any dead files.
 71 append_remove_instructions "$targetdir" >> $manifest

There are a bunch of shell calls and pipes between lines 71 and 88 of common.sh, which could cause the error.
(Assignee)

Comment 11

12 years ago
Fragmentation report:
...
NTFS, 4KB clusters, 68 GB total space, 31 % free space
...
Volume fragmentation
    Total fragmentation             39 %
    File fragmentation              79 %
    Free space fragmentation         0 %

File fragmentation
    Total files                     1,072,595
    Average file size               72 KB
    Total fragmented files          135,420
    Total excess fragments          339,687
    Average fragments per file      1.31
...
Pagefile not fragmented
...
Folder fragmentation
    Total folders                   184,451
    Fragmented folders              9,090
    Excess folder fragments         31,066

Running defragger ....

If this doesn't help, then we can look at the SCSI driver (some win32 VMs use a different one), or will need a fix from bug 386074.
(Assignee)

Comment 12

12 years ago
Still defragging, at 17% on some arbitrarily non-linear scale.
(Assignee)

Comment 13

12 years ago
Defrag done, couldn't manage to fix ~ 2500 files but still a big improvement. Tinderbox restarted.

(Assignee)

Comment 14

12 years ago
Got as far as Korean before the fork error occurred. I'll turn off the Mozilla1.8 locales to try to get some trunk coverage.

Next step is to try the SCSI driver change, using a clone of cerberus-vm. 

Updated

12 years ago
Blocks: 386074
(Assignee)

Comment 15

11 years ago
Not to jinx it, but cerberus-vm has been solid for the last three days. The shorter trunk runs might be a factor.

I'm hoping to clone this VM today, so we can try the other SCSI driver.
(Assignee)

Comment 16

11 years ago
mrz set up cerberus-vm-clone, which is what it says on the tin. It's now doing the Firefox & Thunderbird locales for Mozilla1.8 branch, using the SCSI driver "LSI Logic PCI-X Ultra320", v5.2.3790.1830 from Microsoft. cerberus-vm continues to do the trunk locales.

The procedure was:
• Power off the VM you want to change controllers on
• Connect to the Service Console and edit the vmx file for the VM
• Add the following lines to the vmx file
    o scsi1.present = "true"
    o scsi1.virtualDev = "lsilogic"
• Power on the VM and it will discover the new SCSI card
• Power off the VM and edit the SCSI Controller settings, change the type to LSI Logic
• Power VM back on, answer Yes for the adapter change message
• Once it boots successfully shut the VM down again (it will have two LSI controllers at this point)
• Edit the vmx file and remove the lines you added above
• Power on the VM again and you will be all set

I'll check how it's doing tomorrow, and if possible compare build times.
(Assignee)

Comment 17

11 years ago
The clone only got as far as da before having the "fork: resource not available problem" in update-packaging/common.sh. Shell verbosity turned on, running again.
(Assignee)

Comment 18

11 years ago
(In reply to comment #17)
> The clone only got as far as da before having the "fork: resource not available
> problem" in update-packaging/common.sh. Shell verbosity turned on, running
> again.

I also jiggered the last-built file so that the Fx l10n run was an hourly. The resulting run finished ok, although it took 4hrs 20 min. I can't see any record of update preparation in either the full log or those for individual locales. I backed these logs up in 
 /cygdrive/c/builds/tinderbox/Fx-Mozilla1.8-l10n/WINNT_5_2_Clobber/...
     old-logs/20070712-hourly-build
Also,
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8-l10n/1184232217.7885.gz&fulltext=1


For the Thunderbird nightly run, I manually added "set -x" to the top of
  mozilla/tools/update-packaging/{common.sh,make_full_update.sh} 
when the mozilla/ checkout finished. The full run completed, which I wasn't expecting, taking 5h 24m. The logs were backed up to  
 /cygdrive/c/builds/tinderbox/Tb-Mozilla1.8-l10n/WINNT_5_2_Clobber/...
     old-logs/20070712-nightly-build
Also,
http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla1.8-l10n/1184247705.21996.gz&fulltext=1

There was update output this time, including some fork errors (warnings ?!?). Eg

+ (( i=103+1 ))
+ (( 104<219 ))
/cygdrive/c/builds/tinderbox/Tb-Mozilla1.8-l10n/WINNT_5.2_Clobber/mozilla/tools/update-packaging/common.sh: fork: Resource temporarily unavailable
+ f=
++ echo components/msgdb.xpt
+ '[' -n '' ']'
+ (( i=104+1 ))
+ (( 105<219 ))
++ echo components/msgimap.xpt
++ tr '|' ' '
++ sed 's/^ *\(.*\) *$/\1/'
++ tr -d '\r'
+ f=components/msgimap.xpt
+ '[' -n components/msgimap.xpt ']'
++ echo components/msgimap.xpt
++ grep -c '\/$'
+ '[' 0 = 0 ']'
+ echo 'remove "components/msgimap.xpt"'

This is one broken and one working trip around the for loop at 
  http://mxr.mozilla.org/mozilla1.8/source/tools/update-packaging/common.sh#75
I think the failure to fork must be for the $() at line 77, which points the finger at Cygwin rather than disk problems. In total there are 7 fork errors in the run: 3 at line 77, 2 at line 81, one at line 38 (make_add_instruction), and one more I didn't identify.


##<blink>###################################################################
#                                                                          #        
#  The net result is that the manifest for locale mars is not trustworthy, #
#  unless we've verified there are no fork errors in the log.              #
#                                                                          #        
############################################################################
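
One way to act on the warning above is to scan a run's log for the fork message before trusting any mar manifest built from it. The following is a minimal Python sketch, assuming the log is plain text and that the error always contains the exact string seen in this bug. The names `count_fork_errors` and `manifest_is_trustworthy` are illustrative and not part of tinderbox.

```python
FORK_ERROR = "fork: Resource temporarily unavailable"


def count_fork_errors(lines):
    """Return how many log lines contain the bash fork failure message."""
    return sum(1 for line in lines if FORK_ERROR in line)


def manifest_is_trustworthy(lines):
    """Trust a manifest only when its log shows no fork errors at all."""
    return count_fork_errors(lines) == 0


# Sample lines taken from the trace quoted in this comment.
sample_log = [
    "+ (( 104<219 ))",
    "/cygdrive/c/builds/tinderbox/Tb-Mozilla1.8-l10n/WINNT_5.2_Clobber/mozilla/"
    "tools/update-packaging/common.sh: fork: Resource temporarily unavailable",
    "+ f=",
    "+ echo 'remove \"components/msgimap.xpt\"'",
]

print(count_fork_errors(sample_log))
print(manifest_is_trustworthy(sample_log))
```

For a real run, the same functions could be fed the lines of the backed-up log file instead of `sample_log`.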

Random guess - it's an SMP or HyperThreading issue on ESX3, but I have nothing to back that up.

Then we're back to building Firefox/Mozilla1.8. I've removed last-built to force a nightly, and will add the "set -x" again. It will be interesting to see if this run also completes. Based on cerberus-vm not managing it the ~ 5 times I tried, that seems unlikely but maybe the -x slows things down enough to make it not die completely. The symptom of dying is bash consuming all of a CPU and the build making no progress.

Finally, to summarize a little
* cerberus-vm is set to do Trunk builds only, and has done so without dying since July 6th. It's unknown why this is, although there are less locales that build on the trunk at the moment.

* cerberus-vm-clone is set to do Mozilla1.8 builds only, and is using the LSI SCSI driver. It's still very slow and has this forking problem.
Assignee: nrthomas → preed
(Assignee)

Comment 19

11 years ago
> * cerberus-vm is set to do Trunk builds only, and has done so without dying
> since July 6th. It's unknown why this is, although there are less locales that
> build on the trunk at the moment.

There are quite a few differences between trunk and mozilla1.8 in tools/update-packaging, though nothing in the way of shell calls.

Comment 20

(In reply to comment #18)

> Finally, to summarize a little
> * cerberus-vm is set to do Trunk builds only, and has done so without dying
> since July 6th. It's unknown why this is, although there are less locales that
> build on the trunk at the moment.
> 
> * cerberus-vm-clone is set to do Mozilla1.8 builds only, and is using the LSI
> SCSI driver. It's still very slow and has this forking problem.

Given this, I'm going to use cerberus-vm for the 2.0.0.5 release.

Comment 21

Duplicate of this bug: 388793

Comment 22

So, some notes on this:

-- We got cerberus-vm feeling... ok again by reducing its CPUs from 2 back to 1.

-- cerberus-vm is the original VM; we were gonna dump the clone. It has the original SCSI driver settings, too...
Status: NEW → RESOLVED
Last Resolved: 12 years ago → 11 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 390340
(Assignee)

Comment 23

11 years ago
cerberus-vm has been getting slower and slower. The most recent nightly runs for Firefox & Thunderbird took 10 hours and 6 hrs 10 mins respectively (on the 25th/26th Sep); since then it's been too slow to complete the cvs checkout within the 1 hour timeout.

Reopening to try to sort this out (390340 only helps trunk).
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(Assignee)

Comment 24

11 years ago
I'm relocating the VM files from the netapp-d-002 partition to netapp-c-001 to see if that helps.
Assignee: preed → nrthomas
Status: REOPENED → NEW
(Assignee)

Comment 25

11 years ago
Nightly runs took 8hr 12 min and 5hr 35min for Fx and Tb respectively, so that's better but still too slow.

mrz, do you have any advice from when you looked at patrocles a while back ?
Summary: cerberus build machine (fx 2.0 branch localized nightlies) dropped off tinderbox → cerberus build machine isn't well (very slow)

Comment 26

Some things to try:

-- I note some weird VM settings; for instance, the vmdk doesn't have independent mode off (from what I can tell), and it doesn't have "persistent" selected. (These may come up when the VM is powered off, but for now, they look like they're unchecked). Also, I'd try removing/disconnecting the USB device and the serial devices. We don't really need them. Are we sure we have the current version of the tools installed?

-- Of very interesting note, this looks like it has a BusLogic controller. Have we tried the LSI Logic controller with no other changes (still 1 VCPU, etc.)? I know we tried the LSI Logic controller before, but as I remember, we also changed a bunch of other settings at the same time, correct?

I think we should probably clone cerberus-vm via VI, and then only modify it via the client, not by tweaking vmx files, and see if we can get performance to be a bit more acceptable.

Comment 27

11 years ago
We can also try moving it to fiberchannel disk to eliminate the "netapp-is-slow" issue (which is because the sata disks are just overloaded).  Note, that will just give you a best-case disk access time as the fc shelf is just on loan.  Just another point of reference if it's helpful.

Comment 28

(In reply to comment #27)
> We can also try moving it to fiberchannel disk to eliminate the
> "netapp-is-slow" issue (which is because the sata disks are just overloaded). 
> Note, that will just give you a best-case disk access time as the fc shelf is
> just on loan.  Just another point of reference if it's helpful.

We could give that a try, but I don't think that's the problem; I noticed that during a nightly build run, the first few locales run in a reasonable amount of time (10ish minutes), and that time gets progressively worse with each locale.

In my mind, that points to something wrong with the machine configuration or software, not a hardware problem.
(Assignee)

Comment 29

11 years ago
* When the VM is shut down, the virtual disk has Independent off, and Persistent/Nonpersistent is undefined (greyed out with no sign of radio selection). Paul, did you mean that Independent should be enabled ?

* On the console at boot, there is a message 
  "At least one service or driver failed during system startup. Use Event Viewer
   to examine the log for details". 
* The System log has 
  "The cpqasm2 service failed to start due to the following error:
   The service cannot be started, either because it is disabled or because it
   has no enabled devices associated with it."
* This then prevents the "HP ProLiant System Management Interface Driver" from loading, and in turn the "HP ProLiant Remote Management Service" & "HP ProLiant System Shutdown Service"
* In the Service Manager, there are no disabled HP services. It's pretty tempting to uninstall this stuff.

* There is also an error about the "VMware Converter Service" failing to start. Ditto on the uninstall comment.

* Installed VMWare Tools is build-38803, I updated it July 4th (comment #9). Did we update ESX since then ? For comparison, I have v7.3.2 build-51348 in a XP VM running in Fusion. Can test updating the tools after the cloning.

* this was the CPU usage trend while deleting the old source tree for a nightly:
    http://people.mozilla.com/~nthomas/misc/cerberus-vm.png
  Times are PDT+8 - not sure what that tells us, the command was rm.

Starting a clone now, so we can try the suggestions.

Comment 30

11 years ago
>   "At least one service or driver failed during system startup. Use Event
> Viewer
>    to examine the log for details". 
> * The System log has 
>   "The cpqasm2 service failed to start due to the following error:

That's a relic of when cerberus was on a physical HP box.  You could remove them I guess but their failure to start shouldn't affect anything.

> * There is also an error about the "VMware Converter Service" failing to start.
> Ditto on the uninstall comment.
> 

Probably only necessary during the original p2v/VMware Converter step.

> * Installed VMWare Tools is build-38803, I updated it July 4th (comment #9).
> Did we update ESX since then ? For comparison, I have v7.3.2 build-51348 in a
> XP VM running in Fusion. Can test updating the tools after the cloning.

No, the ESX hosts have all been the same since we migrated from 2.x to 3.0.1 and I upgraded all the tools packages during that round.

(Assignee)

Comment 31

11 years ago
cerberus-vm-clone is up, with

* Disk - Independent: off, Persistent. I don't think this was a real change, although the VI said the device was modified simply by toggling Independent on and off.
* The two serial ports and USB controller were removed

I'll start it up and let it cycle for a bit, cerberus-vm is off.
(Assignee)

Comment 32

11 years ago
(In reply to comment #31)
> cerberus-vm-clone is up, with
> 
> * Disk - Independent: off, Persistent. I don't think this was a real change,
> although the VI said the device was modified simply by toggling Independent on
> and off.
> * The two serial ports and USB controller were removed
> 
> I'll start it up and let it cycle for a bit, cerberus-vm is off.

Fx nightly run took 8hr 21min, so no improvement there.

Restarting box with Independent: on, Persistent.

(Assignee)

Comment 33

11 years ago
(In reply to comment #32)
> Restarting box with Independent: on, Persistent.

Got two Firefox nightly runs, 8hr 21min & 8hr 49min; plus a Thunderbird nightly run of 6hr. So this change was no help either.

Going to try switching from the BusLogic SCSI driver to the LSI one, and also clone cerberus-vm-clone onto the fiberchannel netapp.

(Assignee)

Comment 34

11 years ago
(In reply to comment #33)
> Going to try switching from the BusLogic SCSI driver to the LSI one, and also
> clone cerberus-vm-clone onto the fiberchannel netapp.

This brought the build time down to about 5 hours for a Firefox nightly, woo! I accidentally killed tinderbox while it was copying files right at the end, so it didn't go green. This included a 30 minute CVS checkout, which was about 50 minutes previously.

I've now removed all the HP crud left over from when this had a real disk array, plus the VMware converter, VNC, and freed up a few GB of old builds.  It's building another nightly now.

Comment 35

11 years ago
Just out of curiosity, what kind of build time is acceptable or are we aiming for?
(Assignee)

Comment 36

11 years ago
(In reply to comment #35)
> Just out of curiosity, what kind of build time is acceptable or are we aiming
> for?

I'm working on generating a trend graph for this box, right back to when it was on real hardware, which would help answer this question. As we've moved towards the release automation this box is more a backup, and 4 hours or less would probably be OK. 

Those localisers working to get 2.0.0.x locales ready would probably appreciate something faster than that. How many are in that situation Axel ?

Comment 37

(In reply to comment #35)
> Just out of curiosity, what kind of build time is acceptable or are we aiming
> for?

staging-pacifica-vm and production-pacifica-vm consistently take about 2.5 hours to do a branch l10n clobber build (same exact setup as cerberus-vm). 

They are both VMs (clones of pacifica-vm).

Comment 38

(In reply to comment #37)
> (In reply to comment #35)
> > Just out of curiosity, what kind of build time is acceptable or are we aiming
> > for?
> 
> staging-pacifica-vm and production-pacifica-vm consistently take about 2.5
> hours to do a branch l10n clobber build (same exact setup as cerberus-vm). 
> 
> They are both VMs (clones of pacifica-vm).
> 

Sorry I should qualify that "same exact setup" statement - "same exact Tinderbox config and version". Other things on the machine are quite likely different (versions of compiler and other tools).

Comment 39

cf: What are the build times for each locale after this switch?

I'm actually curious about trend, mostly; I noticed when build times were long that each locale got progressively slower to build (with af taking 10 minutes and zh-TW taking an hour+).

This doesn't seem right to me, and I'm wondering if we're still seeing that with the new setup.
(Assignee)

Comment 40

11 years ago
(In reply to comment #34)
> I've now removed all the HP crud left over from when this had a real disk
> array, plus the VMware converter, VNC, and freed up a few GB of old builds. 
> It's building another nightly now.

Firefox hourly/nightly:  2:38 / 4:56   (57 locales)
T'bird  hourly/nightly:  2:12 / 3:27   (42 locales)

So no further improvement there.

(In reply to comment #39)
> cf: What are the build times for each locale after this switch?
> 
> I'm actually curious about trend, mostly; I noticed when build times were long
> that each locale got progressively slower to build (with af taking 10 minutes
> and zh-TW taking an hour+).
> 
> This doesn't seem right to me, and I'm wondering if we're still seeing that
> with the new setup.

We are still seeing this, turns out it's the way tinderbox is written for l10n -  all the locales have the same start time:
http://mxr.mozilla.org/seamonkey/source/tools/tinderbox/post-mozilla-rel.pl#842

So the boxes on the waterfall get longer the further down the list of locales you get (this kinda makes sense if you want to compare to a CVS timestamp, but the time used is actually later than that). It's not so obvious on other boxes because they cycle much faster.
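
To illustrate the effect described above with made-up numbers (the end times below are hypothetical, not taken from a real run): if every locale is reported against the same start time, the displayed duration is cumulative, while the real per-locale time is the gap between consecutive end times. A small Python sketch, with function names chosen only for this example:

```python
def displayed_durations(shared_start, end_times):
    """Durations as the waterfall shows them: every locale measured from one start."""
    return [end - shared_start for end in end_times]


def actual_durations(shared_start, end_times):
    """Real per-locale durations: the gap since the previous locale finished."""
    previous = [shared_start] + end_times[:-1]
    return [end - prev for end, prev in zip(end_times, previous)]


# Hypothetical run: minutes since the l10n phase began, each locale taking 4 minutes.
start = 0
ends = [4, 8, 12, 16]

print(displayed_durations(start, ends))  # [4, 8, 12, 16]
print(actual_durations(start, ends))     # [4, 4, 4, 4]
```

The displayed values grow down the list even though each locale takes the same time.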

Here's a break down of the Firefox nightly run:

Run starts       - 1230
  [deletion of old source tree and other cleanup]
checkout start   - 1249       ( 19 min)
checkout end     - 1314       ( 25 min)
  [configure, build tools]
l10n starts      - 1321       (  7 min)
last locale ends - 1704       (223 min, avg of 3min 54sec each)
  [local copies and push to stage]
Run ends         - 1726       ( 22 min, 750MB of data to handle)
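
The phase durations in the breakdown above can be re-derived from the clock times. This is a small Python sketch, assuming the times are 24-hour HHMM values on the same day and using the 57 Firefox locales quoted earlier in this comment; the helper `to_minutes` is only for illustration.

```python
def to_minutes(hhmm):
    """Convert a 24-hour 'HHMM' string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


# (label, time) pairs copied from the breakdown above.
milestones = [
    ("run starts", "1230"),
    ("checkout start", "1249"),
    ("checkout end", "1314"),
    ("l10n starts", "1321"),
    ("last locale ends", "1704"),
    ("run ends", "1726"),
]

minutes = [to_minutes(t) for _, t in milestones]
# Each phase is the gap between one milestone and the next.
phases = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
total = minutes[-1] - minutes[0]

locales = 57  # Firefox locale count quoted earlier in this comment
avg_seconds = int(phases[3] * 60 / locales)  # truncated to whole seconds

print(phases)                   # [19, 25, 7, 223, 22]
print(divmod(total, 60))        # (4, 56)
print(divmod(avg_seconds, 60))  # (3, 54)
```

The total of 4:56 matches the nightly time reported for Firefox above.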
(Assignee)

Comment 41

11 years ago
Created attachment 284333 [details]
trend graph for build time 

Here's the promised trend of build time for cerberus, with comparison to 2 other Windows boxes (patrocles & pacifica), going back to pre-virtualisation days. The times were extracted from tinderbox's JSON output with a perl script, and there is a bunch more data than shown if anyone wants it.

Of them all pacifica has not changed very much at all, patrocles was unwell (how was that fixed again ?), and cerberus has been pretty spectacularly broken in recent times. While the SCSI driver change has helped a lot it's clear that more can be done, so I'm going to run scandisk, turn off the Indexing Service, clean out some more old files and defrag the disk.

References for the labels:
"cerberus virtualisation?" - from change of name on tinderbox
"Down for a week" - comment #0 and 1
"Move to netapp" - https://intranet.mozilla.org/Build:Vmware:VIMigrationNotes#Round_3
"Use 1 CPU instead of 2" - comment #22
"LUN fix??" - can't locate any firm date for this from email, helpwanted
"Use LSI SCSI driver on a clone" - comment #34
QA Contact: justin → build
(Assignee)

Comment 42

11 years ago
There was a small improvement (~10min) after three runs with the builtin defragger (this compacted free space as well as defragging the files). And the build times are pretty constant over the last fortnight - it's about 4hr45min for Fx nightly, and 3hr for Thunderbird.

mrz, any objections to me moving cerberus-vm onto the fcal partition for a few test runs ?

Comment 43

11 years ago
Is this going to close the tree? If it will I want to try this online copy tool instead.
(Assignee)

Comment 44

11 years ago
(In reply to comment #43)
> Is this going to close the tree? If it will I want to try this online copy tool
> instead. 

No, it's a l10n box on the 1.8 branch - a couple of hours downtime is no problem. If you want to try VMotion then that's cool too.

I meant cerberus-vm-clone in comment #42.

Comment 45

11 years ago
This was done yesterday, btw. 
(Assignee)

Comment 46

11 years ago
The resulting build times were 

Firefox hourly/nightly:  1:20 / 3:05   (57 locales)
T'bird  hourly/nightly:  1:00 / 1:52   (42 locales)

on an otherwise unloaded fibre channel netapp, and unloaded VM host (same hardware). Compared to the loaded netapp and vmhost (comment #40), it's about 50% less time for hourlies, 46% less for the Tb nightly, and 37% less for the Firefox nightly. So that would be a nice win to have permanently, if we can.
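
The percentages quoted above can be checked against the times in comment #40 and the times in this comment. The following Python sketch does that arithmetic; the helper names `to_minutes` and `percent_less` are illustrative only.

```python
def to_minutes(hmm):
    """Convert an 'H:MM' duration string to minutes."""
    hours, mins = hmm.split(":")
    return int(hours) * 60 + int(mins)


def percent_less(before, after):
    """Percentage reduction in time going from `before` to `after`."""
    b, a = to_minutes(before), to_minutes(after)
    return 100.0 * (b - a) / b


# (loaded setup from comment #40, unloaded fibre channel setup from this comment)
runs = {
    "Firefox hourly": ("2:38", "1:20"),
    "Firefox nightly": ("4:56", "3:05"),
    "T'bird hourly": ("2:12", "1:00"),
    "T'bird nightly": ("3:27", "1:52"),
}

for name, (before, after) in runs.items():
    print(f"{name}: {percent_less(before, after):.1f}% less")
```

The hourlies come out at roughly 49% and 55% less, which is consistent with the "about 50%" figure.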

I've switched back to the loaded VM now that the test is complete. Anyone else have any ideas ? Otherwise I think we're out of steam here and will have to live with it.

Comment 47

11 years ago
(In reply to comment #46)
> The resulting build times were 
> 
> Firefox hourly/nightly:  1:20 / 3:05   (57 locales)
> T'bird  hourly/nightly:  1:00 / 1:52   (42 locales)
> 
> on an otherwise unloaded fibre channel netapp, and unloaded VM host (same
> hardware). Compared to the loaded netapp and vmhost (comment #40), it's about
> 50% less time for hourlies, and 46% less for the Tb nightly, and 37% less for
> the Firefox nightly. So that would be a nice win to have permanently, if we
> can.

You should go back to the FCAL image and vmotion it back to a loaded ESX host to see what it looks like on an un-busy ESX box.  In my test, the sweet spot was at 5 running VMs - more than 5 and build times increased.

Ping me if you need my help in doing this.
(Assignee)

Comment 48

11 years ago
(In reply to comment #47)
> You should go back to the FCAL image and vmotion it back to a loaded ESX host
> to see what it looks like on an un-busy ESX box.  In my test, the sweet spot
> was at 5 running VMs - more than 5 and build times increased.

Worth a go, although this particular VM host (bm-vmware08) is relatively unloaded. The resident VMs are associated with our build automation and only compile intermittently.

It's just started cycling in this setup (unloaded FCAL, "loaded" host).
(Assignee)

Comment 49

11 years ago
(In reply to comment #48)
> It's just started cycling in this setup (unloaded FCAL, "loaded" host).

Build times are a little slower:

Firefox hourly/nightly:  1:30 / 3:30
T'bird  hourly/nightly:  1:10 / 2:10

If we are getting new hardware which will have similar performance to the borrowed netapp then I don't think it's worth putting any more work into this. This box does nightly builds on a maintenance branch, and we now do releases on another machine. For locales that are still coming on for 2.0.0.x, it cycles frequently enough IMHO. I suspect most localisers are going to be concentrating on Firefox 3 now. 

If there are no objections, I'll resolve this FIXED.
Priority: P2 → P3

Comment 50

11 years ago
Fine with me.
(Assignee)

Updated

11 years ago
Status: NEW → RESOLVED
Last Resolved: 11 years ago → 11 years ago
Resolution: --- → FIXED
(Assignee)

Comment 51

11 years ago
This VM was moved back to the regular Netapp today, host unchanged.
Product: mozilla.org → Release Engineering