Closed Bug 414434 Opened 17 years ago Closed 17 years ago

can Build get another 4 xserves?

Categories

(mozilla.org Graveyard :: Server Operations, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: phong)

References

Details

Attachments

(1 file)

With bm-xserve09 under repair, we're left with no free xserves. Right now I'm looking for something to use as a moz2 build slave, and possibly another try server slave.
Assignee: server-ops → justin
We certainly do need xserves for trunk; our only spare (bm-xserve07) is in use while bm-xserve09 is broken (see bug 410271 for details). Two more xserves would be good for trunk. For the moz2 builds, we also need hardware. Did we decide to go with xserves or Mac minis?
Summary: can Build get another xserve or two? → can Build get another 2 or 3 xserves?
If these are mission critical (i.e., we close the tree if one of these goes down), then they should be xserves; minis are not production-quality hardware. These are fairly expensive, so CCing schrep...
For Moz2 we'll need a ton of Mac build machines. I'd like to get to 3 per platform per branch (for fast cycle times) and use multiple machines for redundancy. If you need 1-2 xserves before that, then that is fine.
After a quick meeting with Justin, rhelmer and myself, we're going to try an experiment with a mac mini as a trunk/1.9 slave. This will confirm how long it takes a mac mini to do full clobber builds (compared with xserve), and also if the bits produced are compatible.
What does "per branch" mean? We're going to have many "branches", but those which aren't mozilla-central probably won't need really fast response times. I really think that we should be looking at 10-ish cheap build slaves per OS by the time we're fully up.
(In reply to comment #5)
> What does "per branch" mean? We're going to have many "branches", but those
> which aren't mozilla-central probably won't need really fast response times. I
> really think that we should be looking at 10-ish cheap build slaves per OS by
> the time we're fully up.

Per branch means any significant development branch.
(In reply to comment #6)
> (In reply to comment #5)
> > What does "per branch" mean? We're going to have many "branches", but those
> > which aren't mozilla-central probably won't need really fast response times. I
> > really think that we should be looking at 10-ish cheap build slaves per OS by
> > the time we're fully up.
>
> Per branch means any significant development branch.

Do we need to do this statically? If we have a large pool of slaves, then the resources will automatically go where they are needed most.

Not trying to rathole, just want as few pools as possible.
(In reply to comment #7)
> (In reply to comment #6)
> > (In reply to comment #5)
> > > What does "per branch" mean? We're going to have many "branches", but those
> > > which aren't mozilla-central probably won't need really fast response times. I
> > > really think that we should be looking at 10-ish cheap build slaves per OS by
> > > the time we're fully up.
> >
> > Per branch means any significant development branch.
>
> Do we need to do this statically? If we have a large pool of slaves, then the
> resources will automatically go where they are needed most.
>
> Not trying to rathole, just want as few pools as possible.

Agreed; having fewer, bigger pools is better, both in terms of resources being shared where they are needed the most and for overall reliability. However we have to break them out by refplatform. We also (at least for now) have to break them out by what buildmaster they are connected to, because buildbot doesn't support a slave being allocated to different masters - but this will improve as we consolidate buildmasters.
(In reply to comment #4)
> After a quick meeting with Justin, rhelmer and myself, we're going to try an
> experiment with a mac mini as a trunk/1.9 slave. This will confirm how long it
> takes a mac mini to do full clobber builds (compared with xserve), and also if
> the bits produced are compatible.

The only thing about the minis is that we're limited in total RAM and in disk speed (they only take laptop drives). Assuming our build process is badly I/O bound, if we wanted to make our builds faster by, e.g., using a ramdisk or hooking them up to fast storage, that option would be out. Not that we're doing this now with the xserves, but it's possible. We could certainly try it, although maybe this is all just working around a problem in the build system that we'd be better off investigating and fixing directly.
Yeah, I said "10" as a "3 for mozilla-central and 7 for the other branches" kind of number. The I/O issues are most noticeable on Windows; I don't think we need to drive Mac hardware decisions around them nearly as much.
> However we have to break them out by refplatform. We also (at least for now)
> have to break them out by what buildmaster they are connected to, because
> buildbot doesn't support a slave being allocated to different masters - but this
> will improve as we consolidate buildmasters.

The Mozilla2 master will be handling mozilla-central and all of the branches; I don't see any need to separate them, and doing so will just add pain. I don't think the ref platform will be an issue either; it seems to me that they would all be using the same one. To start with, I think the way to go here is one pool of slaves for all branches. This should reduce idle time. If we feel a branch needs more slaves at some point, we can easily dedicate some number of slaves to it.
(In reply to comment #4)
> After a quick meeting with Justin, rhelmer and myself, we're going to try an
> experiment with a mac mini as a trunk/1.9 slave.

Box is up at 10.2.71.240.
(In reply to comment #12)
> (In reply to comment #4)
> > After a quick meeting with Justin, rhelmer and myself, we're going to try an
> > experiment with a mac mini as a trunk/1.9 slave.
>
> Box is up at 10.2.71.240.

Great, thanks. I'll go ahead and plug it into staging, and we can see how long a full release set takes.
I installed the following ref platforms on the mini:
http://wiki.mozilla.org/ReferencePlatforms/Mac
http://wiki.mozilla.org/ReferencePlatforms/BuildBot/MacOSX

Attached is a simple clobber build sequence; I'm just going to run this for a while on both slaves and compare the results. It'd be nice to do a full release automation run too (build + repack + l10n/update verify), but that'll take more time to set up, and I think a full clobber build will be a pretty good metric for starters.
Preliminary results:

         clobber  checkout  compile
mini     48s      115s      4490s (1h15m)
xserve   34s      86s       2323s (39m)

As expected, the mini is slower on I/O operations, and quite a bit slower on compile time. The specs are quite different:

xserve: 2 x 2.66 GHz Dual-Core Intel Xeon, 4 GB 667 MHz DDR2 FB-DIMM, server RAID (RAID 5?)
mini: 1.83 GHz Intel Core 2 Duo, 1 GB 667 MHz DDR2 SDRAM, disk probably SATA 5400 RPM (guessing from the spec sheet)

According to http://www.apple.com/macmini/specs.html, the minis top out at a single 2 GHz Core 2 Duo and can be upgraded to 2 GB of RAM. Adding more memory might help us here (I suspect we're swapping with only 1 GB), but I'm not sure how much of a difference it'll make. Can we upgrade the RAM on this one and find out?
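To make the comparison concrete, here is a small Python sketch that turns the preliminary timings above into per-step slowdown ratios. It only restates the measured numbers; the helper names are made up for illustration.

```python
# Timings from the preliminary clobber-build comparison, in seconds.
timings = {
    "mini":   {"clobber": 48, "checkout": 115, "compile": 4490},
    "xserve": {"clobber": 34, "checkout": 86,  "compile": 2323},
}

def slowdown(slow: dict, fast: dict) -> dict:
    """How many times longer each step takes on `slow` than on `fast`."""
    return {step: slow[step] / fast[step] for step in fast}

def total(times: dict) -> int:
    """Total wall-clock seconds for one full clobber build sequence."""
    return sum(times.values())

if __name__ == "__main__":
    for step, ratio in slowdown(timings["mini"], timings["xserve"]).items():
        print(f"{step:9s} {ratio:.2f}x")
    print(f"{'total':9s} {total(timings['mini']) / total(timings['xserve']):.2f}x")
```

This prints roughly 1.41x for clobber, 1.34x for checkout, 1.93x for compile and 1.90x overall. Compile dominates the total, so the end-to-end slowdown is close to 2x even though the I/O-heavy steps are only about 1.4x slower.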
Nope, these are sealed by Apple and can't be upgraded. We tried a 2 GB model back in the VMware tests and saw similar numbers: over 2x the time of an xserve. So I think we should assume roughly a 2x increase in build time (closer to 2x with the added memory). Given that, which makes more sense: 5-10x as many machines at 2x the build time, or fewer, faster, more redundant machines?
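A back-of-the-envelope Python sketch of the trade-off asked about here. It assumes every machine in a pool stays fully busy, that a mini build takes exactly 2x the xserve total from the preliminary results (34 + 86 + 2323 = 2443 s), and it ignores cost, queueing and failures. The pool sizes in the example are hypothetical.

```python
XSERVE_BUILD_S = 2443  # clobber + checkout + compile on the xserve (preliminary results)
MINI_SLOWDOWN = 2.0    # assumption: the "assume 2x" figure for the mini

def builds_per_hour(machines: int, build_seconds: float) -> float:
    """Aggregate builds a pool can finish per hour, assuming it is never idle."""
    return machines * 3600 / build_seconds

def compare(xserves: int, minis_per_xserve: int) -> dict:
    """Compare an xserve pool with a mini pool that is `minis_per_xserve` times larger."""
    mini_build_s = XSERVE_BUILD_S * MINI_SLOWDOWN
    minis = xserves * minis_per_xserve
    return {
        "xserve_pool_builds_per_hour": builds_per_hour(xserves, XSERVE_BUILD_S),
        "mini_pool_builds_per_hour": builds_per_hour(minis, mini_build_s),
        # Cycle time: how long one developer waits for a single build.
        "xserve_cycle_minutes": XSERVE_BUILD_S / 60,
        "mini_cycle_minutes": mini_build_s / 60,
    }

if __name__ == "__main__":
    # Hypothetical: 2 xserves vs. 10 minis (5x the machines).
    for key, value in compare(xserves=2, minis_per_xserve=5).items():
        print(f"{key:28s} {value:6.1f}")
```

Under these assumptions, 10 minis finish 2.5x as many builds per hour as 2 xserves, but each individual build still takes twice as long (about 81 vs. 41 minutes), so more minis buy throughput and redundancy, not faster cycle times.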
Do we have a call here? I know the builds on the mini have been run and don't want you to be held up on hardware...
Assignee: justin → rhelmer
If a few xserves help us in the short term, let's just get them.
Even though the xserves are more expensive, I think we need to go with xserves now; the time differences in comment#15 are too large to ignore. Rechecking what else is pending, here's what I think we need right now:

1 - 2nd mac slave for 1.8 branch
1 - 2nd mac slave for 1.9 branch
2 - 1st & 2nd mac slaves for moz2 branch
1 - 2nd mac slave for try server
1 - replacement for the dead spare
Summary: can Build get another 2 or 3 xserves? → can Build get another 6 xserves?
Assignee: rhelmer → joduinn
(In reply to comment #19)
> Even though the xserves are more expensive, I think we need to go with xserves
> now; the time differences in comment#15 are too large to ignore. Rechecking
> what else is pending, here's what I think we need right now:
>
> 1 - 2nd mac slave for 1.8 branch

Nick correctly pointed out that we use PPC xserves on this branch. Removing from shopping list.

> 1 - 2nd mac slave for 1.9 branch
> 2 - 1st & 2nd mac slaves for moz2 branch
> 1 - 2nd mac slave for try server
> 1 - replacement for the dead spare

I just talked with both Sean and MRZ; they think Apple can revive the dead spare machine quickly, and that we do not need to buy a replacement. They will update bug#410271 after they talk with Apple and have more concrete info, but we may not need to buy this after all. Adjusted subject to match.
Summary: can Build get another 6 xserves? → can Build get another 4 or 5 xserves?
Priority: -- → P1
(In reply to comment #20)
> (In reply to comment #19)
> > 1 - replacement for the dead spare
> I just talked with both Sean and MRZ; they think Apple can revive the dead
> spare machine quickly, and that we do not need to buy a replacement. They will
> update bug#410271 after they talk with Apple, and have more concrete info, but
> we may not need to buy this after all.

Just talked with Sean on IRC. This is being repaired by Apple and should be back in a week. No need to buy a replacement after all.

> Adjusted subject to match.

Adjusted subject to match.
Summary: can Build get another 4 or 5 xserves? → can Build get another 4 xserves?
Assignee: joduinn → justin
on order.
Assignee: justin → mrz
I have four new boxes racked and mostly ready to go. I'm assuming you want these imaged with the same image I've been using for all the other Intel Xserves. Sadly, the new breed of Xserves only has FireWire 800 ports (two of them!) and no FW400, and the only cables I have with me are FW400. Can you confirm you want these imaged off the "bm-xserve05" image, or do you want fresh 10.5 installs?
Can you image these 4 machines from "bm-xserve10"? bm-xserve10 is used for the trunk/1.9 branch and is running 10.4.8. All 4 of these new machines will end up being used in various roles on trunk, so they should be the same as bm-xserve10. (FYI: bm-xserve05 is used for the 1.8 branch and is slightly different, running 10.4.7.)
The cleanest way to do this will be to take xserve10 offline - will that be possible this week? I can do a hot clone but it'd be cleaner to take a clone of a non-running filesystem.
If we're going with the existing 10.4 setup for bug 417045, then bm-xserve10 can be taken down at will. Just let us know when you do. But I think we need a decision in bug 417045 first. Can we cover the possibility of needing 10.5 by having one of these machines set up ASAP? I hope they shipped with 10.5 and only need DNS etc. done.
Blocks: 417045
(In reply to comment #26)
> Can we cover the possibility of needing 10.5 by having one of these machines
> set up ASAP? I hope they shipped with 10.5 and only need DNS etc. done.

They did, but Apple doesn't preconfigure RAID 1, so it'll be a rebuild anyway. I can do one of those Tuesday morning and grab an image of xserve10 at the same time.
Would it be too much to ask for one of each, then? That is, image xserve10 onto one new machine, and rebuild another new machine with 10.5.
No, and that's what I assumed I'd be doing :)
Nick: one 10.5 server and one imaged off xserve10 leaves me with two extra boxes. What do you want those to be?
Flags: colo-trip+
I think we'll need the other two with 10.5, but not for two to three days. Our workflow will probably be to set up a 10.5 machine with all the stuff we need for tinderbox & buildbot, and then image that for the other two.
bm-xserve16 / 10.2.71.113 is up. It's running a fresh install of 10.5 off the included CDs, haven't run any software updates. Login username is Administrator and you can probably guess at the password. Let me know when you're ready to have an image of this one taken for the other two.
(In reply to comment #32)
> bm-xserve16 / 10.2.71.113 is up. It's running a fresh install of 10.5 off the
> included CDs, haven't run any software updates. Login username is
> Administrator and you can probably guess at the password.

Set up with standard tinderbox auth.
Grabbed an image of bm-xserve10 (2.5 hours). Passing to Phong for the other cloning work.
Assignee: mrz → phong.tran
Status: NEW → ASSIGNED
Phong - I grabbed a new image off bm-xserve09 (the drive is still attached to that box in 103.02). It's on the desktop labeled "10.4-Gold-2008-03-06.dmg". Maybe you'll have better luck with that image onto bm-xserve17?
bm-xserve17 still won't boot after cloning from the new image.
Interesting that 10.4 won't run on the new hardware (hopefully 10.5 will run fine on the old hardware if we end up doing that in the future). I think we need to change the plan anyway and set up all four of the new machines with 10.5. I'm still putting the finishing touches on bm-xserve16, and will let you know when it's ready for cloning (later today if all goes well). Hopefully you haven't sunk too much time into getting 10.4 onto a new box.
Then I'm going to consider this bug closed. When you have a 10.5 gold image and are ready to deploy it on the remaining three, open a new bug. Thanks.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
No longer blocks: 414734
Product: mozilla.org → mozilla.org Graveyard