Closed Bug 492224 Opened 15 years ago Closed 15 years ago

Decommission SeaMonkey's cb-xserve02

Categories

(mozilla.org Graveyard :: Server Operations, task)

PowerPC
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kairo, Assigned: phong)

References

Details

(Whiteboard: see comment #26)

SeaMonkey currently has 4 Mac build slaves: The new cb-seamonkey-osx-01/02 ones are running Leopard, while the old cb-sea-miniosx01 and cb-xserve02 are running Tiger.

We can't put them into a generic platform pool together, as we would risk getting different builds out of the pool depending on what machine the task happens to end up on.

We need to figure out what to do about that. From a management standpoint, the easiest and most future-proof solution would be to retire the Tiger machines and give them back to Mozilla IT while getting two more Leopard VMs instead. Of course, I'm not sure whether that's possible, both in terms of available resources and in terms of what Mozilla can give us.

Matthew, can you enlighten me there?


The alternative would be to deviate from the mozilla2 config we're basing ours on and have two pools, using the Tiger one for the 1.9.1 branch and the Leopard one for trunk. That would allow us to use Tiger as long as 1.9.1 is alive (1.9.2 will probably not support it any more).
I don't have any problem dumping the two physical servers and replacing them with two VMs.  Easier to manage and saves on power too.

Flip this over to server-ops if that's the direction you want to go.
In that case, I'd like to request two copies of the Parallels OSX image, looks like it's easier for both of us :)
Assignee: nobody → server-ops
Component: Project Organization → Server Operations
Product: SeaMonkey → mozilla.org
QA Contact: organization → mrz
Version: Trunk → other
Summary: Figure out the Tiger vs. Leopard machine problem → Replace Tiger machines with Leopard VMs for SeaMonkey
cb-seamonkey-osx-03 / 63.245.210.40
cb-seamonkey-osx-04 / 63.245.210.41

When can we reclaim the other hosts?
Thanks for the new machines, I roughly verified I can access them, will set them up tomorrow. Is it early enough if I can take down and leave you the old machines some time next week? I want to get this done as early as possible, but one never knows what pitfalls come up in between. :)
I'm in no rush - comment in the bug when they're ready to be reclaimed.
Whiteboard: waiting on kairo
OK, the new machines are working and fully set up to join the SeaMonkey buildbot pool, I'll notify you here in the bug once I'm ready for the old machines to be shut down and taken back by you guys. Thanks a lot!
phong, you can reclaim the Mini and shutdown the xserve.
Assignee: server-ops → phong
Erm, don't take them down yet, I have mentioned in comment #6 that I'll tell you here.

I still need to do some work on the new configurations, but the more pressing issue is that the new VMs are not connected to the network due to a quite interesting Parallels bug that makes us lose network connections on machines once we exceed a certain number of running VMs (8?).

We need to at least solve that before the old machines can be taken down and reclaimed.
I have a case open with Parallels.
Depends on: 493321
Moving dependency as retiring the Tiger machines depends on moving the whole SeaMonkey systems to the generic pools and moving those into production (even though creating the pools depends on having the new Leopard VMs available).
No longer blocks: 485821
Depends on: 485821
Fwiw,
Thunderbird recently added 'MacOSX 10.5 comm-* check'
in addition(!) to their 'MacOSX 10.4 comm-* check'.

I don't know what their setups are, but
I wonder whether SeaMonkey may want to do so (somehow) too?
(The goal could be to run packaged tests (only) on the 10.4.?.)
(In reply to comment #11)
> I don't know what their setups are, but
> I wonder whether SeaMonkey may want to do so (somehow) too?

No, we want to get rid of the 10.4 machines.
cb-xserve02.mozilla.com is idle now, and we probably can turn it off soon (at least if nothing bad happens that might need us to go back to the old build configurations).
cb-sea-miniosx01 will need to be with us a while longer, until bug 493321 is solved.
Whiteboard: waiting on kairo → waiting bug 493321 (Parallels), then on OK from on kairo
Blocks: 504344
Where are we with this bug?
Whiteboard: waiting bug 493321 (Parallels), then on OK from on kairo → waiting on OK from on kairo
We're waiting on getting reliable Mac machines for SeaMonkey, Parallels obviously can't provide them.
Whiteboard: waiting on OK from on kairo → waiting on reliable Mac machines, then OK from on kairo
(In reply to comment #15)
> We're waiting on getting reliable Mac machines for SeaMonkey, Parallels
> obviously can't provide them.

Who's driving that?  Is that another Community Giving request?
Assignee: phong → mrz
(In reply to comment #16)
> (In reply to comment #15)
> > We're waiting on getting reliable Mac machines for SeaMonkey, Parallels
> > obviously can't provide them.
> 
> Who's driving that?  Is that another Community Giving request?

I have no clue yet. I'm not only f***ing angry about the failure the whole Parallels experiment has ended in, which left us without the full set of machines we were promised in the first place and with the Mac machines we do have crippled to the point that they can't run unit tests reliably, but I also have a zillion other things to take care of.

cb-xserve02 is not in my hands any more and not needed by us any more, I have told joduinn it would be available for some testing around Thunderbird 2.0.0.x, not sure if he actually used it for that in the end.
cb-sea-miniosx01 is the only Mac machine we have right now that is stable enough to reliably run unit tests and unless we get at least 4 really reliable Leopard machines, wherever they come from and whoever can provide them, I'm very unwilling to let it go.
Chatted with Kairo.  Presently, we are in a better state than when we started, with 2 more Macs than we started with.  (Three of the new ones are up and one of the old ones on Tiger is still running to give us reliable unit tests.)

Although SeaMonkey has trunk and branch up, we aren't running unit tests on trunk right now as we're missing the machine power to do so.

Kairo:  What do you need to get unit tests running on trunk?  You mention 4 Leopard machines.  I wonder where we can start and if 4 is the must-have number?
(In reply to comment #18)
> Kairo:  What do you need to get unit tests running on trunk?  You mention 4
> Leopard machines.  I wonder where we can start and if 4 is the must-have
> number?

Our slave pools are already way overloaded with the three Mac slaves that are running, esp. at the speed they do (currently taking 5h for a nightly build), so 4 is the bare minimum.
Also, a comm-central change currently affects both trunk and branch, and triggers both normal builds and unit test build runs on both branches, which is 4 processes, so only 4 machines per platform can handle that usefully. (I'm obviously not counting the time the machines are spending in nightly builds and L10n repacks, or the additional power we'll need when considering packaged test runs without losing leak stats or possibly PGO - or any possibility to maintain more than one branch in parallel to trunk.)

So, to reiterate, 4 machines per platform are the bare minimum we need to run builds and unit tests on both trunk and branch.
(In reply to comment #19)

> So, to reiterate, 4 machines per platform are the bare minimum we need to run
> builds and unit tests on both trunk and branch.

I got lost - how many total machines?
Assignee: mrz → sethb
Component: Server Operations → Community Giving
QA Contact: mrz → community-giving
As I feared, this bug is morphing from the "retire the old machines" bug into the "get the requested machines going" bug, which just makes it messier to read now :-/

(In reply to comment #20)
> (In reply to comment #19)
> 
> > So, to reiterate, 4 machines per platform are the bare minimum we need to run
> > builds and unit tests on both trunk and branch.
> 
> I got lost - how many total machines?

Summary:
              min   ESX    Parallels
                    up     up   down
Mac Leopard    4     0      3*    1
Win2k3         4     2      2     0
Linux i686     5     2      2     1
Linux x86_64   1     0      1     0

min: bare minimum of machines needed with 2 branches (i.e. trunk + 1 branch)
     - 2 per branch and tier-1 platform,
     - 1 for x86_64 testing,
     - 1 for enabling debug cycles on Linux
(This is not the "we're happy" requirement, it's the "we can at least do stuff" minimum)

ESX: currently running ("up") machines on ESX servers (sj and nl)

Parallels:
  up: machines currently up and running
  down: set up but shut down and not able to run due to bug 493321

*: Mac Leopard VMs on Leopard reduced to 1 CPU and therefore quite slow due to
   bug 493450 and still sometimes faulty with bug 494671

(Also, the buildmaster, which is one of the Parallels Linux VMs, recently started to intermittently lose connection with slaves while building, which makes cycles go red.)


I hope this gives a visible overview of where we are and gives some insight as to why I'm somewhat unhappy with the current state of affairs.
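
For anyone tallying the numbers, here is a minimal sketch of the arithmetic behind the summary table above. The counts are copied from the table; the dictionary layout, the `shortfall` helper, and the reading of "2 per branch" as one build slave plus one unit-test slave are assumptions made for illustration, not part of the actual buildbot configuration.

```python
# Illustrative model of the capacity table; all names here are hypothetical.
BRANCHES = 2     # trunk + 1 branch
PER_BRANCH = 2   # assumed: one regular build slave + one unit-test slave

# platform -> (minimum needed, ESX up, Parallels up, Parallels down)
POOL = {
    "Mac Leopard":  (BRANCHES * PER_BRANCH,     0, 3, 1),
    "Win2k3":       (BRANCHES * PER_BRANCH,     2, 2, 0),
    "Linux i686":   (BRANCHES * PER_BRANCH + 1, 2, 2, 1),  # +1 for debug cycles
    "Linux x86_64": (1,                         0, 1, 0),  # x86_64 testing only
}


def shortfall(platform):
    """Return how many running machines are missing to reach the minimum."""
    minimum, esx_up, parallels_up, _parallels_down = POOL[platform]
    return max(0, minimum - (esx_up + parallels_up))


if __name__ == "__main__":
    for name in POOL:
        print(f"{name}: min {POOL[name][0]}, short by {shortfall(name)}")
```

Note that this only counts machines that are up; it does not capture that the three running Mac VMs are limited to 1 CPU and still sometimes faulty, which is why the raw count understates the Mac problem.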
So 4 Mac Minis?
(In reply to comment #22)
> So 4 Mac Minis?

On the Mac side of things, I guess that would be the best solution - RelEng is using minis as well for their build pools?

What's your take on the whole Parallels story? Trying to get rid of it altogether?
BTW, cb-sea-miniosx01 and cb-xserve02 are of course still around. They're PPC machines set up with Tiger, the former is currently still running unit test cycles as long as we don't have reliable Leopard machines, the latter is completely idle.
Of course, they can be replaced or re-imaged as needed in the process of this bug (which would match what the bug originally was filed for).
(In reply to comment #24)
> BTW, cb-sea-miniosx01 and cb-xserve02 are of course still around. They're PPC
> machines set up with Tiger

I just saw that I erred, cb-sea-miniosx01 is an Intel mini, actually. It probably could be re-imaged and added to the Leopard pool in the process of getting this solved.
This bug is quite confusing right now. I've talked to mrz, and we're going forward this way:

1) This bug originally was about decommissioning the old Tiger machines, let's go back to that and make this bug be about cb-xserve02, which can die or be taken back by you, and let's close off the bug with that.

2) I'll file a new bug on re-imaging cb-sea-miniosx01 with Leopard, which can then move forward as time in your team permits.

3) I'll file another new bug on getting 4 new minis on Leopard, which can move as allowed by your team, Apple sales, etc.

4) I'll file yet another bug on the changes we can do to the Parallels setup once step 3 is done and I have brought those machines into production.
-------------------------------------------------------------------------------


So, from here on, this bug is about decommissioning cb-xserve02, which currently is assigned to SeaMonkey, but is idle and can be taken offline and back by IT right away.
Assignee: sethb → server-ops
Component: Community Giving → Server Operations
QA Contact: community-giving → mrz
Summary: Replace Tiger machines with Leopard VMs for SeaMonkey → Decommission SeaMonkey's cb-xserve02
Whiteboard: waiting on reliable Mac machines, then OK from on kairo → see comment #26
Blocks: 526206
Blocks: 526208
No longer blocks: 504344
(filed bug 526206 and bug 526208 about points 2) and 3) of comment #26)
(In reply to comment #26)
> 1) This bug originally was about decommissioning the old Tiger machines, let's
> go back to that and make this bug be about cb-xserve02, which can die or be
> taken back by you, and let's close off the bug with that.

What spec machine is this xserve? And if it's of any use to me, is it possible for me to snag this?

If it's recent vintage, we would love any help on reducing our wait times. If it's a really old machine, we can add it to a pool of "geriatric" machines we now have doing testing on old hardware like non-sse, ppc, etc.
(In reply to comment #28)
> What spec machine is this xserve? And if its of any use to me, is it possible
> for me to snag this?

It's a PPC machine, probably somewhat old, so I guess it's more a candidate for the "geriatric" pool ;-)
Assignee: server-ops → phong
Flags: colo-trip+
done.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard