Closed Bug 572395 Opened 10 years ago Closed 8 years ago

SeaMonkey Hardware Deployment in SCL3

Categories

(SeaMonkey :: Project Organization, defect, major)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kairo, Assigned: mlarrain)

References

(Depends on 4 open bugs)

Details

(Whiteboard: [need machine count])

I recently created a "one year hardware strategy" for SeaMonkey, for which I've summed up requests into four sets: "have" is our existing infrastructure, "needed" is the minimum I see us requiring within a year, "wanted" is what we would be comfortable having, and "like" is what we'd like in the ideal case.

For the "needed" set, bugs have been filed for all requested additional machines; this bug tracks all of those in that set.

For reference, here's the rough matrix of the machine numbers in the different sets:

       | Linux |  Mac | Win32 | Linux64 | Mac64 | Win64 |
-------+-------+------+-------+---------+-------+-------+
have   |   6   |   5  |   5   |    1    |   -   |   -   |
-------+-------+------+-------+---------+-------+-------+
needed |  10+1 |   8  |  10   |    1    |   1   |   1   |
-------+-------+------+-------+---------+-------+-------+
wanted |  12+1 |  10  |  12   |    6    |   6   |   6   |
-------+-------+------+-------+---------+-------+-------+
like   |  16+1 |  13  |  15   |   10    |  10   |  10   |

The +1 in the Linux column is the SUMO server, see dependencies.
As a note, clobberer can run on the puppet server, so the two only need one VM between them.
I think I need to get an understanding of how the cycle perpetuates itself.  In one year, are we going to need to add more machines?  

This is a good summary of the bugs you have filed, but you should add in more of that strategy that you wrote in email.  Can we articulate in this bug how many active branches you intend to maintain, how we can recycle machines in the future to get more leverage from the resources we provide, etc?  

Let's allow this bug to track that discussion so MRZ and I can move forward.
(In reply to comment #1)
> I think I need to get an understanding of how the cycle perpetuates itself.  In
> one year, are we going to need to add more machines?  

I thought I pointed that out in the email: I filed the bug to track the minimum, i.e. the "needed" set in the plan I outlined there.

As for adding more machines after that year, it depends on which of the listed sets we end up matching in reality; see below.

> This is a good summary of the bugs you have filed, but you should add in more
> of that strategy that you wrote in email.

I had intended this to be just a tracking bug for those other bugs, but I can surely go into more detail. Here's the bird's-eye overview that I gave in the email:

We're anticipating having a second release branch, after we delayed that from 1.9.2 to 1.9.3. We're running builds, L10n repacks and debug tests on those machines; opt tests would be good, and ideally we'd get some perf numbers and run talos as well. Our current machines are Linux, Mac and Windows 32-bit, with one experimental Linux64 machine; we should extend to 64-bit at least experimentally, ideally more.
For the non-builder/tester machines: We have a build master running, having talos would need a talos master as well (that's in "like" only). And unfortunately, we can't share Firefox infrastructure directly in cases like SUMO, puppet/OPSI, signing, or clobberer, so we'll need to get our own setups.

> Can we articulate in this bug how
> many active branches you intend to maintain, how we can recycle machines in the
> future to get more leverage from the resources we provide, etc?  

Currently we have trunk and one release branch; very soon we will need trunk and two release branches, and that is what should persist in the long run.
This bug is only about the "needed" profile/set, which is focused on 32-bit with some experimental 64-bit support. In the long run, we'll need to match the 32-bit build/test machines with 64-bit ones. Even later, when 32-bit is retired in the Mozilla world, the various 32-bit builders can be phased out.
The level we need to reach to fully support 64-bit and match what we're doing for 32-bit is what we have for Mac. Of course, the difference there is smaller in the "wanted" and "like" profiles/sets than in the "needed" one; that's part of the trade-off between them.

As for the actual machines, the new VMs for puppet/clobberer (linux), OPSI (win32), and signing (win32) could possibly be supplied from existing ones used for build/test if we get new machines to replace those older ones instead. The new machines can be supplied in the style of what we request for the additional release branch in bug 537323; the SUMO server in bug 543373 is probably an exception.

> Let's allow this bug to track that discussion so MRZ and I can move forward.

I'm happy with anything as long as we can get things moving on this topic. :)
Assignee: nobody → mrz
This bug's old enough that I've lost a little context - 

@kaiser - can you tell me how many of each platform you need so I can get things on order?  No Minis unless require OSX.
(In reply to comment #3)
> This bug's old enough that I've lost a little context - 
> 

If I am wrong on any front below, KaiRo can correct me; if I am right on all of it, hopefully he'll acknowledge this.

Since I'm just barely getting up to speed on SM RelEng machine needs (I have taken over from KaiRo as the primary RelEng guy for SeaMonkey), I'll try to address this based on the "like" list he outlined in comment #0. If the numbers and dollars add up too high, we can back down a bit, but I want to get us going.

> @kaiser - can you tell me how many of each platform you need so I can get
> things on order? 

       | Linux |  Mac | Win32 | Linux64 | Mac64 | Win64 |
-------+-------+------+-------+---------+-------+-------+
like   |  16+1 |  13  |  15   |   10    |  10   |  10   |

Ok, on the linux front:

I'm actually going to extend this, if possible: 16+2;

1 machine for the SUMO install (Bug 543373). To quote from there: "For very low load, these can all run on one box. More realistically you can probably run MySQL/Sphinx on one box and Apache/Memcache on another."
2 MySQL install, usable by both our buildmaster [currently on sqlite] AND the SUMO install (if allowing access from both the build network and the SUMO machine is not a security problem). If it is, we can merge this down to 1 machine.
16: Build [ix] Machines, loaded with CentOS5.

Linux64:
10 Build [ix] Machines, loaded with CentOS5x64

Win32:
15 Build [ix] Machines [KaiRo, what to load these with?]

Win64:
10 Build [ix] Machines [KaiRo, what to load these with?]

Mac64:
[modified] 16: OSX 10.6 should be the same machines as Firefox "fast" OSX64 ones.

> No Minis unless require OSX.

We need them for tests. We can re-image our current OSX 10.6 machines back to OSX 10.5 for this, so I'm looking for:
7 minis loaded with OSX 10.5

And these should all have around 80GB on an /e/ drive for builds, mirroring production Firefox machines.

(If this is too much, as I said, we can trim down to our bare minimum; this is actually the minimum I see us being productive with in the coming year or two, given the new rapid-fire development style Firefox is moving to. And this does not even count "Talos" style hardware for us.)
mrz, just to make things clear, what has been granted is the "needed" set, right, or is it a different amount of machines?

Callek and I would of course be happier if we had some wiggle room there, especially if it's in the area of getting more Linux64 machines online. When it comes to Macs, the picture surely looks nicer, as we actually need the increase mostly on the 64-bit (10.6) side and less so on the 32-bit (10.5) side, but hardware-wise that's not the real difference anyhow.

So, what amount/set of machines has been granted? What wiggle room do we have?
(In reply to comment #5)
> mrz, just to make things clear, what has been granted is the "needed" set,
> right, or is it a different amount of machines?


Needed - can you revise your hardware  list based on that please?
Whiteboard: [need machine count]
OK, so here's the list for new machines for the "needed" profile:

Mac build/test slaves: 4
PC (Linux/Windows) build/test slaves: 13
(I added 3 PC machines for getting Linux 64bit up to a level where SeaMonkey can start actually supporting that architecture officially)

SUMO server: 1 (was included in comment #0 - is this still on the plate?)


System setups for slaves:

Mac: 1x 10.5, 3x 10.6
PC: 4x Linux 32bit, 3x Linux 64bit, 5x Win2k3, 1x Win 64bit

All those (except the Win64 one) are ideally to be cloned from existing SM slave setups (minis or VMs).

When those are up and running, some of the existing Linux/Windows VMs are to be re-purposed for the non-slave setups like puppet or signing.
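As a rough cross-check (a sketch, not part of the original plan), the PC and Mac totals above can be derived from the "have" and "needed" rows of the matrix in comment #0. It assumes the Mac and Mac64 columns pool into "Mac build/test slaves", the four Linux/Windows columns pool into "PC (Linux/Windows) build/test slaves", and the SUMO "+1" is counted separately; `EXTRA_LINUX64` and `new_machines` are hypothetical names used only for illustration.

```python
# Rows taken from the "have"/"needed" matrix in comment #0.
# The "+1" SUMO server in the Linux column is left out; it is listed separately.
HAVE = {"linux": 6, "mac": 5, "win32": 5, "linux64": 1, "mac64": 0, "win64": 0}
NEEDED = {"linux": 10, "mac": 8, "win32": 10, "linux64": 1, "mac64": 1, "win64": 1}

# Hypothetical name: the 3 extra Linux64 slaves added on top of the "needed" set.
EXTRA_LINUX64 = 3


def new_machines():
    """Return per-platform new slaves plus the pooled Mac and PC totals."""
    delta = {p: NEEDED[p] - HAVE[p] for p in NEEDED}
    delta["linux64"] += EXTRA_LINUX64
    mac = delta["mac"] + delta["mac64"]
    pc = delta["linux"] + delta["linux64"] + delta["win32"] + delta["win64"]
    return delta, mac, pc


if __name__ == "__main__":
    delta, mac, pc = new_machines()
    print(delta)
    print("Mac:", mac, "PC:", pc)  # Mac: 4 PC: 13
```

The per-platform PC numbers (4 Linux 32-bit, 3 Linux 64-bit, 5 Win32, 1 Win64) line up with the system setups listed; the 10.5/10.6 split of the Mac slaves is a separate choice and is not derived here.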
> All those (except the Win64 one) are ideally to be cloned from existing SM
> slave setups (minis or VMs).

That may be problematic.  I can't buy the same Mini hardware you have now, can only buy the new hardware.  Unsure if that image will exactly work.

I don't know of a way to take a VM image and put it on bare metal hardware.  You may need to build new ref images for these (similar to what we did for RelEng).
> SUMO server: 1 (was included in comment #0 - is this still on the plate?)

Tell me more about this one - is it sumo for seamonkey?  What sort of hardware do you need for this?
(In reply to comment #8)
> > All those (except the Win64 one) are ideally to be cloned from existing SM
> > slave setups (minis or VMs).
> 
> That may be problematic.  I can't buy the same Mini hardware you have now, can
> only buy the new hardware.  Unsure if that image will exactly work.
> 
> I don't know of a way to take a VM image and put it on bare metal hardware. 
> You may need to build new ref images for these (similar to what we did for
> RelEng).

If that is necessary, *we* can manage and figure out a solution; using an existing refimage is just the "ideal".

We surely do want iX (etc.) machines like those now in use by Firefox/MoCo, rather than getting a bunch of VMs.
(In reply to comment #9)
> > SUMO server: 1 (was included in comment #0 - is this still on the plate?)
> 
> Tell me more about this one - is it sumo for seamonkey?  What sort of hardware
> do you need for this?

https://bugzilla.mozilla.org/show_bug.cgi?id=543373#c4

or summarized here:

"At a minimum, you'll need Apache, MySQL, and Memcache available. If you want
search to work, you'll also need Sphinx Search[1]."

We expect very low load, so they can all probably be on one box.

And as per https://bugzilla.mozilla.org/show_bug.cgi?id=543373#c3, it is unclear *exactly* what that box needs to be on the hardware side (at least it is unclear to me; Robert may have gotten better insight since then).
As Callek said, the exact requirements for that server are a bit fuzzy unfortunately, we don't really have much of a clue ourselves - and yes, that's for a SUMO for SeaMonkey.
The build machines are the most pressing point, though, we can handle the SUMO one lazily.
Hardware on order (Mac Minis + HP DL120s).  SUMO we'll do later.
(In reply to comment #13)
> Hardware on order (Mac Minis + HP DL120s).  SUMO we'll do later.

CC-ing joduinn and dustin per Phone Meeting request.
The Mac Minis should be delivered to San Jose today 

Fed Ex Tracking no.: 468767477373

@zandr, which shelves did you want?
(In reply to comment #15)
> @zandr, which shelves did you want?

Zandr, was this answered off bug?

mrz: any updates on status for those of us curious?

(I'm just trying to find out where in the status steps we are here. If it's waiting on a question from you, I want to move it along, of course.)

Mrz: also FYI, I agreed to let Joduinn's team take these boxes, and he agreed to run them through puppet/opsi to bring them to a production state for us. Before being handed to SeaMonkey, they will be cleared of keys/passwords and have the puppet/opsi tooling dropped/disabled.
Target Milestone: --- → seamonkey2.1b3
Target Milestone: seamonkey2.1b3 → ---
(In reply to comment #16)
> (In reply to comment #15)
> > @zandr, which shelves did you want?
> 
> Zandr, was this answered off bug?

mrz told me he'd ordered the Sonnettech shelves, so I think so. :)

> mrz: any updates on status for those of us curious?

I need to ping Rich, but I'd expect the DL120s any day now.

> Mrz: also FYI, I agree'd to let Joduinn's team take these boxes and he agreed
> to run them across puppet/opsi to bring them to a production state for us,
> where before being handed to SeaMonkey they will get cleared of keys/passwords
> and have the puppet/opsi tooling dropped/disabled, before we get them for our
> use.

And by "joduinn's team", you really mean my team to get refimages installed, etc. Then I'll coordinate with dustin to get the puppet configs sorted.
(In reply to comment #17)
> > mrz: any updates on status for those of us curious?
> 
> I need to ping Rich, but I'd expect the DL120s any day now.
> 

So... how are we on the status of all these machines, setup, and progress all around?

(a) have all requested/approved machines arrived.
(b) have all requested/approved machines been installed in a data center.
(c) who has ownership of the machines atm.
(d) have the machines been imaged in any way yet.
(e)...more?
Good timing.  Just got an update today from HP that the DL120s are delayed until June.  I lost track of the Minis - Zandr, did those come in?
Yes, they're installed in the Sonnet shelves and sitting under my desk.
Callek asked in IRC, but I'll answer here.

The minis are not imaged. We don't have a refimage for this hardware. FWIW, this hardware won't run 10.5, either. We can't buy hardware that will.

I was waiting on the HP hardware to come in to install anything in a DC.
Pushing over to Zandr for actual implementation.
Assignee: mrz → zandr
(In reply to comment #21)
> this hardware won't run 10.5, either. We can't buy hardware that will.

OK, in case it's not clear then: we'll install 10.6 on the 4 Macs, and later we'll bring down one of the 10.5 machines we have [and one of those 10.6 ones that used to have 10.5] so it can be re-imaged and we can get an extra 10.5 back. Unless KaiRo has an opposing opinion.
(In reply to comment #23)
> Ok, incase its not clear then, the 4 mac's we'll install 10.6 on, and then
> we'll bring down one of the 10.5 machines we have [and one of those 10.6
> ones that used to have 10.5] later, so it can be re-imaged, and we can get
> an extra 10.5 back. Unless KaiRo has an opposing opinion.

Sounds good to me - and actually, you're the one who needs to decide nowadays ;-)
(In reply to matthew zeier [:mrz] from comment #22)
> Pushing over to Zandr for actual implementation.

Zandr, any details on our progress with the scl2 expansion, and thus these machines? If the expansion is going to take much longer, is there any rack space etc. we can use in the interim? These very machines block us from being ABLE to install MSVC2010 on any Windows machines, and of course that will be the trunk default soon.
Assignee: zandr → dustin
I'll meet up with Justin early next week to get his notion of the current status and open questions here, then get answers for the open questions and hopefully finish this.  Sorry for the delay - remind me that I'm buying when I'm next in Boston ;)
So AIUI the current state of this project is as follows:
 - 13 HP DL120G7's are in scl2, but not racked or otherwise set up
 - 4 r4(?) minis are in mtv1 (under zandr's desk..)

The plan for these machines is:
 - 5 win2k3 ** critical for MSVC2010
 - 1 win64
 - 4 linux32
 - 3 linux64
 - 4 Mac OS X 10.6 - known-good on this hardware

The next few steps are:
 - get them racked and powered on, and in the build VPN (??)
 - image them up with the releng images (which should be fun since the releng images are for iX hardware) (relops)
 - bake them in staging for a bit (releng)
 - move to community VLAN or anywhere that can communicate with the master (relops)

In looking to take care of the racking-and-powering, I'd like to see if we can get these racked somewhere temporary to begin with -- the minis in mtv1, and maybe putting a subset of the HP's in scl1 for imaging and baking.  Either way, since I'm remote, I'll need to get local hands working on this, which can be hard.  Likely nothing will move until we've got the R4 minis installed, but at that point I can probably get some time allocated.
Severity: normal → major
In parallel to getting the HP's racked in scl2, I'll set up access to two DL120G7's (relabs01 and relabs02) that are acting as labs machines at the moment.  I want to say up-front, though: we may need these boxes back on short (1 day) notice at any time, so we'll need to be quick.

I'm reasonably confident that I can get the linux32 and linux64 systems up and talking to puppet, using a fresh (kickstart) install of centos55. 

I'm less confident in building the win32/win64 machines.  I have no confidence that OPSI will do what it's trained to do, and even if it did that doesn't completely specify a slave.  If it's OK, I'd like to do a base OS install of each of those, and then leave them to you (Callek) to set up as per the RefImage docs - I'm sure releng and digipengi will help out with any questions.

In both cases, once the machine is built, we'll snapshot it with DeployStudio, so we can roll it out ready to go in scl2 when the time comes.

Callek, I've granted you temporary build VPN access in bug 692205.  I'll put a w32 image on one of the machines and give you login information -- will you be able to do the setup from there?  I can work on the linux images on the other machine.
I filed bug 692344 to get the minis racked locally, and the path from there to a running 10.6 builder is fairly short.

I've allocated relabs01 to windows image creation, and relabs02 to linux image creation.  As you might expect, neither has gone smoothly.  Not surprisingly, ancient CentOS-5.0 doesn't have drivers built-in for the HP drives.  Similarly, the windows 2003 sp2 install CDs don't recognize the disks.  I'm sure there are ways around these problems - it just remains to find them!
I forgot that this hardware can't run CentOS-5.0.  Best we can do is 5.6 -- see
  http://h18004.www1.hp.com/products/servers/linux/supportmatrix/rhel/exceptions/rhel-exceptions.html
Given coop's success with 5.5 for the hgwriter slaves, after some discussion with Callek, 5.6 it is.
Even getting 5.6 on these systems is a bit tricky, and seems a waste of time given that we'll be putting centos 6 on them soon. Also, w2k3sp2 isn't supported at all on this hardware - it won't even install.

As noted above, windows is the priority for seamonkey, as they currently have no capacity to build with MSVC2010.

As an alternative to breaking fresh ground in supported hardware, I'd like to see if we can "trade" some of these DL120G7's for some iX systems.  It seems it would be easiest to move some of the iX systems currently in mountain view over to the community network (or relocate them if necessary), replete with windows images.  This doesn't have to be a full set - maybe 2 w32 systems and one linux?

This would be temporary until releng designs images that run on DL120G7's, at which point we'd "trade back", or otherwise settle accounts.
This was on the roundtable list for today's releng meeting - what was the outcome?
(In reply to Dustin J. Mitchell [:dustin] from comment #31) 
> As an alternative to breaking fresh ground in supported hardware, I'd like
> to see if we can "trade" some of these DL120G7's for some iX systems.  It
> seems it would be easiest to move some of the iX systems currently in
> mountain view over to the community network (or relocate them if necessary),
> replete with windows images.  This doesn't have to be a full set - maybe 2
> w32 systems and one linux?
> 
> This would be temporary until releng designs images that run on DL120G7's,
> at which point we'd "trade back", or otherwise settle accounts.

We discussed this at the releng meeting today. We're fine with trading 3 iX systems to SeaMonkey, but that's contingent on them coming from the batch of machines currently slated for repair, i.e. bug 673972.

We're already starting to hemorrhage on win32 build times, but fixing the outstanding iX machines wouldn't get us to where we need to be anyway, so we might as well give them to SeaMonkey. We'll be prioritizing the DL120 work to gain capacity back here.
Great!
Depends on: 673972
No longer depends on: 557704
I've located the minis, assets 05280, 05281, 05282, 05283. I need to find access to the build VLAN to image them; working on that now.
We're taking these to scl1 tomorrow to image and bake them among familiar faces.
OK, the mac minis are done. Here are the stats:

asset	MAC address	hostname	serial
5280	c8:2a:14:20:90:78	sea1	C07FH1BFDD6L
5282	c8:2a:14:20:92:07	sea2	C07FH1EXDD6L
5283	c8:2a:14:20:98:4b	sea3	C07FH1EGDD6L
5281	c8:2a:14:20:c6:d5	sea4	C07FH1E9DD6L

All are puppetized and cleaned of secrets, and their passwords have been changed to one I shared with Callek.

Sadly, there's no *place* for these minis yet, so they are coming back to mountain view to go under matt's desk until we can get them to scl2, where the community network lives.
Depends on: 702490
No longer depends on: 701887
Assignee: dustin → mlarrain
These minis have moved to SCL2 for now but will be going to SCL3 shortly.
Status: NEW → ASSIGNED
Depends on: 721516
No longer depends on: 702490
No longer depends on: 703156
No longer depends on: 703160
Serge, please leave even the "obsoleted" dependencies on here. And the bug 703160 machines have no current tracker any more, which also isn't good, as now there's nothing on file to get us up to the full amount of machines, esp. on Linux.
Depends on: 702490, 703156, 703160
Summary: Needed machines for one year hardware strategy 2010 → SeaMonkey Hardware Deployment in SCL3
All we are waiting on now is for iLO to be finished on the HPs, and then all the SM hardware will be in place.
All items are configured, per talking to dmoore; closing this bug.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED