Closed Bug 695278 Opened 9 years ago Closed 9 years ago

SeaMonkey Buildbot master and many slaves inaccessible.

Categories

Component: mozilla.org Graveyard :: Server Operations
Type: task
Priority: Not set
Severity: blocker

Tracking

Status: RESOLVED FIXED

People

Reporter: Callek
Assignee: mlarrain

Details

I know many of these are on the Parallels server. Not certain if any are on ESX as well though.

BLOCKER since this is currently stopping work on SeaMonkey for any branch, including creating a beta release. Ensuring the tree stays closed.

List of hosts down:

cb-seamonkey-linuxmaster-01
cb-seamonkey-linux-01
cb-seamonkey-linux-02
cb-seamonkey-linux64-01
cb-seamonkey-win32-01
cb-seamonkey-win32-02
cb-seamonkey-win32-03

Once the master is up, KaiRo or I will need to log in and reboot it manually to ensure that buildbot and all services start up properly. After that, it is prudent to restart all the slaves so that they can connect to the master cleanly.
Assignee: server-ops → ashish
That sounds like it's all machines on the parallels host.
Assignee: ashish → phong
This entire server is down.  It doesn't see any of the drives anymore.
This server is down and has been removed from sjc1 and brought back to mtv1 for service or recovery.  I expect you'll have a more detailed prognosis in the morning when it's been examined.

To be clear about the severity: this has the SeaMonkey tree closed at the moment.
Rick is working on bringing it back to life. It looks like the controller might be bad, and he is working with Apple support to get it sorted out.
For some reason, one of the drives in the RAID set is degraded. This shouldn't stop the server from booting, so Apple is sending a tech and a replacement drive. I will update with the schedule when Apple calls back with the appointment.
It looks like tomorrow, October 20, 2011, between 9am and 5pm Pacific, for the hardware replacement. Apple Dispatch # D53836703
Apple arrived today with bad news. According to them, the RAID array is unrecoverable. The bad drive was not a part of that array, so we had two issues at once. I am giving the RAID array one last shot at recovery with non-Apple tools.
I manually set the new drive as a RAID spare. The rebuild is running and should be finished (good or bad) in the morning. I will update the status as soon as I know more about the success or failure of the rebuild.
Per dumitru on IRC, phong mentioned this RAID rebuild completely failed, which means that all data is lost.

Dumitru also believes that we have absolutely no refimages that can be put back on these VMs. My memory contradicts that, but I can't find the relevant bug numbers at present; hoping KaiRo can help out here.

Either way, I was told that the base OS install, the Parallels install, and the VM OS installs cannot happen until mid-next-week at the earliest.
(In reply to Justin Wood (:Callek) from comment #9)
> Dumitru also believes that we have absolutely no refimages that can be put
> back on these VM, my memory contradicts that but I can't find the
> relevant/helpful bug #'s at present, hoping KaiRo can help out here.

IIRC, we succeeded back then in converting the Linux ESX refimages to Parallels, but Windows needed a fresh install and setup.
I'd be very happy if we could get Parallels replaced with something more common in Mozilla build/release, but a re-install of those Parallels VMs should work as well.
1. Can VMWare be installed on an Xserve?
2. How many VMs can VMWare support on an Xserve?
3. Does MoCo have a VMWare licence we can use? (I assume MoCo has a site licence to this sort of stuff).
Does it need to be the same (obsolete?) Xserve, or could MoCo/CG sponsor one (or two) more common Xeon-based machines (and a virtual environment on top of them) for SeaMonkey? Now would be a good time to refresh the platform (and plug any SPOF?), since reinstalls are needed anyway.
(In reply to Philip Chee from comment #11)
> 1. Can VMWare be installed on an Xserve?
> 2. How many VMs can VMWare support on an Xserve?
> 3. Does MoCo have a VMWare licence we can use? (I assume MoCo has a site
> licence to this sort of stuff).

(In reply to Teemu Mannermaa (:wicked) from comment #12)
> Does it need to be the same (obsolete?) Xserve or could MoCo/CG sponsor one
> (or two) more common Xeon-based hardware (and virtual environment on top of
> it) for SeaMonkey? Now would be good time to refresh (and plug any SPOF?)
> platform since reinstalls are needed anyway..

Let me ask that questions on can/should/etc. for this be filtered through me, so as not to confuse this bug for those (mostly IT/Ops) working on it.

That said, :wicked, newer hardware to replace this is not on the table. 

Ratty, VMWare Fusion should certainly be usable instead. I'm not sure how many VMs it would support here, but I would shoot for our last-setup numbers and go from there.

So yeah, I'm happy with VMWare Fusion or with Parallels back up, but we'll want this set up as soon as possible one way or the other.
Current Status:

* cb-seamonkey-linuxmaster-01 is up and churning. There is a minor issue with my cron job for clobberer, but overall it's happy. (I used puppet as a guide for the basic setup and set up clobberer in /builds/clobberer/, where htdocs/ is the public dir with symlinks to the repo and db/ is the database store.)

* cb-seamonkey-linux-01 is created; it just needs to be set up and then cloned for the other linux hosts

* cb-seamonkey-linux64-01 is created and also needs to be set up.

* cb-seamonkey-win32-01 is being worked on as we speak by phong (I'm told)


Open Question:

Is the VM software still Parallels, or did we switch to another host software?
Reassigning this bug to Matt, as he has been designated as the point-man for all-things-seamonkey for now.

Current Status:
 
* cb-seamonkey-linuxmaster-01 is still up
** Will need to take a snapshot later (separate bug)

* cb-seamonkey-linux-01 is created, but still needs to be set up and then cloned for the other linux hosts
** I will work on this a bit this week and hopefully be done with it by mid-next week, so it can be snapshotted and run in prod for a bit, like I'm doing with win

* cb-seamonkey-linux64-01 is up.
** (has older clang, but I can deal with that later)
** Will need to take a snapshot later as well (Will do at same time as our linuxmaster as this is the only linux64 we have right now)

* cb-seamonkey-win32-01 is set up and being snapshotted as I type this. Once that's done, I'll run it in prod for a few days before having the other win hosts here imaged from it.


Solved Question:

Is the VM software still Parallels, or did we switch to another host software?

We are using Parallels, and for the record, the hostname of this Xserve (host OS) is: cb-parallels01
Assignee: phong → mlarrain
cb-seamonkey-win32-01, -02, and -03 are all up and running.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
For those following along at home, and not directly involved.

This bug is now marked FIXED because IT's direct work here is basically over: all VMs are technically accessible. I brought them up after this Xserve and Parallels were fixed.

I'm finishing bringing up the linux32 VM, which will get copied over to the other Linux VMs on the Parallels host (in a different bug).

And we'll also get backups created for all these VM types in a different bug as well.

Thanks to everyone involved for your hard work!
Depends on: 714499
Product: mozilla.org → mozilla.org Graveyard