Closed Bug 841609 Opened 11 years ago Closed 11 years ago

bigtent-yahoo1.idweb.scl2.svc.mozilla.com and bigtent-yahoo3.idweb.scl2.svc.mozilla.com not coming back up after disabling rsbac SOFTMODE

Categories

(Cloud Services :: Operations: Miscellaneous, task)

x86_64
Linux
task
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gene, Unassigned)

Details

NOTE: THESE SYSTEMS ARE NOT LIVE YET EVEN THOUGH THEY'RE IN "PRODUCTION". THIS REQUEST IS ***NOT*** URGENT.

While getting the bigtent servers ready to take some code, I found that bigtent-yahoo1 and bigtent-yahoo3 in SCL2 were in RSBAC SOFTMODE. I disabled SOFTMODE, and the next puppet run reported "err: /Stage[main]/Rsbac/Exec[please-reboot-to-disable-softmode]/returns: change from notrun to 0 failed: /bin/false returned 1 instead of one of [0] at /etc/puppet/modules/rsbac/manifests/init.pp:190"

I then ran "sudo init 6" to reboot both hosts, and they don't seem to be coming back up.

If these are VMs, is there a wiki page on how to get to the console? If these are physical machines, do we have PDU control to power cycle them?
As mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=841102 there are problems running RSBAC kernels on libvirt guests running on kvm servers in svcops.

In short, when you reboot one of these guests from the console (init 6 or reboot), it hangs completely and has to be destroy'd before it can be rebooted properly. Once that's done, the drive comes up in an inconsistent state and requires repair. The boot then drops you to a root password prompt, but this patchset of RSBAC will not allow you to log in at the console as root. You can continue on to a command line, but that skips the disk repair, so you have to reboot again and run the repair manually.
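
For illustration, the recovery sequence described above could be scripted roughly as follows. This is only a sketch: it assumes "destroy'd" refers to `virsh destroy` run on the KVM server, that the libvirt domain name matches the guest's hostname, and that the guest has a serial console configured for `virsh console`. The helper names `recovery_commands` and `recover` and the domain name `example-guest` are hypothetical. By default it only prints the commands instead of running them.

```python
import subprocess
from typing import List


def recovery_commands(domain: str) -> List[List[str]]:
    """Return, in order, the virsh commands for recovering a hung guest.

    Assumption: `domain` is the libvirt domain name of the hung guest.
    """
    return [
        ["virsh", "destroy", domain],  # forcibly power off the hung guest
        ["virsh", "start", domain],    # boot it again
        ["virsh", "console", domain],  # attach to the console for the manual repair
    ]


def recover(domain: str, dry_run: bool = True) -> None:
    """Print the recovery commands, or run them when dry_run is False."""
    for cmd in recovery_commands(domain):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if any virsh command fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical domain name; dry run only prints the commands.
    recover("example-guest")
```

The disk repair itself is not automated here, since it has to be done interactively at the console.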

It's my suggestion that the bigtent-* servers be moved to hardware until the issues with running RSBAC on libvirt kvm guests are resolved.
 
Documentation for the KVM servers is, and has been, on the svcops wiki at https://intranet.mozilla.org/Services/Ops/KVM
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED