Closed Bug 696428 Opened 13 years ago Closed 12 years ago

set up install.build.scl1.mozilla.com as a new ds server

Categories

(Infrastructure & Operations :: RelOps: General, task, P1)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: dividehex)

References

Details

(Whiteboard: [sjc1 evac])

Please create a raid1 volume and install OS X server 10.7 on install.build.scl1.mozilla.com.
colo-trip: --- → scl1
I was not able to boot to a recovery partition on this machine (which currently has 10.6.8 on it), and the app store sells lion server as an *add-on* to lion for a total of $85 and more time than I have remaining here.

Where can we get the install for this?  All I see on fs2 is
  ./Apple/Xcode/Xcode 4.1 for OS X Lion.zip
  ./Apple/Mac OS X 10.7 Lion Install Disc.dmg
It sounds like we're going to have to reinstall from scratch.  Please bring the 10.7 install media to the datacenter.

1) Set up RAID 1 for the boot disk (you won't be able to use a recovery partition with RAID)
2) Install 10.7 onto the RAID partition
3) Enable screen sharing.
4) Set the root password to match pxe1.

After that, we can log in remotely and install the server app and start configuring the necessary services.
I created the raid as RAID1 with a 64K block size. (Still pondering if a larger block size would help the speed as this isn't a database server.)
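For reference, a mirror like the one described above can also be created from the command line with `diskutil appleRAID create mirror`. The sketch below only builds that invocation and does not run it. The set name `BootRAID` and the member disks `disk0` and `disk1` are hypothetical, and the 64K block size is not set here: this sketch assumes it was chosen separately when the set was created.

```python
import shlex


def build_mirror_command(set_name, filesystem, members):
    """Return the diskutil argv that would create a RAID 1 (mirror) set.

    Nothing is executed; the caller decides whether to run the command.
    """
    if len(members) < 2:
        raise ValueError("a mirror needs at least two member disks")
    return ["diskutil", "appleRAID", "create", "mirror",
            set_name, filesystem, *members]


if __name__ == "__main__":
    # Hypothetical set name and disk identifiers, for illustration only.
    cmd = build_mirror_command("BootRAID", "JHFS+", ["disk0", "disk1"])
    print(shlex.join(cmd))
```

Printing the command instead of running it leaves room to check the disk identifiers first, since creating the set erases the member disks.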
Waiting for Lion to install to the RAID now; I will then enable screen sharing and change the root password as requested.
Created user dsadmin and enabled screen sharing and remote login. Server is up and running now. The remaining step is to get the server components installed, which arr said she would take care of.
Got the machine imaged with a plain 10.7 install.  Opened bug 699106 with desktop to see about getting Server.app from the App store.
Assignee: mlarrain → arich
Severity: critical → major
I still need an r5 in this datacenter to test the new netboot set, but the new netboot set seems to work with 10.7 on the r4s.
colo-trip: scl1 → ---
I backed out the new netboot set since the older machines were having issues.  I might need to look at enabling multiple netboot sets or recreating the one I had.
Summary: Install OS X server 10.7 on install.build.scl1.mozilla.com → set up install.build.scl1.mozilla.com as a new ds server
Assignee: arich → jwatkins
DS has been configured and is running.  I have also built a new netboot set based on Lion 10.7.2 and scp'ed it over to pxe1 to be served up by the netboot service.

The netboot set was successful on r3 and r5.

We still need to install DSPC and migrate the netboot service from pxe1.scl1 to install.scl1
Blocks: 719622
DSPC has been configured.  When we are ready to flip the switch from pxe1 to install, we will need to change the IP address in dhcp to redirect pxe.
The netboot service has been configured but has not been started since this would conflict with the netboot service in production on pxe1.  I have also synced the current images between pxe1 and install and updated the workflows to the current images.

At this point we are ready to start testing the netboot service with the Lion netboot set and workflows.  This testing can be done without disrupting the DSPC service on PXE1 since the services are independent of each other.  If the DS testing is successful, we can move forward with the DSPC testing.
Success:

r4s boot the Lion netboot set and are able to re-image nicely.  I successfully re-imaged both an r4-snow and an r4-lion.

Capturing an image from an r5 raid volume was also successful.
As was restoring that image back to the disk on a *non-raid volume*.

Failure:

I ran into a problem while attempting to re-image the r5 with a raid0 volume.  asr refused to use file copy mode during the restore.  Apparently Apple has removed the file copy method from asr in Lion.  This means it is not possible to restore an image with DS to an Apple software raid volume at this time.

"Apple removed file copy support from ASR in Lion which breaks disk image restoration to software raid volumes..."
I need to build a device to continuously shake a fist at Apple.  It's like they are trying to make our lives difficult.
I came up with a hack for this and it seems to work nicely.

I built a netboot set off of Lion and then mounted the image. Then I copied the 10.6.8 version of asr into the image from a mini running Snow Leopard. This downgrade of asr allowed DS to continue with the file copy method it so badly wanted to use.

asr: version 239 --> asr: version 153
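A quick way to confirm which asr ended up inside a netboot set is to parse the version string shown above. This is a minimal sketch: it assumes the output has the `asr: version N` form quoted in this comment, and the threshold is drawn only from the two data points in this bug (153 still does file copy, 239 does not). Versions in between are untested, so the check treats anything newer than 153 as unsupported.

```python
import re

# Assumption from this bug only: asr 153 (from 10.6.8) still supports file
# copy, asr 239 (Lion) does not. Versions in between are unknown.
LAST_KNOWN_FILE_COPY_VERSION = 153


def parse_asr_version(output):
    """Extract the integer version from text like 'asr: version 153'."""
    match = re.search(r"asr:\s*version\s+(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized asr version output: {output!r}")
    return int(match.group(1))


def supports_file_copy(version):
    """Conservative check: only versions up to the last known good one."""
    return version <= LAST_KNOWN_FILE_COPY_VERSION


if __name__ == "__main__":
    for line in ("asr: version 239", "asr: version 153"):
        version = parse_asr_version(line)
        print(version, supports_file_copy(version))
```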

dividehex still wishes deploystudio was open source
Both 10.7.2 and 10.7.3 have been tested successfully in capturing and deploying to a Raid0 software volume with the workaround/hack in comment 13.
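The workaround referenced above comes down to three steps: mount the netboot image, copy the older asr binary over the Lion one, and unmount. The sketch below only plans those commands and does not run them. Every path passed in is hypothetical, `/usr/sbin/asr` is assumed to be where asr lives inside the image, and the image is assumed to attach read-write.

```python
def plan_asr_swap(image_path, mount_point, old_asr_path):
    """Return the commands that would replace asr inside a mounted image.

    Nothing is executed; each entry is an argv list for later review.
    """
    target = f"{mount_point}/usr/sbin/asr"  # assumed location inside the image
    return [
        ["hdiutil", "attach", image_path, "-mountpoint", mount_point],
        ["cp", "-p", old_asr_path, target],
        ["hdiutil", "detach", mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    steps = plan_asr_swap("/tmp/netboot.dmg", "/tmp/nbi", "/tmp/asr-153")
    for step in steps:
        print(" ".join(step))
```

Keeping the plan as data makes it easy to review the commands before touching an image that production machines boot from.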
install.build.scl1 has been added to nagios and nrpe has been set up.
Priority: -- → P1
Whiteboard: [sjc1 evac]
After getting DS working and tested on install.build.scl1, we ran into issues setting up DSPC.  Apple has replaced Samba with its own smb/cifs library which prevents DSPC from accessing CIFS on lion.   NFS is not an option since it appears to be disabled in the DSPC boot image.  This forces us to revert to using 10.6.8.

I have installed 10.6.8 on install2.build.scl1 which will be the new DS/DSPC server for SCL1.  Install will move to scl3 and serve as DS since DSPC will not be needed there.
Per irc convo with arr:

We will upgrade PXE1 to the current DS version (1.0rc130) and continue using it to support SCL1 while Install will then be earmarked for SCL3.
PXE1 has been upgraded to the latest stable DS version (1.0rc130) and is back to serving Netboot and DSPC. I have also built a lion netboot set (with asr version 153) so it can support the r5 builder with raid.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations