set up install.build.scl1.mozilla.com as a new ds server

RESOLVED FIXED

Status

Infrastructure & Operations
RelOps
P1
major
RESOLVED FIXED
7 years ago
5 years ago

People

(Reporter: arr, Assigned: dividehex)

Tracking

Details

(Whiteboard: [sjc1 evac])

(Reporter)

Description

7 years ago
Please create a raid1 volume and install OS X server 10.7 on install.build.scl1.mozilla.com.
(Reporter)

Updated

7 years ago
colo-trip: --- → scl1
I was not able to boot to a recovery partition on this machine (which currently has 10.6.8 on it), and the app store sells lion server as an *add-on* to lion for a total of $85 and more time than I have remaining here.

Where can we get the install for this?  All I see on fs2 is
  ./Apple/Xcode/Xcode 4.1 for OS X Lion.zip
  ./Apple/Mac OS X 10.7 Lion Install Disc.dmg
(Reporter)

Comment 2

7 years ago
It sounds like we're going to have to reinstall from scratch.  Please bring the 10.7 install media to the datacenter.

1) Set up RAID 1 for the boot disk (you won't be able to use a recovery partition with RAID)
2) Install 10.7 onto the RAID partition
3) Enable screen sharing.
4) Set the root password to match pxe1.

After that, we can log in remotely and install the server app and start configuring the necessary services.
I created the raid as RAID1 with a 64K block size. (Still pondering if a larger block size would help the speed as this isn't a database server.)
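For reference, a minimal sketch of how the mirror creation could be scripted rather than done by hand. It only builds the `diskutil appleRAID create mirror` command line and does not run it. The set name "Boot", the member disks "disk0" and "disk1", and the helper name `build_mirror_command` are hypothetical, not taken from this bug, and the sketch does not attempt to set the 64K block size mentioned above.

```python
import shlex


def build_mirror_command(set_name, member_disks, filesystem="JHFS+"):
    """Return the argv for creating an AppleRAID mirror (RAID 1) set.

    Nothing is executed here; the caller decides whether to run it.
    """
    if len(member_disks) < 2:
        raise ValueError("a mirror needs at least two member disks")
    # diskutil appleRAID create mirror <setName> <filesystemType> <memberDisks...>
    return ["diskutil", "appleRAID", "create", "mirror",
            set_name, filesystem, *member_disks]


if __name__ == "__main__":
    # Hypothetical set name and disk identifiers, for illustration only.
    cmd = build_mirror_command("Boot", ["disk0", "disk1"])
    print(shlex.join(cmd))
```

Printing the command first makes it easy to check the disk identifiers against `diskutil list` before anything destructive is run.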
Waiting for Lion to install to the RAID now. I will then enable screen sharing and change the root password as requested.
Created user dsadmin and enabled screen sharing and remote login. Server is up and running now. The remaining step is to get the server components installed, which arr said she would take care of.
(Reporter)

Comment 5

7 years ago
Got the machine imaged with a plain 10.7 install.  Opened bug 699106 with desktop to see about getting Server.app from the App store.
Assignee: mlarrain → arich
Depends on: 699106
(Reporter)

Updated

7 years ago
Severity: critical → major
(Reporter)

Comment 6

7 years ago
I still need an r5 in this datacenter to test the new netboot set, but the new netboot set seems to work with 10.7 on the r4s.
colo-trip: scl1 → ---
(Reporter)

Comment 7

7 years ago
I backed out the new netboot set since the older machines were having issues.  I might need to look at enabling multiple netboot sets or recreating the one I had.
Summary: Install OS X server 10.7 on install.build.scl1.mozilla.com → set up install.build.scl1.mozilla.com as a new ds server
(Assignee)

Updated

6 years ago
Assignee: arich → jwatkins
(Assignee)

Comment 8

6 years ago
DS has been configured and is running.  I have also built a new netboot set based on Lion 10.7.2 and scp'ed it over to pxe1 to be served up by the netboot service.

The netboot set was successful on r3 and r5.

We still need to install DSPC and migrate the netboot service from pxe1.scl1 to install.scl1
(Reporter)

Updated

6 years ago
Blocks: 719622
(Assignee)

Comment 9

6 years ago
DSPC has been configured.  When we are ready to flip the switch from pxe1 to install, we will need to change the IP address in dhcp to redirect pxe.
(Assignee)

Comment 10

6 years ago
The netboot service has been configured but has not been started since this would conflict with the netboot service in production on pxe1.  I have also synced the current images between pxe1 and install and updated the workflows to the current images.

At this point we are ready to start testing the netboot service with the (Lion) netboot set and workflows.  This testing can be done without disrupting the DSPC service on PXE1 since the services are independent of each other.  If the DS testing is successful, we can move forward with the DSPC testing.
(Assignee)

Comment 11

6 years ago
Success:

The r4s boot the Lion netboot set and are able to re-image nicely.  I successfully re-imaged both an r4-snow and an r4-lion.

Capturing an image from an r5 RAID volume was also successful, as was restoring that image back to the disk on a *non-RAID volume*.

Failure:

I ran into a problem while attempting to re-image the r5 with a RAID 0 volume.  asr refused to use file copy mode during the restore.  Apparently Apple has removed the file copy method from asr in Lion.  This means it is not currently possible to restore an image with DS to an Apple software RAID volume.

"Apple removed file copy support from ASR in Lion which breaks disk image restoration to software raid volumes..."
I need to build a device to continuously shake a fist at Apple.  It's like they are trying to make our lives difficult.
(Assignee)

Comment 13

6 years ago
I came up with a hack for this and it seems to work nicely.

I built a netboot set from Lion and then mounted the image. I then copied the 10.6.8 version of asr into the image from a mini running Snow Leopard. This downgrade of asr allowed DS to continue with the file copy method it so badly wanted to use.

asr: version 239 --> asr: version 153
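A small sketch of how the swap could be sanity-checked on a netboot image, assuming the tool reports its version in the "asr: version N" form quoted above. The helper names are hypothetical, and the only versions classified are the two reported in this bug (153 from 10.6.8, 239 from Lion); anything else is treated as unknown rather than guessed.

```python
import re

# Versions reported in this bug only. Behaviour of other versions is not known here.
FILE_COPY_VERSIONS = {153}      # asr taken from 10.6.8
NO_FILE_COPY_VERSIONS = {239}   # asr shipped with Lion


def parse_asr_version(text):
    """Extract the integer version from a string such as 'asr: version 153'."""
    match = re.search(r"asr:\s*version\s+(\d+)", text)
    if match is None:
        raise ValueError("no asr version found in: %r" % text)
    return int(match.group(1))


def supports_file_copy(text):
    """Return True or False for versions reported in this bug, None if unknown."""
    version = parse_asr_version(text)
    if version in FILE_COPY_VERSIONS:
        return True
    if version in NO_FILE_COPY_VERSIONS:
        return False
    return None


if __name__ == "__main__":
    for line in ("asr: version 239", "asr: version 153"):
        print(line, "->", supports_file_copy(line))
```

Returning None for unlisted versions keeps the check honest: it flags that the image needs to be tested by hand instead of assuming a threshold.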

dividehex still wishes deploystudio was open source
(Assignee)

Comment 14

6 years ago
Both 10.7.2 and 10.7.3 have been tested successfully in capturing and deploying to a Raid0 software volume with the workaround/hack in comment 13.
(Assignee)

Comment 15

6 years ago
install.build.scl1 has been added to nagios and nrpe has been set up.
(Reporter)

Updated

6 years ago
Priority: -- → P1
Whiteboard: [sjc1 evac]
(Assignee)

Updated

6 years ago
Depends on: 727155
(Assignee)

Comment 16

6 years ago
After getting DS working and tested on install.build.scl1, we ran into issues setting up DSPC.  Apple has replaced Samba with its own SMB/CIFS implementation, which prevents DSPC from accessing CIFS shares on Lion.  NFS is not an option since it appears to be disabled in the DSPC boot image.  This forces us to revert to using 10.6.8.

I have installed 10.6.8 on install2.build.scl1, which will be the new DS/DSPC server for SCL1.  Install will move to scl3 and serve as the DS server since DSPC will not be needed there.
(Assignee)

Comment 17

6 years ago
Per irc convo with arr:

We will upgrade PXE1 to the current DS version (1.0rc130) and continue using it to support SCL1 while Install will then be earmarked for SCL3.
(Assignee)

Comment 18

6 years ago
PXE1 has been upgraded to the latest stable DS version (1.0rc130) and is back to serving Netboot and DSPC. I have also built a lion netboot set (with asr version 153) so it can support the r5 builder with raid.
(Assignee)

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations