Closed Bug 982261 Opened 6 years ago Closed 6 years ago

investigate installing 2008R2 on seamicro

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
macOS
task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: arich, Assigned: markco)

Attachments

(1 file)

We have a lot of spare SeaMicro Xeon capacity. It's worth a quick check to see whether we can use it for Windows 2008R2 builders.
OK Mark, this is all yours.  Console access is via SSH from an admin node like rocky to seamicro-c1.r101-3.console.scl3.mozilla.com.  Credentials are in GPG.  There is nothing running on this chassis, so you have your run of the place.
Q and I figured out how to do a completely no-touch install, except for sending the command to reset the node into a PXE boot.

Next question: do we know whether any of those nodes are in inventory? I am asking because of DNS and naming.
Two of them are in inventory:

* https://inventory.mozilla.org/systems/show/7058/
* https://inventory.mozilla.org/systems/show/7059/

You're free to re-use those. You can add any others you need using similar values. The rack positions there are .00 and .01 for servers 1 and 2; you could probably change them to .01 and .02 to avoid off-by-one confusion.

There are docs on setting PXE boot, among lots of other things, at https://mana.mozilla.org/wiki/display/SYSADMIN/Seamicro
It doesn't appear that the Windows builder VLAN is available to this machine. How should I go about resolving that?


seamicro-c1(config-nic-0)# clear vlan 264
vlans are being cleared for server id 1 nic 0 . Please wait...
seamicro-c1(config-nic-0)# vlan 40 untagged
%Error: Vlan id not configured.
The windows 2008R2 vlans in scl3 are 240 (winbuild) and 244 (wintry).
I knew that; I was looking at this page and it didn't click:
https://mana.mozilla.org/wiki/display/NOC/VLAN+assignments?src=search#VLANassignments-Releng

OK that works.
Blocks: 940789
(A correction there: winbuild is 236.) Mark is working on getting this to work now, since it has bubbled up in priority.
Quick update: we were able to get a partial install working, and we were also able to RDP into the machine. The next step is to remove SeaMicro's unattended file and see how far it gets with one of our custom unattended files.
Update: the OS now installs, joins the domain, and updates Group Policy. I am now working on getting RDP to connect to the system.
Depends on: 991711
RDP appears to be working now.  Next steps:

rename the blade description on the SeaMicro to: b-2008-sm-0001.winbuild.releng.scl3.mozilla.com

get it to recognize the 1T drive for testing

reimage and hand off for testing
Sounds like fast progress!

How is it going with obtaining the SSD? I was unable to get an answer from SeaMicro on whether these will pass TRIM through. I think we'll have to find out by testing whether Windows sees TRIM support :(
dcops had one on order (probably through SN, which I can't see tickets in unless I own them).

Also, I renamed the node in the chassis, so what remains is to fix the disk and reimage before handing it over.
I went through and figured out how to partition a disk into multiple vdisks while I was at it here:

https://mana.mozilla.org/wiki/display/SYSADMIN/How+to+Change+Seamicro+Drive+Partitioning

And the 1T disk has been set up as one big partition:

Disk0/1 is up
 Model: ST91000640NS, Revision: BK03, Serial: 9XG17Q46
 Id: 5000c5004057f498, Name: /dev/wd7c, Size: 931GB
 Server:   1, Vdisk:  0, Name: partition-0/1-00, Size: 930GB, Offset: 00000000GB
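The 931GB reported for a drive described elsewhere in this bug as 1T is consistent with the drive being rated in decimal terabytes while the chassis reports sizes in binary units (2^30 bytes). This is an inference from the numbers, not something confirmed by SeaMicro documentation; the quick check below only shows the arithmetic.

```python
# Assumption: the "1T" rating means 10**12 bytes (decimal), and the chassis
# "GB" figure is really GiB (2**30 bytes).
size_bytes = 10**12
size_gib = size_bytes / 2**30
print(f"{int(size_gib)}GB")  # prints "931GB"
```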
The machine has been renamed, and the hostname has been given to catlee. The naming script still needs some rework before it will work for the SeaMicro nodes, but that should be sorted out soon.
I've set up the disk for system ids 2 and 3 as well.  Please go ahead and install them as
b-2008-sm-0002.winbuild.releng.scl3.mozilla.com
b-2008-sm-0003.winbuild.releng.scl3.mozilla.com
Both nodes have been kicked off. I will check on them later this morning.
Because the naming script is still being worked on, the installs cannot happen in parallel. One machine needs to be installed, renamed, and rebooted before the second install can be kicked off; otherwise the install fails at domain join because both machines have the same name.
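Until the naming script handles these nodes, the ordering constraint above amounts to a strict install, rename, reboot loop per node. A minimal Python sketch, assuming hypothetical `install`, `rename`, and `reboot` callables that each block until their step has finished (nothing like them is defined in this bug or in the SeaMicro tooling):

```python
from typing import Callable, Iterable, List

def install_serially(
    nodes: Iterable[str],
    install: Callable[[str], None],
    rename: Callable[[str], None],
    reboot: Callable[[str], None],
) -> List[str]:
    """Run install -> rename -> reboot for each node, one node at a time.

    The next install starts only after the previous node has been renamed
    and rebooted, so two fresh installs never try to join the domain under
    the same default name at once.
    """
    completed: List[str] = []
    for node in nodes:
        install(node)  # hypothetical hook, assumed to block until the unattended install finishes
        rename(node)   # hypothetical hook: apply the node's final hostname
        reboot(node)   # hypothetical hook: reboot so the new name takes effect
        completed.append(node)
    return completed
```

Running the nodes one at a time trades install speed for avoiding the duplicate-name domain join failure; once the naming script works for these nodes, the loop would no longer be needed.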

I have re-kicked off an install on node 2.
Node 2 complete and renamed. Kicking off node 3.
After I kicked off the install for node 3, node 2 dropped off, and node 3 never completed. I am going to shut down node 3 and try to reinstall node 2.
Attached image SeaMicroNode2.jpg
Even though there are 2 separate partitions, there seems to be an issue with the 2 nodes sharing the same drive. Node 2 installed well and was up for hours before I kicked off the install on node 3. Shortly after the console for node 3 showed "loading registry", node 2 crashed. The attached screenshot is from the console for node 2 on the SeaMicro.
Neither machine was reachable through RDP, and the console for node 3 was showing nothing. After I reset node 3, the console showed that the machine had entered check disk mode and begun deleting corrupted files and index entries. Check disk seems to have died at:
Correcting error in index $I30 for file 7119

That was somewhere in stage 2 of 3.
I went back and checked the disk config, and somehow it had let me assign the same vdisk to both machines.  I've fixed that.  Try installing them again.
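Since the chassis accepted the same vdisk for two servers, checking the server-to-vdisk mapping before kicking off installs could catch this. A minimal sketch, assuming the mapping has already been collected as (server id, vdisk name) pairs; the function name is hypothetical, and gathering the pairs from the chassis is not shown:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def find_shared_vdisks(assignments: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Return each vdisk assigned to more than one server, mapped to the sorted server ids."""
    servers_by_vdisk: Dict[str, Set[int]] = defaultdict(set)
    for server_id, vdisk in assignments:
        servers_by_vdisk[vdisk].add(server_id)
    # Only vdisks claimed by two or more distinct servers are conflicts.
    return {vdisk: sorted(ids) for vdisk, ids in servers_by_vdisk.items() if len(ids) > 1}
```

For example, `find_shared_vdisks([(1, "partition-0/1-00"), (2, "partition-0/2-00"), (3, "partition-0/2-00")])` flags `partition-0/2-00` as shared by servers 2 and 3. The vdisk names are illustrative, modeled on the `partition-0/1-00` name in the disk listing earlier.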
Just an update: I am still having issues installing on either node.
(In reply to Mark Cornmesser [:markco] from comment #23)
> Just an update: I am still having issues installing on either node.

Any updates to this? Have we decided that seamicros are not viable?
Depends on: 999940
There seems to be a communication problem between the domain controllers and the entire 236 VLAN. I just opened a blocking bug with Netops to check this out.
No longer depends on: 999940
Depends on: 999940
Node 2 is now up and renamed. I am going to install 3 and see what happens.
Here are some test results in our staging environment:

WINNT 5.2 mozilla-central build - compile time: 23 mins, 2 secs
WINNT 5.2 mozilla-central pgo-build - compile time: 2 hrs, 55 mins, 8 secs
WINNT 6.1 x86-64 mozilla-central build - compile time: 24 mins, 26 secs

I have found a single issue with the current setup: I had to manually upload the ffxbld_dsa key (~/.ssh/ffxbld_dsa) to make it work properly.
Currently I am running into an issue where I can not install the OS on separate partitions of the SSD, which was our plan for nodes 2 and 3. I have contacted SeaMicro support.
(In reply to Massimo Gervasini [:mgerva] from comment #27)
> Here are some test results in our staging environment:
> 
> WINNT 5.2 mozilla-central build - compile time: 23 mins, 2 secs
> WINNT 5.2 mozilla-central pgo-build - compile time: 2 hrs, 55 mins, 8 secs
> WINNT 6.1 x86-64 mozilla-central build - compile time: 24 mins, 26 secs
> 
> I have found a single issue with the current setup: I had to manually upload
> the ffxbld_dsa key (~/.ssh/ffxbld_dsa) to make it work properly.

This is a known issue. Releng opted to copy the keys manually. We have a process worked out to automatically distribute the keys but it has not been blessed yet.
(In reply to Mark Cornmesser [:markco] from comment #28)
> Currently I am running into an issue where I can not install the OS on
> separate partitions of the SSD, which was our plan for nodes 2 and 3. I have
> contacted SeaMicro support.

Support Case number C-4426.
Node 2 and 3 are now available for testing.
All of the following machines are now handed over to releng for testing. Please add the releng bug that's tracking that side of the work as a dependent of this one so we can trace the work through Bugzilla.

b-2008-sm-0001.winbuild.releng.scl3.mozilla.com (1T physical disk)
b-2008-sm-0002.winbuild.releng.scl3.mozilla.com (one half of 1T SSD)
b-2008-sm-0003.winbuild.releng.scl3.mozilla.com (second half of 1T SSD)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED