Closed
Bug 1014703
Opened 11 years ago
Closed 11 years ago
image 64 seamicro machines for production
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: arich)
References
Details
Attachments
(2 files)
In bug 1002634 we successfully tested 3 seamicro machines as Windows builders. Let's move the remaining machines into production as well.
Updated•11 years ago
Assignee: relops → mcornmesser
Assignee
Comment 1•11 years ago
:catlee: we're going to need to know how many machines to allocate for try vs build. Do you have a count for us? We'll also need to reclaim the first node and rebuild it once the SSDs come in, since it's on a 1T SATA drive right now.
Reporter
Comment 2•11 years ago
Let's go with 50% try, 50% prod.
Assignee
Comment 3•11 years ago
catlee: so to be clear, you want 7 32G wintry, 7 32G winbuild, 25 16G wintry, and 25 16G winbuild?
Flags: needinfo?(catlee)
Assignee
Comment 4•11 years ago
This is the seamicro server config for all 64 nodes (0 - 63). They are split so that we have 25 16G wintry nodes, 25 16G winbuild nodes, 7 32G wintry nodes, and 7 32G winbuild nodes.
Information has not yet been added to inventory/dns/dhcp.
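The four pools add up to exactly 64 nodes (25 + 25 + 7 + 7). A minimal sketch of that split, assuming contiguous node ranges per pool; the actual node-to-pool assignment on the chassis isn't recorded in this bug, so the ordering below is illustrative only:

# Hypothetical allocation sketch; real chassis assignments may differ.
POOLS = [
    ("wintry", "16G", 25),    # 25 x 16G try builders
    ("winbuild", "16G", 25),  # 25 x 16G build machines
    ("wintry", "32G", 7),     # 7 x 32G try builders
    ("winbuild", "32G", 7),   # 7 x 32G build machines
]

def allocate(pools, total=64):
    """Assign node ids 0..total-1 to pools in order, checking the count."""
    assert sum(count for _, _, count in pools) == total
    node, plan = 0, {}
    for pool, ram, count in pools:
        for _ in range(count):
            plan[node] = (pool, ram)
            node += 1
    return plan

plan = allocate(POOLS)
print(plan[0], plan[63])  # ('wintry', '16G') ('winbuild', '32G')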
Assignee
Comment 5•11 years ago
bhearsum: since the seamicros are chassis and don't have individual ipmi interfaces like the 1U machines, have you given thought to how automatic reboots will be handled? The docs talk about XML-RPC API support, but relops has not investigated this at all.
I suspect you want to look at SeaMicro_Config_2.7.0.0_18-Mar-2012_Edition-1.pdf to get a sense of what's possible.
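For a sense of what a chassis-level reboot call might look like from slaveapi's side, here is a hypothetical sketch only: the SeaMicro docs above advertise XML-RPC support, but neither the endpoint path nor the method names have been verified, so "chassis.example.net", "server.reset", and the argument order are placeholders, not the real SeaMicro API.

import xmlrpc.client

# Placeholder endpoint; the real chassis XML-RPC URL is unverified.
chassis = xmlrpc.client.ServerProxy("https://chassis.example.net/xmlrpc")

def reboot_node(server_id, user, password):
    # A chassis-level API would address individual nodes by server id
    # rather than by per-node IPMI addresses.
    # "server.reset" is an invented method name for illustration.
    return chassis.server.reset(user, password, server_id)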
Flags: needinfo?(bhearsum)
Comment 6•11 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #5)
> bhearsum: since the seamicros are chassis and don't have individual ipmi
> interfaces like the 1U machines, have you given thought to how automatic
> reboots will be handled? The docs talk about XML-RPC API support, but
> relops has not investigated this at all.
I've moved away from SlaveAPI work for the time being, so I'm redirecting this to Callek.
Flags: needinfo?(bhearsum) → needinfo?(bugspam.Callek)
Comment 7•11 years ago
In the meantime, slaveapi will file "unreachable" dcops bugs for these hosts if ipmi is not specified/reachable in inventory and we can't connect directly to the host.
For slaveapi's needs, I'd love a set of docs on "how to remotely reboot a specific node", with any unique information required for that stored in inventory, so we don't need to replicate a dict like devices.json in order to do so.
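A sketch of the per-host reboot metadata being asked for here, kept in inventory rather than in a local devices.json copy. The field names are invented for illustration; the real inventory schema may differ.

# Hypothetical inventory records; field names are illustrative only.
REBOOT_INFO = {
    "b-2008-sm-0001": {
        "reboot_method": "seamicro-xmlrpc",  # vs. "ipmi" for the 1U machines
        "chassis": "chassis1.example.net",   # placeholder chassis address
        "server_id": 0,                      # node id within the chassis
    },
}

def reboot_strategy(host):
    """Pick a reboot method from inventory data instead of a hardcoded dict."""
    info = REBOOT_INFO[host]
    if info["reboot_method"] == "seamicro-xmlrpc":
        return ("xmlrpc", info["chassis"], info["server_id"])
    raise ValueError("no remote reboot method recorded for %s" % host)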
Reporter
Comment 8•11 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #3)
> catlee: so to be clear, you want 7 32G wintry, 7 32G winbuild, 25 16G
> wintry, and 25 16G winbuild?
yes
Flags: needinfo?(catlee)
Assignee
Comment 9•11 years ago
64 hostnames, SREG records, and CNAMEs updated in inventory to match the chassis description and server id designations:
https://inventory.mozilla.org/en-US/core/search/#q=b-2008-sm
Assignee
Comment 10•11 years ago
(In reply to Justin Wood (:Callek) from comment #7)
That might quickly become untenable if there are a lot of reboots required.
I'm not sure how to remotely reboot these machines (without using the GUI) as it stands. I suspect the first step will probably involve moving the OOB interface and/or figuring out how to set up the fabric inband without screwing with the IP allocations.
Assignee
Comment 11•11 years ago
catlee: when can we delete the three existing machines and rebuild them on the new SSDs?
Flags: needinfo?(catlee)
Reporter
Comment 12•11 years ago
Any time. Coordinate with buildduty to take them out of production first.
Flags: needinfo?(catlee)
Updated•11 years ago
Summary: Move remaining seamicro machines into production → image 64 seamicro machines for production
Assignee
Comment 13•11 years ago
The three existing machines were disabled in slavealloc, and their disk configurations have been wiped and re-done with the proper SSD partitions. I've configured the disks for all but nodes 4-12 (which are waiting on an SSD swap) and have kicked off installs for all of the machines that have disks. It looks like we're having issues talking to the domain controllers, though, so getting the machines functional is blocked on some help from netops.
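For reference, a rough sketch of how wiping and repartitioning a node's disk might be scripted. This assumes the SSD shows up as disk 0 and that a single quick-formatted NTFS volume is the desired layout; the actual partition scheme used for these builders isn't spelled out in this bug.

import subprocess
import tempfile

# Hypothetical layout: single NTFS volume on disk 0. Adjust per the
# real builder partition scheme, which this bug does not document.
DISKPART_SCRIPT = """\
select disk 0
clean
create partition primary
format fs=ntfs quick
assign letter=C
active
"""

def repartition():
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(DISKPART_SCRIPT)
        path = f.name
    # diskpart reads its commands from a script file via /s
    subprocess.check_call(["diskpart", "/s", path])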
Assignee
Comment 14•11 years ago
All new SSDs have been configured, and we're attempting installs on all nodes. We might still be blocked by flow issues, so I'll report back when all installs are successful.
Updated•11 years ago
Assignee: mcornmesser → arich
Assignee
Comment 15•11 years ago
After a great deal of debugging, hacking, and hand-holding, I think I've gotten all but 0004 and 0031 installed. Those two are having tftp issues, which are probably the fault of the seamicro's weird internal networking. I'll bang on them more next week.
Assignee
Comment 16•11 years ago
mgerva: catlee says you're the one who can start smoke testing these before we get them into production, just to make sure that things are working and we're seeing the performance we expected (we went from half of a 1T SSD to 1/4 of a 1T SSD).
Flags: needinfo?(mgervasini)
Comment 17•11 years ago
Thanks arr!
b-2008-sm-000{1..3} have been re-enabled in slavealloc. b-2008-sm-0033 should be ready soon.
Flags: needinfo?(mgervasini)
Comment 18•11 years ago
b-2008-sm-0033 has been enabled on slavealloc too.
Comment 19•11 years ago
All the seamicros (except for 0004 and 0031) have been enabled on slavealloc.
Most of the seamicros are already accepting jobs, but some of them, after a reboot, are just showing the following message when connecting with RDP: "Please wait for the Group Policy Client..."
Here's the list of machines that show the above message (a quick reachability-check sketch follows the list):
(try)
b-2008-sm-0013
b-2008-sm-0016
b-2008-sm-0017
b-2008-sm-0020
b-2008-sm-0021
b-2008-sm-0024
b-2008-sm-0025
b-2008-sm-0026
b-2008-sm-0027
b-2008-sm-0030
(build)
b-2008-sm-0044
b-2008-sm-0059
b-2008-sm-0060
b-2008-sm-0063
b-2008-sm-0064
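A triage sketch (not part of the actual workflow) for sweeping the hosts above: machines stuck at the Group Policy Client screen generally still accept RDP connections, so checking TCP 3389 only separates fully unreachable nodes from ones that need a manual RDP look.

import socket

# Host numbers taken from the list above; %04d matches the naming scheme.
HOSTS = ["b-2008-sm-%04d" % n for n in (13, 16, 17, 20, 21, 24, 25, 26,
                                        27, 30, 44, 59, 60, 63, 64)]

for host in HOSTS:
    try:
        socket.create_connection((host, 3389), timeout=5).close()
        print("%s: rdp port open, inspect manually" % host)
    except OSError as e:
        print("%s: unreachable (%s)" % (host, e))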
Assignee
Comment 20•11 years ago
Going to split out the issues with 0004 and 0031 into a different bug.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Flags: needinfo?(bugspam.Callek)