Closed Bug 1014703 Opened 11 years ago Closed 10 years ago

image 64 seamicro machines for production

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: catlee, Assigned: arich)

References

Details

Attachments

(2 files)

seamicro server configuration 11 years ago Amy Rich [:arr] [:arich] 8.75 KB, text/plain		Details
disk configuration for the seamicro 11 years ago Amy Rich [:arr] [:arich] 7.25 KB, text/plain		Details

Chris AtLee [:catlee]

Reporter

Description

•

11 years ago

In bug 1002634 we successfully tested 3 seamicro machines as windows builders. Let's move the remaining into production as well.

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Blocks: 1002634

Amy Rich [:arr] [:arich]

Assignee

Updated

•

11 years ago

Depends on: 1014700

Amy Rich [:arr] [:arich]

Assignee

Updated

•

11 years ago

Assignee: relops → mcornmesser

Amy Rich [:arr] [:arich]

Assignee

Comment 1

•

11 years ago

catlee: we're going to need to know how many machines to allocate for try vs. build. Do you have a count for us? We'll also need to reclaim the first node and rebuild it once the SSDs come in, since it's on a 1T SATA drive right now.

Chris AtLee [:catlee]

Reporter

Comment 2

•

11 years ago

Let's go with 50% try, 50% prod.

Amy Rich [:arr] [:arich]

Assignee

Comment 3

•

11 years ago

catlee: so to be clear, you want 7 32G wintry, 7 32G winbuild, 25 16G wintry, and 25 16G winbuild?

Flags: needinfo?(catlee)

Amy Rich [:arr] [:arich]

Assignee

Updated

•

11 years ago

Depends on: 1017126

Amy Rich [:arr] [:arich]

Assignee

Comment 4

•

11 years ago

Attached file seamicro server configuration — Details

This is the seamicro server config for all 64 nodes (0 - 63). They are split so that we have 25 16G wintry nodes, 25 16G winbuild nodes, 7 32G wintry nodes, and 7 32G winbuild nodes. Information has not yet been added to inventory/dns/dhcp.
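The split above can be sanity-checked with a short sketch. The pool names, memory sizes, and counts come from this comment; the dictionary layout and the helper names `total_nodes` and `nodes_per_pool` are illustrative assumptions, and the actual per-node assignment lives in the attached config rather than here.

```python
# Node counts per (memory size, pool), as stated in comment 4.
# The data layout is an illustrative assumption, not the attachment's format.
ALLOCATION = {
    ("16G", "wintry"): 25,
    ("16G", "winbuild"): 25,
    ("32G", "wintry"): 7,
    ("32G", "winbuild"): 7,
}


def total_nodes(allocation):
    """Return the total number of nodes across all pools."""
    return sum(allocation.values())


def nodes_per_pool(allocation):
    """Sum node counts per pool, ignoring memory size."""
    totals = {}
    for (_memory, pool), count in allocation.items():
        totals[pool] = totals.get(pool, 0) + count
    return totals


if __name__ == "__main__":
    print(total_nodes(ALLOCATION))
    print(nodes_per_pool(ALLOCATION))
```

The totals come to 64 nodes with 32 in each pool, which matches the 50% try, 50% prod request in comment 2.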

Amy Rich [:arr] [:arich]

Assignee

Comment 5

•

11 years ago

bhearsum: since the seamicros are chassis and don't have individual IPMI interfaces like the 1U machines, have you given thought to how automatic reboots will be handled? The docs talk about XML-RPC API support, but relops has not investigated this at all. I suspect you want to look at SeaMicro_Config_2.7.0.0_18-Mar-2012_Edition-1.pdf to get a sense of what's possible.

Flags: needinfo?(bhearsum)

bhearsum@mozilla.com (:bhearsum)

Comment 6

•

11 years ago

(In reply to Amy Rich [:arich] [:arr] from comment #5)
> bhearsum: since the seamicros are chassis and don't have individual ipmi
> interfaces like the 1U machines, have you given thought to how automatic
> reboots will be handled? The docs talk about XML-RPC API support, but
> relops has not investigated this at all.

I've moved away from SlaveAPI work for the time being, so I'm redirecting this to Callek.

Flags: needinfo?(bhearsum) → needinfo?(bugspam.Callek)

Justin Wood (:Callek)

Comment 7

•

11 years ago

In the meantime, slaveapi will file "unreachable" dcops bugs for these hosts if IPMI is not specified/reachable in inventory and we can't connect directly to the host. For slaveapi's needs, I'd love a set of docs on "how to remotely reboot a specific node", with any unique information for doing so stored in inventory, so we don't need to replicate a dict like devices.json.
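The fallback described in this comment could be sketched roughly as follows. This is not slaveapi's actual code: the `Host` fields, the `recovery_action` function, and the action strings are all hypothetical names chosen for illustration. Only the rule itself (file an "unreachable" dcops bug when IPMI is missing or unreachable and the host cannot be contacted directly) comes from the comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    """Hypothetical view of a host, not a real slaveapi type."""
    name: str
    ipmi_address: Optional[str]  # None when inventory has no IPMI entry
    ipmi_reachable: bool
    directly_reachable: bool     # e.g. a direct connection attempt succeeded


def recovery_action(host: Host) -> str:
    """Pick a recovery step; the action names are illustrative only."""
    if host.directly_reachable:
        return "reboot-directly"
    if host.ipmi_address is not None and host.ipmi_reachable:
        return "reboot-via-ipmi"
    # No IPMI path and no direct connection: hand off to dcops.
    return "file-unreachable-dcops-bug"


if __name__ == "__main__":
    # A seamicro node has no individual IPMI interface (see comment 5).
    node = Host("b-2008-sm-0001", ipmi_address=None,
                ipmi_reachable=False, directly_reachable=False)
    print(recovery_action(node))
```

Under these assumptions, a hung seamicro node always ends up in the last branch, which is why a documented remote-reboot method would matter here.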

Chris AtLee [:catlee]

Reporter

Comment 8

•

11 years ago

(In reply to Amy Rich [:arich] [:arr] from comment #3)
> catlee: so to be clear, you want 7 32G wintry, 7 32G winbuild, 25 16G
> wintry, and 25 16G winbuild?

Yes.

Flags: needinfo?(catlee)

Amy Rich [:arr] [:arich]

Assignee

Comment 9

•

11 years ago

64 hostnames, SREG records, and CNAMEs updated in inventory to match the chassis description and server id designations: https://inventory.mozilla.org/en-US/core/search/#q=b-2008-sm

Amy Rich [:arr] [:arich]

Assignee

Comment 10

•

11 years ago

(In reply to Justin Wood (:Callek) from comment #7)
That might quickly become untenable if there are a lot of reboots required. I'm not sure how to remotely reboot these machines (without using the GUI) as it stands. I suspect the first step will probably involve moving the OOB interface and/or figuring out how to set up the fabric inband without screwing with the IP allocations.

Amy Rich [:arr] [:arich]

Assignee

Comment 11

•

11 years ago

catlee: when can we delete the three existing machines and rebuild them on the new SSDs?

Flags: needinfo?(catlee)

Amy Rich [:arr] [:arich]

Assignee

Updated

•

11 years ago

Depends on: 1020424

Chris AtLee [:catlee]

Reporter

Comment 12

•

11 years ago

Any time. Coordinate with buildduty to take them out of production first.

Flags: needinfo?(catlee)

Amy Rich [:arr] [:arich]

Assignee

Updated

•

11 years ago

Summary: Move remaining seamicro machines into production → image 64 seamicro machines for production

Amy Rich [:arr] [:arich]

Assignee

Updated

•

11 years ago

Depends on: 1018450

Amy Rich [:arr] [:arich]

Assignee

Comment 13

•

11 years ago

The three existing machines were disabled in slavealloc and their disk configurations have been wiped and re-done with the proper SSD partitions. I've configured the disks for all but nodes 4-12 (which are waiting on an SSD swap), and have kicked off installs for all of the machines that have disks. It looks like we're having issues talking to the domain controllers, though, so getting the machines functional is blocked on some help from netops.

Amy Rich [:arr] [:arich]

Assignee

Comment 14

•

11 years ago

Attached file disk configuration for the seamicro — Details

All new SSDs have been configured, and we're attempting installs on all nodes. We might still be blocked by flow issues, so I'll report back when all installs are successful.

Amy Rich [:arr] [:arich]

Assignee

Updated

•

11 years ago

Assignee: mcornmesser → arich

Amy Rich [:arr] [:arich]

Assignee

Comment 15

•

10 years ago

After a great deal of debugging, hacking, and hand-holding, I think I've gotten all but 0004 and 0031 installed. Those two are having tftp issues which are probably the fault of the seamicro's weird internal networking. I'll bang on them more next week.

Amy Rich [:arr] [:arich]

Assignee

Comment 16

•

10 years ago

mgerva: catlee says you're the one who can start smoke testing these before we get them into production, just to make sure that things are working and we're seeing the performance we expected (we went from half of a 1T SSD to 1/4 of a 1T SSD).

Flags: needinfo?(mgervasini)

Massimo Gervasini [:massimo]

Comment 17

•

10 years ago

Thanks arr! b-2008-sm-000{1..3} have been re-enabled in slavealloc. b-2008-sm-0033 should be ready soon.
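The shorthand b-2008-sm-000{1..3} is shell-style brace expansion for three hostnames. A minimal sketch of the same expansion follows; `expand_hostnames` is a hypothetical helper, and the four-digit zero padding is inferred from the hostnames quoted in this bug.

```python
def expand_hostnames(prefix, first, last, width=4):
    """Return hostnames prefix + zero-padded number for first..last inclusive."""
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


if __name__ == "__main__":
    for name in expand_hostnames("b-2008-sm-", 1, 3):
        print(name)
```

Called with `("b-2008-sm-", 1, 3)`, this yields b-2008-sm-0001 through b-2008-sm-0003.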

Flags: needinfo?(mgervasini)

Massimo Gervasini [:massimo]

Comment 18

•

10 years ago

b-2008-sm-0033 has been enabled on slavealloc too.

Massimo Gervasini [:massimo]

Comment 19

•

10 years ago

All the seamicros (except for 0004 and 0031) have been enabled in slavealloc. Most of the seamicros are already accepting jobs, but some of them, after a reboot, just show the following message when connecting with RDP: "Please wait for the Group Policy Client..."

Here's the list of machines that show the above message:

(try)
b-2008-sm-0013
b-2008-sm-0016
b-2008-sm-0017
b-2008-sm-0020
b-2008-sm-0021
b-2008-sm-0024
b-2008-sm-0025
b-2008-sm-0026
b-2008-sm-0027
b-2008-sm-0030

(build)
b-2008-sm-0044
b-2008-sm-0059
b-2008-sm-0060
b-2008-sm-0063
b-2008-sm-0064

Amy Rich [:arr] [:arich]

Assignee

Comment 20

•

10 years ago

Going to split out the issues with 0004 and 0031 into a different bug.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Justin Wood (:Callek)

Updated

•

10 years ago

Flags: needinfo?(bugspam.Callek)
