Closed
Bug 712456
Opened 12 years ago
Closed 10 years ago
upgrade heatfink/fan/memory on remaining ix builder machines and move them to scl3
Categories
(Infrastructure & Operations :: DCOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Unassigned)
References
Details
(Whiteboard: #UJU-709-53233 - replacement drive for 009)
Low-priority at the moment since scl3 isn't built yet. These hosts will need to be allocated space in scl3 and moved there, via iX for heatsink/fan replacement. mw32-ix-slave01 mw32-ix-slave02 mw32-ix-slave03 mw32-ix-slave04 mw32-ix-slave05 mw32-ix-slave06 mw32-ix-slave07 mw32-ix-slave08 mw32-ix-slave09 mw32-ix-slave10 mw32-ix-slave11 mw32-ix-slave12 mw32-ix-slave13 mw32-ix-slave14 mw32-ix-slave15 mw32-ix-slave16 mw32-ix-slave17 mw32-ix-slave18 mw32-ix-slave19 mw32-ix-slave20 mw32-ix-slave21 mw32-ix-slave22 mw32-ix-slave23 mw32-ix-slave24 mw32-ix-slave25 mw32-ix-slave26
Reporter
Updated•12 years ago
Blocks: releng-scl3
Updated•12 years ago
Assignee: server-ops-releng → arich
colo-trip: --- → mtv1
Reporter
Comment 1•12 years ago
I don't think this is *quite* ready for an mtv1 trip yet!
colo-trip: mtv1 → ---
Updated•12 years ago
Summary: move mw32-ix-slave* to scl3 → upgrade heatfink/fan/memory and move mw32-ix-slave* to scl3
Comment 2•12 years ago
Assigning this to Jake and we can have iX come out and do this once we are ready to move them.
Assignee: arich → jwatkins
Status: NEW → ASSIGNED
Comment 3•12 years ago
linux-ix-ref mv-moz2-linux-ix-slave01 mv-moz2-linux-ix-slave02 mv-moz2-linux-ix-slave03 mv-moz2-linux-ix-slave04 mv-moz2-linux-ix-slave05 mv-moz2-linux-ix-slave06 mv-moz2-linux-ix-slave07 mv-moz2-linux-ix-slave08 mv-moz2-linux-ix-slave09 mv-moz2-linux-ix-slave10 mv-moz2-linux-ix-slave11 mv-moz2-linux-ix-slave12 mv-moz2-linux-ix-slave13 mv-moz2-linux-ix-slave14 mv-moz2-linux-ix-slave15 mv-moz2-linux-ix-slave16 mv-moz2-linux-ix-slave17 mv-moz2-linux-ix-slave18 mv-moz2-linux-ix-slave19 mv-moz2-linux-ix-slave20 mv-moz2-linux-ix-slave21 mv-moz2-linux-ix-slave22 mv-moz2-linux-ix-slave23 mw32-ix-slave01 mw32-ix-slave02 mw32-ix-slave03 mw32-ix-slave04 mw32-ix-slave05 mw32-ix-slave06 mw32-ix-slave07 mw32-ix-slave08 mw32-ix-slave09 mw32-ix-slave10 mw32-ix-slave11 mw32-ix-slave12 mw32-ix-slave13 mw32-ix-slave14 mw32-ix-slave15 mw32-ix-slave16 mw32-ix-slave17 mw32-ix-slave18 mw32-ix-slave19 mw32-ix-slave20 mw32-ix-slave21 mw32-ix-slave22 mw32-ix-slave23 mw32-ix-slave24 mw32-ix-slave25 mw32-ix-slave26 win32-ix-ref
Summary: upgrade heatfink/fan/memory and move mw32-ix-slave* to scl3 → upgrade heatfink/fan/memory and move mw32-ix-slave* and mv-moz2-linux-ix-slave*, and ix ref machines to scl3
Updated•12 years ago
Assignee: jwatkins → mlarrain
Updated•12 years ago
No longer blocks: releng-scl3
Updated•12 years ago
Assignee: mlarrain → server-ops
Component: Server Operations: RelEng → Server Operations: DCOps
QA Contact: zandr → dmoore
Comment 4•12 years ago
This project is on hold until the hardware is released for move. Please update here when DC Ops is cleared to begin planning.
Updated•12 years ago
Whiteboard: [reit]
Updated•12 years ago
colo-trip: --- → mtv1
Comment 5•12 years ago
This bug is not currently actionable, so I'm making it infra-only to avoid confusion.
Group: infra
Updated•12 years ago
Group: infra
Updated•12 years ago
Summary: upgrade heatfink/fan/memory and move mw32-ix-slave* and mv-moz2-linux-ix-slave*, and ix ref machines to scl3 → upgrade heatfink/fan/memory on remaining ix builder machines and move them to scl3
Comment 6•11 years ago
Please upgrade the following machines now but leave them in service in mtv1: mv-moz2-linux-ix-slave01 mv-moz2-linux-ix-slave02 mv-moz2-linux-ix-slave03 mv-moz2-linux-ix-slave04 mv-moz2-linux-ix-slave05 mv-moz2-linux-ix-slave06 mv-moz2-linux-ix-slave07 mv-moz2-linux-ix-slave08 mv-moz2-linux-ix-slave09 mv-moz2-linux-ix-slave10 mv-moz2-linux-ix-slave11 mv-moz2-linux-ix-slave12 mv-moz2-linux-ix-slave13 mv-moz2-linux-ix-slave14 mv-moz2-linux-ix-slave15 mv-moz2-linux-ix-slave16 mv-moz2-linux-ix-slave17 mv-moz2-linux-ix-slave18 mv-moz2-linux-ix-slave19 mv-moz2-linux-ix-slave20 mv-moz2-linux-ix-slave21 mv-moz2-linux-ix-slave22 mv-moz2-linux-ix-slave23 They can be taken offline at any time.
Comment 7•11 years ago
Hostnames have been remapped per upcoming retask:
mv-moz2-linux-ix-slave01 → bld-centos6-ix-051
mv-moz2-linux-ix-slave02 → bld-centos6-ix-052
mv-moz2-linux-ix-slave03 → bld-centos6-ix-053
mv-moz2-linux-ix-slave04 → bld-centos6-ix-054
mv-moz2-linux-ix-slave05 → bld-centos6-ix-055
mv-moz2-linux-ix-slave06 → bld-centos6-ix-056
mv-moz2-linux-ix-slave07 → bld-centos6-ix-057
mv-moz2-linux-ix-slave08 → bld-centos6-ix-058
mv-moz2-linux-ix-slave09 → bld-centos6-ix-059
mv-moz2-linux-ix-slave10 → bld-centos6-ix-060
mv-moz2-linux-ix-slave11 → bld-centos6-ix-061
mv-moz2-linux-ix-slave12 → bld-centos6-ix-062
mv-moz2-linux-ix-slave13 → bld-centos6-ix-063
mv-moz2-linux-ix-slave14 → bld-centos6-ix-064
mv-moz2-linux-ix-slave15 → bld-centos6-ix-065
mv-moz2-linux-ix-slave16 → bld-centos6-ix-066
mv-moz2-linux-ix-slave17 → bld-centos6-ix-067
mv-moz2-linux-ix-slave18 → bld-centos6-ix-068
mv-moz2-linux-ix-slave19 → bld-centos6-ix-069
mv-moz2-linux-ix-slave20 → bld-centos6-ix-070
mv-moz2-linux-ix-slave21 → bld-centos6-ix-071
mv-moz2-linux-ix-slave22 → bld-centos6-ix-072
mv-moz2-linux-ix-slave23 → bld-centos6-ix-073
Comment 8•11 years ago
My mistake, that should be: bld-linux64-ix-051.build.mtv1.mozilla.com bld-linux64-ix-052.build.mtv1.mozilla.com bld-linux64-ix-053.build.mtv1.mozilla.com bld-linux64-ix-054.build.mtv1.mozilla.com bld-linux64-ix-055.build.mtv1.mozilla.com bld-linux64-ix-056.build.mtv1.mozilla.com bld-linux64-ix-057.build.mtv1.mozilla.com bld-linux64-ix-058.build.mtv1.mozilla.com bld-linux64-ix-059.build.mtv1.mozilla.com bld-linux64-ix-060.build.mtv1.mozilla.com bld-linux64-ix-061.build.mtv1.mozilla.com bld-linux64-ix-062.build.mtv1.mozilla.com bld-linux64-ix-063.build.mtv1.mozilla.com bld-linux64-ix-064.build.mtv1.mozilla.com bld-linux64-ix-065.build.mtv1.mozilla.com bld-linux64-ix-066.build.mtv1.mozilla.com bld-linux64-ix-067.build.mtv1.mozilla.com bld-linux64-ix-068.build.mtv1.mozilla.com bld-linux64-ix-069.build.mtv1.mozilla.com bld-linux64-ix-070.build.mtv1.mozilla.com bld-linux64-ix-071.build.mtv1.mozilla.com bld-linux64-ix-072.build.mtv1.mozilla.com bld-linux64-ix-073.build.mtv1.mozilla.com
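The renaming in comments 7 and 8 appears to follow a fixed offset: slave NN becomes builder 050+NN, with the corrected `bld-linux64-ix-` prefix and the `build.mtv1.mozilla.com` suffix. A minimal sketch of that mapping, assuming the offset holds for all 23 hosts listed; the function name `remap_hostname` is hypothetical and not part of any existing tooling:

```python
import re

DOMAIN = "build.mtv1.mozilla.com"  # suffix as written in comment 8
OFFSET = 50  # slave01 -> 051, inferred from the list in comment 7


def remap_hostname(old_name: str) -> str:
    """Map mv-moz2-linux-ix-slaveNN to bld-linux64-ix-0MM, where MM = NN + 50."""
    match = re.fullmatch(r"mv-moz2-linux-ix-slave(\d{2})", old_name)
    if match is None:
        raise ValueError(f"unexpected hostname: {old_name}")
    new_index = int(match.group(1)) + OFFSET
    return f"bld-linux64-ix-{new_index:03d}.{DOMAIN}"


# Build the full old -> new table for slaves 01 through 23.
mapping = {
    f"mv-moz2-linux-ix-slave{n:02d}": remap_hostname(f"mv-moz2-linux-ix-slave{n:02d}")
    for n in range(1, 24)
}
```

Generating the table rather than typing it makes it easier to spot a prefix mistake like the one corrected in comment 8.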
Comment 9•11 years ago
Slight change of plan here. 19 of the iX boxes are going to become Linux foopies to get us off of the Mac minis there. Can we please rename a subset of these for use as foopies, specifically: bld-linux64-ix-0[55-73]
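The bracket shorthand `bld-linux64-ix-0[55-73]` can be expanded mechanically to confirm that it names exactly 19 hosts. A small sketch, assuming a single `[start-end]` range per pattern; `expand_host_range` is a hypothetical helper written for this illustration:

```python
import re
from typing import List


def expand_host_range(pattern: str) -> List[str]:
    """Expand one [start-end] numeric range in a hostname pattern."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range present, return the name unchanged
    prefix, start, end, suffix = match.groups()
    width = len(start)  # keep the zero padding used in the pattern
    return [
        f"{prefix}{i:0{width}d}{suffix}"
        for i in range(int(start), int(end) + 1)
    ]


foopy_candidates = expand_host_range("bld-linux64-ix-0[55-73]")
```

The resulting list runs from `bld-linux64-ix-055` to `bld-linux64-ix-073`, which matches the count of 19 given in the comment.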
Comment 10•11 years ago
coop: I'll handle that in a different bug next week since it's just a name change. That doesn't impact the need to upgrade the hardware in this dcops bug.
Comment 11•11 years ago
So far we have upgraded the heatsinks and memory of the iX systems with asset tags 3132, 3133, 3134, 3135, 3136, 3137, 3139, 3140, and 3144. We will complete the rest of the upgrades on Monday.
Comment 12•11 years ago
The only one of those that had a working IPMI was 3139 (foopy125). Could you please make sure that the IPMI comes up on each device so I can kickstart them?
Comment 13•11 years ago
So I may have found some secret sauce for making the IPMI LAN connections recover. From the local machine, you have to set the IP source type to static, wait until it picks that up, then switch it back to dhcp. This seems to work MUCH more reliably than doing an mc reset.
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipsrc dhcp
Doing this, I got the connections up on all but foopy122 (doesn't seem to take) and foopy124 (can't get to the host). Please check on those two.
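The static-then-dhcp toggle described above could be wrapped in a small script so it can be repeated on each host. This is only a sketch: the two `ipmitool lan set 1 ipsrc ...` commands come from the comment, while the 30-second settle time, the function names, and the injectable `run`/`sleep` parameters are assumptions made for illustration.

```python
import subprocess
import time
from typing import Callable, List


def ipsrc_toggle_commands(channel: int = 1) -> List[List[str]]:
    """Return the two ipmitool invocations from the comment, in order."""
    base = ["ipmitool", "lan", "set", str(channel), "ipsrc"]
    return [base + ["static"], base + ["dhcp"]]


def _run_checked(cmd: List[str]) -> None:
    # Raises CalledProcessError if ipmitool exits non-zero.
    subprocess.run(cmd, check=True)


def toggle_ipsrc(
    channel: int = 1,
    settle_seconds: float = 30.0,  # assumed wait; the comment gives no figure
    run: Callable[[List[str]], None] = _run_checked,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Set the IP source to static, wait, then switch it back to dhcp."""
    set_static, set_dhcp = ipsrc_toggle_commands(channel)
    run(set_static)
    sleep(settle_seconds)  # give the BMC time to pick up the static setting
    run(set_dhcp)
```

Calling `toggle_ipsrc()` on the local machine would run both commands against LAN channel 1; passing a different `run` callable makes it possible to dry-run the sequence without touching the BMC.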
Comment 14•11 years ago
We've completed the heatsink/memory upgrade of: bld-linux64-ix-051.build.mtv1.mozilla.com bld-linux64-ix-052.build.mtv1.mozilla.com bld-linux64-ix-053.build.mtv1.mozilla.com bld-linux64-ix-054.build.mtv1.mozilla.com bld-linux64-ix-055.build.mtv1.mozilla.com bld-linux64-ix-056.build.mtv1.mozilla.com bld-linux64-ix-057.build.mtv1.mozilla.com bld-linux64-ix-058.build.mtv1.mozilla.com bld-linux64-ix-059.build.mtv1.mozilla.com bld-linux64-ix-060.build.mtv1.mozilla.com bld-linux64-ix-061.build.mtv1.mozilla.com bld-linux64-ix-062.build.mtv1.mozilla.com bld-linux64-ix-063.build.mtv1.mozilla.com bld-linux64-ix-064.build.mtv1.mozilla.com bld-linux64-ix-065.build.mtv1.mozilla.com bld-linux64-ix-066.build.mtv1.mozilla.com bld-linux64-ix-067.build.mtv1.mozilla.com bld-linux64-ix-068.build.mtv1.mozilla.com bld-linux64-ix-069.build.mtv1.mozilla.com bld-linux64-ix-070.build.mtv1.mozilla.com bld-linux64-ix-071.build.mtv1.mozilla.com bld-linux64-ix-072.build.mtv1.mozilla.com bld-linux64-ix-073.build.mtv1.mozilla.com
foopy124 is still not cooperating; the green indicator light is still unresponsive. We troubleshot the machine by resetting it, swapping the Ethernet cable, and plugging it into another port. We also opened it up to check whether anything had come loose during the upgrade.
Comment 15•11 years ago
Okay, so all of the ones done last week and this week look good except:
foopy124 - I can log into the mgmt console, but when I power the machine on, it doesn't even get a display. That means that either something needs to be reseated, or the magic smoke got out.
foopy122 - I can get to the running OS but can't get to the IPMI. Maybe it just needs to have the power unplugged and plugged back in, or it may be a bad cable or switch port?
Comment 16•11 years ago
foopy122 has bad IPMI, foopy124 had a bad DIMM and is back online.
Comment 17•11 years ago
The rest of these machines are now out of warranty and will be decommissioned when their current purpose is fulfilled. They will not be moving to scl3.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Assignee: server-ops → server-ops-dcops
Comment 18•10 years ago
It turns out we're not going to decommission the iX machines after all, so reopening this bug to get these machines hardware upgraded and moved to scl3: mw32-ix-slave01.build.mtv1.mozilla.com mw32-ix-slave02.build.mtv1.mozilla.com mw32-ix-slave03.build.mtv1.mozilla.com mw32-ix-slave04.build.mtv1.mozilla.com mw32-ix-slave05.build.mtv1.mozilla.com mw32-ix-slave06.build.mtv1.mozilla.com mw32-ix-slave07.build.mtv1.mozilla.com mw32-ix-slave08.build.mtv1.mozilla.com mw32-ix-slave09.build.mtv1.mozilla.com mw32-ix-slave10.build.mtv1.mozilla.com mw32-ix-slave11.build.mtv1.mozilla.com mw32-ix-slave12.build.mtv1.mozilla.com win32-ix-ref.build.mtv1.mozilla.com linux-ix-ref.build.mtv1.mozilla.com
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 19•10 years ago
The primary nics of the machines in comment 18 should be put on vlan 236 in scl3 when they are moved. The management nics should be on vlan 216.
Comment 20•10 years ago
Replaced the heat sinks. It seems we're short one heat sink for asset-03107, which we'll try to procure.
Comment 21•10 years ago
ServiceNow order submitted. REQ0021153
Comment 22•10 years ago
RITM0022787
Updated•10 years ago
colo-trip: mtv1 → scl3
Comment 23•10 years ago
van: we have three more machines that need upgrades (bug 948997). Are we lacking the parts for those? If so, how much will it cost to buy new parts (it may not be worth it)?
Updated•10 years ago
Flags: needinfo?(vle)
Comment 24•10 years ago
:arr, we are short on heat sinks so we would have to order 3 more. they're $35 each before taxes/shipping. http://www.heatsinkfactory.com/cooljag-den-7-cpu-cjg-36.html
Flags: needinfo?(vle)
Comment 25•10 years ago
Okay, let's go ahead and order the parts in expectation that we're going to do these last 3 machines soon after we wrap up the tegra move.
Comment 26•10 years ago
Order has been placed through servicenow. RITM0022977
Whiteboard: [reit] → 3 heatsinks ordered RITM0022977
Comment 27•10 years ago
Added the bug for the last three machines to be upgraded and moved to scl3. That should bring us to a total of 17 machines moving to scl3 and being repurposed as windows2008r2 builders. We're still hashing out the hostnames for these, but please put all the primary nics on VLAN236 (winbuild) https://inventory.mozilla.org/en-US/core/vlan/139/ and the oob interfaces on VLAN216 (inband) https://inventory.mozilla.org/en-US/core/vlan/138/
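The VLAN plan above (primary NICs on VLAN 236, oob interfaces on VLAN 216) lends itself to a simple consistency check against whatever ends up in inventory. The sketch below is illustrative only: the `NicAssignment` record, the `misassigned` helper, and the `example-host-*` names are hypothetical, and only the two VLAN numbers come from the comment.

```python
from dataclasses import dataclass
from typing import List

# VLAN numbers as requested in the comment: primary NIC and oob interface.
EXPECTED_VLANS = {"primary": 236, "oob": 216}


@dataclass(frozen=True)
class NicAssignment:
    host: str
    role: str  # "primary" or "oob"
    vlan: int


def misassigned(assignments: List[NicAssignment]) -> List[NicAssignment]:
    """Return every assignment whose VLAN differs from the expected one."""
    return [a for a in assignments if EXPECTED_VLANS.get(a.role) != a.vlan]


# Hypothetical sample data; real hostnames were still being decided.
sample = [
    NicAssignment("example-host-01", "primary", 236),
    NicAssignment("example-host-01", "oob", 216),
    NicAssignment("example-host-02", "primary", 216),  # wrong VLAN on purpose
]
problems = misassigned(sample)
```

Here `problems` holds only the `example-host-02` entry, since its primary NIC is listed on the oob VLAN.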
Blocks: 948997
Comment 28•10 years ago
I've started a spreadsheet to track the move for these: https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0AhyKG0L2cstIdEMzd2RoMk1UaHg4Ty10Q1NHQXNzd0E I noticed that there are TWO switch ports listed for the primary nic, and I presume that's either an error or we mistakenly cabled up the secondary nic on those as well. If we did the latter, that's unnecessary, since we only use the primary and not secondary nic. Please make sure that we're not wasting cable and switches there if so. I put in switch1.r202-10.console.scl3.mozilla.net:ge-0/0/<port> in the spreadsheet to update inventory, so if that's incorrect, please let me know and update that column. I've also added the last three systems so that you guys can fill in switch ports, rack, rack order, and oob switch. I'm going to work with uberj to get all of the information updated in inventory based on this ss.
Comment 29•10 years ago
Inventory/DNS/DHCP has been updated for the 15 machines that are already in scl3. dcops: can someone verify that they're on the correct vlans and power cycle them so that the oob interfaces are pingable?
Comment 30•10 years ago
uber: can you take a look at amy's spreadsheet? i have updated the location and switch info for the last 3 hosts. please also note that i have updated the name of the switch, giving it the FQDN and the correct name.
arr: inband mgmt switch and host switch ports have been configured. let me know if you're having any issues.
vle@switch1.r202-10.ops.releng.scl3.mozilla.net# show
member-range ge-0/0/23 to ge-0/0/36;
member-range ge-0/0/7 to ge-0/0/9;
unit 0 {
    family ethernet-switching {
        port-mode access;
        vlan {
            members releng-winbuild;
        }
    }
}
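As a sanity check on the switch output above, the two `member-range` statements can be expanded and counted: ports 23 through 36 are 14 ports and ports 7 through 9 are 3 more, which lines up with the 17 machines mentioned in comment 27. The sketch assumes one host-facing port per machine and only handles ranges on `ge-0/0/`; `ports_in_ranges` is a hypothetical helper, not a Junos tool.

```python
import re
from typing import List

# The two member-range statements as shown in the switch output.
CONFIG = """
member-range ge-0/0/23 to ge-0/0/36;
member-range ge-0/0/7 to ge-0/0/9;
"""

RANGE_RE = re.compile(r"member-range ge-0/0/(\d+) to ge-0/0/(\d+);")


def ports_in_ranges(config: str) -> List[int]:
    """Expand every member-range on ge-0/0/ into a sorted list of port numbers."""
    ports: List[int] = []
    for low, high in RANGE_RE.findall(config):
        ports.extend(range(int(low), int(high) + 1))  # ranges are inclusive
    return sorted(ports)


ports = ports_in_ranges(CONFIG)
port_count = len(ports)
```

If `port_count` ever differs from the number of hosts in the tracking spreadsheet, that would be a hint that a port is missing from the range or that a secondary NIC was cabled by mistake.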
Comment 31•10 years ago
:uberj: I also had to change the FQDN of the ipmi since it was missing the "releng" atom (so the SREG, A, PTR, and CNAMEs will need to be updated for that).
Comment 32•10 years ago
:van: I am unable to reach the ipmi interfaces for 0002, 0011, and 0014. :uberj: might you get a chance to update the information from the spreadsheet and add the CNAMEs today?
Flags: needinfo?(juber)
Comment 33•10 years ago
:van: also, 9 doesn't seem to see its disk, and 13 doesn't look like it's powering on.
Comment 34•10 years ago
I've updated info and created cnames. Just need to track down macs for https://inventory.mozilla.org/en-US/systems/show/1645/ https://inventory.mozilla.org/en-US/systems/show/1644/ and https://inventory.mozilla.org/en-US/systems/show/1643/
Comment 35•10 years ago
:arr, [0002,0011,0014] are back online. 0013 was hung during the reimage phase and i've rebooted it. 0009 has a bad disk that is no longer spinning up. can we replace it with any spare drive (we have spare 1 TB drives) or does it have to match the specs of the other iX hosts? we moved [0017-0019] and i've confirmed IPMI is reachable.
Comment 36•10 years ago
>we moved [0017-0019] and i've confirmed IPMI is reachable.
I meant we moved [0015-0017].
Comment 37•10 years ago
van: the drive for 0009 should match specs, please order and replace. and I can't reach ipmi for 0015
Comment 38•10 years ago
#UJU-709-53233 opened for drive RMA/quote. IPMI fixed for 15 and 17.
Comment 40•10 years ago
following up with iX regarding hard drive for 009.
Updated•10 years ago
Whiteboard: 3 heatsinks ordered RITM0022977 → #UJU-709-53233 - replacement drive for 009
Comment 41•10 years ago
heat sinks upgraded on hosts and moved to SCL3. we're running into an issue with 0009 as it won't image after the drive swap. i have opened Bug 964535 for relops to take a look at "The Dirty Environment" error. closing bug as we have a different bug to track 0009.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → Infrastructure & Operations