Closed
Bug 1193002
Opened 9 years ago
Closed 9 years ago
decommission more pandas/foopies and mobile imaging servers once bug 1183877 lands
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kmoir, Assigned: kmoir)
Attachments
(2 files, 2 obsolete files)
40.71 KB, patch | kmoir: checked-in+
14.32 KB, patch | kmoir: review+, kmoir: checked-in+
reallocate the associated foopies to linux32 test machines and reimage
Assignee
Comment 1•9 years ago
We are currently having linux32 test capacity issues.
In inventory these machines appear to be the same; do they need any hardware changes (video card?) to run talos tests?
foopy -> iX Systems - iX21X4 2U Neutron
linux32 -> iX Systems - iX21X4 2U Neutron (Releng Talos Config 1)
Flags: needinfo?(arich)
Summary: disable more pandas once bug 1183877 lands → disable more pandas once bug 1183877 lands and reallocate foopies as linux32 test machines
Assignee
Updated•9 years ago
Summary: disable more pandas once bug 1183877 lands and reallocate foopies as linux32 test machines → disable more pandas once bug 1183877 lands and reallocate foopies as linux32 talos machines
Comment 2•9 years ago
foopies and talos machines have different model CPUs:
foopy: model name : Intel(R) Xeon(R) CPU X3470 @ 2.93GHz
talos: model name : Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
Flags: needinfo?(arich)
Comment 3•9 years ago
See bug 1056139 for the necessary modifications to turn foopies into talos machines.
Comment 4•9 years ago
Also note that the foopies come out of warranty in November.
Assignee
Comment 5•9 years ago
:jmaher: I don't know if it is worth reallocating the foopies, given that their CPUs differ from the existing ix machines and their warranties expire soon.
Flags: needinfo?(jmaher)
Comment 6•9 years ago
Maybe :bc: could use these for Autophone?
Flags: needinfo?(jmaher) → needinfo?(bob)
Comment 7•9 years ago
The only use I can think of would be as Autophone controllers but we are moving away from using Mac minis as controllers and to using Linux servers.
Flags: needinfo?(bob)
Comment 8•9 years ago
If the foopies are out of warranty in a few weeks, then we can just shut them off when they are done. Do we have a plan for the webserver that the Android tests would use? I know talos uses the webserver, and we still run talos.
Comment 9•9 years ago
:jmaher: Which web server are you referring to?
Each panda rack has a set of foopies and an imaging server. If we stop using the pandas in that rack, we can shut off and decomm that infrastructure (except for rack 1, since that's where the master imaging server that syncs up with the database lives).
Comment 10•9 years ago
relengwebadm.private.scl3.mozilla.com; here is a wiki page outlining the update process:
https://wiki.mozilla.org/ReleaseEngineering:Buildduty:Other_Duties#Update_mobile_talos_webhosts
I assume this is just a few machines and in due time we can decommission them as well.
Comment 11•9 years ago
relengweb houses many web services for releng, so it won't be decommissioned any time soon.
Comment 12•9 years ago
I'm not sure what the plan for talos is going forward and whether or not you'll still need those vhosts, but if not we can delete them. That's a different issue than disposing of this hardware, though.
Assignee
Comment 13•9 years ago
Just disabled pandas 0022-0060, 0082-0306 in slavealloc
Summary: disable more pandas once bug 1183877 lands and reallocate foopies as linux32 talos machines → disable more pandas once bug 1183877 lands and reallocate foopies
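A batch disable like this is normally done through the slavealloc UI, but it can also be scripted. A minimal sketch, assuming a MySQL backend with a slaves table carrying name and enabled columns; the host, credentials, and schema below are assumptions, not confirmed by this bug:

# Hypothetical batch-disable against the slavealloc database.
import MySQLdb

def panda_names(ranges):
    # Expand (low, high) pairs into zero-padded slave names, e.g. panda-0022.
    return ["panda-%04d" % n for lo, hi in ranges for n in range(lo, hi + 1)]

conn = MySQLdb.connect(host="slavealloc-db.example.com", user="slavealloc",
                       passwd="...", db="slavealloc")  # placeholder credentials
cur = conn.cursor()
cur.executemany("UPDATE slaves SET enabled = 0 WHERE name = %s",
                [(n,) for n in panda_names([(22, 60), (82, 306)])])
conn.commit()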
Assignee
Comment 14•9 years ago
Patch to decommission pandas 0022-0060, 0082-0306, 0901-0903, 0610-0618 and foopies 102-104, 39-56.
Attachment #8678913 - Flags: review?(bugspam.Callek)
Comment 15•9 years ago
Comment on attachment 8678913 [details] [diff] [review]
bug1193002.patch
Review of attachment 8678913 [details] [diff] [review]:
-----------------------------------------------------------------
stamp
Attachment #8678913 - Flags: review?(bugspam.Callek) → review+
Comment 16•9 years ago
When we do the next batch (assuming it isn't "all of them"), can we please do them by chassis and clean out entire racks at a time? Getting rid of the rest of p3 and p10 first would be optimal. This will make life much easier for dcops, relops, and netops since we can decomm the mobile imaging servers at the same time if we do a whole rack (thus emptying out the rack and allowing netops to delete the vlan, etc).
Here are the remaining mappings (a small range-expansion sketch follows the list):
panda-relay-027.p3.releng.mozilla.com (303-312)
panda-relay-028.p3.releng.mozilla.com (313-323)
panda-relay-029.p3.releng.mozilla.com (324-334)
panda-relay-030.p3.releng.mozilla.com (335-343, 624-625)
panda-relay-031.p4.releng.mozilla.com (346-356)
panda-relay-032.p4.releng.mozilla.com (357-367)
panda-relay-033.p4.releng.mozilla.com (369-378)
panda-relay-034.p4.releng.mozilla.com (379-389)
panda-relay-035.p4.releng.mozilla.com (390-400)
panda-relay-036.p4.releng.mozilla.com (401-410)
panda-relay-037.p4.releng.mozilla.com (412-422)
panda-relay-038.p4.releng.mozilla.com (423-433)
panda-relay-039.p5.releng.mozilla.com (434-443)
panda-relay-040.p5.releng.mozilla.com (445-455)
panda-relay-041.p5.releng.mozilla.com (456-466, 628)
panda-relay-042.p5.releng.mozilla.com (45, 467-477, 629)
panda-relay-043.p5.releng.mozilla.com (478-488)
panda-relay-044.p5.releng.mozilla.com (491-499)
panda-relay-045.p5.releng.mozilla.com (500-510)
panda-relay-046.p5.releng.mozilla.com (511-521)
panda-relay-047.p6.releng.mozilla.com (522-532)
panda-relay-048.p6.releng.mozilla.com (533-543)
panda-relay-049.p6.releng.mozilla.com (544-554)
panda-relay-050.p6.releng.mozilla.com (555-565)
panda-relay-051.p6.releng.mozilla.com (57, 69, 566-576)
panda-relay-052.p6.releng.mozilla.com (577-585)
panda-relay-053.p6.releng.mozilla.com (589-598, 634)
panda-relay-054.p6.releng.mozilla.com (599-609)
panda-relay-079.p10.releng.mozilla.com (874-884)
panda-relay-080.p10.releng.mozilla.com (887-909)
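For illustration, a small Python sketch that expands a relay-to-panda mapping like the one above into the individual slave names a decommission patch has to enumerate (the mapping literal is abbreviated; only its format is taken from this comment):

# Expand relay -> panda-number mappings into individual slave names.
MAPPING = {
    "panda-relay-027.p3.releng.mozilla.com": [(303, 312)],
    "panda-relay-030.p3.releng.mozilla.com": [(335, 343), (624, 625)],
    # ... remaining relays elided
}

for relay, ranges in sorted(MAPPING.items()):
    # Panda names are zero-padded to four digits, e.g. panda-0303.
    pandas = ["panda-%04d" % n for lo, hi in ranges for n in range(lo, hi + 1)]
    print("%s: %d pandas (%s .. %s)" % (relay, len(pandas), pandas[0], pandas[-1]))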
Updated•9 years ago
Summary: disable more pandas once bug 1183877 lands and reallocate foopies → decommission more pandas/foopies and mobile imaging servers once bug 1183877 lands
Assignee
Comment 17•9 years ago
Attachment #8678913 - Attachment is obsolete: true
Assignee
Updated•9 years ago
Assignee: nobody → kmoir
Comment 18•9 years ago
FYI, 0620-0629 should also be absent, since they were decommissioned long ago as part of decomming rack p7.
Comment 19•9 years ago
I missed:
panda-relay-005.p10.releng.scl3.mozilla.com (58-68)
panda-relay-006.p10.releng.scl3.mozilla.com (70-80)
Assignee
Comment 20•9 years ago
Looking at slavealloc, pandas 0620-0629 are running jobs, so it appears they were not decommed. Perhaps they were assigned as replacements to racks where other pandas died?
Comment 21•9 years ago
Apparently someone has been doing that and not updating nagios. :/
I'll add that to my patch in the other bug.
Assignee
Updated•9 years ago
Attachment #8679038 - Flags: checked-in+
Comment 22•9 years ago
Okay, we've moved the master mobile imaging server to p4, so let's decomm things in that rack LAST and aim for p3 and p10 FIRST if we're doing another partial batch.
Assignee
Comment 23•9 years ago
Amy, I talked to jmaher in the mobile meeting, and he said it is reasonable to expect talos on Autophone to be completely implemented by the end of Q1. That said, we still have a couple of hundred pandas enabled (although a lot of them seem to be in a broken state; perhaps buildduty is not actively rebooting them) that are easily keeping up with the current load. We could disable some more; let me know which racks you would prefer and we can move forward with these changes.
Flags: needinfo?(arich)
Comment 24•9 years ago
I'm not sure how many you want to disable, but I'd like to go in this order. Being able to finish off entire racks at a time would be great. I've separated them out so that the two half racks we have left are first. I've also made sure that p4 is last, since that's where the master mozpool server is now.
panda-relay-027.p3.releng.mozilla.com (303-312)
panda-relay-028.p3.releng.mozilla.com (313-323)
panda-relay-029.p3.releng.mozilla.com (324-334)
panda-relay-030.p3.releng.mozilla.com (335-343, 624-625)
panda-relay-005.p10.releng.scl3.mozilla.com (58-68)
panda-relay-006.p10.releng.scl3.mozilla.com (70-80)
panda-relay-079.p10.releng.mozilla.com (874-884)
panda-relay-080.p10.releng.mozilla.com (887-909)
panda-relay-039.p5.releng.mozilla.com (434-443)
panda-relay-040.p5.releng.mozilla.com (445-455)
panda-relay-041.p5.releng.mozilla.com (456-466, 628)
panda-relay-042.p5.releng.mozilla.com (45, 467-477, 629)
panda-relay-043.p5.releng.mozilla.com (478-488)
panda-relay-044.p5.releng.mozilla.com (491-499)
panda-relay-045.p5.releng.mozilla.com (500-510)
panda-relay-046.p5.releng.mozilla.com (511-521)
panda-relay-047.p6.releng.mozilla.com (522-532)
panda-relay-048.p6.releng.mozilla.com (533-543)
panda-relay-049.p6.releng.mozilla.com (544-554)
panda-relay-050.p6.releng.mozilla.com (555-565)
panda-relay-051.p6.releng.mozilla.com (57, 69, 566-576)
panda-relay-052.p6.releng.mozilla.com (577-585)
panda-relay-053.p6.releng.mozilla.com (589-598, 634)
panda-relay-054.p6.releng.mozilla.com (599-609)
panda-relay-031.p4.releng.mozilla.com (346-356)
panda-relay-032.p4.releng.mozilla.com (357-367)
panda-relay-033.p4.releng.mozilla.com (369-378)
panda-relay-034.p4.releng.mozilla.com (379-389)
panda-relay-035.p4.releng.mozilla.com (390-400)
panda-relay-036.p4.releng.mozilla.com (401-410)
panda-relay-037.p4.releng.mozilla.com (412-422)
panda-relay-038.p4.releng.mozilla.com (423-433)
Flags: needinfo?(arich)
Assignee
Comment 25•9 years ago
:vlad or :alin,
Could you go through the list of pandas that are showing up as orange in slave health
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=panda
and reboot them to try to get them to work? Then we can move forward with patches to disable some more panda racks, with a better idea of how many remaining pandas are actually working.
Flags: needinfo?(vlad.ciobancai)
Flags: needinfo?(alin.selagea)
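If buildduty scripts this rather than clicking through slave health, a loop over slaveapi's reboot action might look like the sketch below. The host is a placeholder and the endpoint path is an assumption about slaveapi's REST layout; note also the caveat in comment 29 that slaveapi reboots can file "unreachable" bugs.

# Hypothetical reboot loop via slaveapi; verify the endpoint before use.
import requests

SLAVEAPI = "http://slaveapi.example.releng.mozilla.com:8080"  # placeholder host
orange_pandas = ["panda-0303", "panda-0313"]  # would come from slave health

for name in orange_pandas:
    resp = requests.post("%s/slaves/%s/actions/reboot" % (SLAVEAPI, name))
    print(name, resp.status_code)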
Comment 26•9 years ago
Because I'm curious like that, I already did the ones that have been idle for more than two weeks, which unsurprisingly resulted in "unreachable" bugs being filed for all of them, except the ones that don't already have tracking bugs on file, because of bug 1126879.
Do we actually want to make dcops remember how to deal with pandas, which they haven't touched for months, and replace the SD cards in all those boards, which, by being dead, are probably telling us they are the ones we most want to decomm?
Comment 27•9 years ago
Whoops, not quite all: https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=panda&name=panda-0387 actually did reboot.
Assignee
Comment 28•9 years ago
No, my intention was not to get dcops to deal with pandas and replace SD cards. I was just going to have buildduty look at the existing pool of pandas (which has a huge number that appear to be broken) and reboot them to see if they could come back online, so we could have a good idea of how many chassis we can actually decomm.
Comment 29•9 years ago
Ah, reboot them other than via slaveapi so they don't get unreachables?
Comment 30•9 years ago
Any buildduty troubleshooting with pandas should be done via mozpool. It tries to perform any corrective measures (that don't require physical intervention) and will tell you what the failure state is: http://mobile-imaging-004.p4.releng.scl3.mozilla.com/ui/lifeguard.html
There are only 12 panda boards showing hardware issues, so I suspect any other issues are related not to the pandas but to the foopies, buildbot, etc. One can always force a reimage of the panda from mozpool if need be.
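A sketch of that mozpool interaction: list devices, report the ones lifeguard considers failed, and request a reimage. The /api/device/... paths, the response shape, and the please_image event name are assumptions about lifeguard's REST API rather than anything stated in this bug.

# Hypothetical lifeguard query/reimage via mozpool's REST API.
import requests

MOZPOOL = "http://mobile-imaging-004.p4.releng.scl3.mozilla.com"

devices = requests.get(MOZPOOL + "/api/device/list/?details=1").json()["devices"]
for dev in devices:
    if dev["state"].startswith("failed"):
        print("%s: %s" % (dev["name"], dev["state"]))
        # Ask lifeguard to re-image the board (no physical intervention needed).
        requests.post("%s/api/device/%s/event/please_image/" % (MOZPOOL, dev["name"]),
                      json={"image": "android", "boot_config": "{}"})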
Comment 31•9 years ago
Attached the patch to decommission the rest of the panda slaves per comment #24.
Flags: needinfo?(vlad.ciobancai)
Attachment #8680693 - Flags: review?(kmoir)
Comment 32•9 years ago
Updated the patch
Attachment #8680693 - Attachment is obsolete: true
Attachment #8680693 - Flags: review?(kmoir)
Attachment #8680715 - Flags: review?(kmoir)
Assignee
Comment 33•9 years ago
Pandas from bug1193002.patch removed from the slavealloc db.
Updated•9 years ago
Flags: needinfo?(alin.selagea)
Assignee
Updated•9 years ago
Attachment #8680715 - Flags: review?(kmoir) → review+
Comment 34•9 years ago
(In reply to Vlad Ciobancai [:vladC] from comment #32)
> Created attachment 8680715 [details] [diff] [review]
> bug1193002_v2.patch
>
> Updated the patch
Disabled all the pandas from the above patch in slavealloc.
Assignee
Updated•9 years ago
Attachment #8680715 - Flags: checked-in+
Assignee
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard