Closed
Bug 851579
Opened 11 years ago
Closed 11 years ago
Please reimage mv-moz2-linux-ix-slave[05-23] as linux cent6.2 kickstartable foopies
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: arich)
References
Details
(Whiteboard: [reit])
At least these 3 foopies have repeatedly failed rm -rf * steps recently, usually several of them at once. I suspect bad wIO due to the OSX version, but I'm not sure that is the cause. Currently we can't verify wIO on these machines because gmond is unable to chart it.

The eventual goal is to reimage all Mac foopies as Linux foopies (prior to being able to move to newer hardware, as we use for the panda racks). This will help us get to a sane, single platform for our foopy-based work.

Once you have an ETA for doing this work, let me know so I can make sure the machines are not actively taking jobs.
Assignee
Comment 1•11 years ago
We can re-use one of the iX systems that's slated to be retasked as a mock builder if you want centos 6 (this would be trivial). This is what I suggested in the planning meeting this week. We don't have the time/manpower to create a completely new imaging procedure and OS for a few minis we're going to repurpose at the end of the year (at the latest), though.
Assignee
Updated•11 years ago
Assignee: server-ops-releng → arich
Updated•11 years ago
Whiteboard: [reit]
Comment 2•11 years ago
After talking with Amy about the difficulties this would present in terms of imaging, we're going to use some of the iX machines in mtv instead. Morphing this bug to reflect that.

Amy: I just wrote https://bugzilla.mozilla.org/show_bug.cgi?id=847529#c8 but is there a different procedure for imaging foopies vs. other centos6 machines, i.e. should the machines destined to become foopies be removed entirely from bug 847529 and be dealt with separately here?
Assignee
Comment 3•11 years ago
These will be foopy109 - foopy127. I'm doing a kickstart of foopy125 now.
Assignee
Comment 4•11 years ago
So far I've done:

foopy119.build.mtv1.mozilla.com
foopy120.build.mtv1.mozilla.com
foopy121.build.mtv1.mozilla.com
foopy123.build.mtv1.mozilla.com
foopy125.build.mtv1.mozilla.com
foopy126.build.mtv1.mozilla.com
foopy127.build.mtv1.mozilla.com
Assignee
Comment 5•11 years ago
The following are also done now:

foopy109.build.mtv1.mozilla.com
foopy110.build.mtv1.mozilla.com
foopy111.build.mtv1.mozilla.com
foopy112.build.mtv1.mozilla.com
foopy113.build.mtv1.mozilla.com
foopy114.build.mtv1.mozilla.com
foopy115.build.mtv1.mozilla.com
foopy116.build.mtv1.mozilla.com
foopy117.build.mtv1.mozilla.com
foopy118.build.mtv1.mozilla.com

Still waiting on hands on help from dcops for

foopy122.build.mtv1.mozilla.com
foopy124.build.mtv1.mozilla.com
Reporter
Comment 6•11 years ago
moving as follows:

Mac foopies
foopy05 --> foopy109
foopy06 --> foopy110
foopy07 --> foopy111
foopy08 --> foopy112
foopy09 --> foopy113
foopy10 --> foopy114
foopy11 --> foopy115
foopy12 --> foopy116
foopy13 --> foopy117
foopy14 --> foopy118
foopy15 --> foopy119
foopy16 --> foopy120
foopy17 --> foopy121
foopy18 --> foopy122 ****on hold****
foopy19 --> foopy123
foopy20 --> foopy124 ****on hold****
foopy21μ --> <-->
foopy22 --> foopy125
foopy23 --> foopy126
foopy24 --> foopy127

μ (decommissioned already)

with the following command (from the foopies, with my own ssh agent forwarded to make life easier):

tar -c /builds/{tegra,panda}-*/{buildbot.tac,*.flg} | ssh root@foopy$foopy.build.mozilla.org "tar x"; ssh root@foopy$foopy.build.mozilla.org "mv builds/* /builds/; rm -rf builds; chown -R cltbld.cltbld /builds/tegra*"; for i in /builds/tegra*/; do touch $i/disabled.flg; done

This basically took the *.flg files (including disabled) along and created the directories on the new foopies.

I then waited ~10 minutes (for the old slaves to shut down), started the masters, then started watch_devices on the new foopies and stopped watch_devices on the old ones.

I then copied to the new foopies the ssh key needed to push the state logs to the tegra dashboard, and ran rm -rf on the device directories on their old foopies.

Lastly, I landed http://hg.mozilla.org/build/tools/rev/706c0231416f for the devices.json update, and updated the repo on all non-scl1 foopies.

---
foopy18 and 20 are now the only remaining mac foopies
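
For anyone wanting to rehearse the tar step against a scratch directory before touching a real foopy, here is a minimal local sketch. It is not the command that was run: copy_device_state and its two arguments are hypothetical names, it copies between two local directories instead of piping through ssh, and it assumes each device directory sits directly under the builds root as tegra-* or panda-*.

```shell
# Sketch only: copy_device_state, src_root and dst_root are hypothetical
# names, not part of the original procedure.
# Copies buildbot.tac and *.flg for every tegra-*/panda-* directory from
# one builds root to another, keeping the per-device directory layout.
copy_device_state() {
    src_root=$1
    dst_root=$2
    mkdir -p "$dst_root" || return 1
    (
        cd "$src_root" || exit 1
        set --
        for f in tegra-*/buildbot.tac tegra-*/*.flg \
                 panda-*/buildbot.tac panda-*/*.flg; do
            # Unmatched globs stay literal, so keep only paths that exist.
            if [ -e "$f" ]; then set -- "$@" "$f"; fi
        done
        # Nothing matched: there is no state to copy.
        [ "$#" -gt 0 ] || exit 0
        # Relative paths in the archive let it unpack under any root.
        tar -cf - "$@"
    ) | ( cd "$dst_root" && tar -xf - )
}
```

Calling copy_device_state /tmp/old-builds /tmp/new-builds should leave only the state files under the second directory; logs and other per-device files are skipped, which matches the intent of the original glob.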
Reporter
Comment 7•11 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #5)
> Still waiting on hands on help from dcops for
>
> foopy122.build.mtv1.mozilla.com
> foopy124.build.mtv1.mozilla.com

Amy,

To help me with planning, do we have an ETA on getting these ones addressed yet?
Reporter
Updated•11 years ago
Flags: needinfo?(arich)
Assignee
Comment 8•11 years ago
I'm still waiting on DCOps to take a look at them, and they've been swamped with all the new releng hardware coming into scl3 as well as trying to get the power stuff done in scl1 for the additional mac minis. Their work is being tracked in bug 712456.
Depends on: 712456
Flags: needinfo?(arich)
Assignee
Comment 9•11 years ago
Did the other 17 machines work as foopies? I'm also wondering if we can spread out the load of the last two minis over the existing iX machines (if they can handle a higher load).
Assignee
Comment 10•11 years ago
foopy124 had a bad DIMM which has been replaced. foopy122 still doesn't answer on the IPMI interface (van tried various things to get it to stay up), but it's been imaged by van from the console, so it's up and running without IPMI (which, hey, no worse than the mac foopies). I had also opened bug 853835 for foopy127 which went down unexpectedly with a ton of network errors. The cable has been swapped out and the machine rebooted. I haven't seen any errors in ifconfig eth0 since.
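
One way to keep an eye on the interface counters mentioned above is to sum the error fields rather than eyeball them. The helper below is a hypothetical sketch, not something used in this bug: count_if_errors is an invented name, and it assumes the older net-tools ifconfig layout where counters appear as "RX packets:N errors:N ..." and "TX packets:N errors:N ...".

```shell
# Hypothetical helper: sum the RX and TX "errors:" counters from ifconfig
# output read on stdin. Assumes the older net-tools layout, for example:
#   RX packets:1200 errors:0 dropped:0 overruns:0 frame:0
count_if_errors() {
    awk '
        /(RX|TX) packets:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^errors:/) {
                    split($i, kv, ":")
                    total += kv[2]
                }
        }
        END { print total + 0 }
    '
}
```

It would be used as: ifconfig eth0 | count_if_errors. A non-zero result after the cable swap would suggest the problem is not the cable.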
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Reporter
Comment 11•11 years ago
(In reply to Justin Wood (:Callek) from comment #6)
> foopy18 and 20 are now the only remaining mac foopies

These are now done!
Reporter
Comment 12•11 years ago
(In reply to Justin Wood (:Callek) from comment #11)
> (In reply to Justin Wood (:Callek) from comment #6)
> > foopy18 and 20 are now the only remaining mac foopies
>
> These are now done!

well except for the devices.json (which the dashboard uses)
http://hg.mozilla.org/build/tools/rev/032e79f19a88
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations