Closed Bug 851579 Opened 11 years ago Closed 11 years ago

Please reimage mv-moz2-linux-ix-slave[05-23] as linux cent6.2 kickstartable foopies

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86_64
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: arich)

References

Details

(Whiteboard: [reit])

At least these 3 foopies have been failing rm -rf * steps repeatedly of late, usually several of them at once.

I suspect bad wIO due to the OS X version, but I'm not sure that is the cause.

Currently we can't verify wIO on these because gmond is unable to chart it.

The eventual goal is to reimage all Mac foopies as Linux foopies (prior to being able to move to newer hardware, as we use for the panda racks).

This will help us get to a sane, single platform for our foopy-based work.

Once you have an ETA for doing this work, let me know so I can make sure the machines are not actively taking jobs.
We can re-use one of the iX systems that's slated to be retasked as a mock builder if you want centos 6 (this would be trivial).  This is what I suggested in the planning meeting this week.

We don't have the time/manpower to create a completely new imaging procedure and OS for a few minis we're going to repurpose at the end of the year (at the latest), though.
Assignee: server-ops-releng → arich
Whiteboard: [reit]
After talking with Amy about the difficulties this would present in terms of imaging, we're going to use some of the iX machines in mtv instead. Morphing this bug to reflect that.

Amy: I just wrote https://bugzilla.mozilla.org/show_bug.cgi?id=847529#c8 but is there a different procedure for imaging foopies vs other centos6 machines, i.e. should the machines destined to become foopies be removed entirely from bug 847529 and be dealt with separately here?
No longer blocks: 839375, 843918, 851562
Depends on: 721456, 847529
Summary: Please reimage foopy19, 20, 24 as linux cent6.2 kickstartable foopies → Please reimage mv-moz2-linux-ix-slave[05-23] as linux cent6.2 kickstartable foopies
These will be foopy109 - foopy127.  I'm doing a kickstart of foopy125 now.
So far I've done:

foopy119.build.mtv1.mozilla.com
foopy120.build.mtv1.mozilla.com
foopy121.build.mtv1.mozilla.com

foopy123.build.mtv1.mozilla.com

foopy125.build.mtv1.mozilla.com
foopy126.build.mtv1.mozilla.com
foopy127.build.mtv1.mozilla.com
The following are also done now:

foopy109.build.mtv1.mozilla.com
foopy110.build.mtv1.mozilla.com
foopy111.build.mtv1.mozilla.com
foopy112.build.mtv1.mozilla.com
foopy113.build.mtv1.mozilla.com
foopy114.build.mtv1.mozilla.com
foopy115.build.mtv1.mozilla.com
foopy116.build.mtv1.mozilla.com
foopy117.build.mtv1.mozilla.com
foopy118.build.mtv1.mozilla.com

Still waiting on hands-on help from dcops for:

foopy122.build.mtv1.mozilla.com
foopy124.build.mtv1.mozilla.com
moving as follows:

Mac foopies
foopy05 --> foopy109
foopy06 --> foopy110
foopy07 --> foopy111
foopy08 --> foopy112
foopy09 --> foopy113
foopy10 --> foopy114
foopy11 --> foopy115
foopy12 --> foopy116
foopy13 --> foopy117
foopy14 --> foopy118
foopy15 --> foopy119
foopy16 --> foopy120
foopy17 --> foopy121
foopy18 --> foopy122 ****on hold****
foopy19 --> foopy123
foopy20 --> foopy124 ****on hold****
foopy21μ --> (no replacement)
foopy22 --> foopy125
foopy23 --> foopy126
foopy24 --> foopy127

μ (decommissioned already)

with the following command (from the foopies, with my own ssh agent forwarded to make life easier)

tar -c /builds/{tegra,panda}-*/{buildbot.tac,*.flg} | ssh root@foopy$foopy.build.mozilla.org "tar x"; ssh root@foopy$foopy.build.mozilla.org "mv builds/* /builds/; rm -rf builds; chown -R cltbld.cltbld /builds/tegra*"; for i in /builds/tegra*/; do touch $i/disabled.flg; done

This basically took the *.flg files (including disabled.flg) and buildbot.tac along, and created the device dirs on the new foopies.
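
One detail of the command above that is easy to miss is why the remote side needs "mv builds/* /builds/". When tar archives absolute paths it drops the leading "/" from member names, so "tar x" on the target unpacks into a relative builds/ directory under whatever directory ssh lands in. Below is a minimal local sketch of that behaviour, assuming GNU tar or bsdtar defaults. The temporary directories and the tegra-001 device name are made up for illustration, and nothing here touches a real foopy.

```shell
#!/bin/sh
# Local demonstration only: no ssh, no real /builds tree.
src=$(mktemp -d)   # stands in for the old foopy's filesystem root
dst=$(mktemp -d)   # stands in for the login directory on the new foopy

# Hypothetical device directory holding the two kinds of file the command copies.
mkdir -p "$src/builds/tegra-001"
touch "$src/builds/tegra-001/buildbot.tac" "$src/builds/tegra-001/disabled.flg"

# Archive by absolute path, then extract somewhere else.
# "-f -" makes the use of stdout/stdin explicit.
# tar warns on stderr that it is removing the leading "/" from member names.
tar -cf - "$src"/builds/tegra-*/buildbot.tac "$src"/builds/tegra-*/*.flg 2>/dev/null |
  (cd "$dst" && tar -xf -)

# The files land under $dst, re-rooted at the original path minus its leading "/".
extracted="$dst$src/builds/tegra-001"
ls "$extracted"
```

In the real command the same effect leaves a builds/ directory in the remote login directory, which is what the follow-up mv and rm -rf clean up.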

I then waited ~10 minutes (for the old slaves to shut down), started the masters, then started watch_devices on the new foopies and stopped watch_devices on the old.

I then copied to the new foopies the ssh key needed to push the state logs to the tegra dashboard, and rm -rf'd the device dirs on their old foopies.

Lastly, I landed http://hg.mozilla.org/build/tools/rev/706c0231416f for the devices.json update, and updated the repo on all non-scl1 foopies.

---

foopy18 and 20 are now the only remaining mac foopies
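
For anyone scripting against the move list above, here is a small sketch of the old-to-new numbering. It only encodes the pairs listed in this comment (foopy05 through foopy20 shift by 104, foopy22 through foopy24 shift by 103, foopy21 has no replacement). The helper name new_foopy_for is hypothetical and is not something that exists in build/tools.

```shell
#!/bin/sh
# Sketch: map an old Mac foopy number to its Linux replacement, per the list above.
# new_foopy_for is a hypothetical helper name.
new_foopy_for() {
  old=${1#0}                      # drop one leading zero so "08" is not read as octal
  if [ "$old" -ge 5 ] && [ "$old" -le 20 ]; then
    echo $((old + 104))           # foopy05..foopy20 -> foopy109..foopy124
  elif [ "$old" -ge 22 ] && [ "$old" -le 24 ]; then
    echo $((old + 103))           # foopy22..foopy24 -> foopy125..foopy127
  else
    return 1                      # foopy21 (decommissioned) or not in the list
  fi
}

for n in 05 19 21 24; do
  if new=$(new_foopy_for "$n"); then
    echo "foopy$n --> foopy$new"
  else
    echo "foopy$n has no replacement"
  fi
done
```

The two ranges are kept separate rather than folded into one formula because the gap left by foopy21 changes the offset.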
(In reply to Amy Rich [:arich] [:arr] from comment #5)
> Still waiting on hands on help from dcops for 
> 
> foopy122.build.mtv1.mozilla.com
> foopy124.build.mtv1.mozilla.com

Amy,

To help me with planning, do we have an ETA on getting these ones addressed yet?
Flags: needinfo?(arich)
I'm still waiting on DCOps to take a look at them, and they've been swamped with all the new releng hardware coming into scl3 as well as trying to get the power stuff done in scl1 for the additional mac minis.  Their work is being tracked in bug 712456.
Depends on: 712456
Flags: needinfo?(arich)
Did the other 17 machines work as foopies?  I'm also wondering if we can spread out the load of the last two minis over the existing iX machines (if they can handle a higher load).
foopy124 had a bad DIMM which has been replaced.

foopy122 still doesn't answer on the IPMI interface (van tried various things to get it to stay up), but it's been imaged by van from the console, so it's up and running without IPMI (which, hey, no worse than the mac foopies).

I had also opened bug 853835 for foopy127 which went down unexpectedly with a ton of network errors.  The cable has been swapped out and the machine rebooted.  I haven't seen any errors in ifconfig eth0 since.
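
A rough way to repeat the "no errors in ifconfig eth0" check is to total the error counters from the command's output. The sketch below assumes the older net-tools layout (lines such as "RX packets:N errors:N ..."), and newer ifconfig versions print these counters differently. The helper name count_if_errors and the sample text are made up for illustration and are not real foopy127 output.

```shell
#!/bin/sh
# Sketch: total the RX and TX "errors:" counters from net-tools style ifconfig output.
# count_if_errors is a hypothetical helper; it reads ifconfig text on stdin.
count_if_errors() {
  awk '/[RT]X packets:/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^errors:/) { split($i, kv, ":"); total += kv[2] }
       }
       END { print total + 0 }'
}

# Made-up sample lines in the older net-tools layout.
sample='          RX packets:1200 errors:3 dropped:0 overruns:0 frame:0
          TX packets:900 errors:1 dropped:0 overruns:0 carrier:0'

printf '%s\n' "$sample" | count_if_errors   # prints 4
```

On a live machine the equivalent would be "ifconfig eth0 | count_if_errors", with a result of 0 matching what is described above.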
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 848983
Blocks: 848995
Blocks: 848992
(In reply to Justin Wood (:Callek) from comment #6)
> foopy18 and 20 are now the only remaining mac foopies

These are now done!
(In reply to Justin Wood (:Callek) from comment #11)
> (In reply to Justin Wood (:Callek) from comment #6)
> > foopy18 and 20 are now the only remaining mac foopies
> 
> These are now done!

Well, except for the devices.json update (which the dashboard uses): http://hg.mozilla.org/build/tools/rev/032e79f19a88
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations