Closed Bug 635964 Opened 14 years ago Closed 14 years ago

re-purpose 16 darwin9 minis for darwin10 service

Categories

(Release Engineering :: General, defect, P4)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

(Whiteboard: [slaveduty][buildslaves])

Attachments

(4 files, 1 obsolete file)

The following slaves were killed in bug 604497 (which is secret for unrelated reasons), and will be brought back up as 10.6 builders and try systems.
moz2-darwin9-slave29
moz2-darwin9-slave30
moz2-darwin9-slave31
moz2-darwin9-slave32
moz2-darwin9-slave33
moz2-darwin9-slave34
moz2-darwin9-slave35
moz2-darwin9-slave36
moz2-darwin9-slave37
try-mac-slave20
try-mac-slave21
try-mac-slave22
try-mac-slave23
try-mac-slave24
try-mac-slave25
try-mac-slave26
try-mac-slave27
try-mac-slave28
try-mac-slave29
Spencer is currently cracking them and adding RAM in the IT area.
As well as the RAM upgrade, these machines need to be reimaged with OS 10.6 and renamed to:
moz2-darwin10-slave{53...61}
try-mac64-slave{27...36}
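For reference, the target names above can be enumerated with a short Python sketch. It assumes the `{first...last}` notation means an inclusive numeric range; the helper name `expand` and its `skip` parameter are hypothetical, introduced only for illustration.

```python
def expand(prefix, first, last, skip=()):
    """Return hostnames prefix<first>..prefix<last>, inclusive, omitting numbers in skip."""
    return [f"{prefix}{n}" for n in range(first, last + 1) if n not in skip]


# Assumption: both ranges from the comment are inclusive at both ends.
builders = expand("moz2-darwin10-slave", 53, 61)
try_slaves = expand("try-mac64-slave", 27, 36)

for name in builders + try_slaves:
    print(name)
```

The `skip` argument is there in case a number in a range has to be left unassigned.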
The following Mac minis were re-imaged and had their RAM upgraded from 2 GB to 4 GB:
QM-PLEOPARD-TRY12
QM-PLEOPARD-TRY13
QM-PLEOPARD-TRY14
QM-PLEOPARD-TRY15
QM-PLEOPARD-TRY16
QM-PUBUNTU-TRY12 (could not upgrade RAM; screw is stripped)
QM-PUBUNTU-TRY13
QM-PUBUNTU-TRY14
QM-PUBUNTU-TRY15
QM-PUBUNTU-TRY16
MOZ2-DARWIN9-SLAVE30
MOZ2-DARWIN9-SLAVE29
MOZ2-DARWIN9-SLAVE31
MOZ2-DARWIN9-SLAVE32
MOZ2-DARWIN9-SLAVE33
MOZ2-DARWIN9-SLAVE34
MOZ2-DARWIN9-SLAVE35
MOZ2-DARWIN9-SLAVE36
MOZ2-DARWIN9-SLAVE37
ZACK-TESTING
(In reply to comment #2)
> Following mac minis re-imaged and had their ram upgraded from 2gb to 4 gb
I'm assuming these are old names for the machines?
> QM-PLEOPARD-TRY12
> QM-PLEOPARD-TRY13
> QM-PLEOPARD-TRY14
> QM-PLEOPARD-TRY15
> QM-PLEOPARD-TRY16
> QM-PUBUNTU-TRY12 could not upgrade ram(screw is stripped)
> QM-PUBUNTU-TRY13
> QM-PUBUNTU-TRY14
> QM-PUBUNTU-TRY15
> QM-PUBUNTU-TRY16
So the above came from another pool of unused minis?
> MOZ2-DARWIN9-SLAVE30
> MOZ2-DARWIN9-SLAVE29
> MOZ2-DARWIN9-SLAVE31
> MOZ2-DARWIN9-SLAVE32
> MOZ2-DARWIN9-SLAVE33
> MOZ2-DARWIN9-SLAVE34
> MOZ2-DARWIN9-SLAVE35
> MOZ2-DARWIN9-SLAVE36
> MOZ2-DARWIN9-SLAVE37
Presumably those are the same moz2-darwin9-slaveNN mentioned in comment 0.
> ZACK-TESTING
And this was in our inventory already.
These have been reimaged, but what names have they been given? And what happened to the try-mac-slaveNN mentioned in comment 0? Did you do any additional diagnostics on try-mac-slave28?
The QM- hostnames were on the front of the machines; those are actually the try-mac-slave machines listed in comment 0. Spencer, you'll find them in inventory under the try-mac names that are on the backs of the machines. Please give them new names per comment 1 and update inventory accordingly. As to try-mac-slave28, let's see what happens post-imaging.
(In reply to comment #4)
> As to try-mac-slave28, let's see what happens post-imaging.
IIRC, this slave was also causing grief before imaging. Maybe it needs to go to greener pastures?
zandr/ssh are on top of it.
All Macs have been renamed, relabeled, and updated in inventory:
moz2-darwin10-slave{53...61}
try-mac64-slave{27...36}
John said he had trouble trying to ssh into TRY-MAC64-SLAVE35.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Thanks, spencer! Now we need to bring these back up and attach them to masters - a slaveduty job.
Assignee: shui → server-ops-releng
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
oops, back to the releng component for bear.
Assignee: server-ops-releng → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Power and network would be a good first step, actually. /me goes back to the server closet to make that happen.
Is nagios updated to monitor these new slaves? Can we set that up once they're powered up? New bug, or on this one?
Found in triage with zandr:
(In reply to comment #10)
> Power and network would be a good first step, actually. /me goes back to the
> server closet to make that happen.
1) Instead of putting these in the new server room, we will just put them back on the shelf in the QA lab and connect them to power/network there.
2) nagios not yet done.
Not sure why this is in RelEng; moving to IT to resolve.
Assignee: nobody → shui
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Whiteboard: [slaveduty][buildslaves] → [slaveduty][buildslaves][subject to embargo]
Powered up and networked in a much neater AFK. It's Friday at 5; I'll investigate dns/dhcp later.
I started looking at these Friday night, and inventory/dns/hostname are inconsistent, with at least two missing from inventory altogether. Spencer, could you verify each of
moz2-darwin10-slave{53...61}
try-mac64-slave{27...36}
and make sure that each machine:
* is in inventory with a correct IP address
* is in DNS with a correct IP address and hostname
* has its hostname set to match
* is labeled to match the above.
Since everything else is software, I'd start by comparing labels to MAC addresses on the physical boxes, and work through DHCP to DNS to inventory.
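The label-to-MAC-to-DHCP-to-DNS-to-inventory walk described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `find_mismatches`, the dictionary shapes, and all MAC and IP values are placeholders (documentation ranges), not data from the real inventory. The on-machine hostname check is not covered here, since it needs access to each box.

```python
def find_mismatches(labels, dhcp, dns, inventory):
    """Cross-check records, starting from the physical label.

    labels:    MAC address -> hostname written on the physical label
    dhcp:      MAC address -> IP address handed out by DHCP
    dns:       hostname    -> IP address in forward DNS
    inventory: hostname    -> IP address recorded in inventory
    """
    problems = []
    for mac, host in sorted(labels.items()):
        ip = dhcp.get(mac)
        if ip is None:
            problems.append((host, "no DHCP entry for MAC"))
            continue
        if dns.get(host) != ip:
            problems.append((host, "DNS does not match DHCP"))
        if host not in inventory:
            problems.append((host, "missing from inventory"))
        elif inventory[host] != ip:
            problems.append((host, "inventory IP does not match DHCP"))
    return problems


# Placeholder data only; none of these MACs or IPs come from the bug.
labels = {"00:00:5e:00:53:01": "moz2-darwin10-slave53",
          "00:00:5e:00:53:02": "try-mac64-slave27"}
dhcp = {"00:00:5e:00:53:01": "192.0.2.53",
        "00:00:5e:00:53:02": "192.0.2.27"}
dns = {"moz2-darwin10-slave53": "192.0.2.53",
       "try-mac64-slave27": "192.0.2.99"}
inventory = {"moz2-darwin10-slave53": "192.0.2.53"}

problems = find_mismatches(labels, dhcp, dns, inventory)
for host, issue in problems:
    print(host, issue)
```

Starting from the label keeps the physical box as the source of truth, which matches the suggestion that everything else is software and can be corrected to fit.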
OK, verification complete. Inventory/DNS/DHCP updated and verified against hostname and physical labeling. Over to Releng for setup of new slaves:
moz2-darwin10-slave53-61
try-mac64-slave27,29-36
NB: There is no try-mac64-slave28. That machine (the former try-mac-slave29) has a stripped screw that is currently preventing upgrade.
Assignee: shui → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Status: REOPENED → NEW
Priority: -- → P3
This is still waiting for setup. Bump the priority if we're seeing wait times that suggest these slaves will be useful.
Priority: P3 → P4
I'll get puppet set up, and leave the re-mastering to later.
Assignee: nobody → dustin
Attachment #521859 - Flags: review?(bhearsum)
New version with the old machines removed
Attachment #521859 - Attachment is obsolete: true
Attachment #521859 - Flags: review?(bhearsum)
Attachment #521865 - Flags: review?(bhearsum)
Attachment #521865 - Flags: review?(bhearsum) → review+
OK, these machines are all talking to mv-production-puppet now, but not doing any buildbottish work.
This should add these machines to the staging configuration, for testing purposes.
Attachment #521905 - Flags: review?(catlee)
I'm seeing
Mar 25 12:51:07 production-puppet puppetmasterd[19135]: Allowing unauthenticated client try-mac64-slave38.build.mozilla.org(10.250.48.198) access to puppetca.getcert
Mar 25 12:51:07 production-puppet puppetmasterd[19135]: Certificate request does not match existing certificate; run 'puppetca --clean try-mac-slave28.build.mozilla.org'.
in the mpt-production-puppet logs. Is this the missing try-mac64-slave28? Or the previously-problematic try-mac-slave28? Or both?
198.48.250.10.in-addr.arpa domain name pointer try-mac64-slave38.build.mozilla.org.
No forward DNS for either hostname.
? (10.250.48.198) at 00:16:CB:B0:75:23 [ether] on eth0
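A forward/reverse DNS comparison like the one above can be scripted with Python's standard `socket` module. This is a minimal sketch, not a tool used in this bug: the function name `check_dns` and its injectable `forward` and `reverse` parameters are assumptions made so the logic can be exercised without a live resolver.

```python
import socket


def check_dns(hostname, ip, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of forward/reverse DNS problems for one host (empty if consistent)."""
    issues = []
    try:
        resolved = forward(hostname)
        if resolved != ip:
            issues.append(f"forward: {hostname} -> {resolved}, expected {ip}")
    except OSError:  # socket.gaierror and socket.herror both derive from OSError
        issues.append(f"forward: no record for {hostname}")
    try:
        # gethostbyaddr returns (hostname, aliases, addresses); drop any trailing dot
        ptr = reverse(ip)[0].rstrip(".")
        if ptr != hostname:
            issues.append(f"reverse: {ip} -> {ptr}, expected {hostname}")
    except OSError:
        issues.append(f"reverse: no record for {ip}")
    return issues
```

Calling `check_dns("try-mac64-slave28.build.mozilla.org", "10.250.48.198")` against a live resolver would report each direction separately, which helps distinguish a missing forward record from a stale PTR record.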
Assignee: dustin → nobody
Attachment #521905 - Flags: review?(catlee) → review+
Attachment #521905 - Flags: checked-in+
Assignee: nobody → dustin
moz2-darwin10-slave53-61 are up and running on sm02 (via slave alloc, at that!)
try-mac64-slave27,29-36 are up and running on sm02 as well.
/etc/hosts entries for the re-tasked hosts removed from bm-admin01, and their nagios configuration modified.
Let's hold off on moving these to production for a moment - joduinn is suggesting re-re-purposing them for developers, since we're now over-capacity on these builders.
Attachment #523403 - Flags: review?(dustin)
Attachment #523403 - Flags: review?(dustin) → review+
As of bug 647051, this bug is now only about
moz2-darwin10-slave53
moz2-darwin10-slave54
moz2-darwin10-slave55
moz2-darwin10-slave56
try-mac64-slave27
try-mac64-slave29
try-mac64-slave30
try-mac64-slave31
which are all still in AFK, powered on, and in staging.
Attachment #523641 - Flags: review?(dustin) → review+
aki, if I switch the ssh keys and select a production pool in slavealloc, are the production masters ready to welcome these new slaves?
Whiteboard: [slaveduty][buildslaves][subject to embargo] → [slaveduty][buildslaves]
In production now, via slavealloc (keys changed too, don't worry):
moz2-darwin10-slave53
moz2-darwin10-slave54
moz2-darwin10-slave55
moz2-darwin10-slave56
try-mac64-slave27
try-mac64-slave29
try-mac64-slave30
try-mac64-slave31
Status: NEW → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
? (10.250.48.198) at 00:16:CB:B0:75:23 [ether] on eth0
This is physically labeled as try-mac64-slave28, and forward DNS is correct for this host. Reverse, not so much. Fixed.
^^ copied to bug 652983
I am removing many of these slaves from our configs and from slavealloc in bug 700705.
Product: mozilla.org → Release Engineering