Closed Bug 709591 Opened 13 years ago Closed 13 years ago

most linux64 builders not running

Categories

(Release Engineering :: General, defect, P2)

x86_64
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: coop)

Details

(Whiteboard: [buildduty])

Attachments

(1 file)

These builders appear to be stuck in puppet. Puppet itself is stuck due to

Dec 11 08:29:24 scl-production-puppet-new puppetmasterd[3663]: Could not parse for environment production: Node talos-r3-leopard-061 is already defined at /etc/puppet/manifests/scl-production.pp:1127; cannot redefine at /etc/puppet/manifests/scl-production.pp:1131

which was landed in bug 683734.
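A conflict like this can be caught before a manifest is deployed. The Python sketch below scans manifest text for node statements and reports any node name that is defined more than once, together with the line numbers involved. It assumes node names are quoted strings on the same line as the `node` keyword. Regex nodes, `default`, and definitions split across lines are not handled. The helper name `find_duplicate_nodes` is hypothetical and is not part of Puppet or any Mozilla tooling.

```python
import re
from collections import defaultdict

# Hypothetical pre-deploy check, not part of Puppet.
# Matches a node statement such as: node 'host-a', "host-b" {
# Assumption: names are quoted; `node default` and regex nodes are skipped.
NODE_RE = re.compile(r"^\s*node\s+((?:['\"][^'\"]+['\"]\s*,?\s*)+)")
NAME_RE = re.compile(r"['\"]([^'\"]+)['\"]")


def find_duplicate_nodes(manifest_text):
    """Return {node_name: [line numbers]} for names defined more than once."""
    seen = defaultdict(list)
    for lineno, line in enumerate(manifest_text.splitlines(), start=1):
        match = NODE_RE.match(line)
        if not match:
            continue
        # A single node statement may list several comma-separated names.
        for name in NAME_RE.findall(match.group(1)):
            seen[name].append(lineno)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    sample = """\
node 'talos-r3-leopard-060' {
    include talos
}
node 'talos-r3-leopard-061' {
    include talos
}
node 'talos-r3-leopard-061' {
    include talos
}
"""
    print(find_duplicate_nodes(sample))
    # -> {'talos-r3-leopard-061': [4, 7]}
```

Run against a manifest before it lands, a non-empty result would flag the same kind of redefinition that the puppetmaster log reports above.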
Attached patch m709591.patch
Assignee: server-ops-releng → djmitche
Landed as a break-fix patch.  scl-production-puppet is getting hammered, but hosts seem to be coming back, if slowly.

Over to releng to monitor and close when this is confirmed fixed.
Assignee: djmitche → nobody
Severity: normal → major
QA Contact: zandr → release
It wasn't coming back - it was just thrashing.

I used iptables to firewall off all but 10.12.48.0/24 so those hosts could get puppetized, and I'll keep adding /24s as each one finishes. It seems to be recovering this way.
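The staged admission described above can be sketched as a function that builds the iptables rules for one stage: accept puppet traffic from each admitted /24, then reject everything else on that port. This is an illustration under assumptions, not the commands actually used on the puppetmaster. It assumes the default puppetmaster port 8140 and the `INPUT` chain, and it only builds argument lists without executing them. Each stage's list is meant to replace the previous stage's rules (for example after flushing them), not to be appended on top. The helper name `staged_rules` and the second subnet, 10.12.49.0/24, are hypothetical.

```python
import ipaddress

# Assumption: default puppetmaster port; adjust for the real deployment.
PUPPET_PORT = 8140


def staged_rules(admitted_subnets, port=PUPPET_PORT):
    """Hypothetical helper: build iptables argument lists that accept
    puppet traffic only from the admitted subnets and reject the rest."""
    rules = []
    for subnet in admitted_subnets:
        # Validates the subnet; raises ValueError if it is malformed.
        net = ipaddress.ip_network(subnet)
        rules.append(["iptables", "-A", "INPUT", "-p", "tcp",
                      "--dport", str(port), "-s", str(net), "-j", "ACCEPT"])
    # Catch-all: any other host hitting the puppet port is refused.
    rules.append(["iptables", "-A", "INPUT", "-p", "tcp",
                  "--dport", str(port), "-j", "REJECT"])
    return rules


if __name__ == "__main__":
    # Each stage admits one more /24 once the previous ones have finished.
    stages = [
        ["10.12.48.0/24"],
        ["10.12.48.0/24", "10.12.49.0/24"],  # second subnet is hypothetical
    ]
    for number, stage in enumerate(stages, start=1):
        print(f"# stage {number}")
        for rule in staged_rules(stage):
            print(" ".join(rule))
```

Keeping the REJECT rule last matters: iptables evaluates rules in order, so the ACCEPT rules for admitted subnets must come before the catch-all.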
catlee landed a fix this morning (bug 711604). I'll make sure any offline slaves sync-up with puppet and make it back into production.
Assignee: nobody → coop
Status: NEW → ASSIGNED
OS: All → Linux
Priority: -- → P2
Hardware: All → x86_64
Whiteboard: [buildduty]
Oops. I did an initial pass back in December, and another one now. Only known-bad linux64 slaves are currently offline.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering