Closed Bug 564658 Opened 14 years ago Closed 14 years ago

Incomplete Mac talos slaves prematurely put in production

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: nthomas)


As bug 564221 comment 5 notes, the 10.6 slaves added by bug 564221 and the 10.5 slaves added by bug 564230, talos-r3-snow-{021...050} and talos-r3-leopard-{041...050}, are not yet production ready, at the very least because they lack hg.

However, as

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273369332.1273369433.31907.gz
Rev3 MacOSX Leopard 10.5.8 mozilla-central opt test crashtest on 2010/05/08 18:42:12
s: talos-r3-leopard-043

and

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273368239.1273368242.28576.gz
Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central debug test crashtest on 2010/05/08 18:23:59
s: talos-r3-snow-045

show, they are already in production, taking test runs, and burning when they try to clone build tools without having hg installed to do so.

This is the remaining thing keeping the tree closed, in the 48th hour of this closure.
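One way to catch a slave like this before it takes jobs would be a pre-flight check that the tools it needs are actually on PATH. The following is only a minimal sketch of that idea, not something these machines are known to run: the function name `missing_tools` and the default tool list are hypothetical, and the only tool the report names as missing is hg.

```python
import shutil


def missing_tools(required=("hg",), which=shutil.which):
    """Return the required executables that cannot be found on PATH.

    `required` defaults to just hg, the one tool this report names as
    missing; a real slave would presumably need a longer list (assumption).
    `which` is injectable so the check can be exercised without touching
    the real PATH.
    """
    return [tool for tool in required if which(tool) is None]
```

A slave would count as ready only when `missing_tools()` returns an empty list; a non-empty result names what still has to be installed before it should accept test runs.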
The issue here is probably bug 564565. Some of these slaves haven't updated from the base reference image using puppet, so they're missing important things like hg.

Of talos-r3-leopard-{041...050}, the ones without a clean puppet run were 43, 45,
47, and 49. Those are now updated, so I've left them in play.
Assignee: nobody → nrthomas
Status: NEW → ASSIGNED
On second thoughts, I'm having some network issues myself, and the connectivity to the nfs mount doesn't seem reliable enough to have these in service. So I've disabled buildbot by renaming the tac file to buildbot.tac.off. The exceptions are
* talos-r3-leopard-050
* talos-r3-snow-021 thru 030
which I can't raise via ssh. Nagios thinks the snow leopard boxes are down too, but isn't configured for the leopard ones yet.
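The rename described above could be scripted roughly as follows. This is a sketch under assumptions: the function name `disable_buildbot` and the `basedir` argument are hypothetical, and the comment does not say where the slave directory lives on these machines, so no path is filled in here.

```python
from pathlib import Path


def disable_buildbot(basedir, tac_name="buildbot.tac"):
    """Rename buildbot.tac to buildbot.tac.off inside `basedir`.

    Returns True if a rename happened, False if there was no tac file
    (for example because the slave was already disabled).
    `basedir` is the slave's buildbot directory; its actual location is
    not given in this bug, so the caller must supply it.
    """
    tac = Path(basedir) / tac_name
    if not tac.exists():
        return False
    tac.rename(tac.with_name(tac_name + ".off"))
    return True
```

Renaming rather than deleting keeps the change reversible: putting a slave back in service is just the opposite rename.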
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Summary: Unready Mac talos slaves prematurely put in production → Incomplete Mac talos slaves prematurely put in production
Blocks: 557294
I imagine maintaining an nfs mount in the face of bug 555794 comment #117 isn't very easy. So the scope would be all the slaves in Castro using puppet - the failure is just much more obvious in new ones. (Translation: don't deploy anything using puppet until the network is functioning better)
I'm surprised this happened. Puppet isn't supposed to exit until it successfully syncs up. I filed bug 564778 on it.
Product: mozilla.org → Release Engineering