Some leopard boxes not rebooting properly

RESOLVED FIXED

Status

Release Engineering
General
P2
normal
RESOLVED FIXED
9 years ago
5 years ago

People

(Reporter: catlee, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Comment 1

9 years ago
qm-pleopard-trunk04 and talos-rev2-leopard03 aren't able to reboot via the count_and_reboot.py step.  It looks like sudo is asking for a password, which is strange.

Neither has DISPLAY set in its environment, which to me says that buildbot isn't being started from a terminal session properly.

I just rebooted both of them manually, and they're still not able to reboot post-talos run.
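
For anyone wanting to reproduce the two observations above on a slave, here is a minimal sketch rather than anything that was actually run for this bug. The helper names are made up for illustration, and it assumes the installed sudo supports the `-n` (non-interactive) option.

```python
import subprocess


def missing_session_vars(env, expected=("DISPLAY",)):
    """Return the expected variable names that are absent or empty in env."""
    return [name for name in expected if not env.get(name)]


def sudo_would_prompt():
    """Return True if `sudo -n true` fails.

    With -n, sudo exits non-zero instead of prompting, so a failure here
    usually means a password would be required (or sudo is not permitted).
    Assumption: the installed sudo understands -n.
    """
    result = subprocess.run(
        ["sudo", "-n", "true"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode != 0
```

Calling `missing_session_vars(os.environ)` and `sudo_would_prompt()` from inside a buildbot step, rather than from an interactive login, would show the environment the reboot step actually sees.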

Maybe something's different in how buildbot is started on boot on these machines?

Comment 2

9 years ago
I've seen some issues with leopard machines unable to restart buildbot and have gotten around the issue by adding a cron job that restarts buildbot if it is not found post-reboot.

Maybe we could just do something like that here?
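
A cron-driven check along those lines could look roughly like the sketch below. This is an illustration under assumptions, not the script used on those machines: the function names, the `buildbot` search string, and whatever start command gets passed in are all hypothetical.

```python
import subprocess


def is_running(ps_output, needle="buildbot"):
    """Return True if any line of `ps` output mentions needle."""
    return any(needle in line for line in ps_output.splitlines())


def ensure_running(start_cmd, needle="buildbot"):
    """Run start_cmd unless a matching process already exists.

    Returns True if a start was attempted, False if nothing was needed.
    """
    # `ps -A -o args=` lists every process's command line without a header.
    ps_output = subprocess.run(
        ["ps", "-A", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    if is_running(ps_output, needle):
        return False
    subprocess.run(start_cmd, check=True)
    return True
```

cron would call something like `ensure_running(["buildbot", "start", "/path/to/slave"])` every few minutes, where both the command and the path are placeholders. One caveat: if the watchdog's own command line contains the search string, it will match itself and never restart anything.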

Moving this to Future for now until we have someone to assign it to.
Component: Release Engineering → Release Engineering: Future
(Assignee)

Comment 3

9 years ago
(In reply to comment #2)
> I've seen some issues with leopard machines unable to restart buildbot and have
> gotten around the issue by adding a cron job that restarts buildbot if it is
> not found post-reboot.

Hmm, that sounds like it could be the culprit. Maybe we should use the same method we're using on the build slaves.
(Assignee)

Comment 4

9 years ago
Going to try adding
        <key>StartInterval</key>
        <integer>600</integer>

to the buildbot launch agent on some of the leopard slaves.

So far I've added it to talos-rev2-leopard01 (which is in staging and hasn't shown this problem) and talos-rev2-leopard03 (which is in production and has exhibited this problem).
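
For rolling the same key out to more machines, the edit could be scripted. The following is only a sketch, not how the change was actually made: `add_start_interval` is a hypothetical helper and the plist contents in the usage note are invented. As far as I know, `StartInterval` asks launchd to start the job every N seconds and has no effect while the job is already running, which is what makes it usable as a "restart if missing" mechanism.

```python
import plistlib


def add_start_interval(plist_bytes, seconds=600):
    """Return the launchd job plist with StartInterval set to seconds.

    plist_bytes is the raw content of the launch agent's .plist file;
    the result is serialized back to XML plist bytes.
    """
    job = plistlib.loads(plist_bytes)
    job["StartInterval"] = seconds
    return plistlib.dumps(job)
```

Reading the agent's plist file in binary mode, passing it through `add_start_interval`, and writing the result back would add the same two lines shown above. launchd would still need to reload the job (or the machine reboot) before the change takes effect.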
(Assignee)

Updated

9 years ago
Assignee: nobody → catlee
Component: Release Engineering: Future → Release Engineering
OS: Mac System 9.x → Mac OS X
Priority: -- → P2
(Assignee)

Comment 5

9 years ago
All production machines have been updated with this.

Still need to update the reference image.
(Assignee)

Comment 6

9 years ago
Reference machine has been updated; waiting for a new image to be taken (bug 505761).
Depends on: 505761
(Assignee)

Comment 7

9 years ago
Done like dinner.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering