qm-pleopard-trunk04 and talos-rev2-leopard03 aren't able to reboot via the count_and_reboot.py step. It looks like sudo is asking for a password, which is strange. Neither has DISPLAY set in the environment, which suggests to me that buildbot isn't being started from a terminal properly. I just rebooted both of them manually, and they're still not able to reboot after a talos run. Maybe something's different in how buildbot is started on boot on these machines?
I've seen some issues with leopard machines being unable to restart buildbot, and have gotten around them by adding a cron job that restarts buildbot if it is not found running after a reboot. Maybe we could just do something like that here? Moving to Future for now until we have someone to assign this to.
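For illustration only, the cron workaround described above could be sketched roughly as follows. This is not the script actually deployed on those machines. The `buildbot.tac` marker, the `ps` invocation, and the `buildbot start <basedir>` command are assumptions about how the slave is launched, and the function names are hypothetical.

```python
import subprocess


def buildbot_running(ps_output, marker="buildbot.tac"):
    """Return True if any line of the process listing mentions the marker.

    The marker is an assumption: it presumes the slave shows up in the
    process list as a twistd process pointing at buildbot.tac.
    """
    return any(marker in line for line in ps_output.splitlines())


def ensure_buildbot(basedir, list_processes=None, start=None):
    """Start buildbot if it is not already running.

    Returns True if a start was attempted, False if buildbot was found.
    list_processes and start can be replaced for testing.
    """
    if list_processes is None:
        def list_processes():
            out = subprocess.check_output(["ps", "-ax", "-o", "command"])
            return out.decode("utf-8", "replace")
    if start is None:
        def start():
            # Assumed start command; adjust to however the slave is launched.
            return subprocess.call(["buildbot", "start", basedir])

    if buildbot_running(list_processes()):
        return False
    start()
    return True
```

A cron entry would then call `ensure_buildbot()` with the slave's base directory every few minutes.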
Component: Release Engineering → Release Engineering: Future
(In reply to comment #2)
> I've seen some issues with leopard machines unable to restart buildbot and have
> gotten around the issue by adding a cron job that restarts buildbot if it is
> not found post-reboot.
Hmm, this sounds like it could be the culprit. Maybe we should use the same method as we're using on the build slaves.
Going to try adding <key>StartInterval</key> <integer>600</integer> to the buildbot launch agent on some of the leopard slaves, so that launchd retries starting the job every 600 seconds. So far this has been added to talos-rev2-leopard01 (which is in staging, but hasn't shown this problem) and talos-rev2-leopard03 (which is in production, and has exhibited this problem).
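As a rough sketch of how that key could be applied to many slaves without hand-editing each plist, something like the following might work. It assumes Python 3's `plistlib`; the function name is hypothetical, and the plist path is left as a parameter because the thread doesn't give the launch agent's actual location.

```python
import plistlib


def set_start_interval(plist_path, seconds=600):
    """Add or overwrite StartInterval in a launchd plist; return the updated dict."""
    with open(plist_path, "rb") as fp:
        agent = plistlib.load(fp)

    # launchd starts the job every `seconds` seconds when this key is set.
    agent["StartInterval"] = seconds

    with open(plist_path, "wb") as fp:
        plistlib.dump(agent, fp)
    return agent
```

The agent would still need to be reloaded (or the machine rebooted) before launchd picks up the change.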
Assignee: nobody → catlee
Component: Release Engineering: Future → Release Engineering
OS: Mac System 9.x → Mac OS X
Priority: -- → P2
All production machines have been updated with this. Still need to update the reference image.
Reference machine has been updated; waiting for a new image to be taken (bug 505761).
Depends on: 505761
Done like dinner.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering