Closed Bug 660080 Opened 9 years ago Closed 6 years ago

Rethink Rebooting

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1028191

People

(Reporter: dustin, Unassigned)

Details

(Whiteboard: [buildslaves])

From an earlier email thread:

Our current process for rebooting slaves at the end of a build is
causing multiple headaches:

 * masters tend to get stuck thinking the slave is still connected.  This
is worse with slavealloc, since the slave will not necessarily
re-connect to the same master.  Catlee saw this today.  I think it
happens because the slave powers off without terminating the buildslave
process or even closing the TCP connection.

 * snow and leopard slaves have lately been killing the buildslave
process but then failing to reboot (bug 648665)

 * The pidfiles left around when the buildslave process does not shut
down cleanly cause problems on startup (bug 652847)

 * Where to get count_and_reboot.py from is problematic (bug 646580, bug 659344)

 * buildbot-start monitoring, and in fact the whole approach to ensuring slaves are up to date and healthy, requires frequent slave reboots (so bug 633277 will be WONTFIX'd sooner or later).

I think this may also be responsible for some of our hung reconfigs, but
I can't prove that.  Honestly, I can't prove any of the above.

The proposal is this:

buildslave-0.8.4pre-moz1 ships with the Idleizer, which means it has
innate knowledge of how to reboot the machine.  Once that version is
completely deployed, this could be trivially expanded so that a custom
command sets a "reboot immediately after next disconnect" flag on the
Idleizer.  Then the DisconnectStep would be all that's required to
reboot the slave.
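
A rough sketch of the flag idea, in plain Python rather than real buildslave code. IdleizerSketch, its method names, and the injected reboot callable are all hypothetical stand-ins for illustration; the actual Idleizer interface is not shown in this bug and may look quite different.

```python
class IdleizerSketch:
    """Toy stand-in for the Idleizer; NOT the real buildslave class."""

    def __init__(self, reboot):
        # 'reboot' is an injected callable (assumption) so the sketch can
        # be exercised without actually rebooting anything.
        self._reboot = reboot
        self.reboot_on_disconnect = False
        self.connected = False

    def on_connect(self):
        self.connected = True

    def request_reboot_after_disconnect(self):
        # What the hypothetical custom command would trigger on the slave.
        self.reboot_on_disconnect = True

    def on_disconnect(self):
        self.connected = False
        if self.reboot_on_disconnect:
            # Clear the flag first so one request causes one reboot.
            self.reboot_on_disconnect = False
            self._reboot()


calls = []
idleizer = IdleizerSketch(reboot=lambda: calls.append("reboot"))

idleizer.on_connect()
idleizer.on_disconnect()                    # flag not set: no reboot

idleizer.on_connect()
idleizer.request_reboot_after_disconnect()  # custom command sets the flag
idleizer.on_disconnect()                    # disconnect now reboots
```

The point of the ordering is that the reboot only happens after the connection to the master has been closed cleanly, which is what the first bullet above is missing today.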

Armen, given you've filed a number of reboot-related bugs recently, do you want to work on this?
And in bug 660059, we're looking at making the reboot more verbose.
Blocks: 660059
(In reply to comment #0)
> Armen, given you've filed a number of reboot-related bugs recently, do you
> want to work on this?

No, not necessarily.
Once the right version of buildbot is deployed everywhere I would be more easily persuaded.
Priority: -- → P3
Whiteboard: [buildslaves]
Product: mozilla.org → Release Engineering
Component: Other → General Automation
QA Contact: catlee
Duping forward to bug 1028191.

If we can use Idleizer for this, great! I think the behaviour we want is basically: run buildbot until either
a) it is idle after some small amount of time (30 minutes?), or
b) the current job is done

I think this could be accomplished by triggering the graceful shutdown process 30 minutes after the slave starts.
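
To make that concrete, here is a minimal model of "request a graceful shutdown 30 minutes after start". It is plain Python with made-up names (GracefulShutdownSketch, tick, job_started, job_finished) and does not use any buildbot API; the 30 minute figure is the tentative one from this comment.

```python
SHUTDOWN_AFTER_SECONDS = 30 * 60  # tentative value from the comment above


class GracefulShutdownSketch:
    """Toy model of the policy; all names here are hypothetical."""

    def __init__(self, started_at, limit=SHUTDOWN_AFTER_SECONDS):
        self.deadline = started_at + limit
        self.busy = False
        self.shutdown_requested = False
        self.stopped = False

    def job_started(self):
        # Once a graceful shutdown is pending, refuse new work.
        if self.shutdown_requested:
            return False
        self.busy = True
        return True

    def job_finished(self):
        self.busy = False
        self._maybe_stop()

    def tick(self, now):
        # Called periodically; requests the shutdown once the deadline passes.
        if now >= self.deadline:
            self.shutdown_requested = True
        self._maybe_stop()

    def _maybe_stop(self):
        # Graceful: only stop when no job is running.
        if self.shutdown_requested and not self.busy:
            self.stopped = True


# Case (a): slave is idle when the deadline passes, so it stops right away.
idle_slave = GracefulShutdownSketch(started_at=0)
idle_slave.tick(now=1800)

# Case (b): slave is busy at the deadline, so it stops when the job ends.
busy_slave = GracefulShutdownSketch(started_at=0)
busy_slave.tick(now=60)
accepted = busy_slave.job_started()
busy_slave.tick(now=1800)
stopped_while_busy = busy_slave.stopped
refused = busy_slave.job_started()
busy_slave.job_finished()
```

Both cases fall out of the same rule: the timer only requests the shutdown, and the stop itself waits until the slave is not running a job.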
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1028191
Component: General Automation → General