Closed
Bug 576158
Opened 15 years ago
Closed 15 years ago
reboot linux slaves after 10 hours of inactivity
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 565397
People
(Reporter: jhford, Unassigned)
Details
(Whiteboard: [automation])
Sometimes we have slaves that are idle for an extended period of time. Because puppet changes are only applied on boot, it is possible for a slave to be up for hours between a puppet run and build time. In support of our mobile test pool I wrote a script that reboots a slave after 10 hours of uptime, or if a job has been running for more than 10 hours.
It is located at http://hg.mozilla.org/build/tools/file/10b1787feb65/buildfarm/mobile/n900-imaging/rootfs/bin/uptime-check.py
This script currently runs only on Linux, but porting it to another platform should mainly be a matter of finding a source of system uptime information. We'd need to make the first step of *any and every* build on these systems copy the current uptime into the offset file. The script calls out to reboot-user, which cleanly reboots the machine (shutting down the buildslave first). reboot-user does have some mobile specifics that could be removed.
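To make the offset-file mechanism concrete, here is a hedged sketch of the idea in Python. This is not the actual uptime-check.py from the tools repo; names such as OFFSET_FILE and should_reboot() are illustrative, and the assumed logic is: reboot once (uptime minus the offset recorded at the start of the last job) exceeds 10 hours.

```python
#!/usr/bin/env python
# Illustrative sketch of the uptime-check idea described above.
# Assumptions (not from the real script): the offset file holds the
# system uptime recorded when the most recent job started, and the
# path below is hypothetical.

TEN_HOURS = 10 * 60 * 60          # idle threshold, in seconds
OFFSET_FILE = "/tmp/uptime-offset"  # hypothetical path

def read_uptime():
    """Return system uptime in seconds (Linux-only, via /proc/uptime)."""
    with open("/proc/uptime") as f:
        return float(f.read().split()[0])

def read_offset(path=OFFSET_FILE):
    """Uptime recorded when the current job started, or 0 if absent."""
    try:
        with open(path) as f:
            return float(f.read().strip())
    except (IOError, OSError, ValueError):
        return 0.0

def should_reboot(uptime, offset, threshold=TEN_HOURS):
    """True when the slave has been idle, or stuck in one job, too long."""
    return (uptime - offset) > threshold

if __name__ == "__main__":
    if should_reboot(read_uptime(), read_offset()):
        # The real script calls out to reboot-user, which stops the
        # buildslave cleanly before rebooting the machine.
        print("would reboot now")
```

Because each build rewrites the offset file as its first step, a busy slave keeps pushing the reboot deadline forward, while an idle slave's deadline eventually expires.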
This script is running in production in our N900 pool and is reliably rebooting our slaves after 10 hours of inactivity. It is also tested to respect the offset file.
This would mean that after deploying a puppet change, we know that within 10 hours every slave is either up to date (because it has rebooted and is back in the pool) or not yet up to date (because puppet is running or about to run). This depends on making sure the buildbot slave is only started after a successful puppet run.
Reporter, updated 15 years ago:
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE

Assignee, updated 12 years ago:
Product: mozilla.org → Release Engineering