Alert on/auto-delete stale cleanup lock files on buildbot masters

RESOLVED WORKSFORME

Status

Product: Release Engineering
Component: Platform Support
Status: RESOLVED WORKSFORME
Opened: 4 years ago
Modified: 4 years ago

People

Reporter: jhopkins; Assignee: Unassigned

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

4 years ago
buildbot-master10 did not run a disk cleanup task for several months because of a stale lock file referenced by /etc/cron.d/bm10-tests1-tegra, which led to bug 930021.

Two possible options:
- alert on such stale lock files via Nagios
- auto-delete stale lock files
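A minimal sketch of the first option, assuming a Python Nagios plugin that is handed each master's lock file paths explicitly (the paths differ between masters) and a 360-minute threshold borrowed from the existing hourly find. `check_stale_locks` and `main` are hypothetical names, not existing tooling; the exit codes follow the standard Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_stale_locks(paths, max_age_minutes, now=None):
    """Return (exit_code, message) for a Nagios-style stale lock check.

    A lock file is stale if it was last modified more than max_age_minutes
    ago. A missing lock file just means no job holds the lock, so it is OK.
    """
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age_minutes = (now - os.path.getmtime(path)) / 60
        except FileNotFoundError:
            continue  # no lock held
        except OSError as exc:
            return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc}"
        if age_minutes > max_age_minutes:
            stale.append(f"{path} ({age_minutes:.0f} min old)")
    if stale:
        return CRITICAL, "CRITICAL - stale lock files: " + ", ".join(stale)
    return OK, "OK - no stale lock files"


def main(argv, max_age_minutes=360):
    """Hypothetical plugin entry point: lock file paths come from argv."""
    code, message = check_stale_locks(argv, max_age_minutes)
    print(message)  # Nagios uses the first output line as the status text
    return code
```

A wrapper script would end with `sys.exit(main(sys.argv[1:]))` and be given each master's cleanup lock path. Alerting rather than deleting keeps a human in the loop when a lock is stale because a cleanup job is genuinely hung.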
Both bm10 and a newer master such as bm52 already have this cron entry:
 @hourly cltbld find /var/lock/cltbld -name lockfile.bbdb -mmin +360 -delete
However, I think this is a leftover from before the queue system for log upload, status DB insert and Pulse messages, and it no longer has any effect.
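For comparison, a sketch of the second option (auto-delete) in Python that approximately mirrors the `find ... -name ... -mmin +360 -delete` entry for regular files. `delete_stale` and its `dry_run` flag are hypothetical; the sketch does not replicate find's exact minute rounding or its removal of matching empty directories.

```python
import fnmatch
import os
import time


def delete_stale(directory, pattern, max_age_minutes, now=None, dry_run=False):
    """Roughly mirror `find DIR -name PATTERN -mmin +N -delete` for files.

    Walks DIR recursively, matches each base name against the shell glob
    PATTERN, and removes files modified more than max_age_minutes ago.
    Returns the paths that were (or, with dry_run, would be) removed.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in fnmatch.filter(files, pattern):
            path = os.path.join(root, name)
            try:
                age_minutes = (now - os.path.getmtime(path)) / 60
                if age_minutes > max_age_minutes:
                    if not dry_run:
                        os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                continue  # already gone, e.g. released by its owner
    return removed
```

Running it with `dry_run=True` first shows what would be removed, which is a cheap safeguard before deleting a lock that a live job might still hold.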

On bm10 the master cleanup cron uses $HOME/lockfile.bm10-tests1-tegra_cleanup, while on bm52 it uses /var/lock/cltbld/lockfile.bm52-tests1-linux_cleanup; neither matches the lockfile.bbdb name targeted by the hourly find.
(Reporter)

Updated

4 years ago
Component: Other → Platform Support
QA Contact: joduinn → coop

Comment 2

4 years ago
This only affected the older masters, the ones set up by hand for the mobile devices. Those masters have all since been replaced by masters set up from Puppet, and they all have the buildmaster-cron entries ported by Massimo:

http://hg.mozilla.org/build/puppet/diff/e1c695967cc0/modules/buildmaster/templates/buildmaster-cron.erb
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WORKSFORME