Signing servers (signing4, signing5, signing6) running out of disk space

RESOLVED FIXED

Status

Release Engineering
General
--
blocker
RESOLVED FIXED
3 years ago
3 months ago

People

(Reporter: pmoore, Assigned: pmoore)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Nagios alerts in #buildduty


nagios-releng
13:58:12 Wed 04:58:12 PDT [4988] signing6.srv.releng.scl3.mozilla.com:disk - / is CRITICAL: DISK CRITICAL - free space: / 19504 MB (6% inode=99%): (http://m.mozilla.org/disk+-+/)
13:59:13 Wed 04:59:12 PDT [4989] signing4.srv.releng.scl3.mozilla.com:disk - / is CRITICAL: DISK CRITICAL - free space: / 27696 MB (9% inode=99%): (http://m.mozilla.org/disk+-+/)
14:39:13 Wed 05:39:12 PDT [4991] signing5.srv.releng.scl3.mozilla.com:disk - / is WARNING: DISK WARNING - free space: / 32889 MB (11% inode=99%): (http://m.mozilla.org/disk+-+/)
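The alerts above report free space on / as an absolute size and a percentage. A minimal sketch of an equivalent check in Python (the thresholds and message format here are illustrative assumptions, not the actual Nagios configuration):

```python
import shutil

def check_disk(path="/", warn_pct=12, crit_pct=10):
    """Report free space on `path`, roughly mimicking a Nagios disk check.

    warn_pct and crit_pct are illustrative thresholds, not the real
    Nagios configuration.
    """
    usage = shutil.disk_usage(path)
    free_pct = usage.free * 100 // usage.total
    free_mb = usage.free // (1024 * 1024)
    if free_pct <= crit_pct:
        status = "CRITICAL"
    elif free_pct <= warn_pct:
        status = "WARNING"
    else:
        status = "OK"
    return "DISK %s - free space: %s %d MB (%d%%)" % (status, path, free_mb, free_pct)
```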
The usage is genuine; the problem is that release build artefacts are taking up most of the space, probably due to the high number of releases we are shipping at the moment.

e.g. on signing6, /builds/signing/rel-key-signing-server is 186GB, and /builds/signing accounts for 257GB of the 260GB total used.
It looks like the cleanup strategy is based on the age of an artefact, rather than on available disk space:
https://github.com/mozilla/build-tools/blob/master/lib/python/signing/server.py#L408 -> https://github.com/mozilla/build-tools/blob/master/lib/python/signing/server.py#L331
A short-term solution (to avoid filling up the disks completely) is to reduce server.max_file_age in the signing.ini file.

Currently it is set to 12 hours:

<snip>

[server]
listen = 0.0.0.0
port = 9120
redis = 
max_file_age = 43200 ; 12 hours
cleanup_interval = 600 ; 10 Minutes
daemonize = yes

</snip>

A longer-term solution might be to change the cleanup strategy to be based on available free disk space.
Created attachment 8494480 [details] [diff] [review]
bug1072274_puppet_v1.patch

Not sure if I'll need to restart the signing servers to pick up the change from the template, once the puppet change lands?
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8494480 - Flags: review?(rail)
this is hitting the trees as well
(In reply to Carsten Book [:Tomcat] from comment #5)
> this is hitting the trees as well

and since we were getting more and more red results because of it, I closed the integration trees
Attachment #8494480 - Flags: review?(rail) → review+

Updated

3 years ago
Severity: normal → blocker
Component: Tools → Buildduty
QA Contact: hwine → bugspam.Callek
Immediate issue fixed. Will raise a separate bug for long term solution (comment 3).
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Component: Buildduty → Tools
QA Contact: bugspam.Callek → hwine
Resolution: --- → FIXED
See Also: → bug 1104822
Component: Tools → General
Product: Release Engineering → Release Engineering