Signing servers (signing4, signing5, signing6) running out of disk space

RESOLVED FIXED

Status

Release Engineering
Tools
--
blocker
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: pmoore, Assigned: pmoore)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

3 years ago
Nagios alerts in #buildduty


nagios-releng
13:58:12 Wed 04:58:12 PDT [4988] signing6.srv.releng.scl3.mozilla.com:disk - / is CRITICAL: DISK CRITICAL - free space: / 19504 MB (6% inode=99%): (http://m.mozilla.org/disk+-+/)
13:59:13 Wed 04:59:12 PDT [4989] signing4.srv.releng.scl3.mozilla.com:disk - / is CRITICAL: DISK CRITICAL - free space: / 27696 MB (9% inode=99%): (http://m.mozilla.org/disk+-+/)
14:39:13 Wed 05:39:12 PDT [4991] signing5.srv.releng.scl3.mozilla.com:disk - / is WARNING: DISK WARNING - free space: / 32889 MB (11% inode=99%): (http://m.mozilla.org/disk+-+/)
(Assignee)

Comment 1

3 years ago
The disk usage is genuine: the release build artefacts are taking up most of the space.

This is probably due to the high number of releases we are having at the moment.

For example, on signing6, /builds/signing/rel-key-signing-server is 186GB, and /builds/signing accounts for 257GB of the 260GB used in total.
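
A breakdown like the one above can be gathered with a small script. The following is only a sketch: the helper names dir_size_bytes and largest_subdirs are made up for illustration, and it sums apparent file sizes, so the numbers may differ slightly from what du reports.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; ignore it.
                pass
    return total


def largest_subdirs(parent, top=5):
    """Return up to `top` (size_in_bytes, path) pairs, largest first."""
    entries = []
    for name in os.listdir(parent):
        path = os.path.join(parent, name)
        if os.path.isdir(path) and not os.path.islink(path):
            entries.append((dir_size_bytes(path), path))
    return sorted(entries, reverse=True)[:top]
```

Calling largest_subdirs("/builds/signing") on a signing server would then list the directories that use the most space.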
(Assignee)

Comment 2

3 years ago
It looks like the cleanup strategy is based on the age of each artefact, rather than on available disk space:
https://github.com/mozilla/build-tools/blob/master/lib/python/signing/server.py#L408 -> https://github.com/mozilla/build-tools/blob/master/lib/python/signing/server.py#L331
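
To illustrate what an age-based cleanup means in practice, here is a minimal sketch. It is not the code from server.py: the function name remove_old_files and its signature are assumptions for illustration. It shows only that free disk space plays no part in the decision.

```python
import os
import time


def remove_old_files(directory, max_file_age, now=None):
    """Delete regular files in `directory` older than max_file_age seconds.

    Age is measured from the file's modification time. Returns the list of
    removed paths. Free disk space is never consulted.
    """
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        if now - os.path.getmtime(path) > max_file_age:
            os.remove(path)
            removed.append(path)
    return removed
```

With this kind of policy, a burst of releases inside the age window can fill the disk, because nothing younger than max_file_age is ever removed.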
(Assignee)

Comment 3

3 years ago
A short-term solution (to avoid filling up the disks completely) is to reduce server.max_file_age in the signing.ini file.

Currently it is set to 12 hours:

<snip>

[server]
listen = 0.0.0.0
port = 9120
redis = 
max_file_age = 43200 ; 12 hours
cleanup_interval = 600 ; 10 Minutes
daemonize = yes

</snip>
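
As a quick check of what these values mean, the snippet can be parsed with Python's standard configparser. This is only a sketch: how the signing server itself reads signing.ini is not shown here, and the function name load_server_settings is made up for illustration. The inline `;` comments need inline_comment_prefixes to be set in Python 3.

```python
import configparser

# The [server] section quoted above, embedded so the example is self-contained.
SIGNING_INI = """\
[server]
listen = 0.0.0.0
port = 9120
redis =
max_file_age = 43200 ; 12 hours
cleanup_interval = 600 ; 10 Minutes
daemonize = yes
"""


def load_server_settings(text):
    """Parse the [server] section and return the cleanup-related settings."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.read_string(text)
    server = parser["server"]
    return {
        "max_file_age": server.getint("max_file_age"),          # seconds
        "cleanup_interval": server.getint("cleanup_interval"),  # seconds
        "daemonize": server.getboolean("daemonize"),
    }


settings = load_server_settings(SIGNING_INI)
print("max_file_age: %.1f hours" % (settings["max_file_age"] / 3600))
print("cleanup_interval: %.1f minutes" % (settings["cleanup_interval"] / 60))
```

Both values are in seconds, so halving max_file_age to 21600 would, for example, keep artefacts for 6 hours instead of 12.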

A longer-term solution might be to change the cleanup strategy so that it is based on available free disk space.
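
One possible shape for a free-space-based cleanup is sketched below. Everything in it is an assumption for illustration (the function name free_space_cleanup, the min_free_bytes threshold, and the injectable get_free_bytes callback); it is not an existing feature of the signing server. The idea is to delete the oldest files first and stop as soon as enough space is free.

```python
import os
import shutil


def free_space_cleanup(directory, min_free_bytes, get_free_bytes=None):
    """Delete the oldest files in `directory` until enough space is free.

    get_free_bytes is a zero-argument callable returning the free bytes on
    the filesystem; by default it uses shutil.disk_usage on `directory`.
    Returns the list of removed paths, oldest first.
    """
    if get_free_bytes is None:
        get_free_bytes = lambda: shutil.disk_usage(directory).free

    files = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [path for path in files if os.path.isfile(path)]
    files.sort(key=os.path.getmtime)  # oldest first

    removed = []
    for path in files:
        if get_free_bytes() >= min_free_bytes:
            break  # enough space is free; keep the remaining files
        os.remove(path)
        removed.append(path)
    return removed
```

A real implementation would probably also want a minimum age, so that files still needed by an in-progress signing job are never removed.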
(Assignee)

Comment 4

3 years ago
Created attachment 8494480
bug1072274_puppet_v1.patch

Will I need to restart the signing servers to pick up the change from the template once puppet has landed it?
Assignee: nobody → pmoore
Status: NEW → ASSIGNED
Attachment #8494480 - Flags: review?(rail)

Comment 5

3 years ago
This is hitting the trees as well.

Comment 6

3 years ago
(In reply to Carsten Book [:Tomcat] from comment #5)
> this is hitting the trees as well

And since we are getting more and more red results because of it, the integration trees have been closed.
Attachment #8494480 - Flags: review?(rail) → review+
Severity: normal → blocker
Component: Tools → Buildduty
QA Contact: hwine → bugspam.Callek
(Assignee)

Comment 7

3 years ago
The immediate issue is fixed. I will raise a separate bug for the long-term solution (see comment 3).
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Component: Buildduty → Tools
QA Contact: bugspam.Callek → hwine
Resolution: --- → FIXED
(Assignee)

Updated

3 years ago
See Also: → bug 1104822