Closed
Bug 1263820
Opened 8 years ago
Closed 2 years ago
Implement monitoring of signing servers
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: nthomas, Unassigned)
Details
We don't have any visibility into signing server failures until a human happens to notice (release automation fails, or people spot it on Treeherder). Recent examples: wonky hardware (bug 1083156), issues starting the signing processes (bug 1210686), and today mac-v2-signing7 getting overwhelmed. The only recourse is to go spelunking in the log, which is pretty chatty.
Some sort of metrics/monitoring would be helpful, assuming this system is here for a while. Nagios is only doing simple checks: ping, disk space, load, ntp, and that 3 signing processes are running.
Some ideas:
* the signing server processes could use syslog to report failures; those messages would get carried into Papertrail, where we can SNS alert in #buildduty when failures pass some threshold (either an absolute number or a percentage of jobs failing)
* we could graph lots of things: number of successful/failing/pending signing jobs, I/O and CPU load. Not sure exactly how we'd do that; maybe it's OK to push Graphite data if it's one-way.
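To make the threshold idea concrete, here is a minimal Python sketch, not anything the signing servers actually ship. It keeps a sliding window of job results and flags an alert when failures pass either an absolute count or a percentage of recent jobs. The class name `FailureWindow`, its parameters, and the metric prefix are all hypothetical. The `graphite_lines` helper only formats lines in Graphite's plaintext `path value timestamp` form; how they would be shipped one-way to a Graphite host is left open.

```python
from collections import deque
import time


class FailureWindow:
    """Track recent signing results and decide whether to alert (hypothetical)."""

    def __init__(self, window_seconds=600, max_failures=10,
                 max_failure_ratio=0.25, min_samples=20):
        self.window_seconds = window_seconds
        self.max_failures = max_failures            # absolute threshold
        self.max_failure_ratio = max_failure_ratio  # percentage threshold
        self.min_samples = min_samples              # ignore the ratio on tiny samples
        self._events = deque()                      # (timestamp, succeeded)

    def record(self, succeeded, now=None):
        now = time.time() if now is None else now
        self._events.append((now, succeeded))
        self._expire(now)

    def _expire(self, now):
        # Drop results that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def counts(self, now=None):
        """Return (successes, failures) inside the current window."""
        now = time.time() if now is None else now
        self._expire(now)
        failures = sum(1 for _, ok in self._events if not ok)
        return len(self._events) - failures, failures

    def should_alert(self, now=None):
        successes, failures = self.counts(now)
        total = successes + failures
        if total == 0:
            return False
        over_ratio = (total >= self.min_samples
                      and failures / total >= self.max_failure_ratio)
        return failures >= self.max_failures or over_ratio


def graphite_lines(prefix, successes, failures, now):
    """Format counters as Graphite plaintext lines: 'path value timestamp'."""
    ts = int(now)
    return [
        f"{prefix}.success {successes} {ts}",
        f"{prefix}.failure {failures} {ts}",
    ]
```

The `min_samples` guard is a design choice in this sketch: without it, a single failed job on a quiet server would count as 100% failing and page #buildduty.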
Comment 1•8 years ago
* teach signing servers to say something like "go away, try another signing server" if they are overwhelmed.
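One possible shape for that "go away" response, as a sketch only: the server compares its pending job count to a limit and answers with HTTP 503 plus a `Retry-After` header when it is full. The function name `admit`, the limit, and the 202 status for accepted jobs are assumptions for illustration; the real signing server protocol may look quite different.

```python
def admit(pending_jobs, max_pending, retry_after_seconds=30):
    """Decide whether to accept a new signing request (hypothetical helper).

    Returns (status_code, headers). 503 with Retry-After tells the client
    this server is overwhelmed and it should try another one.
    """
    if pending_jobs >= max_pending:
        return 503, {"Retry-After": str(retry_after_seconds)}
    return 202, {}
```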
Comment 2•8 years ago (Reporter)
Oh yes, the client side needs to be smarter too. In the mac repacks yesterday, the clients kept asking a single server for about 5 hours.
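A smarter client could look roughly like the Python sketch below: shuffle the server list, move on to the next server whenever one reports it is busy, and back off between full passes instead of hammering one host. `ServerBusy`, `sign_with_failover`, and the `submit` callable are hypothetical names; this is not how the actual signing client works.

```python
import random
import time


class ServerBusy(Exception):
    """Raised by submit() when a server says it is overwhelmed (hypothetical)."""


def sign_with_failover(servers, submit, max_rounds=3, base_delay=1.0,
                       sleep=time.sleep):
    """Try each server in random order; back off between full passes."""
    for round_number in range(max_rounds):
        candidates = list(servers)
        random.shuffle(candidates)  # spread load instead of always hitting the first host
        for server in candidates:
            try:
                return submit(server)
            except ServerBusy:
                continue  # this one is overloaded, try the next
        if round_number < max_rounds - 1:
            # Every server was busy: wait 1x, 2x, 4x... base_delay before retrying.
            sleep(base_delay * 2 ** round_number)
    raise RuntimeError("all signing servers are busy")
```

Here `submit` is whatever function sends one signing request to one server and raises `ServerBusy` on a "try another server" reply.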
Updated•7 years ago (Assignee)
Component: Tools → General
Comment 3•6 years ago
I think this can be closed now that we're in the new TC (Taskcluster) world, am I right?
Flags: needinfo?(aki)
Comment 4•6 years ago
The signing scriptworkers still use the signing servers, so monitoring would still be good. We're moving the keys for MARs to autograph at some point soon, and autograph will get APK signing at some point as well, but all the Windows, Mac, and GPG signing will stay on the signing servers for the foreseeable future.
Flags: needinfo?(aki)
Comment 5•2 years ago
Signing servers are EOL.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID