Closed Bug 1263820 Opened 8 years ago Closed 2 years ago

Implement monitoring of signing servers

Tracking

(Not tracked)

Status:

RESOLVED INVALID

People

(Reporter: nthomas, Unassigned)

Details

Nick Thomas [:nthomas] (UTC+12)

Reporter

Description

8 years ago

We don't have any visibility into failures until a human happens to notice (release automation fails, or someone spots it on treeherder). For example, wonky hardware (bug 1083156), issues starting the signing processes (bug 1210686), and today mac-v2-signing7 got overwhelmed. The only recourse is to go spelunking in the log, which is pretty chatty.

Some sort of metrics/monitoring would be helpful, assuming this system is here for a while. Nagios only does simple checks: ping, disk space, load, NTP, and whether the 3 signing procs are running.

e.g.
* the signing server processes could use syslog to report failures, which would get carried into papertrail, where we can send an SNS alert to #buildduty when over some failure threshold (either an absolute # or a percentage failing)
* we could graph lots of things: # of successful/failing/pending signing jobs, I/O and CPU load. Not sure exactly how we do that, maybe it's OK to push graphite data if it's one-way.
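The two ideas above could be sketched roughly as follows. This is only an illustration, not anything the signing servers actually ran: the `FailureWindow` class, its default thresholds, and the metric name `signing.mac.failed` are all hypothetical. The only external fact relied on is Graphite's plaintext format of `path value timestamp` followed by a newline.

```python
import time
from collections import deque


class FailureWindow:
    """Track signing job outcomes over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=300, max_failures=10, max_failure_ratio=0.25):
        # All three defaults are made-up values for illustration.
        self.window_seconds = window_seconds
        self.max_failures = max_failures
        self.max_failure_ratio = max_failure_ratio
        self._events = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, succeeded, now=None):
        """Record one finished signing job."""
        now = time.time() if now is None else now
        self._events.append((now, succeeded))
        self._expire(now)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def should_alert(self, now=None):
        """True when failures exceed either the absolute or the percentage threshold."""
        now = time.time() if now is None else now
        self._expire(now)
        total = len(self._events)
        if total == 0:
            return False
        failures = sum(1 for _, ok in self._events if not ok)
        return (failures >= self.max_failures
                or failures / total >= self.max_failure_ratio)


def graphite_line(path, value, timestamp):
    """Format one metric for Graphite's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"
```

A caller would record each job result and check `should_alert()` before emitting a syslog message or alert. Sending `graphite_line(...)` over a socket is left out, since whether a one-way push is acceptable is exactly the open question above.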

Rail Aliiev [:rail]

Comment 1

8 years ago

* teach signing servers to say something like "go away, try another signing server" if they are overwhelmed.
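A minimal sketch of that kind of load shedding, assuming the server can count its in-flight jobs. `LoadShedder`, `handle_sign_request`, and the choice of an HTTP-style 503 status are assumptions for illustration; the real signing server's protocol is not described in this bug.

```python
import threading


class LoadShedder:
    """Cap the number of concurrent signing jobs (hypothetical helper)."""

    def __init__(self, max_active_jobs):
        self.max_active_jobs = max_active_jobs
        self._active = 0
        self._lock = threading.Lock()

    def try_start(self):
        """Reserve a slot; return False if the server is already at capacity."""
        with self._lock:
            if self._active >= self.max_active_jobs:
                return False
            self._active += 1
            return True

    def finish(self):
        """Release a slot reserved by try_start()."""
        with self._lock:
            if self._active > 0:
                self._active -= 1


def handle_sign_request(shedder, sign):
    """Return (status, body); 503 is the assumed "go away, try another server" reply."""
    if not shedder.try_start():
        return 503, "busy, try another signing server"
    try:
        return 200, sign()
    finally:
        # Always free the slot, even if signing raised.
        shedder.finish()
```

Rejecting early keeps an overloaded machine from queueing work it cannot finish, and gives clients an explicit signal to move on.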

Nick Thomas [:nthomas] (UTC+12)

Reporter

Comment 2

8 years ago

Oh yes, the client side needs to be smarter too. In the mac repacks yesterday they kept asking a single server for about 5 hours.
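A smarter client might look something like the sketch below, which moves on to the next server when one reports that it is busy instead of retrying the same one. The `ServerBusy` exception, the `request_signature` callable, and the round limit are hypothetical names and values, not the actual signing client's API.

```python
import random
import time


class ServerBusy(Exception):
    """Raised by request_signature when a server asks the client to go elsewhere."""


def sign_with_failover(servers, request_signature, max_rounds=3,
                       backoff_seconds=0, shuffle=True):
    """Try each server in turn; give up after max_rounds passes over the list."""
    candidates = list(servers)
    if shuffle:
        # Spread load so every client does not start with the same server.
        random.shuffle(candidates)
    last_error = None
    for round_number in range(max_rounds):
        for server in candidates:
            try:
                return request_signature(server)
            except ServerBusy as exc:
                last_error = exc  # remember the failure, try the next server
        if backoff_seconds and round_number < max_rounds - 1:
            time.sleep(backoff_seconds)
    raise RuntimeError(
        f"all signing servers busy after {max_rounds} rounds"
    ) from last_error
```

Bounding the number of rounds means a job fails loudly after a short while rather than asking one server for hours.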

Nobody; OK to take it and work on it

Assignee

Updated

7 years ago

Component: Tools → General

Mihai Tabara [:mtabara]⌚️GMT

Comment 3

6 years ago

I think this can be closed now that we're in the new TC world, am I right?

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 4

6 years ago

The signing scriptworkers still use the signing servers, so monitoring would be good. We're moving the keys to autograph for MARs at some point soon, and autograph will get APK signing at some point as well, but we still have all the Windows, Mac, and GPG signing that will stay on the signing servers for the foreseeable future.

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 5

2 years ago

Signing servers are EOL.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → INVALID

Categories

(Release Engineering :: General, defect)
