Closed
Bug 1286605
Opened 9 years ago
Closed 8 years ago
Add nagios checks for buildbot bridge services
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: bhearsum, Assigned: aselagea)
References
Details
(Whiteboard: [bbb])
I just got these e-mails:
Warning: your queue "queue/buildbot-bridge/log_uploaded" on exchange "could not be determined" is
overgrowing (7898 ready messages, 7898 total messages).
The queue will be automatically deleted when it exceeds 16000 messages.
Make sure your clients are running correctly and are cleaning up unused
durable queues.
Warning: your queue "queue/buildbot-bridge/started" on exchange "could not be determined" is
overgrowing (8410 ready messages, 8410 total messages).
The queue will be automatically deleted when it exceeds 16000 messages.
Make sure your clients are running correctly and are cleaning up unused
durable queues.
Comment 1•9 years ago
Looks related to the DB fail over. Restarted the dead services.
Do we need a nagios check for this? How would buildduty normally find out about these emails?
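One way such a check could look, as a rough sketch only: a Nagios-style plugin that maps a queue's ready-message count to the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the `warn`/`crit` thresholds are hypothetical and chosen for illustration; only the 16000-message deletion limit comes from the e-mails above. How the count is fetched from the queue service is left out.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Limit quoted in the warning e-mails: the queue is deleted past this size.
DELETE_LIMIT = 16000


def check_queue_depth(ready, warn=4000, crit=8000):
    """Map a ready-message count to (exit_code, message).

    warn and crit are assumed thresholds, not values from this bug.
    """
    if ready is None or ready < 0:
        return UNKNOWN, "QUEUE UNKNOWN: no message count available"
    if ready >= crit:
        return CRITICAL, f"QUEUE CRITICAL: {ready} ready messages (deleted at {DELETE_LIMIT})"
    if ready >= warn:
        return WARNING, f"QUEUE WARNING: {ready} ready messages (deleted at {DELETE_LIMIT})"
    return OK, f"QUEUE OK: {ready} ready messages"


if __name__ == "__main__":
    # Usage: pass the current ready-message count as the only argument.
    count = int(sys.argv[1]) if len(sys.argv) > 1 else None
    code, message = check_queue_depth(count)
    print(message)
    sys.exit(code)
```

With these assumed thresholds, the 7898-message queue from the first e-mail would raise a WARNING and the 8410-message queue a CRITICAL, well before the deletion limit is reached.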
Comment 3•9 years ago
It'd be great to have nagios checks for buildbot-bridge services that are not running.
Comment 4•9 years ago
Buildduty should be able to help us get these checks set up.
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
Summary: buildbot bridge queue is growing, bridge is possibly broken? → Add nagios checks for buildbot bridge services
Updated•8 years ago
Blocks: bbb-improvements
Updated•8 years ago
Whiteboard: [bb-database failover] → [bbb]
Updated•8 years ago
Assignee: nobody → aselagea
Comment 5•8 years ago
We already have a check in place for the buildbot-bridge services, e.g.:
"nagios-releng> Thu 16:00:07 PDT [4087] buildbot-master82.bb.releng.scl3.mozilla.com:procs - buildbot-bridge is CRITICAL: PROCS CRITICAL: 0 processes with regex args /builds/bbb/bin/buildbot-bridge (http://m.mozilla.org/procs+-+buildbot-bridge)"
Judging by the time Ben received the e-mail, it arrived before Andrei or I had come online; Rail then restarted the dead services to resolve the issue.
@Rail: is there something else you think we'd need here?
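For reference, the kind of process check quoted above can be approximated as follows. This is a minimal sketch, not the actual Nagios plugin: `check_procs` and `list_cmdlines` are hypothetical helper names, and only the regex and the wording of the CRITICAL message are taken from the alert text.

```python
import re
import subprocess

# Nagios plugin exit codes used here.
OK, CRITICAL = 0, 2


def check_procs(cmdlines, pattern, min_count=1):
    """Count command lines matching pattern; CRITICAL if fewer than min_count."""
    regex = re.compile(pattern)
    count = sum(1 for line in cmdlines if regex.search(line))
    if count < min_count:
        return CRITICAL, f"PROCS CRITICAL: {count} processes with regex args {pattern}"
    return OK, f"PROCS OK: {count} processes with regex args {pattern}"


def list_cmdlines():
    """Return the full command line of every process, one per entry (uses ps)."""
    result = subprocess.run(
        ["ps", "-eo", "args="], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    code, message = check_procs(list_cmdlines(), "/builds/bbb/bin/buildbot-bridge")
    print(message)
    raise SystemExit(code)
```

A check like this only reports that the process is gone; someone still has to be online to act on the alert, which is the gap described in this comment.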
Flags: needinfo?(rail)
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard