buildapi.pvt.build.mozilla.org problems

RESOLVED DUPLICATE of bug 1272516

Status

Infrastructure & Operations
WebOps: Other
RESOLVED DUPLICATE of bug 1272516
2 years ago
2 years ago

People

(Reporter: nthomas, Unassigned)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2974] )

(Reporter)

Description

2 years ago
We've had trouble this week with buildapi, e.g. from nagios:
  nagios-releng> Thu 14:22:21 PDT [4026] buildapi.pvt.build.mozilla.org:http - /buildapi/self-serve/jobs is CRITICAL: CRITICAL - Socket timeout after 10 seconds 

A throttle limit was put in place a few days ago (IRC only), there was a tree closure (bug 1271661), and today web1.releng.webapp.scl3 is hitting errors like:
[Thu May 12 14:41:31 2016] [error] [client 10.22.81.211] (11)Resource temporarily unavailable: mod_wsgi (pid=26271): Unable to connect to WSGI daemon process 'buildapi' on '/var/run/wsgi.27589.0.1.sock' after multiple attempts as listener backlog limit was exceeded. 

Just after bug 1271661 we had to restart apache on web2 because it wasn't responding, while web1 was fine. Today it's the other way round. fox2mike has restarted apache on web2 as a temporary fix.

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2974]
(Reporter)

Comment 1

2 years ago
NB: there was a mod_wsgi upgrade and a virtualenv recreate in bug 1271661. See also that bug's dependencies for current issues with the normal app deploy process.
(Reporter)

Updated

2 years ago
Depends on: 1272516
We're going to dupe this in favor of the "add New Relic support" bug, since that's the full actionable item here for us at this time. Further work on improving the releng applications for New Relic metrics will be tracked there.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1272516