Closed Bug 1272514 Opened 8 years ago Closed 8 years ago

buildapi.pvt.build.mozilla.org problems

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1272516

People

(Reporter: nthomas, Unassigned)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2974] )

We've had trouble this week with buildapi, e.g. from nagios:
  nagios-releng> Thu 14:22:21 PDT [4026] buildapi.pvt.build.mozilla.org:http - /buildapi/self-serve/jobs is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
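
For anyone wanting to reproduce that alert by hand, here is a minimal Python sketch of a probe with the same 10 second limit. It is not the actual Nagios check; the function name `probe` and its message formats are assumptions made for illustration, and only the 10 second timeout and the `/buildapi/self-serve/jobs` path come from the alert above.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return a Nagios-style (state, message) pair for a single HTTP GET.

    Hypothetical helper: it approximates the alert above, not the real plugin.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "OK", "HTTP %d" % resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return "CRITICAL", "HTTP %d" % exc.code
    except (socket.timeout, urllib.error.URLError, OSError) as exc:
        # Connect-phase timeouts arrive wrapped in URLError (see .reason);
        # read-phase timeouts are raised as socket.timeout directly.
        reason = getattr(exc, "reason", exc)
        if isinstance(reason, socket.timeout):
            return "CRITICAL", "Socket timeout after %g seconds" % timeout
        return "CRITICAL", "connection failed: %s" % reason
```

Calling `probe()` with the full URL of `/buildapi/self-serve/jobs` on the host in question (the scheme and port are not given in this bug, so they would have to be filled in) should return a CRITICAL state whenever the request takes longer than the timeout, which is the condition nagios reported.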

There was a throttle limit put in a few days ago (IRC only), a tree closure (bug 1271661), and today we have web1.releng.webapp.scl3 hitting errors like
[Thu May 12 14:41:31 2016] [error] [client 10.22.81.211] (11)Resource temporarily unavailable: mod_wsgi (pid=26271): Unable to connect to WSGI daemon process 'buildapi' on '/var/run/wsgi.27589.0.1.sock' after multiple attempts as listener backlog limit was exceeded. 

Just after bug 1271661 we had to restart apache on web2 because it wasn't responding, while web1 was fine. Today it's the other way round. fox2mike has restarted apache on web2 to temporarily resolve.
NB: there was a mod_wsgi upgrade and virtualenv recreate in bug 1271661. See also the deps on that for current issues with the normal app deploy process.
Depends on: 1272516
We're going to dupe this in favor of the "add New Relic support" bug, since that's the full actionable item here for us at this time. Further work on improving the releng applications for New Relic metrics will be tracked there.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard