We've had trouble this week with buildapi, e.g. from nagios:

nagios-releng> Thu 14:22:21 PDT  buildapi.pvt.build.mozilla.org:http - /buildapi/self-serve/jobs is CRITICAL: CRITICAL - Socket timeout after 10 seconds

There was a throttle limit put in a few days ago (IRC only), a tree closure (bug 1271661), and today we have web1.releng.webapp.scl3 hitting errors like:

[Thu May 12 14:41:31 2016] [error] [client 10.22.81.211] (11)Resource temporarily unavailable: mod_wsgi (pid=26271): Unable to connect to WSGI daemon process 'buildapi' on '/var/run/wsgi.27589.0.1.sock' after multiple attempts as listener backlog limit was exceeded.

Just after bug 1271661 we had to restart apache on web2 because it wasn't responding, while web1 was fine. Today it's the other way round. fox2mike has restarted apache on web2 to temporarily resolve.
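For anyone wanting to reproduce the nagios symptom by hand, here is a rough sketch of an HTTP probe with the same 10 second timeout. It is not the actual nagios plugin: the function name check_http, the opener parameter, and the message format are assumptions for illustration, and the exit codes only follow the usual nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import socket
import urllib.error
import urllib.request

# Conventional nagios plugin exit codes; only the two used here.
OK, CRITICAL = 0, 2


def check_http(url, timeout=10, opener=urllib.request.urlopen):
    """Probe `url` once and return (exit_code, message).

    Hypothetical helper, not the real check. `opener` is injectable so
    the timeout path can be exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return OK, f"OK - HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return CRITICAL, f"CRITICAL - HTTP {exc.code}"
    except (socket.timeout, TimeoutError):
        # Timeout while reading the response.
        return CRITICAL, f"CRITICAL - Socket timeout after {timeout:g} seconds"
    except urllib.error.URLError as exc:
        # urlopen wraps connect-time timeouts in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return CRITICAL, f"CRITICAL - Socket timeout after {timeout:g} seconds"
        return CRITICAL, f"CRITICAL - {exc.reason}"
    except OSError as exc:
        # Any other socket-level failure (reset, refused, ...).
        return CRITICAL, f"CRITICAL - {exc}"
```

Calling check_http() with the /buildapi/self-serve/jobs URL from inside the network (the scheme and host to use are left to the caller) should return exit code 2 with the timeout message while the WSGI listener backlog is full, and 0 once apache has been restarted.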
NB: there was a mod_wsgi upgrade and virtualenv recreate in bug 1271661. See also the deps on that for current issues with the normal app deploy process.
We're going to dupe this in favor of the "add New Relic support" bug, since that's the full actionable item here for us at this time. Further work on improving the releng applications for New Relic metrics will be tracked there.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1272516