slavealloc running really slowly today

RESOLVED FIXED

Status

Release Engineering
General
7 years ago
5 years ago

People

(Reporter: dustin, Assigned: dustin)

Details

Attachments

(1 attachment)

We've seen

21:35 < nagios-sjc1> [86] slavealloc.build.scl1:http_expect - slavealloc.build.scl1 is CRITICAL: (Service Check Timed Out)

for most of the day, and it's been causing sadness all around.  Slavealloc is responding so slowly that nginx gives up waiting for it.
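
For readers unfamiliar with the alert above, here is a rough sketch of what an HTTP "expect"-style check does. It is a hypothetical Python stand-in, not the actual Nagios check_http plugin or its configuration. It fetches a URL, looks for an expected string, and reports CRITICAL when the request takes longer than the timeout. The function name, the timeout value and the messages are illustrative assumptions.

```python
import socket
import time
import urllib.error
import urllib.request


def check_http(url, expect, timeout=10.0):
    """Return (state, message) in the style of a Nagios service check.

    Hypothetical stand-in for the http_expect check: fetch `url`, look for
    `expect` in the body, and go CRITICAL if the request does not finish
    within `timeout` seconds.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (socket.timeout, TimeoutError):
        # The read stalled past the timeout (slow backend).
        return ("CRITICAL", "Service Check Timed Out")
    except urllib.error.HTTPError as exc:
        # e.g. a 504 from a proxy that gave up waiting on its upstream.
        return ("CRITICAL", f"HTTP {exc.code}")
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return ("CRITICAL", "Service Check Timed Out")
        return ("CRITICAL", f"connection failed: {exc.reason}")
    elapsed = time.monotonic() - start
    if expect not in body:
        return ("CRITICAL", f"expected string {expect!r} not found")
    return ("OK", f"response in {elapsed:.2f}s")
```

A slow backend therefore shows up as CRITICAL even though the service is technically still up, which matches the symptom described here.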

The net effect on production is that slaves wait for a while during startup.  nginx eventually times out, and the slave falls back to its old buildbot.tac.  I can get rid of the wait by turning slavealloc off.
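
The startup fallback described above can be sketched roughly as follows. This is only an illustration under assumptions: the slavealloc URL, the on-disk tac path and the function name are hypothetical and do not come from this bug. The slave asks slavealloc for a fresh buildbot.tac with a timeout. If the request fails (nginx timing out with a 504, a refused connection, a socket timeout), it reuses the tac it already has.

```python
import http.client
import urllib.request

# Hypothetical values: the real slavealloc URL and tac path are not in this bug.
SLAVEALLOC_URL = "http://slavealloc.example.com/tac/{slave}"
OLD_TAC_PATH = "/builds/slave/buildbot.tac"


def fetch_tac(slave, url_template=SLAVEALLOC_URL,
              old_tac_path=OLD_TAC_PATH, timeout=30.0):
    """Return (tac_text, source), preferring a fresh tac from slavealloc.

    On any network or HTTP failure, fall back to the tac already on disk.
    """
    url = url_template.format(slave=slave)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8"), "slavealloc"
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (e.g. nginx 504) and socket timeouts are all
        # OSError subclasses; HTTPException covers malformed responses.
        with open(old_tac_path, encoding="utf-8") as f:
            return f.read(), "fallback"
```

In this sketch, the time a slave spends waiting is bounded by the timeout (or by nginx's own proxy timeout, whichever fires first). This is why a slow slavealloc delays startup without breaking it.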

So the root cause here is slow DNS in scl1 (bug 666487).  Two things fixed it:
 1. run nscd
 2. don't call socket.getfqdn() for every request
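
The second fix can be illustrated with a minimal sketch. It is not the attached patch, and the handler name is hypothetical. socket.getfqdn() may perform a resolver lookup each time it is called, so calling it per request makes every request pay for slow DNS. Resolving the name once per process and reusing it avoids that cost.

```python
import functools
import socket


@functools.lru_cache(maxsize=None)
def local_fqdn():
    """Resolve this host's fully qualified name once and reuse it.

    socket.getfqdn() can trigger a DNS lookup on every call; caching the
    result means only the first call pays for it.
    """
    return socket.getfqdn()


def handle_request(path):
    # Hypothetical request handler: uses the cached name instead of
    # calling socket.getfqdn() for each request.
    return f"{local_fqdn()} served {path}"
```

This assumes the host's name does not change while the process runs. Running nscd (the first fix) attacks the same problem at the system level by caching resolver results for every process on the host.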

Patch for the latter momentarily.

Created attachment 541275
m666486-tools-p1-r1.patch

Easy fix
Attachment #541275 - Flags: review?(nrthomas)

Attachment #541275 - Flags: review?(nrthomas) → review+

landed and deployed.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Product: mozilla.org → Release Engineering