Bug 858659
Opened 12 years ago
Closed 12 years ago
many jobs not starting (or taking a long time to start) on linux test masters
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Unassigned)
Details
We have over 300 fedora jobs pending right now, and tons of idle slaves. The jobs aren't starting though.
13:41 <+catlee> but I think bm24 is so overloaded it's taking hours to process RPC calls
Comment 1 • 12 years ago (Reporter)
bm18 seems to be in the worst shape: I don't see it handing out any jobs to r3 machines. It has a graceful shutdown in progress, but I'm not sure who started it. I also don't see anything in twistd.log about the graceful shutdown being started... this master is very broken right now.
Comment 2 • 12 years ago (Reporter)
bm18's slaves haven't run a single job since March 28th, so I'm restarting it the hard way.
I'm guessing that someone initiated a graceful shutdown on the 28th, something went wrong, and it never actually shut down. $5 says it had something to do with ec2 slaves mucking it up.
Comment 3 • 12 years ago (Reporter)
bm18 is back up and fedora pending is down to 73.
14:00 <+catlee> that's the same hung slave issue
14:00 <+catlee> we're only protected against it in a disconnect step
14:00 <+catlee> if the slave dies in other steps, we can still hang
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 12 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated • 7 years ago (Assignee)
Component: General Automation → General