Closed Bug 1295993 Opened 9 years ago Closed 9 years ago

Large new items queues on several buildbot masters

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aselagea, Unassigned)

Noticed several alerts like the one below in #buildduty:
nagios-releng> Wed 07:42:28 PDT [4109] buildbot-master70.bb.releng.use1.mozilla.com:Command Queue is CRITICAL: 642 new items:oldest item is 1653s old (http://m.mozilla.org/Command+Queue)
The masters seem to claim the jobs, but they are not being processed.
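For anyone triaging a burst of these alerts, here is a minimal Python sketch that pulls the host, state, new-item count, and oldest-item age out of an alert line. The regular expression is inferred only from the single alert quoted above, so it is an assumption that other Command Queue alerts share this layout; the function name `parse_command_queue_alert` is hypothetical and not part of any releng tooling.

```python
import re

# Pattern inferred from the one alert quoted in this bug (an assumption);
# alerts with a different layout will simply not match.
ALERT_RE = re.compile(
    r"(?P<host>[\w.-]+):Command Queue is (?P<state>\w+): "
    r"(?P<count>\d+) new items:oldest item is (?P<age>\d+)s old"
)


def parse_command_queue_alert(line):
    """Return a dict with host, state, new_items and oldest_age_s, or None."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "state": match.group("state"),
        "new_items": int(match.group("count")),
        "oldest_age_s": int(match.group("age")),
    }


# The alert text is copied from the comment above.
alert = (
    "nagios-releng> Wed 07:42:28 PDT [4109] "
    "buildbot-master70.bb.releng.use1.mozilla.com:Command Queue is CRITICAL: "
    "642 new items:oldest item is 1653s old (http://m.mozilla.org/Command+Queue)"
)
parsed = parse_command_queue_alert(alert)
print(parsed)
```

Running this over a saved channel log would give a quick per-master view of queue depth and age, which may help spot which masters are falling behind.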
Masters affected at this point: bm70, bm73, bm77, bm94. Disabled them in slavealloc and did a graceful shutdown. Waiting for them to finish running the current jobs.
So buildbot-master74 is not a problem, and it is a Windows build master. However, the trees are closed for bug 1295950, so I'm not sure whether there were simply no jobs queued for them and they were somehow redirected to the other masters. http://nagios1.private.releng.scl3.mozilla.com/releng-scl3/cgi-bin/status.cgi?navbarsearch=1&host=buildbot-master74 I wonder if the root cause is that bug 1295950 caused a huge number of retries and the masters simply couldn't keep up.
All four buildbot-masters finished their graceful shutdown, so I rebooted them and re-enabled them in slavealloc.
Depends on: 1295446
The queues are not a problem now. bm70, bm73, and bm77 haven't run jobs since they were rebooted; however, the trees have been closed most of the day for bug 1295950. bm94 has run a job since it was rebooted.
The masters look good at the moment; they have all started running jobs.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard