Closed Bug 1109752 Opened 10 years ago Closed 10 years ago

All trees closed due to high AWS pending test backlog

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Assigned: Callek)

Details

Attachments

(1 file)

I'm seeing high numbers of pending AWS tests across all trees and nagios is alerting in #buildduty about high numbers of pending jobs.

All trees closed, including Gaia.
New instances are failing to start buildbot because runner can't find hg via hgtool.py. We're bleeding capacity as old instances terminate themselves to pick up the new image.
To clarify: this is a configuration issue on the AWS machines (they can't find the hgtool.py script), not an issue interacting with hg.m.o.
Attached patch: fix
This patch should fix the root of the problem

In parallel, :rail is reverting the golden AMIs to yesterday's, which will avoid this bustage altogether.
Assignee: nobody → bugspam.Callek
Status: NEW → ASSIGNED
Attachment #8534536 - Flags: review?(winter2718)
Attachment #8534536 - Flags: review+
As an update, we're just waiting for the current pending backlog to come down before reopening. Rail's revert is getting the line moving in the right direction :)
Backlog is looking better and new linux test jobs appear to be starting reasonably fast now. I'm reopening everything.
Attachment #8534536 - Flags: review?(winter2718)
Cautiously optimistic here, marking as fixed.

We'll know for sure after tomorrow's AMIs are generated.

A link that showed the problem today: https://www.hostedgraphite.com/da5c920d/grafana/#/dashboard/temp/e5db589335c850ef95f52b85c2585442aa61c401?panelId=5&fullscreen
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
and I landed an unsaved version, and tested said version -- thus I caused bustage.

The fix:
https://hg.mozilla.org/build/puppet/rev/01a37f44eafe
https://hg.mozilla.org/build/puppet/rev/521aa8dd8a02
I don't like the conditional here: depending on install order, /usr/bin/hg may end up pointing to either the releng hg or the system hg.

Maybe the two packages should explicitly conflict, so that only one can be installed?
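The install-order concern can be checked directly on a host by resolving where /usr/bin/hg actually points. What follows is a minimal Python sketch, assuming the releng mercurial package installs everything under a single prefix. The function name `which_hg` and the default prefix `/opt/releng-hg` are hypothetical placeholders, not taken from the puppet changes linked above.

```python
import os


def which_hg(link="/usr/bin/hg", releng_prefix="/opt/releng-hg"):
    """Report which hg a path resolves to: "releng", "system", or "missing".

    releng_prefix is a HYPOTHETICAL install location for the releng-built
    mercurial package; substitute wherever that package really lives.
    """
    # lexists() is True even for a dangling symlink, so only a path that
    # is entirely absent counts as "missing".
    if not os.path.lexists(link):
        return "missing"
    # Follow every symlink in the chain to the final file.
    target = os.path.realpath(link)
    prefix = os.path.realpath(releng_prefix)
    # If the resolved file sits under the releng prefix, the releng package
    # "won" the install-order race; anything else is treated as system hg.
    if os.path.commonpath([target, prefix]) == prefix:
        return "releng"
    return "system"
```

Running this on hosts provisioned in different orders would show whether they disagree. Declaring an explicit package conflict, as suggested above, would make the answer deterministic instead of order-dependent.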
Flags: needinfo?(bugspam.Callek)
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard