I'm seeing high numbers of pending AWS tests across all trees, and Nagios is alerting in #buildduty about high numbers of pending jobs. All trees are closed, including Gaia.
New instances are failing to start buildbot because runner can't find hg via hgtool.py. We're bleeding capacity as old instances terminate themselves to pick up the new image.
Clarifying: this is a configuration issue on the AWS machines (they can't find the hgtool.py script), not an issue interacting with hg.m.o.
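For illustration, a check along these lines, run before an image is promoted, could flag a missing script before instances start failing. This is only a minimal sketch: the function name `find_missing_tools` and the idea of passing explicit search directories are assumptions for the example, not how runner or the AMI generation actually works.

```python
import os
import shutil


def find_missing_tools(tools, search_dirs=None):
    """Return the entries of `tools` that cannot be found as executables.

    `search_dirs` is an optional list of directories to search instead of
    the PATH from the environment (hypothetical parameter for this sketch).
    """
    path = os.pathsep.join(search_dirs) if search_dirs else None
    missing = []
    for tool in tools:
        # shutil.which returns None when no executable match is found.
        if shutil.which(tool, path=path) is None:
            missing.append(tool)
    return missing
```

A non-empty result would be the signal to stop before the image goes live, rather than finding out when buildbot fails to start.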
Created attachment 8534536 [details] [diff] [review]
fix

This patch should fix the root of the problem.

In parallel, :rail is reverting the golden AMIs to yesterday's, which will avoid this bustage altogether.
Assignee: nobody → bugspam.Callek
Status: NEW → ASSIGNED
As an update, we're just waiting for the current pending backlog to come down before reopening. Rail's revert is working for getting the line moving in the right direction :)
Backlog is looking better and new linux test jobs appear to be starting reasonably fast now. I'm reopening everything.
Cautiously optimistic here, marking as fixed. We'll know for sure after tomorrow's AMIs get generated. A link that showed the problem today: https://www.hostedgraphite.com/da5c920d/grafana/#/dashboard/temp/e5db589335c850ef95f52b85c2585442aa61c401?panelId=5&fullscreen
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
and I landed an unsaved version, and tested said version -- thus I caused bustage. The fix:
https://hg.mozilla.org/build/puppet/rev/01a37f44eafe
https://hg.mozilla.org/build/puppet/rev/521aa8dd8a02
I don't like the conditional here: depending on install order, /usr/bin/hg may end up pointing to either the releng hg or the system hg. Maybe the two packages should explicitly conflict, so that only one can be installed?
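To make the install-order concern concrete, here is a rough sketch of how one could check which install /usr/bin/hg actually resolves to on a given machine. The `releng_prefix` default of `/tools` is purely an assumption for illustration; substitute wherever the releng hg package really installs. The function name `hg_owner` is hypothetical too.

```python
import os


def hg_owner(hg_path="/usr/bin/hg", releng_prefix="/tools"):
    """Report which install `hg_path` resolves to.

    Returns 'missing' if the path does not exist (or is a dangling
    symlink), 'releng' if it resolves under `releng_prefix` (an assumed
    location), and 'system' otherwise.
    """
    if not os.path.exists(hg_path):
        return "missing"
    # Follow symlinks on both sides so the comparison uses real paths.
    real = os.path.realpath(hg_path)
    prefix = os.path.realpath(releng_prefix)
    if os.path.commonpath([real, prefix]) == prefix:
        return "releng"
    return "system"
```

If the two packages did explicitly conflict, this check would always give the same answer on a given image regardless of install order.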