AWS build slaves in jacuzzis not being started

RESOLVED INCOMPLETE

Status

Product: Release Engineering
Component: Buildduty
Priority: --
Severity: critical
Status: RESOLVED INCOMPLETE
Reported: 3 years ago
Modified: 3 years ago

People

(Reporter: philor, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
Looks like the first jacuzzi to dry out was "Android armv7 API 11+ fx-team build": four slaves, all stopped, and pending jobs going back 13.5 hours. We also have several hours of pending jobs and no live slaves for:

Android armv7 API 11+ b2g-inbound build
Android armv7 API 11+ b2g-inbound debug build
Linux fx-team build
b2g_fx-team_linux32_gecko build

and lots of other jacuzzis where there's only one live slave, so we currently have pending jobs there, and if we have no load for long enough for that one slave to go idle, I expect we'll be hosed.
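The two failure states described above (no live slaves with jobs pending, and a single live slave) can be sketched as a small classifier. This is only an illustration: the `Jacuzzi` record, its field names, and the `classify` function are hypothetical and are not part of the jacuzzi allocator or slave_health; the only real data point used is the fx-team Android builder with 13.5 hours of pending jobs.

```python
from dataclasses import dataclass


@dataclass
class Jacuzzi:
    """Hypothetical record for one jacuzzi (a builder and its dedicated slaves)."""
    builder: str
    live_slaves: int              # slaves currently started
    oldest_pending_hours: float   # age of the oldest pending job, 0.0 if none


def classify(j: Jacuzzi) -> str:
    """Return a rough health label for a jacuzzi."""
    if j.live_slaves == 0:
        # No slave is running: pending jobs will never be picked up.
        return "dried out" if j.oldest_pending_hours > 0 else "idle"
    if j.live_slaves == 1:
        # One slave left: if it goes idle, the jacuzzi may dry out next.
        return "at risk"
    return "ok"


# The first entry uses the figures from this report; the second is made up.
examples = [
    Jacuzzi("Android armv7 API 11+ fx-team build", 0, 13.5),
    Jacuzzi("hypothetical single-slave builder", 1, 0.5),
]
for j in examples:
    print(f"{j.builder}: {classify(j)}")
```

A check like this only labels the state; it says nothing about why stopped slaves are not being started.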

fx-team is closed. b2g-inbound is approval-only, so that gaia commits (which don't trigger Android anyway) can still land but Gecko pushes can't. mozilla-central is de facto closed, since it has jacuzzis that will have dried out by the next time I have something to merge there. mozilla-inbound is hanging on, as long as it neither goes a long period without any pushes nor gets too many pushes at once.
(Reporter)

Comment 1

3 years ago
A curious state of affairs - I thought the jacuzzi allocator "fixed" it by adding one more slave to the affected jacuzzis, making them barely workable with one live slave. But at least in the case of Android armv7 API 11+ fx-team build (https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?include=bld-linux64-spot-096,bld-linux64-spot-420,bld-linux64-spot-490,bld-linux64-spot-491,bld-linux64-spot-498), it added 096, which did not pick up any of the pending Android builds, while the other four, which had been sitting idle, all suddenly woke up and took jobs at the time of the allocator commit.
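For anyone wanting to look at a different jacuzzi's slaves the same way, the slave_health link above can be assembled from a list of slave names. This is a minimal sketch that assumes the `include` query parameter is just a comma-separated list, as in the URL in this comment; the helper names are made up for illustration.

```python
# Base URL taken from the link in this comment.
SLAVE_HEALTH_BASE = (
    "https://secure.pub.build.mozilla.org/builddata/reports/"
    "slave_health/slavetype.html"
)


def spot_slave_name(number: int) -> str:
    """Hypothetical helper: format a slave number as bld-linux64-spot-NNN."""
    return f"bld-linux64-spot-{number:03d}"


def slave_health_url(slaves: list[str]) -> str:
    """Build a slave_health URL filtered to the given slave names.

    Assumes `include` takes a plain comma-separated list, as seen above.
    """
    return SLAVE_HEALTH_BASE + "?include=" + ",".join(slaves)


# The five slaves in the fx-team Android jacuzzi discussed here.
fx_team_android = [spot_slave_name(n) for n in (96, 420, 490, 491, 498)]
print(slave_health_url(fx_team_android))
```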

Still critical, since this could leave us broken again at any time, but nothing's currently closed over it.
Severity: blocker → critical
Summary: Trees closed, AWS build slaves in jacuzzis not being started → AWS build slaves in jacuzzis not being started
(Reporter)

Updated

3 years ago
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → INCOMPLETE