Figure out why the entire tst-w64-ec2 pool is dead

RESOLVED FIXED

Status

5 years ago
3 months ago

People

(Reporter: philor, Assigned: jhopkins)

(Reporter)

Description

5 years ago
Apparently things aren't quite as straightforward as I thought: I filed the tracking bugs about most of the pool being dead on September 19th, but when https://tbpl.mozilla.org/?tree=Date&rev=073011f5cae4 was pushed on September 24th, more than one slave (though fewer than the whole pool) actually ran tests.

Since then, though, I've just been killing the pending jobs on try whenever they annoyed me by pending for more than a day or two. My retriggers on that Date push have now been pending for more than 45 minutes, and the oldest try job I haven't killed has been pending for more than 4 hours, so I'm pretty sure the slaves are either all dead, or they needed to be added to watch_pending and are all asleep with no idea they should wake up and do anything.
(Reporter)

Comment 1

5 years ago
They did get recreated, but now they're back down to one slave, which is out of disk space.
(Reporter)

Comment 2

5 years ago
And apparently I've just been killing every job scheduled for an entire platform for 60 days now.
(Reporter)

Updated

5 years ago
Blocks: 950206
(Assignee)

Comment 3

5 years ago
philor: There were no tst-w64-ec2-xxx instances on AWS when I looked yesterday.  I don't know who/what killed them off or why.  However, I did create 4 new instances, which :vlad says should be enough for our win64 testing.

Comment 4

5 years ago
(In reply to John Hopkins (:jhopkins) from comment #3)
> philor: There were no tst-w64-ec2-xxx instances on AWS when I looked
> yesterday.  I don't know who/what killed them off or why.  However, I did
> create 4 new instances, which :vlad says should be enough for our win64
> testing.

jhopkins: can you follow up on this, make sure those new instances are being used, and then just close this out?  Thanks.
Assignee: nobody → jhopkins
(Assignee)

Comment 5

5 years ago
The 4 AWS instances I set up have indeed taken jobs on the Date branch. Since I am actively working on win64 testing again, I will keep a close eye on things. Thanks.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations