Bug 1422295
Opened 7 years ago
Closed 6 years ago
gecko-t-win10-64-gpu instances are hanging
Categories
(Release Engineering :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: dustin, Unassigned)
Details
Per bug 1422184, as of about 12 hours ago we have seen gecko-t-win10-64-gpu with 500 instances provisioned (the maximum) but a rapidly growing pending count. Looking at some of the workers, they have all died while running their last job, which was resolved claim-expired. So something is crashing or hanging these hosts, but they are not terminating and thus are sitting idle. We have determined that it is possible to limp along by occasionally terminating all of the workers of a given workerType and letting the provisioner re-provision them. This requires manual intervention and is not very efficient, though. I'm happy to grant terminate-all permission to anyone who needs it to continue this pattern.
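The symptom described above (last task resolved claim-expired, then nothing claimed since) can be turned into a simple filter for picking termination candidates instead of terminating every worker. This is a minimal sketch only: the `Worker` record, its field names, and the one-hour idle limit are all hypothetical and do not correspond to any Taskcluster or provisioner API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Worker:
    # Hypothetical record; field names are illustrative, not a real API.
    instance_id: str
    last_task_resolution: Optional[str]  # e.g. "completed", "claim-expired"
    last_claim_time: Optional[datetime]  # when the worker last claimed a task


def find_hung_workers(workers, now, idle_limit=timedelta(hours=1)):
    """Return IDs of workers whose last task ended claim-expired and that
    have not claimed anything for at least idle_limit (an assumed threshold)."""
    hung = []
    for w in workers:
        # A worker whose last task resolved normally is not a candidate.
        if w.last_task_resolution != "claim-expired":
            continue
        # No claim on record, or the last claim is old enough: treat as hung.
        if w.last_claim_time is None or now - w.last_claim_time >= idle_limit:
            hung.append(w.instance_id)
    return hung
```

Terminating only the returned instances would leave healthy workers running, at the cost of needing a reliable source for the last-claim data.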
Comment 1•7 years ago
Related to https://bugzilla.mozilla.org/show_bug.cgi?id=1372172? Rob posted a script in comment 12 (https://bugzilla.mozilla.org/show_bug.cgi?id=1372172#c12) of that bug that might be useful if the cause is the same (impaired instances).
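The script from that comment is not reproduced here. As a rough illustration of what detecting impaired instances can look like, the sketch below filters a response shaped like the output of EC2's DescribeInstanceStatus call, where each entry carries an `InstanceStatus` and a `SystemStatus` check. The function name is hypothetical, and whether Rob's script works this way is an assumption.

```python
def impaired_instance_ids(status_response):
    """Collect IDs of instances whose instance or system status check
    reports 'impaired'. Input is a dict shaped like a DescribeInstanceStatus
    response; this helper is illustrative, not the script from bug 1372172."""
    ids = []
    for entry in status_response.get("InstanceStatuses", []):
        instance = entry.get("InstanceStatus", {}).get("Status")
        system = entry.get("SystemStatus", {}).get("Status")
        if "impaired" in (instance, system):
            ids.append(entry["InstanceId"])
    return ids
```

The resulting list could then be passed to whatever termination step is in use, so the provisioner replaces only the impaired hosts.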
Comment 2•7 years ago
Hypothesis: the script in https://bugzilla.mozilla.org/show_bug.cgi?id=1372172#c12 is running in a cron job somewhere under a superuser AWS account. Since we disabled superuser accounts yesterday, that probably broke it. Jonas, can we re-enable the superuser accounts until grenade is back from PTO?
Flags: needinfo?(jopsen)
Comment 3•7 years ago
Both grenade and markco use the script, running it from their laptops. Rob thought he had potentially found a fix while working on the OS theme issue (1343049?), but I'm not sure if that actually landed or if it just didn't quite work out.
Comment 4•7 years ago
I found and fixed the permissions issue and am running the cron script successfully again.
Flags: needinfo?(jopsen)
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME