Bug 736336 (talos-r4-lion-006): talos-r4-lion-006 problem tracking
Status: Closed (RESOLVED FIXED). Opened 14 years ago; closed 12 years ago.
Categories: Infrastructure & Operations Graveyard :: CIDuty, task
Tracking: Not tracked
People: Reporter: mozilla; Unassigned
Whiteboard: [buildduty][capacity][buildslaves][needs diagnostics]
[18:09] <nthomas> aki|buildduty: talos-r4-lion-006 has gone mad. Failed to download some tests with the cryptic:
[18:09] <nthomas> firefox-13.0a2.en-US.mac64.tests.zip: Invalid argument
[18:09] <nthomas> Cannot write to `firefox-13.0a2.en-US.mac64.tests.zip' (Invalid argument).
[18:09] <nthomas> then failed in rebooting
[18:09] <nthomas> IOError: [Errno 22] invalid mode ('w') or filename: '../reboot_count.txt'
[18:09] <nthomas> I've done a graceful shutdown of it on the master
[18:09] <nthomas> refusing ssh
Comment 1•14 years ago
Re-puppetized and back in the pool.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 2•13 years ago
Has failed every one of the last 25 jobs :-s
https://secure.pub.build.mozilla.org/buildapi/recent/talos-r4-lion-006
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3•13 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=18187783&tree=Mozilla-Inbound
"exceptions.AttributeError: 'module' object has no attribute 'SlaveFileDownloadCommand'"
Comment 4•13 years ago
Also:
https://tbpl.mozilla.org/php/getParsedLog.php?id=18187789&tree=Mozilla-Inbound
{
Connecting to build.mozilla.org|10.22.74.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,169 (1.1K) [application/x-sh]
installdmg.sh: Invalid argument
Cannot write to `installdmg.sh' (Invalid argument).
}
Comment 5•13 years ago
Disabled in slavealloc. If the machine isn't rebooting, I'll grab it tomorrow (just mention here if it's still taking jobs).
Comment 6•13 years ago
Thank you :-)
Comment 7•13 years ago
It seems to be taking jobs still sadly
Updated•13 years ago
Comment 9•13 years ago
(In reply to Ed Morley (Away 20th Dec-2nd Jan) [UTC+0; email:edmorley@moco] from comment #7)
> It seems to be taking jobs still sadly
It seems to have died as of ~ an hour after your comment, I can't ssh in right now for some reason, so I'll call this disabled for now.
Comment 10•13 years ago
(In reply to Justin Wood (:Callek) from comment #9)
> It seems to have died as of ~ an hour after your comment, I can't ssh in
> right now for some reason, so I'll call this disabled for now.
Oh, I pressed the graceful shutdown button (not really expecting it to work: I had checked and it was indeed disabled in slavealloc, so I thought it just wasn't paying attention), at which point it stopped eating jobs every few minutes :-)
Comment 11•13 years ago
(In reply to Justin Wood (:Callek) from comment #9)
> It seems to have died as of ~ an hour after your comment, I can't ssh in
> right now for some reason, so I'll call this disabled for now.
Recall that all r4s and r5s are connected to PDUs; you just have to grab the PDU data from the inventory. That's the quickest way (coupled with disabling in slavealloc) to bring the hammer down when one of these machines is eating jobs in quick succession.
Comment 12•13 years ago
So it seems like we haven't tried re-imaging or hardware diagnostics on this. We should do that next.
Comment 13•13 years ago
Put back into production after reimage: re-enabled in slavealloc and rebooted.
Comment 14•13 years ago
Green builds so far.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
Comment 15•13 years ago
Host has a logic board/power issue; it doesn't want to stay on. We'll track this host through bug 843472.
Comment 16•13 years ago
Up and taking jobs; green.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Updated•13 years ago
Whiteboard: [buildduty][capacity][buildslaves][badslave?] → [buildduty][capacity][buildslaves]
Comment 17•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=23531172&tree=Mozilla-Central
exceptions.AttributeError: 'module' object has no attribute 'SlaveShellCommand'
So far it's only chewed up 40 jobs, but at 1 second each it has potential.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 18•12 years ago
Disabled in slavealloc
Comment 19•12 years ago
Also used the buildbot graceful shutdown, since it was still taking jobs.
Updated•12 years ago
Whiteboard: [buildduty][capacity][buildslaves] → [buildduty][capacity][buildslaves][needs diagnostics]
Comment 20•12 years ago
Waiting on a sane bring-up procedure for this slave class.
Comment 21•12 years ago
Testing with PuppetAgain in bug 891880.
Updated•12 years ago
Product: mozilla.org → Release Engineering
Comment 22•12 years ago
Has been back in production for a while.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Comment 23•12 years ago
Back from loan in bug 917082.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 24•12 years ago
Taking jobs in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard