Closed Bug 736336 (talos-r4-lion-006) Opened 14 years ago Closed 12 years ago

talos-r4-lion-006 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Unassigned)

Details

(Whiteboard: [buildduty][capacity][buildslaves][needs diagnostics])

[18:09] <nthomas> aki|buildduty: talos-r4-lion-006 has gone mad. Failed to download some tests with the cryptic:
[18:09] <nthomas> firefox-13.0a2.en-US.mac64.tests.zip: Invalid argument
[18:09] <nthomas> Cannot write to `firefox-13.0a2.en-US.mac64.tests.zip' (Invalid argument).
[18:09] <nthomas> then failed to in rebooting
[18:09] <nthomas> IOError: [Errno 22] invalid mode ('w') or filename: '../reboot_count.txt'
[18:09] <nthomas> I've done a graceful shutdown of it on the master
[18:09] <nthomas> refusing ssh
Depends on: 736337
re-puppetized, and back into the pool.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://tbpl.mozilla.org/php/getParsedLog.php?id=18187783&tree=Mozilla-Inbound "exceptions.AttributeError: 'module' object has no attribute 'SlaveFileDownloadCommand'"
Also: https://tbpl.mozilla.org/php/getParsedLog.php?id=18187789&tree=Mozilla-Inbound
{
Connecting to build.mozilla.org|10.22.74.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,169 (1.1K) [application/x-sh]
installdmg.sh: Invalid argument
Cannot write to `installdmg.sh' (Invalid argument).
}
Disabled in slavealloc. If the machine isn't rebooting I'll grab it tomorrow (just mention here if it's still taking jobs).
Thank you :-)
It seems to be taking jobs still sadly
(In reply to Ed Morley (Away 20th Dec-2nd Jan) [UTC+0; email:edmorley@moco] from comment #7)
> It seems to be taking jobs still sadly

It seems to have died as of ~ an hour after your comment, I can't ssh in right now for some reason, so I'll call this disabled for now.
(In reply to Justin Wood (:Callek) from comment #9)
> It seems to have died as of ~ an hour after your comment, I can't ssh in
> right now for some reason, so I'll call this disabled for now.

Oh I pressed the graceful shutdown button (not really expecting it to work, given that I checked and it was indeed disabled in slavealloc, so thought it just wasn't paying attention), at which point it stopped eating jobs every few mins :-)
(In reply to Justin Wood (:Callek) from comment #9)
> It seems to have died as of ~ an hour after your comment, I can't ssh in
> right now for some reason, so I'll call this disabled for now.

Recall that all r4s and r5s are connected to PDUs, you just have to grab the PDU data from the inventory. That's the quickest way (coupled with disabling in slavealloc) to lower the hammer when one of these machines is eating jobs in quick succession.
So it seems like we haven't tried re-imaging or hardware diagnostics on this. We should do that next.
Depends on: 828602
Put back into production after reimage: re-enabled in slavealloc and rebooted.
green builds so far
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Depends on: 842538
Resolution: FIXED → ---
Depends on: 843472
Depends on: 843590
host has a logic board/power issue. Doesn't want to stay on. We'll track this host through bug 843472.
Up and taking jobs; green.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Whiteboard: [buildduty][capacity][buildslaves][badslave?] → [buildduty][capacity][buildslaves]
https://tbpl.mozilla.org/php/getParsedLog.php?id=23531172&tree=Mozilla-Central exceptions.AttributeError: 'module' object has no attribute 'SlaveShellCommand' So far it's only chewed up 40 jobs, but at 1 second each it has potential.
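A run like this (dozens of jobs failing in about a second each) is the kind of pattern that could be flagged automatically. A minimal sketch, assuming a hypothetical `JobResult` record and made-up thresholds; none of these names come from buildbot or slavealloc:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class JobResult:
    """One finished job on a test machine (hypothetical record shape)."""
    slave: str
    duration_s: float
    succeeded: bool


def is_burning_jobs(results: Iterable[JobResult],
                    min_failures: int = 10,
                    max_duration_s: float = 5.0) -> bool:
    """Return True when the latest jobs are an unbroken run of fast failures.

    `results` is expected in chronological order, oldest first.
    The thresholds are illustrative assumptions, not tuned values.
    """
    recent: List[JobResult] = list(results)[-min_failures:]
    if len(recent) < min_failures:
        # Not enough history to call it a pattern.
        return False
    return all((not r.succeeded) and r.duration_s <= max_duration_s
               for r in recent)


# Example: 40 failed jobs at roughly 1 second each, as in the comment above.
history = [JobResult("talos-r4-lion-006", 1.0, False) for _ in range(40)]
print(is_burning_jobs(history))
```

A check of this shape only looks at the tail of the history, so a single green job resets it; whether that is the right policy for a flaky host is a judgment call.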
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Disabled in slavealloc
And the buildbot graceful shutdown used since it was still taking jobs.
Whiteboard: [buildduty][capacity][buildslaves] → [buildduty][capacity][buildslaves][needs diagnostics]
Depends on: 889870
waiting on sane bringup procedure for this slave class.
Testing with PuppetAgain in bug 891880.
Product: mozilla.org → Release Engineering
Been back in production for a while.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Depends on: 917082
Back from loan in bug 917082.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 937190
Taking jobs in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard