Closed
Bug 752185
(talos-r4-snow-041)
Opened 12 years ago
Closed 11 years ago
talos-r4-snow-041 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Unassigned)
References
Details
(Whiteboard: [buildduty][badslave?][decommission?])
Needs a reboot.
Comment 1•12 years ago
The error "disk0s2: media is not present." kept repeating on the screen. The system came back online after a reboot.
Comment 2•12 years ago
Back in production.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 3•12 years ago
I think this slave has RAM issues or something similar: Bug 784328, Bug 784323, Bug 781820, Bug 781816, Bug 781150, Bug 781148, Bug 780889. I've been merrily filing bugs for these new crashes, but I'm starting to suspect the slave now: 7 of the 21 new crashes in the last two weeks came from this slave alone, and those failures (above) have not been seen on any other machine. Could we take this machine out of production and run a memory diagnostic or something? :-)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 4•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=14615019&tree=Mozilla-Inbound
Reporter
Comment 5•12 years ago
Disabled in slavealloc.
Comment 6•12 years ago
Hardware diagnostics were run twice and found no errors.
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•12 years ago
Depends on: 787281
Whiteboard: [buildduty][buildslaves][capacity] → [buildduty][badslave?][decommission?]
Comment 7•12 years ago
It looks like we haven't tried re-imaging this machine yet. Let's do that, and then kill it with fire if it still doesn't work.
No longer depends on: 787281
Comment 8•12 years ago
Running tests fine again in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 9•12 years ago
The problem is that, other than the now-burned edmorley, the rest of us aren't likely to file a bug on a random-looking crash like https://tbpl.mozilla.org/php/getParsedLog.php?id=16767443&tree=Mozilla-Inbound; we'll just blow it off.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=16792815&tree=Fx-Team (a talos run with a strange and sudden "process killed by signal 11")
Comment 11•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=16800234&tree=Mozilla-Inbound shows exactly the sort of "this is the only slave that ever has or ever will hit it" memory-corruption GC crash that started the whole "pull busted slaves and run diagnostics on them" effort.
Comment 12•12 years ago
Disabled in slavealloc.
Comment 13•12 years ago
Setup needed.
Comment 14•11 years ago
Well, after this machine sat idle for nearly two months, I've done the post-imaging setup and it's back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Comment 15•11 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=18649267&tree=Mozilla-Inbound shows it crashing with a minidump so malformed that minidump_stackwalk just stared at it, saying "wtf is that? is that an AMD64 crash? wtf?"
Comment 16•11 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=18654576&tree=Mozilla-Aurora and https://tbpl.mozilla.org/php/getParsedLog.php?id=18652897&tree=Mozilla-Inbound are exactly the sort of... oh, I already typed that in comment 11. Please apply the comment 7 solution and decomm it. It is clearly and unquestionably busted, and we apparently lack sufficient diagnostics to even guess which parts to start replacing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 17•11 years ago
(In reply to Phil Ringnalda (:philor) from comment #16)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=18654576&tree=Mozilla-Aurora and
> https://tbpl.mozilla.org/php/getParsedLog.php?id=18652897&tree=Mozilla-Inbound are exactly the sort of... oh, I already typed that in comment 11.
>
> Please apply the comment 7 solution and decomm it - it is clearly and unquestionably busted, and we apparently lack sufficient diagnostics to even guess what parts to start replacing.

We've been looking at repairing the r4 machines lately, so we'll give that a try first.
Comment 18•11 years ago
Diagnostics found corrupt files. The machine was reimaged and brought back to life just now.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 19•11 years ago
And we're seeing the same failures on it that we were seeing before in bug 781816. https://tbpl.mozilla.org/?tree=Mozilla-Aurora
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 20•11 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=19465580&tree=Mozilla-Aurora
Comment 21•11 years ago
Disabled in slavealloc.
Comment 22•11 years ago
Diagnostics don't help. Replace the memory, re-image (bug 864979) and try again. If that doesn't work, decommission it.
Depends on: 864979
Comment 23•11 years ago
Already puppetized. Updated the old password and autologin. Back in the pool, for good or ill.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 24•11 years ago
Yeah, ill. It did one try reftest job, which finished with an exception, and has since "done" 393 jobs, all failing with "device not configured" the first time they try to write anything to disk, setting retry on everything else and burning talos.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter
Comment 25•11 years ago
Rebooted and disabled in slavealloc to stop the rot. Decommission time?
Comment 26•11 years ago
Sent for decomm in bug 885875.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard