Closed
Bug 740505
(talos-r3-fed-070)
Opened 13 years ago
Closed 11 years ago
decommission talos-r3-fed-070
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: coop, Unassigned)
References
Details
(Whiteboard: [buildduty][buildslave][capacity])
No description provided.
Comment 1•13 years ago
no vga sig | *REIMAGED*
This slave had a corrupted filesystem after it was rebooted. I defaulted to reimaging rather than repairing the filesystem.
Comment 2•13 years ago
re-enabled in slavealloc
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 3•13 years ago
Needed to have some puppet mucking before it got into production again.
Comment 4•13 years ago
Please reboot.
Comment 5•13 years ago
File system is busted again:
https://tbpl.mozilla.org/php/getParsedLog.php?id=14675369&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=14675551&tree=Mozilla-Inbound
Severity: normal → blocker
Comment 6•13 years ago (Reporter)
It's been disabled in slavealloc now, so bumping down to normal.
Severity: blocker → normal
Comment 7•13 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=14675286&tree=Mozilla-Inbound was when it went bad, sort of oddly timed during shutdown of the successful test run.
Comment 8•13 years ago
ssh'd in and killed the buildbot process, since the slavealloc disable doesn't take effect if it's not rebooting. Attempted to reboot it.
Comment 9•12 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 10•12 years ago
Producing green jobs.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 11•12 years ago
needs a reboot
Comment 12•12 years ago
The filesystem is corrupt (comment 1, comment 5), or at least it's read-only and burning jobs, so it needs to be disabled. Slavealloc won't be enough (comment 6, comment 8), so you'll need to ssh in (comment 8).
Severity: normal → major
Comment 13•12 years ago
I did 'sudo reboot' on this, and now it's unresponsive to ssh. Disabled in slavealloc just in case.
Let's reimage again; it seems to help for a few months each time.
Comment 14•12 years ago
Hmm, if you ssh to this host you'll notice a DNS vs. hostname mismatch:
$ ssh talos-r3-fed-070.build.mozilla.org hostname
talos-r3-fed64-070.build.mozilla.org
            ^^
It is not a fed64 slave:
$ uname -m
i686
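A mismatch like this can be checked mechanically. Below is a minimal Python sketch, assuming (as the comment above implies) that a "fed64" label in the short hostname denotes a 64-bit slave. `short_name` and `check_host` are hypothetical helpers written for illustration, not part of any existing releng tooling.

```python
from typing import List


def short_name(fqdn: str) -> str:
    """Return the first DNS label of a name, lowercased."""
    return fqdn.strip().lower().split(".", 1)[0]


def check_host(dns_name: str, reported_hostname: str, machine: str) -> List[str]:
    """Compare the name we connected to with what the host reports.

    dns_name          -- name used for the ssh connection
    reported_hostname -- output of `hostname` on the host
    machine           -- output of `uname -m` on the host
    """
    problems = []
    if short_name(dns_name) != short_name(reported_hostname):
        problems.append(
            "DNS name %s does not match hostname %s"
            % (short_name(dns_name), short_name(reported_hostname))
        )
    # Assumption: "fed64" in the hostname marks a 64-bit slave.
    claims_64bit = "fed64" in short_name(reported_hostname)
    is_64bit = machine.strip() in ("x86_64", "amd64")
    if claims_64bit != is_64bit:
        problems.append(
            "hostname %s disagrees with architecture %s"
            % (short_name(reported_hostname), machine.strip())
        )
    return problems


# Values taken from the transcript above.
for problem in check_host(
    "talos-r3-fed-070.build.mozilla.org",
    "talos-r3-fed64-070.build.mozilla.org",
    "i686",
):
    print(problem)
```

With the values from this slave, both checks fire: the names differ, and the hostname claims fed64 while the machine reports i686.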
Comment 15•12 years ago
Bad post-imaging maybe?
Comment 16•12 years ago
I'll fix this, rail. My typo.
Comment 17•12 years ago
back in production
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 18•12 years ago
OK, so far I don't have any sign that this is anything but a slave issue. We had a handful of tooltool errors on a single job with this slave:
https://tbpl.mozilla.org/php/getParsedLog.php?id=19145041&tree=Mozilla-Inbound&full=1#error0
e.g.
15:23:49 ERROR - ERROR - transfer from http://runtime-binaries.pvt.build.mozilla.org/tooltool//sha512/cea07d65a39a244961f183711b14d054c90690b69a79d89a3783f9a634f9ace7f6e70033e963a4f58ca8482b3aec8d4c5d3227cc7a0bc61e6afeccf2acc1a789 to emulator.zip failed due to a difference of 586906717 bytes
If it happens again, but only on this slave, we'll pull it. If it happens on another slave anytime soon, we'll hand off to IT to find out why runtime-binaries is acting up in terms of packets.
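For reference, the kind of check that produces an error like the one above can be sketched in a few lines of Python. This is not tooltool's actual code. It assumes the expected size and SHA-512 digest are known ahead of time (the digest appears in the URL above), and `verify_download` is a hypothetical name.

```python
import hashlib
import os


def verify_download(path, expected_sha512, expected_size=None, chunk_size=1024 * 1024):
    """Return (ok, reason) for a downloaded file.

    The size is compared first because it is cheap; the digest is only
    computed when the size matches or no expected size was given.
    """
    actual_size = os.path.getsize(path)
    if expected_size is not None and actual_size != expected_size:
        diff = abs(expected_size - actual_size)
        return False, "size differs by %d bytes" % diff

    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # Read in chunks so a large archive is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)

    if digest.hexdigest() != expected_sha512.lower():
        return False, "sha512 mismatch"
    return True, "ok"
```

A truncated transfer fails on the size comparison, while a complete but corrupted file (for example, one written to a failing disk) only fails on the digest. Which of the two fires may help separate a network problem from a slave problem.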
Comment 19•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=19149223&tree=Mozilla-Inbound - a timeout in a test which has never timed out before.
Comment 20•12 years ago
Ok, reopening -- disabling in slavealloc and leaving for coop/Tomcat to diagnose on Monday.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 21•12 years ago
Puppetized, enabled, running in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 22•12 years ago
That boy still ain't right.
https://tbpl.mozilla.org/php/getParsedLog.php?id=21160623&tree=Mozilla-Inbound is, um, some sort of untar failure.
https://tbpl.mozilla.org/php/getParsedLog.php?id=21154002&tree=Firefox is an unzip timeout.
https://tbpl.mozilla.org/php/getParsedLog.php?id=21157524&tree=Mozilla-Inbound is tooltool not liking what it downloaded.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 23•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=21181678&tree=Mozilla-Inbound - read-only filesystem
Comment 24•12 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 25•12 years ago
Might want to take it back out...
https://tbpl.mozilla.org/php/getParsedLog.php?id=21348135&tree=Mozilla-B2g18 (and a ton more like it)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 26•12 years ago
It's been disabled, and it's not even online at the moment...
Comment 27•12 years ago
puppetized and back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•12 years ago (Reporter)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 28•12 years ago
You forgot the part where usually the person putting it back in production says "ran two green jobs" as though that was a sign of health. Though in this case, it would be "ran zero green jobs."
https://tbpl.mozilla.org/php/getParsedLog.php?id=22213793&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=22216536&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=22221969&tree=Mozilla-Central
Please disable it and put it out of our collective misery; it's decomm time.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 29•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=22225261&tree=Mozilla-Inbound - now it's back to having a read-only filesystem, which typically results in eating up dozens or hundreds of jobs since they only take a minute each.
Severity: major → blocker
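Since a read-only root filesystem keeps recurring on this slave, a cheap pre-flight check could catch it before more jobs burn. Below is a minimal Python sketch, assuming a Linux host where /proc/mounts lists one mount per line as "device mountpoint fstype options ...". `readonly_mounts` is a hypothetical helper, and how it would be wired into buildbot or slavealloc is not covered here.

```python
def readonly_mounts(mounts_text, watch=("/",)):
    """Return the watched mount points that are mounted read-only.

    mounts_text -- contents of /proc/mounts (Linux)
    watch       -- mount points to inspect; others are ignored because
                   some pseudo-filesystems are legitimately read-only
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        # Split the option list so "errors=remount-ro" is not mistaken for "ro".
        options = fields[3].split(",")
        if mount_point in watch and "ro" in options and mount_point not in found:
            found.append(mount_point)
    return found
```

On the slave this would be called as `readonly_mounts(open("/proc/mounts").read())`. A non-empty result means the machine should stop taking jobs until someone has looked at the disk.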
Comment 30•12 years ago
buildbot stopped, disabled in slavealloc.
Comment 31•12 years ago
I just love the dazed look in their eyes after you club them while they're in the middle of a frantic speed-RETRY run.
Severity: blocker → normal
Comment 32•12 years ago
We're fine without it as we don't have that much load on the rev3 fedoras anymore.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 33•12 years ago (Reporter)
Reopening to officially decommission.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: talos-r3-fed-070 problem tracking → decommission talos-r3-fed-070
Updated•12 years ago (Assignee)
Product: mozilla.org → Release Engineering
Comment 34•11 years ago
Removed from buildbot-configs in https://hg.mozilla.org/build/buildbot-configs/rev/bb86e30e979b
It's not in Puppet anymore and it's not in slavealloc, so I think that's all there is to do.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard