Closed Bug 740505 (talos-r3-fed-070) Opened 10 years ago Closed 8 years ago

decommission talos-r3-fed-070

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Unassigned)

References

Details

(Whiteboard: [buildduty][buildslave][capacity])

No description provided.
Depends on: 740798
No longer depends on: 740798
Depends on: 740798
no vga sig | *REIMAGED*

This slave had a corrupted filesystem after it was rebooted. I defaulted to reimaging rather than repairing the filesystem.
re-enabled in slavealloc
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Needed to have some puppet mucking before it got into production again.
Please reboot.
Status: RESOLVED → REOPENED
Depends on: 784921
Resolution: FIXED → ---
It's been disabled in slavealloc now, so bumping down to normal.
Severity: blocker → normal
https://tbpl.mozilla.org/php/getParsedLog.php?id=14675286&tree=Mozilla-Inbound was when it went bad, sort of oddly timed during shutdown of the successful test run.
ssh'd in and killed the buildbot process, since the slavealloc disable doesn't take effect if it's not rebooting. Attempted to reboot it.
Depends on: 786345
No longer depends on: 785854
Back in production.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Depends on: 798394
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Depends on: 799510
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Depends on: 814216
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Depends on: 816278
Resolution: FIXED → ---
Producing green jobs.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
needs a reboot
Status: RESOLVED → REOPENED
Depends on: 828016
Resolution: FIXED → ---
Filesystem is corrupt (comment 1, comment 5), or at least it's read-only and burning jobs, so now it needs to be disabled. Slavealloc won't be enough (comment 6, comment 8), so you'll need to ssh in (comment 8).
Severity: normal → major
I did 'sudo reboot' on this, and now it's unresponsive to ssh. Disabled in slavealloc just in case.

Let's reimage again; it seems to help for a few months each time.
Depends on: 829401
Hmm, if you ssh to this host you can see a DNS vs. hostname mismatch:

$ ssh talos-r3-fed-070.build.mozilla.org hostname
talos-r3-fed64-070.build.mozilla.org
            ^^

It is not a fed64 slave:

$ uname -m
i686
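A check like the one above could be scripted so that a bad post-imaging hostname is caught before a slave goes back into production. The following is only a sketch: the function names are hypothetical, and it assumes passwordless `ssh` access to the host, as in the manual session above.

```python
import subprocess


def reported_hostname(host: str, timeout: int = 30) -> str:
    """Ask the remote machine for its own hostname over ssh (assumes key-based access)."""
    result = subprocess.run(
        ["ssh", host, "hostname"],
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout


def hostname_mismatch(dns_name: str, reported_name: str) -> bool:
    """Return True when the name used to reach a host differs from the name it reports."""
    def normalize(name: str) -> str:
        # Ignore surrounding whitespace, a trailing dot, and letter case.
        return name.strip().rstrip(".").lower()

    return normalize(dns_name) != normalize(reported_name)
```

For example, `hostname_mismatch(host, reported_hostname(host))` would flag the `fed` vs. `fed64` discrepancy shown above.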
Bad post-imaging maybe?
I'll fix this, rail. My typo.
back in production
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
OK, so far I don't have any sign that this is anything but a slave issue: we had a handful of tooltool errors on a single job with this slave.

https://tbpl.mozilla.org/php/getParsedLog.php?id=19145041&tree=Mozilla-Inbound&full=1#error0

e.g.
15:23:49    ERROR -  ERROR - transfer from http://runtime-binaries.pvt.build.mozilla.org/tooltool//sha512/cea07d65a39a244961f183711b14d054c90690b69a79d89a3783f9a634f9ace7f6e70033e963a4f58ca8482b3aec8d4c5d3227cc7a0bc61e6afeccf2acc1a789 to emulator.zip failed due to a difference of 586906717 bytes

If it happens again, but only for this slave, we'll pull it. If it happens for another slave anytime soon, we'll hand off to IT to find out why runtime-binaries is acting up in terms of packets.
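To tell a truncated transfer from on-disk corruption, the downloaded file can be checked by hand against the digest embedded in the URL. The sketch below is not tooltool's own code; it assumes the `sha512/...` path component is the expected SHA-512 of the file, and `verify_download` is a hypothetical helper name.

```python
import hashlib
import os


def verify_download(path, expected_sha512, expected_size=None, chunk_size=1 << 20):
    """Return a list of problems found with a downloaded file; an empty list means it looks intact."""
    problems = []

    # A size check is cheap and catches truncated transfers immediately.
    actual_size = os.path.getsize(path)
    if expected_size is not None and actual_size != expected_size:
        problems.append(f"size differs by {abs(expected_size - actual_size)} bytes")

    # Hash in chunks so large archives do not have to fit in memory.
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha512.lower():
        problems.append("sha512 mismatch")

    return problems
```

Running the same check on a second slave against the same file would help separate a per-machine fault from a server-side one.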
Ok, reopening -- disabling in slavealloc and leaving for coop/Tomcat to diagnose on monday.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 837997
Puppetized, enabled, running in production.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
That boy still ain't right.

https://tbpl.mozilla.org/php/getParsedLog.php?id=21160623&tree=Mozilla-Inbound is, um, some sort of untar failure.

https://tbpl.mozilla.org/php/getParsedLog.php?id=21154002&tree=Firefox is an unzip timeout.

https://tbpl.mozilla.org/php/getParsedLog.php?id=21157524&tree=Mozilla-Inbound is tooltool not liking what it downloaded.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Back in production.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Might want to take it back out...
https://tbpl.mozilla.org/php/getParsedLog.php?id=21348135&tree=Mozilla-B2g18 (and a ton more like it)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
It's been disabled, and not even online at the moment...
puppetized and back in production.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Depends on: 863036
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
You forgot the part where usually the person putting it back in production says "ran two green jobs" as though that was a sign of health. Though in this case, it would be "ran zero green jobs."

https://tbpl.mozilla.org/php/getParsedLog.php?id=22213793&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=22216536&tree=Mozilla-Aurora
https://tbpl.mozilla.org/php/getParsedLog.php?id=22221969&tree=Mozilla-Central

Please disable it, and put it out of our collective misery, it's decomm time.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://tbpl.mozilla.org/php/getParsedLog.php?id=22225261&tree=Mozilla-Inbound - now it's back to having a read-only filesystem, which typically results in eating up dozens or hundreds of jobs since they only take a minute each.
Severity: major → blocker
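Since a read-only filesystem burns through jobs this quickly, a slave could in principle check for the condition itself before accepting work. The following is a minimal sketch, not part of buildbot or slavealloc; both function names are hypothetical, and it assumes a POSIX host with `/proc/mounts`-style mount listings.

```python
import os


def is_read_only(path="/"):
    """Return True if the filesystem holding `path` is currently mounted read-only (POSIX only)."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def read_only_mounts(mounts_text):
    """Parse /proc/mounts-style text and return the mount points whose options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and "ro" in fields[3].split(","):
            result.append(fields[1])
    return result
```

A wrapper could call `is_read_only("/")` before starting the buildbot process and refuse to start if it returns `True`.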
buildbot stopped, disabled in slavealloc.
I just love the dazed look in their eyes after you club them while they're in the middle of a frantic speed-RETRY run.
Severity: blocker → normal
We're fine without it as we don't have that much load on the rev3 fedoras anymore.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Reopening to officially decommission.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: talos-r3-fed-070 problem tracking → decommission talos-r3-fed-070
Product: mozilla.org → Release Engineering
Removed from buildbot-configs in https://hg.mozilla.org/build/buildbot-configs/rev/bb86e30e979b

It's not in Puppet anymore and it's not in slavealloc, so I think that's all there is to do.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard