Bug 730545 (talos-r4-lion-058): talos-r4-lion-058 problem tracking
Closed: RESOLVED FIXED. Opened 12 years ago; closed 11 years ago.
Categories: Infrastructure & Operations Graveyard :: CIDuty, task, P3
Tracking: Not tracked
People: Reporter: philor; Unassigned
References: none
Details: Whiteboard: [decomm]
Attachments (2 files):
915 bytes, patch | rail: review+ | bhearsum: checked-in+
430 bytes, patch | rail: review+ | bhearsum: checked-in+
+++ This bug was initially created as a clone of Bug #728535 +++

Same symptom as bug 728535, when talos-r4-snow-007 took to saying that every file in "rm -rf tools" was an invalid argument, and then dying while downloading builds, saying "Cannot write to `firefox-13.0a1.en-US.mac.dmg' (Invalid argument)". Anyway, it's chewing up jobs (what jobs there are to chew right now) like crazy, because it only takes a couple of seconds to fail to rm and then fail to save a downloaded build.
Comment 1•12 years ago
Disabled in slavealloc, and did a graceful shutdown on the master. Should be in the naughty corner now.
Updated•12 years ago
Priority: -- → P3
Updated•12 years ago
Alias: talos-r4-lion-058
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: release → armenzg
Summary: talos-r4-lion-058 is broken → talos-r4-lion-058 problem tracking
Comment 2•12 years ago
Back in the production pool.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 3•12 years ago
This mini has been repaired, reimaged and placed back in scl1. It is ready to be placed in production.
Comment 4•12 years ago
Back in production
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 5•12 years ago (Reporter)
Last job was 8 days ago.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6•12 years ago
Powered off and on again, as it was refusing SSH connections.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 7•12 years ago (Reporter)
https://tbpl.mozilla.org/php/getParsedLog.php?id=14658404&tree=Ionmonkey "inflating: reftest/tests/layout/reftests/fonts/dejavu-sans/DejaVuSans-Oblique.ttf bad CRC 21b8d878 (should be 0b256820)" Expect me to be coming around claiming that the disk is bad before too much longer.
Comment 8•12 years ago (Reporter)
https://tbpl.mozilla.org/php/getParsedLog.php?id=14758821&tree=Mozilla-Inbound "Error 1000 (image data corrupted). calculated CRC32 $C8359846, expected CRC32 $269CA257"
Comment 9•12 years ago (Reporter)
https://tbpl.mozilla.org/php/getParsedLog.php?id=14760663&tree=Ionmonkey "inflating: reftest/tests/layout/reftests/fonts/dejavu-sans/DejaVuSans.ttf bad CRC 21e182a9 (should be d8a4d667)" Needs hardware diagnostics, I fear.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10•12 years ago
Comment 3 suggested that this slave was repaired 4 months ago and no one reported any issues. Perhaps it just broke again, and that is why it was down for several days?
Comment 11•12 years ago
Added a note to slavealloc and disabled the slave.
Comment 12•12 years ago
Back in production now that the RAM has been replaced.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 13•12 years ago (Reporter)
Made it 6 jobs before https://tbpl.mozilla.org/php/getParsedLog.php?id=15309593&tree=Mozilla-Inbound where it apparently went into a coma that was forcibly ended by a reconfig.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14•12 years ago
Host had memory issues and the RAM was replaced, but it looks like the issues are persisting. Will need to bring it to an Apple-certified tech in desktop to rerun diagnostics. Bug 794184 opened to track.
Updated•12 years ago
Severity: major → normal
Comment 15•12 years ago (Reporter)
I don't actually want to ever see this slave back again, but I know I will anyway, and I feel duty-bound to comment before acking the Nagios alert that's been going off forever now: bug 794184 comment 2 claimed there was nothing wrong with this clearly broken slave and that it was going back into production, but it hasn't been heard from in the 6 weeks since. Somebody probably ought to do something so we can get back to reimaging it every few days.
Comment 16•12 years ago
Did anyone ever actually put this back into production or test it out after the RAM was replaced?
Comment 17•12 years ago (Reporter)
The RAM replacement was 2012-09-14, went back into production 2012-09-17, only did six jobs and then died, no?
Comment 18•12 years ago
philor: Ah, okay, I was misreading the timeline and thought that the RAM had been replaced after comment 14. I've asked dcops to run some more diags on it to see if they can find something in addition to Apple's own hardware diags (which come up clean).
Comment 19•12 years ago
Diagnostics were run and came up clean. Returning to production. If it fails again, it will get a sharper hook.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 20•12 years ago
coop: We've run extensive diagnostics and they've repeatedly come up clean. Despite that, philor says it burns jobs almost immediately when put back into production (see comment 15). I'm not sure where to go from here.
Updated•12 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 21•12 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #20)
> coop: We've run extensive diagnostics and they've repeatedly come up clean.
> Despite that, philor says it burns jobs almost immediately when put back
> into production (see comment 15). I'm not sure where to go from here.

Filed bug 818467 to get this mini re-imaged. One last try before decommissioning.
Comment 22•12 years ago
Down again. IRC conversation concluded that we should decommission?
Comment 23•12 years ago
nthomas: yeah, hardware diags show nothing wrong, we've reimaged it, and all it does is burn jobs and go down.
Comment 24•11 years ago
Attachment #699975 - Flags: review?(rail)
Comment 25•11 years ago
Attachment #699977 - Flags: review?(rail)
Updated•11 years ago
Attachment #699975 - Flags: review?(rail) → review+
Updated•11 years ago
Attachment #699977 - Flags: review?(rail) → review+
Updated•11 years ago
Flags: needinfo?(nobody)
Comment 26•11 years ago
Hey ben, can we get these patches landed please :-)

--> removing from buildduty queue
Flags: needinfo?(bhearsum)
Whiteboard: [badslave?][buildduty] → [badslave?]
Updated•11 years ago
Flags: needinfo?(nobody)
Comment 27•11 years ago
(In reply to Justin Wood (:Callek) from comment #26)
> Hey ben, can we get these patches landed please :-)
>
> --> removing from buildduty queue

You could've just landed these yourself, but OK...
Flags: needinfo?(bhearsum)
Updated•11 years ago
Attachment #699975 - Flags: checked-in+
Updated•11 years ago
Attachment #699977 - Flags: checked-in+
Comment 28•11 years ago
in production
Comment 29•11 years ago
Decommissioned in bug 820115.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated•11 years ago
Whiteboard: [badslave?] → [decomm]
Updated•6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard