Closed
Bug 601123
Opened 15 years ago
Closed 15 years ago
please run hardware diagnostics on linux-ix-slave17
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Unassigned)
References
Details
(Whiteboard: [fixed by IX (drive with bad sectors), will return to SCL][buildduty])
This machine repeatedly got into uninterruptible sleep while doing disk-heavy operations and required rebooting.
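For anyone triaging a similar hang: uninterruptible sleep appears as state `D` in `ps` output and in `/proc/<pid>/stat`. The following is a minimal sketch for listing such processes, assuming a Linux `/proc` filesystem. The function names are hypothetical and are not part of any tooling used on this bug.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_state(handle.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Processes that stay in this list across repeated runs during disk-heavy work would be consistent with the symptom described above, though the sketch itself does not identify the cause.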
Updated•15 years ago
Assignee: server-ops → jlazaro
Comment 1•15 years ago
Sent an email to IX support asking them to take a look at this and advise.
Updated•15 years ago
Assignee: jlazaro → server-ops
Updated•15 years ago
Assignee: server-ops → jlazaro
Comment 2•15 years ago
Just handed this machine (asset tag #4636) to Ramsey of IX Systems, with the root password reset, so that they can debug this issue further.
Comment 3•15 years ago
Looks like they've found a bad disk on this machine. Will update as soon as I find out more.
Comment 4•15 years ago
Got this back yesterday. Here's an update from Matt Finney of IX regarding this machine:
--
Hello,
The system has been repaired. Originally the drive was disconnecting and reconnecting to SATA repeatedly while idle. I ran Seagate diagnostics on the drive, which found 2 bad sectors that I repaired. I replaced the SATA cable as well.
I'm not able to reproduce any disk issues after those changes.
--
Will bring it back on the next trip to Internap.
Flags: colo-trip+
Whiteboard: [fixed by IX (drive with bad sectors), will return to SCL]
Updated•15 years ago
Status: NEW → ASSIGNED
| Reporter | ||
Comment 5•15 years ago
Thanks!
I guess the overall conclusion with these machines is that we'll treat issues on a case-by-case basis? That is, IX Systems has not found anything that leads them to believe there's a larger issue at work?
Updated•15 years ago
Assignee: jlazaro → jdow
Comment 6•15 years ago
Brought machine to Internap and it is online now.
Note: This machine was originally at Castro, so RelEng will need to update it to work at Internap.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 7•15 years ago
Please put the root password back to the RelEng value.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8•15 years ago
Password reset to the current RelEng value.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Comment 9•15 years ago
I'm not sure this is fixed. It was pretty slow at cleaning up files from disk, so I started a fsck on /builds and 20 minutes later it's still on Pass 1. I'll let it run a bit more ...
Comment 10•15 years ago
Took 35 minutes on a fairly empty 168G partition, which seems a bit slow. More info on bug 611128.
Comment 11•15 years ago
Yeah, there is definitely still some disk problem on this machine; see bug 611128 comment #4 for details on a 12x slowdown. We could try re-imaging the machine if it's a data error, or do more diagnostics to look for a busted drive. Thoughts?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 12•15 years ago
Is this out of production, such that I can re-image at will?
Comment 13•15 years ago
ping?
Comment 14•15 years ago
I believe it's down. I can't ssh to it in any case, and it hasn't done any work since October.
Comment 15•15 years ago
I re-imaged it. It is currently stuck on trying to start puppet. Not sure how to get around that. Please check it out and see if things are better or not. The machine is reachable over IPMI at 10.12.48.253.
Comment 16•15 years ago
Grabbing the bug to look at its Puppet config and see if we can get it back online Monday.
Assignee: jdow → bear
Updated•15 years ago
Whiteboard: [fixed by IX (drive with bad sectors), will return to SCL] → [fixed by IX (drive with bad sectors), will return to SCL][buildduty]
Comment 17•15 years ago
What is the status on this? If the box is back up, can we close this bug?
Updated•15 years ago
Assignee: bear → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
Comment 18•15 years ago
Filed bug 636827 for the post-imaging work by releng.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Updated•12 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations