Got this back yesterday. Here's an update from Matt Finney of IX regarding this machine:

--
Hello,

The system has been repaired. Originally the drive was disconnecting and reconnecting to SATA repeatedly while idle. I ran Seagate diagnostics on the drive, which found 2 bad sectors that I repaired. I replaced the SATA cable as well. I'm not able to reproduce any disk issues after those changes.
--

Will bring back on next trip to Internap.

Flags: colo-trip+

Whiteboard: [fixed by IX (drive with bad sectors), will return to SCL]

Justin Lazaro [:jlaz] (use needinfo)

Updated

•

15 years ago

Status: NEW → ASSIGNED

bhearsum@mozilla.com (:bhearsum)

Reporter

Comment 5

•

15 years ago

Thanks! I guess the overall conclusion with these machines is that we'll treat issues on a case-by-case basis? E.g., IX Systems has not found anything that leads them to believe there's a larger issue at work?

Justin Lazaro [:jlaz] (use needinfo)

Updated

•

15 years ago

Assignee: jlazaro → jdow

Justin Dow [:jabba]

Comment 6

•

15 years ago

Brought machine to Internap and it is online now. Note: This machine was originally at Castro, so RelEng will need to update it to work at Internap.

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Nick Thomas [:nthomas] (UTC+12)

Comment 7

•

15 years ago

Please put the root password back to the RelEng value.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Nick Thomas [:nthomas] (UTC+12)

Updated

•

15 years ago

Blocks: 611128

Justin Dow [:jabba]

Comment 8

•

15 years ago

Password reset to the current RelEng value.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Nick Thomas [:nthomas] (UTC+12)

Comment 9

•

15 years ago

I'm not sure this is fixed. It was pretty slow at cleaning up files from disk, so I started an fsck on /builds, and 20 minutes later it's still on Pass 1. I'll let it run a bit more ...

Nick Thomas [:nthomas] (UTC+12)

Comment 10

•

15 years ago

Took 35 minutes on a fairly empty 168G partition, which seems a bit slow. More info on bug 611128.

Nick Thomas [:nthomas] (UTC+12)

Comment 11

•

15 years ago

Yeah, definitely still some disk error on this; see bug 611128 comment #4 for details on a 12x slowdown. We could try re-imaging the machine if it's a data error, or do more diagnostics to look for a busted drive. Thoughts?

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Justin Dow [:jabba]

Comment 12

•

15 years ago

Is this out of production, such that I can re-image at will?

Justin Dow [:jabba]

Comment 13

•

15 years ago

ping?

Chris AtLee [:catlee]

Comment 14

•

15 years ago

I believe it's down. I can't ssh to it in any case, and it hasn't done any work since October.

Justin Dow [:jabba]

Comment 15

•

15 years ago

I re-imaged it. It is currently stuck on trying to start puppet. Not sure how to get around that. Please check it out and see if things are better or not. The machine is reachable over IPMI at 10.12.48.253.

Mike Taylor [:bear]

Comment 16

•

15 years ago

Grabbing the bug to look at its puppet config and see if we can get it back online Monday.

Assignee: jdow → bear

Mike Taylor [:bear]

Updated

•

15 years ago

Whiteboard: [fixed by IX (drive with bad sectors), will return to SCL] → [fixed by IX (drive with bad sectors), will return to SCL][buildduty]

Corey Shields [:cshields]

Comment 17

•

15 years ago

What is the status on this? If the box is back up can we close this bug?

Corey Shields [:cshields]

Updated

•

15 years ago

Assignee: bear → server-ops-releng

Component: Server Operations → Server Operations: RelEng

QA Contact: mrz → zandr

Chris Cooper [:coop] (he/him)

Updated

•

15 years ago

Blocks: 636827

Chris Cooper [:coop] (he/him)

Comment 18

•

15 years ago

Filed bug 636827 for the post-imaging work by releng.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

12 years ago

Component: Server Operations: RelEng → RelOps

Product: mozilla.org → Infrastructure & Operations
