Closed Bug 736911 (bld-linux64-ix-004) Opened 12 years ago Closed 11 years ago

bld-linux64-ix-004 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Unassigned)

References

Details

(Whiteboard: [buildduty][capacity][buildslaves][badslave?])

Attachments

(2 files)

It's down and I can't get to the IPMI interface either.
It's back now without intervention, timeline:

16:28 - twistd.log - finished an android-xul job on try, initiated reboot
16:29 - twistd.log - back on bm14, sits idle
16:37 - #buildduty - all the nagios checks timeout after 15 or 30 seconds
18:41 - twistd.log - loses master, tries to reconnect but has DNS failures until ...
19:41 - twistd.log - idleizer reboots it
19:42 - #buildduty - 88% packet loss (instead of 100)
19:47 - #buildduty - all checks go green
19:53 - twistd.log - starts android job

No history of issues in the list of recent builds.
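
For reference, the gaps in the timeline above can be worked out with a short script. This is only a sketch: the timestamps are copied from the timeline, the event labels are shorthand invented for the example, and it assumes all events fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the timeline; labels are shorthand added for this sketch.
EVENTS = [
    ("16:28", "reboot initiated after try job"),
    ("16:29", "back on bm14, idle"),
    ("16:37", "nagios checks timing out"),
    ("18:41", "lost master, DNS failures"),
    ("19:41", "idleizer reboot"),
    ("19:42", "88% packet loss"),
    ("19:47", "all checks green"),
    ("19:53", "android job started"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def gaps(events):
    """Yield (minutes, from_label, to_label) for each consecutive pair of events."""
    for (t0, a), (t1, b) in zip(events, events[1:]):
        yield minutes_between(t0, t1), a, b


if __name__ == "__main__":
    for mins, a, b in gaps(EVENTS):
        print(f"{mins:4d} min  {a} -> {b}")
    # Window between the first reported nagios timeouts and all checks green.
    print("checks failing for", minutes_between("16:37", "19:47"), "min")
```

By this reckoning the nagios checks were reported failing for a bit over three hours, and the idleizer reboot came an hour after the slave lost its master.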
No longer blocks: 734909
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Has a hard disk failure, screenshot to follow. It doesn't boot, and stops with the following error on POST:

SATA Port 0 ST3250318AS  CC38
       S.M.A.R.T Capable and Status BAD
....
AHCI Port0 Device Error
Press F1 to Resume
----

Interestingly, it does boot if you press F1, but it seems like a bad idea to trust it, even on try.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [buildduty][capacity][buildslaves] → [buildduty][capacity][buildslaves][badslave?]
Depends on: 788115
Repuppeted and back in the try pool.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
This became bld-linux64-ix-004 in bug 847529.

It has a problem with /:

Checking filesystems
/dev/sda3 contains a file system with errors, check forced.
Error reading block 42691483 (Attempt to read block from filesystem resulted in short read) while reading directory block.

/dev/sda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY
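
If output like this needs to be collected across several slaves, a small parser can pull out the device and the unreadable block numbers. The following is a rough sketch only: the sample text is copied from the output above, while the regular expressions, the `summarize` helper, and its key names are hypothetical and cover just these two message shapes.

```python
import re

# Sample text copied from the fsck output above.
FSCK_OUTPUT = """\
/dev/sda3 contains a file system with errors, check forced.
Error reading block 42691483 (Attempt to read block from filesystem resulted in short read) while reading directory block.
/dev/sda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY
"""

# Hypothetical patterns; they match only the two message shapes shown above.
BLOCK_RE = re.compile(r"Error reading block (\d+)")
MANUAL_RE = re.compile(r"^(/dev/\S+): UNEXPECTED INCONSISTENCY", re.MULTILINE)


def summarize(text: str) -> dict:
    """Collect unreadable block numbers and devices that need a manual fsck."""
    return {
        "unreadable_blocks": [int(n) for n in BLOCK_RE.findall(text)],
        "needs_manual_fsck": MANUAL_RE.findall(text),
    }


if __name__ == "__main__":
    print(summarize(FSCK_OUTPUT))
```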
Alias: linux-ix-slave10 → bld-linux64-ix-004
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: linux-ix-slave10 problem tracking → bld-linux64-ix-004 problem tracking
Attached image screenshot
Running fsck yields quite a few "Buffer I/O error on device sda3", which seems to mean the disk is packing up.
arr, would you recommend a reimage or replacing the disk?
Flags: needinfo?(arich)
nthomas: please open a bug with dcops and have them investigate and run some diags. This machine is still under warranty until the beginning of June, so if it's a hardware problem, let's get it taken care of now.
Flags: needinfo?(arich)
Depends on: 854577
If this doesn't get reimaged, we need to copy over the correct xrbld ssh keys before putting this back into production. (https://bugzilla.mozilla.org/show_bug.cgi?id=837234#c23)
Enabled in slavealloc after image+new disk and copied ssh keys:

[cltbld@bld-linux64-ix-004 ~]$ scp cltbld@bld-linux64-ix-015:.ssh/* .ssh/

And rebooted
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
(In reply to Justin Wood (:Callek) from comment #9)
> Enabled in slavealloc ...

Re-disabled. This host was pulled from the buildbot slave list when we reimaged many machines around this number as w64. Did we intend NOT to make this a Windows host, or what?
Status: RESOLVED → REOPENED
Flags: needinfo?(armenzg)
Resolution: FIXED → ---
I think I didn't ask for it to be re-imaged as win64 because it had a note on slavealloc (maybe bad drive) that made me worry and pick another machine instead.
Flags: needinfo?(armenzg)
Comment on attachment 741824 [details] [diff] [review]
[configs] add back to slaves list

Review of attachment 741824 [details] [diff] [review]:
-----------------------------------------------------------------

Oh... I even removed it! Sorry :(
Attachment #741824 - Flags: review?(armenzg) → review+
Change is in production (todo: put machine back in pool).
re-enabled and rebooted.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard