Closed
Bug 736911
(bld-linux64-ix-004)
Opened 12 years ago
Closed 11 years ago
bld-linux64-ix-004 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Unassigned)
References
Details
(Whiteboard: [buildduty][capacity][buildslaves][badslave?])
Attachments
(2 files)
41.80 KB, image/png
710 bytes, patch (armenzg: review+)
It's down and I can't get to the ipmi interface either.
Reporter
Comment 1 • 12 years ago
It's back now without intervention, timeline:
16:28 - twistd.log - finished an android-xul job on try, initiated reboot
16:29 - twistd.log - back on bm14, sits idle
16:37 - #buildduty - all the nagios checks timeout after 15 or 30 seconds
18:41 - twistd.log - loses master, tries to reconnect but has DNS failures until ...
19:41 - twistd.log - idleizer reboots it
19:42 - #buildduty - 88% packet loss (instead of 100)
19:47 - #buildduty - all checks go green
19:53 - twistd.log - starts android job
No history of issues in the list of recent builds.
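To make the gaps in the timeline above easier to see, here is a minimal Python sketch that computes the minutes between consecutive events. The event list is copied from this comment; the assumption that all timestamps fall on the same day, and the helper names `minutes_between` and `gaps`, are illustrative and not part of any buildbot or nagios tooling.

```python
from datetime import datetime

# Events copied from the timeline in this comment.
# Assumption: all times are on the same day and in the same timezone.
TIMELINE = [
    ("16:28", "twistd.log", "finished an android-xul job on try, initiated reboot"),
    ("16:29", "twistd.log", "back on bm14, sits idle"),
    ("16:37", "#buildduty", "all the nagios checks timeout after 15 or 30 seconds"),
    ("18:41", "twistd.log", "loses master, DNS failures on reconnect"),
    ("19:41", "twistd.log", "idleizer reboots it"),
    ("19:42", "#buildduty", "88% packet loss (instead of 100)"),
    ("19:47", "#buildduty", "all checks go green"),
    ("19:53", "twistd.log", "starts android job"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def gaps(timeline):
    """Yield (minutes, earlier_event, later_event) for consecutive entries."""
    for (t0, _, e0), (t1, _, e1) in zip(timeline, timeline[1:]):
        yield minutes_between(t0, t1), e0, e1


if __name__ == "__main__":
    for mins, before, after in gaps(TIMELINE):
        print(f"{mins:4d} min  {before} -> {after}")
    # Window during which the nagios checks were failing.
    print("nagios red to green:", minutes_between("16:37", "19:47"), "min")
```

On these timestamps the window from the first nagios timeouts (16:37) to all checks going green (19:47) comes to 190 minutes, and the longest silent stretch is the 124 minutes between 16:37 and 18:41.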
Reporter
Comment 2 • 12 years ago
Has a hard disk failure, screenshot to follow. It doesn't boot, with the following error on POST:
SATA Port 0 ST3250318AS CC38 S.M.A.R.T Capable and Status BAD
....
AHCI Port0 Device Error
Press F1 to Resume
----
Interestingly, it does boot if you press F1, but it seems like a bad idea to trust it, even on try.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Whiteboard: [buildduty][capacity][buildslaves] → [buildduty][capacity][buildslaves][badslave?]
Reporter
Comment 3 • 12 years ago
Repuppeted and back in the try pool.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Reporter
Comment 4 • 11 years ago
This became bld-linux64-ix-004 in bug 847529. It has a problem with /:
Checking filesystems
/dev/sda3 contains a file system with errors, check forced.
Error reading block 42691483 (Attempt to read block from filesystem resulted in short read) while reading directory block.
/dev/sda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY
Alias: linux-ix-slave10 → bld-linux64-ix-004
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: linux-ix-slave10 problem tracking → bld-linux64-ix-004 problem tracking
Reporter
Comment 5 • 11 years ago
Running fsck yields quite a few "Buffer I/O error on device sda3" messages, which suggests the disk is failing.
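The disk symptoms reported in this bug (the POST S.M.A.R.T. warning, the fsck inconsistency, and the buffer I/O errors) can be spotted in captured console or log text with a few patterns. The sketch below is a rough illustration only: the message strings are the ones quoted in this bug, while the category labels and the function names `classify` and `looks_like_failing_disk` are hypothetical and not part of any existing tool.

```python
import re

# Message fragments quoted in this bug. The dictionary keys are
# illustrative labels, not names used by fsck, the BIOS, or the kernel.
SIGNATURES = {
    "smart_bad": re.compile(r"S\.M\.A\.R\.T Capable and Status BAD"),
    "fsck_inconsistency": re.compile(r"UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY"),
    "short_read": re.compile(r"Error reading block (\d+)"),
    "buffer_io": re.compile(r"Buffer I/O error on device (\w+)"),
}


def classify(text: str) -> dict:
    """Count how many times each known signature appears in the text."""
    return {name: len(pattern.findall(text)) for name, pattern in SIGNATURES.items()}


def looks_like_failing_disk(text: str) -> bool:
    """Return True if any of the signatures above appears at least once."""
    return any(count > 0 for count in classify(text).values())


if __name__ == "__main__":
    sample = (
        "/dev/sda3 contains a file system with errors, check forced.\n"
        "Error reading block 42691483 (Attempt to read block from filesystem "
        "resulted in short read) while reading directory block.\n"
        "/dev/sda3: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY\n"
    )
    print(classify(sample))
    print(looks_like_failing_disk(sample))
```

A match is only a hint that the machine deserves hardware diagnostics; it does not distinguish a dying disk from, say, a bad cable or controller.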
Reporter
Comment 6 • 11 years ago
arr, would you recommend a reimage or replacing the disk?
Flags: needinfo?(arich)
Comment 7 • 11 years ago
nthomas: please open a bug with dcops and have them investigate and run some diags. This machine is still under warranty until the beginning of June, so if it's a hardware problem, let's get it taken care of now.
Flags: needinfo?(arich)
Comment 8 • 11 years ago
If this doesn't get reimaged, we need to copy over the correct xrbld ssh keys before putting this back into production. (https://bugzilla.mozilla.org/show_bug.cgi?id=837234#c23)
Comment 9 • 11 years ago
Enabled in slavealloc after image+new disk and copied ssh keys:
[cltbld@bld-linux64-ix-004 ~]$ scp cltbld@bld-linux64-ix-015:.ssh/* .ssh/
And rebooted.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Comment 10 • 11 years ago
(In reply to Justin Wood (:Callek) from comment #9)
> Enabled in slavealloc ...
Re-disabled. This host was pulled from the buildbot slave list when we reimaged many around this number as w64. Did we intend to NOT make this a windows host, or what?
Status: RESOLVED → REOPENED
Flags: needinfo?(armenzg)
Resolution: FIXED → ---
Comment 11 • 11 years ago
I think I didn't ask for it to be re-imaged as win64 because it had a note on slavealloc (maybe bad drive) that made me worry and pick another machine instead.
Flags: needinfo?(armenzg)
Comment 12 • 11 years ago
Attachment #741824 -
Flags: review?(armenzg)
Comment 13 • 11 years ago
Comment on attachment 741824
[configs] add back to slaves list
Review of attachment 741824:
-----------------------------------------------------------------
Oh... I even removed it! Sorry :(
Attachment #741824 -
Flags: review?(armenzg) → review+
Comment 15 • 11 years ago
Change is in production (todo: put machine back in pool).
Comment 16 • 11 years ago
re-enabled and rebooted.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee
Updated • 11 years ago
Product: mozilla.org → Release Engineering
Updated • 6 years ago
Product: Release Engineering → Infrastructure & Operations
Updated • 4 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard