Closed Bug 652962 Opened 13 years ago Closed 13 years ago

buildbot-master2 having disk problems

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 655304

People

(Reporter: dustin, Assigned: zandr)

dmesg is full of all sorts of goodness like

ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fff000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xfffe000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x10)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ff000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x38)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x38)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x39)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ff000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fffff8)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x31)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xfc)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x18)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ffff1ff)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7ffffff3)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x100)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x39)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x80)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x3ffc0)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x1ffc0)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xf0000)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fffffc3)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x3ff8)
ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x40001fff)
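
A note for reading these lines: `sactive` is normally a bitmask of outstanding NCQ command tags (one bit per tag), and the odd-looking `active_tag -84148995` is 0xfafbfcfd when viewed as an unsigned 32-bit value, which appears to match libata's `ATA_TAG_POISON` placeholder for "no active non-NCQ command". A minimal sketch that decodes both fields, assuming the exact message format shown above; the function name `parse_spurious` is made up for illustration, and the poison constant should be checked against the kernel source in use:

```python
import re

# Assumption: value of ATA_TAG_POISON in libata; verify against your kernel tree.
ATA_TAG_POISON = 0xFAFBFCFD

# Matches the message format quoted in this bug; other kernels may word it differently.
SPURIOUS_RE = re.compile(
    r"ata(?P<port>\d+): spurious interrupt "
    r"\(irq_stat (?P<irq_stat>0x[0-9a-f]+) "
    r"active_tag (?P<active_tag>-?\d+) "
    r"sactive (?P<sactive>0x[0-9a-f]+)\)"
)


def parse_spurious(line):
    """Decode one 'spurious interrupt' line, or return None if it does not match."""
    match = SPURIOUS_RE.search(line)
    if match is None:
        return None
    mask = int(match.group("sactive"), 16)
    active_tag = int(match.group("active_tag"))
    return {
        # Bit n set in sactive means NCQ tag n was outstanding.
        "ncq_tags": [bit for bit in range(32) if mask & (1 << bit)],
        # The log prints the tag as a signed int; compare it as unsigned 32-bit.
        "active_tag_is_poison": (active_tag & 0xFFFFFFFF) == ATA_TAG_POISON,
    }


if __name__ == "__main__":
    line = "ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x38)"
    print(parse_spurious(line))
```

Feeding it the `sactive 0x38` line reports tags 3, 4 and 5 as outstanding.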

Here's the last batch, with times (from /var/log/messages):

Apr 26 11:55:56 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x4)
Apr 26 11:57:56 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:02:02 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:02:44 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x2)
Apr 26 12:02:46 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:07:50 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x4)
Apr 26 12:21:40 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:22:16 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:22:27 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0)
Apr 26 12:32:33 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7fff9fff)
Apr 26 12:49:24 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x9)
Apr 26 13:35:16 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x7c)
Apr 26 13:41:45 buildbot-master2 kernel: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0xfc)
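
To see how often the message fires, the timestamps can be turned into gaps between consecutive events. A rough sketch, assuming the /var/log/messages format shown above; syslog lines carry no year, so the `year` default below is an assumption, `%b` parsing depends on an English locale, and the function names are hypothetical:

```python
import re
from datetime import datetime

# Captures the syslog timestamp of a "spurious interrupt" kernel line.
STAMP_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: ata\d+: spurious interrupt"
)


def event_times(lines, year=2011):
    """Return a datetime for each matching line (year is an assumed value)."""
    times = []
    for line in lines:
        match = STAMP_RE.match(line)
        if match:
            # Collapse the double space syslog uses for single-digit days.
            stamp = " ".join(match.group(1).split())
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(lines, year=2011):
    """Return the seconds elapsed between each pair of consecutive events."""
    times = event_times(lines, year)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    with open("/var/log/messages") as handle:
        print(gaps_in_seconds(handle))
```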

hdparm gives:

/dev/sda:
 Timing cached reads:   28000 MB in  1.99 seconds = 14064.70 MB/sec
 Timing buffered disk reads:  216 MB in  3.01 seconds =  71.66 MB/sec
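
For comparing this box against other masters, the two throughput figures can be pulled out of the hdparm output mechanically. A small sketch, assuming output in the form pasted above; `parse_hdparm` is a hypothetical helper, not part of hdparm:

```python
import re

# Matches the two "Timing ..." result lines printed by hdparm's -T and -t tests.
RATE_RE = re.compile(
    r"Timing (cached reads|buffered disk reads):.*=\s*([\d.]+) MB/sec"
)


def parse_hdparm(text):
    """Return a dict mapping test name to throughput in MB/sec."""
    rates = {}
    for line in text.splitlines():
        match = RATE_RE.search(line)
        if match:
            rates[match.group(1)] = float(match.group(2))
    return rates


if __name__ == "__main__":
    import sys

    # Usage: pipe the output of the hdparm timing run into this script.
    print(parse_hdparm(sys.stdin.read()))
```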

We've moved all of the slaves off this master for now.  What should we do with it?

Assignee: server-ops-releng → zandr
Blocks: 649734

I'll put this in the next batch going back to iX.

Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE

A reminder to self, since this bug is my first Google hit for this error message:
 ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x4)
This message is most likely harmless. It comes from the ancient CentOS 5.0 kernel not dealing well with hardware that doesn't do NCQ correctly. I see this error both on perfectly functional iX systems and on formerly broken iX systems.
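
Since the explanation hinges on NCQ, one quick thing to look at on an affected host is the queue depth the kernel is using for the disk. A hedged sketch: on Linux the value is usually exposed at /sys/block/<device>/device/queue_depth, and a depth of 1 generally means NCQ is not in use; whether that file exists on a given kernel is an assumption to verify, and the `sysfs_root` parameter is only there so the helper can be pointed at a test directory:

```python
from pathlib import Path


def ncq_queue_depth(device="sda", sysfs_root="/sys"):
    """Return the queue depth for a block device, or None if it cannot be read."""
    path = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a number on this kernel.
        return None


if __name__ == "__main__":
    depth = ncq_queue_depth("sda")
    if depth is None:
        print("queue_depth not available for sda")
    elif depth > 1:
        print(f"sda queue depth is {depth}; NCQ appears to be in use")
    else:
        print("sda queue depth is 1; NCQ appears to be off")
```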

http://forum.soft32.com/linux/ata1-spurious-interrupt-irq_stat-0x8-active_tag-84148995-sac-ftopict337813.html

Better link for that thread:
  http://lkml.org/lkml/2006/12/27/174

How many other machines have we declared dead because of these errors?

It has always been a secondary observation: the machine is first reported as having slow I/O, and then *also* noted to have these errors.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations