Closed Bug 694908 Opened 13 years ago Closed 13 years ago

sync1.db.scl2.svc: RAM, SATA issues

Categories

(Cloud Services :: Operations: Miscellaneous, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Atoll, Assigned: jlaz)

Details

(Whiteboard: [RAM swapped, check back in a week])

Either all four RAM sticks are bad, or this server needs a chassis swap.  I'm restarting MySQL to make sure it can still start up at all, but we should fix this ASAP before it destroys itself unexpectedly.

# dmesg | grep --only-matching 'DRAM-Bank=[0-9]' | sort | uniq -c
    133 DRAM-Bank=0
     96 DRAM-Bank=1
    123 DRAM-Bank=2
    147 DRAM-Bank=3

# dmesg | grep ^ata | grep 'failed' | sort | uniq -c
     36 ata3.00: failed command: READ DMA EXT
      1 ata4.00: failed command: FLUSH CACHE EXT
      1 ata4: reset failed, giving up
      4 ata4: softreset failed (device not ready)
     31 ata7.00: failed command: READ FPDMA QUEUED
      1 ata7: failed to read log page 10h (errno=-5)
      1 ata9.00: failed command: WRITE FPDMA QUEUED
We'll do a full repair pass tomorrow, but for now I'm leaving it in service with MySQL freshly restarted. If it goes down or starts flapping horrifically, escalate and we'll migrate.
Component: Operations → Operations: Hardware
RAM replaced; will check back in a week to see whether the SATA/RAM issues persist.
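For the one-week recheck, the two dmesg pipelines from the original report could be wrapped into a single helper so that the before and after counts are directly comparable. This is only a sketch: the function name `summarize_hw_errors` is hypothetical, and it assumes the kernel log lines carry no leading timestamp, as the `^ata` pattern in the report implies.

```shell
#!/bin/sh
# Hypothetical helper (name not from the bug): summarize RAM and SATA
# errors from kernel log text supplied on stdin.
summarize_hw_errors() {
    # Read the whole log once so both pipelines see the same snapshot.
    log=$(cat)

    echo "== DRAM errors per bank =="
    printf '%s\n' "$log" | grep --only-matching 'DRAM-Bank=[0-9]' | sort | uniq -c

    echo "== failed ATA operations =="
    # Assumes log lines start with "ata" (no timestamp prefix).
    printf '%s\n' "$log" | grep '^ata' | grep 'failed' | sort | uniq -c
}
```

On the host itself this would be run as `dmesg | summarize_hw_errors`. Empty sections under both headings after a week would suggest the RAM swap resolved the errors. Note that the kernel ring buffer is finite and is cleared on reboot, so a clean result only covers the messages still held in it.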
Whiteboard: [RAM swapped, check back in a week]
No RAM issues since; closing out.
Assignee: nobody → jlaz
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Component: Operations: Hardware → Operations