Closed Bug 1196641 Opened 9 years ago Closed 9 years ago

node1.bagheera.metrics.scl3.mozilla.com:Disk - /data is WARNING:

Categories

(Infrastructure & Operations :: MOC: Problems, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sal, Unassigned)

Details

No description provided.
nagios-scl3> (IRC) Wed 23:45:58 PDT [5477] node1.bagheera.metrics.scl3.mozilla.com:Disk - /data is WARNING: DISK WARNING - free space: /data 513007 MB (14% inode=99%):
[root@node1.bagheera.metrics.scl3 sespinoza]# du -chx --max-depth=1 /
6.7G /usr
5.0G /var
4.0G /home
17G /
17G total
also :sheeri
[root@node1.bagheera.metrics.scl3 home]# du -hsc * | grep G | sort -nr | grep -v total
4.0G scabral
Summary: disk → node1.bagheera.metrics.scl3.mozilla.com:Disk - /data is WARNING:
Flags: needinfo?(scabral)
The problem is in /data, not in /. I removed logs older than 14 days in /data; there were some logs from February and March 2014.
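The comment above does not record the command that was used for the cleanup. As a minimal sketch only, assuming GNU find and a hypothetical helper name (list_old_logs), files older than a given number of days could be listed for review before anything is removed:

```shell
# Hypothetical helper, not the command used in this bug: list regular files
# under a directory whose last modification is more than N whole days ago.
# It only prints paths; nothing is deleted.
list_old_logs() {
    local dir="$1" days="${2:-14}"
    # -xdev keeps find on one filesystem; -mtime +N matches files whose age,
    # counted in whole 24-hour periods, is greater than N
    find "$dir" -xdev -type f -mtime +"$days" -print
}

# Example: review what a 14-day cutoff would match on /data
#   list_old_logs /data 14
```

Listing first and deleting in a separate, deliberate step keeps the cutoff easy to check against files a running service may still need.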
Flags: needinfo?(scabral)
This alerted again today:
Thu 12:32:23 PDT [5258] node1.bagheera.metrics.scl3.mozilla.com:Disk - All is WARNING: DISK WARNING - free space: /data 378340 MB (10% inode=99%)
Talked with sheeri:
[12:37:05] <sheeri> ashlee: it's one node in a cluster, and other servers have more space, so I think this might just be regular growth
[12:37:29] <sheeri> I'm thinking of ack'ing the warning and worrying if it goes critical, I think this is just normal growth, and we need to be off these servers by the end of the year
It's getting a bit cozy.
Fri 01:57:39 PDT [5000] node1.bagheera.metrics.scl3.mozilla.com:Disk - /data is CRITICAL: DISK CRITICAL - free space: /data 180054
[rchilds@node1.bagheera.metrics.scl3 ~]$ sudo du -chx --max-depth=1 /data/kafka-logs/ | grep G
796G /data/kafka-logs/metrics-1
794G /data/kafka-logs/metrics-3
796G /data/kafka-logs/metrics-2
796G /data/kafka-logs/metrics-0
1.1G /data/kafka-logs/sslreports-0
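The du pipelines in this bug filter with grep G, which also matches any path containing a capital G and leaves human-readable sizes that sort -n cannot order reliably across units. A rough alternative, assuming GNU du and a hypothetical helper name (largest_dirs), is to print plain KiB counts and sort those numerically:

```shell
# Hypothetical helper: show the directory total and its largest immediate
# subdirectories, sizes in KiB, biggest first.
largest_dirs() {
    local dir="$1" count="${2:-10}"
    # -x stays on one filesystem; -k prints plain KiB numbers so that
    # sort -rn orders them correctly regardless of magnitude
    du -xk --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Example: the first line is the total for /data/kafka-logs itself
#   largest_dirs /data/kafka-logs 6
```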
Alerting again:
nagios-scl3> (IRC) Fri 09:27:38 PDT [5149] node1.bagheera.metrics.scl3.mozilla.com:Disk - /data is CRITICAL: DISK CRITICAL - free space: /data 25596 MB (0% inode=99%): (http://m.mozilla.org/Disk+-+/data)
Had to restart bagheera and kafka to release some of the space. Took the opportunity to patch software and firmware. Everything came up OK and things are great now:
-bash-4.1$ df -h /data
Filesystem Size Used Avail Use% Mounted on
/dev/sda4 3.5T 2.1T 1.3T 63% /data
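The comment above does not say why a restart released space. One common cause, assumed here and not confirmed anywhere in this bug, is a long-running process holding files open after they were deleted, so the blocks are only freed once the process exits. A sketch for checking that on Linux, with a hypothetical helper name (deleted_open_files) and relying on /proc:

```shell
# Hypothetical helper (Linux only): list file descriptors of a process that
# still point at files already removed from the filesystem.
deleted_open_files() {
    local pid="$1"
    # Each entry in /proc/PID/fd is a symlink; the kernel appends
    # " (deleted)" to the target once the file has been unlinked
    ls -l "/proc/$pid/fd" 2>/dev/null | grep ' (deleted)$' || true
}

# Example: pass the PID of the service suspected of holding the space
#   deleted_open_files 12345
```

Empty output means the process holds no deleted files, in which case the space was being retained for some other reason.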
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: MOC: Incidents → MOC: Problems