Closed Bug 1079967 Opened 10 years ago Closed 9 years ago

/home cleanup time on peach-gw.peach.metrics.scl3.mozilla.com

Categories

(Infrastructure & Operations :: MOC: Problems, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nagiosapi, Assigned: rwatson)

Details

(Whiteboard: [id=nagios1.private.scl3.mozilla.com:438621])

Automated alert report from nagios1.private.scl3.mozilla.com:

Hostname: peach-gw.peach.metrics.scl3.mozilla.com
Service:  Disk - All
State:    WARNING
Output:   DISK WARNING - free space: /home 20897 MB (10% inode=97%):

Runbook:  http://m.allizom.org/Disk+-+All
[root@peach-gw.peach.metrics.scl3 home]# du -sh * | grep G | sort -nr | head -5
44G	tmary
36G	hulmer
28G	metrics-etl
8.3G	isegall
7.9G	juber
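A note on the pipeline above: `grep G | sort -nr` only keeps lines containing a capital G (including any entry whose name happens to contain one), and it sorts on the leading number alone, so a directory measured in terabytes would be dropped from the list. A minimal sketch of a unit-independent alternative, sizing everything in kilobytes before sorting; the function name `top_dirs` and its arguments are hypothetical and not something that was run on this host:

```shell
# top_dirs DIR [N]: print the N largest entries directly under DIR,
# largest first, as "<size in KB><TAB><path>".
# Hypothetical helper; defaults mirror the command in the comment above.
top_dirs() {
    dir=${1:-/home}
    n=${2:-5}
    # -s: one total per entry; -k: sizes in 1 KB blocks,
    # so a plain numeric sort orders them correctly.
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, `top_dirs /home 5`. With GNU coreutils, `du -sh -- /home/* | sort -rh | head -n 5` should give the same ordering with human-readable sizes.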
Flags: needinfo?(tmeyarivan)
Flags: needinfo?(juber)
Flags: needinfo?(isegall)
Flags: needinfo?(hulmer)
Summary: Disk - All on peach-gw.peach.metrics.scl3.mozilla.com is WARNING: DISK WARNING - free space: /home 20897 MB (10% inode=97%): → /home cleanup time on peach-gw.peach.metrics.scl3.mozilla.com
Automated alert recovery:

Hostname: peach-gw.peach.metrics.scl3.mozilla.com
Service:  Disk - All
State:    OK
Output:   DISK OK
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
[root@peach-gw.peach.metrics.scl3 home]# du -sh * | grep G | sort -nr | head -5
36G	hulmer
28G	metrics-etl
27G	tmary
8.3G	isegall
7.9G	juber
Flags: needinfo?(tmeyarivan)
Could you provide me with a little more info on what the issue is? I recognize I have a lot of big data sets in my /home/ right now but see both a 10% and a 97% above. Do I need to consolidate?
Flags: needinfo?(hulmer)
(In reply to Hamilton from comment #4)
> Could you provide me with a little more info on what the issue is? I
> recognize I have a lot of big data sets in my /home/ right now but see both
> a 10% and a 97% above. Do I need to consolidate?

The issue is that the disk is getting full, so a cleanup needs to happen. You have the biggest homedir :-)
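On the 10% versus 97% question: as far as I know, the Nagios check_disk plugin reports both figures as percentages free, so the alert line reads as roughly 20 GB (10%) of space free and 97% of inodes free. Space is the problem; inodes are fine. A small sketch that splits such a line into labelled fields (the function name `parse_disk_alert` and the output field names are made up for illustration):

```shell
# parse_disk_alert LINE: split one check_disk-style output line into
# labelled fields. Prints nothing if the line does not match.
# Assumption: both percentages are "free" figures.
parse_disk_alert() {
    printf '%s\n' "$1" | sed -nE \
        's/.*free space: ([^ ]+) ([0-9]+) MB \(([0-9]+)% inode=([0-9]+)%\).*/mount=\1 free_mb=\2 free_pct=\3 inode_free_pct=\4/p'
}
```

Usage: `parse_disk_alert 'DISK WARNING - free space: /home 20897 MB (10% inode=97%):'`.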
It's alerting again.
(In reply to Hamilton from comment #4)
> Could you provide me with a little more info on what the issue is? I
> recognize I have a lot of big data sets in my /home/ right now but see both
> a 10% and a 97% above. Do I need to consolidate?

Please consider compressing large data files under /home/hulmer:
(MB, path)

553     /home/hulmer/jydoop/outData/fennec-fhr-small.csv
2888    /home/hulmer/jydoop/outData/stub_flat.csv
11826   /home/hulmer/jydoop/outData/lightbeam-sample.csv
18073   /home/hulmer/jydoop/outData/fennec-fhr-sample.csv

468     /home/hulmer/fennec-rollups/fennec-monthly-2014-09-29.json
505     /home/hulmer/fennec-rollups/fennec-monthly-2014-10-13.json
506     /home/hulmer/fennec-rollups/fennec-monthly-2014-10-20.json
1474    /home/hulmer/fennec-rollups/fennec-weekly-2014-09-29.json
1524    /home/hulmer/fennec-rollups/fennec-weekly-2014-10-13.json
1539    /home/hulmer/fennec-rollups/fennec-weekly-2014-10-20.json
7261    /home/hulmer/fennec-rollups/fennec-daily-2014-10-13.json
7332    /home/hulmer/fennec-rollups/fennec-daily-2014-10-20.json

--
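A listing like the one above can be produced, and the files compressed in place, along these lines. This is only a sketch: the function names and the size threshold are hypothetical, and `gzip` is one choice among several compressors.

```shell
# list_large_files DIR MIN_MB: list regular files under DIR larger than
# MIN_MB, as "<MB><TAB><path>", smallest first (same layout as above).
list_large_files() {
    find "$1" -type f -size +"$2"M -exec du -m {} + | sort -n
}

# compress_large_files DIR MIN_MB: gzip those files in place
# (foo.csv becomes foo.csv.gz). Files already ending in .gz are skipped.
compress_large_files() {
    find "$1" -type f -size +"$2"M ! -name '*.gz' -exec gzip -- {} +
}
```

For example, `list_large_files /home/hulmer 400` to review the candidates first, then `compress_large_files /home/hulmer/jydoop/outData 400`. Note that gzip needs enough free space to write each compressed copy before it removes the original.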
Flags: needinfo?(hulmer)
Flags: needinfo?(isegall)
Here is the most recent tally:

[root@peach-gw.peach.metrics.scl3 home]# du -sh * | grep G
7.3G	amo_prod
3.1G	aphadke
1.5G	bcolloran
5.4G	bsmedberg
2.0G	dlarlet
4.1G	glind
2.6G	gszorc
35G	hulmer
72G	isegall
8.9G	juber
4.4G	kmaglione
5.9G	mchew
2.4G	metrics-etl
1.7G	mhilyaev
2.6G	rfradinho
2.3G	sguha
26G	tmary
[root@peach-gw.peach.metrics.scl3 home]#
Oh, did I mention the file system has filled up?

/dev/sdb1             197G  187G   18M 100% /home


That sucks ;(
Haha yes - understood. I'm going to move some of my data dumps to hdfs - that should help quite a bit.

Thanks for the patience.
Flags: needinfo?(hulmer)
Moved my local dumps to HDFS (where they belong anyway). That should probably shave off ~32 GB.
Whiteboard: [id=nagios1.private.scl3.mozilla.com:438621] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2555] [id=nagios1.private.scl3.mozilla.com:438621]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2555] [id=nagios1.private.scl3.mozilla.com:438621] → [id=nagios1.private.scl3.mozilla.com:438621]
Guys,

It's alive....


[root@peach-gw.peach.metrics.scl3 home]# du -sh * | grep G | sort -rn
58G	hulmer
33G	tmary
16G	metrics-etl
13G	sguha
8.9G	juber
8.3G	isegall
6.8G	amo_prod
5.9G	mchew
5.4G	bsmedberg
4.4G	kmaglione
4.1G	glind
3.1G	aphadke
2.6G	gszorc
2.0G	dlarlet
1.7G	mhilyaev
1.5G	bcolloran
[root@peach-gw.peach.metrics.scl3 home]#
Same again today. hulmer, are you able to generate your logs compressed?
Flags: needinfo?(hulmer)
Yes - and I can get rid of old logs, so that will help. Just cleared the offending directory.
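To keep this from recurring, the two steps mentioned here (compress logs, drop old ones) could be scripted and run from cron. A minimal sketch, assuming the logs match `*.log`; the directory, the filename pattern, the age thresholds, and both function names are hypothetical since the thread does not say where the logs live:

```shell
# compress_old_logs DIR DAYS: gzip *.log files under DIR that have not
# been modified for more than roughly DAYS days.
compress_old_logs() {
    find "$1" -type f -name '*.log' -mtime +"$2" -exec gzip -- {} +
}

# delete_old_logs DIR DAYS: remove already-compressed logs (*.log.gz)
# under DIR that are older than roughly DAYS days.
delete_old_logs() {
    find "$1" -type f -name '*.log.gz' -mtime +"$2" -exec rm -f -- {} +
}
```

For example, `compress_old_logs ~/logs 7` followed by `delete_old_logs ~/logs 90`.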
Flags: needinfo?(hulmer)
Same again! Can this be taken care of?
Flags: needinfo?(hulmer)
It seems to have stopped alerting since the old logs were cleaned up and compressed?
Assignee: nobody → rwatson
Seems to be stable for now. Will re-open if this occurs again.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → FIXED
Annnnd as soon as I close it, it comes back:
	Wed 02:01:18 PST [5637] peach-gw.peach.metrics.scl3.mozilla.com:Disk - All is WARNING: DISK WARNING - free space: /home 20728 MB (10% inode=96%)

Can I ask for a general cleanup, please?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Flags: needinfo?(juber)
Flags: needinfo?(hulmer)
Component: MOC: Incidents → MOC: Problems