Closed Bug 849529 Opened 11 years ago Closed 11 years ago

socorro-collector[1-6].webapp.phx1 low disk

Categories

(Socorro :: Infra, task)

x86
macOS
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mburns, Unassigned)

Sat 09:05:44 PST [163] socorro-collector4.webapp.phx1.mozilla.com:Disk - All Early is WARNING: DISK WARNING - free space: / 67708 MB (15% inode=82%): /dev/shm 5947 MB (100% inode=99%): /boot 870 MB (93% inode=99%)

Sat 08:53:35 PST [158] socorro-collector3.webapp.phx1.mozilla.com:Disk - All Early is WARNING: DISK WARNING - free space: / 66572 MB (15% inode=82%): /dev/shm 5947 MB (100% inode=99%): /boot 870 MB (93% inode=99%)
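
For triage, the per-filesystem numbers in alerts like the two above can be pulled out mechanically. The following is a minimal sketch, assuming the "<mount> <free> MB (<pct>% inode=<pct>%)" layout seen in these alerts holds in general. The function name and the 20% threshold are illustrative, not part of the monitoring setup.

```python
import re

# One "<mount> <free> MB (<pct>% inode=<pct>%)" entry, as it appears in the alerts above.
ENTRY = re.compile(
    r"(?P<mount>/\S*) (?P<free_mb>\d+) MB "
    r"\((?P<free_pct>\d+)% inode=(?P<inode_pct>\d+)%\)"
)

def parse_disk_alert(line):
    """Return {mount: (free_mb, free_pct, inode_free_pct)} for each filesystem listed."""
    return {
        m["mount"]: (int(m["free_mb"]), int(m["free_pct"]), int(m["inode_pct"]))
        for m in ENTRY.finditer(line)
    }

alert = (
    "DISK WARNING - free space: / 67708 MB (15% inode=82%): "
    "/dev/shm 5947 MB (100% inode=99%): /boot 870 MB (93% inode=99%)"
)
parsed = parse_disk_alert(alert)

# Filesystems below a hypothetical 20% free-space threshold.
low = [mount for mount, (_mb, pct, _inode) in parsed.items() if pct < 20]
print(low)  # ['/']
```

Only the root filesystem is short on space here; /dev/shm and /boot are reported in the same alert but are healthy.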
Assignee: tmeyarivan → nobody
Group: metrics-private
Component: Hadoop/HBase Operations → Infra
Product: Mozilla Metrics → Socorro
Target Milestone: Unreviewed → ---
Crashes backed up on disk again.

e.g.
[root@socorro-collector5.webapp.phx1 primaryCrashStore]# find . -type f -name '*.dump' | wc -l
3335144
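
The same count can be taken from a script, for example to log the backlog per collector over time. This is a rough sketch rather than anything Socorro ships; count_dumps is a hypothetical helper, and the path in the usage line is assumed to be the current directory.

```python
import os

def count_dumps(root, suffix=".dump"):
    """Count non-directory entries under root whose names end with suffix.

    Unlike `find -type f`, os.walk also lists symlinks to files, so the two
    counts can differ if the store contains symlinks.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.endswith(suffix))
    return total

if __name__ == "__main__":
    # Run from inside the crash store directory, as in the shell example.
    print(count_dumps("."))
```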

I'm not sure how to get crashmover to go back and try to resubmit those crashes; lars is on the road.  mburns is paging jakem for assistance since he was in the trenches on this last time.
Depends on: 849566
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/c6833b40e079c56c8118e2e23535d02dc399e2ea
Merge pull request #1128 from twobraids/delete_all_dumps

Fixes Bug 849529
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Lars: are these actually deleted? This bug should remain open until the piled-up dumps have been removed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Related alert:

Sat 19:54:14 PST [144] sp-admin01.phx1.mozilla.com:Socorro Admin - cron_submitter-crash-reports.allizom.org.log is CRITICAL: FILE_AGE CRITICAL: /var/log/socorro/cron_submitter-crash-reports.allizom.org.log is 4295 seconds old and 85385563 bytes
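
For anyone reproducing this check by hand, a file-age test of this kind boils down to comparing the log's mtime against thresholds. The sketch below only mimics the message format seen in the alert. It is not the monitoring plugin's implementation, and the function name and threshold values are assumptions.

```python
import os
import time

def file_age_status(path, warn_seconds, crit_seconds, now=None):
    """Classify a file by seconds since last modification; return (level, message)."""
    st = os.stat(path)
    now = time.time() if now is None else now
    age = int(now - st.st_mtime)
    if age >= crit_seconds:
        level = "CRITICAL"
    elif age >= warn_seconds:
        level = "WARNING"
    else:
        level = "OK"
    message = f"FILE_AGE {level}: {path} is {age} seconds old and {st.st_size} bytes"
    return level, message

if __name__ == "__main__":
    # Hypothetical thresholds: warn after 30 minutes, critical after 1 hour.
    path = "/var/log/socorro/cron_submitter-crash-reports.allizom.org.log"
    if os.path.exists(path):
        print(file_age_status(path, warn_seconds=1800, crit_seconds=3600)[1])
```

A log that has stopped growing, as here, suggests the cron job writing to it has stalled or is no longer running.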
Depends on: 849592
Another (possibly) related alert:
Sun 04:27:01 PDT [143] sp-admin01.phx1.mozilla.com:Socorro Admin - cron_create_partitions.log is CRITICAL: FILE_AGE CRITICAL: /var/log/socorro/cron_create_partitions.log is 605152 seconds old and 0 bytes
(In reply to Adrian Fernandez [:Aj] from comment #5)
> Another (possibly) related alert:
> Sun 04:27:01 PDT [143] sp-admin01.phx1.mozilla.com:Socorro Admin -
> cron_create_partitions.log is CRITICAL: FILE_AGE CRITICAL:
> /var/log/socorro/cron_create_partitions.log is 605152 seconds old and 0 bytes

This is not related. I made a mistake in the ownership of a new table. I will investigate why this is still a problem; I was able to run the partitioner function today without an error:

breakpad=# SELECT * FROM weekly_report_partitions();
weekly_report_partitions|t
Time: 5747.353 ms
socorro-collector4.webapp.phx1 has been taken out of the Zeus pools so that lars can clean it up.
socorro-collector1 is undrained in Zeus; socorro-collector2 is drained while :lars works on it.
socorro-collector2 is undrained in Zeus; socorro-collector3 is drained now.
socorro-collector3 is undrained and socorro-collector5 is drained, as per lars's request on IRC.
socorro-collector5 is undrained and socorro-collector6 is drained, as per lars's request on IRC.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED