Closed Bug 707226 Opened 14 years ago Closed 13 years ago

Socorro sp-web0[1-5].phx1.mozilla.com root disk running out of space

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jason, Assigned: rhelmer)

Details

Space was taken up by httpd error logs in /var/log/httpd/crash-stats.mozilla.com/ssl_error_log_2011*. I have gzipped these large files in the meantime.

This is what I mostly see in the logs:

[Fri Dec 02 06:55:43 2011] [error] [client 10.8.81.200] PHP Warning: Invalid callback 1322837743.3588, no array or string given in Unknown on line 0, referer: https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&signature=chromehang%20%7C%20NtGdiGetFontData&version=Firefox%3A11.0a1
[Fri Dec 02 06:55:43 2011] [error] [client 10.8.81.200] PHP Warning: Invalid callback 6.9080829705011E-310, no array or string given in Unknown on line 0, referer: https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&signature=chromehang%20%7C%20NtGdiGetFontData&version=Firefox%3A11.0a1
[Fri Dec 02 06:55:43 2011] [error] [client 10.8.81.200] PHP Warning: Invalid callback 6.9080829705011E-310, no array or string given in Unknown on line 0, referer: https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&signature=chromehang%20%7C%20NtGdiGetFontData&version=Firefox%3A11.0a1
[Fri Dec 02 06:55:43 2011] [error] [client 10.8.81.200] PHP Warning: Invalid callback 6.9080829705011E-310, no array or string given in Unknown on line 0, referer: https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&signature=chromehang%20%7C%20NtGdiGetFontData&version=Firefox%3A11.0a1

[root@sp-web05.phx1 crash-stats.mozilla.com]# zgrep -c "Invalid callback" ssl_error_log_2011-1*.gz
ssl_error_log_2011-11-21-15.gz:13106926
ssl_error_log_2011-11-21-21.gz:6416439
ssl_error_log_2011-11-24-08.gz:18400305
ssl_error_log_2011-11-27-23.gz:6565237
ssl_error_log_2011-11-28-18.gz:5672339
ssl_error_log_2011-11-29-09.gz:70960415
ssl_error_log_2011-11-29-22.gz:13065016
ssl_error_log_2011-11-30-09.gz:11648936
ssl_error_log_2011-11-30-10.gz:11076551
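For re-running the per-file tally on a host without zgrep, here is a minimal Python sketch of the same count. Like the shell command, it treats a "match" as any line containing the literal text `Invalid callback`. The glob is the one from the shell session and should be run from the log directory or adjusted. The function names are illustrative and do not come from any existing tool.

```python
import glob
import gzip
import os


def count_matches(path: str, needle: bytes = b"Invalid callback") -> int:
    """Count lines in a gzip-compressed log that contain `needle`, like `zgrep -c`."""
    count = 0
    with gzip.open(path, "rb") as fh:
        for line in fh:  # streams line by line, so multi-GB logs are not loaded into memory
            if needle in line:
                count += 1
    return count


def main(pattern: str) -> None:
    """Print `name:count` for each matching file, in the same shape as the zgrep output."""
    for path in sorted(glob.glob(pattern)):
        print(f"{os.path.basename(path)}:{count_matches(path)}")


if __name__ == "__main__":
    # Same glob as the shell session; assumes the current directory is the log directory.
    main("ssl_error_log_2011-1*.gz")
```

Unlike zgrep, this always prefixes the file name, even when only one file matches.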
Depends on: 702318
Group: infrasec → infra
Group: infra
No longer depends on: 702318
We're getting a lot of similar messages on crash-stats-dev too; the log is filling up faster than we can keep up with. It's mostly:

[Fri Dec 09 21:08:51 2011] [error] [client 10.2.74.210] PHP Warning: Invalid callback 1323493731.7324, no array or string given in Unknown on line 0, referer: https://crash-stats-dev.allizom.org/topcrasher/byversion/Firefox/10.0a2/7

Jason, any idea when this started on prod? I'm trying to figure out a regression window, and crash-stats-dev doesn't keep logs that far back.
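The number after "Invalid callback" in the first prod warning and in the dev warning can be read as Unix epoch seconds. Converted that way, each one lands on the same second as its own log line. The check below assumes the hosts log in US Pacific Standard Time (UTC-8), which matches both lines. It says nothing about which PHP code hands a timestamp to something expecting a callback; that is not identified in this bug. The other prod value, 6.9080829705011E-310, is vanishingly small and does not decode to a plausible time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the web hosts write log times in US Pacific Standard Time (UTC-8) in December.
PST = timezone(timedelta(hours=-8), "PST")


def callback_value_as_time(value: float) -> datetime:
    """Interpret the number from an 'Invalid callback <value>' warning as a Unix timestamp."""
    return datetime.fromtimestamp(value, tz=PST)


for raw, logged in [
    (1322837743.3588, "Fri Dec 02 06:55:43 2011"),  # prod (sp-web05) excerpt
    (1323493731.7324, "Fri Dec 09 21:08:51 2011"),  # crash-stats-dev excerpt
]:
    decoded = callback_value_as_time(raw).strftime("%a %b %d %H:%M:%S %Y")
    print(f"{raw} -> {decoded} (log line says {logged})")
```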
So prod errors took down staging; do I understand that correctly?
The logs on prod are removed after 10 days, so it is hard to say exactly when this began to occur. During my initial investigation, these errors were already present in logs from Nov 21, 2011.
Assignee: server-ops → jthomas
sp-web01 went critical today; we haven't tracked down the underlying cause yet. Dumitru noticed that we're not compressing the logs when they are rotated, so he is going to compress them.
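Below is a sketch of the kind of threshold check that flags a root disk as critical. The warning and critical percentages are hypothetical. The bug doesn't say what the monitoring on sp-web01 measured or which limits it used.

```python
import shutil

# Hypothetical thresholds, not taken from the actual monitoring configuration.
WARN_PERCENT = 80.0
CRIT_PERCENT = 90.0


def usage_percent(path: str = "/") -> float:
    """Percent of the filesystem holding `path` that is in use.

    Computed as used / (used + free), the ratio df reports as Use% (df also
    rounds up); blocks reserved for root are left out of the denominator.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / (usage.used + usage.free)


def classify(percent: float) -> str:
    """Map a usage percentage to a status word."""
    if percent >= CRIT_PERCENT:
        return "CRITICAL"
    if percent >= WARN_PERCENT:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    pct = usage_percent("/")
    print(f"/ is {pct:.1f}% used: {classify(pct)}")
```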
I added a new cron job that runs daily and gzips the logs.
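The job's actual script isn't in the bug. A minimal sketch of what a daily compression pass could do looks like this. It assumes the rotated names sort chronologically, which holds for ssl_error_log_YYYY-MM-DD-HH. It also assumes the newest uncompressed file is the one httpd is still writing to, so that file is skipped. The function name is hypothetical.

```python
import glob
import gzip
import os
import shutil


def gzip_rotated_logs(pattern: str) -> list[str]:
    """Gzip rotated logs matching `pattern`, leaving the newest uncompressed file alone.

    Returns the paths of the .gz files created. Files already ending in .gz are
    ignored, so running this more than once is harmless.
    """
    pending = sorted(p for p in glob.glob(pattern) if not p.endswith(".gz"))
    created = []
    for path in pending[:-1]:  # skip the most recent file: assumed to be the live log
        target = path + ".gz"
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream the copy instead of reading it all at once
        os.remove(path)  # only reached if compression finished without raising
        created.append(target)
    return created


if __name__ == "__main__":
    print(gzip_rotated_logs("/var/log/httpd/crash-stats.mozilla.com/ssl_error_log_2011*"))
```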
(In reply to Dumitru Gherman [:dumitru] from comment #5)
> I added a new cron that runs daily and gzip's the logs.

It ran every minute of the 04:00 hour and nearly took down the webheads :P Fixed it in puppet.
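The crontab entry itself isn't shown. Running "every minute of 0400" is what cron does when the hour field is 4 and the minute field is left as `*`. Assuming that was the mistake, counting runs per day for that schedule shows the difference from a once-a-day entry such as `0 4 * * *`. Both entries here are illustrative. The expander is deliberately simplified and is not a full cron parser.

```python
def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field into the values it matches.

    Supports a simplified subset: '*', single numbers, 'A-B' ranges,
    comma-separated lists, and a '/N' step on '*' or on a range.
    """
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        elif step != 1:
            raise ValueError(f"step on a single value is not supported: {field!r}")
        else:
            start = end = int(part)
        if start < low or end > high or start > end:
            raise ValueError(f"value out of range in {field!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def runs_per_day(schedule: str) -> int:
    """Count how many times a 5-field schedule fires per day.

    Only handles schedules whose day-of-month, month and day-of-week fields are '*'.
    """
    minute, hour, dom, month, dow = schedule.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only handles schedules that run every day")
    return len(expand_field(minute, 0, 59)) * len(expand_field(hour, 0, 23))


print(runs_per_day("* 4 * * *"))  # 60: every minute from 04:00 through 04:59
print(runs_per_day("0 4 * * *"))  # 1: once, at 04:00
```

With a job that gzips large logs, 60 overlapping starts inside one hour is enough to pile up CPU and I/O load on every webhead at once.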
The logs are being gzipped, so we probably won't see these hosts run out of space because of large log files. Assigning to rhelmer for further investigation.
Assignee: jthomas → rhelmer
Is this still a problem? If so (and we're rolling logs appropriately), I'd guess that this is due to incoming crashes not always being pushed into HBase and being orphaned in ~/socorro/ (which shouldn't happen, but we've seen it before).
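If the space is going to crashes orphaned under ~/socorro/, one quick check is to list the files there that haven't been modified in a while and total their size. This sketch rests on two assumptions. The first is that the orphans are ordinary files somewhere under that directory; the bug doesn't describe the layout. The second is that 24 hours without modification means "stuck"; that threshold is hypothetical.

```python
import os
import time


def stale_files(root: str, max_age_hours: float) -> tuple[list[str], int]:
    """Return (paths, total_bytes) for files under `root` not modified within `max_age_hours`.

    Symlinks are not followed: os.walk skips symlinked directories by default and
    os.lstat reports on the link itself.
    """
    cutoff = time.time() - max_age_hours * 3600
    paths: list[str] = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # removed between listing and stat, e.g. picked up by a consumer
            if st.st_mtime < cutoff:
                paths.append(path)
                total += st.st_size
    return sorted(paths), total


if __name__ == "__main__":
    # Hypothetical threshold; the directory is the one named in the comment above.
    root = os.path.expanduser("~/socorro")
    old, size = stale_files(root, max_age_hours=24)
    print(f"{len(old)} files older than 24h under {root}, {size / 2**20:.1f} MiB")
```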
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard