disk usage seems to monotonically increase over time

RESOLVED FIXED

Status

Product: Socorro
Component: Antenna
Status: RESOLVED FIXED
Opened: 6 months ago
Closed: 2 months ago

People

(Reporter: willkg, Assigned: willkg)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Miles and I were both looking at -prod Antenna dashboards and noticed that the disk usage was increasing monotonically over time. Seems like something is using disk, but not cleaning up after itself.

This bug covers investigating.

In a review somewhere, I'm pretty sure Greg said that cgi.FieldStorage can use disk. That might be the problem. I don't know offhand how it uses disk or whether it cleans up after itself. Maybe Antenna is missing something?

Beyond that, I'm not aware of any disk usage by Antenna. I'll ping Miles to give me access to look at a box and see what I can learn from that.

I looked around the box. There's nothing in /tmp, but there's 8.5+ GB in /var/log. Log rotation is set up to rotate weekly and keep 4 weeks of data. This deploy hasn't been running for 4 weeks yet.

I think it's logs on disk. If that's correct, then after 4 weeks of disk usage monotonically increasing, we'll start seeing weekly sudden drops as files get deleted.
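
The rotation theory above can be sketched as a small simulation. This is a rough model, not logrotate's or Antenna's actual behavior: it assumes a constant log volume per day, uncompressed rotated files, and that rotation keeps `keep` rotated files alongside the live log. The function name and all parameters are hypothetical.

```python
def simulate_log_disk_usage(days, daily_bytes, keep, first_rotation_day, period=7):
    """Return total log bytes on disk at the end of each day.

    Assumed model: the live log grows by daily_bytes per day. Every `period`
    days, starting on first_rotation_day, the live log is rotated and only
    the `keep` newest rotated files are retained.
    """
    current = 0    # size of the live log file
    rotated = []   # sizes of rotated files, newest first
    usage = []
    for day in range(days):
        is_rotation_day = (
            day >= first_rotation_day
            and (day - first_rotation_day) % period == 0
        )
        if is_rotation_day:
            rotated.insert(0, current)  # live log becomes the newest rotated file
            current = 0
            del rotated[keep:]          # drop anything older than `keep` files
        current += daily_bytes
        usage.append(current + sum(rotated))
    return usage


# Deploy happens 3 days before the first rotation, so the first file is partial.
usage = simulate_log_disk_usage(days=50, daily_bytes=1, keep=4, first_rotation_day=3)
for day, total in enumerate(usage):
    print(day, total)
```

Under these assumptions, usage climbs every day until the first rotated file falls off the end, and the first drop is smaller than later ones because the file being deleted covers only a partial week.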

The last deploy was on May 25th. I'll keep an eye on the logs. If the theory proves true, we'll close this. Otherwise, I'll continue looking into it.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
After about 5 weeks, disk usage dropped suddenly:

https://app.datadoghq.com/dash/274773/antenna--prod?live=false&page=0&is_auto=false&from_ts=1498956270389&to_ts=1498979196329&tile_size=m&fullscreen=false&tpl_var_type=app

It's not a big drop. It went from 58% to 51%. Probably because that first log is only a partial week and weekly log rotation rotates on Sundays.

Given that it got all the way up to 58% with the load we currently have, do we want to adjust log rotation? Maybe keep fewer weeks?
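
One rough way to compare retention settings, assuming a steady log volume and uncompressed rotated files: just before a weekly rotation, the disk holds `keep` full rotated weeks plus one full live week. The helper names and the numbers below are made up for illustration and are not measurements from -prod.

```python
def peak_log_bytes(weekly_bytes, keep):
    """Steady-state peak log footprint just before a weekly rotation."""
    # `keep` rotated weeks plus the live log, which holds one full week.
    return (keep + 1) * weekly_bytes


def peak_disk_percent(weekly_bytes, keep, disk_bytes, other_bytes=0):
    """Peak disk usage as a percentage, including non-log usage."""
    return 100.0 * (other_bytes + peak_log_bytes(weekly_bytes, keep)) / disk_bytes


GB = 1024 ** 3
# Hypothetical figures: 2 GB of logs per week, 20 GB disk, 1 GB of other data.
for keep in (4, 2, 1):
    pct = peak_disk_percent(
        weekly_bytes=2 * GB, keep=keep, disk_bytes=20 * GB, other_bytes=1 * GB
    )
    print(f"keep={keep}: ~{pct:.0f}% peak")
```

Plugging in the real weekly log volume and disk size would show how much headroom keeping fewer weeks buys.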

I think we talked about this a while back and decided the current behavior is ok. It's possible that at some point Antenna will get busier and it'll be less ok, but we'll get notified by monitoring and can deal with that then.

Given that, I'm going to mark this FIXED.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 months ago
Resolution: --- → FIXED