Closed
Bug 1302790
Opened 8 years ago
Closed 8 years ago
Treeherder SCL3 prod DB usage increased by 75GB on 8-9th Sept
Categories
(Tree Management :: Treeherder: Infrastructure, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
On 8-9th Sept, disk usage increased from 375GB to 445GB. Today (6 days later), it's at 456GB, and increasing each day. See:
https://rpm.newrelic.com/accounts/677903/servers/6106888/disks#id=5b2253797374656d2f46696c6573797374656d2f5e646174612f557365642f6279746573222c2253797374656d2f4469736b2f5e6465765e73646231225d

This blocks performing the DB migration for Heroku (bug 1283170): we need to dump the DB to the same disk, but there is now insufficient space to do so while leaving headroom for safety (the disk has 984GB usable).

I believe there are a couple of causes:
1) the recent schema migration means there is log data duplicated between several tables
2) (to a lesser extent) cycle_data is now timing out, so we're not expiring old data:
https://rpm.newrelic.com/accounts/677903/applications/4180461/filterable_errors#/show/4eecf682-7a4f-11e6-a90b-b82a72d22a14_21418_27301/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&filters=%5B%7B%22key%22%3A%22transactionUiName%22%2C%22value%22%3A%22cycle-data%22%2C%22like%22%3Afalse%7D%5D&_k=0o5hw5

Please could we all make a concerted effort to drive the usage down here? Once we're on Heroku we can request as much disk as we like, so we can stop worrying about these kinds of things.
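The headroom concern above can be made concrete with a rough check. This is only a sketch: the function name `dump_fits`, the 10% safety margin, and the pessimistic assumption that an uncompressed dump is as large as the current on-disk usage are all illustrative and not taken from this bug.

```python
def dump_fits(usable_gb, used_gb, dump_gb, safety_fraction=0.10):
    """Return True if a dump of dump_gb can be written to the same disk
    while still leaving safety_fraction of the disk free afterwards.

    safety_fraction is a hypothetical margin, not a value from the bug.
    """
    free_after_dump = usable_gb - used_gb - dump_gb
    return free_after_dump >= usable_gb * safety_fraction


# Figures from the description: 984GB usable, 456GB used.
# Assumption (ours): the dump is as large as the current usage.
print(dump_fits(usable_gb=984, used_gb=456, dump_gb=456))
```

Under those assumptions the dump leaves 984 - 456 - 456 = 72GB free, which is below a 10% margin (98.4GB), so the check fails; a smaller dump or lower usage changes the result.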
Comment 1•8 years ago
The easiest thing would be to stop ingesting text log artifacts and purge them from the database, since they're not actually being used for anything anymore. I left them in so we could revert, but it's been several days now.
Assignee
Comment 2•8 years ago
I've discovered that the dumps for bug 1246965 were left behind on treeherder2.db.scl3; they total 45GB (gzipped). I've removed /data/dump_bug_1246965.

[emorley@treeherder1.db.scl3 mysql]$ sudo find . -type f -printf "%f %s\n" | awk '{ split( $1, FILEPARTS, "." ); FILENAME=FILEPARTS[1]; SIZE_IN_MB = int($2/(1024*1024)); FILENAME_MAP[FILENAME]+=SIZE_IN_MB; } END { for( FILENAME in FILENAME_MAP ) { SIZE_IN_GB = FILENAME_MAP[FILENAME] / 1024; printf("%.1fGB %s\n", SIZE_IN_GB, FILENAME); } }' | sort -rh | head -n 15
81.4GB job_detail
57.0GB treeherder1-bin
56.0GB job_artifact
47.8GB performance_datum
47.1GB failure_line
44.7GB job
29.4GB text_log_step
20.8GB job_log
17.9GB text_log_error
7.2GB text_log_summary_line
4.1GB text_log_summary
1.4GB ibdata1
1.0GB revision
0.4GB treeherder1-relay-bin
0.3GB revision_map

It's possible some of these tables are fragmented too.
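For readers who find the awk one-liner hard to follow, the same grouping can be sketched in Python. This is an illustrative equivalent, not the command that was actually run: the function names `usage_by_prefix` and `largest` are made up here, and it sums exact byte counts rather than truncating each file to whole megabytes as the awk version does, so small files can produce slightly different totals.

```python
import os
from collections import defaultdict


def usage_by_prefix(root):
    """Sum regular-file sizes (in bytes) under root, keyed by the part of
    each file name before its first dot (e.g. job.ibd and job.frm -> job)."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # mirror `find -type f`, which skips symlinks
            prefix = name.split(".", 1)[0]
            totals[prefix] += os.path.getsize(path)
    return dict(totals)


def largest(totals, limit=15):
    """Return (prefix, size_in_gb) pairs sorted from largest to smallest."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(prefix, size / 1024 ** 3) for prefix, size in ranked[:limit]]
```

Calling `largest(usage_by_prefix("."))` from the MySQL data directory and printing each pair as `f"{size_gb:.1f}GB {prefix}"` gives a listing in the same shape as the output above. Reading that directory normally requires elevated privileges, as the `sudo` in the original command suggests.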
Assignee
Comment 3•8 years ago
Between:
* stopping storing some old artifact types (bug 1258861, bug 1301729)
* deleting the old data dump (comment 2)
* purging old job_artifacts/defragging most tables (bug 1303069)
* the bloated binlogs (from recent data migrations) slowly expiring (we keep 7 days' worth)

...prod DB usage has dropped by 102 GB since Wednesday, and is now at 354 GB. There is now plenty of headroom for the DB dump (and less to dump in the first place).
Assignee: nobody → emorley
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED