Closed
Bug 1241900
Opened 8 years ago
Closed 8 years ago
Clean up raw_adi and raw_adi_log older than 6 months ago
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: peterbe, Assigned: peterbe)
References
Details
As of today, we have 119,677,077 rows in raw_adi and 125,180,287 rows in raw_adi_logs. In https://github.com/mozilla/socorro/pull/3107 and https://github.com/mozilla/socorro/pull/3106 we intend to periodically clean up ancient records. This is done using the transaction based SQL framework that crontabber offers. The FIRST time it runs, it's going to attempt to delete some 31Gb of data all in one big fat transaction. So, let's do a a one-off cleaning without transactions first. Otherwise those crontabber apps are going to fail.
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 1•8 years ago
|
||
By the way, to get to number 31Gb I used:: breakpad=> select table_name, pg_relation_size(table_name), pg_size_pretty(pg_relation_size(table_name)) from information_schema.tables where table_schema = 'public' and (table_name='raw_adi' or table_name='raw_adi_logs'); table_name | pg_relation_size | pg_size_pretty --------------+------------------+---------------- raw_adi_logs | 17580548096 | 16 GB raw_adi | 17675247616 | 16 GB (2 rows) And then I guesstimated that the last 6 months worth of data in both is about 1Gb.
Assignee | ||
Updated•8 years ago
|
Summary: Clean up raw_adi and raw_adi_log → Clean up raw_adi and raw_adi_log older than 6 months ago
Assignee | ||
Comment 2•8 years ago
|
||
Hmm... Perhaps the simplest thing is to just manually execute this on the psql command line.
Assignee | ||
Comment 3•8 years ago
|
||
E.g. https://gist.github.com/peterbe/845f21b535b378eb0d9c
Assignee | ||
Comment 4•8 years ago
|
||
Ok. This has now been applied on stage and prod: Here's how I did it: # On prod $ wget https://gist.github.com/peterbe/845f21b535b378eb0d9c/raw/ddf83f4182d94403263f4e1f5f0563a0f19d27f7/clean-raw-adi.sql $ wget https://gist.github.com/peterbe/845f21b535b378eb0d9c/raw/ddf83f4182d94403263f4e1f5f0563a0f19d27f7/clean-raw-adi-logs.sql $ time psql -U breakpad_rw -h socorro-db.mocotoolsprod.net breakpad < clean-raw-adi.sql $ time psql -U breakpad_rw -h socorro-db.mocotoolsprod.net breakpad < clean-raw-adi-logs.sql In total it took 2m43.674s + 3m9.526s. Now, ...: breakpad=> select count(*) from raw_adi; select count(*) from raw_adi_logs; count ---------- 55776167 (1 row) count ---------- 55414320 (1 row) breakpad=> select min(report_date) from raw_adi_logs; select min(date) from raw_adi; min ------------ 2015-06-01 (1 row) min ------------ 2015-06-01 (1 row) So, instead of 119,677,077 + 125,180,287 (= 244,857,364) We only have 55,776,167 + 55,414,320 (= 111,190,487) rows on those two tables. It's still huge but the nice thing is that with those upcoming crontabber apps, it'll stay like this.
Assignee | ||
Updated•8 years ago
|
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•