Closed Bug 1241900 Opened 8 years ago Closed 8 years ago

Clean up raw_adi and raw_adi_log older than 6 months ago

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: peterbe, Assigned: peterbe)

References

Details

As of today, we have 119,677,077 rows in raw_adi and 125,180,287 rows in raw_adi_logs. 

In https://github.com/mozilla/socorro/pull/3107 and https://github.com/mozilla/socorro/pull/3106 we intend to periodically clean up ancient records. This is done using the transaction based SQL framework that crontabber offers. The FIRST time it runs, it's going to attempt to delete some 31Gb of data all in one big fat transaction. 

So, let's do a a one-off cleaning without transactions first. Otherwise those crontabber apps are going to fail.
Assignee: nobody → peterbe
Blocks: 1227142, 1227131
By the way, to get to number 31Gb I used::

breakpad=> select table_name, pg_relation_size(table_name), pg_size_pretty(pg_relation_size(table_name))
from information_schema.tables
where table_schema = 'public' and (table_name='raw_adi' or table_name='raw_adi_logs');
  table_name  | pg_relation_size | pg_size_pretty
--------------+------------------+----------------
 raw_adi_logs |      17580548096 | 16 GB
 raw_adi      |      17675247616 | 16 GB
(2 rows)

And then I guesstimated that the last 6 months worth of data in both is about 1Gb.
Summary: Clean up raw_adi and raw_adi_log → Clean up raw_adi and raw_adi_log older than 6 months ago
Hmm... Perhaps the simplest thing is to just manually execute this on the psql command line.
Ok. This has now been applied on stage and prod: Here's how I did it:

# On prod
$ wget https://gist.github.com/peterbe/845f21b535b378eb0d9c/raw/ddf83f4182d94403263f4e1f5f0563a0f19d27f7/clean-raw-adi.sql
$ wget https://gist.github.com/peterbe/845f21b535b378eb0d9c/raw/ddf83f4182d94403263f4e1f5f0563a0f19d27f7/clean-raw-adi-logs.sql
$ time psql -U breakpad_rw -h socorro-db.mocotoolsprod.net breakpad  < clean-raw-adi.sql
$ time psql -U breakpad_rw -h socorro-db.mocotoolsprod.net breakpad  < clean-raw-adi-logs.sql

In total it took 2m43.674s + 3m9.526s. 

Now, ...:

breakpad=> select count(*) from raw_adi; select count(*) from raw_adi_logs;
  count
----------
 55776167
(1 row)

  count
----------
 55414320
(1 row)
breakpad=> select min(report_date) from raw_adi_logs; select min(date) from raw_adi;
    min
------------
 2015-06-01
(1 row)
    min
------------
 2015-06-01
(1 row)


So, instead of 119,677,077 + 125,180,287 (= 244,857,364)
We only have 55,776,167 + 55,414,320 (= 111,190,487) rows on those two tables.

It's still huge but the nice thing is that with those upcoming crontabber apps, it'll stay like this.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.