Closed Bug 713255 Opened 13 years ago Closed 13 years ago

prune old data from buildbot database(s)

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: catlee)

References

Details

buildbot (statusdb) and buildbot_schedulers (schedulerdb) are quite large, each holding more than two years of data. For most normal queries, we don't need more than a few months of data.

I have a snapshot of statusdb from February 2011, and one of schedulerdb from November 2011.

Let's start by removing data prior to Jan 1, 2011.
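
(For context, a quick way to gauge how much space each table is taking before pruning; this check is purely illustrative and the schema names are just the database names mentioned above:)

SELECT table_schema, table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
  FROM information_schema.TABLES
 WHERE table_schema IN ('buildbot', 'buildbot_schedulers')
 ORDER BY size_gb DESC;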

mpressman, sheeri: I need some advice on the best way to prune data from all these related tables. One approach is to do something like:

DELETE FROM builds WHERE start <= '2011-01-01';
DELETE FROM build_properties WHERE NOT EXISTS (SELECT * FROM builds WHERE builds.id = build_properties.build_id);

but I suspect that EXISTS query will be expensive.

Do you have some other techniques we could look at using?
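
(For what it's worth, one rough way to see how the planner treats that anti-join before running the full delete is to EXPLAIN the equivalent SELECT; this is only a sanity check, and the real cost depends on whatever indexes exist on build_properties.build_id:)

EXPLAIN SELECT build_properties.build_id
  FROM build_properties
 WHERE NOT EXISTS (SELECT 1 FROM builds WHERE builds.id = build_properties.build_id);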
Chris,

That would be WONDERFUL to do!  MySQL supports multi-table deletes, so you should be able to do a left join (this depends on the MySQL version, but I think anything >= 5.0 can do it):

DELETE FROM builds WHERE start <= '2011-01-01';
DELETE build_properties FROM build_properties LEFT JOIN builds ON builds.id = build_properties.build_id WHERE builds.id IS NULL;

Or you can cheat a bit: if builds.id is populated by AUTO_INCREMENT, you can get the max id at the cutoff and delete everything less than or equal to it:

SELECT @max_id:=max(id) FROM builds WHERE start <= '2011-01-01';
DELETE FROM builds WHERE id <= @max_id;
DELETE FROM build_properties WHERE build_id <= @max_id;

This won't actually free up any disk space until the table is defragmented (rebuilt somehow, perhaps with OPTIMIZE TABLE).

Please coordinate with us to do this maintenance.  It can slow replication, especially the last step of defragmentation.  I'm happy to do the maintenance or just be on hand when you do it, either way.
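
(For example, the rebuild step could look something like the following; illustrative only, and since OPTIMIZE TABLE copies and locks the table, it's part of why the maintenance needs to be coordinated:)

OPTIMIZE TABLE builds, build_properties;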
Assignee: nobody → scabral
Blocks: 617017
I have some cleanup scripts I've been working on for this. We should be dumping out and pruning data regularly.
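
For reference, a rough sketch of the per-table pattern those scripts appear to follow, based on the "deleting old rows" / "analyzing" pairs in the log further down (the actual queries in the script may differ; the cutoff date is just the one proposed above):

DELETE FROM builds WHERE start <= '2011-01-01';
ANALYZE TABLE builds;

-- child tables are cleaned up by anti-joining against the parent:
DELETE build_properties
  FROM build_properties
  LEFT JOIN builds ON builds.id = build_properties.build_id
 WHERE builds.id IS NULL;
ANALYZE TABLE build_properties;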
Assignee: scabral → catlee
Priority: -- → P2
If it's a regular thing, we should use MySQL partitioning on the builds table, because then you can just drop the partition(s) holding the older data. Something like one partition per month would work. I can see if I can dig up some scripts for automatically creating new partitions, *and* a Nagios plugin to monitor them, so if new partitions aren't being created we'll be notified before it's a problem.

If it's done that way and partitions are just dropped, the disk space frees up right away and there's no extra defragmentation step needed.
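
For illustration, monthly RANGE partitioning on builds could look something like the following (the partitioning key and partition names here are assumptions, and MySQL requires the partitioning column to be part of every unique key on the table):

-- Assumes builds.start is a DATETIME column; partition names are illustrative.
ALTER TABLE builds
  PARTITION BY RANGE (TO_DAYS(start)) (
    PARTITION p201101 VALUES LESS THAN (TO_DAYS('2011-02-01')),
    PARTITION p201102 VALUES LESS THAN (TO_DAYS('2011-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
  );

-- Dropping a partition discards its rows and frees the disk space immediately,
-- with no separate defragmentation step:
ALTER TABLE builds DROP PARTITION p201101;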
Ran the cleanup script on statusdb:
2011-12-29 04:46:17,116 - builds: deleting old rows
2011-12-29 04:54:26,725 - builds: removed 3159506 rows in 489.61s
2011-12-29 04:54:26,726 - builds: analyzing
2011-12-29 04:54:45,297 - builds: analyzed in 18.57s
2011-12-29 04:54:45,297 - builders: deleting old rows
2011-12-29 04:54:49,765 - builders: removed 9141 rows in 4.47s
2011-12-29 04:54:49,765 - builders: analyzing
2011-12-29 04:54:50,326 - builders: analyzed in 0.56s
2011-12-29 04:54:50,327 - builder_slaves: deleting old rows
2011-12-29 04:55:28,521 - builder_slaves: removed 523562 rows in 38.19s
2011-12-29 04:55:28,521 - builder_slaves: analyzing
2011-12-29 04:55:37,256 - builder_slaves: analyzed in 8.73s
2011-12-29 04:55:37,256 - build_properties: deleting old rows
2011-12-29 05:40:14,219 - build_properties: removed 62227878 rows in 2676.96s
2011-12-29 05:40:14,220 - build_properties: analyzing
2011-12-29 05:43:47,819 - build_properties: analyzed in 213.60s
2011-12-29 05:43:47,820 - build_requests: deleting old rows
2011-12-29 05:43:47,877 - build_requests: removed 0 rows in 0.06s
2011-12-29 05:43:47,878 - build_requests: analyzing
2011-12-29 05:43:47,892 - build_requests: analyzed in 0.01s
2011-12-29 05:43:47,892 - requests: deleting old rows
2011-12-29 05:43:47,908 - requests: removed 0 rows in 0.01s
2011-12-29 05:43:47,908 - requests: analyzing
2011-12-29 05:43:47,911 - requests: analyzed in 0.00s
2011-12-29 05:43:47,911 - request_properties: deleting old rows
2011-12-29 05:43:47,954 - request_properties: removed 0 rows in 0.04s
2011-12-29 05:43:47,955 - request_properties: analyzing
2011-12-29 05:43:47,960 - request_properties: analyzed in 0.01s
2011-12-29 05:43:47,960 - steps: deleting old rows
2011-12-29 07:30:51,010 - steps: removed 91336774 rows in 6423.05s
2011-12-29 07:30:51,011 - steps: analyzing
2011-12-29 07:40:31,672 - steps: analyzed in 580.66s
2011-12-29 07:40:31,673 - sourcestamps: deleting old rows
2011-12-29 07:44:21,567 - sourcestamps: removed 1629545 rows in 229.89s
2011-12-29 07:44:21,568 - sourcestamps: analyzing
2011-12-29 07:44:23,427 - sourcestamps: analyzed in 1.86s
2011-12-29 07:44:23,428 - source_changes: deleting old rows
2011-12-29 07:45:21,512 - source_changes: removed 2268802 rows in 58.08s
2011-12-29 07:45:21,512 - source_changes: analyzing
2011-12-29 07:45:25,610 - source_changes: analyzed in 4.10s
2011-12-29 07:45:25,611 - changes: deleting old rows
2011-12-29 07:45:59,121 - changes: removed 585996 rows in 33.51s
2011-12-29 07:45:59,122 - changes: analyzing
2011-12-29 07:45:59,920 - changes: analyzed in 0.80s
2011-12-29 07:45:59,920 - file_changes: deleting old rows  
2011-12-29 07:47:21,083 - file_changes: removed 2363726 rows in 81.16s
2011-12-29 07:47:21,083 - file_changes: analyzing
2011-12-29 07:47:24,592 - file_changes: analyzed in 3.51s  
2011-12-29 07:47:24,592 - files: deleting old rows
2011-12-29 07:48:25,638 - files: removed 310141 rows in 61.04s
2011-12-29 07:48:25,638 - files: analyzing
2011-12-29 07:48:39,051 - files: analyzed in 13.41s
2011-12-29 07:48:39,051 - properties: deleting old rows
2011-12-29 08:06:00,865 - properties: removed 3129772 rows in 1041.81s
2011-12-29 08:06:00,866 - properties: analyzing
2011-12-29 08:07:08,206 - properties: analyzed in 67.34s   
2011-12-29 08:07:08,207 - master_slaves: deleting old rows 
2011-12-29 08:07:08,490 - master_slaves: removed 0 rows in 0.28s
2011-12-29 08:07:08,490 - master_slaves: analyzing
2011-12-29 08:07:08,497 - master_slaves: analyzed in 0.01s
2011-12-29 08:07:08,497 - masters: deleting old rows
2011-12-29 08:07:09,128 - masters: removed 5 rows in 0.63s
2011-12-29 08:07:09,129 - masters: analyzing
2011-12-29 08:07:09,136 - masters: analyzed in 0.01s
2011-12-29 08:07:09,136 - patches: deleting old rows
2011-12-29 08:07:09,147 - patches: removed 0 rows in 0.01s
2011-12-29 08:07:09,147 - patches: analyzing
2011-12-29 08:07:09,153 - patches: analyzed in 0.01s
2011-12-29 08:07:09,153 - slaves: deleting old rows
2011-12-29 08:07:15,563 - slaves: removed 432 rows in 6.41s
2011-12-29 08:07:15,563 - slaves: analyzing
2011-12-29 08:07:15,570 - slaves: analyzed in 0.01s
2011-12-29 08:07:15,570 - Deleted 167545280 rows in 12058.70s
We're still waiting on the slave to replicate; it's currently 11,413 seconds behind.
Because There Will Be Questions, I closed m-i and m-c at 8:15, when the lag getting data to tbpl (and self-serve) started making me more nervous than usual.
Severity: normal → blocker
Trees re-opened at around 12:40.

We've worked around the slave replication lag by redirecting reporting to work off the master DB.

The slave is currently 16,698 seconds behind.
Severity: blocker → normal
12:59 < nagios-sjc1> tm-b01-slave01:MySQL Replication Lag is OK: Replication running.  Lag time: 0 seconds

The slave finished replicating 8 hours and 13 minutes after the start of maintenance.
I have dropped the user on the master.
Still running on the backup slave; the steps table became corrupt and needed repairing.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering