Closed Bug 1330479 Opened 8 years ago Closed 8 years ago

Remove unused datasource databases and tables

Categories

(Tree Management :: Treeherder: Infrastructure, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Attachments

(2 files)

In the `treeherder` DB there are the following leftover tables from manual schema migrations/testing (compared to a clean Vagrant environment):

NB: Stage RDS only has 50GB space remaining (since it has 500GB storage unlike prod's 750GB; these will be equalised next time we reset stage to be equal to prod due to the way snapshot restore works)

Prod:
+ treeherder.job_test

Stage:
+ treeherder.job_test
+ treeherder.job_with_push_id
+ treeherder.performance_datum_old

Dev:
+ treeherder.performance_datum_mandatory_push

In addition all environments will have the now unused per-project databases, that we should remove to free up space (since cycle_data is no longer running against them, and they duplicate data migrated to the main treeherder DB). The list of these databases varies by environment, so will need to be generated using:

SELECT CONCAT('DROP DATABASE ', schema_name, ';') AS stmt
FROM information_schema.SCHEMATA
WHERE schema_name NOT IN ('mysql', 'information_schema', 'performance_schema', 'innodb', 'sys', 'tmp', 'treeherder');

Will, are you happy for me to clear these out? If not the databases quite yet, how about the leftover treeherder.* tables, to at least alleviate the stage space issues.
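For illustration only, the statement-generating query above can be mirrored in a few lines of Python. This is a minimal sketch, not the method used in this bug: the helper names (`quote_identifier`, `drop_statements`) and the example schema names are hypothetical, and the backtick quoting is an addition that the query itself does not do (it would matter only if a schema name contained characters such as a hyphen).

```python
# Schemas to keep: the same exclusion list as the query in this bug.
KEEP = frozenset({
    "mysql", "information_schema", "performance_schema",
    "innodb", "sys", "tmp", "treeherder",
})


def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def drop_statements(schema_names):
    """Return one DROP DATABASE statement per schema that is not in KEEP."""
    return [
        f"DROP DATABASE {quote_identifier(name)};"
        for name in sorted(schema_names)
        if name not in KEEP
    ]


if __name__ == "__main__":
    # Hypothetical schema names, standing in for the output of
    # SELECT schema_name FROM information_schema.SCHEMATA;
    example = ["treeherder", "mysql", "example_jobs_1", "example-objectstore_1"]
    for stmt in drop_statements(example):
        print(stmt)
```

Reviewing the generated list before running it, as was done via the attachments here, keeps a human check between generation and the actual drops.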
Flags: needinfo?(wlachance)
Thanks for filing this, I was meaning to but hadn't got around to it yet. We can remove the test databases immediately. I'd probably give it another couple of weeks before removing the per-project databases: I highly doubt they would be helpful even in an emergency, but can't say the chance is zero. :)
Flags: needinfo?(wlachance)
(although now that I think about it, you should be able to remove the per-project databases from stage immediately)
Attached file SQL - stage
I've dropped the DBs / treeherder.foo tables on stage (see attachment), but will leave prod a bit longer. This at least alleviates the space issue on stage.
Attached file SQL - prod
Generated using:

SELECT CONCAT('DROP DATABASE ', schema_name, ';') AS stmt
FROM information_schema.SCHEMATA
WHERE schema_name NOT IN ('mysql', 'information_schema', 'performance_schema', 'innodb', 'sys', 'tmp', 'treeherder');

Are you comfortable running this now, or should we wait a bit longer?
Attachment #8828102 - Flags: review?(wlachance)
Comment on attachment 8828102 [details] SQL - prod I'm not really sure what the right answer is, to be honest. I always tend towards the side of better safe than sorry, but it is a bit difficult to imagine a case where this information would be genuinely useful. I guess I'll just leave it to you. :)
Attachment #8828102 - Flags: review?(wlachance) → review+
It's now been ~16 days since data stopped being ingested into the old tables (so they are getting increasingly stale), and more than that since we stopped returning data from those tables (so I would have thought we'd have found any missing/corrupt migrated data by now). As such, I've run the prod SQL against prod (plus also dev, since that was just cloned from prod) :-)
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED