Closed Bug 421025 Opened 16 years ago Closed 16 years ago

Script to sanitize Bugzilla database needs to be updated for 3.0 schema

Categories

(bugzilla.mozilla.org :: General, defect)

defect
Not set
normal

RESOLVED FIXED

People

(Reporter: justdave, Assigned: justdave)

Details

Attachments

(3 files)

We have a script we use to sanitize a Bugzilla database, stripping all confidential data out of it before we provide it to universities that want to study our data.  We don't need it often (as evidenced by the fact that it's never been updated for 3.0).  But we got two requests in the last couple of days, both of which have been approved, and it'd suck to offer them data from December 2006 (the most recent data the script is known to work on).

The script currently works on version 2.20.x (not even 2.22).

I'll attach it momentarily.
OK, I take that back, it looks like we did get it updated for 2.22 in June of 2007.  I'd still appreciate a sanity check from anyone familiar with the schema. :)
Hmm, the timing is wrong on that; we were already running 3.0 at the time.  Must have just been wiki cleanup.  I bet that's a typo and it really is for 2.20.
I dropped 2 DELETEs for tables that shouldn't exist in this schema, or in 2.20 for that matter.

I took a stab at deleting attachments from attach_data by joining on both attachments and bug_group_map (first SQL statement).
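For anyone following along, here is a minimal sketch of the kind of join-based delete described above. It is not the statement from the attachment: it runs against an in-memory SQLite database from Python, and the tables are simplified stand-ins whose column names (attach_id, bug_id, id, thedata, group_id) are assumptions for illustration. The production database is MySQL, where the syntax for a multi-table DELETE may differ.

```python
import sqlite3

# Simplified stand-ins for the tables named in the comment above.
# Column names are assumptions for illustration, not a copy of the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attachments   (attach_id INTEGER PRIMARY KEY, bug_id INTEGER);
    CREATE TABLE attach_data   (id INTEGER PRIMARY KEY, thedata BLOB);
    CREATE TABLE bug_group_map (bug_id INTEGER, group_id INTEGER);

    INSERT INTO attachments   VALUES (1, 100), (2, 200);
    INSERT INTO attach_data   VALUES (1, 'public patch'), (2, 'secret patch');
    INSERT INTO bug_group_map VALUES (200, 7);  -- bug 200 is group-restricted
""")


def delete_restricted_attachment_data(db):
    """Delete attach_data rows whose attachment belongs to a bug in any group."""
    cur = db.execute("""
        DELETE FROM attach_data
         WHERE id IN (SELECT a.attach_id
                        FROM attachments a
                        JOIN bug_group_map m ON m.bug_id = a.bug_id)
    """)
    return cur.rowcount


deleted = delete_restricted_attachment_data(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM attach_data ORDER BY id")]
```

The subquery form is used here because it works in both SQLite and MySQL; the attachments rows themselves would still need a separate delete.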
Attachment #314492 - Flags: review?(mkanat)
Everything in profiles_activity is deleted in the current script, but profile_setting is left alone.  Everything in namedqueries is deleted, but nothing related to watches is.

Is there something special about profiles_activity and namedqueries that makes them differ from the other profile-related tables or from watches?
Attachment #314492 - Attachment mime type: application/octet-stream → text/plain
Assignee: nobody → alex
QA Contact: reed → other-bmo-issues
Settings were new at some point, and probably got left out by accident.  They should go too, if namedqueries do.  I also note that we have some hidden products on b.m.o now, and those should be deleted as well (after the bugs are) before the groups are nuked.  Any product with a mandatory/mandatory group control setting should go.  Flag inclusions/exclusions would also need to be updated if there are any flags defined for said products.  All components of said products would need to be nuked before the product is, too (and component CCs for those components), and so would milestones and versions (and the new product-specific fixed-in).
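The ordering described above can be sketched roughly as follows. This is an illustration only, run from Python against in-memory SQLite: the table layouts are cut-down stand-ins, the column names are assumptions, and the value 3 for a "Mandatory" group control is assumed rather than taken from this thread. It also assumes the bugs in the hidden products have already been deleted, and it does not model the product-specific fixed-in data.

```python
import sqlite3

MANDATORY = 3  # assumed numeric code for a "Mandatory" group control

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE products       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE components     (id INTEGER PRIMARY KEY, product_id INTEGER, name TEXT);
    CREATE TABLE component_cc   (user_id INTEGER, component_id INTEGER);
    CREATE TABLE versions       (value TEXT, product_id INTEGER);
    CREATE TABLE milestones     (value TEXT, product_id INTEGER);
    CREATE TABLE flaginclusions (type_id INTEGER, product_id INTEGER, component_id INTEGER);
    CREATE TABLE flagexclusions (type_id INTEGER, product_id INTEGER, component_id INTEGER);
    CREATE TABLE group_control_map (group_id INTEGER, product_id INTEGER,
                                    membercontrol INTEGER, othercontrol INTEGER);

    INSERT INTO products       VALUES (1, 'Core'), (2, 'Secret');
    INSERT INTO components     VALUES (10, 1, 'General'), (20, 2, 'Internal');
    INSERT INTO component_cc   VALUES (5, 10), (6, 20);
    INSERT INTO versions       VALUES ('1.0', 1), ('1.0', 2);
    INSERT INTO milestones     VALUES ('---', 1), ('---', 2);
    INSERT INTO flaginclusions VALUES (1, 1, 10), (2, 2, 20);
    INSERT INTO flagexclusions VALUES (1, 2, NULL);
    INSERT INTO group_control_map VALUES (7, 2, 3, 3), (8, 1, 1, 0);
""")


def hidden_product_ids(conn):
    """Products where both member and other control are Mandatory."""
    rows = conn.execute(
        "SELECT DISTINCT product_id FROM group_control_map "
        "WHERE membercontrol = ? AND othercontrol = ? ORDER BY product_id",
        (MANDATORY, MANDATORY),
    )
    return [row[0] for row in rows]


def delete_hidden_products(conn):
    """Remove dependents first, then the product row itself."""
    ids = hidden_product_ids(conn)
    for pid in ids:
        conn.execute("DELETE FROM flaginclusions WHERE product_id = ?", (pid,))
        conn.execute("DELETE FROM flagexclusions WHERE product_id = ?", (pid,))
        conn.execute(
            "DELETE FROM component_cc WHERE component_id IN "
            "(SELECT id FROM components WHERE product_id = ?)",
            (pid,),
        )
        conn.execute("DELETE FROM components WHERE product_id = ?", (pid,))
        conn.execute("DELETE FROM milestones WHERE product_id = ?", (pid,))
        conn.execute("DELETE FROM versions WHERE product_id = ?", (pid,))
        conn.execute("DELETE FROM group_control_map WHERE product_id = ?", (pid,))
        conn.execute("DELETE FROM products WHERE id = ?", (pid,))
    return ids


removed = delete_hidden_products(db)
products_left = [row[0] for row in db.execute("SELECT name FROM products ORDER BY id")]
```

Component CCs are deleted before components because the subquery needs the component rows to still exist; the groups themselves would be removed only after this step.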

Wow, this is suddenly much more complicated.
I ended up converting this to a Perl script instead of raw SQL because that made it easy to run it within a copy of Bugzilla and hijack the built-in product-delete code, since deleting a product is so complicated.  Want to test it a little first, and make sure I'm doing it right.  Now if only I could find a database server with enough free disk space to load a copy of the production Bugzilla database (now 10 GB - need at least twice that free to be able to load the thing) :(
Attached file sanitizeme.pl
And here's the aforementioned Perl script, for posterity.

One addition I forgot to mention before: it also removes insidergroup-flagged data (comments and attachments that are marked 'isprivate').
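As a rough sketch of what removing 'isprivate' data amounts to (this is not an excerpt from sanitizeme.pl), the following Python snippet runs the idea against in-memory SQLite. The tables are simplified stand-ins, and the column names other than isprivate are assumptions for illustration.

```python
import sqlite3

priv_db = sqlite3.connect(":memory:")
priv_db.executescript("""
    -- Cut-down stand-ins; only the isprivate flag is taken from the thread.
    CREATE TABLE longdescs   (comment_id INTEGER PRIMARY KEY, bug_id INTEGER,
                              thetext TEXT, isprivate INTEGER);
    CREATE TABLE attachments (attach_id INTEGER PRIMARY KEY, bug_id INTEGER,
                              isprivate INTEGER);
    CREATE TABLE attach_data (id INTEGER PRIMARY KEY, thedata BLOB);

    INSERT INTO longdescs   VALUES (1, 100, 'public comment', 0),
                                   (2, 100, 'insider comment', 1);
    INSERT INTO attachments VALUES (1, 100, 0), (2, 100, 1);
    INSERT INTO attach_data VALUES (1, 'public patch'), (2, 'private patch');
""")


def delete_private_data(conn):
    """Drop private attachment contents, then the private rows themselves."""
    # Contents first: the subquery needs the attachments rows to still exist.
    conn.execute(
        "DELETE FROM attach_data WHERE id IN "
        "(SELECT attach_id FROM attachments WHERE isprivate = 1)"
    )
    conn.execute("DELETE FROM attachments WHERE isprivate = 1")
    conn.execute("DELETE FROM longdescs WHERE isprivate = 1")


delete_private_data(priv_db)
comments_left = [row[0] for row in priv_db.execute("SELECT thetext FROM longdescs")]
data_left = [row[0] for row in priv_db.execute("SELECT id FROM attach_data")]
```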
Assignee: alex → justdave
Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment on attachment 314492 [details]
changed for the 3.0 schema

justdave has already been keeping this script up to date, so I don't need to review this one.
Attachment #314492 - Flags: review?(mkanat)
Component: Bugzilla: Other b.m.o Issues → General
Product: mozilla.org → bugzilla.mozilla.org