Last Comment Bug 1187220 - develop a script to remove non-public data from the bmo database
: develop a script to remove non-public data from the bmo database
Status: RESOLVED FIXED
:
Product: bugzilla.mozilla.org
Classification: Other
Component: General (show other bugs)
: Production
: Unspecified Unspecified
-- normal (vote)
: ---
Assigned To: Byron Jones ‹:glob›
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-07-23 23:52 PDT by Byron Jones ‹:glob›
Modified: 2015-08-10 23:16 PDT (History)
6 users (show)
See Also:
Due Date:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
1187220_1.patch (8.33 KB, patch)
2015-07-27 02:31 PDT, Byron Jones ‹:glob›
dkl: review+
Details | Diff | Splinter Review
1187220_2.patch (8.33 KB, patch)
2015-08-10 10:29 PDT, Byron Jones ‹:glob›
dylan: review+
Details | Diff | Splinter Review

Description User image Byron Jones ‹:glob› 2015-07-23 23:52:26 PDT
develop a script to remove non-public data from the bmo database, so we have a dump that can be shared with researchers.

this will work using a whitelist of table columns, and the resulting database will not be able to drive a bugzilla installation.
Comment 1 User image Byron Jones ‹:glob› 2015-07-27 02:31:33 PDT
Created attachment 8639197 [details] [diff] [review]
1187220_1.patch

adds script/remove-non-public-data.pl. which:
- runs sanitizeme.pl
- drops tables and columns which are not listed in the whitelist
- deletes users with no activity public activity

with regards to the actual data i've chosen to include, i've only included data which is public, or is required in order to satisfy database relationships (eg. the internal IDs, linkage tables).

if i had any doubts about a field's inclusion i've dropped it.  it's much easier to add data later upon request than it is to remove already released data.
Comment 2 User image David Lawrence [:dkl] 2015-07-30 13:13:03 PDT
Comment on attachment 8639197 [details] [diff] [review]
1187220_1.patch

Review of attachment 8639197 [details] [diff] [review]:
-----------------------------------------------------------------

Codewise looks good. Making a run of it to make sure. r=dkl
Comment 3 User image David Lawrence [:dkl] 2015-07-30 20:39:32 PDT
Fresh import of 2015.04.12.sanitized.oneoff.sql.gz and then checksetup.pl to get it up to date.

Ran into the following error:

[snip]
dropping login_failure
dropping logincookies
dropping references to longdescs.isprivate
dropping references to longdescs.already_wrapped
dropping references to longdescs.edit_count
dropping columns from longdescs
DBD::mysql::db do failed: Incorrect key file for table 'longdescs'; try to repair it [for Statement "ALTER TABLE longdescs DROP COLUMN isprivate, DROP COLUMN already_wrapped, DROP COLUMN edit_count"] at scripts/remove-non-public-data.pl line 161.

Could be an issue with my environment. Any idea?

dkl
Comment 4 User image Byron Jones ‹:glob› 2015-07-30 21:38:07 PDT
(In reply to David Lawrence [:dkl] from comment #3)
> DBD::mysql::db do failed: Incorrect key file for table 'longdescs'; try to
> repair it [for Statement "ALTER TABLE longdescs DROP COLUMN isprivate, DROP
> COLUMN already_wrapped, DROP COLUMN edit_count"] at
> scripts/remove-non-public-data.pl line 161.

yes, it's a problem with your env -- you may not have enough free disk to create the temp table.
Comment 5 User image Dylan Hardison [:dylan] 2015-08-10 10:23:44 PDT
Good think I don't run my mysql on OSX:

DBD::mysql::db do failed: Table 'bugs_bmo_clean.PROFILES' doesn't exist [for Statement "
    DELETE FROM PROFILES
     WHERE (SELECT COUNT(*) FROM bugs_activity WHERE bugs_activity.who = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.reporter = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.assigned_to = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.qa_contact = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.qa_contact = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM longdescs WHERE longdescs.who = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM longdescs_tags_activity WHERE longdescs_tags_activity.who = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM attachments WHERE attachments.submitter_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flags WHERE flags.setter_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flags WHERE flags.requestee_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flag_state_activity WHERE flag_state_activity.setter_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flag_state_activity WHERE flag_state_activity.requestee_id = profiles.userid) = 0
"] at scripts/remove-non-public-data.pl line 174.

This looks suspiciously like a case-sensitive filesystem thing.
Comment 6 User image Byron Jones ‹:glob› 2015-08-10 10:29:02 PDT
Created attachment 8645797 [details] [diff] [review]
1187220_2.patch
Comment 7 User image Dylan Hardison [:dylan] 2015-08-10 12:14:42 PDT
Comment on attachment 8645797 [details] [diff] [review]
1187220_2.patch

Review of attachment 8645797 [details] [diff] [review]:
-----------------------------------------------------------------

r=dylan it completed!
Comment 8 User image Byron Jones ‹:glob› 2015-08-10 23:16:57 PDT
To ssh://gitolite3@git.mozilla.org/webtools/bmo/bugzilla.git
   9124d9a..c9094a5  master -> master

Note You need to log in before you can comment on or make changes to this bug.