develop a script to remove non-public data from the bmo database

Status

RESOLVED FIXED
Product: bugzilla.mozilla.org
Component: General
Reported: 2 years ago
Modified: 2 years ago

People

(Reporter: glob, Assigned: glob)

Tracking

Version: Production

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

2 years ago
develop a script to remove non-public data from the bmo database, so we have a dump that can be shared with researchers.

this will work using a whitelist of table columns, and the resulting database will not be able to drive a bugzilla installation.
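
To make the whitelist idea concrete, here is a minimal Python sketch (the script in this bug is Perl and is not reproduced here). It assumes the schema and the whitelist are both available as simple mappings; `plan_drops`, `drop_statements`, `SCHEMA` and `WHITELIST` are hypothetical names, and the trimmed schema is illustrative, borrowing the table and column names that show up in the log output later in this bug.

```python
def plan_drops(schema, whitelist):
    """Split a schema into tables and columns that are not whitelisted.

    schema:    {table: [column, ...]} as found in the database
    whitelist: {table: {column, ...}} listing what may be published
    """
    # Any table missing from the whitelist is dropped entirely.
    drop_tables = [table for table in schema if table not in whitelist]
    drop_columns = {}
    for table, columns in schema.items():
        if table not in whitelist:
            continue
        # Keep the schema's column order so the output is predictable.
        extra = [col for col in columns if col not in whitelist[table]]
        if extra:
            drop_columns[table] = extra
    return drop_tables, drop_columns


def drop_statements(drop_tables, drop_columns):
    """Render the plan as SQL, with one ALTER TABLE per affected table."""
    statements = [f"DROP TABLE {table}" for table in drop_tables]
    for table, columns in drop_columns.items():
        clauses = ", ".join(f"DROP COLUMN {col}" for col in columns)
        statements.append(f"ALTER TABLE {table} {clauses}")
    return statements


# Hypothetical, heavily trimmed schema and whitelist (assumptions for the demo).
SCHEMA = {
    "login_failure": ["user_id", "login_time", "ip_addr"],
    "logincookies": ["cookie", "userid", "ipaddr", "lastused"],
    "longdescs": ["comment_id", "bug_id", "who", "bug_when", "thetext",
                  "isprivate", "already_wrapped", "edit_count"],
}
WHITELIST = {
    "longdescs": {"comment_id", "bug_id", "who", "bug_when", "thetext"},
}

if __name__ == "__main__":
    for statement in drop_statements(*plan_drops(SCHEMA, WHITELIST)):
        print(statement)
```

Because anything not listed is dropped, a table or column added to the schema later stays out of the dump until someone deliberately whitelists it.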
(Assignee)

Comment 1

2 years ago
Created attachment 8639197 [details] [diff] [review]
1187220_1.patch

adds scripts/remove-non-public-data.pl, which:
- runs sanitizeme.pl
- drops tables and columns which are not listed in the whitelist
- deletes users with no public activity
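
As a rough illustration of the last step in the list above, the following Python sketch deletes users that no public record refers to. It is not the patch's code: it runs against an in-memory SQLite database rather than MySQL, uses `NOT EXISTS` where the actual statement may be written differently, and checks only three references. `ACTIVITY_REFERENCES`, `delete_inactive_users` and `demo` are hypothetical names, and the tiny tables are made up for the demo.

```python
import sqlite3

# (table, column) pairs that count as public activity for a user.
# Assumption: a small subset chosen for the demo; a real list would be longer.
ACTIVITY_REFERENCES = [
    ("bugs", "reporter"),
    ("bugs", "assigned_to"),
    ("longdescs", "who"),
]


def delete_inactive_users(conn, references=ACTIVITY_REFERENCES):
    """Delete profiles rows that none of the given columns refer to."""
    conditions = " AND ".join(
        f"NOT EXISTS (SELECT 1 FROM {table} "
        f"WHERE {table}.{column} = profiles.userid)"
        for table, column in references
    )
    cursor = conn.execute(f"DELETE FROM profiles WHERE {conditions}")
    return cursor.rowcount  # number of users removed


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE profiles (userid INTEGER PRIMARY KEY, realname TEXT);
        CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY,
                           reporter INTEGER, assigned_to INTEGER);
        CREATE TABLE longdescs (comment_id INTEGER PRIMARY KEY,
                                bug_id INTEGER, who INTEGER);
        INSERT INTO profiles VALUES (1, 'reporter'), (2, 'commenter'), (3, 'lurker');
        INSERT INTO bugs VALUES (100, 1, 1);
        INSERT INTO longdescs VALUES (1, 100, 2);
    """)
    deleted = delete_inactive_users(conn)
    remaining = [row[0] for row in
                 conn.execute("SELECT userid FROM profiles ORDER BY userid")]
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

In the demo, user 3 has filed nothing and commented on nothing, so it is the only row removed; users 1 and 2 stay because a bug or a comment still points at them.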

with regard to the actual data i've chosen to include: i've only included data which is public, or which is required in order to satisfy database relationships (eg. the internal IDs, linkage tables).

if i had any doubts about a field's inclusion i've dropped it.  it's much easier to add data later upon request than it is to remove already released data.
Attachment #8639197 - Flags: review?(dkl)

Comment 2

Comment on attachment 8639197 [details] [diff] [review]
1187220_1.patch

Review of attachment 8639197 [details] [diff] [review]:
-----------------------------------------------------------------

Codewise looks good. Making a run of it to make sure. r=dkl
Attachment #8639197 - Flags: review?(dkl) → review+

Comment 3

Did a fresh import of 2015.04.12.sanitized.oneoff.sql.gz and then ran checksetup.pl to bring it up to date.

Ran into the following error:

[snip]
dropping login_failure
dropping logincookies
dropping references to longdescs.isprivate
dropping references to longdescs.already_wrapped
dropping references to longdescs.edit_count
dropping columns from longdescs
DBD::mysql::db do failed: Incorrect key file for table 'longdescs'; try to repair it [for Statement "ALTER TABLE longdescs DROP COLUMN isprivate, DROP COLUMN already_wrapped, DROP COLUMN edit_count"] at scripts/remove-non-public-data.pl line 161.

Could be an issue with my environment. Any idea?

dkl
(Assignee)

Comment 4

2 years ago
(In reply to David Lawrence [:dkl] from comment #3)
> DBD::mysql::db do failed: Incorrect key file for table 'longdescs'; try to
> repair it [for Statement "ALTER TABLE longdescs DROP COLUMN isprivate, DROP
> COLUMN already_wrapped, DROP COLUMN edit_count"] at
> scripts/remove-non-public-data.pl line 161.

yes, it's a problem with your env -- you may not have enough free disk space to create the temp table.

Comment 5

Good thing I don't run my mysql on OSX:

DBD::mysql::db do failed: Table 'bugs_bmo_clean.PROFILES' doesn't exist [for Statement "
    DELETE FROM PROFILES
     WHERE (SELECT COUNT(*) FROM bugs_activity WHERE bugs_activity.who = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.reporter = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.assigned_to = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.qa_contact = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM bugs WHERE bugs.qa_contact = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM longdescs WHERE longdescs.who = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM longdescs_tags_activity WHERE longdescs_tags_activity.who = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM attachments WHERE attachments.submitter_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flags WHERE flags.setter_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flags WHERE flags.requestee_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flag_state_activity WHERE flag_state_activity.setter_id = profiles.userid) = 0
           AND (SELECT COUNT(*) FROM flag_state_activity WHERE flag_state_activity.requestee_id = profiles.userid) = 0
"] at scripts/remove-non-public-data.pl line 174.

This looks suspiciously like a case-sensitive filesystem thing.
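
For context: on a case-sensitive filesystem MySQL treats table names as case-sensitive by default, so `PROFILES` and `profiles` are different tables there, while a default OSX setup would accept either spelling. A check along the following lines could catch this kind of slip before a statement is run. It is a minimal Python sketch, not part of the patch; `find_case_mismatches` is a hypothetical helper, and the regular expression only looks at identifiers that directly follow `FROM`.

```python
import re


def find_case_mismatches(sql, known_tables):
    """Return (as_written, canonical) pairs for table names after FROM
    that match a known table only when case is ignored."""
    by_lower = {name.lower(): name for name in known_tables}
    mismatches = []
    pattern = r"\bFROM\s+([A-Za-z_][A-Za-z0-9_]*)"
    for ref in re.findall(pattern, sql, flags=re.IGNORECASE):
        canonical = by_lower.get(ref.lower())
        if canonical is None or canonical == ref:
            continue  # unknown table, or spelled exactly as in the schema
        if (ref, canonical) not in mismatches:
            mismatches.append((ref, canonical))
    return mismatches


if __name__ == "__main__":
    statement = """
        DELETE FROM PROFILES
         WHERE (SELECT COUNT(*) FROM bugs WHERE bugs.reporter = profiles.userid) = 0
    """
    print(find_case_mismatches(statement, {"profiles", "bugs"}))
```

Run against a shortened form of the failing statement, the helper reports `PROFILES` as a misspelling of `profiles` and leaves the correctly cased `bugs` reference alone.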
(Assignee)

Comment 6

2 years ago
Created attachment 8645797 [details] [diff] [review]
1187220_2.patch
Attachment #8639197 - Attachment is obsolete: true
Attachment #8645797 - Flags: review?(dylan)

Comment 7

Comment on attachment 8645797 [details] [diff] [review]
1187220_2.patch

Review of attachment 8645797 [details] [diff] [review]:
-----------------------------------------------------------------

r=dylan it completed!
Attachment #8645797 - Flags: review?(dylan) → review+
(Assignee)

Comment 8

2 years ago
To ssh://gitolite3@git.mozilla.org/webtools/bmo/bugzilla.git
   9124d9a..c9094a5  master -> master
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED