Closed Bug 564568 Opened 14 years ago Closed 14 years ago

Delete old user accounts

Categories

(addons.mozilla.org Graveyard :: Code Quality, defect, P4)

defect

Tracking

(Not tracked)

RESOLVED FIXED
5.11.8

People

(Reporter: jbalogh, Assigned: clouserw)

References

Details

(Whiteboard: [z])

Can we get rid of all our stale user accounts?  I'd rather have 614,565 accounts than 4.4 million.  This will help us go faster and get ready for an influx of real users in the marketplace.  We wouldn't remove any users that are linked to anything, even if they have an old password.

The newest user to have an old password is from 2009-04-09.  That's when we changed to a better hashing scheme.

Security angle: all the old passwords used md5 without a salt.  That's not great for our users.

mysql> select count(*) from users;
+----------+
| count(*) |
+----------+
|  4447873 | 
+----------+
1 row in set (2.06 sec)

mysql> select count(*) from users where password not like 'sha512%';
+----------+
| count(*) |
+----------+
|  3833308 | 
+----------+
1 row in set (33.87 sec)
Perhaps. I wouldn't want to do this without formalizing our dormant account policy somewhere where people curious could find it. Can you quantify the performance improvement in going from 4.4M to 614K?
I made some charts and graphs, and my calculations show that I don't really know.  I'm merely speculating that if we have less data in the users table, we have less data to index and keep in memory, which gives us more space for things we care about.

Note: since I didn't explicitly mention it, we switched our password scheme to a more secure hashing scheme last April.  When people log in we upgrade their old password to the new scheme automatically.  Anyone who hasn't logged in since the switch has the old password, so we know they haven't been to AMO.
Assignee: nobody → fligtar
-> fligtar for a policy
Target Milestone: 5.11 → 5.11.1
Ok. I'm fine with deleting users with old passwords as long as we add a new field that keeps track of the last time someone logged in. Also, we should not delete any users that have reviews, collections, or add-ons associated with them.

I'll work on a blog post announcing we are going to do this.
Assignee: fligtar → nobody
Sounds good.
Priority: -- → P4
Target Milestone: 5.11.1 → 4.x (triaged)
SELECT user_id FROM addons_users UNION DISTINCT SELECT user_id FROM approvals UNION DISTINCT SELECT user_id FROM api_auth_tokens UNION DISTINCT SELECT user_id FROM editor_subscriptions UNION DISTINCT SELECT user_id FROM groups_users UNION DISTINCT SELECT user_id FROM reviews UNION DISTINCT SELECT user_id FROM reviews_moderation_flags UNION DISTINCT SELECT user_id FROM versioncomments UNION DISTINCT SELECT user_id FROM users_versioncomments UNION DISTINCT SELECT user_id FROM collection_subscriptions UNION DISTINCT SELECT user_id FROM collections_users UNION DISTINCT SELECT user_id FROM collections_votes UNION DISTINCT SELECT user_id FROM howto_votes UNION DISTINCT SELECT user_id FROM users_tags_addons UNION DISTINCT SELECT user_id FROM hubrsskeys

+----------+
| count(*) |
+----------+
|   247783 |
+----------+

tables = """
addons_users
approvals
api_auth_tokens
editor_subscriptions
groups_users
reviews
reviews_moderation_flags
versioncomments
users_versioncomments
collection_subscriptions
collections_users
collections_votes
howto_votes
users_tags_addons
hubrsskeys
""".split()

print ' UNION DISTINCT '.join('SELECT user_id FROM %s' % t for t in tables)
I think a robot threw up in comment 6
So is comment #6 saying that we'd only delete 247,783 users out of 4.4 million?

Is the last-logged-in field in place?
(In reply to comment #8)
> So is comment #6 saying that we'd only delete 247,783 users out of 4.4 million?

Those are the users we're keeping.
Looking forward to the press buzz around it ;)
I think we should only delete users who have old passwords and aren't linked to anything. Once we have the last-logged-in field in place we can set a policy of closing accounts after 2 years without logging in.
mysql> SELECT COUNT(DISTINCT user_id) FROM (SELECT user_id FROM addons_users UNION DISTINCT SELECT user_id FROM approvals UNION DISTINCT SELECT user_id FROM api_auth_tokens UNION DISTINCT SELECT user_id FROM editor_subscriptions UNION DISTINCT SELECT user_id FROM groups_users UNION DISTINCT SELECT user_id FROM reviews UNION DISTINCT SELECT user_id FROM reviews_moderation_flags UNION DISTINCT SELECT user_id FROM versioncomments UNION DISTINCT SELECT user_id FROM users_versioncomments UNION DISTINCT SELECT user_id FROM collection_subscriptions UNION DISTINCT SELECT user_id FROM collections_users UNION DISTINCT SELECT user_id FROM collections_votes UNION DISTINCT SELECT user_id FROM howto_votes UNION DISTINCT SELECT user_id FROM users_tags_addons UNION DISTINCT SELECT user_id FROM hubrsskeys UNION DISTINCT SELECT id as user_id FROM users WHERE password like 'sha512%') J;        
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                  810057 | 
+-------------------------+
1 row in set (17.77 sec)

mysql> select count(1) from users;
+----------+
| count(1) |
+----------+
|  4577564 | 
+----------+
1 row in set (1.10 sec)
I got suckered into doing this, because otherwise our queries take forever to add display names and user names (and I'm hoping this prevents some duplicates).

Current plan is to use today's db dump to get the IDs to delete and then whack those in production with the next push.  Doing the queries on the live site just takes way too long.
Assignee: nobody → clouserw
Target Milestone: 4.x (triaged) → 5.11.8
I'm using the query from comment 12 to get the `active_users` table.  After that I'm doing this to get a list of users to delete:

`select id from active_users` > users_to_save
`select id from users` > all_users
sort all_users users_to_save | uniq -u > delete_these_users

someone want to double check that logic please?
Looks fine to me.  What's the `wc -l` on those files?
clouserw@khan:~$ wc -l all_users users_to_save delete_these_users 
 4622327 all_users
  859523 users_to_save
 3762806 delete_these_users
 9244656 total
The only flaw with this method is if someone jumps into users_to_save after you've made the files.
Yeah, I'm fine with that.  I can recreate the files fairly easily at this point, so I can make them again before we do the final run.
Alright, list of users to delete is here:

http://people.mozilla.com/~clouserw/temp/delete_these_users

That's a 27M text file, you might not want to click on it in a browser.  If anyone sees anything amiss, speak now or forever hold your peace.
I spot checked a few accounts in that list and they seem fine to me.
This is still running on preview.  2 million user accounts to go.  We'll have to start this early in production.
Blocks: 582727
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 591207
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.