Closed
Bug 564568
Opened 14 years ago
Closed 14 years ago
Delete old user accounts
Categories
(addons.mozilla.org Graveyard :: Code Quality, defect, P4)
Tracking
(Not tracked)
RESOLVED
FIXED
5.11.8
People
(Reporter: jbalogh, Assigned: clouserw)
References
Details
(Whiteboard: [z])
Can we get rid of all our stale user accounts? I'd rather have 614,565 accounts than 4.4 million. This will help us go faster and get ready for an influx of real users in the marketplace. We wouldn't remove any users that are linked to anything, even if they have an old password. The newest user to have an old password is from 2009-04-09. That's when we changed to a better hashing scheme. Security angle: all the old passwords used md5 without a salt. That's not great for our users. mysql> select count(*) from users; +----------+ | count(*) | +----------+ | 4447873 | +----------+ 1 row in set (2.06 sec) mysql> select count(*) from users where password not like 'sha512%'; +----------+ | count(*) | +----------+ | 3833308 | +----------+ 1 row in set (33.87 sec)
Comment 1•14 years ago
|
||
Perhaps. I wouldn't want to do this without formalizing our dormant account policy somewhere where people curious could find it. Can you quantify the performance improvement in going from 4.4M to 614K?
Reporter | ||
Comment 2•14 years ago
|
||
I made some charts and graphs, and my calculations show that I don't really know. I'm merely speculating that if we have less data in the users table, we have less data to index and keep in memory, which gives us more space for things we care about. Note: since I didn't explicitly mention it, we switched our password scheme to a more secure hashing scheme last April. When people log in we upgrade their old password to the new scheme automatically. Anyone who hasn't logged in since the switch has the old password, so we know they haven't been to AMO.
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → fligtar
Assignee | ||
Comment 3•14 years ago
|
||
-> fligtar for a policy
Assignee | ||
Updated•14 years ago
|
Target Milestone: 5.11 → 5.11.1
Comment 4•14 years ago
|
||
Ok. I'm fine with deleting users with old passwords as long as we add a new field that keeps track of the last time someone logged in. Also, we should not delete any users that have reviews, collections, or add-ons associated with them. I'll work on a blog post announcing we are going to do this.
Assignee: fligtar → nobody
Assignee | ||
Comment 5•14 years ago
|
||
Sounds good.
Priority: -- → P4
Target Milestone: 5.11.1 → 4.x (triaged)
Reporter | ||
Comment 6•14 years ago
|
||
SELECT user_id FROM addons_users UNION DISTINCT SELECT user_id FROM approvals UNION DISTINCT SELECT user_id FROM api_auth_tokens UNION DISTINCT SELECT user_id FROM editor_subscriptions UNION DISTINCT SELECT user_id FROM groups_users UNION DISTINCT SELECT user_id FROM reviews UNION DISTINCT SELECT user_id FROM reviews_moderation_flags UNION DISTINCT SELECT user_id FROM versioncomments UNION DISTINCT SELECT user_id FROM users_versioncomments UNION DISTINCT SELECT user_id FROM collection_subscriptions UNION DISTINCT SELECT user_id FROM collections_users UNION DISTINCT SELECT user_id FROM collections_votes UNION DISTINCT SELECT user_id FROM howto_votes UNION DISTINCT SELECT user_id FROM users_tags_addons UNION DISTINCT SELECT user_id FROM hubrsskeys +----------+ | count(*) | +----------+ | 247783 | +----------+ tables = """ addons_users approvals api_auth_tokens editor_subscriptions groups_users reviews reviews_moderation_flags versioncomments users_versioncomments collection_subscriptions collections_users collections_votes howto_votes users_tags_addons hubrsskeys """.split() print ' UNION DISTINCT '.join('SELECT user_id FROM %s' % t for t in tables)
Comment 8•14 years ago
|
||
So is comment #6 saying that we'd only delete 247,783 users out of 4.4 million? Is the last-logged-in field in place?
Reporter | ||
Comment 9•14 years ago
|
||
(In reply to comment #8) > So is comment #6 saying that we'd only delete 247,783 users out of 4.4 million? Those are the users we're keeping.
Comment 10•14 years ago
|
||
Looking forward to the press buzz around it ;)
Comment 11•14 years ago
|
||
I think we should only delete users who have old passwords and aren't linked to anything. Once we have the last-logged-in field in place we can set a policy of closing accounts after 2 years without logging in.
Reporter | ||
Comment 12•14 years ago
|
||
mysql> SELECT COUNT(DISTINCT user_id) FROM (SELECT user_id FROM addons_users UNION DISTINCT SELECT user_id FROM approvals UNION DISTINCT SELECT user_id FROM api_auth_tokens UNION DISTINCT SELECT user_id FROM editor_subscriptions UNION DISTINCT SELECT user_id FROM groups_users UNION DISTINCT SELECT user_id FROM reviews UNION DISTINCT SELECT user_id FROM reviews_moderation_flags UNION DISTINCT SELECT user_id FROM versioncomments UNION DISTINCT SELECT user_id FROM users_versioncomments UNION DISTINCT SELECT user_id FROM collection_subscriptions UNION DISTINCT SELECT user_id FROM collections_users UNION DISTINCT SELECT user_id FROM collections_votes UNION DISTINCT SELECT user_id FROM howto_votes UNION DISTINCT SELECT user_id FROM users_tags_addons UNION DISTINCT SELECT user_id FROM hubrsskeys UNION DISTINCT SELECT id as user_id FROM users WHERE password like 'sha512%') J; +-------------------------+ | COUNT(DISTINCT user_id) | +-------------------------+ | 810057 | +-------------------------+ 1 row in set (17.77 sec) mysql> select count(1) from users; +----------+ | count(1) | +----------+ | 4577564 | +----------+ 1 row in set (1.10 sec)
Comment 13•14 years ago
|
||
draft announcement: http://etherpad.mozilla.org:9000/amo-user-changes
Assignee | ||
Comment 14•14 years ago
|
||
I got suckered into doing this, because otherwise our queries take forever to add display names and user names (and I'm hoping this prevents some duplicates). Current plan is to use today's db dump to get the IDs to delete and then whack those in production with the next push. Doing the queries on the live site just takes way too long.
Assignee: nobody → clouserw
Target Milestone: 4.x (triaged) → 5.11.8
Assignee | ||
Comment 15•14 years ago
|
||
I'm using the query from comment 12 to get the `active_users` table. After that I'm doing this to get a list of users to delete: `select id from active_users` > users_to_save `select id from users` > all_users sort all_users users_to_save | uniq -u > delete_these_users someone want to double check that logic please?
Reporter | ||
Comment 16•14 years ago
|
||
Looks fine to me. What's the `wc -l` on those files?
Assignee | ||
Comment 17•14 years ago
|
||
clouserw@khan:~$ wc -l all_users users_to_save delete_these_users 4622327 all_users 859523 users_to_save 3762806 delete_these_users 9244656 total
Reporter | ||
Comment 18•14 years ago
|
||
The only flaw with this method is if someone jumps into users_to_save after you've made the files.
Assignee | ||
Comment 19•14 years ago
|
||
Yeah, I'm fine with that. I can recreate the files fairly easily at this point, so I can make them again before we do the final run.
Assignee | ||
Comment 20•14 years ago
|
||
Alright, list of users to delete is here: http://people.mozilla.com/~clouserw/temp/delete_these_users That's a 27M text file, you might not want to click on it in a browser. If anyone sees anything amiss, speak now or forever hold your peace.
Comment 21•14 years ago
|
||
I spot checked a few accounts in that list and they seem fine to me.
Assignee | ||
Comment 22•14 years ago
|
||
This is still running on preview. 2 million user accounts to go. We'll have to start this early in production.
Assignee | ||
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•8 years ago
|
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•