Closed Bug 464087 Opened 16 years ago Closed 14 years ago

Clean up users_users table

Categories

(support.mozilla.org :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: cww, Assigned: paulc)

References

()

Details

Attachments

(2 files, 1 obsolete file)

Since doing anything in tikiwiki requires a polling of users_users and users_usergroups, if we can clean up old entries in these table, it'd probably really help with performance.

A quick query reveals that we have about 54000 people in users_users, about 32000 of whom either registered 90+ days ago and haven't logged in ever or last logged in over 90+ days ago.  It's probably a safe bet that we can delete these.

If we cut the time to 30 days, the number of entries we can delete is almost 49000, which is over 90%.
I think the ideal solution in this case isn't necessarily deleting the accounts.  It'd be great if we could have a separate mechanism somehow for dormant accounts.d

I'd agree with deleting the accounts that have never been used, but due to the nature of what we do I don't think we can say 90 days is a reasonable amount of time before someone must log in again.

Maybe we can have a version of "active" users that gets polled for the every day things, but have a more complete version that is accessed when someone tries to log in.

In the mean time I think we should definitely remove accounts that have never been used, and I think we can be pretty aggressive about that. When a user makes an account we can let them know that if they don't "activate their account" (login) within a week, even, that the account will be removed.

Unless of course what you mean here is that they've never logged in since creating the account, but they did log in intially, then I'd agree with removing accounts that weren't used to create a post or leave a comment or an edit etc.
Assignee: nobody → cwwmozilla
Target Milestone: --- → 1.3
Here's what I'll do.  If they register but NEVER logged in, I'll delete after 4 weeks.  (That's REALLY generous).

If they register and log in but haven't posted/edited/commented EVER then I'll delete after 12 weeks from their log in date.
Here are the numbers: 117763 registered users total, 21058 have done ANYTHING.

Query for part I:

select count(*) FROM users_users uu WHERE registrationDate < (UNIX_TIMESTAMP(NOW())-(7*24*3600*4)) AND lastLogin IS NULL;

(9010 users)

The query for part II is taking well over 3 hours to run so I think if I can't optimize it, we should run a select on sumotools to get a list of names and then use that to do the deletion so that we don't hurt production.  Although I'll take suggestions for optimization:

select count(*) FROM users_users WHERE registrationDate < (UNIX_TIMESTAMP(NOW())-(7*24*3600*4)) AND lastLogin < (UNIX_TIMESTAMP(NOW())-(7*24*3600*12)) AND login NOT IN (SELECT distinct(user) from tiki_actionlog);
Scratch that, it's took over 3 days and still didn't finish.  I'll write a script to go through the data instead.
Attached file script (obsolete) —
Scripting it via a temp mysql column.  Does the above in much less than 20 minutes all told.  (20 is with a lot of print statements, it's probably a lot faster now that I took those out.)  All told, 83714 rows dropped.

Perl: 1, mysql: 0

Laura: you can decide if this is safe for running on prod or if I should run it on a tools dump and extract a list of userIds to drop from the prod table.
Status: NEW → ASSIGNED
Attached file new script
Fixed error handling so it doesn't warn even on successes.  Takes < 5 minutes to run.

Running this will require perl's DBI module but that should be simple enough.  

We need to figure out a way to QA this to make sure no table dependencies break etc.
Attachment #393621 - Attachment is obsolete: true
Attachment #393787 - Flags: review?(laura)
Just a quick concern here. Is data on these users stored in other tables as well? E.g. history stuff that might not be needed after their accounts disappear. Not that I want to break stuff, but if we don't erase these now we probably never will, and we'll have useless data in the db.
Hmm... good point... yeah, there are about a dozen tiki_user_* tables.  I'll give them a quick look through and add a loop to drop them.
URL:
Target Milestone: 1.3 → 1.5
Assignee: cwwmozilla → paul.craciunoiu
Attachment #393787 - Flags: review?(laura)
Is this still 1.5?
This is more like 1.5.1, aimed for the Tiki/LiveChat login integration.
Cool, let's get that milestone set up soon.
Target Milestone: 1.5 → 1.5.1
Here's an example of users that have:
* registered before '09-01-2009' -- strtotime('1 september 2009') = 1251788400
* last logged in less than 1 month after they registered (so their last login was at best 1 october 2009) -- strtotime("1 september 2009")-strtotime("1 august 2009") = 2678400


-- The SQL:
SELECT login, registrationDate, lastLogin FROM users_users WHERE registrationDate < 1254380400 AND ((lastLogin - registrationDate) < 2678400 OR lastLogin IS NULL)
-- End query:
128,327 total, Query took 0.0008 sec

Cheng, we can adjust this further if you wish. I really look forward to having 100k less users, if possible.
I've talked with paul about the users we want to take out.  But my point was we wanted to keep anyone who had done anything (forum post or KB edit).  There's a little more to this bug which is to make sure that preferences are removed and watches are converted to anon watches.
OS: Windows XP → All
Hardware: x86 → All
Target Milestone: 1.5.1 → 1.6
Attached patch php scriptSplinter Review
This bug will require very much testing. Given the status of 1.5.1 currently (esp. QA & dev availability) I'm pushing this out. Also talked to James and he agrees. Laura, is that ok with you too?

The patch adds a command-line script which can be cronned. I based the SQL on Cheng's perl script to get the data. It ran in just under 4 minutes on my machine.

Thorough testing should be done to ensure none of the functionality is broken by the absence of data in these tables: users_usergroups, tiki_user_preferences, users_users.
Attachment #421967 - Flags: review?(laura)
Comment on attachment 421967 [details] [diff] [review]
php script

Looks good, but testing++
Attachment #421967 - Flags: review?(laura) → review+
Should we add this to 1.5.2?
Talked to Laura and it's ok to add this.
On trunk in r63369.
Filed bug 548922 for IT.

Needs significant QA.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Depends on: 548922
Resolution: --- → FIXED
Target Milestone: 1.6 → 1.5.2
For QA: the SQL patch deletes user data for inactive users. The most important part is to make sure active accounts behave as before, and all the basic functionality is still there. According to affected tables, at these features should be tested:
* logins
* user groups
* user preferences
* user watches (email notifications)
* create/edit forum thread
* create/edit article
It would be nice to test for any other basic functionality, but the above should be enough to make sure nothing is broken.

To check that the accounts have been deleted, you can look at the number of pages at tiki-adminusers.php, mine says: "Page: 1/15282"
For privacy reasons, you may PM me on irc if you want to check specific accounts.


Also, I just realized this code could land on the fennec branch too. Laura, James: should it?
Verified, FIXED
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: