Closed
Bug 1171274
Opened 9 years ago
Closed 9 years ago
cull the MozReview user database and prevent it from bloating again
Categories
(MozReview Graveyard :: General, defect, P1)
MozReview Graveyard
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Unassigned)
Details
Attachments
(4 files)
We have tons of bugs around user selection not working. Enough is enough. While there may certainly be bugs in Review Board and MozReview causing issues, we know one thing: our user database is large and most of the users in it will never interact with MozReview. Here are some stats: Total users: 338,290 Users w/ '+' (no IRC nick): 336,748 Users w IRC nick: 1,542 Users submitting requests: 105 Users leaving reviews: 192 Let's say we've had 300 unique users of MozReview to date (which is probably a bit high). That's 0.09% of the total user database. If you throw in people with IRC syntax, that's 0.46%. The overwhelming majority of users in the database are dormant users and are making it harder to find people in MozReview. I think we should purge the MozReview database of users that aren't active and we should prevent them from returning. Here's what I'm thinking: 1) Change the user creation code to not import a new user automatically unless they have IRC nick syntax in their Bugzilla account 2) Write a script to delete users that don't have IRC nick syntax *and* have never interacted with MozReview 3) Bugzilla users can still have accounts created by logging in or by pushing changesets Mozillians without IRC nicks in Bugzilla and who have never used MozReview will not be automatically imported into MozReview. This is unfortunate. The recourse is they log in once and problem solved. I don't think that's a major concern. If we're worried about pruning the user database too far, perhaps we could ask the BMO people for a dump of active accounts within the past year and we could not delete them. I suspect that may add a few thousand more users, but it will be a far cry from 330,000. Progress is progress.
Reporter | ||
Comment 1•9 years ago
|
||
mozreview: limit which users are auto populated (bug 1171274); r?smacleod, r?mcote MozReview currently has over 300,000 users and over 99.0% of them are inactive. This is because MozReview currently populates all Bugzilla users returned from search results. Unfortunately, this mountain of inactive users creates chaos. They take up space in autocomplete results. They conflict with valid active users. They are a royal pain. This commit establishes limits on what users are auto imported from Bugzilla during searches. Instead of importing every user found by a search, we instead only import users that have IRC nick syntax or that exactly match the search query. This should drastically reduce the number of users that are imported from Bugzilla during searches. This commit is only half the solution to our excessive user database. The other half is to purge the user database of inactive users. This will be done in a separate commit.
Attachment #8615005 -
Flags: review?(smacleod)
Attachment #8615005 -
Flags: review?(mcote)
Reporter | ||
Comment 2•9 years ago
|
||
mozreview: INCOMPLETE ability to prune inactive users (bug 1171274) We have over 300,000 next to worthless user accounts sitting around inside MozReview. This commit adds functionality to prune them. THIS COMMIT IS HALF FINISHED. THE CODE IS UNTESTED.
Reporter | ||
Comment 3•9 years ago
|
||
https://reviewboard.mozilla.org/r/10143/#review8901 smacleod could probably finish this commit in 1/4 of the time it will take me to figure out Django's ORM. Feel free to have it.
Comment 4•9 years ago
|
||
https://reviewboard.mozilla.org/r/10143/#review8925 ::: pylib/mozreview/mozreview/bugzilla/models.py:125 (Diff revision 1) > + for u in User.objects.all(): > + if u.id in active_uids: > + continue > + > + print('will delete user %s' % (u.username)) > + #BugzillaUserMap.objects.filter(user=u).delete() > + #u.delete() this should work: ```python BugzillaUserMap.objects.exclude(user_id__in=active_uids).delete() User.objects.exclude(id__in=active_uids).delete() ```
Comment 5•9 years ago
|
||
https://reviewboard.mozilla.org/r/10143/#review8927 ::: pylib/mozreview/mozreview/bugzilla/models.py:118 (Diff revision 1) > + for rr in ReviewRequest.objects.all(): > + active_uids.add(rr.submitter.id) > + for u in rr.target_people: > + active_uids.add(u.id) > + for u in rr.participants: > + active_uids.add(u.id) We should do this for both ReviewRequest and ReviewRequestDraft
Comment 6•9 years ago
|
||
Comment on attachment 8615005 [details] MozReview Request: mozreview: limit which users are auto populated (bug 1171274); r?smacleod, r?mcote https://reviewboard.mozilla.org/r/10141/#review9027 Hm, I'm a bit concerned that we might be breaking functionality that users like. Do we have any idea how often a user searches for a reviewer by name? If it's at all frequent, we might want to do what I suggest in https://bugzilla.mozilla.org/show_bug.cgi?id=1160419 instead. That will return more hits, but it would make autocomplete a lot more useful, and you don't need to know the person's username/nick. ::: pylib/rbbz/rbbz/auth.py:15 (Diff revision 1) > -from mozreview.models import get_or_create_bugzilla_users > +from mozreview.bugzilla.models import ( > + BZ_IRCNICK_RE, > + get_or_create_bugzilla_users, > +) I believe we normally structure these as from foo import (bar, baz)
Attachment #8615005 -
Flags: review?(mcote)
Comment 7•9 years ago
|
||
https://reviewboard.mozilla.org/r/10141/#review9033 > I believe we normally structure these as > > from foo import (bar, > baz) Hm messed up my indentation somehow, but I think you know what I meant.
Reporter | ||
Comment 8•9 years ago
|
||
https://reviewboard.mozilla.org/r/10141/#review9039 I'm fine with adding a time-based component to the importing. **anything** is better than what we have now. And, there should be no harm pruning users that have never used MozReview.
Comment 9•9 years ago
|
||
https://reviewboard.mozilla.org/r/10141/#review9297 As I understand the code, searching by name will still work perfectly fine (As long as the user has the ircnick syntax). The only previous use case this will break is finding non convention following BMO users when searching by a name - which I am fine with.
Comment 10•9 years ago
|
||
https://reviewboard.mozilla.org/r/10141/#review9299 Ah perhaps I misunderstood then. If you can still search users with irc nicks by full name, then this is probably fine.
Comment 11•9 years ago
|
||
Comment on attachment 8615005 [details] MozReview Request: mozreview: limit which users are auto populated (bug 1171274); r?smacleod, r?mcote https://reviewboard.mozilla.org/r/10141/#review9399 This looks fine to me for the most part - I'm curious if this will have a large perf impact though, since we query for users quite a bit.
Attachment #8615005 -
Flags: review?(smacleod) → review+
Comment 12•9 years ago
|
||
https://reviewboard.mozilla.org/r/10141/#review9027 > Hm messed up my indentation somehow, but I think you know what I meant. I actually prefer this style over the currently used one (Review Board's convention). Let's switch. Going to drop this.
Comment 13•9 years ago
|
||
mozreview: limit which users are auto populated (bug 1171274); r=smacleod MozReview currently has over 300,000 users and over 99.0% of them are inactive. This is because MozReview currently populates all Bugzilla users returned from search results. Unfortunately, this mountain of inactive users creates chaos. They take up space in autocomplete results. They conflict with valid active users. They are a royal pain. This commit establishes limits on what users are auto imported from Bugzilla during searches. Instead of importing every user found by a search, we instead only import users that have IRC nick syntax or that exactly match the search query. This should drastically reduce the number of users that are imported from Bugzilla during searches. This commit is only half the solution to our excessive user database. The other half is to purge the user database of inactive users. This will be done in a separate commit.
Comment 14•9 years ago
|
||
mozreview: add function to prune inactive users (bug 1171274). r?gps We have over 300,000 next to worthless user accounts sitting around inside MozReview. This commit adds functionality to prune them.
Attachment #8627377 -
Flags: review?(gps)
Comment 15•9 years ago
|
||
Comment on attachment 8627377 [details] MozReview Request: mozreview: add function to prune inactive users (bug 1171274). r?gps mozreview: add function to prune inactive users (bug 1171274). r?gps We have over 300,000 next to worthless user accounts sitting around inside MozReview. This commit adds functionality to prune them.
Reporter | ||
Updated•9 years ago
|
Attachment #8627377 -
Flags: review?(gps)
Reporter | ||
Comment 16•9 years ago
|
||
Comment on attachment 8627377 [details] MozReview Request: mozreview: add function to prune inactive users (bug 1171274). r?gps https://reviewboard.mozilla.org/r/12217/#review10705 ::: pylib/mozreview/mozreview/bugzilla/models.py:153 (Diff revision 2) > + for u in User.objects.filter(email__in=SPECIAL_EMAILS): > + active_uids.add(u.id) > + > + for u in User.objects.filter(username__in=SPECIAL_USERNAMES): > + active_uids.add(u.id) > + > + for u in User.objects.all(): > + if (u.date_joined - u.last_login) > MAX_LOGIN_DIFFERENCE: > + active_uids.add(u.id) I think it would be useful if we printed how many users came from where. That way we know how important each item is to keeping users around. ::: pylib/mozreview/mozreview/bugzilla/models.py:167 (Diff revision 2) > + for u in User.objects.filter(id__in=active_uids): Can we sort this? ::: pylib/mozreview/mozreview/bugzilla/models.py:181 (Diff revision 2) > + User.objects.exclude(id__in=active_uids).delete() I believe there are query length limits in MySQL. Want to take a bet on whether Django's ORM is smart enough to work around it? What I'm saying is be prepared to rewrite this.
Comment 17•9 years ago
|
||
Comment on attachment 8627377 [details] MozReview Request: mozreview: add function to prune inactive users (bug 1171274). r?gps mozreview: add function to prune inactive users (bug 1171274). r?gps We have over 300,000 next to worthless user accounts sitting around inside MozReview. This commit adds functionality to prune them.
Attachment #8627377 -
Flags: review?(gps)
Reporter | ||
Comment 18•9 years ago
|
||
Comment on attachment 8627377 [details] MozReview Request: mozreview: add function to prune inactive users (bug 1171274). r?gps https://reviewboard.mozilla.org/r/12217/#review10707 ::: pylib/mozreview/mozreview/bugzilla/models.py:190 (Diff revisions 2 - 3) > - User.objects.exclude(id__in=active_uids).delete() > + Kill the whitespace.
Attachment #8627377 -
Flags: review?(gps) → review+
Comment 19•9 years ago
|
||
https://hg.mozilla.org/hgcustom/version-control-tools/pushloghtml?changeset=5cbe65b797ec Leaving this open until we've actually performed the user removal on production.
Reporter | ||
Comment 20•9 years ago
|
||
We pruned the user database a few weeks back and all looks well.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•8 years ago
|
Product: Developer Services → MozReview
You need to log in
before you can comment on or make changes to this bug.
Description
•