Closed Bug 929373 Opened 11 years ago Closed 11 years ago

By default, don't display User: pages in results

Categories

(developer.mozilla.org Graveyard :: Search, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: teoli, Assigned: groovecoder)


Details

Attachments

(1 file)

Attached image Useless results.
Search for 'test' (OK, it is not a real search, but it shows the problem). The first result is a User page. Except in a few specific cases, User pages are not what the user is looking for, so don't display them by default. Maybe add a facet to include them in the Topic list.
Blocks: 930047
No longer blocks: 910513
I fully agree with Jean-Yves. Excluding some document types (such as user talk pages) from MDN search results would be a great improvement. What about letting users check a few search options in their user preferences? Thanks in advance for your attention!
Assignee: nobody → lcrouch
Still working on this, but the issue with both this bug and bug 928302 is that the post_save index task indexes these documents regardless of the exclude filter in Document.get_indexable().
Thanks in advance for any improvement to search results, Luke! What about providing an "advanced search" that would allow easy filtering using predefined tags (such as "User:" pages) in document titles? And allowing users to save this filter in their preferences?
We have a bunch of filtering features planned for MDN search on bug 915760 and some more in the general search component of bugzilla.
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/d8ce6fc87c3cd82c3cfa8a0173e22fecee989433
fix bug 928302, 929373, 931412 - Check if the search index entry of a wiki document should really be updated during saving or deleting. This fixes the problem of accidentally indexing wiki documents which aren't supposed to be indexed when saving or deleting them.

https://github.com/mozilla/kuma/commit/334a8cee56d8947fab213ce2a02424f7c346fb1d
Merge pull request #1631 from jezdez/improved-indexables-928302
fix bug 928302, 929373, 931412 - Check if the search index entry of a wiki document should really be updated during saving or deleting.
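The commit message describes the fix only in outline. On save or delete, a document's index entry should change only if the document passes the same filter as the bulk indexer. The post_save task mentioned earlier was bypassing that filter. Below is a minimal sketch of the idea, assuming a plain dict stands in for the search index and the hooks receive the filter as a predicate. The names `on_document_saved`, `on_document_deleted` and `not_a_user_page` are hypothetical and are not kuma's actual code.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the search index: maps a document slug to its
# indexed text. kuma's real index lives in an external search engine.
SearchIndex = Dict[str, str]


def on_document_saved(
    index: SearchIndex,
    slug: str,
    content: str,
    should_index: Callable[[str], bool],
) -> None:
    """Hypothetical post-save hook that applies the bulk indexer's filter.

    Documents that pass the filter are (re)indexed. Documents that fail it
    are removed, in case an older entry for them is still in the index.
    """
    if should_index(slug):
        index[slug] = content
    else:
        index.pop(slug, None)


def on_document_deleted(index: SearchIndex, slug: str) -> None:
    """Hypothetical post-delete hook: drop the entry if present, never add one."""
    index.pop(slug, None)


def not_a_user_page(slug: str) -> bool:
    """Illustrative filter only: exclude slugs in the User: namespace."""
    return not slug.startswith("User:")


if __name__ == "__main__":
    index: SearchIndex = {}
    on_document_saved(index, "Web/JavaScript", "JavaScript docs", not_a_user_page)
    on_document_saved(index, "User:Example/Sandbox", "scratch page", not_a_user_page)
    print(sorted(index))  # prints ['Web/JavaScript']
```

In this sketch, removing the entry when a saved document fails the filter is a deliberate choice. It cleans up stale entries whenever a page is re-saved. Pages that are never saved again would still keep their old entries until the index is rebuilt.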
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I just investigated why bug 952532 was filed. I searched for "Regexp" in our search, and the second result is:

/docs/User:Potappo/Core_JavaScript_1.5_Reference/Global_Objects/RegExp_(members)

So "User:" pages aren't excluded from results. See also the original "test" search JYP provided.

See https://github.com/mozilla/kuma/blob/master/apps/search/models.py#L229

We need to exclude:

.exclude(slug__icontains='Talk:')
.exclude(slug__icontains='User:')
.exclude(slug__icontains='User_talk:')

at least in get_indexable() and in should_update() there. I would write a PR for this, but I don't know how to test it properly.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Jannis, any chance you can have a look at my previous comment? Otherwise I could also just open a PR and ask for reviewing this. It's just that you know the search stuff way better than me :-)
Flags: needinfo?(jezdez)
:fscholz You're right, this needs an update (and afterwards a reindex). To be honest, I'm not sure why "User:" pages were not excluded. Can you make sure the list of page prefixes we'd exclude is correct this time? (e.g. what is the difference between "Talk:" and "User_talk:"?) I can work up a patch quickly once that list is final.
Flags: needinfo?(jezdez)
(In reply to Jannis Leidel [:jezdez] from comment #8)
> Can you make sure the list of page prefixes we'd exclude is correct this time? (e.g. what is
> the difference between "Talk:" and "User_talk:"?)

These are artifacts from back when we used MediaWiki as our wiki engine: http://meta.wikimedia.org/wiki/Help:Namespace#List_of_namespaces

As far as I can tell, we used the following namespaces from that list:

.exclude(slug__icontains='Talk:')
.exclude(slug__icontains='User:')
.exclude(slug__icontains='User_talk:')
.exclude(slug__icontains='Template_talk:')
.exclude(slug__icontains='Project_talk:')

This should at least catch the cruft from the MediaWiki days. There might be more that is unrelated to MediaWiki, but the writing team has only just started to examine what we have on MDN overall, so for now this should be the complete list.

Once we have the delete feature, more content in the normal namespace will go. We also want to move some old content to an "Archive/" zone, which should probably be excluded from search, too. Work on this has not started yet, but it's something we will look into in 2014.

I think we are good to go with excluding MediaWiki-related pages for now, and we might have more excludes later in the year, if that's okay with you.
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/d77fb4ffac52df31df820d32aac7945402f2aecb
Fix bug 929373 - Stop indexing even more documents. We now exclude documents whose slug starts with any of the following strings: Talk:, User:, User_talk:, Template_talk:, Project_talk:

https://github.com/mozilla/kuma/commit/e4b8b1fab7099f0c11807e261fcde14c111730b2
Merge pull request #1923 from jezdez/search-exclude-more
Fix bug 929373 - Stop indexing even more documents.
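The commit message states the final rule: skip any document whose slug starts with one of five MediaWiki-era namespace prefixes. Below is a minimal sketch of that prefix test in plain Python. The names `EXCLUDED_PREFIXES` and `is_indexable` are hypothetical and are not taken from kuma's source.

```python
# Hypothetical sketch of the rule in the commit message above: skip documents
# whose slug starts with a MediaWiki-era namespace prefix.
EXCLUDED_PREFIXES = (
    "Talk:",
    "User:",
    "User_talk:",
    "Template_talk:",
    "Project_talk:",
)


def is_indexable(slug: str) -> bool:
    """Return True if a document with this slug should go into the search index."""
    # str.startswith accepts a tuple and is True if any prefix matches.
    # The check is case-sensitive and anchored at the start of the slug.
    return not slug.startswith(EXCLUDED_PREFIXES)


if __name__ == "__main__":
    for slug in (
        "Web/JavaScript/Reference/Global_Objects/RegExp",
        "User:Potappo/Core_JavaScript_1.5_Reference/Global_Objects/RegExp_(members)",
        "User_talk:Example",
    ):
        print(is_indexable(slug), slug)
```

This differs from the `.exclude(slug__icontains=...)` filters quoted in the earlier comments. Those filters match the substring anywhere in the slug and ignore case, so `'Talk:'` alone would already match `User_talk:`, `Template_talk:` and `Project_talk:`. A prefix test only matches at the start of the slug. In the Django ORM, the anchored equivalent is a `slug__startswith` lookup.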
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
FWIW, this is "fixed" in that the search indexer will skip these pages from now on. However, a full-site reindex is still needed to remove these pages from the current index. A full-site reindex is currently very disruptive, because it basically makes search useless until it finishes. See this GitHub issue for more discussion: https://github.com/mozilla/kuma/issues/1930
Product: developer.mozilla.org → developer.mozilla.org Graveyard