Ensure search v3 accounts for {REDIRECT} articles

VERIFIED FIXED in 1.5

Status

support.mozilla.org
Search
P2
normal
VERIFIED FIXED
8 years ago
8 years ago

People

(Reporter: paulc, Assigned: jsocol)

Tracking

unspecified

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: sumo_only, URL)

Attachments

(1 attachment)

(Reporter)

Description

8 years ago
We have some articles that are simply redirects of others. These shouldn't show up in the search results. We previously did this in the XML indexer, but we won't be using that anymore.
(Reporter)

Updated

8 years ago
Summary: Ensure new search accounts for {REDIRECT} articles → Ensure search v3 accounts for {REDIRECT} articles
(Reporter)

Comment 1

8 years ago
I was thinking, worst case, we can do a filtering by page id for existing redirects. That wouldn't add much to indexing time.
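A minimal sketch of that worst-case fallback, assuming the ids of existing redirect pages are collected once and that each search hit carries a page id. The names `drop_redirects`, `redirect_ids`, and `hits`, and the sample titles, are hypothetical and not taken from the search code:

```python
def drop_redirects(hits, redirect_ids):
    """Return the hits whose page id is not a known redirect, keeping order."""
    return [hit for hit in hits if hit["page_id"] not in redirect_ids]


# Hypothetical data: page 2 is a redirect pointing at page 1.
redirect_ids = {2}
hits = [
    {"page_id": 1, "title": "Using the Shockwave plugin"},
    {"page_id": 2, "title": "Shockwave"},
]
filtered = drop_redirects(hits, redirect_ids)
```

Because the lookup is a set membership test per hit, the filter adds no work at indexing time and very little at query time.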

Chris: Why do we need article redirects? Could you explain more about that? Could we do them in htaccess instead?

Comment 2

8 years ago
Two reasons:
* To preserve old URLs of renamed articles.
* To provide alternative names to articles, where we think it is needed. The articles that have text before the redirect are supposed to show up in search results. See bug 489046.
(Assignee)

Comment 3

8 years ago
If we just add

AND content NOT LIKE '{REDIRECT(%'

to the WHERE clause for wiki_pages in sphinx.conf, we should be good, right?
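The effect of that condition can be checked in isolation. The sketch below uses an in-memory SQLite table purely so that it is self-contained; the column names and sample rows are assumptions, and the text after the `{REDIRECT(` prefix is placeholder content rather than real wiki syntax:

```python
import sqlite3

# In-memory stand-in for wiki_pages; the schema here is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wiki_pages (page_id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO wiki_pages (page_id, content) VALUES (?, ?)",
    [
        (1, "Full article text about plugins."),
        (2, "{REDIRECT(page=Some other article)}"),
        (3, "Short summary shown in search.\n{REDIRECT(page=Some other article)}"),
    ],
)

# Same condition as proposed for the indexer query: skip rows whose
# content begins with the redirect code.
rows = conn.execute(
    "SELECT page_id FROM wiki_pages "
    "WHERE content NOT LIKE '{REDIRECT(%' "
    "ORDER BY page_id"
).fetchall()
indexed_ids = [row[0] for row in rows]
```

Only the row that starts with the redirect code is dropped. A page with text before the redirect still reaches the indexer, because the pattern has no leading wildcard.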
(Reporter)

Comment 4

8 years ago
That sounds good to me. I hope that won't triple indexing time :)

Comment 5

8 years ago
To clarify, the two reasons Ilias listed are also two very distinct use cases:

* To preserve old URLs of renamed articles -- these redirects shouldn't be included in search results

* To provide alternative names to articles where we think it is needed -- these redirects SHOULD be included in search results

Typically they're distinguished by being _just_ the {REDIRECT} code in the former, and also having an actual article summary/description paragraph in the latter.

Chris: can you please clarify how we distinguish the two so we don't end up with a situation where our alternate article names are ignored by the search engine, as that would defeat the purpose of having them?
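One way to express that distinction, as a rough sketch only: treat a page as a pure redirect when its content begins with the redirect code, and as an alternate name when the code appears after some other text. The function name `classify`, the category labels, and the sample redirect arguments are hypothetical:

```python
REDIRECT_CODE = "{REDIRECT"


def classify(content):
    """Sort a page into one of three hypothetical categories."""
    if content.startswith(REDIRECT_CODE):
        # Only the redirect code: preserves an old URL, keep out of search.
        return "pure_redirect"
    if REDIRECT_CODE in content:
        # Summary text followed by a redirect: an alternate article name.
        return "alternate_name"
    return "article"


examples = {
    "old_url": classify("{REDIRECT(page=Target)}"),
    "alias": classify("Summary paragraph.\n{REDIRECT(page=Target)}"),
    "normal": classify("Plain article text."),
}
```

Under this rule only the `pure_redirect` category would be excluded from the index, so alternate names stay searchable.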
(Assignee)

Comment 6

8 years ago
I talked to Chris and he gave me a couple of example articles. It shouldn't be a problem.
Assignee: paulc → james
Priority: -- → P2
(Assignee)

Updated

8 years ago
Priority: P2 → P1
(Assignee)

Updated

8 years ago
Priority: P1 → P2
(Assignee)

Comment 7

8 years ago
Created attachment 413229 [details] [diff] [review]
tell the indexer to ignore pure {REDIRECT()} articles

This just tells the indexer to ignore the whole page if the page content starts with "{REDIRECT". I tested this by searching for "Shockwave".
Attachment #413229 - Flags: review?(paulc)
(Reporter)

Comment 8

8 years ago
Comment on attachment 413229 [details] [diff] [review]
tell the indexer to ignore pure {REDIRECT()} articles

Looks good, no more "Shockwave" in results for me either.
Attachment #413229 - Flags: review?(paulc) → review+
(Assignee)

Comment 9

8 years ago
r56386. Should be affecting the results on stage by now.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Verified FIXED on https://sumo-forumux.stage.mozilla.com/search.php?locale=en-US&q=shockwave&sa=.
Status: RESOLVED → VERIFIED
(Assignee)

Updated

8 years ago
Blocks: 532156
(Assignee)

Updated

8 years ago
Whiteboard: sumo_only