Closed Bug 710469 Opened 13 years ago Closed 7 years ago

Remove wikimarkup from search excerpts and indexes for elastic search

Categories

(support.mozilla.org :: Search, defect)

Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: erik, Unassigned)

Details

(Whiteboard: u=user c=search s= p=2)

We can feed the index HTML rather than wikimarkup for Questions and Forums, then use the lovely HTML Strip charfilter (http://www.elasticsearch.org/guide/reference/index-modules/analysis/htmlstrip-charfilter.html). That way, we won't have any more embarrassing equal signs in our search results:

Firefox Crashes at http://de.rs-online.com...com == Crash ID(s) == Crash at http://de.rs-online.com ...To look up....
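For illustration, the HTML Strip charfilter idea could be wired into an index roughly as follows. This is a minimal sketch, not the configuration actually used on support.mozilla.org: the analyzer name `html_stripped_text` and the choice of the `standard` tokenizer and `lowercase` filter are assumptions; only the `html_strip` char filter comes from the description above.

```python
import json

# Hypothetical analyzer name; the bug does not say what the real mapping uses.
ANALYZER_NAME = "html_stripped_text"

# Index settings declaring a custom analyzer that runs Elasticsearch's
# built-in "html_strip" char filter before tokenizing, so tags such as
# <br> never become searchable tokens.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                ANALYZER_NAME: {
                    "type": "custom",
                    "char_filter": ["html_strip"],
                    "tokenizer": "standard",   # assumed tokenizer
                    "filter": ["lowercase"],   # assumed token filter
                }
            }
        }
    }
}

# Serialize to the JSON body that would be sent when creating the index.
settings_json = json.dumps(index_settings, indent=2)
print(settings_json)
```

Fields holding question and forum content would then need to reference this analyzer in their mapping, and the indexer would have to send rendered HTML rather than raw wikimarkup.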
Whiteboard: u=user c=search s=2012.1 p=
This is:
* Index HTML for questions and answers
* Improve display of question results (if necessary)
Whiteboard: u=user c=search s=2012.1 p= → u=user c=search s=2012.1 p=2
Assignee: nobody → erik
Nah, forums are full of [links] (https://support.mozilla.org/en-US/search?a=1&updated=0&e=sph&created=0&sortby=0&w=4&page=3). For the life of me, I can't get a test question to show up in search.
Erik: I think you need to give the test question an answer, then mark that answer as helpful. That'll put it in the search results. Otherwise it gets filtered out.
Yeah, that wasn't cutting it for some reason. :-(
Whiteboard: u=user c=search s=2012.1 p=2 → u=user c=search s=2012.2 p=2
QA: Make sure items with wiki links in them, like [[SomePage|some link text]], aren't found by a search for "SomePage". Likewise, searching for "br" shouldn't find all the wiki pages with <br> tags in them. Excerpts on the search results page should also lack both wiki markup ("== Header ==") and HTML.
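The QA criteria above can be sketched as a local check. This is only an approximation under stated assumptions: the `rendered` string is a hypothetical example of what the wiki renderer might emit, and `strip_html` uses Python's standard `html.parser` as a stand-in for the Elasticsearch HTML Strip charfilter, so it does not exercise the real index.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags, roughly like an HTML-strip step."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(html):
    """Return the text content of `html` with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def tokens(text):
    """Lowercased alphanumeric tokens, a crude stand-in for an analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical rendered HTML for the wiki source:
#   == Header ==
#   [[SomePage|some link text]]<br>more text
rendered = (
    '<h2>Header</h2>'
    '<p><a href="/kb/SomePage">some link text</a><br>more text</p>'
)

indexed_text = strip_html(rendered)
indexed_tokens = tokens(indexed_text)

# The link target and the tag name should not be searchable,
# and no wiki heading markers should survive into the excerpt.
assert "somepage" not in indexed_tokens
assert "br" not in indexed_tokens
assert "=" not in indexed_text
```

A real QA pass would still need to run the same searches against the live index, since the renderer and analyzer in production may behave differently from this stand-in.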
Summary: Feed Elastic HTML, not wikimarkup → Remove wikimarkup from search excerpts and indexes
Depends on: 720935
Clarifying that this is specific to Elastic Search searches--not Sphinx searches.
Summary: Remove wikimarkup from search excerpts and indexes → Remove wikimarkup from search excerpts and indexes for elastic search
This seems like a duplicate of bug #557451, though that's really old and probably covers whatever search system we had back then.

Should we mark that one as a duplicate of this one and move forward? Or is there something about that one that isn't covered in this one?
Moving to next sprint due to the added dependency on Bug 720935.
Whiteboard: u=user c=search s=2012.2 p=2 → u=user c=search s=2012.3 p=2
Priority: -- → P1
We should probably bump this from the 2012.3 sprint. We could look at landing it after the read/write index split lands, but it's not as high priority as the gazillion other things on our plates right now, so maybe it's better to push it off a bit?
(In reply to Will Kahn-Greene [:willkg] from comment #9)
> so maybe it's better to push it off a bit?

Agreed. Let's come back to this once we are done with blockers and higher priority ES bugs.
Priority: P1 → --
Whiteboard: u=user c=search s=2012.3 p=2 → u=user c=search s= p=2
I'm going to make this block bug #729688 since it'll be a lot easier to get this into production before we ditch Sphinx rather than afterwards.

Having said that, this shouldn't be at the tippy top of the priority list.
Blocks: 729688
Erik isn't around anymore and Q1 is way over, so I'm not sure if this still needs to be done.
Assignee: erik → nobody
Target Milestone: 2012Q1 → Future
My vote is we keep it around. It's definitely something lame and it probably affects the results scoring, but I think there's still a decent amount of low-hanging fruit that'd affect things more.
This looks like it's fixed? I don't see HTML in the results anymore. Please re-open if it's still an issue.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED