Closed Bug 738375 Opened 14 years ago Closed 14 years ago

elasticsearch search for "yahoo search address bar" brings up most of the kb

Categories

(support.mozilla.org :: Search, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

(Whiteboard: u=user c=search p=)

If you do a search with elasticsearch for "yahoo search address bar", you get most of the kb. If you do the same search with sphinx, you get two kb articles. Making this a blocker because this really messes up phrase queries for users. My first blush guess is that our elasticsearch code is doing a boolean OR between the terms where our sphinx code did an AND. Putting this in the 2012.7 sprint because there's no way I'm getting to it this sprint.
Whoops--meant to grab it.
Assignee: nobody → willkg
Kadir, James, Ricky, Michael and I just talked about this. We probably want to solve this by switching to a unified search so we're not showing the entire kb bucket before the support forums (aka questions) bucket. However, that's a non-trivial project. So as a stopgap, we want to play with this: if we're showing results from more than one bucket and one of the buckets is the kb, then limit the kb result set to 5 articles maximum. Further, we want to put that behind a waffle flag so we can test it to figure out whether it affects CTR adversely. (Did I summarize our conversation right?)
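For illustration only, the stopgap rule described above could be sketched roughly like this. The function name, the bucket names, and the `flag_active` argument (standing in for a waffle check) are all hypothetical; this is not the actual kitsune code.

```python
def cap_kb_results(buckets, kb_limit=5, flag_active=True):
    """Return a copy of `buckets` with the kb bucket capped when appropriate.

    `buckets` maps a bucket name (e.g. "kb", "questions") to a list of
    results. `flag_active` stands in for a feature-flag check; both the
    names and the data shape are assumptions made for this sketch.
    """
    capped = {name: list(results) for name, results in buckets.items()}
    non_empty = [name for name, results in capped.items() if results]

    # Only cap when more than one bucket has results and kb is one of them.
    if flag_active and len(non_empty) > 1 and "kb" in non_empty:
        capped["kb"] = capped["kb"][:kb_limit]
    return capped
```

With the flag off, or when the kb is the only bucket with results, the kb list comes back untouched, which is what would let the capped and uncapped behavior be compared on CTR.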
This bug is actually about the fact that it's doing an "or" search instead of an "and" search, I think. What are we doing about that?
(In reply to [:Cww] from comment #3)
> This bug is actually about the fact that it's doing an "or" search instead
> of an "and" search, I think. What are we doing about that?
We know from old tests with Sphinx that AND searches tend to overly restrict results and not help. In Sphinx, we were using the most sophisticated match mode. We haven't tuned matching in ES at all yet--I think we're still using a naive OR. But naive AND almost certainly isn't the solution. Of course we could A/B test that and look at CTR. The goal should be to get the most relevant results on the first page, not worry about how many KB articles are matching.
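To make the "naive OR" versus "naive AND" distinction concrete, here is a toy sketch over a made-up four-document corpus. It only does whole-word matching and says nothing about how ES or Sphinx actually match or rank; the documents and the `match` helper are invented for the example.

```python
def match(docs, query, operator="or"):
    """Return ids of docs containing any ("or") or all ("and") query terms."""
    terms = query.lower().split()
    combine = any if operator == "or" else all
    hits = []
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        if combine(term in tokens for term in terms):
            hits.append(doc_id)
    return hits


# Hypothetical mini knowledge base, not real kb articles.
docs = {
    1: "set yahoo as the search engine in the address bar",
    2: "clear the address bar history",
    3: "change the default search engine",
    4: "bookmarks toolbar is missing",
}

or_hits = match(docs, "yahoo search address bar", "or")
and_hits = match(docs, "yahoo search address bar", "and")
```

In this toy corpus the OR match returns three of the four documents while the AND match returns one, which is the kind of gap the thread is arguing about. Whether either extreme gives better first-page results is a separate question that the toy does not answer.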
(In reply to [:Cww] from comment #3)
> This bug is actually about the fact that it's doing an "or" search instead
> of an "and" search, I think. What are we doing about that?
Well, actually no. This bug is about how searching for a phrase brings up a large portion of the kb. My theory about what's going on is that it's an "or" vs. an "and", but I haven't verified that yet. But let's assume the theory is correct-ish for some loose version of "correct". James and I talked about this and it's not clear that the ES behavior is worse than the Sphinx behavior for the first couple of pages of search results, and maybe it's one of the reasons we're getting a higher CTR. So my comment 2 is about alleviating the big problem that this creates (i.e. it brings up a large portion of the kb, which pushes the support forum (aka questions) results to a distant double-digit page that no one will ever get to) rather than changing the general behavior.
Will, your comment in 2 is what we talked about indeed.
Target Milestone: 2012.7 → 2012.6
Why did this get put in the 2012.6 milestone? It should probably be 2012.7 or 2012.8.
This bug is forcing us to document even smaller/short-lived things in the KB because users aren't going to find the forum threads where the issues are solved. We should make fixing this a high priority (i.e. can we get to it this week?). (PS, personally, I think the better fix is to use "AND" searches from search boxes and leave "OR" searches for the AAQ form. That matches user expectations that adding more terms narrows a search rather than broadening it.)
First off, I never should have done a first blush guess. That first blush guess caused us to fixate on the wrong thing (AND vs. OR) and waste a lot of time talking about it. I'm really sorry about that.
Second, this has nothing to do with AND vs. OR. I spent some time looking into elastic search query types last week after we had discussed the stop-gap solution which I have icky feelings about.
Third, regarding priority, I do think this is something that needs to get fixed sooner rather than later. That's why it's a P1. That's why I wrote what I wrote in previous comments. The bookkeeping of which sprint it goes in is just bookkeeping. I thought we were done with the 2012.6 sprint and that this week was a limbo week between sprints. Ergo I thought things that were in progress shouldn't be put in the 2012.6 sprint any more. Ricky updated me on what's going on sprint-wise just now. As a semi-related side note, when fiddling with the bookkeeping, it'd be really nice to add a comment as to why you're changing things. It's better for project archaeology and it keeps everyone cc:d on the bug in the loop.
Fourth, elasticutils (the library we use to interface with elastic search) currently uses text queries for the search box. After spending time on this last week, I now believe that's a mistake: we should be using query_string, which amongst other things also supports "phrase syntax". I don't really want to get into gory details. If you want to read about them, the links are here:
http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html
http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html
I need another couple of days to make sure this will solve the issues and to implement query_string support in elasticutils. I can spend a day to do the stop-gap solution if you want, but I think:
a) it was a stop-gap designed to use for now until we unified the search buckets, which is a big project and wouldn't happen for a month.
b) the query_string solution is promising enough to push doing a stop-gap off until I discover that the query_string idea is also not going to fix this issue.
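As a rough sketch of the difference between the two query types linked above, here are the request bodies built as plain Python dicts. This follows my reading of the query DSL docs of that era and is not elasticutils code; the field name "content" and both helper functions are hypothetical.

```python
import json


def text_query(field, user_input):
    """Body for a `text` query: the input is analyzed and matched as terms."""
    return {"query": {"text": {field: user_input}}}


def query_string_query(field, user_input):
    """Body for a `query_string` query: the input is parsed with the query
    syntax, so a double-quoted span is treated as a phrase."""
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": user_input,
            }
        }
    }


# "content" is an assumed field name used only for this example.
as_text = text_query("content", "yahoo search address bar")
as_phrase = query_string_query("content", '"yahoo search address bar"')

print(json.dumps(as_text, indent=2))
print(json.dumps(as_phrase, indent=2))
```

The point of the sketch is only that query_string passes the user's quoting through to the query parser, so a quoted search can be handled as a phrase rather than as four independent terms.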
Target Milestone: 2012.6 → 2012.7
Whiteboard: u=user c=search p=
So, the specific problem here, "elasticsearch search for 'yahoo search address bar' brings up most of the kb", is gone now that we're capping the kb results at 10 per bug #742437. Therefore, I'm going to mark this as FIXED.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: 2012.7 → ---