Return only search results that contain all keywords in the user's search query

Status

RESOLVED INCOMPLETE
3 years ago
3 years ago

People

(Reporter: jsavage, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
Longer search queries mean that the user wants narrower search results. We'd like to return only search results that match all terms in the user's query (except for stop words).
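The request here amounts to a boolean AND over the query's non-stop-word terms. The following is a minimal Python sketch of that matching rule. The regex tokenizer, the stop-word list, and the sample documents are hypothetical and do not come from SUMO or its Elasticsearch configuration.

```python
import re

# Hypothetical stop-word list for illustration only; a real search engine
# would apply its own analyzer's stop-word filter.
STOP_WORDS = {"a", "an", "and", "for", "how", "in", "of", "on", "or", "the", "to"}


def terms(text: str) -> set[str]:
    """Lowercase the text, split it into words, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def match_all(query: str, documents: list[str]) -> list[str]:
    """Keep only documents that contain every non-stop-word term of the query."""
    required = terms(query)
    return [doc for doc in documents if required <= terms(doc)]


# Hypothetical article texts, not real SUMO content.
docs = [
    "Delete history and clear the cache",
    "Uninstall Firefox",
    "Clear the cache",
]
print(match_all("delete history and cache", docs))
# ['Delete history and clear the cache']
```

In Elasticsearch itself, this rule would be expressed in the query and the index's analyzer rather than by filtering in application code. The sketch only illustrates which documents the rule keeps.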

Example:
Type "delete history and cache" in site search.

Expected result: See "delete browsing, search and download history on Firefox", or "Forget button..." at the top of the search results page.   

Actual result: We're seeing the search page cluttered with articles that contain either "delete", "history" or "cache" (and their synonyms) at the top of the results. The "uninstall..." articles outrank the relevant articles.

Trello user story: https://trello.com/c/PCew5doQ

I think this is actually an issue with the synonym system. The synonyms are weighted just as heavily as the original words in the search term, and the word "uninstall" apparently carries more weight when searching our KB, so those articles come up. Removing the synonyms would probably help this search.

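The effect described in this comment can be illustrated with a toy scorer. Here a document that matches a query word only through a synonym earns the same credit as one that matches the word itself, unless synonym matches are down-weighted. The synonym table, the documents, and the `synonym_weight` parameter are all hypothetical. Real engines also weight terms by rarity (for example BM25), which this sketch ignores and which is one plausible way a synonym could end up outranking the original word.

```python
# Hypothetical synonym table for illustration; the thread only says that
# synonym expansion pulled "uninstall..." articles into the results.
SYNONYMS = {"delete": {"remove", "uninstall"}}


def score(query_terms: list[str], doc_terms: set[str], synonym_weight: float = 1.0) -> float:
    """Count query terms found in the document; a synonym-only match
    contributes synonym_weight instead of 1.0."""
    total = 0.0
    for term in query_terms:
        if term in doc_terms:
            total += 1.0
        elif SYNONYMS.get(term, set()) & doc_terms:
            total += synonym_weight
    return total


query = ["delete", "history"]
exact = {"delete", "history"}            # matches the original word
synonym_only = {"uninstall", "history"}  # matches only via the synonym

print(score(query, exact), score(query, synonym_only))            # 2.0 2.0
print(score(query, exact, 0.5), score(query, synonym_only, 0.5))  # 2.0 1.5
```

With equal weighting, the two documents tie. Lowering the synonym weight is the softer alternative to removing the synonym entirely.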
Mike, isn't this mainly a case of Sumo using boolean OR, whereas Google uses boolean AND?

Add more keywords to a support search and the number of hits grows, unless it is already maxed out at an unusable 1000 hits.

Add more keywords to a Google search and you get more precise hits, and fewer of them.
Even new or naive Firefox users are probably familiar with Google and have learned that more words make better searches.
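The OR-versus-AND contrast can be shown on a toy index. Under OR, each added keyword can only add hits. Under AND, each added keyword can only remove them. The article names and term sets here are hypothetical. For context, Elasticsearch's `match` query defaults to `operator: or` and also accepts `and`, but the thread doesn't establish which setting SUMO's query used.

```python
# Hypothetical toy index: each article reduced to its set of terms.
ARTICLES = {
    "delete-history": {"delete", "browsing", "history", "cache"},
    "clear-cache": {"clear", "cache"},
    "uninstall": {"uninstall", "delete", "firefox"},
    "history-sidebar": {"history", "sidebar"},
    "bookmarks": {"bookmarks", "firefox"},
}


def or_hits(query: list[str]) -> list[str]:
    """Boolean OR: an article matches if it contains ANY query term."""
    q = set(query)
    return sorted(name for name, words in ARTICLES.items() if q & words)


def and_hits(query: list[str]) -> list[str]:
    """Boolean AND: an article matches only if it contains EVERY query term."""
    q = set(query)
    return sorted(name for name, words in ARTICLES.items() if q <= words)


for query in (["delete"], ["delete", "history"], ["delete", "history", "cache"]):
    print(query, len(or_hits(query)), len(and_hits(query)))
# ['delete'] 2 2
# ['delete', 'history'] 3 1
# ['delete', 'history', 'cache'] 4 1
```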

I know Joni has given a very specific example, but I am not sure Joni is trying to improve only that specific search (NI Joni). I am interested in other searches of the KB and of support questions, and I see little point in using OR.

I know I often view search attempts from a contributor's point of view, but it seems likely that ordinary users have similar problems because of the use of OR.

To my mind, the problem tends to be too many unhelpful hits.

Take another example: try looking for anything related to bookmarks. The single word alone maxes out at 1000 hits. (IIRC we have about 100 KB articles, a good proportion of them with the word in the title, and a policy leaning towards writing more, shorter articles, so they could proliferate!)
Flags: needinfo?(jsavage)

We've talked about adjusting the way search queries work several times over the last few years. Every time, people start suggesting technical solutions without understanding Elasticsearch, which makes it really difficult for the discussion to go anywhere useful.

Whether a query returns 1000 hits or not is irrelevant, since what we're looking at is the results on the first page. If the results you're looking for aren't on the first page, then we should focus on what queries users are running and what results they're getting, and home in on that.

Also, "users" and "contributors" are two completely different groups with two completely different sets of needs. We shouldn't be mixing them.

In previous iterations of this problem, I was told we didn't have a good idea of what users are searching for. We had a list of about 100,000 items, but we couldn't normalize it and we weren't able to do any qualitative analysis of the nature of the queries. Thus we weren't able to proceed.
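Normalizing a raw query list can be a first, purely mechanical step before any qualitative analysis. It collapses trivial variants (case, punctuation, extra spaces) so that identical intents are counted together. The following sketch assumes only that simple normalization. The sample log is hypothetical, and the sketch does nothing about synonyms, misspellings, or grouping queries by intent.

```python
import re
from collections import Counter


def normalize(query: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", query.lower())
    return " ".join(cleaned.split())


def top_queries(raw_queries: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Group raw queries by their normalized form and return the n most common."""
    return Counter(normalize(q) for q in raw_queries).most_common(n)


# Hypothetical raw log entries, not real SUMO data.
log = ["Delete History", "delete  history!", "Clear cache", "delete history?"]
print(top_queries(log))
# [('delete history', 3), ('clear cache', 1)]
```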

I've been working on Input for a while, so I don't know what's happened in the last year.

I'm inferring from the bug description that this has changed, and that we've done some research and concluded that users search with phrases, expecting all the words in the query to match in that order. Previously, when we originally implemented this, we thought users were searching for individual terms.

Is that correct?
(Reporter)

Comment 4

3 years ago
Thanks, Mike. Removing the synonym for that particular search worked. As John99 noted, I was also interested in improving searches (particularly the top ones) in general, but willkg's got a good point about suggesting technical solutions.

Willkg, we did determine that search phrases collectively outnumber individual terms. I don't know about the particular word order, but the user intent doesn't really change that much. Google Analytics hasn't been picking up our searches for a while, but I've been using the Google AdWords keyword tool to figure out the top searches, which are pretty much the same as the top searches recorded in Analytics a year ago (https://docs.google.com/spreadsheets/d/1-zG0c4UWoYZMa4g2Fvh0FzCUNw7vaBf7pcS3Q78XOls/edit#gid=0).

I haven't worked with Elasticsearch before, so I'll avoid filing technical requests and start by collecting examples of poor first-page results. Then we'll see which solutions would work with Elasticsearch.

John99, would you like to help me gather specific first-page examples? I've set up a spreadsheet here: https://docs.google.com/spreadsheets/d/1FlsZHSbU2a-DgL8FhasgcdlbJULWiCaPXzVnGfbVVyc/edit#gid=1294459069

Closing this bug for now until we can provide examples.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(jsavage)
Resolution: --- → INCOMPLETE