Closed Bug 443944 Opened 16 years ago Closed 15 years ago

Forum search seems to search terms as OR not AND

Categories

(support.mozilla.org :: General, defect, P1)

x86
Windows Vista
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: Kensie, Unassigned)

References

Details

When I search the forums, I get a lot of results that don't contain both of my search terms.

Searching on "options" turns up over 452 pages of results.

Searching on "flash options" turns up over 3012 pages of results.
Definitely wanted for 0.7
Target Milestone: --- → 0.7
Setting P1 on bugs that are "must haves" for 0.7. This bug should be targeted to 0.6.3 or 0.6.4, or kept in 0.7 which will be the last milestone for Q3.
Priority: -- → P1
It's neither OR nor AND. The search is a full-text search, which means results are returned in order of relevance: you generally get results that contain the whole phrase first, followed by results that contain fewer of the words.

The alternative here is to switch to BOOLEAN MODE searches. This would allow returning only results that contain all of the words. However, Boolean mode does *not* allow sorting of results by relevance, so you would not receive results in such a useful order. I believe this is a worse outcome, so I think we should call this WONTFIX. a=David?
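For reference, the two MySQL query forms being compared can be sketched roughly as follows. The table name `forum_posts` and the columns `title` and `body` are hypothetical, since the actual schema is not shown in this bug; the `MATCH ... AGAINST` syntax and the `+` operator are standard MySQL full-text features. This is a sketch under those assumptions, not the site's actual query code.

```python
# Sketch only: forum_posts, title and body are hypothetical names,
# not the real support.mozilla.org schema.

def to_boolean_query(terms: str) -> str:
    """Prefix every word with '+', which Boolean mode reads as 'must be present'."""
    return " ".join("+" + word for word in terms.split())

# Natural-language mode: rows matching any of the words are returned,
# sorted by MySQL's internal relevance score.
NATURAL_SQL = (
    "SELECT id FROM forum_posts "
    "WHERE MATCH (title, body) AGAINST (%s)"
)

# Boolean mode: with '+' on each word, only rows containing all words match.
BOOLEAN_SQL = (
    "SELECT id FROM forum_posts "
    "WHERE MATCH (title, body) AGAINST (%s IN BOOLEAN MODE)"
)

def build_boolean_search(terms: str):
    """Return (sql, params) suitable for a DB-API cursor.execute() call."""
    return BOOLEAN_SQL, (to_boolean_query(terms),)
```

With this helper, a search for "flash options" would be sent to MySQL as `+flash +options`. The sketch does not escape operators that a user might type into the search box.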
This assumes the current results are actually more useful. What's the math that goes into relevance? I think there's a huge loss in usability in returning too many results, especially if only 1 or 2 have both terms and the rest match just one or the other.

Would the Boolean mode sort results by date? I think this, combined with the fact that the results would contain all of the terms searched for, would make the results much easier for people to use. In my experience the big factors in relevance are that all terms are present and recency, so I don't think this would be a huge loss in those terms.
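To make that proposal concrete, here is a toy in-memory sketch of "require all terms, newest first". The `Post` type and the sample posts are invented for illustration; this is not how the forum search is implemented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Post:
    # Hypothetical stand-in for a forum post.
    title: str
    body: str
    posted: date

def search_all_terms(posts, query):
    """Keep only posts containing every query word, newest first."""
    words = query.lower().split()
    hits = []
    for p in posts:
        text_words = set((p.title + " " + p.body).lower().split())
        if all(w in text_words for w in words):
            hits.append(p)
    return sorted(hits, key=lambda p: p.posted, reverse=True)

# Invented sample data.
posts = [
    Post("Flash crashes", "flash plugin keeps crashing", date(2008, 6, 1)),
    Post("Where are options", "cannot find the options menu", date(2008, 6, 20)),
    Post("Flash options", "how do I change flash options", date(2008, 5, 10)),
    Post("Flash settings", "the options for flash are missing", date(2008, 7, 2)),
]

results = search_all_terms(posts, "flash options")
for p in results:
    print(p.posted, p.title)
```

Only the two posts that mention both "flash" and "options" are returned, with the more recent one listed first; the posts that mention just one of the terms are dropped.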
Yeah it might be interesting to try the boolean search. The current search is really broken, with 35k+ search results on simple terms. :(

How is search in the forum different from the KB? Or are we having similar problems there?
The relevance calculation is internal to MySQL but is based on "number of words in the index, the number of unique words in that row, the total number of words in both the index and the result, as well as the weight of the word".  Weight is based on frequency of the word across the document collection.

In addition, all fulltext searches exclude words with <=3 characters and exclude all stopwords.
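As a rough illustration of that filtering, the sketch below drops short words and stopwords from a query before it would be matched. The stopword set here is a tiny sample chosen for the example, not MySQL's built-in list, and the function name is hypothetical.

```python
# Illustrative only: a tiny sample stopword set, not MySQL's built-in list.
SAMPLE_STOPWORDS = {"about", "would", "there", "with"}
MIN_WORD_LEN = 4  # words of 3 characters or fewer are dropped

def indexable_terms(query: str):
    """Return the query words that survive the length and stopword filters."""
    kept = []
    for word in query.lower().split():
        if len(word) < MIN_WORD_LEN:
            continue  # too short to be indexed
        if word in SAMPLE_STOPWORDS:
            continue  # ignored as a stopword
        kept.append(word)
    return kept
```

Under these rules, "launch slow vista" keeps all three words, while a query such as "problem with Firefox on XP" is reduced to "problem" and "firefox", so the "XP" part has no effect on the results.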

It's the same search in both. I'm not sure why returning a lot of results in a useful order is bad - it works for Google. Perhaps we can put the alternative up somewhere for testing so you can try it out. I'm loath to change it in production until you're happier with the alternative, because I suspect you won't be.
(In reply to comment #6)
> The relevance calculation is internal to MySQL but is based on "number of words
> in the index, the number of unique words in that row, the total number of words
> in both the index and the result, as well as the weight of the word".  Weight
> is based on frequency of the word across the document collection.

Is this confined to each post (is that why we get multiple results per thread?), or is it per thread?


> It's the same search in both.  I'm not sure why returning a lot of results in a
> useful order is bad - it works for Google.

Except it also fails for Google when you start getting search results from pages that cover multiple topics, which forum threads can do. The thing is, I don't think the order is useful. I'll have to come back with some examples next time I try searching the forum.
Here's a specific example:

Just had a user on IRC who had problems with Firefox launching slow on Vista, but only when logged in with a specific user account in Vista (logging into another account it worked fine).

Searching the forums for "launch slow" returned 2600 results.  Not seeing anything that sounded remotely like this user's problem in the top page of search results, I put in "launch slow vista" and got 6047 results, with the same set of posts in the top page that still didn't sound like her problem (and many of them specifically mentioned XP and didn't mention Vista).
Would it be possible to switch to Boolean search on the staging server and see if this actually produces a better result? That way we can at least rule that option out and focus solely on replacing the search engine altogether. Or, if it produces a better result, we can switch to Boolean search while transitioning.

Dave's example seems to show that the relevance calculation isn't very effective.
Target Milestone: 0.7 → 0.8
Target Milestone: 0.8 → 0.9
Should we consider maybe using Google custom search?
New search should fix this.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Depends on: 405028