Created attachment 660832
Searching for a good search term plus a junk search term should return the same results as searching for only the good search term.
Steps to reproduce:
1. Go to https://support.mozilla.org/en-US/search
2. Search for "firefox" and look at the first result.
3. On the same page, after the existing "firefox" string, type a random string of chars (in the test we use "werpadfjka") and click the "Search Mozilla Support" button.
The first result from step 2 is different from the first result from step 3.
The first result in both cases should be the same.
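The expectation in the steps above can be sketched as a small check. This is only an illustration: `search` is a hypothetical callable that returns an ordered list of result titles (the real test drives the site through Selenium), and `first_result_matches` plus the two toy backends are names made up here.

```python
from typing import Callable, List

# Hypothetical stand-in: any callable mapping a query string to an
# ordered list of result titles. Not part of the real test suite.
SearchFn = Callable[[str], List[str]]


def first_result_matches(search: SearchFn, good: str, junk: str) -> bool:
    """Return True if the top hit for `good` is also the top hit for `good junk`."""
    good_only = search(good)
    combined = search(f"{good} {junk}")
    if not good_only or not combined:
        # No results on either side means there is nothing to compare.
        return False
    return good_only[0] == combined[0]


def stable_backend(query: str) -> List[str]:
    # Toy backend whose ordering ignores any extra terms.
    return ["Article A", "Article B"]


def reordering_backend(query: str) -> List[str]:
    # Toy backend that returns a different order for multi-word queries.
    results = ["Article A", "Article B"]
    return results if len(query.split()) == 1 else results[::-1]
```

With the toy backends, `first_result_matches(stable_backend, "firefox", "werpadfjka")` holds, while the same call against `reordering_backend` does not, which is the situation this bug describes.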
Here is the stack trace from the Selenium test in Jenkins: http://qa-selenium.mv.mozilla.com:8080/job/sumo.prod/1814/testReport/tests.desktop.test_search/TestSearch/test_search_returns_either_term/
Why do you expect two different searches to show the same results in the same order? Where is this expectation coming from?
This test was provided by a contributor.
I can see where he was coming from with it, but I'm also happy to see it refactored or removed to match the requirements of the search algorithm.
I'm not sure this is a valid expectation, but I'll look into it.
So... I spent far too long looking into this because it started out innocent enough and then turned into a rabbit hole into a parallel universe where things are weird.
I say "far too long" because while it's weird, I'm still not sure I think it's a valid expectation.
Anyhow, I'm pretty sure it's a bug in ElasticSearch. Long story short is that if you search for a single word like "firefox", then the boosts get applied. If you search for more than one word like "firefox werpadfjka", then the boosts don't get applied and thus the results get scored differently and show up in a different order.
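To make the suspicion in this comment concrete, here is a toy sketch of how applying or skipping field boosts can flip the order of two results. It is not ElasticSearch's scoring; the documents, per-field scores, and boost values are all invented for illustration.

```python
from typing import Dict, List, Optional

# Made-up per-field match scores for two documents; not real ES output.
DOCS: Dict[str, Dict[str, float]] = {
    "doc_a": {"title": 1.0, "content": 2.0},
    "doc_b": {"title": 2.0, "content": 0.5},
}

# Hypothetical field boosts: title matches count four times as much.
BOOSTS: Dict[str, float] = {"title": 4.0, "content": 1.0}


def rank(docs: Dict[str, Dict[str, float]],
         boosts: Optional[Dict[str, float]] = None) -> List[str]:
    """Order document ids by the sum of their (optionally boosted) field scores."""
    def score(fields: Dict[str, float]) -> float:
        return sum(
            value * (boosts.get(name, 1.0) if boosts else 1.0)
            for name, value in fields.items()
        )
    return sorted(docs, key=lambda doc_id: score(docs[doc_id]), reverse=True)


boosted = rank(DOCS, BOOSTS)   # doc_b scores 8.5, doc_a scores 6.0
unboosted = rank(DOCS)         # doc_a scores 3.0, doc_b scores 2.5
```

In this toy model the top result changes from `doc_b` to `doc_a` once the boosts are dropped, which is the kind of reordering described above.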
I opened up a bug in ES for it:
However, assuming they agree it's a bug and fix it, that'd be in 0.19.10 at the earliest and we won't have access to that for a long time. So I don't think we can fix this.
I finally got a reply to my issue. The part that was tripping me up is that the one-word and two-word queries get translated into different Lucene queries, and the second one absorbs the boost into the score, so it doesn't show up in the explain text (lame).
Ergo the boosts are getting applied in all cases. See the github issue for the details.
What that means is that the expectation that doing a search for "firefox" and "firefox nonsense" should produce the same results in the same order isn't going to hold true, for non-trivial reasons. Additionally, I don't think this behavior impacts users adversely. Given that, I'm going to close this out with a WONTFIX.