Closed Bug 1197251 Opened 9 years ago Closed 9 years ago

[Super Search] Inconsistent results when using several versions including one ending in -b

Categories

(Socorro Graveyard :: Middleware, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: adrian, Assigned: adrian)

References

Details

An example from Peter using the API: 

> >>> api.get(product='Firefox', version=[u'43.0a1', u'42.0a2'])['total']
> 355
> >>> api.get(product='Firefox', version=[u'43.0a1', u'42.0a2', u'41.0b'])['total']
> 0
Blocks: 1196867
What are the chances that you can find some time in the last couple of days of the quarter to work on this?
After all, this bug blocks my one and only Q deliverable.
Priority: -- → P1
Good news: I figured out how to solve this.
Less-good news: it's not trivial.

There are several reasons for this bug.

Reason 1
--------

SuperSearch mixes filters in a weird way. For each filter name, if that filter's data type is "enum", it combines the values with an "OR". For every other data type, it combines them with an "AND". And then between different filter names, it uses an "AND".

So something like this: 
> version=1.0 & version=2.0 & product=Firefox

translates to:
> ( (version = 1.0) OR (version = 2.0) ) AND (product = Firefox)

I now believe that it should always use "OR" between values of the same filter name, and never "AND". That would be a lot easier to understand, and it makes more sense in my opinion.
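To make the proposed rule concrete, here is a minimal sketch in plain Python. It is not the Socorro code: `combine_filters` and the nested "and" / "or" structure are hypothetical names made up for illustration, and only show "OR" between values of one name and "AND" between names.

```python
def combine_filters(params):
    """Group (name, value) pairs: OR within a name, AND between names.

    Hypothetical helper for illustration only, not the SuperSearch code.
    """
    grouped = {}
    for name, value in params:
        # Values sharing a filter name end up in the same OR group.
        grouped.setdefault(name, []).append((name, value))

    # Each name contributes one OR group; the groups are ANDed together.
    return {"and": [{"or": pairs} for pairs in grouped.values()]}


expression = combine_filters([
    ("version", "1.0"),
    ("version", "2.0"),
    ("product", "Firefox"),
])
print(expression)
```

With the parameters from the example above, the version values land in one "or" group and the product value in another, and both groups sit under a single "and".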

Reason 2
--------

The advanced string operators (contains, starts with, ends with) rely on an Elasticsearch feature called "wildcards". In the query DSL, a wildcard can only be set in a "query", as opposed to a "filter", and queries and filters do not mix: the query is executed first, and then the filters are applied to the results of that query. That means that if a single filter has both a "simple" rule and an advanced one (for example, "has terms" and "starts with"), what gets executed is effectively an "AND". So if we want to keep the same behavior as above, we need to be able to use an "OR" there. The good news is that recent versions of Elasticsearch have a "query" filter, which can contain anything a query can and still be used as a filter, alongside other filters.

There is some code to rewrite here, but the result will be simpler, since there will be no more query / filter distinction.
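As a rough sketch of what that could look like, the function below builds a filter body as a plain Python dict, wrapping each wildcard in a "query" filter so it can sit next to a "terms" filter inside a boolean "should" (that is, an "OR"). The helper name `build_or_filter` is hypothetical, and the dict shape is my assumption of the filter DSL of the Elasticsearch versions of that time; it should be checked against the version actually deployed, and it is not the code from the eventual patch.

```python
import json


def build_or_filter(field, exact_values, prefixes):
    """Return a filter dict that ORs exact terms with "starts with" rules.

    Hypothetical helper; the DSL shape is an assumption, see the note above.
    """
    should = []
    if exact_values:
        # Simple rule: the field has one of these exact terms.
        should.append({"terms": {field: list(exact_values)}})
    for prefix in prefixes:
        # Advanced rule: a wildcard query wrapped in a "query" filter,
        # so that it can be combined with the other filters.
        should.append({"query": {"wildcard": {field: prefix + "*"}}})
    return {"bool": {"should": should}}


body = build_or_filter("version", ["1.0", "2.0"], ["3b"])
print(json.dumps(body, indent=2))
```

Because everything is a filter here, the simple and advanced rules end up as siblings in one list instead of being split between a query and a filter.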

Reason 3
--------

Here is how the previous points relate to this bug: when the version filter is passed a value ending in 'b', that value gets removed from the list of versions (which uses a simple "has terms" operator) and is replaced by a "starts with" operator (an advanced one, requiring a wildcard). We thus end up with this conversion:

> version=1.0 & version=2.0 & version=3b

becomes
> ( (version = 1.0) OR (version = 2.0) ) AND (version = 3b*)

which returns no results, of course. 
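The effect is easy to reproduce without Elasticsearch. The sketch below is a plain-Python simulation over a made-up list of version strings (the data and both function names are hypothetical); it compares the current "AND" combination with the "OR" combination proposed above.

```python
def matches_with_and(version, exact_values, prefixes):
    """Current behavior: exact terms AND every "starts with" rule."""
    in_terms = version in exact_values
    has_prefixes = all(version.startswith(p) for p in prefixes)
    return in_terms and has_prefixes


def matches_with_or(version, exact_values, prefixes):
    """Proposed behavior: exact terms OR any "starts with" rule."""
    return version in exact_values or any(
        version.startswith(p) for p in prefixes
    )


# Made-up sample data, for illustration only.
versions = ["1.0", "2.0", "3b1", "3b2", "4.0"]
exact_values = {"1.0", "2.0"}
prefixes = ["3b"]

with_and = [v for v in versions if matches_with_and(v, exact_values, prefixes)]
with_or = [v for v in versions if matches_with_or(v, exact_values, prefixes)]

print(with_and)  # []
print(with_or)   # ['1.0', '2.0', '3b1', '3b2']
```

No version can be exactly 1.0 or 2.0 and also start with 3b, so the "AND" form can never match anything.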


Now, I've written the theory down. All that's left is to code it! :)
Status: NEW → ASSIGNED
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/6a223e190699e66327ffff14a9d878c8d6c9a597
Fixes bug 1197251 - Rewrote SuperSearch filters combination. r=peterbe
Replaced the wildcard and match_phrase queries with filter queries, allowing them to be combined with other filters in boolean filters.
Values of a filter now combine with "OR" instead of "AND", except for ranges.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Socorro → Socorro Graveyard