Closed
Bug 1197251
Opened 9 years ago
Closed 9 years ago
[Super Search] Inconsistent results when using several versions including one ending in -b
Categories
(Socorro Graveyard :: Middleware, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: adrian, Assigned: adrian)
An example from Peter using the API:
> >>> api.get(product='Firefox', version=[u'43.0a1', u'42.0a2'])['total']
> 355
> >>> api.get(product='Firefox', version=[u'43.0a1', u'42.0a2', u'41.0b'])['total']
> 0
Comment 1•9 years ago
What are the chances that you can find some time in the last couple of days of the quarter to work on this? After all, this bug is blocking my one and only quarterly deliverable.
Assignee
Updated•9 years ago
Priority: -- → P1
Assignee
Comment 2•9 years ago
Good news: I figured out how to solve this. Less-good news: it's not trivial. There are several reasons behind this bug.

Reason 1
--------

SuperSearch mixes filters in weird ways. For each filter name, if that filter's data type is "enum", it combines the values with an "OR". For every other data type, it uses an "AND". Then, between filter names, it uses an "AND". So something like this:

> version=1.0 & version=2.0 & product=Firefox

translates to:

> ( (version = 1.0) OR (version = 2.0) ) AND (product = Firefox)

I now believe that it should always use "OR" between filters with the same name, and never use "AND". That would be a lot easier to understand, and makes more sense in my opinion.

Reason 2
--------

The advanced string operators (contains, starts with, ends with) use a feature of Elasticsearch called "wildcards". That feature can only be set in a "query", as opposed to a "filter", in the query DSL, and queries and filters do not mix. The query is executed first, and then the filters are applied to the results of that query. That means that if a single filter has a "simple" rule and an advanced one (for example, "has terms" and "starts with"), what gets executed is effectively an "AND". So if we want to keep the same behavior as above, we need to be able to use an "OR" there.

The good thing is, recent versions of Elasticsearch have a "query" filter, which can contain anything a query can and still be used as a filter, alongside other filters. There is a bit of code to rewrite here, but it will make the code simpler, since there will be no more query / filter distinction.

Reason 3
--------

The relationship between my previous points and this bug is that when the version filter is passed a value ending in 'b', that value gets removed from the list of versions (which is a simple "has terms" operator) and is replaced by a "starts with" operator (an advanced one, requiring a wildcard). We thus end up with this conversion:

> version=1.0 & version=2.0 & version=3b

becomes

> ( (version = 1.0) OR (version = 2.0) ) AND (version = 3b*)

which returns no results, of course.

Now, I've written the theory down. All that's left is to code it! :)
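The conversion described in this comment can be reproduced with a small model in plain Python. This is only an illustrative sketch, not SuperSearch code: the functions `old_match` and `new_match`, the `crashes` list, and the sample version strings are all hypothetical, and "starts with" is modelled with `str.startswith` rather than an Elasticsearch wildcard.

```python
def old_match(version, exact, prefixes):
    """Model of the buggy behaviour: exact values are OR'ed together,
    then AND'ed with every "starts with" rule on the same filter."""
    exact_ok = version in exact if exact else True
    prefix_ok = all(version.startswith(p) for p in prefixes)
    return exact_ok and prefix_ok


def new_match(version, exact, prefixes):
    """Model of the proposed behaviour: all values of one filter are OR'ed."""
    return version in exact or any(version.startswith(p) for p in prefixes)


# Hypothetical crash versions, standing in for indexed documents.
crashes = ["1.0", "1.0", "2.0", "3b1", "3b2", "4.0"]

# version=1.0 & version=2.0 & version=3b, after the 'b' value has been
# turned into a "starts with" rule.
exact, prefixes = ["1.0", "2.0"], ["3b"]

old_total = sum(old_match(v, exact, prefixes) for v in crashes)
new_total = sum(new_match(v, exact, prefixes) for v in crashes)
print(old_total, new_total)
```

In this model no version can both equal "1.0" or "2.0" and start with "3b", so the old combination matches nothing, while OR'ing every value matches the five expected documents.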
Status: NEW → ASSIGNED
Comment 3•9 years ago
Commit pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/6a223e190699e66327ffff14a9d878c8d6c9a597
Fixes bug 1197251 - Rewrote SuperSearch filters combination. r=peterbe

Replaced the wildcard and match_phrase queries with filter queries, allowing them to be combined with other filters in boolean filters. Values of a filter now combine with "OR" instead of "AND", except for ranges.
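As a rough illustration of what "filter queries combined in boolean filters" could look like, here is a sketch that builds an Elasticsearch 1.x-style filter as a plain Python dict. It is not taken from the commit: the function name `build_version_filter` and the bare field name `version` are assumptions made for this example, and the real Socorro field names and code structure may differ.

```python
def build_version_filter(exact, prefixes, field="version"):
    """Build a boolean filter that ORs exact values with "starts with" rules.

    Hypothetical helper: wraps each wildcard query in a "query" filter so it
    can sit next to a plain "terms" filter inside one "bool" filter.
    """
    clauses = []
    if exact:
        # Simple "has terms" rule: one terms filter for all exact values.
        clauses.append({"terms": {field: list(exact)}})
    for prefix in prefixes:
        # Advanced "starts with" rule: a wildcard query used as a filter.
        clauses.append({"query": {"wildcard": {field: prefix + "*"}}})
    # "should" clauses are alternatives, which gives the "OR" behaviour.
    return {"bool": {"should": clauses}}


version_filter = build_version_filter(["1.0", "2.0"], ["3b"])
print(version_filter)
```

The point of the sketch is only the shape: once the wildcard lives inside a filter, it can be one `should` alternative among the others instead of a separate query that is implicitly AND'ed with the filters.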
Updated•9 years ago
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•8 years ago
Product: Socorro → Socorro Graveyard