Closed Bug 921479 Opened 11 years ago Closed 10 years ago

Supersearch should support string searches for the "cpu info" field

Categories

(Socorro :: Webapp, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: adrian)

I wanted to construct a search for

* CPU info starts with "AuthenticAMD family 20 model 1" or "AuthenticAMD family 20 model 2"

But currently the cpu info field in supersearch only supports "is" relations and not string relations. The "is" relation isn't that useful because there are so many variations.
Also I tried to facet on "cpu info" and got some weird results: it appears to be faceting on individual tokens?

1 	stepping 	528 	100.00 %
2 	model 	528 	100.00 %
3 	family 	528 	100.00 %
4 	6 	425 	80.49 %

This is from the new supersearch currently on stage.
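
For illustration, the difference between the requested "starts with" relation and the existing "is" relation can be sketched in plain Python. This is only a sketch of the matching semantics, not Supersearch code: the helper name `starts_with_any` and the sample cpu info values are made up for the example.

```python
def starts_with_any(value, prefixes):
    """Return True if value begins with at least one of the given prefixes."""
    return value.startswith(tuple(prefixes))


# Hypothetical sample values, invented for this example.
cpu_infos = [
    "AuthenticAMD family 20 model 1 stepping 0",
    "AuthenticAMD family 20 model 2 stepping 0",
    "GenuineIntel family 6 model 42 stepping 7",
]

wanted = [
    "AuthenticAMD family 20 model 1",
    "AuthenticAMD family 20 model 2",
]

# "starts with" relation: matches regardless of the trailing stepping value.
prefix_matches = [c for c in cpu_infos if starts_with_any(c, wanted)]

# "is" relation: requires the whole string to be identical.
exact_matches = [c for c in cpu_infos if c in wanted]
```

With these sample values, the prefix search finds both AMD entries while the exact search finds none, because every stored value carries extra text after the model number. Note that a plain prefix such as "... model 1" would also match "... model 10", so a real query might need a trailing space or a stricter pattern.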
I was expecting something like this, but couldn't guess what that list would be. 

Benjamin, do you mind making a list of all the fields that you would like to use string operators on? 

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> Also I tried to facet on "cpu info" and got some weird results: it appears
> to be faceting on individual tokens?

Yes, it does indeed facet on individual tokens, as this is the default behavior of Elasticsearch. Same thing here, I'll need a list of all the fields that would need to be taken as entire strings instead of individual tokens. (Note that those two lists might be identical, but don't have to be.)
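
To make the token behavior concrete, here is a rough Python simulation of the two ways of faceting. It is only an approximation: splitting on whitespace and lowercasing stands in for Elasticsearch's default analysis, and the function names and sample values are invented for the example.

```python
from collections import Counter


def facet_by_token(values):
    """Count, for each token, how many values contain it (token-level facet)."""
    counts = Counter()
    for value in values:
        # Use a set so that a token is counted at most once per value.
        counts.update(set(value.lower().split()))
    return counts


def facet_by_whole_string(values):
    """Count each value as a single, untokenized term."""
    return Counter(values)


# Hypothetical sample values, invented for this example.
cpu_infos = [
    "GenuineIntel family 6 model 42 stepping 7",
    "GenuineIntel family 6 model 23 stepping 10",
    "AuthenticAMD family 20 model 1 stepping 0",
]

token_facets = facet_by_token(cpu_infos)
whole_facets = facet_by_whole_string(cpu_infos)
```

In the token-level result, words such as "stepping", "model" and "family" appear in every value and therefore dominate the facet, which matches the shape of the results reported above. The whole-string facet instead counts each distinct cpu info value once.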
Assignee: nobody → adrian
OS: Linux → All
Priority: -- → P2
Hardware: x86_64 → All
Depends on: 872547, 926874
I would imagine that by default fields should support string operations unless we know something about them. In any case, here's a table of fields and their type:

https://docs.google.com/spreadsheet/ccc?key=0Apbc4eh5_A9wdFFEakY2U1pWaEJZR3VRTjd4WUlsN3c&usp=sharing
I'm not too keen on having fields with string operators by default, because that would imply some overhead in our elasticsearch storage that we might want to avoid.

I started working on bug 872547 and went through all the fields we have in our processed JSON, and I think I have a fairly coherent mapping now. I'm going to go with that, and if you find that something is missing we can definitely improve it again.

Thanks for the spreadsheet, it's going to be very useful!
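
As a rough idea of what a per-field mapping of this kind could look like, here is a sketch written as a Python dict. It is not the mapping that was adopted in bug 872547. It assumes an Elasticsearch version of that era that supports the `multi_field` type and the `"index": "not_analyzed"` setting, and the field name `cpu_info` and sub-field name `full` are hypothetical.

```python
# Hypothetical mapping fragment; names are assumptions, not the real mapping.
cpu_info_mapping = {
    "cpu_info": {
        "type": "multi_field",
        "fields": {
            # Analyzed variant: tokenized, so it facets on individual tokens.
            "cpu_info": {"type": "string"},
            # Untokenized variant: the whole value is indexed as one term,
            # which suits whole-string facets and prefix-style searches.
            "full": {"type": "string", "index": "not_analyzed"},
        },
    }
}


def untokenized_subfields(mapping):
    """List the 'field.subfield' paths that are indexed without analysis."""
    paths = []
    for field, spec in mapping.items():
        for sub, sub_spec in spec.get("fields", {}).items():
            if sub_spec.get("index") == "not_analyzed":
                paths.append(f"{field}.{sub}")
    return paths
```

Indexing a second, untokenized copy of a field takes extra space, which is presumably the kind of storage overhead that makes it undesirable to do this for every field by default.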
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 73