mac_crash_info "contains" search is case-sensitive
Categories: Socorro :: General, defect
Tracking: Not tracked
People: Reporter: smichaud, Assigned: willkg
Details
User Story
"Contains" searches are always case-sensitive. Use "has terms" for case-insensitive searches.
Attachments: 1 file
This may be intended behavior, and not a bug. But it's annoying when searching in a large, free-form field like "mac crash info".
Here's a search on "Graphics hardware" that works:
And one on "graphics hardware" that doesn't:
Comment 1 • 4 years ago (Assignee)
Pretty sure "contains" is doing a wildcard query, so it's doing strict (case-sensitive) string matching.
- https://github.com/mozilla-services/socorro/blob/7a82dcee5f88c25ec03c4e1e3afcbc10fa019772/socorro/external/es/supersearch.py#L326-L337
- https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-wildcard-query.html
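To make the case-sensitivity concrete, here is a minimal Python sketch that approximates a wildcard query against a raw, non-analyzed field value. It assumes the search string is wrapped as `*term*` (an assumption, not taken from the linked Socorro code) and uses the standard library's `fnmatch.fnmatchcase` as a stand-in for Elasticsearch's glob matching. `contains_match` is a hypothetical helper name and the sample value is made up.

```python
from fnmatch import fnmatchcase


def contains_match(field_value: str, search: str) -> bool:
    """Approximate a "contains" search: a case-sensitive glob match of
    `search` anywhere in the raw (non-analyzed) field value."""
    # Assumption: the term is wrapped as *term*, like a wildcard query.
    # fnmatch also treats ? and [...] specially, so this is only an
    # approximation for plain-text search terms.
    pattern = f"*{search}*"
    return fnmatchcase(field_value, pattern)


# Hypothetical mac_crash_info value, for illustration only.
sample = "Graphics hardware encountered an error"

print(contains_match(sample, "Graphics hardware"))  # True
print(contains_match(sample, "graphics hardware"))  # False
```

Nothing in the pattern is lowercased, so only the exact casing stored in the field matches.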
Sorry it's annoying! It'd really help if you were more specific about what kinds of questions you're trying to answer. I thought I had covered everything you were looking for in comment 18, but apparently not?
As an aside, I'm on PTO until mid-July, so I won't be able to make any changes until then. Let me know more about what you're trying to do, what questions you're trying to answer, and how you're trying to use search, and then maybe we can figure out what changes we need to make to facilitate that.
Comment 2 • 4 years ago (Reporter)
It's not a big deal. The other follow-up bugs I've opened for bug 1709658 and bug 1577886 (bug 1713355 and bug 1711956) are much more important.
The reason I find it annoying is that, when the "mac crash info" field is large, I can forget the exact case of what I'm searching on. But I can learn to live with it. And case-sensitive searches might sometimes be necessary.
Would it be possible to have two different "contains" searches, one case-sensitive and one case-insensitive?
Enjoy your PTO! If I notice a bug that's truly urgent, I'll try to find someone else to work on it.
Updated • 3 years ago (Assignee)
Comment 3 • 3 years ago (Assignee)
If you switch to "has terms", then it does a search using the analyzed field, which tokenizes the value and lowercases the tokens.
For example, both of these now work:
"mac_crash_info has terms 'Graphics'":
"mac_crash_info has terms 'graphics'":
"contains" is a strict substring search against the non-analyzed field value, so it will always be case-sensitive.
The docs for this are terrible and I keep forgetting this. I'll fix them now.
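The difference between the two operators can be sketched in Python. This is an approximation under stated assumptions: `analyze` simply splits on runs of word characters and lowercases them, a rough stand-in for whatever Elasticsearch analyzer Socorro actually configures for `mac_crash_info`, and requiring every search token to be present is likewise an assumption. `analyze` and `has_terms_match` are hypothetical helper names and the sample value is made up.

```python
import re


def analyze(text: str) -> list[str]:
    """Rough stand-in for an analyzer: split into word tokens, lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def has_terms_match(field_value: str, search: str) -> bool:
    """Approximate "has terms": compare analyzed (lowercased) tokens, so
    case no longer matters. Assumption: every search token must appear."""
    field_tokens = set(analyze(field_value))
    search_tokens = analyze(search)
    # An empty search matches nothing rather than everything.
    return bool(search_tokens) and all(t in field_tokens for t in search_tokens)


# Hypothetical mac_crash_info value, for illustration only.
sample = "Graphics hardware encountered an error"

print(has_terms_match(sample, "Graphics"))  # True
print(has_terms_match(sample, "graphics"))  # True
# A raw substring test, like "contains", stays case-sensitive:
print("graphics" in sample)  # False
```

Because both the stored value and the search term pass through the same lowercasing step, "has terms" ignores case, while "contains" compares against the untouched original text.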
Comment 4 • 3 years ago (Assignee)
Comment 5 • 3 years ago (Assignee)
Comment 6 • 3 years ago (Reporter)
Thanks. I consider this issue fixed.
I'll put something in the User Story to help others who run into this issue.
Updated • 3 years ago (Reporter)