Applying sorting with searches is returning no results

RESOLVED FIXED

6 years ago
5 years ago

People

(Reporter: mjschranz, Assigned: mjschranz)

Attachments

(1 attachment)

(Assignee)

Description

6 years ago
Right now on master, applying a sort on a specific field is broken and returns incorrect results.
(Assignee)

Comment 1

6 years ago
I've done some debugging here, and the problem seems to be related to how Elasticsearch sorts.

The problem seems to occur when the mapping has string fields with more than one value per document (i.e., our array of strings for tags). I've done some searching, and this is what I found: http://blog.wiercinski.net/2011/uncategorized/elasticsearch-sorting-on-string-types-with-more-than-one-value-per-doc-or-more-than-one-token-per-field/
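To make the limitation concrete, here is a rough Python model of the rule that post describes: a document can only be sorted on a string field if that field yields at most one indexed term for the document. The tokenizer (lowercase, split on non-word characters) is an assumed stand-in for the standard analyzer, not Elasticsearch's real implementation, and the helpers `approx_terms` and `sortable` are hypothetical.

```python
import re


def approx_terms(value, analyzed):
    """Approximate the terms indexed for one document's string field.

    Assumption: an analyzed field is modeled as lowercase + split on non-word
    characters, a crude stand-in for the standard analyzer. A not_analyzed
    field indexes each value verbatim as a single term.
    """
    values = value if isinstance(value, list) else [value]
    terms = []
    for v in values:
        if analyzed:
            terms.extend(t for t in re.split(r"\W+", v.lower()) if t)
        else:
            terms.append(v)
    return terms


def sortable(value, analyzed):
    """Sorting needs at most one term per document for the sort field."""
    return len(approx_terms(value, analyzed)) <= 1
```

Under this model, `sortable("A Cool Make", analyzed=True)` is False because the title breaks into three tokens, the same string stored `not_analyzed` is a single term and sorts fine, and a multi-valued `tags` array fails either way.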

To do this, we basically can't use Mongo/Mongoosastic as middleware and would have to do a lot of the internal mapping ourselves. We would also have to switch to a different NPM module, like https://github.com/phillro/node-elasticsearch-client. The problem with that module is that it isn't nearly as feature-complete, has no docs, and is basically a PITA to use.

At this point, I'm not sure where else we want to go with it all.
CC'ing some ES folks for thoughts.  We're using ES to index "makes" in webmaker (code is in https://github.com/mozilla/MakeAPI).

The issue Matt is hitting seems to be that ES is having trouble dealing with our concept of tags when sorting results.  Our use case is basically, "Take a URL and an array of tags (Strings) along with a bit of other data."  When he's sorting he sees:

“Can’t sort on string types with more than one value per doc, or more than one token per field”

I want to make sure the way we're putting our data into ES works for the search + sort case. A typical way we'll want to get the data back out is "Give me all makes with tag X sorted on updatedDate," or similar.
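For reference, the "all makes with tag X, sorted by update time" request might look like the search body sketched below. It assumes the Elasticsearch 0.90/1.x-era `filtered` query (later versions express the same thing as a `bool` query with a `filter` clause) and uses the `updatedAt` field from the mapping in this bug; `tagged_makes_query` and the example tag are hypothetical. Because `updatedAt` is a single-valued `long`, sorting on it should not trigger the multi-value error, and the exact-match `term` filter relies on `tags` being `not_analyzed`.

```python
import json


def tagged_makes_query(tag, sort_field="updatedAt", order="desc"):
    """Search body: exact match on one tag, newest first by default.

    Uses the 0.90/1.x-era "filtered" query; newer Elasticsearch versions
    express the same thing as a bool query with a "filter" clause.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"tags": tag}},
            }
        },
        "sort": [{sort_field: {"order": order}}],
    }


if __name__ == "__main__":
    print(json.dumps(tagged_makes_query("webmaker"), indent=2))
```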

Any tips or insights you can give Matt here would be greatly valued.
(Assignee)

Comment 3

6 years ago
To give a little more information, this is what the mapping for our index currently looks like: http://pastebin.mozilla.org/2346270
Moving Matt's paste into the bug so pastebin doesn't lose it on us:

{
  "make": {
    "properties": {
      "tags": {
        "index": "not_analyzed",
        "omit_norms": true,
        "index_options": "docs",
        "type": "string"
      },
      "locale": { "type": "string" },
      "contentType": { "type": "string" },
      "url": {
        "index": "not_analyzed",
        "omit_norms": true,
        "index_options": "docs",
        "type": "string"
      },
      "updatedAt": { "type": "long" },
      "author": {
        "index": "not_analyzed",
        "omit_norms": true,
        "index_options": "docs",
        "type": "string"
      },
      "title": { "type": "string" },
      "thumbnail": { "type": "string" },
      "remixedFrom": { "type": "string" },
      "email": {
        "index": "not_analyzed",
        "omit_norms": true,
        "index_options": "docs",
        "type": "string"
      },
      "createdAt": { "type": "long" },
      "description": { "type": "string" },
      "deletedAt": { "type": "long" },
      "published": { "type": "object" }
    }
  }
}
I don't quite understand your problem yet. So you have an array of unanalyzed tags in your "tags" field, and you're sorting by them? Sorting by an array doesn't make any sense to me. What behavior would you like to see?
(Assignee)

Comment 6

6 years ago
We actually don't apply the sort to the tags field at all. The problem occurs on basically every field I've tried with our setup; the one I tested most specifically was description.
(Assignee)

Comment 7

6 years ago
Created attachment 742415 [details] [review]
https://github.com/mozilla/MakeAPI/pull/41

It turns out the root of the problem was the default analyzer: it was being applied to every field we hadn't explicitly marked as not analyzed, and we were sorting on those fields.
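One quick way to see which fields are exposed to this is to list the string fields that have no `"index": "not_analyzed"` setting and therefore fall back to the default analyzer. The sketch below runs over an excerpt of the mapping posted above; `analyzed_string_fields` is a hypothetical helper, not part of MakeAPI or Mongoosastic, and it does not reproduce the actual change in the pull request.

```python
def analyzed_string_fields(properties):
    """Return string fields left on the default analyzer (i.e. not marked not_analyzed).

    Sorting on any of these can fail once a value breaks into several tokens.
    """
    return sorted(
        name
        for name, spec in properties.items()
        if spec.get("type") == "string" and spec.get("index") != "not_analyzed"
    )


# Excerpt of the "make" mapping from this bug.
MAKE_PROPERTIES = {
    "tags": {"type": "string", "index": "not_analyzed"},
    "url": {"type": "string", "index": "not_analyzed"},
    "title": {"type": "string"},
    "description": {"type": "string"},
    "updatedAt": {"type": "long"},
}

if __name__ == "__main__":
    print(analyzed_string_fields(MAKE_PROPERTIES))  # prints ['description', 'title']
```

Note that `tags` does not appear in the result even though, as a multi-valued array, it still cannot be sorted on; this check only covers the tokenization half of the problem.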
Attachment #742415 - Flags: review?(chris)
Right. As that material you cited above says, you can't sort on any multi-token field. If your analyzer breaks up a field into multiple terms, you're out of luck. :-) Fortunately, you can always analyze a field a bunch of different ways and hang onto them all.
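One way to follow that advice is to keep the analyzed field for full-text search and add a `not_analyzed` copy for sorting. The sketch below builds such a mapping with the 0.90-era `multi_field` type (later versions replaced it with a `fields` parameter on the field itself), plus the matching sort clause. The sub-field name `untouched` and both helper functions are illustrative assumptions.

```python
import json


def with_raw_subfield(field_name, spec, raw_name="untouched"):
    """Wrap a field mapping in a multi_field with an extra not_analyzed sub-field.

    The default sub-field shares the field's name and keeps the original
    (analyzed) settings, so existing full-text queries keep working.
    """
    return {
        "type": "multi_field",
        "fields": {
            field_name: dict(spec),
            raw_name: {"type": "string", "index": "not_analyzed"},
        },
    }


def sort_on_raw(field_name, raw_name="untouched", order="asc"):
    """Sort clause that targets the single-term sub-field instead of the analyzed one."""
    return [{f"{field_name}.{raw_name}": {"order": order}}]


if __name__ == "__main__":
    mapping = {"description": with_raw_subfield("description", {"type": "string"})}
    print(json.dumps({"properties": mapping}, indent=2))
    print(json.dumps({"sort": sort_on_raw("description")}))
```

Searches keep hitting `description`, while sorts go to `description.untouched`, which holds each document's value as one term.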
Staged on Master: https://github.com/mozilla/MakeAPI/commit/b77ae6f02977d237988768625a2b29930ce80ab5
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Attachment mime type: text/plain → text/x-github-pull-request