Closed Bug 864353 Opened 12 years ago Closed 12 years ago

Applying sorting with searches is returning no results

Categories

(Webmaker Graveyard :: MakeAPI, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mjschranz, Assigned: mjschranz)

Details

Attachments

(1 file)

Right now on master, applying a sort on a specific field is broken and returns incorrect results.
I've done some debugging and the problem seems to be related to how Elasticsearch sorts: it fails when the mapping has string fields with more than one value per document (i.e., our array of strings for tags). I did some searching and this is what I found: http://blog.wiercinski.net/2011/uncategorized/elasticsearch-sorting-on-string-types-with-more-than-one-value-per-doc-or-more-than-one-token-per-field/ To work around this, we basically couldn't use Mongo/Mongoosastic as middleware and would have to do a lot of the internal mapping ourselves. We would also have to switch to a different NPM module, such as https://github.com/phillro/node-elasticsearch-client. The problem with that module is that it isn't nearly as feature-complete, has no docs, and is basically a PITA to use. At this point, I'm not sure where else we want to go with this.
CC'ing some ES folks for thoughts. We're using ES to index "makes" in Webmaker (code is in https://github.com/mozilla/MakeAPI). The issue Matt is hitting seems to be that ES has trouble dealing with our concept of tags when sorting results. Our use case is basically, "Take a URL and an array of tags (Strings) along with a bit of other data." When he sorts, he sees: "Can't sort on string types with more than one value per doc, or more than one token per field". I want to make sure the way we're putting our data into ES works for the search + sort case. A typical way we'll want to get the data back out is to say, "Give me all makes with tag X sorted on updatedDate" or the like. Any tips or insights you can give Matt here would be greatly valued.
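The "all makes with tag X sorted on a date" use case above can be sketched as a search request body. This is only an illustration, not MakeAPI's actual code: the helper name `tagged_makes_query` is hypothetical, the sort field is assumed to be the `updatedAt` field from the index mapping rather than the `updatedDate` mentioned in the comment, and the body uses only the plain `term` query and `sort` clause from the Elasticsearch query DSL.

```python
import json


def tagged_makes_query(tag, sort_field="updatedAt", descending=True):
    """Build a search body: exact match on one tag, sorted by one field.

    Hypothetical helper for illustration; field names are assumptions
    based on the mapping pasted in this bug.
    """
    order = "desc" if descending else "asc"
    return {
        # A term query matches the tag exactly, which assumes "tags" is
        # indexed without analysis.
        "query": {"term": {"tags": tag}},
        # Sorting on a single-valued numeric field avoids the
        # multi-token restriction on strings.
        "sort": [{sort_field: {"order": order}}],
    }


body = tagged_makes_query("webmaker")
print(json.dumps(body, indent=2))
```

Because the sort key here is a single-valued numeric field, the tags array itself does not need to be sortable.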
To give a little more information, this is what the mapping for our index currently looks like: http://pastebin.mozilla.org/2346270.
Moving Matt's paste into the bug so pastebin doesn't lose it on us:

make: {
  properties: {
    tags: {
      index: not_analyzed
      omit_norms: true
      index_options: docs
      type: string
    }
    locale: {
      type: string
    }
    contentType: {
      type: string
    }
    url: {
      index: not_analyzed
      omit_norms: true
      index_options: docs
      type: string
    }
    updatedAt: {
      type: long
    }
    author: {
      index: not_analyzed
      omit_norms: true
      index_options: docs
      type: string
    }
    title: {
      type: string
    }
    thumbnail: {
      type: string
    }
    remixedFrom: {
      type: string
    }
    email: {
      index: not_analyzed
      omit_norms: true
      index_options: docs
      type: string
    }
    createdAt: {
      type: long
    }
    description: {
      type: string
    }
    deletedAt: {
      type: long
    }
    published: {
      type: object
    }
  }
}
I've not quite arrived at an understanding of your problem. So you have an array of unanalyzed tags in your "tags" field. And you're sorting by them? Sorting by an array doesn't make any sense to me. What behavior would you like to see?
We actually don't apply the sort to the tags field at all. The problem seems to occur on any field I have tried with our setup, although I specifically tested it with description.
It turns out the root of the problem was the default analyzer: it was being applied to every field that we hadn't explicitly marked as not analyzed, so sorting on those fields ran into the multi-token restriction.
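Why an analyzed field cannot be sorted can be illustrated with a toy tokenizer. The function below is only a rough stand-in for the default analyzer (lowercase, split on non-word characters), not Elasticsearch's real implementation, and every name in it is hypothetical.

```python
import re


def rough_standard_analyzer(text):
    """Approximate an analyzed string field: lowercase word tokens."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed(text):
    """A not_analyzed field is indexed as one token: the whole value."""
    return [text]


def sortable(tokens):
    """A string sort needs exactly one token per document."""
    return len(tokens) == 1


description = "My first Webmaker make"

analyzed_tokens = rough_standard_analyzer(description)
raw_tokens = not_analyzed(description)

print(analyzed_tokens, sortable(analyzed_tokens))
print(raw_tokens, sortable(raw_tokens))
```

With several tokens per document there is no single value to order by, which matches the error message quoted earlier in this bug.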
Attachment #742415 - Flags: review?(chris)
Right. As that material you cited above says, you can't sort on any multi-token field. If your analyzer breaks up a field into multiple terms, you're out of luck. :-) Fortunately, you can always analyze a field a bunch of different ways and hang onto them all.
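Analyzing one field several ways could look roughly like the sketch below. It assumes the `fields` multi-field syntax that string mappings accept in Elasticsearch 1.x (older releases used a separate `multi_field` type, and newer ones use `text` with a `keyword` sub-field), and the sub-field name `raw` and the helper `with_raw_subfield` are made up for illustration. This is not necessarily what the attached pull request does.

```python
def with_raw_subfield(analyzed_mapping):
    """Return a copy of a string mapping with an extra not_analyzed sub-field.

    Hypothetical helper: the analyzed version stays available for
    full-text search, while the "raw" sub-field keeps the whole value
    as a single token so it can be sorted on.
    """
    mapping = dict(analyzed_mapping)
    mapping["fields"] = {
        "raw": {"type": "string", "index": "not_analyzed"},
    }
    return mapping


description_mapping = with_raw_subfield({"type": "string"})

# Search against "description", but sort on the single-token sub-field.
search_body = {
    "query": {"match": {"description": "webmaker"}},
    "sort": [{"description.raw": {"order": "asc"}}],
}

print(description_mapping)
print(search_body)
```

The trade-off is extra index size, since each value is stored both as analyzed tokens and as a single raw term.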
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Attachment mime type: text/plain → text/x-github-pull-request