Closed Bug 928117 Opened 12 years ago Closed 11 years ago

Take advantage of Elasticsearch's synonym support

Categories

(Marketplace Graveyard :: Search, defect, P2)

Avenir
x86
macOS
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: robhudson, Unassigned)

Details

(Whiteboard: [better-search])

Elasticsearch supports synonyms, though the documentation is a bit lacking: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html

From my understanding, it allows a couple of things:

1. Index-time synonyms: adding to the list of tokens based on a synonym list. For example, if we have the synonyms ['cards', 'solitaire', 'poker'] and we index an app named "Poker", it will also get the tokens "cards" and "solitaire".
2. Query-time synonyms: via the analyzers, we can add our synonym list to the query, so searching for "solitaire" also searches for the tokens "cards" and "poker" automatically.

Or both of the above.

The main downside is that managing synonyms requires a re-index. The synonym list also looks like it has to be in a specific format, so we'd need to generate it somehow. It can also impact search performance, so it should be used wisely and minimally. But it could help surface apps that otherwise would not show up when app names and descriptions aren't written with search in mind. It's something we could definitely experiment with on -dev to see if it helps.

Note: here's a helpful gist from one of the Elasticsearch devs: https://gist.github.com/clintongormley/4095280
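Both modes come down to a token filter in the index analysis settings. The sketch below assumes the synonym list is kept as groups of equivalent terms and renders them into the comma-separated (Solr-style) rules accepted by Elasticsearch's `synonym` token filter. The names `app_synonyms` and `app_synonym_analyzer` and all helper functions are hypothetical, and `expand_tokens` is only a rough in-memory approximation of what the filter emits, not Elasticsearch's own logic.

```python
from typing import Dict, List

# Hypothetical domain-specific synonym groups; each group lists
# terms that should be treated as equivalent.
SYNONYM_GROUPS: List[List[str]] = [
    ["cards", "solitaire", "poker"],
]


def to_synonym_rules(groups: List[List[str]]) -> List[str]:
    """Render each group as one comma-separated equivalence rule."""
    return [", ".join(group) for group in groups]


def build_analysis_settings(groups: List[List[str]]) -> Dict:
    """Build an index "analysis" section with a synonym token filter."""
    return {
        "analysis": {
            "filter": {
                "app_synonyms": {  # hypothetical filter name
                    "type": "synonym",
                    "synonyms": to_synonym_rules(groups),
                }
            },
            "analyzer": {
                "app_synonym_analyzer": {  # hypothetical analyzer name
                    "tokenizer": "standard",
                    "filter": ["lowercase", "app_synonyms"],
                }
            },
        }
    }


def expand_tokens(tokens: List[str], groups: List[List[str]]) -> List[str]:
    """Rough approximation: keep each token, then add its group's other members."""
    out: List[str] = []
    for token in tokens:
        term = token.lower()
        out.append(term)
        for group in groups:
            if term in group:
                out.extend(other for other in group if other != term)
    return out


print(to_synonym_rules(SYNONYM_GROUPS))
print(expand_tokens(["Poker"], SYNONYM_GROUPS))
```

Indexing an app named "Poker" through such an analyzer would yield `["poker", "cards", "solitaire"]` in this approximation, which is the index-time case described above; using the same analyzer only on the query side gives the query-time case.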
OMG this is awesome! Let's do it!
Whiteboard: [better-search]
are there pre-made synonym lists we can use or would that be a bad idea for perf?
(In reply to Wil Clouser [:clouserw] from comment #2)
> are there pre-made synonym lists we can use or would that be a bad idea for perf?

There are some pre-made synonym lists, but it is highly recommended to build your own for the domain of the search (apps). They are also not available in every language.

Since we have custom analyzers for each language we support, it makes sense to do these at query time. Query-time synonyms are also not as bad for performance, since they are similar to adding more keywords to the query (e.g. typing "cards poker solitaire" in the query box instead of "cards"). Depending on the number of synonyms for the source query this could end up being a lot, but probably (and ideally) it usually won't.

Setting this up would, I imagine, involve an admin page where we manage the synonyms per locale, which on save get pushed to the index settings. I'm 99% sure doing it this way doesn't require a re-index; I could test locally to verify.

This bug can likely be broken out into multiple bugs for the admin side and the search side. However, I'm curious whether there's a small test case we could set up on -dev to verify this is the direction we want to go.
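A minimal sketch of the per-locale, query-time setup described above, assuming the admin page hands us comma-separated rules per locale. Each locale gets its own synonym filter and search analyzer, and the field mapping uses `search_analyzer` so documents are indexed without synonyms. All names (`LOCALE_SYNONYMS`, `synonyms_en`, `search_en`, the helpers) are hypothetical, `standard` stands in for our real per-language analyzers, and the mapping uses the `text` field type of recent Elasticsearch versions. In the versions we know of, changing analysis settings means closing and reopening the index (later versions can also reload search analyzers whose synonym filter is marked `updateable`), which still avoids reindexing documents when synonyms apply only at search time. `expanded_query_terms` approximates how a one-word query grows into several OR'd terms.

```python
from typing import Dict, List

# Hypothetical admin-managed synonym rules, keyed by locale.
LOCALE_SYNONYMS: Dict[str, List[str]] = {
    "en": ["cards, solitaire, poker"],
}


def search_time_settings(locale_synonyms: Dict[str, List[str]]) -> Dict:
    """One synonym filter and one search analyzer per locale."""
    filters: Dict[str, Dict] = {}
    analyzers: Dict[str, Dict] = {}
    for locale, rules in locale_synonyms.items():
        filters[f"synonyms_{locale}"] = {"type": "synonym", "synonyms": rules}
        analyzers[f"search_{locale}"] = {
            "tokenizer": "standard",
            "filter": ["lowercase", f"synonyms_{locale}"],
        }
    return {"analysis": {"filter": filters, "analyzer": analyzers}}


def name_field_mapping(locale: str) -> Dict:
    """Index with a plain analyzer; apply synonyms only when searching."""
    return {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": f"search_{locale}",
    }


def expanded_query_terms(query: str, rules: List[str]) -> List[str]:
    """Approximate the terms a synonym-aware search analyzer would OR together."""
    groups = [[w.strip() for w in rule.split(",")] for rule in rules]
    terms: List[str] = []
    for word in query.lower().split():
        terms.append(word)
        for group in groups:
            if word in group:
                terms.extend(w for w in group if w != word)
    return terms


print(expanded_query_terms("cards", LOCALE_SYNONYMS["en"]))
```

Under these assumptions, a search for "cards" becomes `["cards", "solitaire", "poker"]`, the same as typing all three words, which is why the performance cost grows with the number of synonyms per query term.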
Priority: P4 → P2
I'm going to WONTFIX this. As we've discussed it more, a site-wide synonym list seems like it would hurt search more than help. E.g. if I search "card games", I could mean poker or solitaire or all card games; we can't and shouldn't guess. Conversely, if I search "poker" I may not want to find all card games, and may consider that a change for the worse when I've entered a nice, specific search term. Also, localizing synonyms seemed to be a huge, daunting task with no clear approach.

We do plan to move towards allowing keywords for individual apps, and to offer a hint to developers to add synonyms or alternate search terms if they make sense for their app.
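The trade-off above can be shown with a toy in-memory catalogue. This is a hypothetical illustration, not Marketplace code: the app names, keyword sets, and `SITE_WIDE_GROUP` are invented, and matching is plain set intersection rather than Elasticsearch scoring. It contrasts a site-wide equivalence group with per-app keywords supplied by developers.

```python
from typing import Dict, List, Set

# Hypothetical catalogue: app name -> developer-supplied keywords.
APPS: Dict[str, Set[str]] = {
    "Poker Night": {"poker", "cards"},
    "Solitaire Classic": {"solitaire", "cards"},
    "Chess Pro": {"chess"},
}

# Hypothetical site-wide equivalence group.
SITE_WIDE_GROUP: Set[str] = {"cards", "solitaire", "poker"}


def name_tokens(name: str) -> Set[str]:
    """Lowercased words of an app name."""
    return {w.lower() for w in name.split()}


def search_site_wide(term: str) -> List[str]:
    """Expand the query with the site-wide group, then match app names."""
    term = term.lower()
    terms = SITE_WIDE_GROUP if term in SITE_WIDE_GROUP else {term}
    return sorted(app for app in APPS if name_tokens(app) & terms)


def search_with_keywords(term: str) -> List[str]:
    """Match the app name or the app's own keywords, with no global expansion."""
    term = term.lower()
    return sorted(app for app, kws in APPS.items() if term in name_tokens(app) | kws)


print(search_site_wide("poker"))
print(search_with_keywords("poker"))
print(search_with_keywords("cards"))
```

With the site-wide group, the specific query "poker" also returns "Solitaire Classic"; with per-app keywords it returns only "Poker Night", while the broader "cards" still reaches both apps because their developers opted in.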
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX