Closed Bug 894686 Opened 11 years ago Closed 11 years ago

Implement localized KB search.

Categories

(support.mozilla.org :: Search, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED
2013Q3

People

(Reporter: mythmon, Assigned: mythmon)

References

Details

(Whiteboard: p=3 u=user c=search s=2013.15)

This is the implementation bug for the research bug 889890.

Quick overview:

1. Add new analyzers.
2. Make a mapping for locales to those analyzers.
3. Make document indexing use that mapping to pick the right analyzer based on document locale.
4. Make search use that mapping to pick the right analyzer based on UI locale.

---

There are a few parts to this, but they are all interwoven, which is why this is just one big bug. 

1. Set up new analyzers for each of the Snowball languages. Snowball supports many languages, but we have to define a new analyzer for each one before we can switch to it. Something like the top example in http://www.elasticsearch.org/guide/reference/index-modules/analysis/snowball-analyzer/.

This has to happen at index creation time, and would look something like this in raw ES:

{
   "index" : {
      "analysis" : {
         "analyzer" : {
            "snowball-german" : {
               "type" : "snowball",
               "language" : "German"
            },
            <repeat for each snowball language>
         }
      },
      <mapping stuff goes here>
   }
}

I think this can be done by adding an appropriate 'analysis' field to the settings object in kitsune.search.es_utils:recreate_index.
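Rather than hand-writing one entry per language, the 'analysis' block could be generated. The sketch below is a minimal illustration under stated assumptions: `SNOWBALL_LANGUAGES` is an illustrative subset (the authoritative list is in the ES/Lucene Snowball docs), and `analyzer_name` / `build_analysis_settings` are hypothetical helpers, not existing kitsune functions. It assumes the returned dict gets merged into the settings object passed in `recreate_index`.

```python
import json

# Illustrative subset of Snowball stemmer languages (assumption; check the
# ES snowball analyzer docs for the full list). Capitalized as in the ES docs.
SNOWBALL_LANGUAGES = ["Danish", "Dutch", "English", "Finnish", "French",
                      "German", "Hungarian", "Italian", "Norwegian",
                      "Portuguese", "Romanian", "Russian", "Spanish",
                      "Swedish", "Turkish"]


def analyzer_name(language):
    """Hypothetical naming scheme, e.g. 'German' -> 'snowball-german'."""
    return "snowball-" + language.lower()


def build_analysis_settings(languages=SNOWBALL_LANGUAGES):
    """Return an index 'analysis' block with one snowball analyzer per language."""
    analyzers = {}
    for language in languages:
        analyzers[analyzer_name(language)] = {
            "type": "snowball",
            "language": language,
        }
    return {"analysis": {"analyzer": analyzers}}


if __name__ == "__main__":
    # Show the block that would be merged into the index settings.
    print(json.dumps(build_analysis_settings(["German", "Russian"]), indent=2))
```

Generating the block keeps the analyzer names predictable, so the locale mapping in step 2 can refer to them by name.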


2. Add a list of locale -> analyzer mappings. Bug 889890, comment 9 has a proposed mapping. This should also include information about whether a plugin is needed for the language (Polish).
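A minimal sketch of what that mapping could look like, assuming the analyzer names from step 1. The entries here are placeholders (the real proposal is in bug 889890, comment 9); `LocaleAnalyzer`, `analyzer_for_locale`, the fallback to ES's built-in `standard` analyzer, and the name `polish` for the plugin-provided analyzer are all assumptions for illustration.

```python
from typing import NamedTuple


class LocaleAnalyzer(NamedTuple):
    """Analyzer name for a locale, plus whether it comes from an ES plugin."""
    analyzer: str
    requires_plugin: bool = False


# Placeholder entries (hypothetical); see bug 889890, comment 9 for the proposal.
LOCALE_ANALYZERS = {
    "en-US": LocaleAnalyzer("snowball-english"),
    "de": LocaleAnalyzer("snowball-german"),
    "es": LocaleAnalyzer("snowball-spanish"),
    "fr": LocaleAnalyzer("snowball-french"),
    "ru": LocaleAnalyzer("snowball-russian"),
    # Polish needs an analyzer from an ES plugin (analyzer name assumed).
    "pl": LocaleAnalyzer("polish", requires_plugin=True),
}

# Assumed fallback for unmapped locales or unavailable plugins.
FALLBACK_ANALYZER = "standard"


def analyzer_for_locale(locale, use_plugins=True):
    """Pick the analyzer for a locale, skipping plugin analyzers when disabled."""
    entry = LOCALE_ANALYZERS.get(locale)
    if entry is None:
        return FALLBACK_ANALYZER
    if entry.requires_plugin and not use_plugins:
        return FALLBACK_ANALYZER
    return entry.analyzer
```

Keeping the plugin flag next to each entry means both indexing (step 3) and search (step 4) can make the same decision from one table.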


3. Change document indexing to use the mapping from #2 to set the _analyzer field of the document to one of the analyzers from #1. ES Docs: http://www.elasticsearch.org/guide/reference/mapping/analyzer-field/

This should take into account whether the analyzer in question is in a plugin that is unavailable. AMO's solution for this was to have an ES_USE_PLUGINS setting that could be set to False when plugins were not available.
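One way the indexing side could work, sketched under assumptions: the type mapping uses the `_analyzer` option's `path` setting to point at a document field that holds the analyzer name. The field name `locale_analyzer`, the helpers `pick_analyzer` / `add_analyzer`, and the table entries are hypothetical; `use_plugins` stands in for an AMO-style `ES_USE_PLUGINS` setting.

```python
# Hypothetical locale -> (analyzer, needs_plugin) table; see bug 889890, comment 9.
LOCALE_ANALYZERS = {
    "en-US": ("snowball-english", False),
    "de": ("snowball-german", False),
    "ru": ("snowball-russian", False),
    "pl": ("polish", True),  # assumed name of the plugin-provided analyzer
}
FALLBACK_ANALYZER = "standard"

# Document field that carries the analyzer name (hypothetical name).
ANALYZER_FIELD = "locale_analyzer"

# Type-mapping fragment telling ES to read the analyzer name from that field.
ANALYZER_MAPPING = {"_analyzer": {"path": ANALYZER_FIELD}}


def pick_analyzer(locale, use_plugins):
    """Map a document locale to an analyzer, avoiding unavailable plugins."""
    analyzer, needs_plugin = LOCALE_ANALYZERS.get(locale, (FALLBACK_ANALYZER, False))
    if needs_plugin and not use_plugins:
        # Plugin not installed (e.g. ES_USE_PLUGINS = False): don't reference it.
        return FALLBACK_ANALYZER
    return analyzer


def add_analyzer(doc, use_plugins=True):
    """Return a copy of an indexable document with its analyzer field set."""
    indexed = dict(doc)
    indexed[ANALYZER_FIELD] = pick_analyzer(doc["locale"], use_plugins)
    return indexed
```

Falling back to a built-in analyzer, rather than skipping the document, keeps Polish articles indexed (just without stemming) on environments without the plugin.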


4. Change the analyzer for search terms to match the current locale. I can't find ES docs about this, but something like the query below is the goal. :willkg tells me that the way to do this is to add a process_query_<something> method to our Sphilastic class. elasticutils docs: http://elasticutils.readthedocs.org/en/latest/api.html#the-s-class.

curl -XGET localhost:9200/test-idx/_search?pretty=true -d '{
    "query": {
        "match": {
            "body": {
                "query": "домашню",
                "analyzer": "russian"
            }
        }
    }
}'
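The request body above could be built from the UI locale rather than hard-coded. This is a minimal sketch that only constructs the query dict; it does not show the elasticutils hook, since the exact process_query_<something> method name isn't settled here. `localized_match_query`, the table entries, and the `standard` fallback are assumptions, and the sketch uses the `snowball-*` names from step 1 rather than ES's built-in `russian` analyzer used in the curl example.

```python
import json

# Hypothetical locale -> analyzer table (ideally shared with indexing).
LOCALE_ANALYZERS = {
    "en-US": "snowball-english",
    "de": "snowball-german",
    "ru": "snowball-russian",
}
FALLBACK_ANALYZER = "standard"  # assumed fallback for unmapped locales


def localized_match_query(field, text, locale):
    """Build a match query whose search terms use the UI locale's analyzer."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "analyzer": LOCALE_ANALYZERS.get(locale, FALLBACK_ANALYZER),
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(localized_match_query("body", "домашню", "ru"),
                     ensure_ascii=False, indent=4))
```

A process_query_* method on Sphilastic could return the inner "match" clause from a helper like this, so search and indexing pick analyzers from the same table.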

---

This is a big task, and is worth at least 3 points. If it proves to be too big, here are some ideas for spinning off sub bugs:

If the plugin stuff for Polish is not ready from IT (bug 894649), or toggling plugin-based analyzers proves difficult, it could be spun off into another bug.

If this bug ends up being a *lot* bigger than expected, the indexing changes (step 3) could probably land before the search changes (step 4) without breaking anything. This should be avoided, though.

Note about flags: In the past, big ES changes like this were put behind a flag. After talking to Will about it, I don't think that putting this behind a flag is an appropriate thing to do for this. It would add a lot of development time, and although it is a big scary change, the risk of something going horribly wrong when we get to prod (but after it looks good on stage) is pretty low. The most common failure case I have thought of is that search in non-English languages remains just as lame as it is today.

Note about questions: This bug doesn't cover doing the above for questions. We only have a few languages for those, so they are lower priority. We can do this for questions in another bug after this has landed.

Adding this to the next sprint, and estimating at 3 points because that is as high as our point scale goes.

\o/
woot
Target Milestone: --- → 2013Q3
This is still blocked, but I assume we can move on with all locales minus Polish?
Yes, locales except Polish can be implemented before the blocker bug is resolved.
Assignee: nobody → mcooper
Status: NEW → ASSIGNED
Depends on: 900282