Closed Bug 838638 Opened 11 years ago Closed 11 years ago

overhaul search to use separate mapping types

Categories

(support.mozilla.org :: Search, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED
2013Q1

People

(Reporter: willkg, Assigned: willkg)

Details

(Whiteboard: u=dev c=search p=10 s=2013.9)

Way back when, we switched from bucketed search to unified search. Bucketed search did three separate ES searches and showed the results for the kb, then the results for the support forum (aka questions), then the results for the contributor forum (aka forums). Unified search does one single ES search and gets back all the data sorted by _score. We decided to implement unified search using a unified mapping type (aka doctype). The primary reason for this (as I recall) was problems with searching across multiple mapping types (aka doctypes) with ElasticUtils, which at the time was using pyes.
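
To make the difference between the two approaches concrete, here is a minimal Python sketch. The hits, scores and function names are made up for illustration and are not Kitsune code; the only thing taken from the description above is the ordering rule (bucket by bucket versus everything sorted by _score).

```python
# Hypothetical hits: each one has a title and an ES-style _score.
HITS = {
    "kb": [{"title": "kb-a", "_score": 1.2}, {"title": "kb-b", "_score": 0.4}],
    "questions": [{"title": "q-a", "_score": 2.5}],
    "forums": [{"title": "f-a", "_score": 0.9}],
}


def bucketed(hits_by_bucket, order=("kb", "questions", "forums")):
    """One search per bucket; buckets are shown one after another."""
    results = []
    for bucket in order:
        # Within a bucket, the best-scoring hits come first.
        results.extend(sorted(hits_by_bucket[bucket], key=lambda h: -h["_score"]))
    return [h["title"] for h in results]


def unified(hits_by_bucket):
    """One search over everything; results are ranked by _score alone."""
    all_hits = [h for hits in hits_by_bucket.values() for h in hits]
    return [h["title"] for h in sorted(all_hits, key=lambda h: -h["_score"])]
```

With bucketed ordering a high-scoring question can never outrank a low-scoring kb article, which is the behavior unified search changed.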

Times have changed! ElasticUtils now uses pyelasticsearch and it's possible to search across multiple mapping types.

The current implementation of a unified doctype and the code surrounding it is rigid. It'll be difficult to add new mapping types and increasingly difficult to adjust the existing ones. Plus we wrote a bunch of scaffolding for Kitsune that duplicates what ElasticUtils can already do, so we've got a lot of extra code we should remove.

This bug covers:

1. updating to ElasticUtils/pyelasticsearch
2. overhauling search to use multiple mapping types again (one for each Django model)
3. removing the scaffolding in Kitsune that also exists in ElasticUtils
I think we should do it this quarter!
Whiteboard: u=dev c=search p= s=
Target Milestone: --- → 2013Q1
I'm taking it now!
Assignee: nobody → willkg
I've already covered item 1 from the list. That was easy-ish and in a different bug.

I'm going to tackle item 2 now. I think it entails:

Stage 1:

1. write a KitsuneMappingType class that handles Kitsune peculiarities like the index we write to and keeps track of all registered mapping types
2. write DocumentMappingType, QuestionMappingType and ThreadMappingType subclasses
3. rewrite the command line indexing code to use the MappingType classes
4. rewrite the admin indexing code to use the MappingType classes
5. fix the cron jobs that do indexing
6. rewrite anything else that either does indexing or looks at index stats
7. rewrite all the indexing-related tests

Stage 2:

1. rewrite the view code to search across the various doctypes (this is one step, but is actually a non-trivial project)
2. adjust the "articles like this" code to use the right classes


Because we're writing all new MappingType-based indexing code that is parallel to the SearchMixin-based indexing code, I think we can have both simultaneously. That should make it easier to transition from the old ways to the new ways without taking search down and without writing a bunch of transition code.


Ricky, Mike: Does that look right? Am I missing anything?
Stage 1 is more complicated than I thought. In order for the tests to pass, I need to keep the existing indexing system and build a parallel indexing system alongside it rather than replace the existing indexing system with the new one.

Stage 2 is interesting in that creating an S with a mapping type doesn't let us search across doctypes, so we have to create an untyped S. That's fine. It just means we can't put any business logic in our mapping type classes. We didn't really do that before, so it's not a big deal.
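
For reference, this is roughly what an untyped, cross-doctype search boils down to at the HTTP level. It's a sketch that builds the request by hand rather than going through ElasticUtils, and it assumes an Elasticsearch version of that era, where several types can be listed comma-separated in the URL path. The index name, doctype names and field names in the usage line are made up.

```python
import json


def build_search_request(index, doctypes, text, fields):
    """Return (path, body) for one search that spans several doctypes."""
    # Comma-separated types in the path: /index/type1,type2/_search
    path = "/{}/{}/_search".format(index, ",".join(doctypes))
    # One query over fields that belong to different doctypes.
    body = {"query": {"multi_match": {"query": text, "fields": fields}}}
    return path, json.dumps(body, sort_keys=True)


path, body = build_search_request(
    "sumo",  # hypothetical index name
    ["wiki_document", "questions_question"],  # hypothetical doctypes
    "crash",
    ["document_title", "question_title"],  # hypothetical fields
)
```

The hits come back as plain results tagged with their type, not as instances of a particular mapping type class, which is why the business logic can't live on those classes.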
In the interests of time, I think I'm going to go with the original plan. We'll expect all the tests to fail after the stage 1 commit, which will require us to do manual testing (which we do anyhow). I'll write a command to make that easier.

As a side note, while I'm overhauling search, I'm fixing the infrastructure to allow for the following:

1. document-level boosting at index time
2. setting index settings when we create the index
3. indexing things other than support questions, wiki documents and forum threads
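
Items 2 and 3 can be sketched as building a single create-index body that carries both the index settings and one mapping per doctype. This is an illustration only: the function name, the settings and the field definitions are assumptions, and it follows the settings/mappings body layout used by Elasticsearch versions that still have mapping types. Index-time boosting (item 1) is not shown.

```python
def build_create_index_body(mapping_types, settings):
    """Combine index settings with one mapping per doctype."""
    return {
        "settings": dict(settings),
        "mappings": {
            # Each doctype gets its own "properties" section.
            name: {"properties": dict(props)}
            for name, props in mapping_types.items()
        },
    }


body = build_create_index_body(
    {"wiki_document": {"title": {"type": "string"}}},  # hypothetical mapping
    {"number_of_replicas": 0},  # hypothetical setting
)
```

Indexing something new then means adding one more entry to the mappings rather than changing a shared unified doctype.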
I'm pretty sure I'm done this:

Stage 1:

1. write a SearchMappingType class that handles Kitsune peculiarities like the index we write to and keeps track of all registered mapping types
2. write DocumentMappingType, QuestionMappingType and ThreadMappingType subclasses
3. rewrite the command line indexing code to index both to the unified doctype and the new MappingType-based doctypes (i.e. we're indexing everything twice for a bit)
4. rewrite the admin indexing code to index both to the unified doctype and the new MappingType-based doctypes
5. fix the cron jobs that do indexing
6. rewrite anything else that either does indexing or looks at index stats

I'm left with this:

7. write tests for MappingType-based doctype indexing

I need to write some tests that test the new indexing code to make sure it's working properly.

Once I'm done that, I'll move on to stage 2.
The deployment plan is this:

1. ask the other devs to land anything incoming now
2. after that, rebase against master and run tests
3. land the code
4. deploy stage 1 commit to support.allizom.org (stage site)
5. go into admin and create new index

   wait forever because staging server is slow

   meanwhile, make sure the site continues to work

   * search from the front page
   * aaq suggestions
   * related documents in the kb
   * questions/stats/ histogram

6. when that's done, deploy stage 2 commit to support.allizom.org (stage site)

   make sure the site continues to work

   * search from the front page
   * aaq suggestions
   * related documents in the kb
   * questions/stats/ histogram


If that's all good, let's do production, but during "off hours". Possibly later tonight (Wednesday May 8th).
Adjusted plan:

1. ask the other devs to land anything incoming now
2. after that, rebase against master and run tests
3. land the code
4. deploy stage 1 commit to support.allizom.org (stage site)
5. go into admin and create new index

   wait forever because staging server is slow

   meanwhile, make sure the site continues to work

   * search from the front page
   * aaq suggestions
   * related documents in the kb
   * questions/stats/ histogram
   * https://support.mozilla.org/en-US/products/firefox/get-started
   * run qa staging tests

6. when that's done, deploy stage 2 commit to support.allizom.org (stage site)

   make sure the site continues to work

   * search from the front page
   * aaq suggestions
   * related documents in the kb
   * questions/stats/ histogram
   * https://support.mozilla.org/en-US/products/firefox/get-started
   * run qa staging tests

If that's all good, let's do production, but during "off hours". Possibly later tonight (Wednesday May 8th).
Landed in production:

https://github.com/mozilla/kitsune/commit/e9c14eb
https://github.com/mozilla/kitsune/commit/72a50c1
https://github.com/mozilla/kitsune/commit/32966f5
https://github.com/mozilla/kitsune/commit/8726ee3

YAY!!!!!11!

Making this a 10 pointer because it took me the greater part of this quarter to unblock this work and then finally do it.
Status: NEW → RESOLVED
Closed: 11 years ago
Priority: -- → P1
Resolution: --- → FIXED
Whiteboard: u=dev c=search p= s= → u=dev c=search p=10 s=2013.9
lulz 10pt omg