Closed Bug 793010 Opened 12 years ago Closed 12 years ago

Research: Elastic search

Categories

(developer.mozilla.org Graveyard :: Dashboards, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: openjck, Unassigned)

Details

(Whiteboard: [tracker] p=1)

We talked about using Elastic Search to power some of the items on the MDN dashboard. We should look into this.

We decided to timebox this at about 1-point worth of work. If you find yourself spending more time than that, let me know and we can set aside more time for research next Sprint.

Please capture your findings at the following page. Again, this is sort of an experiment in using the Wiki to help with planning. Let me know if this helps, or if it's just a distraction.

https://wiki.mozilla.org/MDN/Development/Components/Dashboards
The first (and maybe only) action on this is to start an email thread with SUMO to get advice on what to do next.
Let's get the advice here so it's more public and accessible.

James & Will, how do you suggest we start with ElasticSearch on MDN?
Whiteboard: p=1 → [tracker] p=1
First, you'll need a library to talk to ElasticSearch. It's got a REST API and everything is in JSON, which is nice, but you'll want an abstraction layer to make it easier to use.
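For reference, here is a minimal sketch of what talking to the REST API without any abstraction layer might look like, using only the Python standard library. The host, port, index name (`mdn_documents`), and field name (`content`) are assumptions made for illustration and do not come from this bug; the `_search` endpoint and the `match` query are part of ElasticSearch's documented query DSL.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port
INDEX = "mdn_documents"           # hypothetical index name


def build_search_request(text, field="content", size=10):
    """Build (but do not send) a POST request for the _search endpoint."""
    body = {"query": {"match": {field: text}}, "size": size}
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(text):
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(build_search_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Every query ends up as a hand-built JSON body like this one, which is the main reason an abstraction layer is worth having.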

There are (IMO) two good alternatives for ElasticSearch libraries:

1. django-haystack: http://haystacksearch.org/
2. ElasticUtils: http://elasticutils.readthedocs.org/en/latest/

The former works across a bunch of different search systems and acts like the Django ORM and tries to make it really easy to add search to your Django app. I'm using it with richard, but that's it.

The latter is maintained by Mozilla. We've been fleshing out issues with the project and slowly restructuring it to be easy to use and convenient, but still allow you "bare-metal" access to ElasticSearch. We've succeeded in that in some ways and sucked in others. It's undergoing a lot of rapid improvement, still. That might slow down soon. SUMO, Input, Mozillians and AMO all use ElasticUtils. Probably some other projects, too.

My vote is that the first thing you do is figure out which library you want to use.

After doing that, you need to figure out what you're indexing and how you want to access it. From there, it's the IR (information retrieval) equivalent of DB stuff.
I had a so-so experience with Haystack & Solr about a year ago - it's nice for doing basic search in a Django-like way, but I quickly reached for pysolr to do stuff like facets and boosting.

So, I vote for elasticutils.

We will index docs first for sure - Document objects/records with HTML content and metadata including tags. I'd say we do what kitsune does here:

* https://github.com/mozilla/kitsune/blob/master/apps/wiki/models.py#L558
* https://github.com/mozilla/kitsune/blob/master/apps/wiki/models.py#L583

-product, -is_archived, -recent_helpful_votes
+tags, +review_tags, +creator

To decide:

* use categories on docs and add them to index?
* restore keywords and add them to index?
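As a rough sketch of the field changes proposed above (drop `product`, `is_archived`, and `recent_helpful_votes`; add `tags`, `review_tags`, and `creator`), the adjustment could be expressed like this. The helper names and any base field list passed in are hypothetical; the actual kitsune field list lives at the linked lines and is not reproduced here.

```python
# Field names below come from the comment above. The base field list passed
# to mdn_index_fields() is supplied by the caller and is NOT taken from kitsune.
REMOVED = {"product", "is_archived", "recent_helpful_votes"}
ADDED = ["tags", "review_tags", "creator"]


def mdn_index_fields(base_fields):
    """Return base_fields minus REMOVED, followed by ADDED, keeping order."""
    kept = [name for name in base_fields if name not in REMOVED]
    return kept + [name for name in ADDED if name not in kept]


def extract_document(doc, fields):
    """Build the dict that would be indexed for one document.

    `doc` is a plain mapping here; fields missing from it are stored as None.
    """
    return {name: doc.get(name) for name in fields}
```

Categories and keywords, if we decide to keep them, would just be two more entries in `ADDED`.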
Blocks: 794525
Does this have a REST interface we can hook up an autocompleter to?
Sure does. http://www.elasticsearch.org/guide/reference/api/
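One possible shape for an autocompleter request body, as a sketch only: it uses the `match_phrase_prefix` query from ElasticSearch's query DSL, which treats the last typed word as a prefix. The field name (`title`) and the result size are assumptions, and nothing here was decided in this bug.

```python
import json


def build_autocomplete_body(typed, field="title", size=5):
    """Return a search body for what the user has typed so far.

    match_phrase_prefix matches documents whose field contains the typed
    words as a phrase, with the final word treated as a prefix.
    """
    return {
        "query": {"match_phrase_prefix": {field: typed}},
        "size": size,
    }


def autocomplete_payload(typed):
    """Serialize the body to the JSON bytes that would be POSTed to _search."""
    return json.dumps(build_autocomplete_body(typed)).encode("utf-8")
```

The front end would POST this payload to the index's `_search` endpoint on each keystroke (probably debounced) and render the returned titles.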

Closing this research bug now that I've created https://bugzilla.mozilla.org/showdependencytree.cgi?id=794525&hide_resolved=1
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard