Closed Bug 851588 Opened 12 years ago Closed 12 years ago

Refactor Elasticsearch code to use ES result instead of app id + mysql query

Categories

(Marketplace Graveyard :: Search, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED
2013-05-23

People

(Reporter: hschlichting, Assigned: robhudson)

References

Details

(Whiteboard: [u=dev c=mkt-search p=3])

As noted by Rob in an email thread: I will point out that currently the Marketplace uses elasticsearch inefficiently since we query ES and ask for only the IDs of the apps, which we then use in an `ids__in=[...]` query to the database. But this can be optimized to not hit the database at all given that we have all the data we need in the index (and the signals to fire index updates are more reliable). This is something that's easily solved with some code refactoring and indexing changes.
I wouldn't mind taking this since it's in the zamboni and doesn't necessarily need a lot of deep elasticsearch knowledge. I've already been talking to willkg (author of elasticutils) about this too and have been mentally structuring the code. I think it would be helpful to split the AMO/MKT indexes. If we could do that I can start from scratch with exactly the fields we need in ES. We could maybe do this along with the work to add the new binary flags for app feature sets?
Yeah, I'm not sure how exactly to split / combine this with the other tasks. I just filled this bug, so the idea doesn't get lost in email nirvana.
Assignee: hschlichting → robhudson.mozbugs
I thought about this some this morning and I think this bug could be thought about as the following steps: 1. Upgrade to elasticutils 0.7 (which is current in development) 1.a. Install pyelasticsaerch required by elasticutils now. 2. Come up with a new schema for apps (a mapping type) to be used for the apps index. Code this also as a elasticutils mapping type (https://elasticutils.readthedocs.org/en/latest/queries.html#mapping-types). Then connect everywhere we use search to use this. Most importantly this is the Search API, but there are a few other areas in marketplace that hit ES outside of this (e.g. lookup tool, stats?) 3. Make sure AMO doesn't break. With EU 0.7 the backend switches from pyes to pyelasticsearch (which is better), but this means the `get_es` call that AMO uses no longer will exist. It should be trivial write our own `get_es` to get a connection that AMO then uses.
Number 3 above has been filed as bug 857156, and can be done first to AMO side is isolated away from elasticutils. Also, this will be useful to know what the breaking change are: https://elasticutils.readthedocs.org/en/latest/changelog.html
Whiteboard: [u=dev c=mkt-search p=2] → [u=dev c=mkt-search p=3]
Depends on: 857156
Depends on: 869595
Just posting a curl comparison I did with this waffled off then on. Tested by averaging 5 total times via curl: OFF: q=twitter, results=27, avg=2.4236 ON: q=twitter, results=27, avg=0.764 OFF: q=app, results=100, avg=3.2982 ON: q=app, results=100, avg=0.800
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
The waffle flag is "search-api-es".
You need to log in before you can comment on or make changes to this bug.