Closed Bug 763722 Opened 12 years ago Closed 11 years ago

Come up with an offline reindexing solution

Categories

(addons.mozilla.org Graveyard :: Search, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED
2012-11-15

People

(Reporter: robhudson, Assigned: tarek)

References

Details

After search mapping changes, or when we run into odd problems, it would be nice to perform a full re-index without breaking search while it runs. Ideally this could be triggered from the existing admin panel.

Kitsune has separate read and write indexes, so they can re-index into one and then switch over. This commit (plus some fix-ups after it) has most of the work for that: https://github.com/mozilla/kitsune/commit/2faae150

We should do something similar. We may be able to use elasticsearch index aliases (http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html) for swapping indexes once a full re-index is complete. 

This bug is to come up with the plan and spawn off other bug(s).
Prod is using an aliased index now. We can use this bug to come up with an automated solution to:

1. Create a new index based on a timestamp.
2. Set up the mapping and re-index into that index.
3. Perform a remove/add alias to switch the alias to the new index once indexing is complete.
4. Remove the old index that is no longer aliased.

And wire this up to the admin's "reindex" button and/or provide a nice management command to do this.
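The four steps above can be sketched end to end against a hypothetical in-memory stand-in for the cluster (`FakeCluster` is illustrative only, not an Elasticsearch client API). The point is the ordering: the alias moves only after the new index is fully populated, and only the index that is no longer aliased gets deleted. Second-resolution timestamps assume two reindexes are never started within the same second.

```python
import time


class FakeCluster:
    """Hypothetical in-memory stand-in for an Elasticsearch cluster."""

    def __init__(self):
        self.mappings = {}  # index name -> mapping
        self.docs = {}      # index name -> {doc_id: doc}
        self.aliases = {}   # alias name -> index name

    def create_index(self, name, mapping):
        self.mappings[name] = mapping
        self.docs[name] = {}

    def index_doc(self, index, doc_id, doc):
        self.docs[index][doc_id] = doc

    def swap_alias(self, alias, new_index):
        # Mirrors a single remove/add _aliases call; returns the old target.
        old_index = self.aliases.get(alias)
        self.aliases[alias] = new_index
        return old_index

    def delete_index(self, name):
        del self.mappings[name]
        del self.docs[name]


def reindex(cluster, alias, mapping, docs, now=None):
    # 1. Create a new index whose name is based on a timestamp.
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(now))
    new_index = f"{alias}-{stamp}"
    # 2. Set up the mapping and index every document into the new index.
    cluster.create_index(new_index, mapping)
    for doc_id, doc in docs.items():
        cluster.index_doc(new_index, doc_id, doc)
    # 3. Switch the alias only once indexing is complete.
    old_index = cluster.swap_alias(alias, new_index)
    # 4. Remove the old index, which is no longer aliased.
    if old_index is not None and old_index != new_index:
        cluster.delete_index(old_index)
    return new_index


cluster = FakeCluster()
mapping = {"properties": {"name": {"type": "string"}}}  # hypothetical mapping
first = reindex(cluster, "addons", mapping, {1: {"name": "a"}}, now=0)
second = reindex(cluster, "addons", mapping,
                 {1: {"name": "a"}, 2: {"name": "b"}}, now=60)
print(first, second)         # addons-19700101000000 addons-19700101000100
print(cluster.aliases)       # {'addons': 'addons-19700101000100'}
print(sorted(cluster.docs))  # ['addons-19700101000100']
```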
Assignee: nobody → robhudson.mozbugs
Target Milestone: --- → 2012-09-06
Got most of the way. It needs a bit of refactoring, plus wiring up to celery task trees so we don't move the alias before indexing is done.

WIP branch:
https://github.com/robhudson/zamboni/tree/763722-offline-es-indexing
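The dependency described here (parallel indexing tasks first, alias switch strictly afterwards) is what celery-tasktree would provide. As a stand-in that runs without a broker, this sketch uses the standard library's `concurrent.futures` instead of celery; `index_chunk` and the `swap_alias` callback are hypothetical placeholders for the real indexing task and alias call.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(ids, size):
    """Split a list of ids into consecutive chunks of at most `size`."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def index_chunk(index, ids):
    # Hypothetical worker: pretend to index `ids` into `index`.
    return [(index, doc_id) for doc_id in ids]


def reindex_then_swap(ids, new_index, swap_alias, chunk_size=100, workers=4):
    """Index all chunks in parallel; call `swap_alias` only when every chunk is done."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(index_chunk, new_index, chunk)
                   for chunk in chunks(ids, chunk_size)]
        # .result() blocks and re-raises any worker exception, so a failed
        # chunk aborts the run before the alias is touched.
        indexed = [pair for f in futures for pair in f.result()]
    swap_alias(new_index)
    return indexed


events = []
done = reindex_then_swap(list(range(250)), "addons-new",
                         swap_alias=lambda idx: events.append(("swap", idx)))
print(len(done), events)  # 250 [('swap', 'addons-new')]
```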
Target Milestone: 2012-09-06 → 2012-09-13
Target Milestone: 2012-09-13 → 2012-09-20
Target Milestone: 2012-09-20 → ---
Tarek & Alexis are going to check this out.  Thanks!
Assignee: robhudson.mozbugs → tarek
Target Milestone: --- → 2012-10-04
Okay, I have a few more questions to make sure I fully understand this.

In the implementation, I see you're using cronjobs (from django-cronjobs) where I thought we would be using celery tasks. Is there a reason for this, or is it only historical?

What I want to do is split this into multiple celery tasks and make them interdependent using celery-tasktree. That means converting the current cronjob tasks into celery tasks.

Is this the way to go or am I missing something?
The code is in cron.py, but yes, the intention is to use celery tasks. Most of the changes in that commit pass the index into the various methods rather than acting on a default index, so we can index into a new index while the site is still using the old one.
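The "pass the index in" refactor can be illustrated with a small sketch: the indexing function takes an optional target index and falls back to the live alias, so the same code serves both normal updates and a full reindex into a fresh index. `DEFAULT_ALIAS`, `fetch` and `index_doc` are hypothetical names standing in for zamboni's own settings, database lookup and Elasticsearch call.

```python
DEFAULT_ALIAS = "addons"  # hypothetical alias the live site searches


def index_addons(ids, fetch, index_doc, index=None):
    """Index the given add-on ids into `index` (defaults to the live alias).

    `fetch(addon_id)` returns the document to store; `index_doc(index,
    addon_id, doc)` writes it. Both are hypothetical callables.
    """
    target = index or DEFAULT_ALIAS
    for addon_id in ids:
        index_doc(target, addon_id, fetch(addon_id))
    return target


writes = []


def record(index, addon_id, doc):
    writes.append((index, addon_id))


def fetch(addon_id):
    return {"id": addon_id}


index_addons([1, 2], fetch, record)                       # normal update
index_addons([3], fetch, record, index="addons-20121115")  # full reindex
print(writes)  # [('addons', 1), ('addons', 2), ('addons-20121115', 3)]
```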
I have updated Rob's branch with current master and started working on it at https://github.com/tarekziade/zamboni/tree/es-indexing

I will address Kumar's and Rob's remarks there, make it a management command, etc.
So, the branch is a nightmare to read. I suspect I'll start a fresh branch later, but in the meantime you can follow development here: https://github.com/tarekziade/zamboni/blob/es-indexing/mkt/zadmin/management/commands/reindex.py

(The rest of the code is just what Rob did to make the app aware of the reindexing dance.)
Thanks to Kumar, I have improved my git-fu and now have a clean, up-to-date branch: https://github.com/tarekziade/zamboni/compare/763722-es-indexing
So the initial idea is implemented and "works", but we found that we did not handle one edge case: anything that gets indexed while the reindexing is in progress is lost.

We're discussing the options here: https://github.com/tarekziade/zamboni/commit/69a117258d3145b1831f7d9622c3b270a98f2961#commitcomment-1953923
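The thread does not record which fix was chosen. One common approach, shown here only as a hedged sketch with hypothetical names (`IndexRouter`, `index_update`), is to send real-time updates to both the live index and the index being rebuilt for as long as a reindex is running. On its own this still lets the bulk pass overwrite a newer real-time write with an older copy it read earlier, so that race would need separate handling (for example, document versioning or re-indexing touched ids before the switch).

```python
class IndexRouter:
    """Hypothetical helper deciding which indexes a real-time update goes to."""

    def __init__(self, live_index):
        self.live_index = live_index
        self.new_index = None  # set while a full reindex is running

    def start_reindex(self, new_index):
        self.new_index = new_index

    def finish_reindex(self):
        # Call right after the alias has been switched to the new index.
        if self.new_index is None:
            raise RuntimeError("no reindex in progress")
        self.live_index, self.new_index = self.new_index, None

    def targets(self):
        if self.new_index is None:
            return [self.live_index]
        return [self.live_index, self.new_index]


def index_update(router, doc_id, doc, index_doc):
    """Send one real-time update to every index that currently needs it."""
    for index in router.targets():
        index_doc(index, doc_id, doc)


writes = []


def record(index, doc_id, doc):
    writes.append((index, doc_id))


router = IndexRouter("addons-old")
index_update(router, 1, {}, record)
router.start_reindex("addons-new")
index_update(router, 2, {}, record)  # reaches both indexes
router.finish_reindex()
index_update(router, 3, {}, record)
print(writes)
# [('addons-old', 1), ('addons-old', 2), ('addons-new', 2), ('addons-new', 3)]
```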
Target Milestone: 2012-10-04 → 2012-10-11
The branch is ready and functional here: https://github.com/tarekziade/zamboni/compare/es-indexing-3

But I think we definitely want to fix bug 785414 before we push my branch into master.
Depends on: 785414
(In reply to Tarek Ziadé (:tarek) from comment #10)
> The branch is ready & functional here -
> https://github.com/tarekziade/zamboni/compare/es-indexing-3
> 
> But I think we definitely want to fix bug 785414 before we push my branch
> into master

Ok, I bumped that bug and will bump this one.
Target Milestone: 2012-10-11 → 2012-10-25
Thanks. I will try to fix bug 785414, because we don't want this big patch to wait too long.
I propose that we just merge this after the 18th and work with Jenkins & Ops to validate that everything runs smoothly.

Trying to fix the ES test cases seems to be a much bigger job.
No longer depends on: 785414
(In reply to Tarek Ziadé (:tarek) from comment #13)
> I propose that we just merge this after the 18th and work with Jenkins & Ops
> to validate that everything runs smoothly. 

I agree.
Once bug 805236 is merged and bug 805861 is done, I will run a new reindex on -dev.
Depends on: 805236, 805861
Target Milestone: 2012-10-25 → 2012-11-01
Depends on: 808505
Target Milestone: 2012-11-01 → 2012-11-15
Is this bug worth keeping open?  Is there more to do?
Nope, I'd say it's fixed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard