After search mapping changes, or when we encounter odd problems, it'd be nice to be able to perform a full re-index without breaking search while it happens. I'm thinking it'd be nice to do this from the existing admin panel. Kitsune has separate read and write indexes, so they can re-index into one and then switch over. This commit (plus some fix-ups after it) has most of the work for that:
https://github.com/mozilla/kitsune/commit/2faae150
We should do something similar. We may be able to use Elasticsearch index aliases (http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html) to swap indexes once a full re-index is complete. This bug is to come up with the plan and spawn off other bug(s).
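The alias swap suggested here comes down to one request to the `_aliases` endpoint holding a `remove` and an `add` action, which Elasticsearch applies atomically, so searches against the alias never hit a moment where it points at nothing. A minimal sketch that only builds the request body; the alias and index names are made up for illustration, and actually sending it (a `POST /_aliases` with some HTTP client) is left out.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for Elasticsearch's POST /_aliases endpoint.

    Both actions go in a single request, so the alias moves from
    old_index to new_index in one atomic step.
    """
    actions = []
    if old_index is not None:
        # Detach the alias from the index it currently points at.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach the alias to the freshly built index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Hypothetical names: "addons" alias, timestamped index names.
body = alias_swap_actions("addons", "addons-20120901", "addons-20120915")
print(json.dumps(body, sort_keys=True))
```

Passing `None` as `old_index` covers the very first run, when the alias does not exist yet.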
Prod is using an aliased index now. We can use this bug to come up with an automated solution to:
1. Create a new index named after a timestamp.
2. Set up the mapping and re-index into that index.
3. Perform a remove/add alias action to switch the alias to the new index once indexing is complete.
4. Remove the old index, which is no longer aliased.
And wire this up to the admin's "reindex" button and/or provide a nice management command to do this.
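The four steps can be walked through end to end against a fake cluster. A minimal sketch, assuming a hypothetical in-memory `FakeES` class (not the real Elasticsearch client) that only tracks indexes, mappings, documents and alias targets; `reindex` and the sample mapping are also illustrative names.

```python
import time


class FakeES:
    """Hypothetical in-memory stand-in for an Elasticsearch cluster."""

    def __init__(self):
        self.indexes = {}   # index name -> {doc_id: doc}
        self.mappings = {}  # index name -> mapping dict
        self.aliases = {}   # alias name -> index name

    def create_index(self, name, mapping):
        self.indexes[name] = {}
        self.mappings[name] = mapping

    def index_doc(self, index, doc_id, doc):
        self.indexes[index][doc_id] = doc

    def delete_index(self, name):
        del self.indexes[name]
        del self.mappings[name]

    def swap_alias(self, alias, new_index):
        # Real Elasticsearch would do this as one atomic remove+add
        # _aliases request; a single dict assignment plays that role here.
        old_index = self.aliases.get(alias)
        self.aliases[alias] = new_index
        return old_index


def reindex(es, alias, mapping, documents, now=None):
    # 1. Create a new index named after a UTC timestamp.
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(now))
    new_index = "%s-%s" % (alias, stamp)
    es.create_index(new_index, mapping)
    # 2. The mapping was set at creation time; now index every document.
    for doc_id, doc in documents.items():
        es.index_doc(new_index, doc_id, doc)
    # 3. Only after indexing is complete, point the alias at the new index.
    old_index = es.swap_alias(alias, new_index)
    # 4. Delete the old index, which nothing is aliased to any more.
    if old_index is not None and old_index != new_index:
        es.delete_index(old_index)
    return new_index


es = FakeES()
mapping = {"properties": {"name": {"type": "string"}}}
first = reindex(es, "addons", mapping, {1: {"name": "Firebug"}}, now=0)
second = reindex(es, "addons", mapping,
                 {1: {"name": "Firebug"}, 2: {"name": "Adblock Plus"}}, now=60)
```

After the second run only the newest index survives and the alias points at it, which is the state the admin "reindex" button or a management command would leave behind.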
Assignee: nobody → robhudson.mozbugs
Target Milestone: --- → 2012-09-06
Got most of the way. It needs a bit of refactoring and wiring up to celery task trees so we don't move the alias before indexing is done. WIP branch: https://github.com/robhudson/zamboni/tree/763722-offline-es-indexing
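Not moving the alias before indexing is done is a fan-out/barrier pattern: many indexing jobs run in parallel, and one callback (the alias flip) fires only after all of them finish, which is what celery calls a chord. A minimal sketch using the standard library's `concurrent.futures` instead of celery; `index_chunk`, `reindex_then_flip`, the chunk size and the dict standing in for the new index are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def index_chunk(target, chunk):
    # Hypothetical unit of work: write each document of one chunk into the
    # new index (a dict here). In zamboni this would be a celery task.
    for doc_id in chunk:
        target[doc_id] = {"id": doc_id}
    return len(chunk)


def reindex_then_flip(doc_ids, chunk_size, flip_alias):
    new_index = {}
    chunks = [doc_ids[i:i + chunk_size]
              for i in range(0, len(doc_ids), chunk_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(index_chunk, new_index, chunk)
                   for chunk in chunks]
        # Barrier: block until every chunk has finished (ALL_COMPLETED).
        done, _ = wait(futures)
    # result() re-raises any exception from a chunk, so a failed chunk means
    # the alias is never flipped onto a partially built index.
    indexed = sum(future.result() for future in done)
    flip_alias(new_index)
    return indexed


flipped = []
count = reindex_then_flip(list(range(10)), 3,
                          lambda idx: flipped.append(len(idx)))
```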
Target Milestone: 2012-09-06 → 2012-09-13
Tarek & Alexis are going to check this out. Thanks!
Assignee: robhudson.mozbugs → tarek
Target Milestone: --- → 2012-10-04
Okay, so I have a few more questions to make sure I fully understand this. In the implementation, I see you're using cronjobs (from django-cronjobs) where I thought we would be using celery tasks. Is there any reason for this? (Maybe it's only historical?) What I want to do is split this into multiple celery tasks and make them inter-dependent using celery-tasktree. This means converting the existing cronjob tasks into celery tasks. Is this the way to go, or am I missing something?
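The idea of inter-dependent tasks can be illustrated with a tiny dependency tree in which a node's children start only after the node itself has finished, and a failure stops the whole subtree. This is a sketch of the concept only: it does not use celery-tasktree's actual API, runs everything synchronously in one process, and `TaskNode`, `chain_steps` and the step names are hypothetical. The re-index steps form a linear tree, i.e. a chain.

```python
class TaskNode:
    """Hypothetical tree node: run func, then run each child in order."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.children = []

    def add(self, name, func):
        child = TaskNode(name, func)
        self.children.append(child)
        return child

    def run(self, log):
        self.func()  # if this raises, none of the children ever run
        log.append(self.name)
        for child in self.children:
            child.run(log)


def chain_steps(steps):
    """Link (name, func) pairs so each step is a child of the previous one."""
    root = node = None
    for name, func in steps:
        node = TaskNode(name, func) if node is None else node.add(name, func)
        root = root or node
    return root


def noop():
    pass


def failing_index():
    raise RuntimeError("indexing failed")


ok_log = []
chain_steps([("create_index", noop), ("index_all", noop),
             ("flip_alias", noop), ("delete_old_index", noop)]).run(ok_log)

failed_log = []
try:
    chain_steps([("create_index", noop), ("index_all", failing_index),
                 ("flip_alias", noop)]).run(failed_log)
except RuntimeError:
    pass  # flip_alias never ran, so the alias stays on the old index
```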
The code is in cron.py, but the intention is to use celery tasks, yes. Most of the changes in that commit pass the index in to the various methods rather than acting on a default index, so we can index into a new index while the site is still using the old one.
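Passing the index in, rather than hard-coding a default, is what lets the same indexing code serve both normal site updates and the offline re-index. A minimal sketch of that signature change; `index_addons`, `DEFAULT_INDEX` and the dict `store` standing in for Elasticsearch are hypothetical and not zamboni's real code.

```python
DEFAULT_INDEX = "addons"  # hypothetical alias the live site reads from


def index_addons(store, addons, index=None):
    """Write addons into `index`, defaulting to the live alias.

    `store` is a plain dict standing in for Elasticsearch.
    """
    index = index or DEFAULT_INDEX
    bucket = store.setdefault(index, {})
    for addon in addons:
        bucket[addon["id"]] = addon
    return index


store = {}
# Normal site update: no index given, so it goes to the live alias.
index_addons(store, [{"id": 1, "name": "Firebug"}])
# Re-index job: same function, explicitly aimed at the new index.
index_addons(store, [{"id": 1, "name": "Firebug"},
                     {"id": 2, "name": "Adblock Plus"}],
             index="addons-20120915")
```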
I have updated Rob's branch with the current master and started working on it at https://github.com/tarekziade/zamboni/tree/es-indexing and will address Kumar's and Rob's remarks there, including turning it into a management command, etc.
So, the branch is a nightmare to read; I suspect I'll redo it as a new branch later, but in the meantime you can follow development here: https://github.com/tarekziade/zamboni/blob/es-indexing/mkt/zadmin/management/commands/reindex.py (the rest of the code is just what Rob did to make the app aware of the re-indexing dance).
Thanks to Kumar, I have improved my git-fu and now have a clean, up-to-date branch: https://github.com/tarekziade/zamboni/compare/763722-es-indexing
So the initial idea is implemented and "works", but we found that we had not taken care of one edge case: anything that gets indexed while the re-indexing is in progress is lost. We're discussing the options here: https://github.com/tarekziade/zamboni/commit/69a117258d3145b1831f7d9622c3b270a98f2961#commitcomment-1953923
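The thread does not say which option was chosen. One common approach to this edge case is to dual-write: while a re-index is running, every normal update is written to both the live index and the new one, so nothing is lost when the alias flips. A minimal sketch of that option only; `ReindexState`, `save_doc`, the index names and the dict `store` standing in for Elasticsearch are hypothetical.

```python
class ReindexState:
    """Hypothetical bookkeeping for which index(es) an alias writes to."""

    def __init__(self, live):
        self.live = dict(live)  # alias -> index currently behind it
        self.pending = {}       # alias -> new index being built

    def start(self, alias, new_index):
        self.pending[alias] = new_index

    def finish(self, alias):
        # Called when the alias is flipped: the new index becomes live.
        self.live[alias] = self.pending.pop(alias)

    def write_targets(self, alias):
        # During a re-index, write to both the live and the new index so
        # updates made mid-run survive the flip.
        targets = [self.live[alias]]
        if alias in self.pending:
            targets.append(self.pending[alias])
        return targets


def save_doc(store, state, alias, doc_id, doc):
    for index in state.write_targets(alias):
        store.setdefault(index, {})[doc_id] = doc


store = {}
state = ReindexState({"addons": "addons-old"})
state.start("addons", "addons-new")
save_doc(store, state, "addons", 7, {"name": "edited mid-reindex"})
state.finish("addons")
save_doc(store, state, "addons", 8, {"name": "edited after flip"})
```

One caveat: if the bulk job reads a document before a live edit and writes it afterwards, the new index can still end up with the stale copy, so write ordering needs attention too.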
The branch is ready and functional (https://github.com/tarekziade/zamboni/compare/es-indexing-3), but I think we definitely want to fix bug 785414 before we push my branch into master.
Depends on: 785414
(In reply to Tarek Ziadé (:tarek) from comment #10)
> The branch is ready & functional here -
> https://github.com/tarekziade/zamboni/compare/es-indexing-3
>
> But I think we definitely want to fix bug 785414 before we push my branch
> into master

Ok, I bumped that bug and will bump this one.
Target Milestone: 2012-10-11 → 2012-10-25
Thanks. I will try to fix bug 785414, because we don't want this big patch to wait too long.
I propose that we just merge this after the 18th and work with Jenkins & Ops to validate that everything runs smoothly. Trying to fix the ES test cases seems to be a much bigger job.
No longer depends on: 785414
(In reply to Tarek Ziadé (:tarek) from comment #13)
> I propose that we just merge this after the 18th and work with Jenkins & Ops
> to validate that everything runs smoothly.

I agree.
Once bug 805236 is merged and bug 805861 is done, I will run a new re-index on -dev.
Is this bug worth keeping open? Is there more to do?
Nope, I'd say it's fixed.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard