Come up with an offline reindexing solution

RESOLVED FIXED in 2012-11-15

Status

addons.mozilla.org Graveyard
Search
RESOLVED FIXED
6 years ago
2 years ago

People

(Reporter: robhudson, Assigned: tarek)

Tracking

unspecified
2012-11-15

Details

(Reporter)

Description

6 years ago
After search mapping changes or when we encounter odd problems, it'd be nice to perform a full re-index without breaking search while it happens. I'm thinking it'd be nice to do from the existing admin panel.

kitsune has both read/write indexes so they can re-index, then switch over.  This commit (plus some fix-ups after) has most of the work for that: https://github.com/mozilla/kitsune/commit/2faae150

We should do something similar. We may be able to use elasticsearch index aliases (http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html) for swapping indexes once a full re-index is complete. 

This bug is to come up with the plan and spawn off other bug(s).
(Reporter)

Comment 1

6 years ago
Prod is using an aliased index now. We can repurpose this bug to come up with an automated way to:

1. Create a new index based on a timestamp.
2. Set up the mapping and re-index into that index.
3. Perform a remove/add alias to switch the alias to the new index once indexing is complete.
4. Remove the old index that is no longer aliased.

And wire this up to the admin's "reindex" button and/or have a nice management command to do this.
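
The four steps above can be sketched roughly as follows. This is only an illustration, not the code from the zamboni branch: the helper names, the `addons` alias, and the timestamp format are assumptions. The dictionary returned by `alias_swap_actions` follows the `actions` body that Elasticsearch's `_aliases` endpoint accepts, where a remove and an add sent in the same request are applied atomically.

```python
from datetime import datetime, timezone


def timestamped_index_name(alias, now=None):
    """Return a new index name such as 'addons-20121115093000'.

    The '<alias>-<timestamp>' naming scheme is an assumption for this sketch.
    """
    now = now or datetime.now(timezone.utc)
    return "{}-{}".format(alias, now.strftime("%Y%m%d%H%M%S"))


def alias_swap_actions(alias, old_index, new_index):
    """Build the request body for a POST to Elasticsearch's /_aliases.

    Putting the remove and the add in one request means searches never
    see a moment where the alias points at no index.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

Once the swap has been accepted, the old index is no longer aliased and can be deleted (step 4).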
Assignee: nobody → robhudson.mozbugs
Target Milestone: --- → 2012-09-06
(Reporter)

Comment 2

6 years ago
Got most of the way. Needs a bit of refactor and wiring up to celery task trees so we don't move the alias before indexing is done.

WIP branch:
https://github.com/robhudson/zamboni/tree/763722-offline-es-indexing
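
To illustrate the ordering constraint (the alias must not move until every indexing task has finished), here is a minimal sketch that uses the standard library's thread pool as a stand-in for celery task trees. The names `reindex_then_swap`, `index_chunk`, and `swap_alias` are hypothetical; the real branch would express the same dependency with celery tasks.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def reindex_then_swap(chunks, index_chunk, swap_alias, max_workers=4):
    """Index every chunk, then call swap_alias() only if all succeeded."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(index_chunk, chunk) for chunk in chunks]
        # Block until every indexing job has either finished or failed.
        wait(futures)
    # result() re-raises any exception from a worker, so a failed chunk
    # aborts here and the alias keeps pointing at the old index.
    for future in futures:
        future.result()
    swap_alias()
```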
Target Milestone: 2012-09-06 → 2012-09-13
(Reporter)

Updated

6 years ago
Target Milestone: 2012-09-13 → 2012-09-20
(Reporter)

Updated

6 years ago
Target Milestone: 2012-09-20 → ---
Tarek & Alexis are going to check this out.  Thanks!
Assignee: robhudson.mozbugs → tarek
Target Milestone: --- → 2012-10-04
Okay, so I have a few more questions to make sure I fully understand this.

In the implementation, I see you're using cronjobs (from django-cronjobs) where I thought we would be using celery tasks. Is there any reason for this? (Maybe it's only historical?)

What I wanted to do is split this into multiple celery tasks and make them inter-dependent using celery-tasktree. This means the existing cronjobs need to be converted to celery tasks.

Is this the way to go or am I missing something?
(Reporter)

Comment 5

6 years ago
The code is in cron.py but the intention is to use celery tasks, yes. Most of the changes in that commit are to pass the index in to the various methods rather than act on a default index so we can index into a new index while the site is still using the old index.
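
As a rough illustration of that change, the sketch below threads an optional `index` argument through an indexing helper instead of always writing to a default. The in-memory `store` dictionary, `DEFAULT_INDEX`, and `index_documents` are assumptions made for the example, not names from the commit.

```python
DEFAULT_INDEX = "addons"  # hypothetical alias the live site reads from


def index_documents(store, docs, index=None):
    """Write docs into the given index, falling back to the default.

    `store` is a plain dict standing in for Elasticsearch:
    {index_name: {doc_id: doc}}.
    """
    target = index or DEFAULT_INDEX
    bucket = store.setdefault(target, {})
    for doc in docs:
        bucket[doc["id"]] = doc
    return target
```

Callers that omit `index` keep the old behaviour, while the reindex job can pass the name of the new index and leave the live one untouched.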
(Assignee)

Comment 6

6 years ago
I have updated Rob's branch with the current master and started working at https://github.com/tarekziade/zamboni/tree/es-indexing

I will address Kumar's and Rob's remarks there, make it a management command, etc.
(Assignee)

Comment 7

6 years ago
The branch is a nightmare to read. I suspect I'll redo it as a new branch later, but in the meantime you can follow development here: https://github.com/tarekziade/zamboni/blob/es-indexing/mkt/zadmin/management/commands/reindex.py

(The rest of the code is just what Rob did to make the app aware of the reindexing dance.)
(Assignee)

Comment 8

6 years ago
Thanks to Kumar, I have improved my git-fu and now have a clean, up-to-date branch: https://github.com/tarekziade/zamboni/compare/763722-es-indexing
(Assignee)

Comment 9

6 years ago
So the initial idea is implemented and "works", but we found that we missed one edge case: documents that get indexed while the reindex is running are lost.

We're discussing the options here: https://github.com/tarekziade/zamboni/commit/69a117258d3145b1831f7d9622c3b270a98f2961#commitcomment-1953923
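
One common way to close this kind of gap, shown here purely as an assumption and not necessarily what the linked discussion settled on, is to send live writes to both the old and the new index while a reindex is in progress. The `IndexRouter` class below is hypothetical.

```python
class IndexRouter:
    """Decide which indexes a live write should go to."""

    def __init__(self, live_index):
        self.live_index = live_index
        self.building_index = None  # set only while a reindex is running

    def start_reindex(self, new_index):
        self.building_index = new_index

    def finish_reindex(self):
        # Promote the freshly built index; writes go to one index again.
        if self.building_index is not None:
            self.live_index = self.building_index
            self.building_index = None

    def write_targets(self):
        if self.building_index is None:
            return [self.live_index]
        # During a reindex, write to both so the new index misses nothing.
        return [self.live_index, self.building_index]
```

Note that this alone does not stop the bulk pass from overwriting a newer document with an older copy, so some ordering or versioning of writes may still be needed.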
(Assignee)

Updated

6 years ago
Target Milestone: 2012-10-04 → 2012-10-11
(Assignee)

Comment 10

6 years ago
The branch is ready & functional here - https://github.com/tarekziade/zamboni/compare/es-indexing-3

But I think we definitely want to fix bug 785414 before we push my branch into master
Depends on: 785414
(In reply to Tarek Ziadé (:tarek) from comment #10)
> The branch is ready & functional here -
> https://github.com/tarekziade/zamboni/compare/es-indexing-3
> 
> But I think we definitely want to fix bug 785414 before we push my branch
> into master

Ok, I bumped that bug and will bump this one.
Target Milestone: 2012-10-11 → 2012-10-25
(Assignee)

Comment 12

6 years ago
Thanks. I will try to fix bug 785414, because we don't want this big patch to wait too long.
(Assignee)

Comment 13

6 years ago
I propose that we just merge this after the 18th and work with Jenkins & Ops to validate that everything runs smoothly. 

Fixing the ES test cases seems to be a much bigger job.
No longer depends on: 785414
(Reporter)

Comment 14

6 years ago
(In reply to Tarek Ziadé (:tarek) from comment #13)
> I propose that we just merge this after the 18th and work with Jenkins & Ops
> to validate that everything runs smoothly. 

I agree.
(Assignee)

Comment 15

6 years ago
Once bug 805236 is merged and bug 805861 is done, I will run a new reindex on -dev.
Depends on: 805236, 805861
Target Milestone: 2012-10-25 → 2012-11-01
(Assignee)

Updated

6 years ago
Depends on: 808505
Target Milestone: 2012-11-01 → 2012-11-15
Is this bug worth keeping open?  Is there more to do?
(Reporter)

Comment 17

5 years ago
Nope, I'd say it's fixed.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard