Full elastic reindexing trips over live indexing

VERIFIED FIXED in 2012-01-24

Status

Product: support.mozilla.org
Component: Search
Priority: P1
Severity: normal
Status: VERIFIED FIXED
Opened: 6 years ago
Modified: 4 years ago

People

(Reporter: erik, Assigned: erik)

Tracking

Version: unspecified
Target Milestone: 2012-01-24

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: u=dev c=search s=2012.2 p=1 post-sprint)

(Assignee)

Description

6 years ago
A full reindexing run can blow up when it sets up the mappings if live indexing has already caused conflicting mappings to be inferred. Here's one such blow-up:

Task search.tasks.reindex_with_progress with id 1069624b-5651-4141-a2fe-6505bf35fe62 raised exception:
"ElasticSearchException(u'MergeMappingException[Merge failed with failures {[mapper [question_content] has different store values, mapper [question_content] has different term_vector values, mapper [question_content] has different index_analyzer, mapper [question_content] has different search_analyzer, mapper [answer_content] has different index_analyzer, mapper [answer_content] has different search_analyzer, mapper [title] has different index_analyzer, mapper [title] has different search_analyzer, mapper [replies] of different type, current_type [long], merged_type [integer], mapper [question_votes] of different type, current_type [long], merged_type [integer], mapper [answer_votes] of different type, current_type [long], merged_type [integer]]}]',)"


Task was called with args: [] kwargs: {'waffle_when_done': False}.

The contents of the full traceback was:

Traceback (most recent call last):
  File "/data/www/support.mozilla.com/kitsune/vendor/src/celery/celery/execute/trace.py", line 34, in trace
    return cls(states.SUCCESS, retval=fun(*args, **kwargs))
  File "/data/www/support.mozilla.com/kitsune/vendor/src/celery/celery/task/base.py", line 227, in __call__
    return self.run(*args, **kwargs)
  File "/data/www/support.mozilla.com/kitsune/vendor/src/celery/celery/app/__init__.py", line 141, in run
    return fun(*args, **kwargs)
  File "/data/www/support.mozilla.com/kitsune/apps/search/tasks.py", line 38, in reindex_with_progress
    for ratio in es_reindex_with_progress():
  File "/data/www/support.mozilla.com/kitsune/apps/search/es_utils.py", line 101, in <genexpr>
    return (float(done) / total for done, _ in
  File "/data/www/support.mozilla.com/kitsune/apps/search/models.py", line 128, in index_all
    es.put_mapping(doc_type, mapping, index)
  File "/data/www/support.mozilla.com/kitsune/vendor/packages/pyes/pyes/es.py", line 541, in put_mapping
    return self._send_request('PUT', path, mapping)
  File "/data/www/support.mozilla.com/kitsune/vendor/packages/pyes/pyes/es.py", line 223, in _send_request
    raise_if_error(response.status, decoded)
  File "/data/www/support.mozilla.com/kitsune/vendor/packages/pyes/pyes/convert_errors.py", line 77, in raise_if_error
    raise pyes.exceptions.ElasticSearchException(error, status, result)
ElasticSearchException: MergeMappingException[Merge failed with failures {[mapper [question_content] has different store values, mapper [question_content] has different term_vector values, mapper [question_content] has different index_analyzer, mapper [question_content] has different search_analyzer, mapper [answer_content] has different index_analyzer, mapper [answer_content] has different search_analyzer, mapper [title] has different index_analyzer, mapper [title] has different search_analyzer, mapper [replies] of different type, current_type [long], merged_type [integer], mapper [question_votes] of different type, current_type [long], merged_type [integer], mapper [answer_votes] of different type, current_type [long], merged_type [integer]]}]
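
To make the failure mode above concrete, here is a toy model of it. This is not Elasticsearch or pyes code: ToyIndex, infer_type and MergeMappingError are hypothetical names invented for illustration. It only mimics two behaviours: live indexing infers a type for unmapped fields (whole numbers become long), and a later explicit mapping that disagrees is rejected.

```python
# Toy model only: NOT Elasticsearch code. All names here are hypothetical.


class MergeMappingError(Exception):
    """Stand-in for the MergeMappingException seen in the traceback."""


def infer_type(value):
    """Mimic dynamic mapping: whole numbers are inferred as long."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


class ToyIndex(object):
    def __init__(self):
        self.mapping = {}  # field name -> type name

    def index(self, doc):
        """Live indexing: any field without a mapping gets an inferred one."""
        for field, value in doc.items():
            self.mapping.setdefault(field, infer_type(value))

    def put_mapping(self, mapping):
        """Explicit mapping: refuse to change the type of a mapped field."""
        failures = [
            "mapper [%s] of different type, current_type [%s], merged_type [%s]"
            % (field, self.mapping[field], new_type)
            for field, new_type in sorted(mapping.items())
            if field in self.mapping and self.mapping[field] != new_type
        ]
        if failures:
            raise MergeMappingError(", ".join(failures))
        self.mapping.update(mapping)
```

In this model, indexing {"replies": 3} before calling put_mapping({"replies": "integer"}) raises, while doing the two steps in the opposite order succeeds. That ordering dependence is the race this bug is about.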


The easy solution: do all the mapping-putting up front. Then indexing runs will at least fail fast, if they fail at all.
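
A minimal sketch of the easy solution, assuming a client that exposes put_mapping(doc_type, mapping, index) in the form seen in the traceback. The function name reindex_all, the index(doc, index, doc_type) call shape, and the two dict arguments are assumptions made for illustration, not kitsune's actual code.

```python
def reindex_all(es, index, mappings, docs_by_type):
    """Put every mapping first, then index the documents.

    Assumptions (hypothetical, for illustration only):
      es            object with put_mapping(doc_type, mapping, index)
                    and index(doc, index, doc_type)
      mappings      dict of doc type -> mapping dict
      docs_by_type  dict of doc type -> iterable of documents
    Returns the number of documents indexed.
    """
    # Phase 1: all mappings. A conflict raises here, before this run
    # has indexed a single document.
    for doc_type, mapping in sorted(mappings.items()):
        es.put_mapping(doc_type, mapping, index)

    # Phase 2: the slow part, reached only if every mapping was accepted.
    done = 0
    for doc_type, docs in sorted(docs_by_type.items()):
        for doc in docs:
            es.index(doc, index, doc_type)
            done += 1
    return done
```

This does not close the race with live indexing. It only moves any mapping failure to the start of the run instead of partway through it.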

The harder, more correct one: find an atomic way to delete an index and put the mappings so no live indexing can happen in between.
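
One possible direction for the harder solution, sketched under assumptions: Elasticsearch's create-index request accepts a body that carries mappings, so an index can be created with its mappings already in place rather than having them put afterwards. The helper name create_index_body and the example field are hypothetical, and this is not necessarily what the eventual fix did.

```python
import json


def create_index_body(mappings, settings=None):
    """Build a JSON body for a create-index request that includes mappings.

    `mappings` is a dict of doc type -> mapping dict; `settings` is an
    optional dict of index settings. Both are illustrative inputs.
    """
    body = {"mappings": dict(mappings)}
    if settings:
        body["settings"] = dict(settings)
    # sort_keys keeps the output stable, which makes it easy to compare.
    return json.dumps(body, sort_keys=True)


# Example: the type that conflicted in the traceback, declared up front.
example = create_index_body(
    {"question": {"properties": {"replies": {"type": "integer"}}}}
)
```

With a body like this, the index never exists without its mappings. The gap between deleting the old index and creating the new one remains, so this narrows the race rather than eliminating it.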

Comment 1

About a week ago, I added some code that lets us specify the index by doctype, and the reindexing code is completely oblivious to it. That might make fixing this funky, so I figured I'd mention it.
(Assignee)

Comment 2

6 years ago
I was going to ask what that was for and how badly we want to preserve it. So far, I haven't tripped over it.

Comment 3

Adding to current sprint so we can iterate quickly on getting ES working in prod.
Whiteboard: u=dev c=search s=2012.2 p=1 post-sprint
(Assignee)

Comment 4

6 years ago
Landed on master: https://github.com/mozilla/kitsune/commit/8ec3c41a75668b34c8de83060422da5f8d1e69c1

Coming on next.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Updated

6 years ago
Target Milestone: 2012Q1 → 2012-01-24
(Assignee)

Comment 5

6 years ago
Landed on next: http://github.com/mozilla/kitsune/commit/f2b4463c6255058db36c1e7462cfeb51dec58584
Status: RESOLVED → VERIFIED