Closed Bug 1380442 Opened 8 years ago Closed 8 years ago

Resolve reindexing errors from ES 1.x => 5.x

Categories

(Socorro :: Infra, task)

Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: miles, Assigned: miles)

References

Details

We are having difficulty reindexing the data from the stage Elasticsearch 1.x cluster into the new 5.x cluster. We are encountering cryptic errors that prevent us from successfully importing all of the data from most of the indices. More documentation to come.
Is it still that error about the key being, supposedly, just an empty string?
Yes - the error looks like this, and occurs on all of the failing indices:

```
{
  "index": "socorro201720",
  "type": "crash_reports",
  "id": "ebf4796e-42d1-4ff5-8afe-3d1d50170517",
  "cause": {
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "object field starting or ending with a [.] makes object resolution ambiguous: []"
    }
  },
  "status": 400
}
```
One thing we can try is to disable automatic mapping type creation: https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-mapping.html#_disabling_automatic_type_creation I am not sure if this is going to solve our issues, but I think it is worth a try. It won't hurt us, since we are probably not interested in that data anyway, and if I understand the feature correctly, it should make ES drop all unknown keys.
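A minimal sketch of what that could look like with the elasticsearch-py client, assuming an ES 5.x cluster where automatic type creation is controlled by the `index.mapper.dynamic` index setting (the host and index name here are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed cluster address

# Create the weekly index with automatic mapping type creation disabled.
# Documents with a type that has no explicit mapping are then rejected
# instead of triggering dynamic type creation.
es.indices.create(
    index="socorro201720",
    body={
        "settings": {
            "index.mapper.dynamic": False,
        }
    },
)
```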
This blocks the ES migration, so adding the blocker.
Blocks: 1322630
Assigning this to Miles. Miles and I sat on Vidyo and chatted about this for a bit. We did some tests and saw this error show up for keys that are the empty string, as well as for keys that start with a period. Miles wrote a prototype Painless inline script that drops empty-string keys from the document, and we're testing that out. If that works, then we may be "done" with regard to migrating data from the ES 1.x to 5.x clusters. We should also fix the Elasticsearch code in Socorro so that it never indexes documents that have bad keys. Once we're done with this bug, I'll create a follow-up bug for that work.
Assignee: nobody → miles
We need to fix the code that creates indices (in the processor?) so that the mappings contain `ignore_above`, per https://bugzilla.mozilla.org/show_bug.cgi?id=1322630#c44. I'm pretty confident that this is why I'm having issues with socorro201729, because it was created without those settings. Example of the error I'm seeing:

```
{
  "index" : "socorro201729",
  "type" : "crash_reports",
  "id" : "e0e7673d-8c6d-4ca2-bb1a-72bba0170719",
  "cause" : {
    "type" : "illegal_argument_exception",
    "reason" : "Document contains at least one immense term in field=\"raw_crash.TelemetryEnvironment\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[123, 34, 98, 117, 105, 108, 100, 34, 58, 123, 34, 97, 112, 112, 108, 105, 99, 97, 116, 105, 111, 110, 73, 100, 34, 58, 34, 123, 101, 99]...', original message: bytes can be at most 32766 in length; got 35082",
    "caused_by" : {
      "type" : "max_bytes_length_exceeded_exception",
      "reason" : "max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 35082"
    }
  },
  "status" : 400
}
```

Peter and I ran into this when we were first working on reindexing and solved it by adding `ignore_above`.
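For reference, a minimal sketch of an index mapping using `ignore_above` (the field, index name, and cutoff value here are illustrative, not Socorro's actual generated mappings):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed cluster address

# Keyword values longer than ignore_above characters are stored but not
# indexed, which avoids the "immense term" error for huge fields like
# raw_crash.TelemetryEnvironment. 10922 = 32766 / 3 covers the worst
# case of 3 bytes per UTF-8 character.
es.indices.create(
    index="socorro201729",
    body={
        "mappings": {
            "crash_reports": {
                "properties": {
                    "raw_crash": {
                        "properties": {
                            "TelemetryEnvironment": {
                                "type": "keyword",
                                "ignore_above": 10922,
                            }
                        }
                    }
                }
            }
        }
    },
)
```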
I've also run into this error while reindexing socorro201729:

```
{
  "index" : "socorro201729",
  "type" : "crash_reports",
  "id" : "5ec2b2a3-96e8-4fd8-a072-9e53f0170719",
  "cause" : {
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse [raw_crash.UptimeTS]",
    "caused_by" : {
      "type" : "number_format_exception",
      "reason" : "For input string: \"\""
    }
  },
  "status" : 400
}
```

This looks to be malformed data, which would be ignored if we used `"index.mapping.ignore_malformed": true` in our index creation settings. Needinfoing Adrian to look into and fix index creation.
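A minimal sketch of creating an index with that setting via elasticsearch-py (host and index name are assumptions; `index.mapping.ignore_malformed` is a static setting, so it has to be set at index creation):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed cluster address

# With ignore_malformed enabled, a value that fails to parse (e.g. an
# empty string in the numeric raw_crash.UptimeTS field) is skipped, and
# the rest of the document is still indexed.
es.indices.create(
    index="socorro201729",
    body={
        "settings": {
            "index.mapping.ignore_malformed": True,
        }
    },
)
```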
Flags: needinfo?(adrian)
We were able to work around some of the reindexing issues with a combination of the following Painless scripts:

```
ctx._source.remove("")
ctx._source.raw_crash.remove("")
```
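A minimal sketch of how such a script could be attached to a `_reindex` call with elasticsearch-py (the hosts, index names, and remote-reindex setup are assumptions; the Painless statements are the ones quoted above, with an added null check):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed 5.x cluster address

# Reindex from the old 1.x cluster into the new 5.x cluster, dropping
# empty-string keys from each document on the way through. Remote
# reindex also requires the source host to be whitelisted in the 5.x
# cluster's reindex.remote.whitelist setting.
es.reindex(
    body={
        "source": {
            "remote": {"host": "http://es1x.example.internal:9200"},  # assumed
            "index": "socorro201720",
        },
        "dest": {"index": "socorro201720"},
        "script": {
            "lang": "painless",
            "inline": (
                'ctx._source.remove("");'
                ' if (ctx._source.raw_crash != null) {'
                ' ctx._source.raw_crash.remove(""); }'
            ),
        },
    },
    wait_for_completion=False,
)
```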
We should definitely ignore malformed values, but only if (as I understand it) that means only the impacted field is dropped, not the entire document.
Flags: needinfo?(adrian)
The only indices that have problems importing at present are:

socorro201711 <- timed out weirdly, will retry (made it to 482255 docs / 10.7gb)
socorro201717 <- eventually had a batch size error, will retry (made it to 480000 docs / 22.3gb)
socorro201718 <- had a batch size error, will retry (made it to 52500 docs / 2.7gb)
socorro201719 <- eventually had a batch size error, will retry (made it to 272500 docs / 12.7gb)
socorro201725 <- [1]
socorro201727 <- lost to the sands of time
socorro201728 <- marooned on a desert isle

The available indices on the stage cluster are 05-26 and 29 (27 and 28 were the weeks we were on the 5.x cluster).

[1]:

```
{
  "index" : "socorro201725",
  "type" : "crash_reports",
  "id" : "596e0551-a5c4-48be-9a7a-f1c7f0170621",
  "cause" : {
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse",
    "caused_by" : {
      "type" : "array_index_out_of_bounds_exception",
      "reason" : "-1"
    }
  },
  "status" : 400
}
```

There's a key raw_crash["."] in the crash. Adding that to the Painless filtering, as sketched below.
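The exact extended script isn't recorded in this bug, so the following is an assumed reconstruction: a Painless snippet that drops both the empty-string and "." keys, which would replace the `inline` script in the earlier reindex sketch.

```python
# Painless script body for the _reindex call; drops the known-bad keys.
# (Illustrative reconstruction, not the exact script that was run.)
BAD_KEY_SCRIPT = """
    ctx._source.remove("");
    if (ctx._source.raw_crash != null) {
        ctx._source.raw_crash.remove("");
        ctx._source.raw_crash.remove(".");
    }
"""
```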
OK, at this point I have been able to successfully reindex and import all of the stage data into the new ops stage ES 5.3.3 cluster. I'm confident that if we run into similar issues with fields that we cannot handle when doing the prod import, we will be able to resolve them. I'm going to go ahead and start importing data from prod over the weekend.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED