Bug 810313 describes a crash reprocessing scenario involving 380K crashes. 19K of those crashes failed to get reprocessed. I reprocessed them in batches on Saturday evening. Of these 19K crashes, 81 failed to reprocess. I traced that to a temporary HBase failure that was not a connection failure. For a few moments during processing, the HBase Python library was returning KeyErrors when trying to read the binary dump from HBase. Since that exception is not a retriable I/O error, the processor logs it in the processor notes and moves on to the next crash. I sent the failed crashes through reprocessing again, and this time HBase had no trouble finding the binary dumps and raised no KeyError exceptions.

The Python HBase client library shouldn't remap whatever Thrift error it was getting to a Python KeyError. The error loses its context, and the processor can't distinguish a retriable operational error from a non-retriable programming error. Though, I must admit, who would have expected that a missing key would be something one could resolve by just trying again later?
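To illustrate the distinction I'm asking for, here is a minimal sketch, not the actual client or processor code. All names in it (`LowLevelTransportError`, `TransientStorageError`, `fetch_dump`, `fetch_dump_with_retries`, `raw_get`) are hypothetical, and `LowLevelTransportError` merely stands in for whatever Thrift-level error the client was really seeing, which I don't know. The idea is that the client wraps operational failures in a dedicated exception type, chained to the original error, so the caller can retry those and still let a genuine KeyError pass through untouched.

```python
import time


class LowLevelTransportError(Exception):
    """Hypothetical stand-in for the underlying Thrift-level error."""


class TransientStorageError(Exception):
    """Hypothetical: an operational failure that may succeed on a later attempt."""


def fetch_dump(raw_get, crash_id):
    """Read a dump via raw_get, translating transport errors but keeping context.

    raw_get is an assumed callable taking a crash id and returning bytes.
    A KeyError raised by raw_get is deliberately not caught here.
    """
    try:
        return raw_get(crash_id)
    except LowLevelTransportError as err:
        # "from err" keeps the original exception attached as __cause__.
        raise TransientStorageError(
            "could not read dump for %s" % crash_id
        ) from err


def fetch_dump_with_retries(raw_get, crash_id, attempts=3, delay=0.0):
    """Retry only errors marked transient; anything else propagates at once."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch_dump(raw_get, crash_id)
        except TransientStorageError:
            if attempt == attempts:
                raise  # out of attempts: surface the transient error
            time.sleep(delay)
```

With this split, a processor could log a KeyError in the processor notes as a real missing dump, while a `TransientStorageError` signals that the crash is worth trying again.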
This problem appears to no longer happen with our re-written HBase client library.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → INVALID