Closed Bug 556690 Opened 14 years ago Closed 14 years ago

Socorro Processor cascade failure

Categories: Socorro :: General, task
Platform: x86, Linux
Type: task
Priority: Not set
Severity: blocker
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: lars, Unassigned
Attachments: 1 file

Bug 556679 - we noticed that processors were failing

After aravind used the logs to get me an offending crash, I discovered a new type of minidump_stackwalk failure: it was returning garbage outside of the ASCII range. The processor was unable to parse that output and logged the failure. Then it tried to log the error into the database in the 'processor_notes' field. This failed at the database because the characters were not valid UTF-8. That raised another exception, which the processor caught and tried to log to the database, which failed again. Running out of error handlers, the processor terminated the thread that was experiencing this cascade of errors. There has never been code to replace failed threads in the processor, so once every thread had hit one of these crashes, the processors were rendered inert.

The quick patch was to not try to quote the failed output in the 'processor_notes'.  There may be a better solution.
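
To illustrate the cascade and the quick patch, here is a minimal Python 3 sketch. It is not the Socorro code: `save_note`, `NoteStoreError`, `report_quoting_output` and `report_without_quoting` are hypothetical names, and `save_note` only stands in for the database rejecting text that is not valid UTF-8.

```python
class NoteStoreError(Exception):
    """Stand-in for the database rejecting a note that is not valid UTF-8."""


def save_note(note):
    # Hypothetical stand-in for writing to the 'processor_notes' field.
    try:
        note.decode("utf-8")
    except UnicodeDecodeError:
        raise NoteStoreError("note is not valid UTF-8")
    return note


def report_quoting_output(bad_line):
    """Quote the raw line in the note, as the failing code path did."""
    try:
        save_note(b"unable to parse line: " + bad_line)
    except NoteStoreError:
        # The handler quotes the same bytes, so it fails the same way and
        # the exception escapes to the caller (in the bug, the thread).
        save_note(b"error while logging: " + bad_line)


def report_without_quoting(bad_line):
    """Quick patch: describe the failure without quoting the output."""
    return save_note(b"unable to parse a line of minidump_stackwalk output")
```

With a line such as `b"w\xff\xfe"`, the first function lets `NoteStoreError` escape, while the second stores a plain ASCII note and returns normally.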

This fix _must_ be propagated on to version 1.6
Severity: normal → blocker
Target Milestone: --- → 1.6
Can you get me a copy of the offending dump?
This is the dump that aravind sent.  I used it to reproduce the problem with the processor.  The processor reports that it is unable to parse an output line and then quotes it as, "w���".  Just ask if you want the corresponding json.
Revision 1922 on trunk resolves this problem in this manner:

On reading lines from the output of minidump_stackwalk, the processor immediately tries to convert the line to UTF-8. If that fails, it catches the exception and logs the error, using the exception's own seven-bit ASCII translation of the offending character as the text of the log. The processor then throws the line away and continues to the next one.

This technique ensures that no debug information is lost and the processor can go on to try to make a signature.
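
The approach described above can be sketched roughly as follows. This is a Python 3 illustration under assumptions, not the code from revision 1922: the function name `decode_stackwalk_lines` and the logger name are hypothetical, and the input is assumed to be an iterable of `bytes` lines.

```python
import logging

logger = logging.getLogger("processor_sketch")  # hypothetical logger name


def decode_stackwalk_lines(raw_lines):
    """Yield lines that decode cleanly as UTF-8; log and skip the rest."""
    for line_number, raw in enumerate(raw_lines, start=1):
        try:
            yield raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # The exception message is plain ASCII and names the offending
            # byte in hex, so it is safe to log without quoting the line.
            logger.warning(
                "line %d of stackwalk output is not UTF-8: %s",
                line_number,
                exc,
            )
```

For example, feeding it `[b"ok", b"w\xff\xfe", b"fine"]` yields only the two clean lines and logs one warning for the second.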
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Interestingly, my fix works great on my own system and on khan, but fails on staging. It is easy to see why it fails on staging; the more difficult problem is why it succeeds on the other two systems...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 556888
What a nightmare this has become. The problem in Comment #4 comes from using different versions of the Python module 'simplejson'. It appears that the behavior of that module changed over several versions in its response to malformed (neither ASCII nor UTF-8) characters.

The Socorro processor has now been changed to escape any character it receives that is higher than chr(127) as a hex pattern (for example, \xff). This means that the processor now writes correct JSON.
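
A minimal sketch of that kind of escaping, written for Python 3 and operating on `bytes`, might look like this. The function name `escape_high_bytes` is hypothetical and this is not the processor's actual implementation; it only shows that the escaped text is pure ASCII and therefore serializes as JSON regardless of how the JSON library treats malformed input.

```python
import json


def escape_high_bytes(raw):
    """Return an ASCII-only str; bytes above 127 become a literal \\xNN."""
    return "".join(
        chr(b) if b < 128 else "\\x%02x" % b
        for b in raw
    )


# Usage: the escaped line is plain ASCII, so it round-trips through JSON.
escaped = escape_high_bytes(b"w\xff\xfe")
payload = json.dumps({"line": escaped})
```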

Unfortunately, while the JSON is correct, the form of the module section of the minidump_stackwalk output is not. The Socorro UI appears to be unable to handle it and fails. See Bug 556888.
Fixed and awaiting deployment in 1.6.
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro