Closed Bug 1362434 Opened 5 years ago Closed 4 years ago
Send ALL processor exceptions to Sentry
We have Raven installed and working in the processor thanks to https://github.com/mozilla/socorro/commit/02ed76354658367fe17766da7382ad9d476874f7 but that commit message is just not accurate. We do get Sentry errors from the processor (in stage and prod) but they're only around the crash storage piece. Earlier this week we had unexpected exceptions within a processor rule (either in the _predicate() or the _action() call) and nobody noticed until, by chance, I watched the /var/log/messages on one of the processor prod nodes. * Will's point is that much of the code within the processor rules system is NOT ready to be pointed to something like Sentry. The code is too OK with exceptions happening. (Which is odd in my view because if it's OK that exceptions happen, then why bother to log them?) * We might get a crazy flood of Sentry exceptions if we start injecting Raven in the processor rule execution loop. * Sentry is pretty good at just incrementing a counter. * Generally, as a team, we prefer to surface errors and deal with them rather than relying on logging. If an error isn't important then the code isn't important. * Most likely, Processor 2017 will be done very differently and have a different approach to exception handling. A better one. But Processor 2017 is many months away. * This week we had a terrible exception happening in processors that required quick feet and a slightly rushed afternoon production hotfix.
By the way, later that same day I found another exception repeatedly happening in the processor logs. https://bugzilla.mozilla.org/show_bug.cgi?id=1362125 This one might be an example of something that's NOT worth correcting. Perhaps it's fine to fail the processing if the json_dump doesn't have a `crashing_thread` key. Who knows?
PR https://github.com/mozilla/socorro/pull/3762 See https://github.com/mozilla/socorro/pull/3762#issuecomment-300297587 for why there's more work to do on this topic.
Commit pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/60f074c1bec1c2357e2345266406e0d98febcdaa bug 1362434 - send processor exceptions to sentry (#3762) * bug 1362434 - send processor exceptions to sentry * addressing review comments
In Comment #2 I pointed out at least two more places where exceptions are swallowed. Comment #3 is just one of these "holes" fixed. I'm not intending to own this bug any more for the other places in the processor. Perhaps someone else takes over or we decide to not bother more until we re-write the processor.
Assignee: peterbe → nobody
I redid Rule in bug #1505233 so it's not doing any error handling at all anymore and instead Processor2015.process_crash is now doing the error handling. When a rule throws an error--regardless of whether it's in predicate or action--it will send an item to sentry and add a note to the processor notes in the processed crash. This lets us reprocess crashes that were affected. Previously rules sent errors to sentry, but it was a lot of code to do it and it wasn't clear if there were cases where errors might slip through. Comment #2 points out these two places where errors might get swallowed: 1. https://github.com/mozilla-services/socorro/blob/fc6841672e8179cd6721a52cdc1a34d233338bf2/socorro/processor/processor_2015.py#L243-L252 2. https://github.com/mozilla-services/socorro/blob/fc6841672e8179cd6721a52cdc1a34d233338bf2/socorro/processor/processor_app.py#L103-L113 In redoing Rule and moving the error handling to Processor2015.process_crash, I covered the first one. The second one is covered by bug #1426257. I think we should work on that there, so I'm going to have this bug depend on that one.
I looked into bug #1426257 and that's now resolved. I think this is all set now. Yay!
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.