All Socorro Processors failing on memory error

RESOLVED FIXED

People

(Reporter: lars, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

3 years ago
At approximately 7:30am PST, all the Socorro processors died within moments of each other.  Investigation showed that the Python module ujson was core dumping and killing the entire processor process.  For a reason that is not clear, the automatic restart was failing.  Interestingly, while the problem is recurring every few minutes on all of the processors, the automatic restart succeeds 95% of the time.

The problem appears to happen in the processor rule "OutOfMemoryBinaryRule" when it attempts to read the json "memory report" submitted by the client.  ujson raises an unrecoverable "double free" or "memory corruption" error.  

There is no history of this problem happening in the past.  It began suddenly and continues to recur every few minutes.  It is not the same single crash repeating over and over; each crash that triggers the problem is new.  Something changed on the Web that induces a crash in Firefox that in turn induces a crash in ujson, which brings the processors down.

Experimenting with a workaround, substituting the 'json' module for 'ujson' forestalls the problem entirely.  PR pending...
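
A minimal sketch of the workaround, assuming the memory report reaches the rule as a JSON string or bytes. `parse_memory_report` is a hypothetical helper for illustration, not the actual OutOfMemoryBinaryRule code. It parses with the standard library `json` module, which raises `ValueError` on malformed input, so a bad report can be handled as an ordinary error:

```python
import json


def parse_memory_report(raw):
    """Parse a client-submitted memory report with the stdlib json module.

    Hypothetical helper: the name and the error-dict convention are
    assumptions for this sketch, not Socorro's real interface.
    """
    if isinstance(raw, bytes):
        # Tolerate odd bytes instead of failing on decode.
        raw = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(raw)
    except ValueError as exc:
        # Malformed input becomes a recoverable error the caller can log.
        return {"error": "unparseable memory report: %s" % exc}
```

A caller could check for the `"error"` key and skip the rule for that crash rather than failing the whole processor.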

Collectors and crashmovers are not affected.  No crashes are being lost.
(Reporter)

Comment 1

3 years ago
Interestingly, the problem ceased almost exactly at noon, 12pm DST 2015-12-18.
(Reporter)

Comment 2

3 years ago
And then it came back for a couple of hours on 12/28:
   3 processors died of it
   7 recovered but gained no immunity
How can we get access to these kinds of blobs for local testing/debugging?
By the way we're running a release of ujson from April 2014. https://bugzilla.mozilla.org/show_bug.cgi?id=1237386
Upgrading ujson is unlikely to solve this. We still need something to reproduce against. 

But I want to connect the bugs, which we might want to re-evaluate later.
Depends on: 1237386
It happened again. ujson 1.35 "solved it", but we don't want 1 GB of processed crash data, so we filed https://bugzilla.mozilla.org/show_bug.cgi?id=1248610, which took care of it.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED