Flag JS out-of-memory crashes as OOM
Categories
(Socorro :: Signature, task, P2)
Tracking
(Not tracked)
People
(Reporter: gsvelto, Assigned: willkg)
References
Details
Attachments
(2 files)
I noticed that some crashes are caused by the JS engine running out of memory, these often aren't reported as OOM but maybe they should. In particular crashes that have the JSLargeAllocationFailure annotation set, and the ones with the JSOutOfMemory annotation set to "Reporting" or "Reported".
Nicholas, am I interpreting those crash annotations correctly?
Reporter | ||
Comment 1•3 years ago
|
||
This crash is an example.
Comment 2•3 years ago
|
||
Looking at the code, we seems to have multiple OOMState:
// We are currently reporting the given condition.
//
// Suppose a crash report contains "JSLargeAllocationFailure:
// Reporting". This means we crashed while executing memory-pressure
// observers, trying to shake loose some memory. The large allocation in
// question did not return null: it is still on the stack. Had we not
// crashed, it would have been retried.
Reporting,
// The condition has been reported since the last GC.
//
// If a crash report contains "JSOutOfMemory: Reported", that means a small
// allocation failed, and then we crashed, probably due to buggy
// error-handling code that ran after allocation returned null.
//
// This contrasts with "Reporting" which means that no error-handling code
// had executed yet.
Reported,
The Reported
case seems to be a case where failure from the JS engine are not processed properly.
(In reply to Gabriele Svelto [:gsvelto] from comment #1)
This crash is an example.
In this particular case, the problems comes from RemoteObjectProxyBase::GetOrCreateProxyObject
which does not check the error reporting from the JSContext
, and simply return without setting aNewObjectCreated
, which does not carry the failure condition.
I do not know much about the DOM to answer more on the error handling.
One thing we could consider in the JS engine is to make more API based on mozilla::Result
, to carry the OOM
errors this way.
Reporter | ||
Comment 3•3 years ago
|
||
Thanks for the detailed explanation, so the JSLargeAllocationFailure: Reporting
case sounds like a genuine OOM, while JSLargeAllocationFailure: Reported
is a different kind of bug that should be fixed first, right?
Comment 4•3 years ago
|
||
This is what I understand from the comment. Yes.
Assignee | ||
Comment 5•3 years ago
|
||
I can't tell if this is something we want to fix in Socorro or somewhere else. If it's in Socorro, what do we want to do about it?
Reporter | ||
Comment 6•3 years ago
|
||
The idea is to add the OOM | large
prefix to crash reports that have the JSLargeAllocationFailure
annotation set to Reporting
. Those should be out-of-memory crashes that haven't been flagged as such correctly (i.e. they don't also have the OOMAllocationSize
annotation populated). That being said since that annotation is not indexed I'm not sure if this is a common occurrence or not.
Assignee | ||
Comment 7•3 years ago
|
||
Grabbing this to do this week if I can.
Assignee | ||
Comment 8•3 years ago
|
||
Here's an example crash report: bp-da370cb0-2395-4587-a897-6b3250221205
Assignee | ||
Comment 9•3 years ago
|
||
Assignee | ||
Comment 10•3 years ago
|
||
Assignee | ||
Comment 11•3 years ago
|
||
This was deployed to prod just now in bug #1804294, but I'm not sure what's going on since it didn't seem to change the signature.
Oh, I did "Reported" and not "Reporting". That's a problem. There are probably others. I'll look int oit.
Assignee | ||
Comment 12•3 years ago
|
||
I looked through a few weeks of reports and can't find any examples where JSLargeAllocationFailure=Reporting. I added it to the search index, so I can fix the code now and verify it when we get a crash report that meets the criteria.
Assignee | ||
Comment 13•3 years ago
|
||
Assignee | ||
Comment 14•3 years ago
|
||
Assignee | ||
Comment 15•2 years ago
|
||
The last change was pushed to production in bug #1804294.
I checked Crash Stats:
Looks like they're getting marked as OOM. Given that I think this is working.
Description
•