Closed Bug 1426574 Opened 2 years ago Closed 9 months ago

Crash in OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::SparseBitmap::createBlock

Categories

(Core :: JavaScript Engine, defect, P1, critical)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox-esr60 64+ fixed
firefox59 --- wontfix
firefox60 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- fixed
firefox65 --- fixed

People

(Reporter: yoasif, Assigned: sfink)

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-b242dca7-5f3b-4d81-a950-bee950171221.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll js::AutoEnterOOMUnsafeRegion::crash js/src/jscntxt.cpp:1651
1 xul.dll js::SparseBitmap::createBlock js/src/ds/Bitmap.cpp:35
2 xul.dll js::Atomize js/src/jsatom.cpp:491
3 xul.dll js::SavedStacks::getLocation js/src/vm/SavedStacks.cpp:1576
4 xul.dll js::SavedStacks::insertFrames js/src/vm/SavedStacks.cpp:1340
5 xul.dll js::SavedStacks::saveCurrentStack js/src/vm/SavedStacks.cpp:1175
6 xul.dll JS::CaptureCurrentStack js/src/jsapi.cpp:7651
7 xul.dll mozilla::dom::exceptions::CreateStack dom/bindings/Exceptions.cpp:681
8 xul.dll mozilla::dom::GetCurrentJSStack dom/bindings/Exceptions.cpp:231
9 xul.dll nsXPCComponents_Utils::ReportError js/xpconnect/src/XPCComponents.cpp:2143

=============================================================

138 crashes in the last week.
We probably can't just crash if we fail to grow the atoms table. It could be arbitrarily large.

Jan, what do you think?
Flags: needinfo?(jdemooij)
Priority: -- → P1
(In reply to Jason Orendorff [:jorendorff] from comment #2)
> We probably can't just crash if we fail to grow the atoms table. It could be
> arbitrarily large.
> 
> Jan, what do you think?

I think that would mean all markAtom calls become fallible and that might be complicated. Or we could split it in fallible vs infallible variants...
Flags: needinfo?(jdemooij)
OS: Windows 10 → Windows
(correction) Around May 8, which is when *60.0* shipped, the crash rate trippled within a couple weeks.
All the crash reports nbp and I looked at happened during atom creation, not GC mark phase. So I think we could make this crash rare by adding a version of SparseBitmap::setBit that is fallible (does not crash on OOM), and somehow using that here: https://searchfox.org/mozilla-central/source/js/src/vm/JSAtom.cpp#689

I think this would work because

- if the call succeeds there, then whatever later uses the bit (e.g. GC marking) will find it already allocated.

- if the call fails there, we can handle it gracefully, and the Atom will never be marked... unless we later try again to create the same atom, and find it already in the AtomsTable. We could fix this, too, by removing the new atom from the AtomsTable on OOM.
Steve, do you think the solution described in comment 6 would work?
Flags: needinfo?(sphink)
Sounds sensible to me, but I think Jon's take on this is important since he's more familiar with the ins and outs of atom marking.
Flags: needinfo?(sphink) → needinfo?(jcoppeard)
(In reply to Steve Fink [:sfink] [:s:] from comment #8)
I agree.  We should make a fallible version of inlineMarkAtom() and use it in AtomizeAndCopyChars().
Flags: needinfo?(jcoppeard)
Found while scanning post-PTO bugmail, and implemented in a mindless fashion.
Attachment #9026802 - Flags: review?(jcoppeard)
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Comment on attachment 9026802 [details] [diff] [review]
Use fallible bitmap ops for AtomizeAndCopyChars

Review of attachment 9026802 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.
Attachment #9026802 - Flags: review?(jcoppeard) → review+
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8eb60450f8c
Use fallible bitmap ops for AtomizeAndCopyChars, r=jonco
https://hg.mozilla.org/mozilla-central/rev/e8eb60450f8c
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
This looks pretty high-volume on release & esr60. How scary do you think uplifting this to Beta/ESR60 would be, Steve? Note that it'd need a rebased patch for ESR60 (Beta grafts cleanly).
Comment on attachment 9026802 [details] [diff] [review]
Use fallible bitmap ops for AtomizeAndCopyChars

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1341006

User impact if declined: Definite OOM crash instead of a probable (slightly later) OOM crash in non-debug builds. This introduces noise for testing, and adds some OOM crashes that could potentially have been handled. (It violates the usual expectation for JS engine OOM handling; it crashes when we OOM on potentially large allocations heavily dependent on content behavior.)

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): There was already handling for failing here, it just wasn't plumbed through all the way.

String changes made/needed: none
Flags: needinfo?(sphink)
Attachment #9026802 - Flags: approval-mozilla-beta?
Comment on attachment 9027803 [details] [diff] [review]
Use fallible bitmap ops for AtomizeAndCopyChars,  (esr60 backport)

[ESR Uplift Approval Request]

If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's showing up with a fairly large number of crashes, apparently. This patch gives a greater chance of properly handling an OOM, especially since the crash is dependent on content behavior.

User impact if declined: Some number of avoidable OOM crashes.

Fix Landed on Version: 65

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): There was already OOM error handling here, this just catches another case that previously forced a crash.

String or UUID changes made by this patch: none
Attachment #9027803 - Flags: approval-mozilla-esr60?
Comment on attachment 9026802 [details] [diff] [review]
Use fallible bitmap ops for AtomizeAndCopyChars

Fix for high volume crash, seems worth the potential risk.
Attachment #9026802 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9027803 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
You need to log in before you can comment on or make changes to this bug.