Crash in OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::jit::TempAllocator::allocate

RESOLVED FIXED in Firefox 65

Status

Type: defect
Priority: P2
Severity: critical
Status: RESOLVED FIXED
Reported: 9 months ago
Last updated: 8 months ago

People

(Reporter: marcia, Assigned: iain)

Tracking

Keywords: crash, regression
Version: Trunk
Target Milestone: mozilla65
Hardware: Unspecified
OS: Windows 10
Points:
---

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox63 unaffected, firefox64 unaffected, firefox65 fixed)

Details

(crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

This bug was filed from the Socorro interface and is for crash report bp-ca3a53cd-ca98-480b-86ad-5fb480181031.
=============================================================

Seen while looking at nightly crash stats: https://bit.ly/2Q8bEI4. Crashes started with build 20181030224027. There are two similar signatures.

Possible regression range based on build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f7a97b344fa59bd3b01ea81ebd5b150aa63bfb12&tochange=be32f4014f92d0ab717621997e0d36c9bc1c479b

Looks as if this code was touched in bug 1499607. Setting needinfo on :iain.

Top 10 frames of crashing thread:

0 xul.dll js::AutoEnterOOMUnsafeRegion::crash js/src/vm/JSContext.cpp:1661
1 xul.dll js::jit::TempAllocator::allocate js/src/jit/JitAllocPolicy.h:62
2 xul.dll js::jit::BacktrackingAllocator::buildLivenessInfo js/src/jit/BacktrackingAllocator.cpp:787
3 xul.dll js::jit::BacktrackingAllocator::go js/src/jit/BacktrackingAllocator.cpp:902
4 xul.dll js::jit::GenerateLIR js/src/jit/Ion.cpp:1814
5 xul.dll js::jit::CompileBackEnd js/src/jit/Ion.cpp:1889
6 xul.dll js::HelperThread::handleIonWorkload js/src/vm/HelperThreads.cpp:2199
7 xul.dll js::HelperThread::threadLoop js/src/vm/HelperThreads.cpp:2643
8 xul.dll static unsigned int js::detail::ThreadTrampoline<void  js/src/threading/Thread.h:236
9 ucrtbase.dll thread_start<unsigned int > 

=============================================================
Flags: needinfo?(iireland)
https://bugzilla.mozilla.org/show_bug.cgi?id=1499607#c8

"Ted and I agree that this is too rare a situation to be worried about."

Oops. Hitting this so quickly after delivering the code kind of implies that we underestimated the frequency of this.

This is kind of gross. We are hitting the narrow window where a big allocation has succeeded but eaten up the last of our memory, and now we can't replenish our ballast. In an ideal world, this OOMUnsafeRegion wouldn't kick in here, because we are about to immediately fail the compilation. The problem is that we have no way of knowing that, and I really don't want to add even more complexity to this situation by introducing a new AutoMaySwallowAnOOMFailureAndWaitToPropagateItUntilLaterBecauseIAmFullOfSpite object.

We'll have to think about the right fix here.
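To make the window concrete, here is a minimal toy model of the allocate-then-replenish pattern described above. The `ToyAllocator` name and all internals are hypothetical, not SpiderMonkey code: the point is only that the fallible allocation is carved out of the ballast first, and the ballast is refilled afterwards, which is exactly where the refill can fail even though the allocation itself succeeded.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the pre-fix pattern: allocate from the ballast,
// then try to refill it. If the refill fails, later "infallible"
// allocations have no ballast to fall back on, and the old code crashed
// via AutoEnterOOMUnsafeRegion.
struct ToyAllocator {
    size_t ballast;          // bytes currently reserved as ballast
    size_t systemAvailable;  // bytes the OS will still give us

    static constexpr size_t BallastSize = 16 * 1024;

    // Returns true if the allocation succeeded; *ballastExhausted reports
    // whether the ballast could be refilled afterwards.
    bool allocate(size_t bytes, bool* ballastExhausted) {
        if (bytes > ballast)
            return false;    // fallible failure: fine, reported to caller
        ballast -= bytes;    // carve the request out of the ballast
        size_t need = BallastSize - ballast;
        if (need <= systemAvailable) {
            systemAvailable -= need;
            ballast = BallastSize;
            *ballastExhausted = false;
        } else {
            // The narrow window: the allocation itself succeeded, but the
            // system is now out of memory, so the ballast stays short.
            *ballastExhausted = true;
        }
        return true;
    }
};
```

Simulating a big allocation that succeeds while the system has nothing left reproduces the bad state: the call reports success, but the ballast can no longer be replenished.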
Assignee: nobody → iireland
Flags: needinfo?(iireland)
I think I've come up with a better solution.

Instead of allocating alloc_size bytes out of our ballast and then trying to replenish it back up to BallastSize, we should start by ensuring that we have |BallastSize + alloc_size| bytes, and then allocating out of the larger ballast. The advantage of this approach is that we always leave ourselves in a consistent state:
1. If the ensureBallast succeeds, then we have enough ballast for our fallible allocation, plus a full ballast left over.
2. If the ensureBallast fails, then our fallible allocation fails, but we have not decreased the size of our ballast, so we won't cause any "infallible" allocations to fail.

This should eliminate the crashes that we're seeing from the last change. It might also eliminate some crashes without clear signatures that were hitting the problem we were trying to fix.
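The two cases above can be sketched as follows. `EnsureFirstAllocator` is a hypothetical stand-in, not the real TempAllocator; it only illustrates the invariant that by ensuring |BallastSize + alloc_size| up front, a failure leaves the ballast untouched.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: reserve room for the request PLUS a full ballast
// before carving anything off, so a failure never leaves the ballast short.
struct EnsureFirstAllocator {
    size_t available;        // bytes currently held (ballast + slack)
    size_t systemAvailable;  // bytes the OS will still give us

    static constexpr size_t BallastSize = 16 * 1024;

    // Grow |available| to at least |bytes|, or fail without side effects.
    bool ensure(size_t bytes) {
        if (available >= bytes)
            return true;
        size_t need = bytes - available;
        if (need > systemAvailable)
            return false;
        systemAvailable -= need;
        available += need;
        return true;
    }

    // Case 1: ensure succeeds -> request satisfied, full ballast left over.
    // Case 2: ensure fails -> request fails, ballast untouched.
    bool allocate(size_t bytes) {
        if (!ensure(BallastSize + bytes))
            return false;
        available -= bytes;
        return true;
    }
};
```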
Attachment #9022752 - Flags: feedback?(nicolas.b.pierron)
In review, nbp pointed out a corner case caused by LifoAlloc fragmentation.

The general approach still seems correct; we're just doing it in the wrong place. The right solution is to stop trying to find a sequence of calls that will convince the LifoAlloc to do the right thing, and instead to move this code into the LifoAlloc itself. This should be part of the LifoAlloc API anyway: if LifoAlloc is going to support both fallible and infallible allocation, then it should provide a safe and ergonomic interface for doing so.

This patch adds |allocEnsureUnused|, which allocates N bytes iff it can guarantee that M bytes will be available afterwards. If it can't make that guarantee, then the allocation will fail without touching the unused memory.
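A minimal bump-allocator sketch of that contract follows. The `allocEnsureUnused` name comes from the patch, but the single contiguous buffer here is a simplification that deliberately sidesteps the chunk-list fragmentation issue nbp raised; everything else is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy bump allocator illustrating the allocEnsureUnused contract.
class ToyLifoAlloc {
    uint8_t buffer_[64 * 1024];
    size_t used_ = 0;

public:
    size_t unused() const { return sizeof(buffer_) - used_; }

    // Allocate |n| bytes iff at least |needed| bytes will still be unused
    // afterwards; otherwise fail without touching anything.
    void* allocEnsureUnused(size_t n, size_t needed) {
        if (unused() < n + needed)
            return nullptr;
        void* p = buffer_ + used_;
        used_ += n;
        return p;
    }
};
```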
Attachment #9022752 - Attachment is obsolete: true
Attachment #9022752 - Flags: feedback?(nicolas.b.pierron)
Attachment #9023630 - Flags: feedback?(tcampbell)
Priority: -- → P2
Attachment #9023630 - Flags: feedback?(tcampbell) → feedback+
The last patch proposal suggests adding an allocEnsureUnused, and reviewing it made me wonder.

Why shouldn't the ballast be part of the LifoAlloc configuration?

In the LifoAlloc, we could always ensure that every fallible allocation reserves ballast space for the upcoming infallible allocations. Not via ensureUnusedApproximate, but with a simple check that we either have enough space or have a chunk in the unused list. The only drawback would be for LifoAlloc users that are always fallible.
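A sketch of that suggestion, assuming the ballast becomes a constructor parameter (all names and internals here are hypothetical, not the actual LifoAlloc): every fallible allocation implicitly keeps the configured ballast free, while the infallible path is allowed to consume it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical: the ballast is part of the allocator's configuration.
class BallastLifoAlloc {
    uint8_t buffer_[64 * 1024];
    size_t used_ = 0;
    size_t ballast_;  // configured once, at construction

public:
    explicit BallastLifoAlloc(size_t ballastBytes) : ballast_(ballastBytes) {}

    size_t unused() const { return sizeof(buffer_) - used_; }

    // Fallible path: refuses any request that would eat into the ballast.
    void* tryAlloc(size_t n) {
        if (unused() < n + ballast_)
            return nullptr;
        void* p = buffer_ + used_;
        used_ += n;
        return p;
    }

    // "Infallible" path: allowed to consume the reserved ballast.
    void* allocInfallible(size_t n) {
        assert(unused() >= n && "ballast exhausted");
        void* p = buffer_ + used_;
        used_ += n;
        return p;
    }
};
```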
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a668206f59fb
Implement LifoAlloc::allocEnsureUnused r=nbp,tcampbell
(In reply to Nicolas B. Pierron [:nbp] from comment #6) 
> Why shouldn't the ballast be part of the LifoAlloc configuration?

Sounds reasonable to me.
https://hg.mozilla.org/mozilla-central/rev/a668206f59fb
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
(In reply to Nicolas B. Pierron [:nbp] from comment #6)
> The only drawback would be for LifoAlloc users that are always fallible.

If there are uses like that, would it make sense to add an opt-out that statically disallows infallible allocation? (e.g. with a template parameter and some static asserts in the infallible allocation functions)
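One hypothetical way to express such an opt-out, roughly as suggested above (none of this is actual SpiderMonkey code): a template parameter on the allocator plus a static_assert in the infallible entry point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Fallibility { FallibleOnly, Both };

// Hypothetical opt-out: a LifoAlloc configured as fallible-only rejects
// any use of the infallible entry point at compile time.
template <Fallibility F = Fallibility::Both>
class ConfiguredLifoAlloc {
    uint8_t buffer_[4096];
    size_t used_ = 0;

public:
    void* tryAlloc(size_t n) {
        if (sizeof(buffer_) - used_ < n)
            return nullptr;
        void* p = buffer_ + used_;
        used_ += n;
        return p;
    }

    void* allocInfallible(size_t n) {
        // Only evaluated when this member is instantiated, so fallible-only
        // users get a compile error only if they actually call it.
        static_assert(F == Fallibility::Both,
                      "this LifoAlloc is configured as fallible-only");
        void* p = tryAlloc(n);
        assert(p && "infallible allocation failed");
        return p;
    }
};
```

Because member functions of a class template are instantiated lazily, a `ConfiguredLifoAlloc<Fallibility::FallibleOnly>` compiles fine until someone calls `allocInfallible`, at which point the static_assert fires.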