crash in OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured

RESOLVED WONTFIX

Status

Type: defect
Priority: --
Severity: critical
Opened: 3 years ago
Last modified: Last month

People

(Reporter: khuey, Unassigned)

Tracking

Keywords: crash, topcrash
Version: unspecified
Platform: x86 Windows NT
Points: ---
Dependency tree / graph

Firefox Tracking Flags

(firefox46- wontfix, firefox47 wontfix, firefox48 wontfix, firefox49 wontfix, firefox-esr45 affected, firefox-esr52 affected, firefox53 wontfix, firefox54 affected, firefox55- affected, firefox59 affected)

Details

(Whiteboard: [tbird crash][OOM | small - see comment 6], crash signature)

Attachments

(2 attachments)

This bug was filed from the Socorro interface and is report bp-79e22ac7-ecc1-4270-8e0b-3c9322160312.
=============================================================

Waldo tells me this is OOM.  Gecko has a way to annotate crash reports as OOMs; we might need to add a callback for SpiderMonkey to call back into Gecko to do that.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #0)
> Waldo tells me this is OOM.  Gecko has a way to annotate crash reports as
> OOMs, we might need to add a callback for Spidermonkey to call back into
> Gecko to do that.

We can easily get it to be represented as an "OOM | unknown | ..." with a small Socorro change. If we want to differentiate small allocations (general OOM that we can't fix in that specific place) from large allocations (which we should make fallible and fail gracefully on), we'll need an OOMAllocationSize annotation on the crash, though.
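The callback approach described above can be sketched as follows. This is an illustrative standalone sketch only; the names (`SetAnnotateOOMSizeCallback`, `ReportUnhandlableOOM`) are hypothetical, not the actual SpiderMonkey/Gecko API:

```cpp
#include <cstdio>
#include <cstddef>

// Hypothetical callback type: the embedding (Gecko) registers a function
// that records the failed allocation size as a crash-report annotation.
using AnnotateOOMSizeCallback = void (*)(size_t size);

static AnnotateOOMSizeCallback gAnnotateOOMSize = nullptr;

// The embedding installs its annotator at startup.
void SetAnnotateOOMSizeCallback(AnnotateOOMSizeCallback cb) {
    gAnnotateOOMSize = cb;
}

// The engine calls this on an unhandlable OOM, just before crashing, so
// Socorro could distinguish OOM|small from OOM|large by the recorded size.
void ReportUnhandlableOOM(size_t size, const char* reason) {
    if (gAnnotateOOMSize)
        gAnnotateOOMSize(size);  // would set the OOMAllocationSize annotation
    std::fprintf(stderr, "[unhandlable oom] %s (%zu bytes)\n", reason, size);
    // A real implementation would crash here (e.g. MOZ_CRASH()).
}
```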
Depends on: 1257572
[Tracking Requested - why for this release]:
#8 overall top crasher in beta 46, untracked and unassigned?
Component: JavaScript Engine → JavaScript: GC
I looked at about 6 of these crashes, and while the memory usage looked pretty high (around 70 to 90%), there often seemed to be an okay amount of memory left, so maybe it would be worth having Terrence at least take a glance at it. Also, it looks like about half of these crashes (56%) are on Fennec.
See Also: → 1233481
Terrence, can you take a look at this? It's our #8 top crash.
Flags: needinfo?(terrence)
See Also: → 1258745
(In reply to Andrew McCreight [:mccr8] from comment #4)
> I looked at about 6 of these crashes, and while the memory usage looked
> pretty high (around 70 to 90%), there seemed to often be an okay amount of
> memory, so maybe it would be worth having Terrence at least take a glance at
> it. Also, it looks like that about half of these crashes (56%) are on Fennec.

This particular allocation is for a GC thing, so we needed a new backing chunk. It would have required 1MiB of contiguous, aligned space. We generally start having trouble finding such spans when the user still has 10-20% of free space remaining. This is in line with our expectations for an OOM|small in this location.
Flags: needinfo?(terrence)
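The aligned-chunk requirement Terrence describes can be illustrated with a standalone sketch. This is plain C++17 using `std::aligned_alloc`, not SpiderMonkey's actual chunk allocator: the point is that in a fragmented address space, a single free 1 MiB-aligned span of 1 MiB can be unavailable even when total free memory looks ample.

```cpp
#include <cstdlib>
#include <cstdint>

// SpiderMonkey GC chunks are 1 MiB and must be 1 MiB-aligned.
constexpr size_t kChunkSize = size_t(1) << 20;

// Try to obtain one chunk-sized, chunk-aligned block; returns nullptr on
// failure. A failure here is the kind of condition that forces
// js::AutoEnterOOMUnsafeRegion::crash inside the engine's unsafe regions.
void* AllocateChunk() {
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    return std::aligned_alloc(kChunkSize, kChunkSize);
}
```

Note that `std::aligned_alloc` is C++17 and not available on all Windows toolchains; the real allocator uses platform-specific mapping primitives.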
Crash Signature: [@ js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured] → [@ js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured ]
Blocks: e10s-oom
Duplicate of this bug: 1260453
Crash Signature: [@ js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured ] → [@ js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured ] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traverse…
Is this an increase in crashes for Fennec recently (for 45/46)?
Is anything actionable here?
Duplicate of this bug: 1260470
Crash Signature: js::TenuringTracer::traverse<T>] → js::TenuringTracer::traverse<T>] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::MonoTypeBuffer<T>::trace]
Summary: crash in js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured → crash in OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured
From talking with Andrew, it sounds like we don't have enough info on this issue, and the signature has been around for a while. It could be something like video using too much memory or a particular website leaking memory. Untracking for 46.

Margaret or snorp, do you want to take a closer look at the fennec crashes or are you already aware of them?
Flags: needinfo?(snorp)
Flags: needinfo?(margaret.leibovic)
Crash Signature: js::TenuringTracer::traverse<T>] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::MonoTypeBuffer<T>::trace] → js::TenuringTracer::traverse<T>] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::MonoTypeBuffer<T>::trace] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traceSlots]
Redirecting to kbrosnan to help prioritize this for Android.
Flags: needinfo?(margaret.leibovic) → needinfo?(kbrosnan)
Assignee: nobody → kchen
Flags: needinfo?(snorp)
Flags: needinfo?(kbrosnan)
It would be helpful if we could get a proper OOM signature annotation on this.
Whiteboard: OOM small - see comment 6
Terrence, if we want to annotate the allocation size via js::AutoEnterOOMUnsafeRegion::crash, can I get the data in AutoEnterOOMUnsafeRegion or from the call site?
Flags: needinfo?(terrence)
Not generally, no, or we would have already done so.
Flags: needinfo?(terrence)
So Terrence, is there something we can do here to mitigate this specific oom, or should we rely on reducing the oom problem in general to help with this?
Flags: needinfo?(terrence)
(In reply to Jim Mathies [:jimm] from comment #15)
> So Terrence, is there something we can do here to mitigate this specific
> oom, or should we rely on reducing the oom problem in general to help with
> this?

I don't really understand the question? The allocator can't really do anything to control the allocation patterns of the code running on it.

The attached patch annotates the 3 largest sites we actually have a plausible size for.
Flags: needinfo?(terrence)
Attachment #8744446 - Flags: review?(jcoppeard)
Comment on attachment 8744446 [details] [diff] [review]
annotate_allocation_failures_where_possible-v0.diff

Review of attachment 8744446 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jscntxt.cpp
@@ +1246,5 @@
> +        annotateOOMSizeCallback(size);
> +    char msgbuf[1024];
> +    JS_snprintf(msgbuf, sizeof(msgbuf), "[unhandlable oom] %s", reason);
> +    MOZ_ReportAssertionFailure(msgbuf, __FILE__, __LINE__);
> +    MOZ_CRASH();

Let's just call the other version of crash() here and merge the tails.
Attachment #8744446 - Flags: review?(jcoppeard) → review+
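The "merge the tails" suggestion amounts to having the size-annotating overload delegate to a single shared tail that builds the crash message. A simplified standalone sketch (hypothetical names, not the actual jscntxt.cpp code; the tail returns the message instead of aborting so the sketch can run):

```cpp
#include <cstdio>
#include <string>

using AnnotateOOMSizeCallback = void (*)(size_t);
static AnnotateOOMSizeCallback annotateOOMSizeCallback = nullptr;

// Shared tail: build the crash message once. The real code would then call
// MOZ_ReportAssertionFailure(msg, __FILE__, __LINE__) and MOZ_CRASH(), so
// the report-and-crash logic exists in exactly one place.
static std::string CrashMessage(const char* reason) {
    char msgbuf[1024];
    std::snprintf(msgbuf, sizeof(msgbuf), "[unhandlable oom] %s", reason);
    return msgbuf;
}

// Plain crash path.
std::string Crash(const char* reason) {
    return CrashMessage(reason);
}

// Size-annotating path: annotate first, then reuse the shared tail, as the
// review suggests.
std::string Crash(size_t size, const char* reason) {
    if (annotateOOMSizeCallback)
        annotateOOMSizeCallback(size);
    return CrashMessage(reason);
}
```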
There has been a spike of these on Nightly recently:

- Nightly 20160420030213: 0
- Nightly 20160421030302: 0
- Nightly 20160422030223: 16
- Nightly 20160423030220: 13
- Nightly 20160424030601: 0
- Nightly 20160425030548: 5
Assignee: kchen → nobody
In release, the crash with signature "OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured" is spiking since 2016-05-03 (increased by ~1458% until 2016-05-08) and is #2 in top-crashes for 46.0.1.
Hi, guys.
While trying to reproduce the steps for bug 1271580, I came across the same crash signature as the one in this bug.

Steps to reproduce:
1. Start with a clean profile.
2. Open multiple tabs from another app (e.g. Gmail).

Expected: All tabs load and work properly.

Actual: A crash occurs.

https://crash-stats.mozilla.com/report/index/aa7c7279-8bcf-4adf-9098-8eeeb2160510
This crash signature is just a symptom of running out of memory.
Duplicate of this bug: 1270298
Depends on: 1270298
Hi Andrew, Terrence, should we review crash reports on Nightly since the commit on m-c (comment 27) to see if this can be root caused? I am just wondering if there is a speculative fix that is worth uplifting to Beta47 (we are getting close to the finish line for 47 release). Thanks!
Flags: needinfo?(terrence)
Flags: needinfo?(continuation)
This is just going to tell us how large the memory allocation is. I don't think it will really help us fix anything.

Also, Terrence, this is only going to work on the main thread, not on workers. You should put the callback registration in CycleCollectedJSRuntime, not XPCJSRuntime.
Flags: needinfo?(continuation)
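The registration move mccr8 suggests can be sketched in miniature. This is a hypothetical illustration of the class relationship only (the `*Sketch` names stand in for the real Gecko classes): because CycleCollectedJSRuntime is the base shared by the main-thread runtime and worker runtimes, registering the callback there covers workers too, whereas registering it in the main-thread-only XPCJSRuntime would not.

```cpp
#include <cstddef>

using OOMSizeCallback = void (*)(size_t);

// Stand-in for the underlying JS runtime object that holds the callback.
struct JSRuntimeHandle {
    OOMSizeCallback oomCallback = nullptr;
};

// Base class shared by main-thread and worker runtimes
// (standing in for CycleCollectedJSRuntime).
class CycleCollectedJSRuntimeSketch {
public:
    explicit CycleCollectedJSRuntimeSketch(JSRuntimeHandle* rt) : mRt(rt) {
        // Registering in the base constructor means every runtime,
        // including each worker's, installs the annotator.
        mRt->oomCallback = AnnotateOOMSize;
    }
    static void AnnotateOOMSize(size_t) { /* add crash annotation */ }
protected:
    JSRuntimeHandle* mRt;
};

// Main-thread-only subclass (standing in for XPCJSRuntime). If the
// registration lived only here, worker runtimes would never get it.
class XPCJSRuntimeSketch : public CycleCollectedJSRuntimeSketch {
public:
    using CycleCollectedJSRuntimeSketch::CycleCollectedJSRuntimeSketch;
};
```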
No longer blocks: e10s-oom
Adding [@ OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traverse<T>] signature since this showed up on the Beta 2 crash stats as #9.
Crash Signature: js::TenuringTracer::traverse<T>] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::MonoTypeBuffer<T>::trace] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traceSlots] → js::TenuringTracer::traverse<T>] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::MonoTypeBuffer<T>::trace] [@ OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traceSlots] [@ OOM | large | js::AutoEn…
Would any of these keywords also apply here?

csectype-oom
csectype-uninitialized
footprint
mlk
This is not a security issue, so the csectype- flags do not apply. "footprint" and "mlk" don't really apply either, because this isn't a particular leak, just a side effect of running low on memory. I'm not sure how much people look at those two flags anyways.
Crash Signature: js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traverse<T>] → js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traverse<T>] [@ OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured]
Here is the move you requested: I should have asked you for review on this in the first place.
Flags: needinfo?(terrence)
Attachment #8789426 - Flags: review?(continuation)
Comment on attachment 8789426 [details] [diff] [review]
move_oom_callback_to_right_runtime-v0.diff

Review of attachment 8789426 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks.
Attachment #8789426 - Flags: review?(continuation) → review+
Hi Terrence, does this patch need to land in Nightly? I'd be happy to uplift to Fx50 if the crash data from Nightly51 looks promising after this gets committed to m-c.
Flags: needinfo?(terrence)
(In reply to Ritu Kothari (:ritu) from comment #36)
> Hi Terrence, does this patch need to land in Nightly? I'd be happy to uplift
> to Fx50 if the crash data from Nightly51 looks promising after this gets
> committed to m-c.

This is just a fix to the diagnostics. It will give us more information on OOM in workers. I'd say we should only uplift if nightly gives us some indication that this instrumentation is likely to be useful.
Flags: needinfo?(terrence)
Whiteboard: OOM small - see comment 6 → [tbird crash][OOM | small - see comment 6]
[Tracking Requested - why for this release]:

Signature> OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured


Top Crashers for Firefox 53.0.2
Top 50 Crashing Signatures. 7 days ago through 3 minutes ago. 

Top browser crasher #4.


Top Crashers for Firefox 54.0b
Top 50 Crashing Signatures. 7 days ago through 2 minutes ago. 

Top browser crasher #19.
See Also: → 1365207
Not tracking for 55; this is a long-standing issue with no obvious lead towards a fix, as far as I can tell.
Maybe throw a bunch of interns at it, to get some fresh eyes on the bug.
(In reply to Julien Cristau [:jcristau] from comment #41)
This is caused by running out of memory.  There's not much we can do.
Duplicate of this bug: 1416276
Crash Signature: js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traverse<T>] [@ OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured] → js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::traverse<T>] [@ OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured] [@ OOM | large …
This is the #2 crash in the 11-8 Nightly, which I think is a bad regression. There are hundreds of crashes. However, almost all of them are from a single installation. Hopefully this is just a problem with some VM running Nightly and going haywire.
Here is another report: https://crash-stats.mozilla.com/report/index/2e6d254e-0186-4c5e-918a-90e920180302#tab-details

With dozens of open tabs, Firefox uses around 3GB RAM. The other programs combined use less than 0.5GB. Yet for some reason, the Task Manager says >90% of RAM is being used, while it's more like 50% when I manually add the numbers.

Yesterday I reduced the size of PageFile from 9GB to 1GB. Today I opened dozens of YouTube tabs, caused a spike in RAM usage, and when I opened another tab it crashed.

The laptop has 8GB RAM (+1GB PageFile) in total. Never thought it wouldn't be enough for browsing.
I experienced this bug as a user.  Out of memory (OOM) with System Memory at 94% at time of crash.  Page file had only 30MB free at time of crash.  https://crash-stats.mozilla.com/report/index/e3d10009-890b-4d29-ba2d-732ac0180906#tab-details

The reason for the low page file free space, and therefore the reason for the crash, in my case and in the case above, is that we both manually changed the default Windows page file allocation strategy from automatic management to a fixed minimum and maximum page file size.

Turns out that by forcing Windows to have a maximum page file size, it's easy to run out of memory in certain situations, hence the crash.

Whereas leaving the setting as default will allow Windows to keep consuming more and more disk space, as much as needed, to allow for more memory allocation.

The other situation where this crash could occur is due to extreme low disk free space, whereby Windows can no longer allocate any more disk space to the page file, and simultaneously a low-memory situation occurs.

There's probably no remedial action required to fix this bug. However, if I can make a suggestion, a notice to the user indicating that crashes are occurring due to low memory would help, as users may be able to find ways to make more memory available (e.g. changing their page file strategy or freeing up disk space).

--
Off topic:  my Firefox uses around 400MB of memory with only a couple of plugins enabled and no webpages open.  After opening 6 tabs (bugzilla and a few very basic websites), Firefox is consuming 800MB of memory (from Task Manager's "Processes" tab).
Duplicate of this bug: 1026545

The leave-open keyword is there and there is no activity for 6 months.
:jonco, maybe it's time to close this bug?

Flags: needinfo?(jcoppeard)

This is an OOM crash.

Status: NEW → RESOLVED
Closed: Last month
Flags: needinfo?(jcoppeard)
Resolution: --- → WONTFIX