Assertion failure: [unhandlable oom] preparing weak cache sweeping task list

RESOLVED FIXED in Firefox 48

Status

()

Core
JavaScript: GC
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Tomcat, Assigned: jonco)

Tracking

(Blocks: 1 bug, {crash})

unspecified
mozilla50
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox48+ fixed, firefox49+ fixed, firefox50+ fixed)

Details

(URL)

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

2 years ago
In bughunter we discovered opt crashes like :

Assertion failure: [unhandlable oom] preparing weak cache sweeping task list

with signatures like js::AutoEnterOOMUnsafeRegion::crash js::gc::GCRuntime::beginSweepingZoneGroup JSScript::traceChildren js::gcstats::AutoPhase::~AutoPhase js::gc::GCRuntime::endMarkingZoneGroup

It seems that such crashes also happen in firefox beta opt. One url where this happend seems to be was http://wilmas.zoomin.se/blogg/allmant.html
(Reporter)

Comment 1

2 years ago
Created attachment 8762814 [details]
bughunter crash report
Component: JavaScript Engine → JavaScript: GC
:jorendorff mentions to needinfo? :terrence as a start.
Flags: needinfo?(terrence)
(Reporter)

Comment 3

2 years ago
bc: can we extract a list of urls where this crash/oom happens
Flags: needinfo?(bob)

Comment 4

2 years ago
You can get the list if you click on the fatal message, but I have also extracted the urls from the db directly. They are all very slow loading since they use a large amount of ram and I haven't finished reviewing them for nsfw content. If you want, I'll send them to you via email but talk to me before sharing.

fwiw, I haven't been able to reproduce so for on win 7 32bit and nightly debug build.
Flags: needinfo?(bob)

Comment 5

2 years ago
Created attachment 8763001 [details]
test-crash-on-load.js

This appears to require Spider and the userhook. It also occurs in an opt build which I don't understand. This does not occur in Firefox 47 but does in Firefox Beta 48 and Firefox Nightly 50.

One of the urls reproduces on a Windows 7 32bit vm with 2G ram with the following:

firefox -spider -start -quit -hook http://sisyphus.bughunter.ateam.phx1.mozilla.com/bughunter/media/userhooks/test-crash-on-load.js -url http://www.casadellibro.com/comunidad/lo-que-vendo/srcualquiera/7532498

You may need to repeat this several times to get it to reproduce.

Get Spider from http://bclary.com/projects/spider/spider-0.1.0.3-sm+tb+fx+an+fn.xpi and the user hook is attached.
(Assignee)

Comment 6

2 years ago
Created attachment 8763051 [details] [diff] [review]
bug1280132-sweep-oom

Patch to check for allocation failure while building the list of weak caches to sweep and sweep these on the main thread in that case.

I changed the vector to use SystemAllocPolicy from the default of mozilla::MallocAllocPolicy because this supports simulated OOM testing.  This is now exercised by our current tests.
Assignee: nobody → jcoppeard
Attachment #8763051 - Flags: review?(terrence)
(Assignee)

Updated

2 years ago
Flags: needinfo?(terrence)

Updated

2 years ago
status-firefox48: --- → affected
status-firefox49: --- → affected
status-firefox50: --- → affected
(Reporter)

Comment 7

2 years ago
[Tracking Requested - why for this release]:

Bughunter OOM crash
tracking-firefox48: --- → ?
tracking-firefox49: --- → ?
tracking-firefox50: --- → ?
Comment on attachment 8763051 [details] [diff] [review]
bug1280132-sweep-oom

Review of attachment 8763051 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see if the method I suggested works first.

::: js/src/jsgc.cpp
@@ +4967,5 @@
>      SweepCCWrappersTask sweepCCWrappersTask(rt);
>      SweepObjectGroupsTask sweepObjectGroupsTask(rt);
>      SweepRegExpsTask sweepRegExpsTask(rt);
>      SweepMiscTask sweepMiscTask(rt);
> +    bool haveSweepCacheTasks = true;

Ugh, passing around a bool is a pretty awful code smell. How about we split this out into a separate method like:

using WeakCacheTaskVector = mozilla::Vector<SweepWeakCacheTask, 0, SystemAllocPolicy>;

WeakCacheTaskVector
GCRuntime::prepareToSweepWeakCaches()
{
    WeakCacheTaskVector out;
    for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
        for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
            if (!out.append(SweepWeakCacheTask(rt, *cache))) {
                out.clear();
                goto oom;
            }
        }
    }
    return out;

  oom:
    for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
        for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
            SweepWeakCacheTask task(rt, *cache);
            task.runFromMainThread(rt);
        }
    }
    return WeakCacheTaskVector();
}
Attachment #8763051 - Flags: review?(terrence)
You could avoid the goto with something like

WeakCacheTaskVector
GCRuntime::prepareToSweepWeakCaches()
{
    WeakCacheTaskVector out;
    auto foregroundSweepOnOOM = [] {
        for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
            for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
                SweepWeakCacheTask task(rt, *cache);
                task.runFromMainThread(rt);
            }
        }
    };

    for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
        for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
            if (!out.append(SweepWeakCacheTask(rt, *cache))) {
                out.clear();
                foregroundSweepOnOOM();
                return out;
            }
        }
    }
    return out;
}

but I admit it does put the error handling first, which is kind of weird.
(Assignee)

Comment 10

2 years ago
Created attachment 8763829 [details] [diff] [review]
bug1280132-sweep-oom v2

> passing around a bool is a pretty awful code smell

Agreed.  Thanks for the suggestions.  Hopefully this is better.
Attachment #8763051 - Attachment is obsolete: true
Attachment #8763829 - Flags: review?(terrence)
Comment on attachment 8763829 [details] [diff] [review]
bug1280132-sweep-oom v2

Review of attachment 8763829 [details] [diff] [review]:
-----------------------------------------------------------------

Much nicer!
Attachment #8763829 - Flags: review?(terrence) → review+
Duplicate of this bug: 1281198

Comment 13

2 years ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b29c44f5d909
Add fallback path in case we hit OOM while creating the list of weak cache sweeping tasks r=terrence
(Reporter)

Comment 14

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/b29c44f5d909
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox50: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Is this a fix you would want to uplift, or would it be better to keep this fix on 50? 
We certainly would love to bring down the number of OOM crashes, but I can't judge the risk vs. possible impact from the comments here. Jon, terrence, what do you think?
tracking-firefox49: ? → +
tracking-firefox50: ? → +
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
(Assignee)

Comment 16

2 years ago
I think this is low risk, so let's uplift.
Blocks: 1263769
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
(Assignee)

Comment 17

2 years ago
Comment on attachment 8763829 [details] [diff] [review]
bug1280132-sweep-oom v2

Approval Request Comment
[Feature/regressing bug #]: Bug 1263769.
[User impact if declined]: Possible OOM crash.
[Describe test coverage new/current, TreeHerder]: On m-c for the last week.
[Risks and why]: Low.
[String/UUID change made/needed]: None.
Attachment #8763829 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 18

2 years ago
Created attachment 8766707 [details] [diff] [review]
bug1280132-beta

Approval Request Comment
[Feature/regressing bug #]: Bug 1263769.
[User impact if declined]: Possible OOM crash.
[Describe test coverage new/current, TreeHerder]: On m-c for the last week.
[Risks and why]: Low.
[String/UUID change made/needed]: None.
Attachment #8766707 - Flags: review+
Attachment #8766707 - Flags: approval-mozilla-beta?
Comment on attachment 8763829 [details] [diff] [review]
bug1280132-sweep-oom v2

Potential OOM crash fix, and we're still pretty early in the aurora cycle, let's uplift.
Attachment #8763829 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8766707 [details] [diff] [review]
bug1280132-beta

Fix an oom, taking it.
Attachment #8766707 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 21

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/ddf1c2c6a9e3
status-firefox49: affected → fixed

Comment 22

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/c134bf637b15
status-firefox48: affected → fixed
Track this as this is oom issue ans it's in 48.
tracking-firefox48: ? → +
You need to log in before you can comment on or make changes to this bug.