Closed Bug 1280132 Opened 8 years ago Closed 8 years ago

Assertion failure: [unhandlable oom] preparing weak cache sweeping task list

Categories

(Core :: JavaScript: GC, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox48 + fixed
firefox49 + fixed
firefox50 + fixed

People

(Reporter: cbook, Assigned: jonco)

References

()

Details

(Keywords: crash)

Attachments

(4 files, 1 obsolete file)

In bughunter we discovered opt crashes like :

Assertion failure: [unhandlable oom] preparing weak cache sweeping task list

with signatures like js::AutoEnterOOMUnsafeRegion::crash js::gc::GCRuntime::beginSweepingZoneGroup JSScript::traceChildren js::gcstats::AutoPhase::~AutoPhase js::gc::GCRuntime::endMarkingZoneGroup

It seems that such crashes also happen in firefox beta opt. One url where this happend seems to be was http://wilmas.zoomin.se/blogg/allmant.html
Attached file bughunter crash report
Component: JavaScript Engine → JavaScript: GC
:jorendorff mentions to needinfo? :terrence as a start.
Flags: needinfo?(terrence)
bc: can we extract a list of urls where this crash/oom happens
Flags: needinfo?(bob)
You can get the list if you click on the fatal message, but I have also extracted the urls from the db directly. They are all very slow loading since they use a large amount of ram and I haven't finished reviewing them for nsfw content. If you want, I'll send them to you via email but talk to me before sharing.

fwiw, I haven't been able to reproduce so for on win 7 32bit and nightly debug build.
Flags: needinfo?(bob)
Attached file test-crash-on-load.js
This appears to require Spider and the userhook. It also occurs in an opt build which I don't understand. This does not occur in Firefox 47 but does in Firefox Beta 48 and Firefox Nightly 50.

One of the urls reproduces on a Windows 7 32bit vm with 2G ram with the following:

firefox -spider -start -quit -hook http://sisyphus.bughunter.ateam.phx1.mozilla.com/bughunter/media/userhooks/test-crash-on-load.js -url http://www.casadellibro.com/comunidad/lo-que-vendo/srcualquiera/7532498

You may need to repeat this several times to get it to reproduce.

Get Spider from http://bclary.com/projects/spider/spider-0.1.0.3-sm+tb+fx+an+fn.xpi and the user hook is attached.
Attached patch bug1280132-sweep-oom (obsolete) — Splinter Review
Patch to check for allocation failure while building the list of weak caches to sweep and sweep these on the main thread in that case.

I changed the vector to use SystemAllocPolicy from the default of mozilla::MallocAllocPolicy because this supports simulated OOM testing.  This is now exercised by our current tests.
Assignee: nobody → jcoppeard
Attachment #8763051 - Flags: review?(terrence)
Flags: needinfo?(terrence)
[Tracking Requested - why for this release]:

Bughunter OOM crash
Comment on attachment 8763051 [details] [diff] [review]
bug1280132-sweep-oom

Review of attachment 8763051 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see if the method I suggested works first.

::: js/src/jsgc.cpp
@@ +4967,5 @@
>      SweepCCWrappersTask sweepCCWrappersTask(rt);
>      SweepObjectGroupsTask sweepObjectGroupsTask(rt);
>      SweepRegExpsTask sweepRegExpsTask(rt);
>      SweepMiscTask sweepMiscTask(rt);
> +    bool haveSweepCacheTasks = true;

Ugh, passing around a bool is a pretty awful code smell. How about we split this out into a separate method like:

using WeakCacheTaskVector = mozilla::Vector<SweepWeakCacheTask, 0, SystemAllocPolicy>;

WeakCacheTaskVector
GCRuntime::prepareToSweepWeakCaches()
{
    WeakCacheTaskVector out;
    for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
        for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
            if (!out.append(SweepWeakCacheTask(rt, *cache))) {
                out.clear();
                goto oom;
            }
        }
    }
    return out;

  oom:
    for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
        for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
            SweepWeakCacheTask task(rt, *cache);
            task.runFromMainThread(rt);
        }
    }
    return WeakCacheTaskVector();
}
Attachment #8763051 - Flags: review?(terrence)
You could avoid the goto with something like

WeakCacheTaskVector
GCRuntime::prepareToSweepWeakCaches()
{
    WeakCacheTaskVector out;
    auto foregroundSweepOnOOM = [] {
        for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
            for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
                SweepWeakCacheTask task(rt, *cache);
                task.runFromMainThread(rt);
            }
        }
    };

    for (GCZoneGroupIter zone(rt); !zone.done(); zone.next()) {
        for (JS::WeakCache<void*>* cache : zone->weakCaches_) {
            if (!out.append(SweepWeakCacheTask(rt, *cache))) {
                out.clear();
                foregroundSweepOnOOM();
                return out;
            }
        }
    }
    return out;
}

but I admit it does put the error handling first, which is kind of weird.
> passing around a bool is a pretty awful code smell

Agreed.  Thanks for the suggestions.  Hopefully this is better.
Attachment #8763051 - Attachment is obsolete: true
Attachment #8763829 - Flags: review?(terrence)
Comment on attachment 8763829 [details] [diff] [review]
bug1280132-sweep-oom v2

Review of attachment 8763829 [details] [diff] [review]:
-----------------------------------------------------------------

Much nicer!
Attachment #8763829 - Flags: review?(terrence) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b29c44f5d909
Add fallback path in case we hit OOM while creating the list of weak cache sweeping tasks r=terrence
https://hg.mozilla.org/mozilla-central/rev/b29c44f5d909
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Is this a fix you would want to uplift, or would it be better to keep this fix on 50? 
We certainly would love to bring down the number of OOM crashes, but I can't judge the risk vs. possible impact from the comments here. Jon, terrence, what do you think?
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
I think this is low risk, so let's uplift.
Blocks: 1263769
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
Comment on attachment 8763829 [details] [diff] [review]
bug1280132-sweep-oom v2

Approval Request Comment
[Feature/regressing bug #]: Bug 1263769.
[User impact if declined]: Possible OOM crash.
[Describe test coverage new/current, TreeHerder]: On m-c for the last week.
[Risks and why]: Low.
[String/UUID change made/needed]: None.
Attachment #8763829 - Flags: approval-mozilla-aurora?
Attached patch bug1280132-betaSplinter Review
Approval Request Comment
[Feature/regressing bug #]: Bug 1263769.
[User impact if declined]: Possible OOM crash.
[Describe test coverage new/current, TreeHerder]: On m-c for the last week.
[Risks and why]: Low.
[String/UUID change made/needed]: None.
Attachment #8766707 - Flags: review+
Attachment #8766707 - Flags: approval-mozilla-beta?
Comment on attachment 8763829 [details] [diff] [review]
bug1280132-sweep-oom v2

Potential OOM crash fix, and we're still pretty early in the aurora cycle, let's uplift.
Attachment #8763829 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8766707 [details] [diff] [review]
bug1280132-beta

Fix an oom, taking it.
Attachment #8766707 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Track this as this is oom issue ans it's in 48.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: