Bug 1405521 (Open): opened 3 years ago, updated 9 months ago

Crash in nsGlobalWindow::ClearDocumentDependentSlots: MOZ_CRASH(Unhandlable OOM while clearing document dependent slots.)

Categories

(Core :: DOM: Core & HTML, defect, P2)

Hardware: Unspecified
OS: All
Type: defect

Tracking

Tracking Status
firefox57 --- wontfix
firefox58 --- fix-optional
firefox59 --- affected
firefox60 --- affected

People

(Reporter: njn, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is report bp-0901faec-eb60-4ad0-b1ce-0a9e10171003.
=============================================================

We are hitting this:

> MOZ_CRASH(Unhandlable OOM while clearing document dependent slots.)

But in the linked crash report the system has huge amounts of virtual memory, page file, and physical memory available.

The code looks like this:

>  if (!WindowBinding::ClearCachedDocumentValue(aCx, this) ||
>      !WindowBinding::ClearCachedPerformanceValue(aCx, this)) {
>    MOZ_CRASH("Unhandlable OOM while clearing document dependent slots.");
>  }

It assumes that a failure from ClearCachedDocumentValue() or ClearCachedPerformanceValue() indicates OOM, but they will also fail if `this->GetWrapper()` returns null. bz, is that possible?
Flags: needinfo?(bzbarsky)
> but they will also fail if `this->GetWrapper()` returns null.

No, they won't; they'll no-op.  The generated code is:

  obj = aObject->GetWrapper();
  if (!obj) {
    return true;
  }

precisely because no wrapper is not a failure condition: it just means that there is nothing to clear.  
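
For reference, the overall shape of the generated clear function is roughly this (a sketch based on the snippet above; the slot constant and the repopulating call are illustrative stand-ins, not the verbatim codegen):

  // Hypothetical sketch, not the verbatim generated code.
  bool ClearCachedDocumentValue(JSContext* aCx, nsGlobalWindow* aObject)
  {
    JSObject* obj = aObject->GetWrapper();
    if (!obj) {
      return true;  // No wrapper means nothing is cached: success.
    }
    // Clear the reserved slot that caches the document value...
    JS_SetReservedSlot(obj, DOCUMENT_SLOT /* illustrative constant */,
                       JS::UndefinedValue());
    // ...then eagerly re-run the getter to repopulate the slot.  This
    // re-get is the only place a false return can come from.
    return get_document(aCx, obj, aObject);  // illustrative call shape
  }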

False return means that either get_document or get_performance returned false.

Looking at get_document, we're in the case when we just cleared the slot, so we'll make it to the self->GetDocument() call.  Once we do, we will return false only in the following cases:

1)  GetOrCreateDOMReflector(cx, result, args.rval()) returns false.
2)  Wrapping the value the getter returned into the window's compartment returns false.
3)  Wrapping the value the getter returned into the caller's compartment (which is also
    the window compartment in this case) returns false.

That's it.  Looking at get_performance, it can also return false only in those three cases; the sketch below shows where they sit in the getter.
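
Schematically, the tail of the generated getter looks like this (a sketch of the shape described above, not the verbatim codegen; the exact signature is illustrative):

  static bool
  get_document(JSContext* cx, JS::Handle<JSObject*> obj,
               nsGlobalWindow* self, JSJitGetterCallArgs args)
  {
    // The slot was just cleared, so we make it to the getter proper.
    nsIDocument* result = self->GetDocument();
    // Case 1: finding or creating the reflector fails.
    if (!GetOrCreateDOMReflector(cx, result, args.rval())) {
      return false;
    }
    // Cases 2 and 3: the value must be wrapped into the window's
    // compartment (for the cached slot) and into the caller's
    // compartment; either wrap can fail.
    if (!JS_WrapValue(cx, args.rval())) {
      return false;
    }
    return true;
  }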

Wrapping into compartments only fails if JS_WrapValue returns false, which only happens if JSCompartment::wrap() returns false, which can happen in the following ways:

A) CheckSystemRecursionLimit() (inside getNonWrapperObjectForCurrentCompartment) fails; see the model sketched after this list.
B) The prewrap callback outputs null.  This should never happen for webidl objects, afaict.
C) JSCompartment::getOrCreateWrapper fails.  I believe this can only happen on "OOM".
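
For (A), the recursion check is just a native-stack-depth guard; conceptually something like this standalone model (made-up quota, not the SpiderMonkey implementation):

  #include <cstdint>

  static const uintptr_t kStackQuota = 512 * 1024;  // assumed quota
  static uintptr_t gStackBase;  // recorded near the base of the thread's stack

  static bool CheckRecursionLimitModel()
  {
    char here;
    // Stacks grow downward on the usual platforms, so this difference
    // approximates how much stack has been consumed.
    uintptr_t used = gStackBase - reinterpret_cast<uintptr_t>(&here);
    // Returning false is reported to callers as failure, which is how
    // deep recursion can end up looking like "OOM" at the crash site.
    return used < kStackQuota;
  }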

As for GetOrCreateDOMReflector, it returns false if:

D) CouldBeDOMBinding() returns false (should never happen for document or performance).
E) The actual WrapObject() call returns false.
F) JS_WrapValue returns false, see above.

WrapObject() _can_ fail in somewhat interesting ways for documents.  See <http://searchfox.org/mozilla-central/source/dom/base/nsINode.cpp#2953-2958>.  But that shouldn't happen in this case, since we're just setting the document up, so all the script handling object state should be fine.  The other thing both WrapObject() implementations call is the relevant binding's Wrap(), which can fail when globals are missing or protos can't be instantiated, but fundamentally none of those should be happening here.  And of course it can fail on "OOM".
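
Putting (D)-(F) together, the reflector path is effectively this failure funnel (a standalone model; the booleans stand in for the real checks):

  enum class ReflectorFailure {
    None,
    NotDOMBinding,        // (D)
    WrapObjectFailed,     // (E)
    CrossCompartmentWrap  // (F)
  };

  ReflectorFailure
  GetOrCreateDOMReflectorModel(bool couldBeDOMBinding, bool wrapObjectOk,
                               bool wrapValueOk)
  {
    if (!couldBeDOMBinding) {
      return ReflectorFailure::NotDOMBinding;
    }
    if (!wrapObjectOk) {
      return ReflectorFailure::WrapObjectFailed;
    }
    if (!wrapValueOk) {
      return ReflectorFailure::CrossCompartmentWrap;
    }
    return ReflectorFailure::None;
  }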

Now the big caveat: "OOM" for our purposes is "out of SpiderMonkey heap", not "out of memory".  It's totally possible to hit "OOM" without actually being out of memory for malloc() purposes...  I don't know offhand what the cap is on the size of SpiderMonkey's heap, especially because it looks like we have separate heaps for gcthing allocations and JS_malloc allocations or something.

So for this specific crash, my best guess is that we either hit the SpiderMonkey memory cap or CheckSystemRecursionLimit() failed.  Hard to tell about the latter, because the stack is truncated at the first jitframe.
Flags: needinfo?(bzbarsky)
> No, they won't; they'll no-op.  The generated code is:
> 
>   obj = aObject->GetWrapper();
>   if (!obj) {
>     return true;
>   }
> 
> precisely because no wrapper is not a failure condition: it just means that
> there is nothing to clear.  

Yes, my bad.

Thank you for the detailed analysis.

jonco, does SM have a heap limit?
Flags: needinfo?(jcoppeard)
(In reply to Nicholas Nethercote [:njn] from comment #2)
> jonco, does SM have a heap limit?

Yes, the GC parameter JSGC_MAX_BYTES is used to set this; XPConnect sets it to 0xffffffff.
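
For anyone following along, in a plain SpiderMonkey embedding that parameter is set like this (a minimal sketch against the JSAPI of this era; the value mirrors what XPConnect passes):

  #include "jsapi.h"
  #include "js/Initialization.h"

  int main()
  {
    if (!JS_Init()) {
      return 1;
    }
    JSContext* cx = JS_NewContext(JS::DefaultHeapMaxBytes);
    if (!cx) {
      return 1;
    }
    // Cap the GC heap.  0xffffffff is just under 4 GiB, i.e. effectively
    // "no practical limit", so hitting it takes a very large JS heap.
    JS_SetGCParameter(cx, JSGC_MAX_BYTES, 0xffffffff);

    JS_DestroyContext(cx);
    JS_ShutDown();
    return 0;
  }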
Flags: needinfo?(jcoppeard)
Fwiw, bug 1197540 has a testcase that can trigger this assert.  That testcase definitely looks like an infinite recursion case to me, so it might be running into CheckSystemRecursionLimit() failure...  I'll try catching it in rr.
Depends on: 1197540
Priority: -- → P2
426 crashes in the last week in 57.
OS: Windows 10 → All
Crash Signature: [@ nsGlobalWindow::ClearDocumentDependentSlots] → [@ nsGlobalWindow::ClearDocumentDependentSlots] [@ nsGlobalWindowInner::ClearDocumentDependentSlots]
Duplicate of this bug: 1422313
There are around 1000 crashes per week on release 57 versions. Judging from the duplicate bug 1422313, it sounds like this is a shift in signature rather than a new crash.
bp-9d2936eb-65eb-4e92-bd7d-540380180425 with build 20180410100115 @ nsGlobalWindowInner::ClearDocumentDependentSlots
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #4)
> Fwiw, [Mac] bug 1197540 has a testcase that can trigger this assert.  That
> testcase definitely looks like an infinite recursion case to me, so it might
> be running into CheckSystemRecursionLimit() failure...  I'll try catching it
> in rr.

The Mac crashes for that bug report dropped to near zero around mid-March, from whatever shipped after 58.0.2. https://crash-stats.mozilla.com/signature/?_sort=user_comments&_sort=-date&signature=nsGlobalWindow%3A%3AInnerSetNewDocument&date=%3E%3D2017-12-11T04%3A18%3A16.000Z&date=%3C2018-06-11T05%3A18%3A16.000Z#graphs
(In reply to Wayne Mery (:wsmwk) from comment #9)
> Crashes for that bug report (bug 1197540) dropped to near zero around mid-March,
> from whatever shipped after 58.0.2.
> https://crash-stats.mozilla.com/signature/?_sort=user_comments&_sort=-date&signature=nsGlobalWindow%3A%3AInnerSetNewDocument&date=%3E%3D2017-12-11T04%3A18%3A16.000Z&date=%3C2018-06-11T05%3A18%3A16.000Z#graphs

However, the crash rate for this (Windows) bug has held fairly steady.  But that is actually not surprising, because the majority of crashes here are on 52.x esr.

I had another crash: bp-3819de05-b8f6-41a0-9297-1b2980180611
See Also: → 1488480
Depends on: 1491313
Depends on: 1491925
Depends on: 1493849
Depends on: 1496805
Depends on: 1499150
Depends on: 1501479
Depends on: 1503659
Depends on: 1503664
Depends on: 1505468
Component: DOM → DOM: Core & HTML

While trying to get an rr trace for bug 1593704, I kept triggering this issue. I created a Pernosco session, which can be found here: https://pernos.co/debug/HeNG0Imk-tsJryLVWyqD2g/index.html
