Closed Bug 1413467 Opened 2 years ago Closed 2 years ago

Crash in OOM | large | NS_ABORT_OOM | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<T>::EnsureCapacity<T>

Categories

(Core :: DOM: HTML Parser, defect, P2, critical)

57 Branch
x86_64
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox56 --- wontfix
firefox57 --- unaffected
firefox58 --- unaffected
firefox59 --- fixed

People

(Reporter: david, Assigned: hsivonen)

Details

(Whiteboard: [MemShrink])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-10fb304e-2d3b-4b39-aad7-6a4e90171101.
=============================================================

Firefox 57 beta has become almost unusable for me on MacOS Sierra since 57.0beta10 (beta8, beta9: ok; beta10, beta11, beta12, and beta13: not ok). Tabs are crashing frequently, always with the same signature. I also tried to reset my settings, but it didn't help. I think that this should be a blocker for a public Firefox 57 release.
Component: General → Memory Allocator
Product: Firefox → Core
Whiteboard: [MemShrink]
Still seeing crashes with the latest 57 release:
https://crash-stats.mozilla.com/report/index/eded8215-e4b2-40c6-b13c-25f750171108
OOM Allocation Size 	2,208,301,056 bytes (2.06 GB)
from nsHtml5TreeBuilder::createElement(int, nsIAtom*, nsHtml5HtmlAttributes*, void*, nsHtml5ContentCreatorFunction)
Component: Memory Allocator → HTML: Parser
Henri, can you please take a look?
Flags: needinfo?(hsivonen)
In the sense of "OOM crash at this particular point", this is a duplicate of an earlier bug that I fail to locate. The old bug was about 32-bit systems, though. It's of interest that a) this crash is 64-bit Firefox and b) the attempted allocation is actually huge.

If I'm reading the code right, in order for an allocation of 2.06 GB to be attempted here, there had to already be 1.8 GB of tree operations in the off-the-main-thread parser's queue.

The notion that the queue could accumulate 1.8 GB of tree ops without getting a timer flush is rather remarkable.

Reporter, do you happen to regularly open a particular huge Web page over a very fast connection?

The Mac-only (except one Linux crash) nature of this signature probably arises from clang-specific inlining. Note that the crash signature field on this bug ends up associating this with a whole bunch of unrelated nsTArray OOM crashes on Mac.

Without having steps to reproduce, the only thing that comes to mind is that perhaps we should make the off-the-main-thread parser flush the queue not just on timer and </script> but also when the length of the queue exceeds a magic number.
Flags: needinfo?(hsivonen)
(In reply to Henri Sivonen (:hsivonen) from comment #4)
> Note that the crash signature field on
> this bug ends up associating this with a whole bunch of unrelated nsTArray
> OOM crashes on Mac.

Filed bug 1416168.
(In reply to Henri Sivonen (:hsivonen) from comment #4)
> Without having steps to reproduce, the only thing that comes to mind is that
> perhaps we should make the off-the-main-thread parser flush the queue not
> just on timer and </script> but also when the length of the queue exceeds a
> magic number.

Except that won't help if the parser is speculating. The crash here could well arise on a page that has a slow script followed by a huge number of tags after the script tag.

What we could do is making each tree op append fallible and capable of marking the parser as broken.

It's unclear to me what priority that work should be. I'd expect problems on 32-bit Windows to be much more common than the 64-bit Mac case reported here.
I do regularly open my employers internal applications (wiki, bug tracking, etc.), but I wouldn't call them "huge". I'll try to keep an eye if firefox starts crashing more frequently after I visited a certain page.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=71aeda70639d1cbf218c180b380c30e6a3ac795d
https://treeherder.mozilla.org/#/jobs?repo=try&revision=afe6626e5482b6b4077f855e52bc999f3bd65fbf

(In reply to David Schweikert from comment #7)
> I do regularly open my employers internal applications (wiki, bug tracking,
> etc.), but I wouldn't call them "huge". I'll try to keep an eye if firefox
> starts crashing more frequently after I visited a certain page.

Unless my math is completely wrong, "huge" would be on the order of over a million but less than 10 million elements.
Assignee: nobody → hsivonen
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #8927263 - Flags: review?(bzbarsky)
Priority: -- → P2
Comment on attachment 8927263 [details]
Bug 1413467 - Make enqueueing tree ops use fallible allocation.

https://reviewboard.mozilla.org/r/198574/#review204316

Looks reasonable, though it might be nice to have an AppendTreeOp() function that does the fallible append and if that fails calls MarkAsBrokenAndRequestSuspensionWithoutBuilder and then returns null, else returns the newly-appended tree op, instead of having to duplicate that logic all over
Attachment #8927263 - Flags: review?(bzbarsky) → review+
Comment on attachment 8927263 [details]
Bug 1413467 - Make enqueueing tree ops use fallible allocation.

https://reviewboard.mozilla.org/r/198574/#review204316

Thanks.

A new function would save only one function call. To fully abstract the operation, a macro would be needed. I considered it but didn't do it, because, AFAICT, we are supposed to avoid hiding return statements in macros these days.
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b6daeeac264c
Make enqueueing tree ops use fallible allocation. r=bz
https://hg.mozilla.org/mozilla-central/rev/b6daeeac264c
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
No occurrence of this crash in 58 or 57.
I switched to Firefox 59 nightly build in the mean time, because I have seen that that fix was available there. That might explain it. I have now switched back to Firefox 58 beta to see if the crashes still happen for me there.
Crash Signature: [@ OOM | large | NS_ABORT_OOM | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<T>::EnsureCapacity<T>] → [@ OOM | large | NS_ABORT_OOM | nsTArrayInfallibleAllocator::ResultTypeProxy nsTArray_base<T>::EnsureCapacity<T> | nsHtml5TreeBuilder::createElement ]
(In reply to Sylvestre Ledru [:sylvestre] from comment #15)
> No occurrence of this crash in 58 or 57.

The crash reports were reprocessed in bug 1416168, so the old signature no longer matches. I updated the signature field on this bug.
You need to log in before you can comment on or make changes to this bug.