OOM crash on http://pioul.fr/lolgl/ WebGL game.

NEW
Unassigned

Core
JavaScript: GC
3 years ago
5 months ago

People

(Reporter: VarCat, Unassigned)

Tracking

({crash})

33 Branch
x86
Windows 8.1
crash

Firefox Tracking Flags

(firefox36 affected, firefox41 affected)

Details

(Whiteboard: [MemShrink:P2], crash signature)

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
Environment:

FF 33.0.2
OS: Win 8.1 32-bit

STR:

1. In a new tab go to http://pioul.fr/lolgl/
2. Let the animations run for several seconds.
3. Close the tab.

Issue:
FF crashes due to OOM (out of memory).

Note:
The issue also reproduces on FF 32.0.3, so it is not a regression between FF 33.0.1 and FF 33.0.2.
(Reporter)

Comment 1

3 years ago
https://crash-stats.mozilla.com/report/index/510a4af6-b66a-4a14-8390-a10812141027
This is one of the related crash reports.
After step 3, I get a beachball for a few seconds on Mac OS X 10.9.4, but no crash. Just FYI.

Updated

3 years ago
Crash Signature: [@ OOM | small ]
Keywords: crash

Comment 3

3 years ago
This sounds like memory was "just" fairly full at that point.
Flags: needinfo?(jgilbert)
Yep, this site could easily exhaust memory on a 32-bit OS (like Catalin's). 

The heap-unclassified is pretty high.

924.76 MB (100.0%) -- explicit
├──513.40 MB (55.52%) -- window-objects
├──248.14 MB (26.83%) ── heap-unclassified

1,166.20 MB ── private
1,128.64 MB ── resident
1,707.92 MB ── vsize
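
As a quick sanity check on the report above, the percentages can be recomputed from the raw numbers. This is only a sketch: the 2 GiB figure is an assumption (the default user address space of a 32-bit process on 32-bit Windows), the "MB" values are assumed to be MiB, and the thread does not say which machine the report was taken on.

```python
# Figures copied from the about:memory excerpt above (treated as MiB; assumption).
explicit_mib = 924.76
parts_mib = {
    "window-objects": 513.40,
    "heap-unclassified": 248.14,
}
vsize_mib = 1707.92

# Share of "explicit" for each listed subtree, rounded like about:memory does.
shares = {name: round(100 * mib / explicit_mib, 2) for name, mib in parts_mib.items()}

# ASSUMPTION: a 2 GiB user address space (32-bit process, default settings).
assumed_address_space_mib = 2048
headroom_mib = assumed_address_space_mib - vsize_mib

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of explicit")
print(f"address-space headroom under the 2 GiB assumption: {headroom_mib:.2f} MiB")
```

Under that assumption, only about 340 MiB of address space would be left, which is consistent with large allocations failing on a 32-bit OS.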
Flags: needinfo?(catalin.varga)
Whiteboard: [MemShrink]
Oops
Flags: needinfo?(catalin.varga)
Whiteboard: [MemShrink] → [MemShrink:P2]
Created attachment 8521148 [details]
Cumulative heap allocations in the child process

I was able to start the game and then close the tab successfully on my Linux64
machine. about:memory looked reasonable: once the game was loaded, the parent
process was only ~100 MiB, and the child was ~1 GiB with much of it in JS
memory; nothing looked very unusual.

I've attached some *cumulative* allocation counts for the child process,
obtained with DMD, using a stack depth of 6.

Cycle collection shows up a lot. See records 2, 4, 17, 18, 19, 22, 30, 31, 32.

js::jit::FixedList also shows up a lot. See records 1, 5, 23, 37. These seem
to be mostly for the MResumePoint::operands_ and MBasic::slots_. I did some ad
hoc profiling of the former, and here are the frequencies of sizes:

> 136948 counts:
> (  1)    16930 (12.4%, 12.4%): MResumePoint: 235
> (  2)    15757 (11.5%, 23.9%): MResumePoint: 236
> (  3)     8087 ( 5.9%, 29.8%): MResumePoint: 237
> (  4)     7655 ( 5.6%, 35.4%): MResumePoint: 6
> (  5)     7456 ( 5.4%, 40.8%): MResumePoint: 7
> (  6)     6651 ( 4.9%, 45.7%): MResumePoint: 8
> (  7)     6625 ( 4.8%, 50.5%): MResumePoint: 10
> (  8)     6180 ( 4.5%, 55.0%): MResumePoint: 238
> (  9)     6025 ( 4.4%, 59.4%): MResumePoint: 5
> ( 10)     5669 ( 4.1%, 63.6%): MResumePoint: 11

Here's the same data, but with each line weighted by its size:

> 12808751 counts:
> (  1)  3978550 (31.1%, 31.1%): MResumePoint: 235
> (  2)  3718652 (29.0%, 60.1%): MResumePoint: 236
> (  3)  1916619 (15.0%, 75.1%): MResumePoint: 237
> (  4)  1470840 (11.5%, 86.5%): MResumePoint: 238
> (  5)   248580 ( 1.9%, 88.5%): MResumePoint: 45
> (  6)   203676 ( 1.6%, 90.1%): MResumePoint: 44
> (  7)   158696 ( 1.2%, 91.3%): MResumePoint: 239
> (  8)    75532 ( 0.6%, 91.9%): MResumePoint: 46
> (  9)    66250 ( 0.5%, 92.4%): MResumePoint: 10
> ( 10)    62359 ( 0.5%, 92.9%): MResumePoint: 11

Those are some really deep stacks. jandem, does MResumePoint::operands_ really
need to be that big? Is it a worst-case allocation?
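
For readers following along, the size-weighted table can be reproduced from the frequency table by multiplying each count by its operand-list length. The sketch below uses only the rows that appear in both tables; the variable names are illustrative and not part of any profiling tool.

```python
# (operands_ length) -> (number of MResumePoint allocations), from the first table.
counts = {235: 16930, 236: 15757, 237: 8087, 238: 6180, 10: 6625, 11: 5669}

# Total of the size-weighted table, as reported in the second table.
total_weighted = 12808751

# Weight each row by its length: total operand slots allocated at that length.
weighted = {length: length * n for length, n in counts.items()}

for length, slots in sorted(weighted.items(), key=lambda kv: -kv[1]):
    print(f"MResumePoint: {length:>3}  {slots:>8}  ({100 * slots / total_weighted:.1f}%)")

# Combined share of the four deepest sizes (235 to 238).
deep_share = 100 * sum(weighted[n] for n in (235, 236, 237, 238)) / total_weighted
print(f"lengths 235-238 together: {deep_share:.1f}%")
```

The four lengths from 235 to 238 account for roughly 86.5% of all operand slots, matching the cumulative column in the weighted table.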
Flags: needinfo?(jdemooij)
> Cycle collection shows up a lot. See records 2, 4, 17, 18, 19, 22, 30, 31, 32.

Oh! Look at the "individual block sizes" in record 2:

> 9 blocks in heap block record 2 of 6,030
> 267,911,168 bytes (267,911,168 requested / 0 slop)
> Individual block sizes: 134,217,728; 67,108,864; 33,554,432; 16,777,216; 8,388,608; 4,194,304; 2,097,152; 1,048,576; 524,288
> 6.38% of the heap (20.40% cumulative)
> Allocated at {
>   #01: PLDHashTable::ChangeTable(int) (/home/njn/moz/mi5/xpcom/glue/pldhash.cpp:510)
>   #02: PLDHashTable::Operate(void const*, PLDHashOperator) (/home/njn/moz/mi5/xpcom/glue/pldhash.cpp:597)
>   #03: CCGraph::AddNodeToMap(void*) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:913)
>   #04: CCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2172)
>   #05: CCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2102)
>   #06: CCGraphBuilder::NoteJSChild(void*) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2332)
> }

The pldhash storage got up to 128 MiB! I bet that's the problem.
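
The block sizes in record 2 are successive powers of two, which is what a hash table that doubles its storage on each resize would produce. A minimal sketch that reproduces the record's totals follows; the function name is hypothetical, and the peak estimate assumes the old storage is only freed after entries are copied into the new one, which is not confirmed by the thread.

```python
def doubling_allocations(first_bytes, last_bytes):
    """Return the storage sizes allocated by a table that doubles on each resize."""
    sizes = []
    size = first_bytes
    while size <= last_bytes:
        sizes.append(size)
        size *= 2
    return sizes

# Smallest and largest blocks from record 2: 512 KiB up to 128 MiB.
sizes = doubling_allocations(512 * 1024, 128 * 1024 * 1024)

# DMD counts are cumulative, so the record total is the sum of every resize.
cumulative = sum(sizes)

# ASSUMPTION: during the last resize the old and new storage coexist briefly.
peak_during_resize = sizes[-1] + sizes[-2]

print(len(sizes), "blocks,", f"{cumulative:,}", "bytes cumulative")
print(f"{peak_during_resize:,}", "bytes live during the final resize (assumed)")
```

The sum matches the 267,911,168 bytes in the record, so the nine blocks look like one table growing step by step rather than nine independent tables.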
And it looks like CC only ran once, when the tab was closed. That's surprising, given how many JS objects were being allocated. Maybe this page interacts badly with the GC/CC heuristics?
(In reply to Nicholas Nethercote [:njn] from comment #6)
> Those are some really deep stacks. jandem, does MResumePoint::operands_
> really need to be that big? Is it a worst-case allocation?

Each basic block and effectful MIR instruction has a resume point. The resume point captures the stack state for bailouts, so the operands list has a pointer for each stack slot (arguments, locals, expression slots).

It's possible we have some huge script with many variables; then the resume points can be big. Let me know if this is the culprit and we can investigate more but I'm afraid there's no easy fix.
Flags: needinfo?(jdemooij)
(In reply to Nicholas Nethercote [:njn] from comment #8)
> And it looks like CC only ran once, when the tab was closed. That's
> surprising, given how many JS objects were being allocated. Maybe this page
> interacts badly with the GC/CC heuristics?

If the JS being allocated isn't DOM wrappers, we don't need to CC.
> It's possible we have some huge script with many variables; then the resume
> points can be big. Let me know if this is the culprit and we can investigate
> more but I'm afraid there's no easy fix.

The length of MResumePoint::operands_ is the stack depth. So it looks like we're creating many resume points with a stack depth of 235+ slots.
I got this signature while navigating Google Maps on FF 36b1, Win 7 x86:
https://crash-stats.mozilla.com/report/index/d32494fa-f4d6-4c59-97f8-4829f2150115
See Also: → bug 1115929
status-firefox36: --- → affected
Encountered again on Google Maps with Firefox 41 RC on Windows 10 x86: https://crash-stats.mozilla.com/report/index/e5fa58ee-57eb-4445-ad0a-03a562150915.
status-firefox41: --- → affected
This WFM on Nightly46, Win10-64.

I'm going to move this to GC like bug 1115929. Not sure if we want to close this yet.
Component: Canvas: WebGL → JavaScript: GC
Flags: needinfo?(jgilbert)