Closed Bug 1089682 Opened 10 years ago Closed 3 years ago

OOM crash on http://pioul.fr/lolgl/ WebGl game.

Categories

(Core :: JavaScript: GC, defect)

Version: 33 Branch
Platform: x86, Windows 8.1
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox36 --- affected
firefox41 --- affected

People

(Reporter: VarCat, Unassigned)

Details

(Keywords: crash, Whiteboard: [MemShrink:P2])

Attachments

(1 file)

Environment:

FF 33.0.2
OS: Win 8.1 x32

STR:

1. In a new tab go to http://pioul.fr/lolgl/
2. Let the animations run for several seconds.
3. Close the tab.

Issue:
FF crashes due to OOM (out of memory).

Note:
Please note that the issue is also reproducible on FF 32.0.3, so it's not a regression from FF 33.0.1 to FF 33.0.2.
After step 3, I get the beachball for a few seconds on Mac 10.9.4, but no crash. Just FYI.
Crash Signature: [@ OOM | small ]
Keywords: crash
This sounds like memory was "just" fairly full at that point.
Flags: needinfo?(jgilbert)
Yep, this site could easily exhaust memory on a 32-bit OS (like Catalin's). 

The heap-unclassified is pretty high.

924.76 MB (100.0%) -- explicit
├──513.40 MB (55.52%) -- window-objects
├──248.14 MB (26.83%) ── heap-unclassified

1,166.20 MB ── private
1,128.64 MB ── resident
1,707.92 MB ── vsize
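For reference, each percentage in the tree is that node's size divided by the `explicit` total, so heap-unclassified alone accounts for over a quarter of explicit allocations. A small Python check using only the figures above (the `share` helper is just for illustration):

```python
# Figures copied from the about:memory excerpt above, in MB.
explicit = 924.76
window_objects = 513.40
heap_unclassified = 248.14

def share(part, total):
    """Percentage of `total` accounted for by `part`."""
    return 100.0 * part / total

print(f"window-objects:    {share(window_objects, explicit):.2f}%")
print(f"heap-unclassified: {share(heap_unclassified, explicit):.2f}%")
# Whatever remains under explicit belongs to reporters not shown in the excerpt.
print(f"other:             {explicit - window_objects - heap_unclassified:.2f} MB")
```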
Flags: needinfo?(catalin.varga)
Whiteboard: [MemShrink]
Oops
Flags: needinfo?(catalin.varga)
Whiteboard: [MemShrink] → [MemShrink:P2]
I was able to start the game and then close the tab successfully on my Linux64 machine. about:memory looked reasonable: once the game was loaded, the parent process was only ~100 MiB, and the child was ~1 GiB with much of it in JS memory; nothing looked very unusual.

I've attached some *cumulative* allocation counts for the child process,
obtained with DMD, using a stack depth of 6.

Cycle collection shows up a lot. See records 2, 4, 17, 18, 19, 22, 30, 31, 32.

js::jit::FixedList also shows up a lot. See records 1, 5, 23, 37. These seem
to be mostly for MResumePoint::operands_ and MBasic::slots_. I did some ad
hoc profiling of the former, and here are the frequencies of sizes:

> 136948 counts:
> (  1)    16930 (12.4%, 12.4%): MResumePoint: 235
> (  2)    15757 (11.5%, 23.9%): MResumePoint: 236
> (  3)     8087 ( 5.9%, 29.8%): MResumePoint: 237
> (  4)     7655 ( 5.6%, 35.4%): MResumePoint: 6
> (  5)     7456 ( 5.4%, 40.8%): MResumePoint: 7
> (  6)     6651 ( 4.9%, 45.7%): MResumePoint: 8
> (  7)     6625 ( 4.8%, 50.5%): MResumePoint: 10
> (  8)     6180 ( 4.5%, 55.0%): MResumePoint: 238
> (  9)     6025 ( 4.4%, 59.4%): MResumePoint: 5
> ( 10)     5669 ( 4.1%, 63.6%): MResumePoint: 11

Here's the same data, but with each line weighted by its size:

> 12808751 counts:
> (  1)  3978550 (31.1%, 31.1%): MResumePoint: 235
> (  2)  3718652 (29.0%, 60.1%): MResumePoint: 236
> (  3)  1916619 (15.0%, 75.1%): MResumePoint: 237
> (  4)  1470840 (11.5%, 86.5%): MResumePoint: 238
> (  5)   248580 ( 1.9%, 88.5%): MResumePoint: 45
> (  6)   203676 ( 1.6%, 90.1%): MResumePoint: 44
> (  7)   158696 ( 1.2%, 91.3%): MResumePoint: 239
> (  8)    75532 ( 0.6%, 91.9%): MResumePoint: 46
> (  9)    66250 ( 0.5%, 92.4%): MResumePoint: 10
> ( 10)    62359 ( 0.5%, 92.9%): MResumePoint: 11
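The two tables are consistent with each weighted value being the count multiplied by the operand count (e.g. 16930 × 235 = 3,978,550); reading "size" as the number of operands is an inference from the numbers, not something the profile output states. The Python sketch below redoes that arithmetic for a few rows. It also shows why size 6, fourth in the first table, drops out of the second (7655 × 6 = 45,930, below the tenth weighted entry), and divides the two header totals to get the mean operand count per resume point.

```python
# Some unweighted entries from the first table (operand count -> number of
# resume points), including one small size to show why small sizes drop out.
counts = {235: 16930, 236: 15757, 237: 8087, 238: 6180, 6: 7655}

def weight_by_size(counts):
    """Weight each count by its operand count, as in the second table."""
    return {size: n * size for size, n in counts.items()}

weighted = weight_by_size(counts)
for size in sorted(weighted, key=weighted.get, reverse=True):
    print(f"MResumePoint: {size:>3}  {weighted[size]:>9}")

# Grand totals from the two table headers: average operands per resume point.
total_points, total_operands = 136948, 12808751
mean_operands = total_operands / total_points
print(f"mean operands per resume point: {mean_operands:.1f}")
```

A handful of very large resume points therefore dominate the weighted view even though small ones are more numerous.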

Those are some really deep stacks. jandem, does MResumePoint::operands_ really
need to be that big? Is it a worst-case allocation?
Flags: needinfo?(jdemooij)
> Cycle collection shows up a lot. See records 2, 4, 17, 18, 19, 22, 30, 31, 32.

Oh! Look at the "individual block sizes" in record 2:

> 9 blocks in heap block record 2 of 6,030
> 267,911,168 bytes (267,911,168 requested / 0 slop)
> Individual block sizes: 134,217,728; 67,108,864; 33,554,432; 16,777,216; 8,388,608; 4,194,304; 2,097,152; 1,048,576; 524,288
> 6.38% of the heap (20.40% cumulative)
> Allocated at {
>   #01: PLDHashTable::ChangeTable(int) (/home/njn/moz/mi5/xpcom/glue/pldhash.cpp:510)
>   #02: PLDHashTable::Operate(void const*, PLDHashOperator) (/home/njn/moz/mi5/xpcom/glue/pldhash.cpp:597)
>   #03: CCGraph::AddNodeToMap(void*) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:913)
>   #04: CCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2172)
>   #05: CCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2102)
>   #06: CCGraphBuilder::NoteJSChild(void*) (/home/njn/moz/mi5/co64dmd/xpcom/base/../../../xpcom/base/nsCycleCollector.cpp:2332)
> }

The pldhash storage got up to 128 MiB! I bet that's the problem.
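The nine block sizes in record 2 are successive doublings, which is what repeated table growth looks like in a cumulative profile: every intermediate entry store is counted even though it was freed, so the 267,911,168-byte total is the sum of a geometric series rather than live memory. A Python sketch reproducing the numbers; `doubling_sizes` is an illustrative helper, not pldhash code, and the transient-peak figure assumes the old entry store is freed only after its entries are moved into the new one:

```python
KIB = 1024
MIB = 1024 * KIB

def doubling_sizes(first, last):
    """Entry-store sizes for a table that doubles from `first` up to `last`."""
    sizes = []
    size = first
    while size <= last:
        sizes.append(size)
        size *= 2
    return sizes

# Block sizes reported in DMD record 2: 512 KiB doubling up to 128 MiB.
sizes = doubling_sizes(512 * KIB, 128 * MIB)
cumulative = sum(sizes)  # what a cumulative DMD profile adds up
# Assumption: during the last resize the 64 MiB and 128 MiB stores coexist.
peak_during_last_resize = sizes[-1] + sizes[-2]
print(len(sizes), cumulative, peak_during_last_resize // MIB)
```

Once growth stops, only the final 128 MiB store would still be live.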
And it looks like CC only ran once, when the tab was closed. That's surprising, given how many JS objects were being allocated. Maybe this page interacts badly with the GC/CC heuristics?
(In reply to Nicholas Nethercote [:njn] from comment #6)
> Those are some really deep stacks. jandem, does MResumePoint::operands_
> really need to be that big? Is it a worst-case allocation?

Each basic block and effectful MIR instruction has a resume point. The resume point captures the stack state for bailouts, so the operands list has a pointer for each stack slot (arguments, locals, expression slots).

It's possible we have some huge script with many variables; then the resume points can be big. Let me know if this is the culprit and we can investigate more but I'm afraid there's no easy fix.
Flags: needinfo?(jdemooij)
(In reply to Nicholas Nethercote [:njn] from comment #8)
> And it looks like CC only ran once, when the tab was closed. That's
> surprising, given how many JS objects were being allocated. Maybe this page
> interacts badly with the GC/CC heuristics?

If the JS objects being allocated aren't DOM wrappers, we don't need to CC.
> It's possible we have some huge script with many variables; then the resume
> points can be big. Let me know if this is the culprit and we can investigate
> more but I'm afraid there's no easy fix.

The length of MResumePoint::operands_ is the stack depth. So it looks like we're creating many resume points with a call depth of 235+.
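Following jandem's description, a resume point's operand count is the number of stack slots in the frame (arguments, locals, expression slots), which is why a script with many variables yields resume points in the 235+ range. A toy Python model of that relationship. `FrameSlots`, the slot breakdown, and the 8-bytes-per-operand figure (one pointer per slot on a 64-bit build) are all assumptions for illustration; the real operand element type may be larger than a bare pointer.

```python
from dataclasses import dataclass

@dataclass
class FrameSlots:
    """Hypothetical summary of one frame's stack slots (not a SpiderMonkey type)."""
    args: int
    locals: int
    expr_slots: int  # expression-stack depth at the instruction

def resume_point_operands(frame: FrameSlots) -> int:
    """One operand per stack slot: arguments, locals and expression slots."""
    return frame.args + frame.locals + frame.expr_slots

def operands_bytes(n_operands: int, bytes_per_operand: int = 8) -> int:
    """Size of the operand list, assuming one 8-byte pointer per operand."""
    return n_operands * bytes_per_operand

# A made-up script with many locals, chosen to land on 235 operands.
frame = FrameSlots(args=3, locals=230, expr_slots=2)
n = resume_point_operands(frame)
print(n, operands_bytes(n))
```

Because every basic block and effectful MIR instruction gets its own resume point, this per-point cost is repeated across the whole compiled script.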
I got this signature while navigating on Google Maps, FF 36b1 Win 7 x86
https://crash-stats.mozilla.com/report/index/d32494fa-f4d6-4c59-97f8-4829f2150115
See Also: → 1115929
This WFM on Nightly46, Win10-64.

I'm going to move this to GC like bug 1115929. Not sure if we want to close this yet.
Component: Canvas: WebGL → JavaScript: GC
Flags: needinfo?(jgilbert)

(In reply to Jeff Gilbert [:jgilbert] from comment #14)
> This WFM on Nightly46, Win10-64.
>
> I'm going to move this to GC like bug 1115929. Not sure if we want to close
> this yet.

Maybe now?

Flags: needinfo?(jgilbert)
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jgilbert)
Resolution: --- → WORKSFORME