Memory not collected until tab left on WebGL application

Status: NEW
Opened 4 years ago, updated 3 years ago
Reporter: azakai
Assigned: jgilbert
Flags: NeedInfo
Tracking: Not tracked
Whiteboard: [MemShrink:P2] [gfx-noted]

(Reporter)

Description

4 years ago
https://code.google.com/p/ozz-animation/

shows a cool skeletal animation demo. On Chrome, memory usage is stable. On Firefox Nightly, memory increases steadily: after 30 seconds or so the browser is using over 1GB more than before, and would OOM on my machine.

I can't seem to get useful data from about:memory, as merely switching tabs quickly frees the excess memory! Closing the tab likewise frees it. It seems that nothing is freed while the page is running.
(Reporter)

Updated

4 years ago
The usual technique when you have this kind of issue is to open a second window and run about:memory there.
(Reporter)

Comment 2

4 years ago
Brilliant, thanks! about:memory in another window works perfectly.

Ok, here is a diff showing what's going on:

Explicit Allocations

249.93 MB (100.0%) -- explicit
├──261.25 MB (104.53%) -- gfx
│  ├──261.25 MB (104.53%) ── heap-textures
│  └────0.00 MB (00.00%) ── font-shaped-words
├───-9.21 MB (-3.68%) -- js-non-window
│   ├──-9.22 MB (-3.69%) -- runtime
│   │  ├──-4.82 MB (-1.93%) ── uncompressed-source-cache
│   │  ├──-3.25 MB (-1.30%) ── temporary
│   │  └──-1.15 MB (-0.46%) ++ (4 tiny)
│   └───0.01 MB (00.00%) ++ (2 tiny)
└───-2.11 MB (-0.85%) ++ (10 tiny)

Looks like a graphics (likely WebGL, because that's what is being used to render?) issue.
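
For readers who want to pick the dominant entry out of a pasted about:memory diff automatically, here is a minimal Python sketch. It assumes the text layout shown in the diff above (amount in MB, percentage, then a `--`, `──` or `++` separator before the name). The names `parse_diff` and `largest_leaf` are made up for illustration and are not part of any Firefox tooling.

```python
import re

# Excerpt of the diff above, values copied verbatim.
SAMPLE = """\
249.93 MB (100.0%) -- explicit
├──261.25 MB (104.53%) -- gfx
│  ├──261.25 MB (104.53%) ── heap-textures
│  └────0.00 MB (00.00%) ── font-shaped-words
├───-9.21 MB (-3.68%) -- js-non-window
└───-2.11 MB (-0.85%) ++ (10 tiny)
"""

# Amount, percentage, separator, name. "──" marks a leaf reporter,
# "--" an expanded subtree, "++" a collapsed subtree.
ENTRY_RE = re.compile(
    r"(-?\d+(?:\.\d+)?) MB \((-?\d+(?:\.\d+)?)%\) (--|──|\+\+) (.+)$"
)


def parse_diff(text):
    """Return one dict per recognised line of a pasted diff."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.search(line)
        if match:
            mb, pct, sep, name = match.groups()
            entries.append({
                "name": name.strip(),
                "mb": float(mb),
                "pct": float(pct),
                "leaf": sep == "──",
            })
    return entries


def largest_leaf(entries):
    """Return the leaf entry with the largest positive delta."""
    leaves = [entry for entry in entries if entry["leaf"]]
    return max(leaves, key=lambda entry: entry["mb"])


if __name__ == "__main__":
    print(largest_leaf(parse_diff(SAMPLE)))
```

On the excerpt this singles out heap-textures. The percentage above 100% is expected in a diff: the gfx subtree grew by more than the net total because other subtrees shrank.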
(Reporter)

Updated

4 years ago
Component: JavaScript Engine → Graphics
Whiteboard: [MemShrink]
Could also be images related, switching away from tabs freeing memory is something that happens for images.
(In reply to Timothy Nikkel (:tn) from comment #3)
> Could also be images related, switching away from tabs freeing memory is
> something that happens for images.

But that should show up in about:memory under "images", not "gfx/heap-textures", no?
If we layerize images (a layer for just the image), I think its memory can show up as gfx/heap-textures.
Is this only happening on Linux? I just tried on OS X and my local debug build actually crashed while getting the about:memory report. Running in lldb and periodically inspecting the GfxMemoryImageReporter::sAmount value showed it was stable.

If it's platform specific it might provide a hint as to what's going on. I'll file a bug for the crash I saw.
I'm not able to reproduce on Linux using a non-debug m-c build either. Is anybody else able to reproduce this?
Whiteboard: [MemShrink] → [MemShrink] gfx-noted
(Reporter)

Comment 8

4 years ago
I can't reproduce this on 2 other Linux machines. It only happens on the first Linux machine, where I originally saw the issue.
(Reporter)

Comment 9

4 years ago
On a different machine from the first (also Linux), I do see the same symptoms on a different site, which I just happened to come across on HN. STR:

1. Open a non-e10s window in nightly
2. Go to http://timeinvariant.github.io/gorescript/play/
3. Click "new game", and just wait. After a few seconds, memory usage has substantially increased, where this machine (8GB) would OOM in less than 10.

As in the original STR, all the excess memory vanishes instantly by just switching tabs (seemingly without waiting for a GC or CC). Then returning lets it start to increase again at the same speed as before. Do we have special behavior to free something on tab switch?

Interestingly, the use of a non-e10s window seems critical. I do *not* see the bug in e10s (although the game is unplayable due to mouselock not working in e10s, but that's irrelevant), but I do see it consistently in a non-e10s window.

Yet, I only see this on one machine. The other one I have here does not show it. Overall, this is clearly a hard-to-reproduce bug, but when it does manifest it is pretty bad. I suspect this is a fairly recent regression, as I use these machines all the time, but just saw the bug on 2 separate machines for the first time over the last week.
(Reporter)

Comment 10

4 years ago
Renaming as the second STR is a non-emscripten WebGL app.
Summary: Memory not collected until tab left on emscripten application → Memory not collected until tab left on WebGL application
A few things might help here if you have the time. One is to try and isolate the differences in the profiles being used and the hardware of the different machines. The other is just sticking a breakpoint in the GfxMemoryImageReporter and seeing who is calling it (which should point us to the code doing the memory allocation).
Whiteboard: [MemShrink] gfx-noted → [MemShrink] [gfx-noted]
Alon, can you clarify which memory numbers you see going up and which you don't?
Flags: needinfo?(azakai)
(Reporter)

Comment 13

4 years ago
Sure, it's gfx/heap-textures that I see go up.
Flags: needinfo?(azakai)
(Reporter)

Comment 14

4 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> A few things might help here if you have the time. One is to try and isolate
> the differences in the profiles being used and the hardware of the different
> machines. The other is just sticking a breakpoint in the
> GfxMemoryImageReporter and seeing who is calling it (which should point us
> to the code doing the memory allocation).

Hmm, due to the difficulty in reproducing (2 STRs, each working on only one machine, all of them Linux but otherwise with various hardware), I would guess this depends on timing somehow. Perhaps the frame rate is just fast enough to get certain collection code running on some machines/setups (e.g. with or without e10s), and not on others.

I tried to make a build and run it in the debugger, but mach build isn't working for me. Suggestions on #developers didn't seem to help. Can I use a debugger to get a stack trace without building my own browser? (Running gdb on nightly doesn't find any symbols.)
(Reporter)

Comment 15

4 years ago
I bisected this manually over nightlies. The regression happened on the nightly on Oct 11, 2014 - so farther back than I was guessing before. Range:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=50b689feab5f&tochange=f74ad36bb97b

Bunch of WebGL changes there, cc'ing Jeff.
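
For anyone repeating this kind of bisection, the search over nightly dates is a plain binary search. The sketch below is a hypothetical illustration and not the procedure actually used for this bug. `bisect_nightlies` and the `is_bad` callback are made-up names, and in practice `is_bad` would be a manual check (install that day's nightly, run the STR, watch gfx/heap-textures). It also assumes the bug, once introduced, is present in every later nightly.

```python
from datetime import date, timedelta


def bisect_nightlies(good, bad, is_bad):
    """Return the first date whose nightly shows the bug.

    Assumes the nightly for `good` is fine, the one for `bad` is broken,
    and the bug does not disappear again anywhere in between.
    """
    while (bad - good).days > 1:
        # Test the nightly halfway between the known-good and known-bad dates.
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


if __name__ == "__main__":
    # Stand-in predicate for demonstration only; a real check is manual.
    regression_day = date(2014, 10, 11)
    first_bad = bisect_nightlies(
        date(2014, 9, 1), date(2014, 11, 1), lambda d: d >= regression_day
    )
    print(first_bad)
```

Each step halves the range, so a two-month window needs only about six builds to test.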
jgilbert, any thoughts on the regression window in comment 15?
Flags: needinfo?(jgilbert)
Whiteboard: [MemShrink] [gfx-noted] → [MemShrink:P2] [gfx-noted]
(Assignee)

Comment 17

4 years ago
(In reply to Nicholas Nethercote [:njn] from comment #16)
> jgilbert, any thoughts on the regression window in comment 15?

Yep, that's when we landed the most recent WebGL compositing changes. Maybe we're leaking something?
Flags: needinfo?(jgilbert)
(In reply to Jeff Gilbert [:jgilbert] from comment #17)
> (In reply to Nicholas Nethercote [:njn] from comment #16)
> > jgilbert, any thoughts on the regression window in comment 15?
> 
> Yep, that's when we landed the most recent WebGL compositing changes. Maybe
> we're leaking something?

Let me be more explicit: as far as I can tell, you landed those changes. Can you please investigate the potential regression that your changes caused? Or, if it was somebody else's changes, please feel free to reassign the bug to them. Thank you.
Assignee: nobody → jgilbert
(Assignee)

Comment 19

4 years ago
Thank you for being explicit, and I'll root-cause this as priorities allow.
Flags: needinfo?(jgilbert)