Closed Bug 1120115 Opened 9 years ago Closed 5 years ago

Memory not collected until tab left on WebGL application

Categories

(Core :: Graphics, defect)

x86
Linux
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: azakai, Assigned: jgilbert)

Details

(Whiteboard: [MemShrink:P2] [gfx-noted])

https://code.google.com/p/ozz-animation/

shows a cool skeletal animation demo. On Chrome, memory usage is stable. On Firefox Nightly, memory increases steadily until, after 30 seconds or so, the browser takes over 1GB of additional memory compared to before, and would OOM on my machine.

I can't seem to get useful data from about:memory, as merely switching tabs quickly frees the excess memory! Closing the tab likewise frees it. Seems like just when the page is running it won't free anything.
The usual technique when you have this kind of issue is to open a second window and run about:memory there.
Brilliant, thanks! about:memory in another window works perfectly.

Ok, here is a diff showing what's going on:

Explicit Allocations

249.93 MB (100.0%) -- explicit
├──261.25 MB (104.53%) -- gfx
│  ├──261.25 MB (104.53%) ── heap-textures
│  └────0.00 MB (00.00%) ── font-shaped-words
├───-9.21 MB (-3.68%) -- js-non-window
│   ├──-9.22 MB (-3.69%) -- runtime
│   │  ├──-4.82 MB (-1.93%) ── uncompressed-source-cache
│   │  ├──-3.25 MB (-1.30%) ── temporary
│   │  └──-1.15 MB (-0.46%) ++ (4 tiny)
│   └───0.01 MB (00.00%) ++ (2 tiny)
└───-2.11 MB (-0.85%) ++ (10 tiny)

Looks like a graphics (likely WebGL, because that's what is being used to render?) issue.
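
A quick way to sanity-check a diff like the one above is to confirm that the top-level entries add up to the "explicit" total, which also explains how a single entry can exceed 100%: the other entries are negative. The sketch below is only an illustration using the figures quoted in this comment; the `share` helper is a made-up name, not part of about:memory.

```python
# Top-level entries of the "explicit" diff quoted above, in MB.
children = {
    "gfx": 261.25,
    "js-non-window": -9.21,
    "(10 tiny)": -2.11,
}
total = 249.93  # the "explicit" line of the diff


def share(value_mb, total_mb):
    """Percentage of the total diff contributed by one entry (hypothetical helper)."""
    return 100.0 * value_mb / total_mb


# The children should add back up to the parent, within display rounding.
assert abs(sum(children.values()) - total) < 0.02

for name, mb in children.items():
    print(f"{name:>15}: {mb:8.2f} MB ({share(mb, total):7.2f}%)")
```

Percentages recomputed from the rounded MB figures can differ from the report in the last digit for the small entries, presumably because the report works from exact byte counts.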
Component: JavaScript Engine → Graphics
Whiteboard: [MemShrink]
Could also be images related, switching away from tabs freeing memory is something that happens for images.
(In reply to Timothy Nikkel (:tn) from comment #3)
> Could also be images related, switching away from tabs freeing memory is
> something that happens for images.

But that should show up in about:memory under "images", not "gfx/heap-textures", no?
If we layerize images (a layer for just the image) I think its memory can show up as gfx/heap-textures.
Is this only happening on Linux? I just tried on OS X and my local debug build actually crashed while getting the about:memory report. Running in lldb and periodically inspecting the GfxMemoryImageReporter::sAmount value showed it was stable.

If it's platform specific it might provide a hint as to what's going on. I'll file a bug for the crash I saw.
I'm not able to reproduce on Linux using a non-debug m-c build either. Is anybody else able to reproduce this?
Whiteboard: [MemShrink] → [MemShrink] gfx-noted
I can't reproduce this on 2 other Linux machines. It only happens on the first Linux machine, where I originally saw the issue.
On a different machine than the first (also linux), I do see the same symptoms on a different site, that I just happened to see now on HN. STR:

1. Open a non-e10s window in nightly
2. Go to http://timeinvariant.github.io/gorescript/play/
3. Click "new game", and just wait. After a few seconds, memory usage has substantially increased, where this machine (8GB) would OOM in less than 10.

As in the original STR, all the excess memory vanishes instantly by just switching tabs (seemingly without waiting for a GC or CC). Then returning lets it start to increase again at the same speed as before. Do we have special behavior to free something on tab switch?

Interestingly, the use of a non-e10s window seems critical. I do *not* see the bug in e10s (although the game is unplayable due to mouselock not working in e10s, but that's irrelevant), but I do see it consistently in a non-e10s window.

Yet, I only see this on one machine. The other one I have here does not show it. Overall, this is clearly a hard-to-reproduce bug, but when it does manifest it is pretty bad. I suspect this is a fairly recent regression, as I use these machines all the time, but just saw the bug on 2 separate machines for the first time over the last week.
Renaming as the second STR is a non-emscripten WebGL app.
Summary: Memory not collected until tab left on emscripten application → Memory not collected until tab left on WebGL application
A few things might help here if you have the time. One is to try and isolate the differences in the profiles being used and the hardware of the different machines. The other is just sticking a breakpoint in the GfxMemoryImageReporter and seeing who is calling it (which should point us to the code doing the memory allocation).
Whiteboard: [MemShrink] gfx-noted → [MemShrink] [gfx-noted]
Alon, can you clarify which memory numbers you see going up and which you don't?
Flags: needinfo?(azakai)
Sure, it's gfx/heap-textures that I see go up.
Flags: needinfo?(azakai)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> A few things might help here if you have the time. One is to try and isolate
> the differences in the profiles being used and the hardware of the different
> machines. The other is just sticking a breakpoint in the
> GfxMemoryImageReporter and seeing who is calling it (which should point us
> to the code doing the memory allocation).

Hmm, due to the difficulty in reproducing (2 STRs, each working only on one machine, all of them Linux but otherwise with varying hardware), I would guess this depends on timing somehow. Like the frame rate is just fast enough to get certain collection code running on some machines/some setups (like with or without e10s), and otherwise not.

I tried to see about making a build and running in the debugger, but mach build isn't working for me. Suggestions on #developers didn't seem to help. Can I use a debugger to get a stack trace without building my own browser? (running gdb on nightly doesn't find any symbols)
I bisected this manually over nightlies. The regression happened on the nightly on Oct 11, 2014 - so farther back than I was guessing before. Range:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=50b689feab5f&tochange=f74ad36bb97b

Bunch of WebGL changes there, cc'ing Jeff.
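
For anyone repeating this kind of manual bisection over nightlies, the procedure amounts to a binary search over build dates. The sketch below is an assumption-laden illustration, not the process actually used here: `first_bad_nightly` and `is_bad` are hypothetical names, the start and end dates are made up, and in practice `is_bad` would mean "download that day's nightly, run the STR, and watch gfx/heap-textures".

```python
from datetime import date, timedelta


def first_bad_nightly(good, bad, is_bad):
    """Return the earliest date in (good, bad] for which is_bad(date) is true.

    Assumes `good` is known good, `bad` is known bad, and that the bug,
    once introduced, is present in every later nightly.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid   # regression is at or before mid
        else:
            good = mid  # regression is after mid
    return bad


# Stand-in for manual testing: pretend the regression landed in the
# Oct 11, 2014 nightly, as reported in the comment above.
def is_bad(d):
    return d >= date(2014, 10, 11)


# Hypothetical known-good and known-bad endpoints.
print(first_bad_nightly(date(2014, 7, 1), date(2015, 1, 10), is_bad))
```

Because this bug appears to be timing-dependent, a single run that looks fine may not prove a nightly is good; repeating the STR a few times per build before marking it good would make the result more trustworthy.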
jgilbert, any thoughts on the regression window in comment 15?
Flags: needinfo?(jgilbert)
Whiteboard: [MemShrink] [gfx-noted] → [MemShrink:P2] [gfx-noted]
(In reply to Nicholas Nethercote [:njn] from comment #16)
> jgilbert, any thoughts on the regression window in comment 15?

Yep, that's when we landed the most recent WebGL compositing changes. Maybe we're leaking something?
Flags: needinfo?(jgilbert)
(In reply to Jeff Gilbert [:jgilbert] from comment #17)
> (In reply to Nicholas Nethercote [:njn] from comment #16)
> > jgilbert, any thoughts on the regression window in comment 15?
> 
> Yep, that's when we landed the most recent WebGL compositing changes. Maybe
> we're leaking something?

Let me be more explicit: as far as I can tell, you landed those changes. Can you please investigate the potential regression that your changes caused? Or, if it was somebody else's changes, please feel free to reassign the bug to them. Thank you.
Assignee: nobody → jgilbert
Thank you for being explicit, and I'll root-cause this as priorities allow.
Flags: needinfo?(jgilbert)

WORKSFORME now. Reopen if not!

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(jgilbert)
Resolution: --- → WORKSFORME