Closed Bug 1127188 Opened 9 years ago Closed 9 years ago

Web Audio stuff on reloaded page is keeping the page alive due to ICC

Categories

(Core :: Web Audio, defect)

35 Branch
x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: u528881, Assigned: padenot)

References

(Depends on 1 open bug)

Details

(Whiteboard: [MemShrink:P2][games])

Attachments

(5 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:35.0) Gecko/20100101 Firefox/35.0
Build ID: 20150125221831

Steps to reproduce:

I was repeatedly reloading Em-DOSBox compiled from https://github.com/dreamlayers/em-dosbox/ in a single tab from a file:// URL. This was in 64 bit Firefox 35.0.1 in Ubuntu. Em-DOSBox is a port of DOSBox to asm.js using Emscripten. I wanted to see how an Emscripten setting affected load times, so I would wait for the program to start running and then press the reload button at the right side of the address bar.


Actual results:

Every reload increased memory usage by more than 100 megabytes. In about:memory, the increasing usage is in elements/asm.js under js-main-runtime. There seemed to be no bound to this. I got to over 4 gigabytes in the RES column in top. Then I created a new profile and reproduced the problem there, showing that it is not due to extensions or custom settings.


Expected results:

Memory usage should not have kept increasing that much without bound.

If instead of pressing the reload button I opened a new tab, closed the old tab, and loaded the same thing in the new tab, memory usage didn't keep increasing significantly. That is acceptable and expected behaviour.
Whiteboard: [MemShrink]
Luke, I think we already have a bug open about this. Can you remember the bug number?
Flags: needinfo?(luke)
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(luke)
Hmm, actually reading comment 0, this is different.  For one thing, it's 64-bit so this isn't just a 32-bit-fragmentation-OOM thing.  For another, those pages should be being GC'd so you never have more than a few resident.

Can you reproduce on FF36?  Does memory drop down to an expect level if, in about:memory, you click 'Minimize memory usage' and then re-Measure?  Also, could you attach the files (or a link to the files) for local testing?
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---
I can reproduce this by reloading https://dl.dropboxusercontent.com/u/16662598/Ports/DOSBox-web/dn1.html?engine=dosbox-sdl2.js in 64-bit Firefox 36.0.1 in Ubuntu 14.10. Each time I waited until at least the initial DOSBox text screen with the blue box on top is displayed, to ensure that the program started running. The RES column number in top increased every time I reloaded, with no apparent bound.
Thanks a lot!  I can indeed reproduce this quite easily.

It looks like, each time we reload, the old page stays alive: you can see the total asm.js heap memory increment by 128 in about:memory, which reports these windows as 'active'.  I waited quite a while and I can see full gcs in the Browser Console, but still these pages are not getting GC'd.  If I close the tab (that I've been reloading in), then about:memory changes the windows to 'top(none)/detached', but still they stay alive. When I finally 'Minimize memory usage' in about:memory, though, it all goes away.

So this seems like a general GC bug and a pretty serious one too.  By comparison, if I keep reloading http://beta.unity3d.com/jonas/DT2, the old windows die after a few seconds and the steady state is to have 1 live window.  (This does resemble bug 1134590 comment 0 in that bug the windows are eventually released and the OOM is more related to the 2gb of memory used.)
In 32-bit Firefox 36.0.1 in 32-bit Windows 7 SP1, I can reload https://dl.dropboxusercontent.com/u/16662598/Ports/DOSBox-web/dn1.html?engine=dosbox-sdl2.js a few times with increasing memory usage. Then the page fails to load, with "Downloading..." displayed at the top and "uncaught exception: out of memory" in the console. In Task Manager, "Memory (Private Working Set)" was never above 500,000 K and there is plenty of memory available to processes. I guess it runs into an address space limitation. When the out of memory error happens, "Memory (Private Working Set)" goes back down to around 175,000 K.

In 64-bit Linux, the RES memory usage reported by top does not go down if I press "Minimize memory usage", but I can see the memory was freed up in about:memory.
(In reply to boris.gjenero from comment #7)
> limitation. When the out of memory error happens, "Memory (Private Working
> Set)" goes back down to around 175,000 K.

That's because, on OOM, we trigger a low-memory event which is roughly the same as the 'Minimize Memory Usage' button.  Definitely looks like a bug here.
Perhaps Andrew could get a CC dump to help explain what is keeping these windows alive?
Flags: needinfo?(continuation)
This is interesting.  I looked at the nsGlobalWindows that aren't being freed.  There seem to be two kinds of rooting objects.  One is nsGlobalWindows, the other is CallbackObjects for some kind of audio processing (SDL2.audio.scriptProcessorNode.onaudioprocess).  In both cases, all edges are known, but the object is being treated as an incremental root, which probably means that it was addrefed or released during an incremental CC (ICC).  This matches up with what Luke said in comment 6, as making a GC/CC happen in about:memory will run a non-incremental CC.

Maybe the page is continuously pumping audio out, even after the reload, and it ends up touching stuff?
Flags: needinfo?(continuation)
Component: Untriaged → Web Audio
Product: Firefox → Core
I'll put this in WebAudio for now, as maybe we're not killing off audio when we reload the page, but that may not be right.
Status: REOPENED → NEW
I also see a JSEventHandler with handlerName=onaudioprocess as an incremental root, as well as an ELM, and what I think is a DETH that is a ScriptProcessNode.
Summary: Memory leak when reloading large asm.js program → Web Audio stuff on reloaded page is keeping the page alive due to ICC
Whiteboard: [MemShrink] → [MemShrink][games]
This is what the many addref and releases of the CallbackObject look like.

The underlying JS object isn't black which is odd to me.
Paul: Do you think this is specific to Web Audio events, or lies elsewhere?
Flags: needinfo?(padenot)
Whiteboard: [MemShrink][games] → [MemShrink:P2][games]
Olli may also have some ideas.
Flags: needinfo?(bugs)
This is really about Web Audio not closing itself down when it should.
(and this isn't about ICC, CC and GC running independently is enough.)
Flags: needinfo?(bugs)
It's not clear to me what happens here, this really is at the edge of Web Audio API for me, but I can tell you what happens in this scenario on the Web Audio side.

Say you're playing some audio using a ScriptProcessorNode (which has an `onaudioprocess` handler, as mentioned above). The audio thread is continuously sending buffers to the main thread [0]. Then you reload the page. For now, this does not do much, it simply mutes the AudioContext (but see below, we can fix this). At this point, the audio thread is still running and sending buffers to the main thread.

Then a lot of things are probably getting collected, but there can be `SendBuffersToMainThread` events in flight (depending on the machine/OS/hardware/drivers, intervals between audio thread runs can be between 2.5ms and 40ms), and this is continuously AddRef-ing the ScriptProcessorNode [1].

It can take some time to delete the ScriptProcessorNode's MediaStream, which is what backs the ScriptProcessNode on the audio thread side (it keeps being processed, and keeps sending buffers to the main thread), so that might be the cause for the leak, but keep in mind that I know very little about GC and CC and all this. I don't know what a black JS object is for example.

With bug 1094764 (that I'm going to land today), we can easily make it so that we stop processing AudioNodes when nsGlobalWindow::FreeInnerObjects or nsGlobalWindow::CleanUp is called. I think that would help for this bug as well, because that would me `SendBuffersToMainThread` would not be called anymore when the page is close/reloaded.

[0]:  https://dxr.mozilla.org/mozilla-central/source/dom/media/webaudio/ScriptProcessorNode.cpp?from=SendBuffersToMainThread#350
[1]: https://dxr.mozilla.org/mozilla-central/source/dom/media/webaudio/ScriptProcessorNode.cpp?from=SendBuffersToMainThread#385
Flags: needinfo?(padenot)
Black js object is something that cycle collector thinks should definitely stay alive.
And whenever an event listener is called, it is first marked black (and all the js objects reachable from that listeners are marked black too), and then the listener is called.

When GC runs, and it doesn't have from JS roots an edge to the event listener, the listener gets marked
gray (it is C++ EventListenerManager of an EventTarget which keeps the listener alive). Now when
CC runs, it traces through gray js world and if C++->gray_js->C++ cycles are found, those can be
unlinked. The issue in this bug is that GC runs, marks some js gray, then event listener is called and it is marked black, and then
CC can't unlink the cycle, since it can't detect there is a cycle.


Stopping around nsGlobalWindow::FreeInnerObjects might indeed fix this. CleanUp is too late.

In a way this is a bit similar to bug 1118689, where we didn't detect that the page is actually gone and just tried to
reload the image all the time because of the <img>'s load event listener.
One might even reuse nsContentUtils::DocumentInactiveForImageLoads from that bug here (well, in that case the method should be renamed) if the FreeInnerObjects approach doesn't help.
Okay, so I'm taking this, and I'll have a shot at stopping processing in FreeInnerObject after bug 1094764 is landed.
Assignee: nobody → padenot
Ehsan, the backstory here is in comment 16, 17, 18. tl; dr, a
ScriptProcessorNode can keep a page alive.

close()ing the AudioContext seems to solve it, I've manually tested with the
page linked in comment 7:
(1) open the page
(2) open htop in a terminal window
(3) wait for DOSBox to boot (something that looks like a machine booting), this 
    increases the resident memory usage in htop, as expected
(4) refresh the page
(5) resident memory usage drops in htop after a short amount of time

Without this patch, I don't see memory usage dropping in (5), and if I keep
refreshing the page for some time, the resident memory usage can go pretty high.
Attachment #8595357 - Flags: review?(ehsan)
Comment on attachment 8595357 [details] [diff] [review]
Close() AudioContexts when Freeing inner window objects. r=

Review of attachment 8595357 [details] [diff] [review]:
-----------------------------------------------------------------

ouch!
Attachment #8595357 - Flags: review?(ehsan) → review+
This is orange on try for unclear reasons, I'm investigating.
When closing an AudioContext, the MSG will check if it really needs to use an
(expensive) audio thread, and if not, switch to a normal thread. This is
happening much more often, since we're now aggresively close()-ing AudioContext
when changing test case (because we close() in FreeInnerObjects). This is a nice
optimization to the load of test suite overall.

Switching more often between system threads and audio threads, we now are much
more prone to situations where the initial "suspended" -> "running" transition
can take some time.  This test was making the assumption that the underlying
audio stream would be fast to create (it's true locally on this super fast Linux
machine), whereas it's super slow on try sometimes.

This mainly makes sure the context is in "running" state (meaning the audio
stream has been created) before doing anything else, and shuffles a couple
things around to make it clearer.
Attachment #8605161 - Flags: review?(ehsan)
When the underlying audio stream platform is slow to start, it can take a little
while for the AudioContext to switch from its initial "suspended" state to
"running".

Authors can call AudioContext.close() right after (new AudioContext()),
that was resulting in a bogus "closed" -> "running" transition, that is now
handled properly by bailing out: the context will simply not report a switch to
"running".
Attachment #8605162 - Flags: review?(ehsan)
Comment on attachment 8605161 [details] [diff] [review]
Make test_audioContextSuspendResumeClose.html less flaky. r=

Review of attachment 8605161 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webaudio/test/test_audioContextSuspendResumeClose.html
@@ +102,5 @@
>  
> +    var mediaStreamAudioSourceNode2 =
> +      ac2.createMediaStreamSource(mediaStreamDestination1.stream),
> +      sp2 = ac2.createScriptProcessor(),
> +      suspendCalled = false,

Looks like nothing is now using this variable.

@@ +170,5 @@
> +      ac2.createMediaStreamSource(mediaStreamDestination1.stream),
> +      sp2 = ac2.createScriptProcessor(),
> +      resumed = false,
> +      suspended = false,
> +      countEventOnFirstSP = true,

These three variables look unused as well.
Attachment #8605161 - Flags: review?(ehsan) → review+
Comment on attachment 8605162 [details] [diff] [review]
Properly handle AudioContext.close() calls right after the creation of an AudioContext. r=

Review of attachment 8605162 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the review delay.
Attachment #8605162 - Flags: review?(ehsan) → review+
sorry had to back this out on request for test failures like https://treeherder.mozilla.org/logviewer.html#?job_id=9999455&repo=mozilla-inbound
Flags: needinfo?(padenot)
I think I'm missing a null-check: now, AudioContext::Close() can be called when the DestinationStream is dead.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f79cfcfee699
Flags: needinfo?(padenot)
Attachment #8608613 - Flags: review?(ehsan) → review+
Depends on: 1228226
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: