Open Bug 897796 Opened 7 years ago Updated 3 months ago

constantly-playing Web Audio nodes that aren't connected to outputs can leak

Categories

(Core :: Web Audio, defect, P3)

x86_64
Linux
defect

Tracking

()

People

(Reporter: roc, Unassigned)

References

(Blocks 4 open bugs)

Details

(Keywords: memory-leak, perf, Whiteboard: [blocking-webaudio-])

See https://bugzilla.mozilla.org/show_bug.cgi?id=856361#c63 and earlier.

The setup we have right now is that nodes that can produce data on their own keep themselves alive, and keep alive all the nodes that might receive that data; i.e., a node keeps its downstream consumers alive. This works well but it fails to handle the case where a node is producing data but is not connected to any observable output --- e.g. a looping AudioBufferSourceNode that has been disconnected from the graph, and is not otherwise reachable, should be GCed.

It's unclear how big a problem this will be in practice, so maybe we don't need to worry too much about fixing it.

I think the fundamental issue is that a node needs to be kept alive if it could produce output in the future *and* its output could reach an observable destination in the future. Right now we're only testing the former. The full condition can't be implemented just using a unidirectional graph traversal of nodes.
I'm not sure I understand "observable destination in the future". Is that a audio-producing node connected into an AudioParam of a node which is connected to other nodes that eventually reach a destination node?
(In reply to Paul Adenot (:padenot) from comment #1)
> I'm not sure I understand "observable destination in the future". Is that a
> audio-producing node connected into an AudioParam of a node which is
> connected to other nodes that eventually reach a destination node?

It basically means if JS has a way to connect a node to another node/AudioParam, which implies there being a JS wrapper for the node (since the Web Audio API doesn't allow you to "find" a node once you lose your last reference to it.)

So once this happens, the mPlayingRef will be invisible to the cycle collector, so it will never be detected and collected as a cycle.  The node keeps the context alive, and the context keeps the window alive, so nothing will get freed unless you navigate away from the page, which calls Stop() on the node and that drops the self reference.

As a fix, I think we need to resurrect JSBindingFinalized().
Keywords: mlk
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #2)
> As a fix, I think we need to resurrect JSBindingFinalized().

That might be necessary, but I'm not sure. Nothing should be done here without a detailed plan and a proof of correctness, since we've managed to come up with incorrect plans several times already :-).
Hehe, ok, how about this?

The problematic case happens when we have a node with a self reference which is not connected to any other nodes, *and* does not have a JS wrapper.  We can introduce a function called ValidateSelfReference() which does something like this:

void ValidateSelfReference() {
  if (mJSBindingFinalized && mOutputNodes.IsEmpty()) {
    mSelfReference.Drop();
  }
}

Then, we override Disconnect() and JSBindingFinalized() like this:

void FooNode::Disconnect() {
  AudioNode::Disconnect();
  ValidateSelfReference();
}

void FooNode::JSBindingFinalized() {
  mJSBindingFinalized = true;
  ValidateSelfReference();
}

ValidateSelfReference's only job is to break the reference that CC does not know about.  Once that happens, the CC can kick in and correctly break the cycle.
That would help a bit, but it would mean that a looping AudioBufferSourceNode connected to a GainNode (connected to nothing), with all JS references dropped, would still leak.
(In reply to comment #5)
> That would help a bit, but it would mean that a looping AudioBufferSourceNode
> connected to a GainNode (connected to nothing), with all JS references dropped,
> would still leak.

You're right.  OK, maybe it's time to stop trying to bite the bullet and run a full-blown reachability analysis on the graph and essentially do our own GC? :(
Right now we're using GC to trace from audio sources to sinks, so that nodes that can produce data in the future are kept alive and nodes that can't, aren't. I think that's good and we should keep doing it, because it's relatively simple and handles the case we care most about well.

Maybe we could add a separate but similar periodic sweep over the audio graph (or more generally the DOM side of the MediaStream graph, maybe), identifying nodes that are not connected (indirectly) to any sinks and never will be. Any such nodes could drop their mPlayingRef SelfReference (or equivalent), allowing them to be cleaned up.

Identifying AudioNodes that can never be connected to a sink is tricky because of JS bindings. We could hack their wrapper cache again so that nodes can have their JS bindings GCed, and then test for a live JS binding. I suppose you could still have a leak where an AudioBufferSourceNode is looping and has an "ended" event handler; it would be difficult to prove that no other JS reference exists and therefore the node will never end and so the event handler doesn't matter. Maybe that's OK and we just ensure that AudioBufferSourceNode and ScriptProcessorNodes with no event handlers can have their JS bindings GCed.

I don't think we should do anything here until we encounter a page that needs it.
(In reply to comment #7)
> Right now we're using GC to trace from audio sources to sinks, so that nodes
> that can produce data in the future are kept alive and nodes that can't,
> aren't. I think that's good and we should keep doing it, because it's
> relatively simple and handles the case we care most about well.

Correct.

> Maybe we could add a separate but similar periodic sweep over the audio graph
> (or more generally the DOM side of the MediaStream graph, maybe), identifying
> nodes that are not connected (indirectly) to any sinks and never will be. Any
> such nodes could drop their mPlayingRef SelfReference (or equivalent), allowing
> them to be cleaned up.

Yeah this is sort of what I was proposing.

> Identifying AudioNodes that can never be connected to a sink is tricky because
> of JS bindings. We could hack their wrapper cache again so that nodes can have
> their JS bindings GCed, and then test for a live JS binding.

I don't think it's that tricky.  For each node type we should be able to come up with a boolean expression which tells us whether it can produce output in the future fairly easily.

> I suppose you
> could still have a leak where an AudioBufferSourceNode is looping and has an
> "ended" event handler; it would be difficult to prove that no other JS
> reference exists and therefore the node will never end and so the event handler
> doesn't matter.

I think it's not that difficult:

 bool CanProduceOutput() const {
   // AudioBufferSourceNode can only produce output in the future if
   // it's connected to other nodes, or if its JS binding is alive.
   return !mJSBindingFinalized || !mOutputNodes.IsEmpty();
 }

> Maybe that's OK and we just ensure that AudioBufferSourceNode
> and ScriptProcessorNodes with no event handlers can have their JS bindings
> GCed.

I think we can use the information that if the node is looping and it does not have a JS wrapper, it will never receive the ended event, because there is nothing that can stop it.

> I don't think we should do anything here until we encounter a page that needs
> it.

Yes, I agree.  This will be very inefficient and I really want to not do this.
Whiteboard: [blocking-webaudio-]
An approach where each node needs only to know about its immediately adjacent
connections and external references:

In principal, a node with external references or with both input and output
remains alive.

  "input" here is considered provided either by input connections or by the
  nodes ability to produce its own sound with any input connections.

  There are some special cases when nodes have event handlers: non-looping
  AudioBufferSourceNode, OscillatorNode after stop() is called, and
  ScriptProcessorNode each need to stay alive longer.

Nodes with both input and output have one or more extra "internal"
references.  An adjustment is required to deal with cycles, in the graph or
otherwise through javascript: the internal reference is traversed from, and
perhaps stored on, the nodes on input (or alternatively output) connections.

  The key thing is to inform the cycle collector of the connections, so that
  it can traverse.

  I haven't thought of any fundamental reason to choose either input or output
  nodes to store the "internal" reference, but using the input nodes would be
  more similar to the current approach.  Connections often converge at an
  AudioDestinationNode, so upstream nodes are often removed from connections
  and therefore having references stored upstream is more intuitive.  However,
  downstream nodes can be collected (though not in the current implementation)
  when, for example, unreferenced AnalyzerNodes or ScriptProcessorNodes hang
  off upstream nodes.

  If there is no input (or no output) node then these references can be stored
  on the AudioContext in mActiveNodes as they are now, which provides removal
  of references when the context is destroyed.
(In reply to Karl Tomlinson (:karlt) from comment #9)

I mean:
 
   "input" here is considered provided either by input connections or by the
   nodes ability to produce its own sound *without* any input connections.
The approach in comment 9 fails to collect a cycle with a source upstream (or a cycle with downstream connections if storing internal references on output nodes).

That takes us back to detecting when JS Bindings are no longer required, and identifying nodes not directly or indirectly connected to sinks.
Whiteboard: [blocking-webaudio-] → [blocking-webaudio-][webaudio-perf]
Keywords: perf
Whiteboard: [blocking-webaudio-][webaudio-perf] → [blocking-webaudio-]
This has the potential to be a big performance improvement, so this can be a P1 if we find any sites that are affected.
Priority: -- → P2
Blocks: 942751
P1 because this affects B2G dialer, and will affect other sites too.
Priority: P2 → P1
Assignee: nobody → paul
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> Right now we're using GC to trace from audio sources to sinks, so that nodes
> that can produce data in the future are kept alive and nodes that can't,
> aren't. I think that's good and we should keep doing it, because it's
> relatively simple and handles the case we care most about well.
> 
> Maybe we could add a separate but similar periodic sweep over the audio
> graph (or more generally the DOM side of the MediaStream graph, maybe),
> identifying nodes that are not connected (indirectly) to any sinks and never
> will be. Any such nodes could drop their mPlayingRef SelfReference (or
> equivalent), allowing them to be cleaned up.
> 
> Identifying AudioNodes that can never be connected to a sink is tricky
> because of JS bindings. We could hack their wrapper cache again so that
> nodes can have their JS bindings GCed, and then test for a live JS binding.
> I suppose you could still have a leak where an AudioBufferSourceNode is
> looping and has an "ended" event handler; it would be difficult to prove
> that no other JS reference exists and therefore the node will never end and
> so the event handler doesn't matter. Maybe that's OK and we just ensure that
> AudioBufferSourceNode and ScriptProcessorNodes with no event handlers can
> have their JS bindings GCed.

I think we should do this.

One way to do this would to periodically run a special kind of garbage collection over an AudioContext. Let's call it disconnected source collection, DSC. The AudioContext would keep track of all its AudioNodes. A DSC would scan all AudioNodes, marking those that are sinks, or those that could become sinks in the future (AudioDestinationNode, MediaStreamAudioDestinationNode, ScriptProcessorNode, nodes with JS wrappers). We also mark all sources feeding into a marked node. After that, unmarked nodes will never be able to contribute to observable output. We sweep the nodes by disconnecting them from the graph; reference counting should let us destroy them immediately without even waiting for a regular GC.

We would schedule a DSC after a JSBindingFinalized notification, and after a Disconnect call.
We also need to do something so that nodes which will never fire an event can tell their nsWrapperCache to drop the JS wrapper.
Note that DSC only needs to consider source nodes in the AudioContext's
active list.  Other nodes would be handled by regular GC/CC.
I would like to review the patch here if possible, as I'm very scared by it.  :-)
Karl points out that you can't Disconnect without a JS wrapper so we only need to track JSBindingFinalized.

I don't think we need to worry about performance here for two reasons. We already have to do work proportional to the size of the graph on a regular basis (actual audio processing), so doing DSC work periodically proportional to the size of the graph is going to be negligible --- no need for incremental schemes. Also, DSC is only going to be effective after JS GC has collected the wrappers for nodes, so running more frequently than JS GC wouldn't buy us anything.

So we should focus on making the code as simple as possible. Here's my proposal for the DSC phase.
1) AudioContext remembers the set of source nodes. Walk the graph to collect the set of sink nodes reachable from the sources.
2) Walk the graph in the reverse direction to collect the set of source nodes reachable from the sinks of step 1.
3) Any source nodes not reached in step #2 are disconnected.
(In reply to Karl Tomlinson (:karlt) from comment #16)
> Note that DSC only needs to consider source nodes in the AudioContext's
> active list.  Other nodes would be handled by regular GC/CC.

Sorry, this does not actually cover all cases.  A source or processing node
with a JS reference may not be in the AudioContext's active list, but may
still have downstream nodes that can be collected.  The approach in comment 14
collects these nodes appropriately, but I lead roc astray to suggest the approach
in comment 18.
Proposal:

Give all nodes a "sink-connected" flag.  This is initially true because the
node initially has a JS binding and nodes with JS bindings are sinks
(including potential sinks).

Define SCC as Sinkless Cycle Collection (because this is not for
collecting sources).  Start with an empty SCC candidate list.

Define handle-sinkless-nodes(root-node) as: search upstream, beginning with
this root node, terminating each branch when reaching a sink or a node with (other) output connections.  While traversing, unlink each node that does not
terminate the branch, removing it from the SCC candidate list if
sink-connected is not set.  If any branch-end nodes are not sinks and are
sink-connected, then add them to a list of SCC candidates, unset
sink-connected, and schedule SCC.

On JSBindingFinalized, handle-sinkless-nodes(this).

In SCC:

1. For each node in the candidate list:

   The node will have at least one output connection and sink-connected will
   be false.

   Search downstream, depth-first, beginning with the candidate node,
   unsetting sink-connected on each node, terminating each branch when
   reaching a sink or a node already not sink-connected.

     If a sink is found, then abort the entire downstream search.  Search
     upstream, beginning with the candidate node, setting sink-connected on
     each node, terminating each branch when reaching a node with sink-connected
     already set.

     If no sink is found on a branch, then the last node will either be another
     candidate or be connected to one or more other candidates by only
     not-sink-connected nodes.  Unset sink-connected on all nodes back to the
     last branch or current candidate, and continue the downstream search for
     a sink.

2. Add each node in the candidate list that is not sink-connected to an unlink
   list, and clear the candidate list:

3. For each node in the unlink list:

   Disconnect the node from downstream nodes.

   handle-sinkless-nodes(node).

This algorithm provides minimal searching in the case where JSBindingFinalized
does not require any unlink.  It always avoids searching the same paths for
more than one candidate.  It can become iterative if there are multiple
cycles to collect.
The SCC algorithm in the previous comment was trying to do too much in one
pass.  A separate pass from the sinks avoids searching from sink-connected
candidates again when considering other candidates, similar to the algorithm
in comment 18, with sources replaced by candidates:

In SCC:

1. For each node in the candidate list:

   The node will have at least one output connection and sink-connected will
   be false.

   Search downstream, depth-first, excluding the candidate node,
   unsetting sink-connected on each node, terminating each branch when
   seeing a sink or another node already not sink-connected.

     If a sink is found and still has sink-connected set, then add it to a
     list of sinks and abort the entire search for this candidate.  Unset the
     sink-connected flag on the sink.

     If no sink is found on a branch, then the last node on the branch will
     either be a candidate or connected to one or more candidates via paths
     of nodes that are all not sink-connected.  This means that the branch
     will either be marked sink-connected in 2, or unlinked in 4.  Continue
     the search for a sink.

2. For each sink in the sink list, search upstream, beginning with the sink
   itself, setting sink-connected on each node, terminating each branch when
   reaching a node with sink-connected already set.

3. Add each node in the candidate list that is not sink-connected to an unlink
   list, and clear the candidate list:

4. For each node in the unlink list:

   Disconnect the node from downstream nodes.

   handle-sinkless-nodes(node).

It is possible to do less searching with additional state.  When encountering
another candidate in the downstream search, that candidate has either been
searched or will need to be searched anyway, so it would be better to use the
result at that point than to just move on to searching other branches.  The
algorithm involves an additional search phase, but the use of state is
clearer.

With an additional flag "traversed" initially false for all nodes, in SCC:

1. For each node in the candidate list:

   The node will have at least one output connection.

   Search downstream, depth-first, beginning with the candidate node,
   unsetting sink-connected and setting traversed on each node, terminating
   each branch when seeing a node that is a sink or already traversed
   (including the candidate itself at the start point).

     If a sink is found or an already traversed node has sink-connected set,
     then

       Abort the entire downstream search for this candidate.

       If a sink is found that is not traversed, then mark it traversed, add
       it to a list of sinks.

       Search upstream, beginning with the node upstream from the branch end,
       setting sink-connected on each node, terminating each upstream branch
       when seeing a node with sink-connected already set or traversed not
       set.

     If the branch end is not sink-connected, then it will will either be a
     candidate or connected to one or more candidates via paths of nodes that
     are all not sink-connected.  The nodes on this branch cannot be unlinked
     yet because the branch end may be this candidate and the search for a
     sink has not completed.  Continue the search for a sink.

2. For each sink in the sink list, search upstream, beginning with the sink
   itself, unsetting traversed on each node, terminating each branch when
   reaching a node without traversed set.

3. Add to an unlink list each node in the candidate list that is not
   sink-connected, and clear the candidate list.

4. For each node in the unlink list:

   Disconnect the node from downstream nodes.

   handle-sinkless-nodes(node).
(In reply to Karl Tomlinson (:karlt) from comment #19)
> Sorry, this does not actually cover all cases.  A source or processing node
> with a JS reference may not be in the AudioContext's active list, but may
> still have downstream nodes that can be collected.  The approach in comment
> 14
> collects these nodes appropriately, but I lead roc astray to suggest the
> approach
> in comment 18.

Excellent point. However I think we might be able to find a simpler way to solve this than comment #21.
SCC isn't a great term here since it expands to Strongly Connected Components, which is a frequently used term in graph algorithms. How about NOC, No-Output Collection.

AudioContext remembers a candidate list. Every time we finalize the JS binding of a node, or disconnect one of its output connections, if it's not a sink, add it to the candidate list and schedule a NOC:
1) Walk the graph upstream from the candidate list, skipping visiting any sink nodes, adding all visited nodes to the candidate list.
2) Walk the graph downstream from the candidates to collect the set of sink nodes reachable from the candidates.
3) Walk the graph upstream from the sinks of step 2 to collect the set of nodes from which sinks are reachable.
4) For every candidate that was not visited in step 3, disconnect the node and clear its self-reference if it has one. (Such nodes will have refcount 0 and will be destroyed immediately.)
5) Clear the candidate list.
(of course)
We can make that a lot more efficient with a very simple change to step 3:
3) Walk the graph upstream from the sinks of step 2, skipping visiting of any nodes not visited in step 2, to collect the set of nodes from which sinks are reachable.
Moving to Roc per our meeting yesterday.
Assignee: paul → roc
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #23)
> AudioContext remembers a candidate list. Every time we finalize the JS
> binding of a node, or disconnect one of its output connections, if it's not
> a sink, add it to the candidate list and schedule a NOC:

Karl pointed out that we only need to add to the candidate list when we finalize the JS binding of a node. For, if we've disconnected one of its output connections, the node must remain alive anyway until the JS binding is finalized.
Note, all these traversals have to account for node connections that go indirectly through an AudioParam as well as direct connections.
No longer depends on: 812617
I have WIP patches for this bug, but I can't test them until we do something to enable JSBindingFinalized to actually be called. See bug 812617.
To completely fix this we need separate objects to determine observability from JS and observability of generation and propagation of signals.

The initial plan was that the JS wrapper was one object and the C++ AudioNode was the other, but bug 812617 comment 42 explains that EventTarget is not currently implementation in a way that makes that separation.

Changing EventTarget may be one option, but another is to make the separation elsewhere.

We already have a separation between AudioNodes and their MediaStreams, so it may be a sensible to use that:

* AudioNodes determine observability from JS and hold their MediaStreams alive
  (if the streams can have inputs at least).

* MediaStreams can own connected MediaStreams and determine observability
  through signal generation, propagation, and output to an owning AudioNode.

* MediaStreams never own an AudioNode, so any cycles in unowned MediaStreams
  can be collected on the graph thread.
That is a great idea ... I think.

Currently the main thread controls the lifetime of all MediaStreams. We can change that so that the main thread can hand control of a MediaStream to the graph thread. The graph thread would then be responsible for destroying the stream as soon as it cannot ever produce new output or cannot ever be connected (indirectly) to a sink. The main thread would be prohibited from ever mentioning such streams in its messages, so the graph thread can be sure no new InputPort will have a released stream at either end.

Then, AudioNodes would strongly own their MediaStreams but not each other. We would not need to track tail-time references on the main thread or "active nodes". We would need to keep AudioNodes alive as long as the MediaStream (via the AudioNodeEngine) might cause an event to be fired on the MediaStream and there is a listener. Otherwise an AudioNode would be destroyed normally when its JS wrapper becomes unreachable. At that point we hand ownership of its MediaStream over to the graph.

Need to think about this a bit. It's going to be quite a bit of work and I'm not sure I want to sign up to do it right away.
I talked to smaug and mccr8 about this at the DOM work week today.  We may be able to use the forget skippable facilities in the cycle collector.  The basic idea is that we do our analysis along the lines of comment 23 once when the forget skippable starts to run, and we remove the nodes that need to be kept alive from the purple buffer, and for the nodes that we determine they can be killed, we can just go ahead and unbind them from the strong refs that keep them alive.

We do a similar analysis for DOM nodes already in FragmentOrElement::CanSkip, and a list of nodes that can be unbound is generated there and then we use ContentUnbinder to unbind them all from the tree, and all of this happens without the cycle collector needing to run.

I am fairly convinced that we might be able to make it work, but I don't claim to understand all of the details here.  It would be nice to check the details with smaug/mccr8 if you think this makes sense.
See Also: → 974089
Duplicate of this bug: 1092819
Priority: P1 → P2
Blocks: 934512
Assignee: roc → nobody
Depends on: 1281001
Blocks: 1281001
No longer depends on: 1281001
Blocks: 1271425
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Blocks: 1572524
You need to log in before you can comment on or make changes to this bug.