1189562 - Web Audio CPU usage can climb during long waits for garbage collection

Reporter

Description

•

9 years ago

+++ This bug was initially created as a clone of Bug #974089 +++

Each Web Audio node (except Oscillator and AudioBufferSource) will consume cpu
for as long as it lives.  Some pages create many nodes and use them briefly
before creating more similar nodes.

I assume GC is tuned for memory pressure, not cpu resources.
I think we need to stop depending on GC for cleaning up unnecessary streams.
Bug 974089 addressed this for one-shot source nodes.

For other nodes, which can be reconnected, this may involve nodes recreating
streams after they are destroyed.  Another option might be to remove the
stream from the graph and add later if required.

This is a variation on bug 897796 comment 30:

1. Passive streams without input are destroyed.
   This includes streams with a node.

2. Newly connected source nodes will propagate (re)creation of streams for
   their output nodes.

3. Streams without node or output are destroyed.
   This includes active streams.

4. Occasional cycle collection on the graph collects cycles without nodes or
   outputs to nodes (sinks), and cycles without inputs from active streams.

1 and 2 can be implemented separately in this bug and 3 and 4 in bug 897796.
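
To make rules 1 and 3 concrete, here is a rough toy model in Python. It is only a sketch under assumptions: the Stream class, its fields, and prune() are hypothetical names invented for illustration and do not correspond to Gecko code. Rule 2 (recreating streams on reconnection) and rule 4 (cycle collection) are not modelled.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing so streams can live in sets
class Stream:
    name: str
    active: bool = False   # True for a source that is currently producing audio
    has_node: bool = True  # False once the owning AudioNode has gone away
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def connect(src, dst):
    src.outputs.add(dst)
    dst.inputs.add(src)


def prune(streams):
    """Apply rules 1 and 3 repeatedly until nothing more can be destroyed."""
    alive = set(streams)
    changed = True
    while changed:
        changed = False
        for s in list(alive):
            rule1 = not s.active and not s.inputs     # passive, no input
            rule3 = not s.has_node and not s.outputs  # no node, no output
            if rule1 or rule3:
                # Detach the stream so its neighbours are re-evaluated.
                for i in s.inputs:
                    i.outputs.discard(s)
                for o in s.outputs:
                    o.inputs.discard(s)
                s.inputs.clear()
                s.outputs.clear()
                alive.discard(s)
                changed = True
    return alive


osc = Stream("osc", active=True)
gain = Stream("gain")
dest = Stream("dest")
old_gain = Stream("old_gain")                           # node alive, no input
orphan = Stream("orphan", active=True, has_node=False)  # active, but unowned

connect(osc, gain)
connect(gain, dest)
connect(old_gain, dest)

survivors = prune([osc, gain, dest, old_gain, orphan])
print(sorted(s.name for s in survivors))
```

In this model old_gain goes away under rule 1 even though its node still exists, and orphan goes away under rule 3 even though it is active.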

Karl Tomlinson (:karlt)

Reporter

Comment 1

•

9 years ago

STR:

1. Load http://webaudiodemos.appspot.com/MIDIDrums/index.html
3. Wait for 10 or 20 seconds (for GCs to finish).
4. Click the play button with the triangle icon.
5. Observe CPU use of the audio thread ("threaded-ml" with PA) over a minute
   (If on a platform supporting "top", type 'H' for threads.)

Desired results:
CPU use remains consistent.

Actual results:
CPU use climbs for about a minute and then falls back and starts climbing again.
Here, it may climb to close to 100% before falling back to 45%.

Forcing CC from a preloaded about:memory in another window collects no objects
and has no effect on cpu.
Forcing GC causes CC to collect objects and the cpu usage falls.

Maire Reavy [:mreavy]

Updated

•

9 years ago

Rank: 5

Karl Tomlinson (:karlt)

Reporter

Updated

•

9 years ago

Blocks: 1172181

Johnny Stenback (:jst)

Comment 2

•

9 years ago

Cc:ing :mccr8 so he can keep an eye on this from the CC/GC end of things.

Karl Tomlinson (:karlt)

Reporter

Updated

•

9 years ago

URL: https://dl.dropboxusercontent.com/u/4... → http://webaudiodemos.appspot.com/MIDI...

Karl Tomlinson (:karlt)

Reporter

Updated

•

9 years ago

Depends on: 1205540

Karl Tomlinson (:karlt)

Reporter

Updated

•

9 years ago

Depends on: 1205558

Comment hidden (obsolete)

On http://webaudiodemos.appspot.com/MIDIDrums/index.html,
after increasing Tempo to 180 BPM and clicking play, while there are
only 1000 nodes, perf top -t <msg-thread> indicates that the biggest CPU
consumers are convolution and resampling:

    10.13%  liblgpllibs.so  [.] rdft_calc_c                            
     7.26%  libxul.so       [.] moz_speex_interpolate_product_single   
     7.06%  libxul.so       [.] resampler_basic_interpolate_single     
     4.85%  libxul.so       [.] cubic_coef                             
     4.85%  libxul.so       [.] WebCore::DirectConvolver::process      
     4.82%  liblgpllibs.so  [.] ff_fft_permute_sse.loop                
     4.67%  libxul.so       [.] mozilla::AudioBufferAddWithScale       
     3.95%  libxul.so       [.] mozilla::DelayBuffer::ReadChannels     
     3.95%  libxul.so       [.] mozilla::BufferComplexMultiply         
     3.71%  liblgpllibs.so  [.] pass_sse.loop                          
     3.67%  libxul.so       [.] WebCore::FFTConvolver::process         
     3.62%  libxul.so       [.] mozilla::FFTBlock::GetInverseWithoutSca
     2.78%  liblgpllibs.so  [.] fft16_sse                              
     1.99%  libxul.so       [.] mozilla::FFTBlock::PerformFFT          
     1.92%  liblgpllibs.so  [.] pass_interleave_sse.loop               
     1.83%  libm-2.20.so    [.] __powf_finite                          
     1.25%  libxul.so       [.] mozilla::AudioNodeStream::ObtainInputBl
     1.11%  liblgpllibs.so  [.] fft8_sse                               
     1.09%  libxul.so       [.] mozilla::AudioNodeStream::ProcessInput 
     1.08%  libm-2.20.so    [.] __logf_finite                          
     1.07%  libxul.so       [.] WebCore::ReverbConvolverStage::process 
     0.99%  libxul.so       [.] WebCore::ZeroPole::process             
     0.88%  libxul.so       [.] WebCore::DynamicsCompressorKernel::proc
     0.81%  libxul.so       [.] moz_speex_resampler_set_rate_frac      
     0.80%  libxul.so       [.] moz_speex_resampler_process_float      

When the number of nodes reaches 15000, looping over inactive nodes accounts
for a large part of the CPU time:

    11.31%  libxul.so       [.] mozilla::AudioNodeStream::ProcessInput 
     9.23%  libxul.so       [.] mozilla::AudioNodeStream::ObtainInputBl
     7.71%  libxul.so       [.] mozilla::MediaStreamGraphImpl::UpdateSt
     5.04%  libxul.so       [.] mozilla::MediaStreamGraphImpl::ProduceD
     4.69%  liblgpllibs.so  [.] rdft_calc_c                            
     3.31%  libxul.so       [.] moz_speex_interpolate_product_single   
     3.21%  libxul.so       [.] resampler_basic_interpolate_single     
     3.05%  libxul.so       [.] mozilla::MediaStreamGraphImpl::NotifyHa
     2.91%  libxul.so       [.] mozilla::MediaStreamGraphImpl::CreateOr
     2.52%  liblgpllibs.so  [.] ff_fft_permute_sse.loop                
     2.36%  libxul.so       [.] mozilla::AudioBufferAddWithScale       
     2.27%  libxul.so       [.] WebCore::DirectConvolver::process      
     2.20%  libxul.so       [.] cubic_coef                             
     2.13%  libxul.so       [.] WebCore::FFTConvolver::process         
     1.94%  libxul.so       [.] mozilla::MediaStreamGraphImpl::UpdateCu
     1.87%  libxul.so       [.] mozilla::BufferComplexMultiply         
     1.80%  libxul.so       [.] mozilla::DelayBuffer::ReadChannels     
     1.77%  libxul.so       [.] mozilla::FFTBlock::GetInverseWithoutSca
     1.75%  liblgpllibs.so  [.] pass_sse.loop                          
     1.66%  libxul.so       [.] mozilla::AudioNodeStream::MainThreadNee
     1.60%  libxul.so       [.] mozilla::ProcessedMediaStream::AsProces
     1.59%  libxul.so       [.] mozilla::StreamBuffer::ForgetUpTo      
     1.51%  libxul.so       [.] mozilla::MediaStreamGraphImpl::StreamSe
     1.24%  liblgpllibs.so  [.] fft16_sse                              
     1.09%  libxul.so       [.] mozilla::FFTBlock::PerformFFT          

CPU usage of the MSG thread:
49% at 500 nodes.
57% at 6000 nodes.
67% at 10000 nodes.
80% at 15000 nodes.
90% at 19000 nodes.
(GC sweeps.)

By suspending inactive AudioNodeStreams, 65% of the CPU time spent on inactive
nodes is removed:
49% at 500 nodes.
52% at 6000 nodes.
55% at 10000 nodes.
60% at 15000 nodes.
65% at 20000 nodes.
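
The 65% figure can be roughly cross-checked from the two lists above. The sketch below assumes that the 500-node reading (49%) approximates the cost of the active nodes alone, so that anything above it is attributable to inactive nodes. That baseline is an assumption of this example, not something measured separately. The last rows are left out because they were taken at different node counts (19000 and 20000).

```python
# MSG thread CPU usage (%) by node count, copied from the lists above.
BEFORE = {500: 49, 6000: 57, 10000: 67, 15000: 80}
AFTER = {500: 49, 6000: 52, 10000: 55, 15000: 60}


def removed_fraction(nodes, baseline_nodes=500):
    """Fraction of the inactive-node CPU overhead removed by suspending."""
    extra_before = BEFORE[nodes] - BEFORE[baseline_nodes]
    extra_after = AFTER[nodes] - AFTER[baseline_nodes]
    return 1 - extra_after / extra_before


for n in (6000, 10000, 15000):
    print(n, round(removed_fraction(n), 3))
# 6000 0.625
# 10000 0.667
# 15000 0.645
```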

The distribution at 15000 nodes becomes:

    15.14%  libxul.so       [.] mozilla::AudioNodeStream::ObtainInputBl
     6.80%  liblgpllibs.so  [.] rdft_calc_c                            
     6.65%  libxul.so       [.] mozilla::MediaStreamGraphImpl::UpdateCu
     4.84%  libxul.so       [.] moz_speex_interpolate_product_single   
     4.35%  libxul.so       [.] resampler_basic_interpolate_single     
     3.33%  libxul.so       [.] mozilla::AudioBufferAddWithScale       
     3.33%  liblgpllibs.so  [.] ff_fft_permute_sse.loop                
     3.11%  libxul.so       [.] cubic_coef                             
     2.99%  libxul.so       [.] WebCore::FFTConvolver::process         
     2.82%  libxul.so       [.] WebCore::DirectConvolver::process      
     2.62%  liblgpllibs.so  [.] pass_sse.loop                          
     2.57%  libxul.so       [.] mozilla::AudioNodeStream::MainThreadNee
     2.54%  libxul.so       [.] mozilla::MediaStreamGraphImpl::StreamSe
     2.48%  libxul.so       [.] mozilla::DelayBuffer::ReadChannels     
     2.48%  libxul.so       [.] mozilla::FFTBlock::GetInverseWithoutSca
     2.39%  libxul.so       [.] mozilla::BufferComplexMultiply         
     2.24%  libxul.so       [.] mozilla::MediaStreamGraphImpl::UpdateSt
     2.17%  libxul.so       [.] mozilla::StreamBuffer::ForgetUpTo      
     2.04%  libxul.so       [.] mozilla::MediaStreamGraphImpl::PrepareU
     1.60%  liblgpllibs.so  [.] fft16_sse                              
     1.46%  libxul.so       [.] mozilla::FFTBlock::PerformFFT          
     1.27%  liblgpllibs.so  [.] pass_interleave_sse.loop               
     1.24%  libm-2.20.so    [.] __powf_finite                          
     0.94%  libxul.so       [.] WebCore::ReverbConvolverStage::process 
     0.82%  libxul.so       [.] mozilla::MediaStreamGraphImpl::UpdateGr

Karl Tomlinson (:karlt)

Reporter

Updated

•

9 years ago

Depends on: 1217625

Karl Tomlinson (:karlt)

Reporter

Updated

•

8 years ago

Depends on: 1172997

Karl Tomlinson (:karlt)

Reporter

Comment 4

•

8 years ago

The cpu usage of inactive nodes waiting for GC is now only a fraction of what
it was when this bug was filed, but it can still become significant if
nodes are created rapidly enough.  Increasing Tempo to 180 BPM on
http://webaudiodemos.appspot.com/MIDIDrums/index.html is one way to do this.

The approach used for bug 1217625 et al differs from what was outlined in
comment 0 in that streams are retained on inactive nodes but suspended.  This
avoids complications of passing state from AudioNode to AudioNodeEngine
differently when there is no stream.
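
As a loose illustration of "retained but suspended", the Python toy below keeps every stream object alive but moves suspended ones to a separate list, so the per-block loop only walks active streams. The Graph and Stream classes and their methods are hypothetical and are not the MediaStreamGraph API.

```python
class Stream:
    def __init__(self, name):
        self.name = name
        self.suspended = False
        self.blocks_processed = 0  # state that survives suspension


class Graph:
    def __init__(self):
        self.active = []
        self.suspended = []

    def add(self, stream):
        self.active.append(stream)

    def suspend(self, stream):
        if not stream.suspended:
            self.active.remove(stream)
            self.suspended.append(stream)
            stream.suspended = True

    def resume(self, stream):
        if stream.suspended:
            self.suspended.remove(stream)
            self.active.append(stream)
            stream.suspended = False

    def process_block(self):
        """Process one block; cost is proportional to active streams only."""
        for s in self.active:
            s.blocks_processed += 1
        return len(self.active)


graph = Graph()
a, b, c = Stream("a"), Stream("b"), Stream("c")
for s in (a, b, c):
    graph.add(s)

graph.suspend(b)
visited_while_suspended = graph.process_block()  # b is skipped
graph.resume(b)
visited_after_resume = graph.process_block()     # b is processed again
```

Because b is never destroyed, nothing has to be recreated or re-sent when it is resumed. That is the property the paragraph above relies on.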

There remain a few places where iterating over suspended streams should be
skipped.  Bug 1172997 covers the biggest CPU consumer.

To avoid iterating over all suspended streams to check whether any have finished,
I think it may be best to remove the main thread finished notifications code
in PrepareUpdatesToMainThreadState() and instead add to mStreamUpdates or
mUpdateRunnables more directly, perhaps via RunMessageAfterProcessing if
required.  This means that only streams that change state will be examined.
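
The difference between the two approaches can be sketched with a toy model in Python. The names here (Stream, updates_by_scanning, PushGraph) are made up for illustration. They only mimic the idea of appending to something like mStreamUpdates at the point where state changes, and are not the actual Gecko implementation.

```python
class Stream:
    def __init__(self, name):
        self.name = name
        self.finished = False
        self.notified = False


def updates_by_scanning(streams):
    """Polling: every stream, suspended or not, is examined each time."""
    examined = 0
    updates = []
    for s in streams:
        examined += 1
        if s.finished and not s.notified:
            s.notified = True
            updates.append(s.name)
    return updates, examined


class PushGraph:
    """Event-driven: a stream is queued only when its state changes."""

    def __init__(self):
        self.pending = []

    def mark_finished(self, stream):
        if not stream.finished:
            stream.finished = True
            self.pending.append(stream.name)

    def take_updates(self):
        updates, self.pending = self.pending, []
        return updates, len(updates)


# Polling variant: 15000 streams, one of which finishes.
polled = [Stream(f"stream-{i}") for i in range(15000)]
polled[42].finished = True
scan_updates, scan_examined = updates_by_scanning(polled)

# Push variant with the same workload.
pushed = [Stream(f"stream-{i}") for i in range(15000)]
push_graph = PushGraph()
push_graph.mark_finished(pushed[42])
push_updates, push_examined = push_graph.take_updates()
```

Both variants report the same single update, but the polling version touches all 15000 streams to find it while the push version touches one.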

Karl Tomlinson (:karlt)

Reporter

Updated

•

8 years ago

Depends on: 1274797

Comment hidden (typo)

I'll have a look to see what can be done quickly.

61079e59ef35 may make things worse, but there was already a problem.
https://crash-stats.mozilla.com/signature/?signature=OOM+%7C+large+%7C+mozalloc_abort+%7C+mozalloc_handle_oom+%7C+moz_xrealloc+%7C+nsTArray_base%3CT%3E%3A%3AEnsureCapacity+%7C+mozilla%3A%3AMediaStreamGraphImpl%3A%3APrepareUpdatesToMainThreadState
is similar and happening on older branches without 61079e59ef35, including 38
esr and 40.

OOMAllocationSize is typically 1, 2, 4, 8 MB, but is close to 100MB in
https://crash-stats.mozilla.com/report/index/431e8bc1-0d1d-4f92-b64a-e23572160523#allthreads

Each element in the array would be 16 bytes, so I suspect there are many more
updates than streams, as if the main thread is not emptying the queue.  The
first few reports I looked at supported that hypothesis, with the main thread
in GC or CC.

We may need to detect such a situation and wait for the main thread to catch
up, but we should also be able to reduce the number of updates being sent.
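
For scale, the allocation sizes above can be converted to a number of queued updates. This is back-of-the-envelope arithmetic in Python, assuming 16 bytes per element as estimated above and taking "MB" to mean 2**20 bytes (an assumption, since the crash reports may use a different unit).

```python
ELEMENT_SIZE = 16   # bytes per queued update, per the estimate above
MIB = 1024 * 1024   # assumption: "MB" in the reports means 2**20 bytes


def queued_updates(allocation_bytes, element_size=ELEMENT_SIZE):
    """Number of array elements that fit in an allocation of this size."""
    return allocation_bytes // element_size


typical = {mb: queued_updates(mb * MIB) for mb in (1, 2, 4, 8)}
large = queued_updates(100 * MIB)

print(typical)  # {1: 65536, 2: 131072, 4: 262144, 8: 524288}
print(large)    # 6553600
```

Even the smallest typical size corresponds to tens of thousands of elements, and the 100MB case to several million, which is in line with the suspicion above that updates are accumulating faster than the main thread drains them.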

Maire Reavy [:mreavy]

Comment 6

•

8 years ago

As Karl says in Comment 4, we've improved things a lot ("The cpu usage of inactive nodes waiting for GC is now only a fraction of what it was").  So I'm lowering the rank from 5 to 17.

Karl -- IIUC you have other work that you need to do, so I'm unassigning you as owner.  Feel free to take this back if you have the time and plan to work on this.

Assignee: karlt → nobody

Status: ASSIGNED → NEW

Rank: 5 → 17

Bulk Bug Changes for mreavy's org

Comment 7

•

7 years ago

Mass change P1->P2 to align with new Mozilla triage process

Priority: P1 → P2

Sylvestre Ledru [:Sylvestre]

Comment 8

•

6 years ago

Moving to p3 because no activity for at least 1 year(s).
See https://github.com/mozilla/bug-handling/blob/master/policy/triage-bugzilla.md#how-do-you-triage for more information

Priority: P2 → P3

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Categories: Core :: Web Audio, defect, P3

People: Reporter: karlt, Unassigned

References: Blocks 1 open bug, URL

Keywords: perf

Security: public
