Create Web Audio benchmarks

RESOLVED FIXED in Firefox 25

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: roc, Assigned: padenot)

Tracking

unspecified
mozilla26
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox25 fixed, firefox26 fixed)

Details

(Whiteboard: [qa-])

Attachments

(9 attachments, 2 obsolete attachments)

3.10 KB, patch
roc
: review+
justin.lebar+bug
: review+
Details | Diff | Splinter Review
2.68 KB, patch
roc
: review+
Details | Diff | Splinter Review
7.26 KB, patch
Details | Diff | Splinter Review
5.05 KB, patch
Ehsan
: review+
Details | Diff | Splinter Review
2.57 KB, patch
BenWa
: review+
Details | Diff | Splinter Review
4.15 MB, application/x-gzip
Details
6.00 KB, patch
Ehsan
: review+
Details | Diff | Splinter Review
7.26 KB, patch
Details | Diff | Splinter Review
422.77 KB, image/png
Details
Before we do much optimization work we should create some Web Audio benchmarks so we can measure our progress.

I think a pretty good way to do this would be to use OfflineAudioContext. We can create a node graph and then measure how long it takes to run the graph to produce some fixed number of samples. That won't handle latency but I think most of our WebAudio-specific optimizations will be targeting throughput.

It would be nice to have some tests in the tree, but tests we can run manually would be a great start. Bonus points if we can run them on other browsers.
Assignee

Comment 1

6 years ago
Taking, I already have some code.
Assignee: nobody → paul

Comment 2

6 years ago
Paul, I suggest you touch base with Chris Rogers about this.  He might already have something like this...  If not, I'm sure he's going to be interested in the result of our work here.  Thanks!
Assignee

Comment 3

6 years ago
Ha, did not see your comment, on time, so I started to write a toy webpage that dose some trivial benchmark [1], and I have bad news: when using the OfflineAudioContext in current Nightly, we are actually doing the work slower than in real time (14 seconds for 10 seconds of audio). I haven't had a chance to look into the cause yet.

[1]: https://github.com/padenot/webaudio-benchmark

Comment 4

6 years ago
After a quick profile, we're spending about 60% of our time in UpdateStreamOrder, which is almost all either realloc() or free().  :(

Also, the benchmark doesn't work in Chrome, unless they really finish their processing in 1ms.  ;-)

Comment 5

6 years ago
Also, we spend another 20-ish% in PrepareUpdatesToMainThreadState, most of which is, wait for it, realloc().

This is all completely unnecessary overhead which we should be able to get rid of completely.  The actual procesing time in my profile is 12ms, out of 12996ms.

Thia makes me very sad.

Comment 6

6 years ago
(In reply to comment #5)
> Also, we spend another 20-ish% in PrepareUpdatesToMainThreadState, most of
> which is, wait for it, realloc().
> 
> This is all completely unnecessary overhead which we should be able to get rid
> of completely.  The actual procesing time in my profile is 12ms, out of
> 12996ms.

For stream ordering at least, we should be able to set a dirty flag when something happens that actually affects stream ordering, and make UpdateStreamOrder a no-op otherwise.
PrepareUpdatesToMainThread state is going to be trickier. We can at least not allocate. We should also be able to flag the MediaStreams whose currentTime and other main-thread state can be queried; for AudioNodes (other than Destination), they can't. Then those can be skipped.

Updated

6 years ago
Depends on: 883010

Comment 8

6 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> PrepareUpdatesToMainThread state is going to be trickier. We can at least
> not allocate. We should also be able to flag the MediaStreams whose
> currentTime and other main-thread state can be queried; for AudioNodes
> (other than Destination), they can't. Then those can be skipped.

I believe not allocating is enough.  At least, we should first do that and then re-measure.

Updated

6 years ago
Depends on: 883011
Assignee

Comment 9

6 years ago
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #4)
> After a quick profile, we're spending about 60% of our time in
> UpdateStreamOrder, which is almost all either realloc() or free().  :(
> 
> Also, the benchmark doesn't work in Chrome, unless they really finish their
> processing in 1ms.  ;-)

Yes, they output a silent buffer for some reason.

Comment 10

6 years ago
(In reply to comment #9)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #4)
> > After a quick profile, we're spending about 60% of our time in
> > UpdateStreamOrder, which is almost all either realloc() or free().  :(
> > 
> > Also, the benchmark doesn't work in Chrome, unless they really finish their
> > processing in 1ms.  ;-)
> 
> Yes, they output a silent buffer for some reason.

Have you tried with the sampling rate of 44.1KHz?
Assignee

Comment 11

6 years ago
Yes, they throw an exception when you do that.

Comment 12

6 years ago
(In reply to comment #11)
> Yes, they throw an exception when you do that.

WTF?!  Please file a bug on that.
Assignee

Comment 13

6 years ago
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #12)
> (In reply to comment #11)
> > Yes, they throw an exception when you do that.
> 
> WTF?!  Please file a bug on that.

I looked into it a bit more after having written this comment, and it appears to be known, and a fix is coming [1].

[1]: http://code.google.com/p/chromium/issues/detail?id=168216#c5

Comment 14

6 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #7)
> PrepareUpdatesToMainThread state is going to be trickier. We can at least
> not allocate. We should also be able to flag the MediaStreams whose
> currentTime and other main-thread state can be queried; for AudioNodes
> (other than Destination), they can't. Then those can be skipped.

Hmm, then maybe we can differentiate those based on their AudioNodeStreamKind.  Should be very easy in fact!

Updated

6 years ago
Depends on: 884632

Updated

6 years ago
Blocks: 885496
Assignee

Comment 15

6 years ago
I could not find a method that did what I wanted on nsTArray, maybe it is hidden
somewhere else? Anyway, this obviously makes alloc/free disappear from the
profiles. We probably want to shrink the array after some time, but I haven't
done so in the patch, I need to write some instrumentation to characterize what is
happening here for some common use cases.
Attachment #770302 - Flags: review?(roc)
Assignee

Comment 16

6 years ago
This is obviously the biggest and most embarrassing win. This brings us in the
same ballpark as blink and webkit (we are two or three times slower when running
my testcase at that point, but we resample, and they don't, because the file I'm
using in the testcase has a samplerate of 44100Hz, and we run our context at
48000Hz, and they run their context at 44100Hz).
Attachment #770307 - Flags: review?(roc)
Assignee

Comment 17

6 years ago
This should give us better cache locality. I haven't profiled with and without,
nor have tried to tweak the factor.
Assignee

Comment 18

6 years ago
I'm not sure about this one, but it seems to work fine on my testcases, and it
seriously reduces the main thread CPU usage.
Assignee

Comment 19

6 years ago
Anyway, when all those are landed, we can start talking about SSE and friends, I believe :-).

Comment 20

6 years ago
Comment on attachment 770302 [details] [diff] [review]
Retain storage for media streams accross iterations instead of reallocating. r=

Review of attachment 770302 [details] [diff] [review]:
-----------------------------------------------------------------

Justin, can you please look over this new nsTArray API?
Attachment #770302 - Flags: review?(justin.lebar+bug)

Comment 21

6 years ago
Comment on attachment 770316 [details] [diff] [review]
Don't spam the main thread when rendering an offline graph. r=

Review of attachment 770316 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraph.cpp
@@ +910,5 @@
>  {
>    mMonitor.AssertCurrentThreadOwns();
>  
> +  // We don't want to update the main thread about timing update when we are not
> +  // running in realtime.

This prevents OfflineAudioContext.currentTime from making progress, right?  That violates the spec (and I believe we have a test somewhere which should turn red with this patch.)
Comment on attachment 770302 [details] [diff] [review]
Retain storage for media streams accross iterations instead of reallocating. r=

Okay, but please be careful to compact these arrays eventually if they're not tiny.  :)

Could you add a comment on the method saying what it does, and in particular call out the need to Compact() eventually, unless you know the array is small or short-lived?

Also, please add one of these two checks

  if (mHdr == EmptyHdr()) return;

or

  if (mHdr->mLength == 0) return;

Otherwise if ClearRetainStorage() is called on the empty header, we'll write 0 into its length.  Which is fine, but it makes helgrind complain.
Attachment #770302 - Flags: review?(justin.lebar+bug) → review+
Comment on attachment 770302 [details] [diff] [review]
Retain storage for media streams accross iterations instead of reallocating. r=

Review of attachment 770302 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraphImpl.h
@@ +535,5 @@
>     * True when a non-realtime MediaStreamGraph has started to process input.  This
>     * value is only accessed on the main thread.
>     */
>    bool mNonRealtimeProcessing;
> +  nsTArray<nsRefPtr<MediaStream> > mOldStreams;

Move this up next to some other pointer field. Also, give it a comment to explain what it's for.

::: xpcom/glue/nsTArray.h
@@ +1042,5 @@
>  
>    //
>    // Mutation methods
>    //
> +  void ClearRetainStorage() {

ClearAndRetainStorage()
Attachment #770302 - Flags: review?(roc) → review+
> ClearAndRetainStorage()

Ah, yes.  Good call.
Comment on attachment 770307 [details] [diff] [review]
Actually run offline MSG offline. r=

Review of attachment 770307 [details] [diff] [review]:
-----------------------------------------------------------------

Ahahahaha. r+ with the following comments fixed.

::: content/media/MediaStreamGraph.cpp
@@ +323,5 @@
> +    mCurrentTimeStamp = now;
> +    LOG(PR_LOG_DEBUG+1, ("Updating current time to %f (real %f, mStateComputedTime %f)",
> +          MediaTimeToSeconds(nextCurrentTime),
> +          (now - mInitialTimeStamp).ToSeconds(),
> +          MediaTimeToSeconds(mStateComputedTime)));

I think this LOG should happen in both the realtime and non-realtime cases.

@@ +326,5 @@
> +          (now - mInitialTimeStamp).ToSeconds(),
> +          MediaTimeToSeconds(mStateComputedTime)));
> +  } else {
> +    prevCurrentTime = mCurrentTime;
> +    nextCurrentTime = mCurrentTime + AUDIO_TARGET_MS;

I think you should use MEDIA_GRAPH_TARGET_PERIOD_MS here.
Attachment #770307 - Flags: review?(roc) → review+
Comment on attachment 770315 [details] [diff] [review]
Use bigger chunks when using an offline graph. r=

Review of attachment 770315 [details] [diff] [review]:
-----------------------------------------------------------------

Switching to MEDIA_GRAPH_TARGET_PERIOD_MS in the previous patch means that this patch should turn MEDIA_GRAPH_TARGET_PERIOD_MS into a function instead. Call it MediaGraphTargetPeriod().
Comment on attachment 770316 [details] [diff] [review]
Don't spam the main thread when rendering an offline graph. r=

Review of attachment 770316 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraph.cpp
@@ +910,5 @@
>  {
>    mMonitor.AssertCurrentThreadOwns();
>  
> +  // We don't want to update the main thread about timing update when we are not
> +  // running in realtime.

Yeah, I don't think we should do this.

An alternative is for OfflineAudioContext to send an update to the main thread every time a certain amount of *real* time has passed.
(In reply to Paul Adenot (:padenot) from comment #16)
> This is obviously the biggest and most embarrassing win. This brings us in
> the
> same ballpark as blink and webkit (we are two or three times slower when
> running
> my testcase at that point, but we resample, and they don't, because the file
> I'm
> using in the testcase has a samplerate of 44100Hz, and we run our context at
> 48000Hz, and they run their context at 44100Hz).

I think it would be a good idea to use an odd sample rate for your test files, so that everyone has to resample no matter what platform the test runs on. Would suck to have a change in our sample rate make a big difference to benchmark results.

Of course, Blink/Webkit will still have an edge since they're still using linear resampling (right?).
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #25)
> @@ +326,5 @@
> > +          (now - mInitialTimeStamp).ToSeconds(),
> > +          MediaTimeToSeconds(mStateComputedTime)));
> > +  } else {
> > +    prevCurrentTime = mCurrentTime;
> > +    nextCurrentTime = mCurrentTime + AUDIO_TARGET_MS;
> 
> I think you should use MEDIA_GRAPH_TARGET_PERIOD_MS here.

This means that AUDIO_TARGET_MS and VIDEO_TARGET_MS will also need to become functions. Might as well make all those constants into functions.
Comment on attachment 770307 [details] [diff] [review]
Actually run offline MSG offline. r=

>   }

>+    if (mStateComputedTime < nextCurrentTime) {
>+      LOG(PR_LOG_WARNING, ("Media graph global underrun detected"));
>+      nextCurrentTime = mStateComputedTime;
>+    }
> 
>   if (prevCurrentTime >= nextCurrentTime) {

A few too many U+0020 wasted :).
Assignee

Comment 31

6 years ago
> I think it would be a good idea to use an odd sample rate for your test
> files, so that everyone has to resample no matter what platform the test
> runs on. Would suck to have a change in our sample rate make a big
> difference to benchmark results.

Good idea, I'll do that.

> Of course, Blink/Webkit will still have an edge since they're still using
> linear resampling (right?).

Their code did not look like they optimized it much, so we'll see. We will certainly have better sound quality, though.

iirc, there is code to use SIMD in the speex resampler. Considering how much we use this resampler, these days (with more usage coming, especially in WebRTC code), I think we should consider building it.
(In reply to Paul Adenot (:padenot) from comment #31)
> iirc, there is code to use SIMD in the speex resampler. Considering how much
> we use this resampler, these days (with more usage coming, especially in
> WebRTC code), I think we should consider building it.

Definitely!
Assignee

Comment 33

6 years ago
This brings a 60x improvement, and I still get the expected result. I need to
investigate more to check what is going on.
Assignee

Updated

6 years ago
Attachment #770315 - Attachment is obsolete: true
Assignee

Comment 34

6 years ago
This gives a 2x speedup on clock wall time, and frees up a core (because it is
not spinning to process the MSG messages).
Attachment #770888 - Flags: review?(roc)
Assignee

Updated

6 years ago
Attachment #770316 - Attachment is obsolete: true
Assignee

Comment 35

6 years ago
This registers the MSG thread to the SPS profiler so I can profile on other
platforms than MacOS/Instruments.
Attachment #770890 - Flags: review?(bgirard)
Comment on attachment 770890 [details] [diff] [review]
Register the MSG thread for in the profiler. r=

Review of attachment 770890 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraph.cpp
@@ +1274,5 @@
> +  }
> +  NS_IMETHOD Run()
> +  {
> +    char aLocal;
> +    profiler_register_thread("MSG", &aLocal);

Ideally we can get a more descriptive name then MSG. This is shown in the UI and can be used to select which threads you want to sample.

@@ +1276,5 @@
> +  {
> +    char aLocal;
> +    profiler_register_thread("MSG", &aLocal);
> +    mGraph->RunThread();
> +    return NS_OK;

We need to unregister the thread before returning to make sure we don't sample a thread that is dead.
Attachment #770890 - Flags: review?(bgirard) → review+

Comment 37

6 years ago
(In reply to comment #27)
> Comment on attachment 770316 [details] [diff] [review]
>   --> https://bugzilla.mozilla.org/attachment.cgi?id=770316
> Don't spam the main thread when rendering an offline graph. r=
> 
> Review of attachment 770316 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: content/media/MediaStreamGraph.cpp
> @@ +910,5 @@
> >  {
> >    mMonitor.AssertCurrentThreadOwns();
> >  
> > +  // We don't want to update the main thread about timing update when we are not
> > +  // running in realtime.
> 
> Yeah, I don't think we should do this.
> 
> An alternative is for OfflineAudioContext to send an update to the main thread
> every time a certain amount of *real* time has passed.

Is this even an issue these days?  I thought that I had fixed most of this.

I'd love to see a profile with all of the patches here except for this one applied.

Comment 38

6 years ago
(In reply to comment #28)
> Of course, Blink/Webkit will still have an edge since they're still using
> linear resampling (right?).

Yes.

Comment 39

6 years ago
Comment on attachment 770888 [details] [diff] [review]
Don't spam the main thread when rendering an offline graph. r=

Why are you using the wall clock time?  That seems wrong.
Attachment #770888 - Flags: review?(roc) → review-
Assignee

Comment 40

6 years ago
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #37)
> (In reply to comment #27)
> Is this even an issue these days?  I thought that I had fixed most of this.
> 
> I'd love to see a profile with all of the patches here except for this one
> applied.

Attached are two profiles, gzipped: with-patch.trace and without-patch.trace (with-patch being a profile of a build with the "don't spam the main thread" patch, and without-patch being a profile of a build without this patch. Both build have all the other patches applied, opt-debug build). Those are made using Instruments on MacOs.

I see a 1.5x to 2.5x speedup _with_ the patch applied. Anyways, we should not even try to update the main thread each iteration when we are running offline.

Comment 41

6 years ago
Hmm, in the without-patch profile we're spending ~5% of the audio thread's time in PrepareUpdatesToMainThreadState.  But I guess this profile can be tricky since it's not easy to correlate this 5% time to the amount of time that we lock the main thread.  But still I'm quite surprised that this patch results in that much speedup...  But if that's the case, then ok, let's take this.  (I still believe that we should not make it look at the wall clock time.)

BTW, what's this OffTheBooksMutex stuff in the profile?  I can't see where we use this in the MSG code...
Assignee

Comment 42

6 years ago
http://hg.mozilla.org/mozilla-central/rev/56f89cd46261, used by the monitors, apparently, so we still do too much locking.

I'm trying to remove some allocations in AudioNodeStream::ProduceOutput at the moment. We have way too much MSG overhead at the moment to be able to feel the difference when using the SSE2 and NEON implementations we are adding.

Comment 43

6 years ago
(In reply to comment #42)
> http://hg.mozilla.org/mozilla-central/rev/56f89cd46261, used by the monitors,
> apparently, so we still do too much locking.

Huh, do you have --enable-debug in your mozconfig?  That would be misleading, you want --enable-debug-symbols if all you need is symbols for your build.

> I'm trying to remove some allocations in AudioNodeStream::ProduceOutput at the
> moment. We have way too much MSG overhead at the moment to be able to feel the
> difference when using the SSE2 and NEON implementations we are adding.

Agreed!
Assignee

Comment 44

6 years ago
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #39)
> Comment on attachment 770888 [details] [diff] [review]
> Don't spam the main thread when rendering an offline graph. r=
> 
> Why are you using the wall clock time?  That seems wrong.

Well, this is what roc suggests in comment 27. The two solutions here are to send the event once we have rendered _n_ chunks, or time based (every x milliseconds). Time based has the advantage of sending the messages at a rate that does not depend on the machine speed. The other way as the advantage of not querying the system clock but this should not be expensive. Thoughts ?
Flags: needinfo?(ehsan)

Comment 45

6 years ago
(In reply to Paul Adenot (:padenot) from comment #44)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #39)
> > Comment on attachment 770888 [details] [diff] [review]
> > Don't spam the main thread when rendering an offline graph. r=
> > 
> > Why are you using the wall clock time?  That seems wrong.
> 
> Well, this is what roc suggests in comment 27. The two solutions here are to
> send the event once we have rendered _n_ chunks, or time based (every x
> milliseconds). Time based has the advantage of sending the messages at a
> rate that does not depend on the machine speed. The other way as the
> advantage of not querying the system clock but this should not be expensive.
> Thoughts ?

Hmm, I see.  That's a good point.  If querying the system time with this patch doesn't show up in your profile, then I'm fine with this.
Flags: needinfo?(ehsan)
Using the system clock means that the overhead of sending the events is roughly constant, which is good.
Attachment #770888 - Flags: review- → review?(ehsan)

Updated

6 years ago
Attachment #770888 - Flags: review?(ehsan) → review+
Assignee

Comment 47

6 years ago
This code was making a lot of malloc/free in my benchmarks. Using a linked list
is way better in terms of perf (benchmarks are faster). I'll attach a profile
tomorrow.
Attachment #777939 - Flags: review?(ehsan)

Comment 48

6 years ago
Comment on attachment 777939 [details] [diff] [review]
Use a linked list for ordering stream instead of an array. r=

Review of attachment 777939 [details] [diff] [review]:
-----------------------------------------------------------------

Fine by me!
Attachment #777939 - Flags: review?(ehsan) → review+
If we're hitting UpdateStreamOrderForStream a lot, we should probably investigate why.

For example, if we're creating and destroying a lot of streams with no inputs (like AudioBufferSourceNodes), we could avoid walking the entire stream graph, since we know these nodes can't be part of a cycle. In particular when we add one of those streams, we can always add it at the beginning of the list of streams.
Assignee

Comment 50

6 years ago
I've benchmarked this, and tried to find a good value, 600 is nice.
Attachment #778384 - Flags: review?(roc)
Comment on attachment 778384 [details] [diff] [review]
Use bigger chunks when using an offline graph. r=

Review of attachment 778384 [details] [diff] [review]:
-----------------------------------------------------------------

Hang on. I thought we agreed to dispatch messages every N ms of real time. This patch doesn't do that.
Assignee

Comment 52

6 years ago
Posted image Without
This is a profile without attachment 778384 [details] [diff] [review] (Use bigger chunks when using an offline graph) applied. The MSG overhead is basically way too high. Applying this patch gives us a 64x speedup.

I guess the patch as it is is actually wrong, since we don't process main thread events frequently enough. I think MSG iterations should run as long as there are no main thread messages or something. That should give us the same perf improvements.
Then we'd better fix the MSG overhead. Making each RunThread iteration produce more audio samples will reduce the overhead of OfflineAudioContext but it will become less representative of normal AudioContext performance.
Assignee

Updated

6 years ago
Whiteboard: [leave-open]

Comment 57

6 years ago
Yay!

Should we move further patches to their own bugs?
Assignee

Updated

6 years ago
Attachment #778384 - Flags: review?(roc)
Assignee

Comment 58

6 years ago
Yes, I've got a list of things we can optimize per benchmark listed down in my notebook at the office, I'll open more specific bugs for them.

Comment 60

6 years ago
I'm gonna close this.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26

Updated

6 years ago
Whiteboard: [leave-open]
Paul, do you need any QA on this bug?
Flags: needinfo?(paul)
Assignee

Updated

6 years ago
Flags: needinfo?(paul)
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #61)
> Paul, do you need any QA on this bug?

Taking your silence as a "no". Please remove the [qa-] whiteboard tag and add the verifyme keyword if there's something we can do to help here.
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.