Closed Bug 901633 Opened 6 years ago Closed 4 years ago

Improve internal support for stereo audio in the webrtc media pipeline

Categories

(Core :: WebRTC: Audio/Video, defect, P5)

22 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed
Blocking Flags:

People

(Reporter: jesup, Assigned: padenot)

References

(Blocks 1 open bug)

Details

Attachments

(17 files, 15 obsolete files)

1.83 KB, patch
Details | Diff | Splinter Review
12.82 KB, patch
jesup
: review+
Details | Diff | Splinter Review
4.52 KB, patch
jesup
: review+
Details | Diff | Splinter Review
3.64 KB, patch
roc
: review+
Details | Diff | Splinter Review
6.37 KB, patch
Details | Diff | Splinter Review
2.14 KB, patch
Details | Diff | Splinter Review
3.73 KB, patch
jesup
: review+
Details | Diff | Splinter Review
3.76 KB, patch
jesup
: review+
Details | Diff | Splinter Review
3.29 KB, patch
jesup
: review+
Details | Diff | Splinter Review
8.57 KB, patch
jesup
: review+
Details | Diff | Splinter Review
6.60 KB, patch
jesup
: review+
Details | Diff | Splinter Review
5.63 KB, patch
jesup
: review+
Details | Diff | Splinter Review
27.29 KB, patch
Details | Diff | Splinter Review
3.65 KB, patch
karlt
: review+
Details | Diff | Splinter Review
3.91 KB, patch
karlt
: review+
Details | Diff | Splinter Review
4.37 KB, patch
jesup
: review+
Details | Diff | Splinter Review
3.92 KB, patch
jesup
: review+
Details | Diff | Splinter Review
Explicitly for the part outside of the webrtc.org/GIPS code.
backlog: --- → webRTC+
Rank: 55
Priority: -- → P5
Whiteboard: [webrtc][getusermedia]
Assignee: nobody → padenot
Blocks: 1156472
Blocks: 1187333
No longer blocks: 1156472
This applies on pc_test.html from the WebRTC landing repo.

It makes it so that the sender sends an stereo sine wave that smoothly oscillates between hard-panning to the left and right.

Currently, we only send mono so it oscillates between silence and non-silence.
This means converting to int16, interleaving, and down-mixing to stereo (or
keeping it to mono if it's already mono of course).
Comment on attachment 8642473 [details] [diff] [review]
Part 1 - Implement a generic audio packetizer. r=

We have too much ad-hoc packetizers in the tree. Here is one more that is better (and tested).

The plan is to change the MediaPipeline one and the AEC feed back one to it.
Attachment #8642473 - Flags: review?(rjesup)
Comment on attachment 8642477 [details] [diff] [review]
Part 5 - Make MediaPipeline downmix and properly convert audio for webrtc.org code. r=

This is using the new templated conversion functions in part 2. We still need to have a switch here, because we don't know what the MSG is feeding us at this point: it can be Web Audio (floats), or directly a mic (int).
Attachment #8642477 - Flags: review?(rjesup)
Attachment #8642482 - Flags: review?(rjesup)
Comment on attachment 8642486 [details] [diff] [review]
Part 9 - Make the necessary changes to VoEExternalMediaImpl::ExternalRecordingInsertData so that it the number of channels is forwarded down the webrtc.org code. r=

That's the patch where I don't know if we want to upstream it.
Attachment #8642486 - Flags: review?(rjesup)
Comment on attachment 8642473 [details] [diff] [review]
Part 1 - Implement a generic audio packetizer. r=

Review of attachment 8642473 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/AudioPacketizer.h
@@ +42,5 @@
> +       "The packet size and the number of channel should be strictly positive");
> +  }
> +  void Input(InputType* aFrames, uint32_t aFrameCount)
> +  {
> +    uint32_t inputSamples = aFrameCount * mChannels;

I suppose in theory we should use size_t for all these, or something along those lines - but uint32_t is *way* bigger than could ever be needed

@@ +113,5 @@
> +  uint32_t PacketsAvailable() {
> +    return AvailableSamples() / mChannels / mPacketSize;
> +  }
> +
> +  bool Empty() {

Should some of these methods be const?
Attachment #8642473 - Flags: review?(rjesup) → review+
Attachment #8642477 - Flags: review?(rjesup) → review+
Comment on attachment 8642482 [details] [diff] [review]
Part 8 - Use our new generic packetizer in the MediaPipeline so that we can packetize stereo easily. r=

Review of attachment 8642482 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp
@@ +974,5 @@
>  
>    MOZ_ASSERT(!(rate%100)); // rate should be a multiple of 100
>  
> +  // Check if the rate or the number of channels has changed since the last time
> +  // we came through I realize it may be overkill to check if the rate has

"through."
Attachment #8642482 - Flags: review?(rjesup) → review+
Attachment #8642486 - Flags: review?(rjesup) → review+
This new version fixes a ridiculous bug (I forgot to add an offset to a pointer
in a PodCopy call) that was somehow not caught by the tests (and adds a test
that would have caught it off course).
Attachment #8653439 - Flags: review?(rjesup)
Attachment #8642473 - Attachment is obsolete: true
This is basically templating more things in AudioSegment.h and
AudioChannelFormat.h so we can use them for float or integers (WebRTC uses
integers and needed those).
Attachment #8653441 - Flags: review?(roc)
Attachment #8642474 - Attachment is obsolete: true
I don't think we have much people working on this this days, so you get to
review. It's rather simple though.
Attachment #8653442 - Flags: review?(rjesup)
Attachment #8642475 - Attachment is obsolete: true
This is making more things typed instead of using void*.
Attachment #8653443 - Flags: review?(roc)
Attachment #8642476 - Attachment is obsolete: true
This means converting to int16, interleaving, and down-mixing to stereo (or
keeping it to mono if it's already mono of course).

Already r+ed by jesup, carrying forward.
Attachment #8642477 - Attachment is obsolete: true
Again, making some things more typed now that we don't use void* everywhere.
Attachment #8653448 - Flags: review?(karlt)
Attachment #8642478 - Attachment is obsolete: true
Same thing for AudioNodeExternalInputStream.
Attachment #8653449 - Flags: review?(karlt)
Attachment #8642481 - Attachment is obsolete: true
Attachment #8642482 - Attachment is obsolete: true
Attachment #8642486 - Attachment is obsolete: true
This is the receiving side. This patch makes it possible to receive stereo, and
removes a malloc call as well.
Attachment #8653456 - Flags: review?(rjesup)
This is supporting code for the previous patch so that we can get two channels
if needed: the webrtc.org code does not down mix anymore if it receives stereo.
Attachment #8653459 - Flags: review?(rjesup)
This adds a generic function to convert and deinterleave audio we get from
webrtc.org code so that we can feed it to the MSG.
Attachment #8653462 - Flags: review?(rjesup)
When the audio comes from a PeerConnection, we don't know how many channels the
audio will have, and it can change anyways, so we need to teach the input
resampler to change its channel count dynamically.
Attachment #8653463 - Flags: review?(rjesup)
Now we can unit tests our new generic AudioSegment.h and AudioChannelFormat.h
methods, here are some C++ tests. Also this includes trivial fixes so that the
tests for the AudioPacketizer work on Windows and with static analysis builds.
Attachment #8653466 - Flags: review?(rjesup)
This removes an alloocation on the sending side of the PeerConnection, for
audio, by taking advantage of the fact that webrtc.org synchronously does a
copy later down the stack when it feeds data to its own resampler.
Attachment #8653468 - Flags: review?(rjesup)
This removes the other allocation in the common case on the sending side so we
don't malloc anymore on the critical path.
Attachment #8653469 - Flags: review?(rjesup)
Comment on attachment 8653441 [details] [diff] [review]
Part 2 - Make AudioChannelFormat and AudioSegment more generic. r=

Review of attachment 8653441 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/AudioSegment.cpp
@@ +184,2 @@
>  
> +    switch(c.mBufferFormat) {

space before (
Attachment #8653441 - Flags: review?(roc) → review+
Comment on attachment 8653441 [details] [diff] [review]
Part 2 - Make AudioChannelFormat and AudioSegment more generic. r=

>+  nsTArray<const T*> ChannelData()
>+  {
>+    nsTArray<const T*> channels;
>+    channels.SetLength(mChannelData.Length());
>+    for (uint32_t channel = 0; channel < mChannelData.Length(); channel++) {
>+      channels[channel] = reinterpret_cast<const T*>(mChannelData[channel]);
>+    }
>+    return channels;
>+  }

This is adding at least one allocation/free pair, possibly more depending on
how much RVO can do.

I assume it is possible to return type const nsTArray<const T*>& and get that
from *reinterpret_cast<nsTArray<const T*>*>(&mChannelData).

The API provides the opportunity to add

  MOZ_ASSERT(AudioSampleTypeToFormat<T>::Format == mBufferFormat);
Comment on attachment 8653448 [details] [diff] [review]
Part 6 - Update DelayBuffer to use the new AudioChunk methods

>+  mUpmixChannels = mChunks[aNewReadChunk].ChannelData<float>();

I want to avoid the new allocation here.  This is fine if that can be resolved.

Bonus points for removing static_cast<const float*>() from ReadChannels().
(I assume that is the advantage of templating AudioChannelsUpMix because the
function is the same regardless of the template parameter.)
Attachment #8653448 - Flags: review?(karlt) → review-
Comment on attachment 8653449 [details] [diff] [review]
Part 7 - Update AudioNodeExternalInputStream to use the new AudioChunk methods

>-CopyChunkToBlock(const AudioChunk& aInput, AudioChunk *aBlock,
>+CopyChunkToBlock(const nsTArray<const T*>& aInput, uint32_t aDuration, float aVolume, AudioChunk *aBlock,

I think the API is tidier as is with the const AudioChunk& aInput rather than the caller providing the three associated parameters.  This method copies chunk to block so having only the output be a chunk would confuse things.

Instead call with the template parameter.  e.g. CopyChunkToBlock<float>

>-      AudioChannelsUpMix(&channels, blockChannels, nullptr);
>+      AudioChannelsUpMix(&channels, blockChannels, SilentChannel::ZeroChannel<T>());

Passing a buffer of zeros instead of null defeats the null channel
optimization below.

>+        nsTArray<const float*> channelData = ci->ChannelData<float>();

Unnecessary memory allocation, as indicated previously.
The API may be fine though if the reinterpret_cast array reference works.
Attachment #8653449 - Flags: review?(karlt) → review-
This means converting to int16, interleaving, and down-mixing to stereo (or
keeping it to mono if it's already mono of course).
Attachment #8653444 - Attachment is obsolete: true
Attachment #8654083 - Flags: review?(rjesup)
Yeah this reinterpret_cast trick works fine.
Attachment #8653441 - Attachment is obsolete: true
This is similar, but ChannelData() has been changed, and the static_cast has
been removed.
Attachment #8654087 - Flags: review?(karlt)
Attachment #8653448 - Attachment is obsolete: true
Attachment #8653449 - Attachment is obsolete: true
Comment on attachment 8654087 [details] [diff] [review]
Part 6 - Update DelayBuffer to use the new AudioChunk methods

Thanks!
Attachment #8654087 - Flags: review?(karlt) → review+
Comment on attachment 8654089 [details] [diff] [review]
Part 7 - Update AudioNodeExternalInputStream to use the new AudioChunk methods

>-  if (aInput.IsNull()) {

>+  const nsTArray<const T*>& inputChannels = aInput.ChannelData<T>();
>+  if (!inputChannels[0]) {
>     channels.SetLength(blockChannels);
>     PodZero(channels.Elements(), blockChannels);
>   } else {
>-    channels.SetLength(aInput.ChannelCount());
>-    PodCopy(channels.Elements(), aInput.mChannelData.Elements(), channels.Length());
>+    channels.SetLength(inputChannels.Length());
>+    PodCopy(channels.Elements(), inputChannels.Elements(), channels.Length());

Leave the aInput.IsNull() test because the existing code is fine and
inputChannels[0] can read beyond the end of the array.

Instead move the inputChannels declaration into the else block.

>+    switch(ci->mBufferFormat) {

space before (
Attachment #8654089 - Flags: review?(karlt) → review+
Comment on attachment 8653439 [details] [diff] [review]
Part 1 - Implement a generic audio packetizer. r=

Review of attachment 8653439 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/AudioPacketizer.h
@@ +40,5 @@
> +  {
> +     MOZ_ASSERT(aPacketSize > 0 && aChannels > 0,
> +       "The packet size and the number of channel should be strictly positive");
> +  }
> +  void Input(InputType* aFrames, uint32_t aFrameCount)

add blank line (nit)

@@ +91,5 @@
> +  {
> +    uint32_t samplesNeeded = mPacketSize * mChannels;
> +    OutputType* out = new OutputType[samplesNeeded];
> +
> +    // Under-run. Pad the end of the buffer with silence.

Is there any need to (optionally) log underruns?  (it shouldn't be the default)

@@ +95,5 @@
> +    // Under-run. Pad the end of the buffer with silence.
> +    if (AvailableSamples() < samplesNeeded) {
> +      uint32_t zeros = samplesNeeded - AvailableSamples();
> +      PodZero(out + AvailableSamples(), zeros);
> +      samplesNeeded -= zeros;

Good, this puts the zeros after the real data

@@ +114,5 @@
> +
> +    return out;
> +  }
> +
> +  uint32_t PacketsAvailable() {

A number of these methods should be const
Attachment #8653439 - Flags: review?(rjesup) → review+
Comment on attachment 8653442 [details] [diff] [review]
Part 3 - Fix TrackEncoder to use the new AudioChunk methods. r=

Review of attachment 8653442 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/encoder/TrackEncoder.cpp
@@ +149,5 @@
> +      InterleaveTrackData(array, aDuration, aOutputChannels, aOutput, aChunk.mVolume);
> +      break;
> +   }
> +   case AUDIO_FORMAT_SILENCE: {
> +      MOZ_ASSERT(false, "To implement.");

File a bug?
Attachment #8653442 - Flags: review?(rjesup) → review+
Comment on attachment 8654083 [details] [diff] [review]
Part 5 - Make MediaPipeline downmix and properly convert audio for webrtc.org code. r=

Review of attachment 8654083 [details] [diff] [review]:
-----------------------------------------------------------------

Easy fix

::: media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp
@@ -953,2 @@
>  
> -  if (enabled_ && chunk.mBuffer) {

You lost the bit that zeros the buffers if enabled_ is false
Attachment #8654083 - Flags: review?(rjesup) → review-
Comment on attachment 8653456 [details] [diff] [review]
Part 10 - Change the receiving side of the MediaPipeline so that it can detect and handle stereo. r=

Review of attachment 8653456 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp
@@ +1361,2 @@
>      AudioSegment segment;
> +    // We derive the number of channel of the stream from the number of samples

channels
Attachment #8653456 - Flags: review?(rjesup) → review+
Attachment #8653459 - Flags: review?(rjesup) → review+
Attachment #8653462 - Flags: review?(rjesup) → review+
Comment on attachment 8653463 [details] [diff] [review]
Part 13 - Teach the resampler at the input of the MSG to dynamically change its channel count if needed. r=

Review of attachment 8653463 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/MediaStreamGraph.cpp
@@ +2482,5 @@
>    AudioSegment* segment = static_cast<AudioSegment*>(aSegment);
>    int channels = segment->ChannelCount();
>  
> +  // If this segment is just silence, we delay instanciating the resampler. We
> +  // also need too recreate the resampler if the channel count changes.

too -> to
Attachment #8653463 - Flags: review?(rjesup) → review+
Attachment #8653466 - Flags: review?(rjesup) → review+
Attachment #8653468 - Flags: review?(rjesup) → review+
Attachment #8653469 - Flags: review?(rjesup) → review+
Attachment #8653456 - Attachment is obsolete: true
(In reply to Paul Adenot (:padenot) from comment #47)
> Created attachment 8654762 [details] [diff] [review]
> Part 10 - Change the receiving side of the MediaPipeline so that it can
> detect and handle stereo. r=
> 
> Yes, good catch !

Are you sure you didn't mean to update part 5, which I r-'d?  part 10 I r+'d with a single grammar error
Attachment #8654083 - Attachment is obsolete: true
Attachment #8655307 - Flags: review?(rjesup) → review+
Attachment #8654762 - Flags: review?(rjesup) → review+
Depends on: 1203836
Depends on: 1203671
See Also: → 1543622
You need to log in before you can comment on or make changes to this bug.