Closed Bug 830707 (Opened 11 years ago, Closed 11 years ago)

Make MediaStream AudioSegments not have a fixed number of channels

Categories

(Core :: Audio/Video, defect)

Version: 18 Branch
Platform: x86_64 / Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking


Status: RESOLVED FIXED
Target Milestone: mozilla21

People

(Reporter: roc, Assigned: roc)

References

Details

Attachments

(4 files, 3 obsolete files)

We need to allow the number of channels in a MediaStream's audio track to change from moment to moment. In particular Web Audio needs this. I guess theoretically chained Oggs and similar kinds of media resources would need this too.
I have patches for this bug, but no good way to test them yet.
Comment on attachment 704415 [details] [diff] [review]
Part 1: Add code for upmixing and downmixing following Web Audio's spec

Review of attachment 704415 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/AudioChannelFormat.cpp
@@ +2,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this file,
> + * You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +#include "AudioChannelFormat.h"

You need to #include <algorithm> here, for std::min and std::max.
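
That is, the top of the file would end up roughly like this (just a sketch, assuming nothing else in the header block changes):

/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this file,
 * You can obtain one at http://mozilla.org/MPL/2.0/. */

#include "AudioChannelFormat.h"

#include <algorithm>  // for std::min and std::max used below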
Comment on attachment 704417 [details] [diff] [review]
Part 2: Mix channels to output channel count when playing audio

Review of attachment 704417 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/AudioSegment.cpp
@@ +89,5 @@
>  }
>  
> +static const int AUDIO_PROCESSING_FRAMES = 640; /* > 10ms of 48KHz audio */
> +static const int GUESS_AUDIO_CHANNELS = 2;
> +static const uint8_t gZeroChannel[MAX_AUDIO_SAMPLE_SIZE*AUDIO_PROCESSING_FRAMES];

clang says:

/Users/ehsanakhgari/moz/mozilla-central/content/media/AudioSegment.cpp:93:22: error: default initialization of an object of const type 'const uint8_t [2560]'
static const uint8_t gZeroChannel[MAX_AUDIO_SAMPLE_SIZE*AUDIO_PROCESSING_FRAMES];
                     ^
1 error generated.

You probably want:

static const uint8_t gZeroChannel[MAX_AUDIO_SAMPLE_SIZE*AUDIO_PROCESSING_FRAMES] = {0};
Comment on attachment 704418 [details] [diff] [review]
Part 3: Don't constrain AudioSegment to a fixed number of channels

Review of attachment 704418 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/MediaStreamGraph.cpp
@@ +1159,3 @@
>          // something here ... preallocation, async allocation, multiplexing onto a single
>          // stream ...
>          AudioSegment* audio = tracks->Get<AudioSegment>();

With this change, this variable is now unused.
Comment on attachment 704418 [details] [diff] [review]
Part 3: Don't constrain AudioSegment to a fixed number of channels

Review of attachment 704418 [details] [diff] [review]:
-----------------------------------------------------------------

You probably want to remove the Init call from media/webrtc/signaling/test/mediapipeline_unittest.cpp too.
(In reply to :Ehsan Akhgari from comment #12)
> Comment on attachment 704418 [details] [diff] [review]
> Part 3: Don't constrain AudioSegment to a fixed number of channels
> 
> Review of attachment 704418 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> You probably want to remove the Init call from
> media/webrtc/signaling/test/mediapipeline_unittest.cpp too.

(And from media/webrtc/signaling/test/FakeMediaStreamsImpl.h too.)
I had already made all these changes locally, sorry :-(
Comment on attachment 704415 [details] [diff] [review]
Part 1: Add code for upmixing and downmixing following Web Audio's spec

Review of attachment 704415 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/AudioChannelFormat.cpp
@@ +111,5 @@
> +        aChannelArray->ElementAt(0) = surroundChannels[SURROUND_L];
> +        aChannelArray->ElementAt(1) = surroundChannels[SURROUND_R];
> +        aChannelArray->ElementAt(2) = surroundChannels[SURROUND_C];
> +        aChannelArray->ElementAt(3) = surroundChannels[SURROUND_SL];
> +        aChannelArray->ElementAt(4) = surroundChannels[SURROUND_SR];

I assume it's intentional that no-center to with-center upmixes don't generate a center channel? Normally, multi-channel upmixes generate a virtual center (correlated L+R) and usually derived rears (anti-correlated L and R), typically with a delay. That may well not be the intention here, but it isn't clear. Same question for downmixes. There are tons of articles on all the different ways you can up- and downmix.

@@ +118,5 @@
> +    }
> +
> +    memcpy(aChannelArray->Elements(), surroundChannels, sizeof(surroundChannels));
> +    inputChannelCount = CUSTOM_CHANNEL_LAYOUTS;
> +  }

If inputChannelCount > CUSTOM_CHANNEL_LAYOUTS (and it's tested, so let's assume it's possible), then I presume we'll return NULL for the first 6 channels, not the zero channel. Either assert on it or move the initializer.
Attachment #704415 - Flags: review?(rjesup) → review+
Attachment #704416 - Flags: review?(rjesup) → review+
Comment on attachment 704417 [details] [diff] [review]
Part 2: Mix channels to output channel count when playing audio

Review of attachment 704417 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/AudioSegment.cpp
@@ +98,5 @@
> +  uint32_t outputChannels = aOutput->GetChannels();
> +  nsAutoTArray<AudioDataValue,AUDIO_PROCESSING_FRAMES*GUESS_AUDIO_CHANNELS> buf;
> +  nsAutoTArray<const void*,GUESS_AUDIO_CHANNELS> channelData;
> +  nsAutoTArray<float,AUDIO_PROCESSING_FRAMES*GUESS_AUDIO_CHANNELS> downmixConversionBuffer;
> +  nsAutoTArray<float,AUDIO_PROCESSING_FRAMES*GUESS_AUDIO_CHANNELS> downmixOutputBuffer;

Is it possible/useful to remember the number of channels from the last iteration and use that to seed the array sizes, to avoid re-allocations when channels != 2? Perhaps not as coded. From one write to the next the channel count rarely changes. Also, holding allocations from one iteration to the next may be useful and would avoid reallocs; I always try to avoid hitting the memory allocator in realtime output paths where possible.
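
Roughly what I mean, as a sketch (mLastChannelCount is a hypothetical persistent member, not something in the patch):

// Hypothetical sketch, not the patch's code: carry the channel count across calls
// so the arrays can be pre-sized and reallocation avoided in the steady state.
uint32_t expectedChannels = std::max(mLastChannelCount, uint32_t(GUESS_AUDIO_CHANNELS));
nsTArray<const void*> channelData;
channelData.SetCapacity(expectedChannels);                       // no realloc when channels != 2 but stable
nsTArray<AudioDataValue> buf;
buf.SetCapacity(AUDIO_PROCESSING_FRAMES * expectedChannels);
// ... fill and write as before ...
mLastChannelCount = channelData.Length();                        // remember for the next iteration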

@@ +134,5 @@
> +            for (uint32_t i = 0; i < channelData.Length(); ++i) {
> +              float* conversionBuf = downmixConversionBuffer.Elements() + (i*duration);
> +              const int16_t* sourceBuf = static_cast<const int16_t*>(channelData[i]);
> +              for (uint32_t j = 0; j < duration; ++j) {
> +                conversionBuf[j] = AudioSampleToFloat(sourceBuf[j]);

I'd want to code this with an eye toward an optimized (SSE/NEON/etc.) implementation (ConvertAudioSamplesToFloats(dest, source, duration) or some such).
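
Something like the following scalar helper (the name is just the suggestion above, not code from the patch), which an SSE/NEON build could later specialize:

static void
ConvertAudioSamplesToFloats(float* aDest, const int16_t* aSource, uint32_t aCount)
{
  // Plain scalar loop; a vectorized (SSE/NEON) body can replace this later.
  for (uint32_t i = 0; i < aCount; ++i) {
    aDest[i] = AudioSampleToFloat(aSource[i]);
  }
}

The inner loop above would then collapse to a single call:
ConvertAudioSamplesToFloats(conversionBuf, sourceBuf, duration);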

@@ +142,5 @@
> +          }
> +
> +          downmixOutputBuffer.SetLength(duration*outputChannels);
> +          nsAutoTArray<float*,GUESS_AUDIO_CHANNELS> outputChannelBuffers;
> +          nsAutoTArray<const void*,GUESS_AUDIO_CHANNELS> outputChannelData;

Ditto on remembering arrays

@@ +165,5 @@
> +      } else {
> +        // Assumes that a bit pattern of zeroes == 0.0f
> +        memset(buf.Elements(), 0, buf.Length()*sizeof(AudioDataValue));
> +      }
> +      aOutput->Write(buf.Elements(), int32_t(duration));

May need a Start() call, similar to the patch that landed last weekend.
Attachment #704417 - Flags: review?(rjesup) → review+
Comment on attachment 704418 [details] [diff] [review]
Part 3: Don't constrain AudioSegment to a fixed number of channels

Review of attachment 704418 [details] [diff] [review]:
-----------------------------------------------------------------

r+ assuming GetFirstChunk == GetLastChunk is correct (or with an appropriate fix if not)

::: content/media/MediaSegment.h
@@ +281,5 @@
> +    if (mChunks.IsEmpty()) {
> +      return nullptr;
> +    }
> +    return &mChunks[mChunks.Length() - 1];
> +  }

How is this different than GetLastChunk()?
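
(For reference, if this were actually meant to return the first chunk, it would presumably be the one-character fix below; as written it duplicates GetLastChunk.)

Chunk* GetFirstChunk()
{
  if (mChunks.IsEmpty()) {
    return nullptr;
  }
  return &mChunks[0];  // first chunk, not mChunks[mChunks.Length() - 1]
}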

::: content/media/webrtc/MediaEngineDefault.cpp
@@ -335,5 @@
>    mSource = aStream;
>  
>    // AddTrack will take ownership of segment
>    AudioSegment* segment = new AudioSegment();
> -  segment->Init(CHANNELS);

With this removed, there are no uses of CHANNELS in this file; it can probably go too
Attachment #704418 - Flags: review?(rjesup) → review+
(In reply to Randell Jesup [:jesup] from comment #15)
> I assume it's intentional for no-center to with-center upmixes to not
> generate a center channel?  Normally in multi-channel audio upmixes generate
> virtual centers (correlated R+L) and usually derived rears (anti-correlated
> R&L), typically with a delay.

I don't understand everything you said here, but I'm trying to just follow the Web Audio spec for 1, 2, 4 and 6 channels and do something plausible for the rest:
https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#UpMix
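
Concretely, my reading of the spec's speaker-layout up-mix rules is just channel copying, with the new channels left silent:

// Paraphrase of the Web Audio up-mix rules at the link above, not code from the patch:
//   mono   -> stereo: output.L = output.R = input
//   stereo -> quad:   L, R copied; SL = SR = silence
//   stereo -> 5.1:    L, R copied; C = LFE = SL = SR = silence
//   quad   -> 5.1:    L, R, SL, SR copied; C = LFE = silence
// i.e. no virtual center and no derived rears are synthesized.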

> If inputChannelCount > CUSTOM_CHANNEL_LAYOUTS (and it's tested, so let's
> assume it's possible), then we'll return NULL I presume for the first 6
> channels. not zerochannel.  Either assert on it or move the initializer.

That's a horrible bug. In the inputChannelCount >= CUSTOM_CHANNEL_LAYOUTS case, I need to copy the channels to the output array. I'll fix it.

> is it possible/useful to remember the number of channels from the last iteration > and use that to seed the TARRAY sizes to avoid re-allocations when channels != 2?

Probably best to just up the nsAutoTArray buffer size if necessary.

> How is this different than GetLastChunk()?

Stupid bug. Good thing it's not used. I'll just remove it.

> With this removed, there are no uses of CHANNELS in this file; it can probably
> go too

I'll remove it.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #18)
> I don't understand everything you said here, but I'm trying to just follow
> the Web Audio spec for 1, 2, 4 and 6 channels and do something plausible for
> the rest:
> https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#UpMix

We've discussed this in #media and even suggested proposing the downmixing matrices we're currently using with Opus (<http://mxr.mozilla.org/mozilla-central/source/content/media/ogg/OggReader.cpp#525>). But roc is right: this is what the spec currently says to do.
> > If inputChannelCount > CUSTOM_CHANNEL_LAYOUTS (and it's tested, so let's
> > assume it's possible), then we'll return NULL I presume for the first 6
> > channels. not zerochannel.  Either assert on it or move the initializer.
> 
> That's a horrible bug. In the inputChannelCount >= CUSTOM_CHANNEL_LAYOUTS
> case, I need to copy the channels to the output array. I'll fix it.

Actually this is wrong --- I was right the first time. Since we're updating aChannelArray in place, when inputChannelCount >= CUSTOM_CHANNEL_LAYOUTS there is no need to touch the first inputChannelCount channels.
roc doesn't seem to have noted the mozilla-inbound push yet, but I'll note the trivial bustage fix I just landed:
   https://hg.mozilla.org/integration/mozilla-inbound/rev/703cfc290199