[MediaEncoder] Implement Opus encoder

VERIFIED FIXED

Status

()

VERIFIED FIXED
5 years ago
5 years ago

People

(Reporter: shelly, Assigned: shelly)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MR1.2])

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

5 years ago
Follow up of bug 842243, comment 71
https://bugzilla.mozilla.org/show_bug.cgi?id=842243#c71

TODO:

1. The granule position of an audio data page should be incremented to a rate of 48K.
   http://tools.ietf.org/html/draft-ietf-codec-oggopus-00#section-4

2. Use OPUS_GET_LOOKAHEAD to get the value of pre-skip, and consider it at rate 48K too.

3. Handle padding.

4. Set up the comments in comment header from mozilla-config.h

5. Duration of last frame should be the actual duration from source.
(Assignee)

Updated

5 years ago
Duplicate of this bug: 868954
No longer depends on: 842243
(Assignee)

Comment 2

5 years ago
Created attachment 746815 [details] [diff] [review]
v5: Media Encoder Framework (with major changes in Opus encoder)

Hi derf,

As for the value of vender from comment header, is there any information I should append other than opus_get_version_string()? Second, the value of "ENCODER=?", what information should it be exactly? (MOZ_APP_UA_VERSION is something like "23.0a1", and MOZ_APP_UA_NAME is "Firefox", doesn't really tell it is b2g or desktop.)

Do you have any suggestion about handling sample rate that can't divide 48K evenly? (e.g. 32K, though it is one of the supported rate of WebRTC, but WebRTC outputs 16K only.) Thanks.
Attachment #746815 - Flags: feedback?(tterribe)
I'm not derf, but for the vendor string in the comment header, just recording opus_get_version_string() is fine. This is supposed to record the encoding library version so you don't need anything more.

The ENCODER comment describing the application doesn't have to distinguish between b2g and desktop I think. These strings are for tracing file behaviour back to the encoding application when debugging, but we should balance that against recording specific information about use. Ideally this code will be shared between all products anyway.

It looks like MOZ_APP_UA_NAME is empty for the desktop build, so I'd suggest |"Mozilla " MOZILLA_VERSION|.

For non-compatible sampling rates, you'll need to run a resampler to convert the audio to 48 kHz before submitting to the encoder. The code in media/libspeex_resampler should work fine if you don't have one handy already.
Comment on attachment 746815 [details] [diff] [review]
v5: Media Encoder Framework (with major changes in Opus encoder)

Review of attachment 746815 [details] [diff] [review]:
-----------------------------------------------------------------

This is moving in the right direction. There's a few things to finish up, still, but definite progress.

> For non-compatible sampling rates, you'll need to run a resampler to convert
> the audio to 48 kHz before submitting to the encoder. The code in
> media/libspeex_resampler should work fine if you don't have one handy
> already.

I'd suggest trying to re-use the code we already have to use the Speex resampler on MediaStreams from the Web Audio API. I'm not familiar enough with that code to give specific suggestions, though. I'd recommend talking to ehsan or padenot.

::: content/media/encoder/OpusEncoder.cpp
@@ +15,5 @@
> +static const int MAX_CHANNELS = 2;
> +
> +static const int kOpusSamplingRate = 48000;
> +static const int kFrameDurationMs  = 20;
> +static const int kFrameSizeSamples = kOpusSamplingRate * kFrameDurationMs / 1000;

This is no longer used, right?

@@ +263,5 @@
> +  mChannels = aChannels;
> +  mSamplingRate = aSamplingRate;
> +
> +  int error = 0;
> +  mEncoder = opus_encoder_create(mSamplingRate, mChannels,

Maybe add a comment here that opus_encoder_create() will fail if mSamplingRate is not at least 8000 or does not evenly divide 48000?

@@ +267,5 @@
> +  mEncoder = opus_encoder_create(mSamplingRate, mChannels,
> +                                 OPUS_APPLICATION_AUDIO, &error);
> +
> +  if (error == OPUS_OK) {
> +    opus_encoder_ctl(mEncoder, OPUS_SET_MAX_BANDWIDTH(OPUS_BANDWIDTH_WIDEBAND));

If you're passing aSamplingRate to opus_encoder_create(), you don't need to call OPUS_SET_MAX_BANDWIDTH() too. That's mostly useful when the audio bandwidth of the input differs from the sampling rate you're running the API at.

@@ +293,5 @@
> +OpusEncoder::GetHeader(nsTArray<uint8_t>* aOutput, int aMaxLength,
> +                       int &aOutputLen)
> +{
> +  while (!mEncoder) {
> +    ReentrantMonitorAutoEnter mon(mReentrantMonitor);

You need to acquire the monitor before testing mEncoder. Do this outside the loop.

@@ +304,5 @@
> +    int lookahead = 0;
> +    int error = opus_encoder_ctl(mEncoder, OPUS_GET_LOOKAHEAD(&lookahead));
> +    if (error != OPUS_OK) {
> +      lookahead = 0;
> +    }

You still need to pad the input with lookahead samples of audio data. The best thing to do would be to use the LPC extension code I pointed you at in bug 842243. But a simple thing to do in the short term would be to prepend a null chunk to mSourceSegment with a duration of lookahead samples.

@@ +321,5 @@
> +    nsString vender;
> +    vender.AppendASCII(opus_get_version_string());
> +
> +    OpusCommentHeader commentHeader(vender);
> +    commentHeader.AddComment(NS_LITERAL_STRING("ENCODER=b2g media encoder"));

> The ENCODER comment describing the application doesn't have to distinguish
> between b2g and desktop I think. These strings are for tracing file

Yes, this is what I was trying to say in the other bug.

> It looks like MOZ_APP_UA_NAME is empty for the desktop build, so I'd suggest
> |"Mozilla " MOZILLA_VERSION|.

Or |"Mozilla " MOZ_APP_UA_VERSION|. I'm not actually sure what the difference between MOZILLA_VERSION and MOZ_APP_UA_VERSION is, but we use the latter for the UA string in the SDP we generate in WebRTC.

@@ +352,5 @@
> +OpusEncoder::GetEncodedData(nsTArray<uint8_t>* aOutput, int aMaxLength,
> +                            int &aOutputLen, int &aOutputDuration)
> +{
> +  while (!mEncoder) {
> +    ReentrantMonitorAutoEnter mon(mReentrantMonitor);

You need to acquire the monitor before testing mEncoder. Do this outside the loop.

@@ +358,5 @@
> +  }
> +
> +  {
> +    // Remove at least the amount of GetFrameSize() from mRawSegment to
> +    // mSourceSegment. The monitor obtains the lock of resource in this block only.

Slightly more natural sounding:
// Move at least GetFrameSize() samples from mRawSegment to
// mSourceSegment. We only hold the monitor in this block.

@@ +359,5 @@
> +
> +  {
> +    // Remove at least the amount of GetFrameSize() from mRawSegment to
> +    // mSourceSegment. The monitor obtains the lock of resource in this block only.
> +    ReentrantMonitorAutoEnter mon(mReentrantMonitor);

Once you move the monitor acquisition out of the loop above, you don't need to re-acquire it again here.

@@ +395,5 @@
> +  while(!iter.IsEnded() && frameCopied < GetFrameSize()) {
> +    AudioChunk chunk = *iter;
> +
> +    // Chunk to the required frame size.
> +    int frameToCopy = (int32_t)chunk.GetDuration();

What's the purpose of the cast here?
Attachment #746815 - Flags: feedback?(tterribe) → feedback+

Comment 5

5 years ago
(In reply to Timothy B. Terriberry (:derf) from comment #4)
> > For non-compatible sampling rates, you'll need to run a resampler to convert
> > the audio to 48 kHz before submitting to the encoder. The code in
> > media/libspeex_resampler should work fine if you don't have one handy
> > already.
> 
> I'd suggest trying to re-use the code we already have to use the Speex
> resampler on MediaStreams from the Web Audio API. I'm not familiar enough
> with that code to give specific suggestions, though. I'd recommend talking
> to ehsan or padenot.

I don't think there's any reusable code to do resampling in Web Audio.  You can look into examples of how we do this in AudioBufferSourceNode.cpp or MediaBufferDecoder.cpp but I doubt that you can use either of them directly.

There's code that will be added to resample the segments inside a MediaStream for the purpose of connecting a generic MediaStream to an AudioNodeStream, but that work is not finished yet, so it's not clear whether the code that we'll add there will be useful to you, but feel free to watch bug 856361 to see if the code which will be added there will be useful to you.
(Assignee)

Comment 6

5 years ago
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> (In reply to Timothy B. Terriberry (:derf) from comment #4)
> > > For non-compatible sampling rates, you'll need to run a resampler to convert
> > > the audio to 48 kHz before submitting to the encoder. The code in
> > > media/libspeex_resampler should work fine if you don't have one handy
> > > already.
> > 
> > I'd suggest trying to re-use the code we already have to use the Speex
> > resampler on MediaStreams from the Web Audio API. I'm not familiar enough
> > with that code to give specific suggestions, though. I'd recommend talking
> > to ehsan or padenot.
> 
> I don't think there's any reusable code to do resampling in Web Audio.  You
> can look into examples of how we do this in AudioBufferSourceNode.cpp or
> MediaBufferDecoder.cpp but I doubt that you can use either of them directly.
> 
> There's code that will be added to resample the segments inside a
> MediaStream for the purpose of connecting a generic MediaStream to an
> AudioNodeStream, but that work is not finished yet, so it's not clear
> whether the code that we'll add there will be useful to you, but feel free
> to watch bug 856361 to see if the code which will be added there will be
> useful to you.
That would be great if there is such a feature like resample in the generic MediaStream, thanks for the information :), I'll keep trak of this bug.
(Assignee)

Comment 7

5 years ago
Created attachment 747761 [details] [diff] [review]
v5: Media Encoder Framework

Hi :derf, many thanks.

One other thing that puzzled me is the handle of eos, currently, I'm not doing anything special besides setting e_o_s flag to 1, but below is the implementation from opusenc, doing some extra work to the granulspos:

if(op.e_o_s) {
  op.granulepos=((original_samples*48000+rate-1)/rate)+header.preskip;
}

And please find my comments in the following review, thanks.
Attachment #746815 - Attachment is obsolete: true
Attachment #747761 - Flags: feedback?(tterribe)
(Assignee)

Comment 8

5 years ago
Comment on attachment 747761 [details] [diff] [review]
v5: Media Encoder Framework

Review of attachment 747761 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/encoder/OpusEncoder.cpp
@@ +318,5 @@
> +    }
> +
> +    // Pads |lookahead| samples to the start of encoding to prevent lost of
> +    // original data, the pcm duration will be calculated at rate 48K later.
> +    mSourceSegment->AppendNullData(lookahead);

The quick and easy fix for padding.

Please correct me if I'm wrong, my understanding to the read_padder is, "For each packet, pads extra samples to the buffer if in_samples(the source samples for this packet) is smaller than frame_size. 

extra = frame_size - in_samples;
if(extra>*d->extra_samples) extra=*d->extra_samples;
*d->extra_samples-=extra;

and *d->extra_samples is initialized as lookahead.

@@ +340,5 @@
> +
> +    //e.g. ENCODER=Firefox23.0a1
> +    commentHeader.AddComment(NS_LITERAL_STRING("ENCODER=") +
> +                             NS_LITERAL_STRING(MOZ_APP_UA_NAME) +
> +                             NS_LITERAL_STRING(MOZ_APP_UA_VERSION));

In the config file, "Mozilla" is MOZ_APP_VENDOR, MOZ_APP_UA_NAME(Firefox) and MOZ_APP_UA_VERSION(23.0a1) looks like a pair to go together. I don't have strong insist on this value, feel free to change this value anytime.

Comment 9

5 years ago
(In reply to comment #6)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #5)
> > There's code that will be added to resample the segments inside a
> > MediaStream for the purpose of connecting a generic MediaStream to an
> > AudioNodeStream, but that work is not finished yet, so it's not clear
> > whether the code that we'll add there will be useful to you, but feel free
> > to watch bug 856361 to see if the code which will be added there will be
> > useful to you.
> That would be great if there is such a feature like resample in the generic
> MediaStream, thanks for the information :), I'll keep trak of this bug.

Yes, it would be great, but unfortunately the resampling needs are sort of different, so it's not clear to me how I would do that.  Also, it turns out that the speex_resampler API is not as easy to use as I originally thought, see my WIP patch in bug 870174.

Also, you might be interested in bug 836599 where I'm going to add yet another resampling code path!  No patches to do the resampling are up yet though.
Comment on attachment 747761 [details] [diff] [review]
v5: Media Encoder Framework

Review of attachment 747761 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/encoder/OpusEncoder.cpp
@@ +11,5 @@
> +
> +namespace mozilla {
> +
> +// http://www.opus-codec.org/docs/html_api-1.0.2/group__opus__encoder.html
> +// In section "opus_encoder_init", chaneels must be 1 or 2 of input signal.

"channels"

@@ +19,5 @@
> +// Second paragraph, " The granule position of an audio data page is in units
> +// of PCM audio samples at a fixed rate of 48 kHz."
> +static const int kOpusSamplingRate = 48000;
> +
> +// The duration of an Opus packet. Must be 2.5, 5, 10, 20, 40 or 60 ms.

s/packet/frame/

Opus packets can contain multiple frames and be any multiple of 2.5 ms up to 120 ms, but a single frame must have one of the durations you list here.

@@ +266,5 @@
> +  mChannels = aChannels;
> +
> +  // The granule position is required to be incremented at a rate of 48KHz, and
> +  // it is simply calculated as |granulepos = samples * (48000/source_rate)|,
> +  // that is, the source sampling rate must divides 48000 evenly.

"divide"

@@ +269,5 @@
> +  // it is simply calculated as |granulepos = samples * (48000/source_rate)|,
> +  // that is, the source sampling rate must divides 48000 evenly.
> +  MOZ_ASSERT((aSamplingRate >= 8000) && (kOpusSamplingRate / aSamplingRate) *
> +             aSamplingRate == kOpusSamplingRate,
> +             "The source sample rate is desired to be greater than 8000 and "

s/is desired to/should/

@@ +270,5 @@
> +  // that is, the source sampling rate must divides 48000 evenly.
> +  MOZ_ASSERT((aSamplingRate >= 8000) && (kOpusSamplingRate / aSamplingRate) *
> +             aSamplingRate == kOpusSamplingRate,
> +             "The source sample rate is desired to be greater than 8000 and "
> +             "divides 48000 evenly.");

"divide"

@@ +318,5 @@
> +    }
> +
> +    // Pads |lookahead| samples to the start of encoding to prevent lost of
> +    // original data, the pcm duration will be calculated at rate 48K later.
> +    mSourceSegment->AppendNullData(lookahead);

I think you are right.

Reviewing how libopus works more closely, this padding actually needs to go at the end of the recording, not the beginning. Apologies if I mislead you earlier when I said "prepend".

@@ +340,5 @@
> +
> +    //e.g. ENCODER=Firefox23.0a1
> +    commentHeader.AddComment(NS_LITERAL_STRING("ENCODER=") +
> +                             NS_LITERAL_STRING(MOZ_APP_UA_NAME) +
> +                             NS_LITERAL_STRING(MOZ_APP_UA_VERSION));

As Ralph correctly pointed out, MOZ_APP_UA_NAME is empty on desktop Firefox, so if we wanted to use it here, we'd have to duplicate the code at <http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpHandler.cpp#274> to fill in a default value (or fix configure to always make MOZ_APP_UA_NAME non-empty).

Also, FWIW, NS_LITERAL_STRING("ENCODER=" MOZ_APP_UA_NAME MOZ_APP_UA_VERSION) will do the string concatenation at compile time instead of runtime.
Attachment #747761 - Flags: feedback?(tterribe) → feedback+
(Assignee)

Comment 11

5 years ago
Created attachment 755289 [details] [diff] [review]
Part4: Add OpusEncoder.h/.cpp

(Cancel the need info request from bug 842243 since it is just about Opus encoder.)

Hi Ralph, could you give me some ideas about how the serialno should be when initializing the ogg stream? Many thanks :)
Attachment #747761 - Attachment is obsolete: true
Hi Shelly. Sorry for not responding yesterday. The serialno field should be initialized to a random number. The idea is to make it unlikely two files will have the same serialno.

The only hard requirement is that multiple streams in the same file have different serialno fields. That isn't an issue with the current OpenEncoder implementation because it only has a single stream.

Updated

5 years ago
Blocks: 879688
(Assignee)

Comment 13

5 years ago
Has already fixed by bug 842243.
(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Updated

5 years ago
Summary: [follow-up bug 842243] Implement Opus encoder → [MediaEncoder] Implement Opus encoder
(Assignee)

Updated

5 years ago
Whiteboard: [MR1.2]

Updated

5 years ago
Keywords: verifyme
QA Contact: jsmith
smoketests are passing for media recorder -> verified.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Component: Video/Audio → Video/Audio: Recording
You need to log in before you can comment on or make changes to this bug.