Closed Bug 1431810 Opened 2 years ago Closed 2 years ago

Disable Opus phase inversion for mono downmixes

Categories

(Core :: Audio/Video: Playback, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: jmvalin, Assigned: achronop)

Details

Attachments

(4 files)

Opus has a special feature for stereo coding where it can represent wide stereo images by making the channels 180-degree out of phase. This improves stereo quality, but sometimes leads to poor quality when the output is downmixed to mono since the channels intermittently cancel each other. 

Since version 1.2.x, Opus has a OPUS_SET_PHASE_INVERSION_DISABLED() setting that can let the decoder know that its output will be downsampled to mono. This reduces artefacts due to mono downmix. Since lately several people have started complaining about the Opus mono issue so it would be good to address it. All that's needed is to call:

opus_decoder_ctl(decoder_object, OPUS_SET_PHASE_INVERSION_DISABLED(1))
when FF knows the output is being played in mono. This also works for surround by using opus_multistream_decoder_ctl() instead of opus_decoder_ctl().
Component: Audio/Video → Audio/Video: Playback
Alex, is this something you'd like to take on?
Flags: needinfo?(achronop)
Priority: -- → P2
Sure, let me have a look, I guess you are the opus guy in case of questions.
Assignee: nobody → achronop
Flags: needinfo?(achronop)
Reading the OpusDecoder class my understanding is that we always configure the decoder using the number of channels (as well as stream + coupled_stream) from the values we get from opus header. Does it mean that there is no chance for decoder to downmix the output?

In general, if we need to downmix the stream in order to play it we do it later in AudioSink using the decoded, PCM, data [2].

[1] https://searchfox.org/mozilla-central/source/dom/media/platforms/agnostic/OpusDecoder.cpp#91
[2] https://searchfox.org/mozilla-central/source/dom/media/mediasink/AudioSink.cpp#398
Flags: needinfo?(giles)
I'm talking about a separate issue here. Let's say you have a stereo music file and you initialize your decoder as stereo. It's still possible that you're playing that stereo file on a phone (or other device) that only has a single speaker for some reason. When that happens, the stereo signal you get will be downmixed to mono somewhere down the path and for Opus it might result in some artefacts. That's what we're trying to avoid here. The idea is for FF to detect that there's only one output channel (or that there could be only one, it doesn't need to be 100% reliable) and then pass a flag that tells the decoder to disable the phase inversion flag. The decoder can still decode in stereo (especially when we're not 100% sure the output is mono), but the stereo signal it outputs will be more "downmix-friendly".
Thank you Jean-Marc for the clarifications. In this case when the opus file is stereo we need to check the number of channels from cubeb and the pref "accessibility.monoaudio.enable" in order to figure out if the signal will be downmixed. Ralph, is it ok to do that in OpusDecoder class?
Does cubeb know about this? It seems like somewhere it's the OS that would know what's going on with the output device.
Yeah, cubeb has an api method that provides the max number of channels supported by the device. We can access that through the corresponding util method [1]

[1] https://searchfox.org/mozilla-central/source/dom/media/CubebUtils.h#28
Thanks for taking a look, Alex. The call to OPUS_SET_PHASE_INVERSION_DISABLED should happen in the OpusDecoder class, but it looks like it needs to be a method so it can be called higher up the chain. The AudioSink constructor decides to downmix based on MediaPrefs::MonoAudio, but it's the caller in MediaDecoderStateMachine which knows whether it will be coming from an Opus stream or not.

There's a comment which says AudioSink::mOutputChannels is never modified after construction, but it's not const, so there's no enforcement of that. If that turns out to be true, maybe you can just check after the AudioSink is corrected and call the OpusDecoder's MightDownmix() method there.

Alternatively, you may be able to add a mime-type accessor to the TrackInfo passed through aInfo in the AudioSink creator and call based on that. I'm not sure which is clean better.

The 'maximum number of channels supported by the output device' sounds like it could, in fact, include downmixing beyond cubeb, so you should double-check that. Also IIRC we downmix webrtc streams on macbooks because of problems with the echo canceller because the microphone is right next to the other speaker. Please make sure the method is called in that case too!
Flags: needinfo?(giles)
Thanks for the explanations Ralph. One question about the 1st approach. From inside MediaDecoderStateMachine, how can I get the which decoder is being used? The construction of AudioSink happens inside MediaDecoderStateMachine [1]. If I can query the type of the decoder I could add a method in AudioSink to check whether the stereo to mono downmix is taking place and call the OpusDecoder's MightDownmix() there. 

In the mean time I will check the second option of adding mime-type in TrackInfo.

(:rillian) | needinfo me from comment #8)
> Also IIRC we downmix webrtc streams on macbooks because
> of problems with the echo canceller because the microphone is right next to
> the other speaker. Please make sure the method is called in that case too!
I do not think we have to worry about this part. In this case the audio is being panned to the right speaker [2] so no channel modification is taking place. Panning happens directly in the audio driver and only on OSX platform.

[1] https://searchfox.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.cpp#2770(In reply to Ralph Giles 
[2] https://searchfox.org/mozilla-central/source/dom/media/GraphDriver.cpp#1065
MediaDecoderStateMachine has an Info() method which returns a MediaInfo& [1]

MediaInfo has mAudio and mVideo members, which both inherit an mMimeType member from TrackInfo. [2]

Assuming that's initialized at the time the AudioSink is created, you should be able to pass that to OpusDataDecoder::IsOpus to see whether the sink is for decoded opus data. And, since the MediaInfo& is passed to the AudioSink constructor and stored in AudioSink::mInfo, you can actually make the check inside that class if you want.

It looks like everything is initialized through the MediaFormatReader call chain starting at MediaDecoderStateMachine::Enter [3] but I didn't trace through very carefully. You should do that.

> Panning happens directly in the audio driver and only on OSX platform.

Panning is just a weighted sum of the two channels, so would expect the issue to occur in this case too.


[1] https://searchfox.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.h#342
[2] https://searchfox.org/mozilla-central/source/dom/media/MediaInfo.h#613
[3] https://searchfox.org/mozilla-central/source/dom/media/MediaDecoderStateMachine.cpp#327-341
(In reply to Ralph Giles (:rillian) | needinfo me from comment #10)
> Panning is just a weighted sum of the two channels, so would expect the
> issue to occur in this case too.

I do not think we can fix the panning case. It happens inside MSG, we can get opus decoded data in WebRTC scenarios but decoding happens in third party webrtc.org code. If we want to fix this for our panning we need to patch against webrtc.org upstream.

For the playback scenario, using the Mime already set in MediaInfo the fix is straight forward. Do you want me to push that part for review?
Flags: needinfo?(giles)
Sure. I only have two more days, then I'm leaving Mozilla. Kinetik can take over review if we don't get it landed by then.
Flags: needinfo?(giles)
Comment on attachment 8958495 [details]
Bug 1431810 - Disable Opus phase inversion on stereo to mono downmix.

https://reviewboard.mozilla.org/r/227420/#review233324

Thanks. Looks reasonable to me, so r+ with comment nits addressed.

Is is possible to write a test for this? Would be good to catch regressions, since it depends on what the encoder is sending.

::: commit-message-38617:2
(Diff revision 1)
> +Bug 1431810 - Disable Opus phase inversion for mono downmixes. r?rillian
> +

Please say something about the motivation for the change in the body of the commit meesage. Paraphrasing the comments is fine; it's just nice to save looking at the diff when someone's reading the commit history.

::: dom/media/VideoUtils.h:165
(Diff revision 1)
>  
> +// Decide the number of playback channels according to the
> +// given AudioInfo and the prefs that are being set.
> +// The basic use of the method is for AudioSink in order to
> +// set the output number of channels. OpusDataDecoder also uses
> +// that method to cover an special configuration case.

Thanks for the verbose comment. It's not always a great idea to list callers like this, since the comment can easily get out of date. Since one has to search for callers (easy enough in searchfox or dxr) anyway, having the list here clutters up the file.

It's a good thing to put in the commit message though, since it explains what you were trying change.

I don't feel strongly about it though, so feel free to keep it if you think it's important.

::: dom/media/platforms/agnostic/OpusDecoder.cpp:99
(Diff revision 1)
>                                                   mMappingTable,
>                                                   &r);
> +
> +  // Opus has a special feature for stereo coding where it represent wide
> +  // stereo channels by 180-degree out of phase. This improves quality, but
> +  // need to be disabled when the output is downmixed to mono. Playback number

"needs to be" number agreement with the singular subject "This".

::: dom/media/platforms/agnostic/OpusDecoder.cpp:100
(Diff revision 1)
>                                                   &r);
> +
> +  // Opus has a special feature for stereo coding where it represent wide
> +  // stereo channels by 180-degree out of phase. This improves quality, but
> +  // need to be disabled when the output is downmixed to mono. Playback number
> +  // of channels are set by in AudioSink, using the same method

"by in" should just be "in".
Attachment #8958495 - Flags: review?(giles) → review+
(In reply to Alex Chronopoulos [:achronop] from comment #11)

> I do not think we can fix the panning case. It happens inside MSG, we can
> get opus decoded data in WebRTC scenarios but decoding happens in third
> party webrtc.org code.

I still think this is an important case to address, and worth patching upstream. Hopefully we could get a fix integrated. But that can happen in a follow-up bug. The work here is still an improvement and worth landing on its own.
(In reply to Ralph Giles (:rillian) | needinfo me from comment #15)
> I still think this is an important case to address, and worth patching
> upstream. Hopefully we could get a fix integrated. But that can happen in a
> follow-up bug.

Filed a bug on webrtc.org: https://bugs.chromium.org/p/webrtc/issues/detail?id=9025
Comment on attachment 8959250 [details]
Bug 1431810 - Test that opus phase inversion is disabled on mono output.

https://reviewboard.mozilla.org/r/228120/#review234000

Wouldn't this test pass both with and without the phase inversion patch?  I think you need to compare the audio to an expected result to confirm it has been decoded the correct way.
Attachment #8959250 - Flags: review?(kinetik)
Thanks for the review. Any idea how to get a comparable output from the audio element? Can you link me to other tests that do something similar?
Flags: needinfo?(kinetik)
(In reply to Alex Chronopoulos [:achronop] from comment #20)
> Thanks for the review. Any idea how to get a comparable output from the
> audio element? Can you link me to other tests that do something similar?

You can use AudioContext.createMediaElementSource() to attach an audio element to Web Audio, then process it from there (e.g. with a ScriptProcessor node, if you want to examine the samples).  Simple example in dom/media/webaudio/test/test_bug966247.html.
Flags: needinfo?(kinetik)
Thanks to jmspeex we have an opus sample able to reproduce the case perfectly. This is an stereo opus file with phase inversion of 180 degrees. When the decoded data is dowmixed to mono, without the patch applied, the output is silence. You can verify that by setting the pref accessibility.monoaudio.enable = true and play the file. If you try the same with the patch applied you get the normal audio output
Comment on attachment 8961873 [details]
Bug 1431810 - Correct typo mistake in a webaudio test.

https://reviewboard.mozilla.org/r/230710/#review236456
Attachment #8961873 - Flags: review?(kinetik) → review+
Comment on attachment 8959250 [details]
Bug 1431810 - Test that opus phase inversion is disabled on mono output.

https://reviewboard.mozilla.org/r/228120/#review236458
Attachment #8959250 - Flags: review?(kinetik) → review+
Pushed by achronop@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3fe9d85d684a
Disable Opus phase inversion on stereo to mono downmix. r=rillian
https://hg.mozilla.org/integration/autoland/rev/43eb1c01c67f
Test that opus phase inversion is disabled on mono output. r=kinetik
https://hg.mozilla.org/integration/autoland/rev/a01c1941a829
Correct typo mistake in a webaudio test. r=kinetik
Backed out 3 changesets (bug 1431810) for failing test_bug1431810_opus_downmix_to_mono.html on Windows 

Backout link: https://hg.mozilla.org/integration/autoland/rev/422ce7e85434f348398c7fefd1d549b01900f277

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=a01c1941a82932a0749e5193a0fa55aac90976b1

Log link: https://treeherder.mozilla.org/logviewer.html#?job_id=170228416&repo=autoland&lineNumber=2320

Log snippet:
08:19:11     INFO -  258 INFO TEST-START | dom/media/test/test_bug1431810_opus_downmix_to_mono.html
08:19:11     INFO -  TEST-INFO | started process screenshot
08:19:11     INFO -  TEST-INFO | screenshot: exit 0
08:19:11     INFO -  Buffered messages logged at 08:19:11
08:19:11     INFO -  259 INFO TEST-PASS | dom/media/test/test_bug1431810_opus_downmix_to_mono.html | Channels must be inverted
08:19:11     INFO -  260 INFO TEST-PASS | dom/media/test/test_bug1431810_opus_downmix_to_mono.html | Channels must be inverted
08:19:11     INFO -  Buffered messages finished
08:19:11    ERROR -  261 INFO TEST-UNEXPECTED-FAIL | dom/media/test/test_bug1431810_opus_downmix_to_mono.html | Channels must be inverted
08:19:11     INFO -      mediaElementWithPhaseInversion/mediaElement.onplay/script_processor.onaudioprocess@dom/media/test/test_bug1431810_opus_downmix_to_mono.html:62:7
08:19:11     INFO -      EventHandlerNonNull*mediaElementWithPhaseInversion/mediaElement.onplay@dom/media/test/test_bug1431810_opus_downmix_to_mono.html:52:5
08:19:11     INFO -      EventHandlerNonNull*mediaElementWithPhaseInversion@dom/media/test/test_bug1431810_opus_downmix_to_mono.html:51:3
08:19:11     INFO -      testPhaseInversion/<@dom/media/test/test_bug1431810_opus_downmix_to_mono.html:119:5
08:19:11     INFO -      testPhaseInversion@dom/media/test/test_bug1431810_opus_downmix_to_mono.html:118:10
08:19:11     INFO -      @dom/media/test/test_bug1431810_opus_downmix_to_mono.html:130:1
Flags: needinfo?(achronop)
Solution for the back out is to relax a little the fuzzy compare in inversion check:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a25656576805aeb61f6dd61b60286d31bb795ac7
Flags: needinfo?(achronop)
Pushed by achronop@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/04fd84dd5fc7
Disable Opus phase inversion on stereo to mono downmix. r=rillian
https://hg.mozilla.org/integration/autoland/rev/56a2dba05831
Test that opus phase inversion is disabled on mono output. r=kinetik
https://hg.mozilla.org/integration/autoland/rev/da2c0da3fd6f
Correct typo mistake in a webaudio test. r=kinetik
Comment on attachment 8958495 [details]
Bug 1431810 - Disable Opus phase inversion on stereo to mono downmix.

https://reviewboard.mozilla.org/r/227420/#review237754

::: dom/media/VideoUtils.cpp:165
(Diff revision 3)
>      sample = (aBuffer[fIdx*channels] + aBuffer[fIdx*channels + 1]) * 0.5;
>      aBuffer[fIdx*channels] = aBuffer[fIdx*channels + 1] = sample;
>    }
>  }
>  
> +uint32_t DecideAudioPlaybackChannels(const AudioInfo& info)

mozilla coding style.
return type on a new line

::: dom/media/platforms/agnostic/OpusDecoder.cpp:102
(Diff revision 3)
> +  // Opus has a special feature for stereo coding where it represent wide
> +  // stereo channels by 180-degree out of phase. This improves quality, but
> +  // needs to be disabled when the output is downmixed to mono. Playback number
> +  // of channels are set in AudioSink, using the same method
> +  // `DecideAudioPlaybackChannels()`, and triggers downmix if needed.
> +  if (mOpusDecoder && mOpusParser->mChannels == 2 &&

this will pretty much *never* be active as all those prefs are hidden with no GUI to set it up.

it should have been using the cubeb's info on how many channels are available.

otherwise it's pointless
You need to log in before you can comment on or make changes to this bug.