Closed Bug 1731430 Opened 4 months ago Closed 3 months ago

Stream audio and video desynchronization on Windows 11

Categories

(Core :: Audio/Video: Playback, defect, P2)

Firefox 92
Unspecified
Windows 11
defect

Tracking

()

VERIFIED FIXED
95 Branch
Tracking Status
firefox-esr91 --- fixed
firefox94 --- verified
firefox95 --- verified

People

(Reporter: fenixfurion, Assigned: alwu)

References

(Blocks 1 open bug)

Details

Attachments

(9 files)

User Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:92.0) Gecko/20100101 Firefox/92.0

Steps to reproduce:

Open any livestream (twitch.tv, youtube.com).
Non-livestream content (VODs, videos) plays back correctly.

Issue persists even with a new profile as well as in safe mode.
I've also run Mozregression and it seems to be related to a recent Windows 11 update, as builds from prior to this issue occurring are also showing this new issue.

Example livestream recording attached. Note the crackling/jittery audio.

Actual results:

Audio quickly or immediately starts to desynchronize and appears to stutter and play at a lower sampling frequency than expected.

Expected results:

Audio should be playing correctly and be synchronized properly with the video feed.

Media profile during playback attached as well.

Attached file firefox_support.json

about:support data from fresh profile

Recording of same stream using same quality settings on Microsoft Edge, for audio comparison.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core

Could be a duplicate for 1730348.

Could you help me check if this issue can still be reproduced on the latest Nightly? we recently landed a fix for out-of-order video, and I wonder if this is related with that. But if this is simply audio stuttering, it seems more related with bug1731006.
Thank you.

Flags: needinfo?(fenixfurion)

Would you mind also help us capture the profiled result by using Media preset again? (the one in comment1 doesn't contain the media related threads) and also capture the debug log by using MOZ_LOG=MediaDecoder:5,AudioStream:5,Cubeb:5? Thank you.

Latest Nightly (94.0a1 (2021-09-20) (64-bit)) still has the issue manifest.

Attaching MOZ_LOG=MediaDecoder:5,AudioStream:5,Cubeb:5 nightly log as well as Firefox profiler run with the Media preset to the bug description.

Flags: needinfo?(fenixfurion)

Nightly firefox profile with Media preset.

Nightly Mozilla log with MOZ_LOG=MediaDecoder:5,AudioStream:5,Cubeb:5 set.

Also, to note, in this case, the audio stuttering only manifests when watching livestreams on YouTube or Twitch.

Multiple videos can be played at once if they are not livestreams.

Prior to September 5th, multiple (3+) livestreams could be played at once on Firefox without any audio stuttering, which I believe places this issue before the release of Firefox 92.

Interesting, from the profiled result, it showed the audio played continuously in a stable rate, but it was also BEHIND the video, which seems weird (audio was behind video around 700,000us=0.7s, you can check the markers in MDSM thread) Then I checked the debug log, I found out that we have A LOT of queued audio frames in the audio queue (around 200+ at the most) So I wonder whether the audio stream consumed the audio too slowly?

However, it's still no clear to me why the video sink was playing video faster than the audio. Paul, do you have any thought about this issue?
Thank you.

Severity: -- → S3
Flags: needinfo?(padenot)
Priority: -- → P2

Can confirm this is happening for me and has been since the dev channel released (beta channel Win11 is fine), have had to switch to an alternate browser because of it.

I'm super noob at any kind of Firefox log gathering or troubleshooting but if there is anything I can do to help please let me know.

I'm currently installing Windows 11 on a laptop and will get this fixed, hopefully I can reproduce.

Flags: needinfo?(padenot)
Assignee: nobody → padenot

Has there been any progress on this one at all? Hoping to come back to Firefox as soon as this one is ready for testing. Again, if there is anything I can do to help please let me know.

I can reproduce this on my Windows 11 laptop, on twitch, when the mode Low-Latency is enabled, it works as it should when the low-latency mode is either unavailable or disabled. I haven't been able to reproduce on Youtube, but maybe I'm not trying on the right channels.

I can reproduce on builds < 91, so it really must be Windows 11.

I can make it happen when deactivating low latency mode on Twitch, but it's a lot more subtle, and the A/V drift happens a lot slower. I'm trying various channels. On some channels, audio is perfect, always, regardless of the mode (even in low-latency), and for long periods of time. But as soon as I switch to some other channel (I'm clicking randomly from the recommended channels on the left), I can quickly reproduce.

I'm running Windows 11 Home Insider Preview, version Dev, OS build 22463.100.

I believed the same, I assumed it was a change in Windows 11 but given I've only seen Firefox affected (no other apps or browsers appear affected) I wasn't sure if it would gain any traction on the Feedback app.

The only other audio related issue I've seen around the place with Windows 11 was in OBS. The audio meter was laggy and they narrowed it down to a call they were making, more info:

https://github.com/obsproject/obs-studio/pull/5265

Ok so I traced things further today.

When the audio glitches happens, it's because the decoder is drained. It's drained because this codepath is reached. I don't understand MediaSource's code very well as I've never worked on it, but this makes me wonder why this only seem to happen on Windows 11.

I'm attaching logs and a sample patch that adds some logging in the failure paths, this is running with:

MOZ_LOG=PlatformDecoderModule:5,MediaFormatReader:5,MediaSourceDemuxer:5,MediaSource:5 MOZ_DISABLE_CONTENT_SANDBOX=1 MOZ_LOG_FILE=firefox-logs MOZ_DUMP_AUDIO=1 ./mach run

The added log is line 6 and 7, this triggers the OnDemuxFailed, line 25, and this triggers the drain line 51. This drains the decoder, and so incorrect audio is output and A/V sync becomes off as well.

It's not a real data underrun: this is on a very fast link and laptop, and twitch's advanced stats report that enough data is buffered. I'm not sure yet how to debug this further without learning the MediaSource code.

Flags: needinfo?(bvandyk)

A new profile of this happening her: https://share.firefox.dev/3BwqoIm

From the partial log in comment19, it's hard to tell if there is anything wrong, because it's possible that the next sample hasn't been appened to the buffered range and then causing a demuxing failure. In the end of the log, MFR returned audio 35942333 based on the internal seek target, and there is no any sign showing that A/V will be unsync-ed later. (need more logs) In addition, MFTAudio decoder won't insert any silence during draining, but I found this will change the audio's start time, which looks suspicious. (I feel that maybe Windows 11 changes this behavior, which causes this new issue)

In addition, the profiled result in comment21 looks perfect, by checking the playAudio and playVideo markers on the MDSM thread. It shows the A/V sync well, it doesn't look the same as the one in comment8, which shows the big shift between audio and video.

Paul, would you mind to capture the full log (MOZ_LOG=MediaDecoder:5,MediaFormatReader:5,PlatformDecoderModule:5,MediaSource:5) this time? and capture the profiled result at the same time?

Thank you.

Flags: needinfo?(padenot)

This seems pretty similar with the situaion paul has seen. Now I'm installing the Windows 11 to see if I will be able to reproduce the issue.

Now I can sometime reproduce this issue on my Windows 11 Dev 22471.1000, I'm checking the log to see if I can provide more information.

Flags: needinfo?(padenot)

I think I found the root cause. This is defintely related with MFT decoder. Calling draining on decoder isn't wrong, the problem is MFT decoder returns incorrect sample, then causes the number of output samples is even greater than the number of input samples we gave to decoder.

For example, following is a simplied log, I only keep some important part.

[Child 22732: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[2c676971000] ::RequestDemuxSamples: Requesting extra demux Audio
[Child 22732: MediaSupervisor #2]: D/MediaFormatReader MediaFormatReader[2c676971000] ::OnDemuxFailed: Failed to demux audio, failure:NS_ERROR_DOM_MEDIA_WAITING_FOR_DATA
[Child 22732: MediaSupervisor #2]: D/MediaFormatReader MediaFormatReader[2c676971000] ::DrainDecoder: Requesting Audio decoder to drain
[Child 22732: MediaPDecoder #2]: D/PlatformDecoderModule AudioTrimmer[2c676c5b020] ::Drain: Draining
[Child 22732: MediaSupervisor #1]: V/MediaFormatReader MediaFormatReader[2c676971000] ::NotifyNewOutput: Received new Audio sample time:1277096 duration:23219
[Child 22732: MediaSupervisor #1]: D/MediaFormatReader MediaFormatReader[2c676971000] ::NotifyNewOutput: Done processing new Audio samples
[Child 22732: MediaSupervisor #1]: D/MediaFormatReader MediaFormatReader[2c676971000] ::ReturnOutput: Resolved data promise for Audio [1277096, 1300315]
[Child 22732: MediaSupervisor #1]: D/MediaFormatReader MediaFormatReader[2c676971000] ::DrainDecoder: Requesting Audio decoder to drain
[Child 22732: MediaPDecoder #2]: D/PlatformDecoderModule AudioTrimmer[2c676c5b020] ::Drain: Draining
[Child 22732: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[2c676971000] ::Update: Update(Audio) ni=0 no=1 in:56 out:56 qs=0 decoding:0 flushing:0 desc:wmf audio decoder (RDD remote) pending:0 waiting:1 eos:0 ds:2 sid:0 waitcdm:0
[Child 22732: MediaSupervisor #1]: V/MediaFormatReader MediaFormatReader[2c676971000] ::NotifyNewOutput: Received new Audio sample time:1277096 duration:14988
[Child 22732: MediaSupervisor #1]: D/MediaFormatReader MediaFormatReader[2c676971000] ::ReturnOutput: Resolved data promise for Audio [1277096, 1292084]
[Child 22732: MediaSupervisor #2]: V/MediaFormatReader MediaFormatReader[2c676971000] ::Update: Update(Audio) ni=0 no=1 in:56 out:57 qs=4294967295 decoding:0 flushing:0 desc:wmf audio decoder (RDD remote) pending:0 waiting:1 eos:0 ds:2 sid:0 waitcdm:0

First, when we tried to get the next audio sample, which hadn't been appended onto the buffered range yet, it would cause a demux failure with error NS_ERROR_DOM_MEDIA_WAITING_FOR_DATA and trigger drainning decoder. Then we got the first correct output [1277096, 1300315] which duration is 23219 (perfect, it's 1024 (AAC frames) / 44100 ~= 23219) But notice that, the MFT decoder also returns another incorrect output to us [1277096, 1292084] which duration is 14988, even if there should be nothing to output. So you can see in the end of the log, we input 56 audio samples, but got 57 audio outputs. Those kind of outputs seem only containing silence, and we kept receiving those samples whenever we call drain, which causes audio stuttering.

I can confirm that once we filter out those incorrect samples, we will no longer encounter this issue. In addition, an easy way to reproduce this issue is to switch the resolution, which would trigger this to drain the decoder as well.

Assignee: padenot → alwu
Flags: needinfo?(bvandyk)
Severity: S3 → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Unspecified → Windows 11

Here are more details why I said the problem is in MFT decoder. The following are some debug log, ProcessDecode ... means we feed the input to MFTDecoder here. Decoded audio ... means we get the audio output from MFT decoder here (but I added one more check for discontinuity in my build). Some new log I added are, Ask MFT to drain here and MFT tell us ... here.

2021-10-14 18:45:40.471000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17786915
2021-10-14 18:45:40.471000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Decoded audio sample! timestamp=17763695 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.471000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17810135
2021-10-14 18:45:40.471000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=17786915 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.471000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Ask MFT to drain
2021-10-14 18:45:40.471000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Decoded audio sample! timestamp=17810135 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.471000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=17833355 duration=14988 currentLength=5288 discontinuity=0
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule MFT tell us MF_E_TRANSFORM_NEED_MORE_INPUT, drain ended
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessFlush, type=Audio
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17740475
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17763695
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Unable to retrieve channel layout. Ignoring
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=0 duration=8231 currentLength=2904 discontinuity=1
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17786915
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Decoded audio sample! timestamp=17763695 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17810135
2021-10-14 18:45:40.478000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=17786915 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.558000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17833355
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Decoded audio sample! timestamp=17810135 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17856575
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=17833355 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17879794
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Decoded audio sample! timestamp=17856575 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17903014
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=17879794 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Ask MFT to drain
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Decoded audio sample! timestamp=17903014 duration=23219 currentLength=8192 discontinuity=0
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=17926234 duration=14988 currentLength=5288 discontinuity=0
2021-10-14 18:45:40.567000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule MFT tell us MF_E_TRANSFORM_NEED_MORE_INPUT, drain ended
2021-10-14 18:45:40.578000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessFlush, type=Audio
2021-10-14 18:45:40.578000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17833355
2021-10-14 18:45:40.583000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17856575
2021-10-14 18:45:40.583000 UTC - [RDD 27332: MediaPDecoder #1]: D/PlatformDecoderModule Decoded audio sample! timestamp=0 duration=8231 currentLength=2904 discontinuity=1
2021-10-14 18:45:40.583000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule ProcessDecode, type=Audio, sample=17879794
2021-10-14 18:45:40.583000 UTC - [RDD 27332: MediaPDecoder #2]: D/PlatformDecoderModule Decoded audio sample! timestamp=17856575 duration=23219 currentLength=8192 discontinuity=0

So you can see, before we called draining, the last input we sent to MFT decoder is 17810135. But it returns two (17810135 with correct duration 23219 and another is 17833355 with incorrect duration 14988)

According to MS's documentation, draining should not return an incompleted partial sample, but it's apparently not true on Windows 11, the second sample MFT decoder returned was wrong!

In addition, another interesting thing is that, their document said The first sample after the drain operation must have the discontinuity attribute, and you can see the sample with discontinuity attribute is also a wrong one. It has timestamp 0 and incorrect duration 8231. The previous incorrect one is 14988 and the sum of these two incorrect duration is exactly 14988+8231=23219!

So it seems that MFT decoder splits an empty sample to two, and return them BEFORE AND AFTER draining. We also encounter a similar issue for video in bug 1718709 where the first video sample MFT decoder gave us is wrong!

On Windows 11, we found a bug in MFT decoder, when calling draining multiple times, even if there is nothing to drain, the decoder doesn't return MF_E_TRANSFORM_NEED_MORE_INPUT, but return an invalid sample instead. The decoder will also return another invalid sample after draining, those two samples are all partial size and combining them together makes the perfect size.

Those invalid samples are all silence and make us have a lots those samples whenever we drain the decoder, which causes audio stuttering. Therefore, we want to ignore those invalid samples.

At the time we wrote this patch, there is no any API to detect whether users are using Windows 11 or not, so we aren't able to make the change only happening on Windows 11 enviroment.

More details could be found in [1].

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1731430#c26

Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1117eee12de7
detect and discard the partial AAC output samples on Windows MFT decoder. r=bryce,padenot
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 95 Branch

Comment on attachment 9246061 [details]
Bug 1731430 - detect and discard the partial AAC output samples on Windows MFT decoder.

Beta/Release Uplift Approval Request

  • User impact if declined: Twitch and Youtube Live (and possibly any website that uses the AAC codec, such as spotify, amazon music, etc.) have audio glitches, and audio/video synchronization is broken, on Windows 11. We've been able to reproduce on any machine with Windows 11.

We don't have tests for this because there are no Windows 11 machine on treeherder yet.

  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Open any twitch stream on a Windows 11 machine for about 5 minutes. If A/V sync is correct after 5 minutes, this bug is fixed.
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This is working around what is likely an OS bug. We think we understood what is going on and are able to confidently work around it.
  • String changes made/needed: none
Attachment #9246061 - Flags: approval-mozilla-beta?
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Comment on attachment 9246061 [details]
Bug 1731430 - detect and discard the partial AAC output samples on Windows MFT decoder.

Not crazy about taking this so late in the cycle, but the ease of reproducing on any Win11 machine combined with its increasing rollout makes me think it's probably better to get this shipped before Fx95 in December. Approved for 94.0b9.

Attachment #9246061 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

I've tried to reproduce this issue using two different laptops with Win 11 installed, but I'm not experience any audio pops on twitch livestreams. I've also deactivated low latency mode from twitch settings, thinking that it may help reproduce the bug (as suggested in the above comments). The audio continued to work as expected in my case, with an affected Nightly build 94.0a1 (20210920212918).

Hi fenixfurion! Could you please help us verify if this bug is fixed on your end, using Beta 94 and Nightly 95?

Flags: needinfo?(fenixfurion)

Verified fixed on Beta 94 and Nightly 95.
Older releases are broken as expected.

Thanks all!

Flags: needinfo?(fenixfurion)

verified fixed

Status: RESOLVED → VERIFIED

Hello,
This is an issue I have had the last few weeks. I just got Beta 94, I will do some tests and see if the problem is gone.
Thank you for the help!

Sincerely,
DD

Thanks, fenixfurion for you confirmation! I'm going to close this as verified fixed per comment 31.

DD, feel free to comment here, in case you still encounter the bug on the fixed builds.

Should we consider taking this on ESR91 also? It grafts cleanly.

Flags: needinfo?(alwu)

Comment on attachment 9246061 [details]
Bug 1731430 - detect and discard the partial AAC output samples on Windows MFT decoder.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Twitch and Youtube Live (and possibly any website that uses the AAC codec, such as spotify, amazon music, etc.) have audio glitches, and audio/video synchronization is broken, on Windows 11. We've been able to reproduce on any machine with Windows 11.
  • User impact if declined: audio glitches on some popular media sites
  • Fix Landed on Version: 95
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): What we do in this patch is to filter out some incomplete samples, which contains no audio data. So doing that won't hurt on any meaningful audio data.

Although we don't have tests for this because there are no Windows 11 machine on treeherder yet. This patch has been baking on Fx95 for a long while, and we didn't see any regression so far, so it should be low risk to uplift this.

Flags: needinfo?(alwu)
Attachment #9246061 - Flags: approval-mozilla-esr91?

Comment on attachment 9246061 [details]
Bug 1731430 - detect and discard the partial AAC output samples on Windows MFT decoder.

Approved for 91.5esr.

Attachment #9246061 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
You need to log in before you can comment on or make changes to this bug.