Closed Bug 1416473 Opened 7 years ago Closed 7 years ago

Youtube throws and error after moving to a new position/time for the second time

Categories

(Core :: Audio/Video: Playback, defect)

58 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 --- unaffected
firefox58 --- verified
firefox59 --- verified

People

(Reporter: aehm_key, Assigned: jya)

References

Details

(Keywords: regression)

Attachments

(8 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Build ID: 20171110100139

Steps to reproduce:

1. start to play a youtube video
2. klick in the timeline to jump to an new position
3. klick a second time in the timeline to jump to a new position

Last working version:
build_date: 2017-11-10 07:52:26.382000
changeset: 257e3b6e6cf823fa6a5b38dc4e68696f94acd0f6

First version with the bug:
build_date: 2017-11-10 08:16:02.743000
changeset: ee9d4052e9493b3e8bca00f3bead8f1ea22b933b

DEBUG : Found commit message:
Bug 1402283 - Enforce arena matching on moz_arena_{realloc,free}. r=njn
This enforces the API contract as described in memory/build/malloc_decls.h.
INFO : The bisection is done.


Actual results:

Youtube shows error message: An error has occurred.
https://support.google.com/youtube/answer/3037019?hl=en


Expected results:

The video should play from the new position.
Blocks: 1402283
Has Regression Range: --- → yes
Component: Untriaged → Memory Allocator
Keywords: regression
Product: Firefox → Core
It affects Firefox 58 & 59 from what I'm seeing. 

I'd also like to note that when clicking a new video timeline position the already downloaded video buffer is reset so I think the video completely reloads on timeline hop and spits the error on the 2nd hop.

Youtube debug info: 
{
  "ns": "yt",
  "el": "detailpage",
  "cpn": "UsAMmbtLvvZaek6n",
  "docid": "-CMIZLcrh7M",
  "ver": 2,
  "cmt": "100.358",
  "plid": "AAVd4SwtmFGzk7ky",
  "ei": "3d8JWsWaGZGvwQGTjJeYBQ",
  "fmt": "248",
  "fs": "0",
  "rt": "13.565",
  "of": "IQhBaIfgMt2H-X5t7FmTrw",
  "euri": "",
  "lact": 1,
  "cl": "175185942",
  "mos": 0,
  "state": "99",
  "vm": "CAEQARgE",
  "volume": 93,
  "c": "WEB",
  "cver": "2.20171108",
  "cplayer": "UNIPLAYER",
  "cbr": "Firefox",
  "cbrver": "59.0",
  "cos": "Windows",
  "cosver": "10.0",
  "hl": "en_US",
  "cr": "US",
  "len": "299.221",
  "fexp": "23700245,23700587,23702322,23702700,23704248,23705485,23705739,23706129,23708159,23708434,23708904,23708906,23708910,23709525,23709714,23710727,23711207,23711231,23711315,23711501,23712349,9407610,9422596,9431754,9449243,9453077,9458230,9463574,9474241,9474595,9475679,9476327,9477113,9480475,9480776,9484237,9484345,9485000,9486769,9487591,9488038,9489766",
  "afmt": "251",
  "vct": "0.000",
  "vd": "NaN",
  "vpl": "0.000-0.848,83.055-83.377,",
  "vbu": "",
  "vpa": "true",
  "vsk": "false",
  "ven": "false",
  "vpr": "1",
  "vrs": "0",
  "vns": "0",
  "vec": "null",
  "vvol": "0.93",
  "creationTime": 14394.275,
  "totalVideoFrames": 0,
  "droppedVideoFrames": 0,
  "corruptedVideoFrames": 0,
  "debug_error": {
    "errorCode": "fmt.decode",
    "errorDetail": "mediaElem.1;msg.NS_ERROR_DOM_MEDIA_DECODE_ERR (0x806e0004)",
    "message": "An error occurred. Please try again later.",
    "messageKey": "YTP_ERROR_GENERIC_WITHOUT_LINK"
  },
  "ismb": 11210000,
  "relative_loudness": "NaN",
  "optimal_format": "1080p",
  "user_qual": "hd1080",
  "debug_videoId": "-CMIZLcrh7M",
  "0sz": false,
  "op": "",
  "yof": false,
  "dis": "",
  "gpu": "ANGLE_(Intel(R)_HD_Graphics_620_Direct3D11_vs_5_0_ps_5_0)",
  "cgr": true,
  "debug_playbackQuality": "hd1080",
  "debug_date": "Mon Nov 13 2017 13:09:47 GMT-0500 (Eastern Standard Time)"
}
Yeah I've regressed this bug back to this 


------------------
Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=382939cc1ff5dc276d7c7d70d14fece9fbd59e77&full=1
2017-11-13T16:33:55: DEBUG : Found commit message:
Bug 1409176 - handle decoder error in WMFMediaDataDecoder::ProcessDecode(). r=jya,mattwoodrow

The DXVA decoder might hit some error because of the hardware device status.
This patch try to pass the error code to the decoder promise object to
deal with the error.

MozReview-Commit-ID: IAj8gzKGF3j
------------------

Talks about decoder issues which are also mentioned in my debug code from YouTube commented above.
Thanks for finding the regression bug, Tyler! That is a huge help.

I'm moving this bug to the "Audio/Video Playback" Bugzilla component so the Firefox video developers will see it.
Blocks: 1409176
No longer blocks: 1402283
Component: Memory Allocator → Audio/Video: Playback
:jerry, it might be caused by Bug 1409176.
Flags: needinfo?(hshih)
I can't reproduce the issue with my laptop(win10 & nvidia 1070m video card). Here is my screen recording to play a video:
https://drive.google.com/file/d/1DzB5lNQpEydpBASr9mGou7qDb6hQvf_P/view?usp=sharing

Could you please upload your system information? The "about:support" page.
And could you also show the youtube link you tested before?
Flags: needinfo?(hshih)
Flags: needinfo?(aehm_key)
Flags: needinfo?(TylerMcC2013)
Hi guys, 

It seems that someone got ahead of me. I have also encountered this bug last week starting with Tuesday. I kept investigating it and discussing about it with another developer in the last days, so I kept delayed logging it.

Here are my findings:
First I thought that the issue only reproduces on my profile and it's a combination of addblocker and YouTube player. Using mozregression with my default profile forced on each build launched, I ended up to bug 1414596, which I also reopened because of this.
Later the bug was again closed and I found out that the issue has nothing to do with the adblocker, since disabling it does not fix the issue, but safe mode did. So after discussing with Jean-Yves who told me about the VP9 measurement at first playback, I have retried to reproduce the issue on a new profile. At first YT playback attempt the issue does not reproduces since the VP9 decoder does its measurements, but if you refresh the video page, the issue starts to occur at 100% rate after you seek the video twice. 

I've decided to rerun the regression window search using a new profile, that led me again to the exact same bug 1414596. But while doing this I have observed a second behavior behavior. While following the normal behavior, mozregression throws me after a while a build where if I seek just once it crashes the YT video.
Because of this I've decided to make a parallel regression range search following the second behavior. It led to bug 1409176, but I'm still not convinced that the issue found on second search is the right one. I'm saying this because when the issue occurs from the first seek, after a page refresh, I cannot reproduce it anymore. So this happens only once at the first seek attempt and after you can seek the YT video as much as you want.

After discussing with Jean-Yves on Friday evening, I've asked some details about the prefs used for VP9 decoder. What I found out is that if I alter the "media.benchmark.vp9.threshold" value to be at least 1 unit above the measured "media.benchmark.vp9.fps" value, the issue no longer occurs. The same thing happens if I set "media.mediasource.enabled" to "false". 
Based on this results, I'm still inclining to the original bug 1414596 or at least a combination of both. But I'm not a developer so it's up to you guys to figure it out.

Here is a recording of the issue containing also the behaviors when I'm altering the VP9 prefs: https://goo.gl/A4gU8V.
And here is my "about:support" page content, if it helps you in any way: https://goo.gl/F3d8P7.
My PC it's an i7 (3rd gen) combined with a ASUS GeForce GTX 1070 STRIX video card.
Here is my "about:support" page content http://tinyurl.com/y6uguhjh (no clean profile to be honest..).
This issue happend with all youtube videos I've tried.
Flags: needinfo?(aehm_key)
Yeah Paul when I did the regression if you ran it with a new profile it didn't catch anything at all. Had to run it with a premade one and just set it to reuse.

When I get home I'll comment my about:support. I do have all media codes enabled.
Flags: needinfo?(TylerMcC2013)
I'm confident the bug occurs on every video but I tested on https://www.youtube.com/watch?v=1ZAPwfrtAFY&pbjreload=10
And the html in that folder is my detailed system specs from Belarc Advisor. Forgot you can't view html rendered from google drive though so going to need to download and open in browser. Sorry.
I could not produce the problem with the STR in Comment 0, but I saw the similar problem recently on my one Win PC with WebRender.
I saw often the error recently during playback youtube video
See Also: → 1417287
Attached file about:support (text)
Attached file about_support_raw.txt
Attached file Specs.html (obsolete) —
Attached file Specs.html
Attachment #8928381 - Attachment is obsolete: true
Can someone delete my obsolete attachment please and thank you.
(In reply to Sotaro Ikeda [:sotaro] from comment #12)
> I could not produce the problem with the STR in Comment 0, but I saw the
> similar problem recently on my one Win PC with WebRender.

:sotaro, 
The error you saw is expected. I can see the dxgi device reset error code in your about:support.
失敗ログ
(#0): GP+[GFX1-]: [D3D11] 2 CreateTexture2D failure Size: Size(1920,1080)texture11: false Code: 0x887a0005
(#1): GP+[GFX1-]: Failed to OpenSharedResource for SyncObjectD3D11: 0x887a0005
The Bug 1409176 is trying to handle the video card device reset problem(when you update/remove your video card driver or un-plug the video card while playing a video).
I don't know why the error occurred when you seek the video. Could you reload the youtube video page and replay again? After reloading, does the problem still exist?
Flags: needinfo?(aehm_key)
Flags: needinfo?(TylerMcC2013)
:jya,
Any idea for the Bug 1409176 changes? I don't know why we will have the keyed_mutex or openSharedHandle() error when we seek the video.
Flags: needinfo?(jyavenard)
Yeah it still exists if you reload the same video. Lets you seek once then seems to reload the whole video from that point and then when you seek again it pops the error. Seems to occur for every video and the same video if reloaded. This is a bug that really annoys me so I'll be constantly on here from now for a few hours if need any more info from me. I'd gladly screen share and do whatever to see why you can not reproduce the bug.
Flags: needinfo?(TylerMcC2013)
Just put my YouTube Debug info as attachment for ease of use.
I found a device with i7 8550u cpu which could reproduce the video seek problem. I'm add some logs to check the reason now.
Sorry for this problem.
Okay, I'll be here if you need anything form me.
Same issue after reloading, like Tyler already described.
Flags: needinfo?(aehm_key)
When seeking, the decoder is drained and then flushed, often that reset the device. So could be related
Flags: needinfo?(jyavenard)
I think you need a device that support VP9 hardware decoding, so nvidia 10xx or Intel 7th gen and later
(In reply to Jean-Yves Avenard [:jya] from comment #30)
> I think you need a device that support VP9 hardware decoding, so nvidia 10xx
> or Intel 7th gen and later

Are you saying you need a device that supports VP9 hardware decoding to cause the bug or to prevent it? 

This bug is occurring on my 7th Gen Intel® Core™ i7-7500U Processor
(In reply to Tyler McCartney from comment #31)
> (In reply to Jean-Yves Avenard [:jya] from comment #30)
> > I think you need a device that support VP9 hardware decoding, so nvidia 10xx
> > or Intel 7th gen and later
> 
> Are you saying you need a device that supports VP9 hardware decoding to
> cause the bug or to prevent it? 
> 
> This bug is occurring on my 7th Gen Intel® Core™ i7-7500U Processor

to cause it...

well, more exactly, it shows when you're using VP9, and so far all the reports I've seen are people on windows on machines with a VP9 hardware decoder.

To fully confirm, set media.wmf.vp9.enabled to false on your machine. Can you reproduce the problem?

The changes made by bug 1409176 aren't specifically related to the VP9 decoder, yet somehow it occurs.
Maybe DXVA returns a particular error code when using the vp9 decoder which is treated as fatal error when it's not.. I don't know.
Setting media.wmf.vp9.enabled to false does fix the problem. 

Yeah I prefer VP9 and have ran with these config options with every version of Firefox pre-57 and have never had a problem till now.  

user_pref("media.mediasource.enabled", true);
user_pref("media.mediasource.webm.enabled", true);
user_pref("media.mediasource.webm.audio.enabled", true);
user_pref("media.mediasource.mp4.enabled", true);
user_pref("preference media.mediasource.ignore_codecs", true); 


Running Firefox with my NVIDIA GeForce GTX 950M also fixes the problem
I'm waiting the i7 7th gen device from my friend.
My laptop with nvidia 1070m support vp9. I can see my debug log for hardware dxva decoding. But I can't see the error when seeking. :(
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #34)
> I'm waiting the i7 7th gen device from my friend.
> My laptop with nvidia 1070m support vp9. I can see my debug log for hardware
> dxva decoding. But I can't see the error when seeking. :(

Try running your Firefox with your CPU's discrete graphics maybe?
The 950M doesn't support VP9.

You can see which decoder is in use by installing the media devtools panel (https://addons.mozilla.org/en-US/firefox/addon/devtools-media-panel/)

start playing the video, press Ctrl-Shit-I then go to the media tab on the right. Press Refresh Now , using the Start Timer will work too and will show what's happening in real time.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #34)
> I'm waiting the i7 7th gen device from my friend.
> My laptop with nvidia 1070m support vp9. I can see my debug log for hardware
> dxva decoding. But I can't see the error when seeking. :(

make sure media.mediasource.webm.enabled is set to true (this will override any code that could disable VP9)
(In reply to Jean-Yves Avenard [:jya] from comment #36)
> The 950M doesn't support VP9.
> 
> You can see which decoder is in use by installing the media devtools panel
> (https://addons.mozilla.org/en-US/firefox/addon/devtools-media-panel/)
> 
> start playing the video, press Ctrl-****-I then go to the media tab on the
> right. Press Refresh Now , using the Start Timer will work too and will show
> what's happening in real time.

Yeah you said you need Nvidia 10xx to have support for VP9 decoding (which causes the bug) which is why I confirmed that my 950M does not produce the bug.
I can reproduce on my 2017 Dell XPS 15 (9560)

will look into it.
(In reply to Tyler McCartney from comment #38)
> Yeah you said you need Nvidia 10xx to have support for VP9 decoding (which
> causes the bug) which is why I confirmed that my 950M does not produce the
> bug.

The nvidia 960GT (not M) (e.g. all the maxwell architecture) do support it... so it can be a bit confusing.
Hi!

Valid solution, thanks! (In "about:config", change "media.wmf.vp9.enabled" to false).

Thanks!
Update the consecutive error counting method in MediaFormatReader::Update().
Then, we could have a limitation of decoder recreation time.

MozReview-Commit-ID: BXLcAhfdQFB
Attachment #8928481 - Flags: review?(jyavenard)
I got the device from my friend, but it's too late today. I will setup the environment and test again tomorrow.

:jya,
How do you think if we try to recreate the decoder first when we have that sync object error?
Now, we could have a limit of recreation time.
Flags: needinfo?(jyavenard)
Recreating the device appears unnecessary to me. It was never done before and worked just fine for years. I don't want to get an unexpected regression later on. 
I would much prefer if we can detect false positive and only act on that condition if it's an actual error. 

Eg only use the new code if we're certain it's an error. What error code is being returned, is it actually an error?
Flags: needinfo?(jyavenard)
Comment on attachment 8928481 [details] [diff] [review]
try to recreate a new decoder for dxva sync obj error.

Review of attachment 8928481 [details] [diff] [review]:
-----------------------------------------------------------------

this is not the reason why decoding fails...

it's a silly regression introduced in bug 1409176, that is not handling the MF_E_NOTACCEPTING message properly
https://searchfox.org/mozilla-central/rev/a662f122c37704456457a526af90db4e3c0fd10e/dom/media/platforms/wmf/WMFMediaDataDecoder.cpp#135
Attachment #8928481 - Flags: review?(jyavenard) → review-
Assignee: nobody → jyavenard
interestingly, this issue would only occur in nightly.
https://searchfox.org/mozilla-central/rev/a662f122c37704456457a526af90db4e3c0fd10e/dom/media/MediaPrefs.h#165

in beta and release we allow for multiple errors...
Very interesting indeed
Comment on attachment 8928638 [details]
Bug 1416473 - Don't treat MF_E_TRANSFORM_NEED_MORE_INPUT as fatal error.

https://reviewboard.mozilla.org/r/199874/#review205044
Attachment #8928638 - Flags: review?(matt.woodrow) → review+
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c2f6b0cea0d2
Don't treat MF_E_TRANSFORM_NEED_MORE_INPUT as fatal error. r=mattwoodrow
Can confirm that build fixes the bug when under the conditions that was causing the bug. 

media.wmf.vp9.enabled set as default/true
Video Decoder(video/vp9, 1920x1080 @ 30.01) : "wmf hardware video decoder (remote)"
Aehm and Tyler, thanks for reporting and testing this bug!

This is a YouTube(In reply to Jean-Yves Avenard [:jya] from comment #46)
> interestingly, this issue would only occur in nightly.
> https://searchfox.org/mozilla-central/rev/
> a662f122c37704456457a526af90db4e3c0fd10e/dom/media/MediaPrefs.h#165
> 
> in beta and release we allow for multiple errors...

jya, if we allow multiple decode errors in Beta and Release, does that mean we DON'T need to uplift this fix to Beta 58?
Flags: needinfo?(jyavenard)
Hey there, we (YouTube) would like to help verify the fix with our production telemetry or provide any other assistance we can.

We're seeing Fx57 have a 2% playback failure rate, and this is after a single transparent restart on our side. This failure rate is pretty extraordinary - an expectation from other browsers is about 0.02%, so we consider this to have major user impact.
Chris: Oh, we do want to uplift.
It's a simple fix. Error recovery isn't always possible and would cause to seek further than the seek point by typically around 4s (to the next keyframe) 

Strobe, this issue doesn't affect fx57. Only current nightly and beta 58.
What type of decoding error are you seeing?
Would it be MEDIA_ERR_SRC_NOT_SUPPORTED by any chance?

We have had reports about those spiking up in 57 but we haven't been able to pinpoint the problem yet.
Flags: needinfo?(jyavenard)
Yep, this is SRC_NOT_SUPPORTED in 57, so a different issue. Should we move to a new bug or continue here?

What I've found so far:

- The errors are indeed happening before playback starts (some browsers report src_not_supported after playback starts, against spec, this isn't the case here)

- Our logging isn't perfect about the exact callstack here without new code (which would take a few days to push), but it appears to be happening before 'sourceopen' fires

- Accordingly, it's not related to any particular codecs or media data

- Retries instantly, universally fail the same way

- The events are very strongly correlated; once a session gets in this state, it remains there

- This is happening on Windows and Mac, on all major OS versions; Linux and mobile devices are not affected

- Geographical regions associated with worse internet performance are less likely to be affected, curiously - this suggests a race condition to me (the slower the internet condition, the less likely a user is to be able to execute a series of interactions quickly)

- There's no immediately evident correlation with what we had just done in the previous playback; sometimes it appears that the first playback in a tab will trigger it
https://hg.mozilla.org/mozilla-central/rev/c2f6b0cea0d2
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment on attachment 8928638 [details]
Bug 1416473 - Don't treat MF_E_TRANSFORM_NEED_MORE_INPUT as fatal error.

Approval Request Comment
[Feature/Bug causing the regression]: 1409176
[User impact if declined]: Decoding error while watching YouTube video
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes (manual build provided and tested by users)
[Needs manual test from QE? If yes, steps to reproduce]: extra test would be good, steps are described in bug description
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: For a start it can't be any worse that what it is now, then we ignore a specific decoder return code and don't treat it as fatal error.
[String changes made/needed]: none
Attachment #8928638 - Flags: approval-mozilla-beta?
Comment on attachment 8928638 [details]
Bug 1416473 - Don't treat MF_E_TRANSFORM_NEED_MORE_INPUT as fatal error.

Fix a youtube playback issue. Beta58+.
Attachment #8928638 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
We managed to verify the fix on Windows 10 x64 (nVidia GTX 1070, driver v388.13), using latest Nightly 59 and beta 58.0b6 both with new profile and a dirty one. The bug is not reproducing anymore.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: