Closed Bug 1417869 Opened 2 years ago Closed 2 years ago

High rate of MEDIA_ERR_SRC_NOT_SUPPORTED error being thrown

Categories

(Core :: Audio/Video: Playback, defect, P2)

57 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox57 --- fixed
firefox58 --- fixed
firefox59 --- fixed

People

(Reporter: jya, Assigned: jwwang)

References

Details

Attachments

(6 files, 3 obsolete files)

21.88 KB, patch
jwwang
: review+
Details | Diff | Splinter Review
59 bytes, text/x-review-board-request
adw
: review+
Mardak
: review+
Details
59 bytes, text/x-review-board-request
jya
: review+
Details
102.96 KB, image/png
Details
1.91 KB, patch
Details | Diff | Splinter Review
705.20 KB, image/png
Details
Both Twitch and YouTube have reported spike of MEDIA_ERR_SRC_NOT_SUPPORTED being returned.

2% of Firefox 57 users are getting it watching YouTube (this is huge)

from bug 1416473 comment 56 

- The errors are indeed happening before playback starts (some browsers report
src_not_supported after playback starts, against spec, this isn't the case
here)

- Our logging isn't perfect about the exact callstack here without new code
(which would take a few days to push), but it appears to be happening before
'sourceopen' fires

- Accordingly, it's not related to any particular codecs or media data

- Retries instantly, universally fail the same way

- The events are very strongly correlated; once a session gets in this state,
it remains there

- This is happening on Windows and Mac, on all major OS versions; Linux and
mobile devices are not affected

- Geographical regions associated with worse internet performance are less
likely to be affected, curiously - this suggests a race condition to me (the
slower the internet condition, the less likely a user is to be able to execute
a series of interactions quickly)

- There's no immediately evident correlation with what we had just done in the
previous playback; sometimes it appears that the first playback in a tab will
trigger it

Note: Twitch had seen those spiking starting in 56.
Priority: -- → P2
Version: 55 Branch → 57 Branch
Strobe, would you by chance to log the content of the MediaError.message attribute?

I suspect it will be empty.

JW, got any ideas about this problem
Flags: needinfo?(strobe)
Flags: needinfo?(jwwang)
MediaError.message is empty as expected.
Flags: needinfo?(strobe)
No, I have no idea. We should insert descriptive messages into MediaError.message so we have a chance to know where this MEDIA_ERR_SRC_NOT_SUPPORTED comes from.
Flags: needinfo?(jwwang)
Attachment #8929334 - Flags: review?(jyavenard)
Attachment #8929335 - Flags: review?(jyavenard)
Attachment #8929336 - Flags: review?(jyavenard)
Comment on attachment 8929335 [details]
Bug 1417869. P2 - pass descriptive messages to NotifyLoadError().

https://reviewboard.mozilla.org/r/200656/#review205830

::: dom/html/HTMLMediaElement.cpp:583
(Diff revision 1)
>    } else {
>      // If InitializeDecoderForChannel() returned an error, fire a network error.
>      if (NS_FAILED(rv) && !mNextListener) {
>        // Load failed, attempt to load the next candidate resource. If there
>        // are none, this will trigger a MEDIA_ERR_SRC_NOT_SUPPORTED error.
> -      element->NotifyLoadError();
> +      element->NotifyLoadError(NS_LITERAL_CSTRING("Fail to init decoder"));

Failed to init decoder

::: dom/html/HTMLMediaElement.cpp:1282
(Diff revision 1)
>      }
>  
>      rv = channel->AsyncOpen2(loadListener);
>      if (NS_FAILED(rv)) {
>        // Notify load error so the element will try next resource candidate.
> -      aElement->NotifyLoadError();
> +      aElement->NotifyLoadError(NS_LITERAL_CSTRING("Fail to open channel"));

Failed to ...
Attachment #8929335 - Flags: review?(jyavenard) → review+
Comment on attachment 8929334 [details]
Bug 1417869. P1 - show more descriptive messages for MEDIA_ERR_SRC_NOT_SUPPORTED when SelectResource() fails.

https://reviewboard.mozilla.org/r/200654/#review205832

::: dom/html/HTMLMediaElement.cpp:2563
(Diff revision 1)
>      if (!mMediaSource->Attach(decoder)) {
>        // TODO: Handle failure: run "If the media data cannot be fetched at
>        // all, due to network errors, causing the user agent to give up
>        // trying to fetch the resource" section of resource fetch algorithm.
>        decoder->Shutdown();
> -      return NS_ERROR_FAILURE;
> +      return MediaResult(NS_ERROR_FAILURE, "Fail to attach MediaSource");

Failed to

::: dom/html/HTMLMediaElement.cpp:2574
(Diff revision 1)
>        LOG(LogLevel::Debug,
>            ("%p Failed to load for decoder %p", this, decoder.get()));
> -      return rv;
> +      return MediaResult(rv, "Fail to load decoder");
>      }
> -    return FinishDecoderSetup(decoder);
> +    rv = FinishDecoderSetup(decoder);
> +    return MediaResult(rv, "Fail to set up decoder");

Failed to

::: dom/html/HTMLMediaElement.cpp:2584
(Diff revision 1)
>    RefPtr<ChannelLoader> loader = new ChannelLoader;
>    nsresult rv = loader->Load(this);
>    if (NS_SUCCEEDED(rv)) {
>      mChannelLoader = loader.forget();
>    }
> -  return rv;
> +  return MediaResult(rv, "Fail to load channel");

Failed to
Attachment #8929334 - Flags: review?(jyavenard) → review+
Comment on attachment 8929336 [details]
Bug 1417869. P3 - pass descriptive messages to NetworkError().

https://reviewboard.mozilla.org/r/200658/#review205836

::: dom/media/mediasource/MediaSource.cpp:366
(Diff revision 1)
>    switch (aError.Value()) {
>    case MediaSourceEndOfStreamError::Network:
> -    mDecoder->NetworkError();
> +    mDecoder->NetworkError(MediaResult(NS_ERROR_FAILURE, "MSE network"));
>      break;
>    case MediaSourceEndOfStreamError::Decode:
>      mDecoder->DecodeError(NS_ERROR_DOM_MEDIA_FATAL_ERR);

can we also pass on the error returned by the MSE appendBuffer ? when its a decoding error thst occurred?
Attachment #8929336 - Flags: review?(jyavenard) → review+
Comment on attachment 8929336 [details]
Bug 1417869. P3 - pass descriptive messages to NetworkError().

https://reviewboard.mozilla.org/r/200658/#review205836

> can we also pass on the error returned by the MSE appendBuffer ? when its a decoding error thst occurred?

Which "appendBuffer" do you mean?
Comment on attachment 8929336 [details]
Bug 1417869. P3 - pass descriptive messages to NetworkError().

https://reviewboard.mozilla.org/r/200658/#review205836

> Which "appendBuffer" do you mean?

the one originally passed from https://searchfox.org/mozilla-central/source/dom/media/mediasource/SourceBuffer.cpp#494

but upon checking it is a different EndOfStream https://searchfox.org/mozilla-central/source/dom/media/mediasource/MediaSource.cpp#374 so its all good
Thanks for the review. Hope this will give us enough info to debug the MEDIA_ERR_SRC_NOT_SUPPORTED error.
Keywords: leave-open
Comment on attachment 8929334 [details]
Bug 1417869. P1 - show more descriptive messages for MEDIA_ERR_SRC_NOT_SUPPORTED when SelectResource() fails.

https://reviewboard.mozilla.org/r/200654/#review205872

::: dom/html/HTMLMediaElement.cpp:2571
(Diff revision 2)
>      nsresult rv = decoder->Load(mMediaSource->GetPrincipal());
>      if (NS_FAILED(rv)) {
>        decoder->Shutdown();
>        LOG(LogLevel::Debug,
>            ("%p Failed to load for decoder %p", this, decoder.get()));
> -      return rv;
> +      return MediaResult(rv, "Fail to load decoder");

failed to
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b3227020403e
P1 - show more descriptive messages for MEDIA_ERR_SRC_NOT_SUPPORTED when SelectResource() fails. r=jya
https://hg.mozilla.org/integration/autoland/rev/f62967583a7e
P2 - pass descriptive messages to NotifyLoadError(). r=jya
https://hg.mozilla.org/integration/autoland/rev/c0088b6705c9
P3 - pass descriptive messages to NetworkError(). r=jya
Hi Strobe.
So our next nightly will no longer have an empty MediaError.message when this error is returned.

Could you guys keep an eye for when someone experience that problem with the current Firefox 59 and report that information back to us?

This will be extremely appreciated. Thank you.
Flags: needinfo?(strobe)
We should uplift the patches to Beta if they are not risky.
It look the the error is `NS_ERROR_FAILURE (0x80004005) - Media not allowed`.
[Tracking Requested - why for this release]: this causes extremely playback failure with several content provider (Twitch and YouTube
(In reply to Szatmary from comment #21)
> It look the the error is `NS_ERROR_FAILURE (0x80004005) - Media not allowed`.

Awesome thank you.

Are all the ones you saw due to that same error?

This comes from https://searchfox.org/mozilla-central/rev/c9214daa09e3eb8246b1448e469df1652a140825/dom/html/HTMLMediaElement.cpp#2518

This code hasn't changed in 4 years in HTMLMediaElement.

I'm not sure I know what makes docShell to allow media or not.

https://searchfox.org/mozilla-central/source/docshell/base/nsDocShell.cpp#2569

the default value of mAllowMedia is true, so nsDocShell::SetAllowMedia(bool aAllowMedia) was explicitly called with false.

At a glance, SetAllowMedia is called with the value returned by GetParentDocShell()->GetAllowedMedia()

:smaug, :gijs are you aware of any changes that has occurred with 57 that could cause a different behaviour and the docShell reporting that it doesn't support media?
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(bugs)
That is the only message in our logs for Firefox 59 (other than blank)
Ok... it can no longer be blank in the current central...

That you're getting one nightly user, considering the low count of nightly user is pretty significant.
Assignee: nobody → jwwang
Comment on attachment 8929334 [details]
Bug 1417869. P1 - show more descriptive messages for MEDIA_ERR_SRC_NOT_SUPPORTED when SelectResource() fails.

Approval Request Comment
[Feature/Bug causing the regression]: none
[User impact if declined]: it would be hard to diagnose on why we have such high failure rate with very popular video sites.
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: this is not a fix, just adding details to error messages.
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: all three patches to be uplifited
[Is the change risky?]: no. We only add more details to error messages returned.
[Why is the change risky/not risky?]:
[String changes made/needed]: no. the strings used aren't translatable.
Attachment #8929334 - Flags: approval-mozilla-beta?
Same for YT, 'NS_ERROR_FAILURE (0x80004005) - Media not allowed'.
Flags: needinfo?(strobe)
The relevant code at least in docshell is quite old and mAllowMedia is true by default.

https://searchfox.org/mozilla-central/rev/c9214daa09e3eb8246b1448e469df1652a140825/toolkit/components/thumbnails/content/backgroundPageThumbsContent.js#42
uses the property. Do we load more pages as thumbnails?
Flags: needinfo?(bugs)
Could it be that it's when people load pages from say the Highlights / Topsite page and somehow the old property is incorrectly kept and media is disallowed?
Looks like it doesn't apply to beta:

grafting 436888:c0088b6705c9 "Bug 1417869. P3 - pass descriptive messages to NetworkError(). r=jya"
merging dom/html/HTMLMediaElement.cpp
merging dom/html/HTMLMediaElement.h
merging dom/media/ChannelMediaDecoder.cpp
merging dom/media/ChannelMediaResource.cpp
merging dom/media/MediaDecoder.cpp
merging dom/media/MediaDecoderOwner.h
 warning: conflicts while merging dom/media/ChannelMediaDecoder.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
Flags: needinfo?(jwwang)
I will prepare a patch for beta=57.
I mean beta=58.
Approval Request Comment
[Feature/Bug causing the regression]:none
[User impact if declined]:it would be hard to diagnose on why we have such high failure rate with very popular video sites.
[Is this code covered by automated tests?]:no
[Has the fix been verified in Nightly?]:this is not a fix, just adding details to error messages.
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:none
[Is the change risky?]:no
[Why is the change risky/not risky?]:We only add more details to error messages returned.
[String changes made/needed]:none
Flags: needinfo?(jwwang)
Attachment #8929966 - Flags: review+
Attachment #8929966 - Flags: approval-mozilla-beta?
Attachment #8929334 - Flags: approval-mozilla-beta?
(In reply to Jean-Yves Avenard [:jya] from comment #29)
> Could it be that it's when people load pages from say the Highlights /
> Topsite page and somehow the old property is incorrectly kept and media is
> disallowed?

The thumbnails aren't from the same docshell tree as the actual page, so I don't see how... In any case I am not the best person to ask about docshell changes in 57, but Olli has already answered this.

It's possible that AS is doing too many / substantially more screenshots than about:newtab before it, and that that is causing the issue to surface now when, effectively, media has never run inside the thumbnailed pages. You could check this with the AS people. Perhaps there are (other?) ways you could identify the background requests, and see if we can establish that the numbers youtube/twitch are reporting are coming from non-background requests?
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(m3u8)
I have a theory that those errors aren't actual page load, but thumbnails being generated in Firefox activity stream performing a full page fetch. In those thumbnail media elements aren't allowed and would return the MEDIA_ERR_SRC_NOT_SUPPORTED if the site attempted to play.

It's just a theory. I'm not even sure what would be a way to detect that.

Would there be anything in your logs that could validate or invalidate this theory?
Flags: needinfo?(strobe)
This does sound like the background page thumbs, which we added a blacklist to prevent twitch.tv from being thumbnailed in bug 1412505.

The reported failure rates, are these per <video> load attempt or per "session?" In particular with twitch, I believe we saw high CPU usage because immediately on fail, it would try again to fail again and again. If the reported failures are including each attempt, that could be artificially high for the number of users this affects.

Firefox does use the loaded page to capture a thumbnail, but that happens only after Firefox realizes it's a top site and the user visits it, so before that, a background request is made. Bug 1413650 fixed some issues there for 58 with pinned top sites not capturing loaded pages that were pinned.

(In reply to Jean-Yves Avenard [:jya] from comment #36)
> Would there be anything in your logs that could validate or invalidate this
> theory?
These background thumbnails are requested with these additional flags:
https://searchfox.org/mozilla-central/source/toolkit/components/thumbnails/content/backgroundPageThumbsContent.js#36-50

Maybe most useful for detecting a difference in the reported failures is that these should be anonymous / cookie-less requests.
See Also: → 1412505, 1413650
From some initial testing on Nightly 59, it looks like youtube channel and watch pages that are attempting to autoplay a video are trying 3 times?

youtube.com messages:
<source> element has no “src” attribute. Media resource load failed.  www.youtube.com
All candidate resources failed to load. Media load paused.  www.youtube.com

youtube channel messages:
<source> element has no “src” attribute. Media resource load failed.  UCUT8RoNBTJvwW1iErP6…
All candidate resources failed to load. Media load paused.  UCUT8RoNBTJvwW1iErP6…
JavaScript warning: https://www.youtube.com/yts/jsbin/player-vflM013co/en_US/base.js, line 3237: Error: This error message will be blank when privacy.resistFingerprinting = true.  If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_FAILURE (0x80004005) - Media not allowed
^^ repeated 3 times total

youtube watch / video messages:
JavaScript warning: https://www.youtube.com/yts/jsbin/player-vflM013co/en_US/base.js, line 3237: Error: This error message will be blank when privacy.resistFingerprinting = true.  If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_FAILURE (0x80004005) - Media not allowed
^^ repeated 3 times total


twitch messages (reverting the thumbnail blacklist change):
JavaScript warning: https://cvp.twitch.tv/2.1.1/mediaplayer.min.js, line 1: Error: This error message will be blank when privacy.resistFingerprinting = true.  If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_FAILURE (0x80004005) - Media not allowed
^^ repeated 6038 times total
(In reply to Olli Pettay [:smaug] from comment #28)
> The relevant code at least in docshell is quite old and mAllowMedia is true
> by default.
> 
> https://searchfox.org/mozilla-central/rev/
> c9214daa09e3eb8246b1448e469df1652a140825/toolkit/components/thumbnails/
> content/backgroundPageThumbsContent.js#42
> uses the property. Do we load more pages as thumbnails?

Since mAllowMedia will never be set by the user script, I think it should be OK to ignore the "Media not allowed" error without reporting it to the user.
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #39)
> (In reply to Olli Pettay [:smaug] from comment #28)
> > The relevant code at least in docshell is quite old and mAllowMedia is true
> > by default.
> > 
> > https://searchfox.org/mozilla-central/rev/
> > c9214daa09e3eb8246b1448e469df1652a140825/toolkit/components/thumbnails/
> > content/backgroundPageThumbsContent.js#42
> > uses the property. Do we load more pages as thumbnails?
> 
> Since mAllowMedia will never be set by the user script, I think it should be
> OK to ignore the "Media not allowed" error without reporting it to the user.

We certainly don't want to simply ignore it. This mechanism was put in place for a reason. We don't want active media elements on those types of page load.
Even if the error isn't reported, we need a way to prevent things from starting, so that resource usage is kept low (CPU, memory usage etc...)
(In reply to Jean-Yves Avenard [:jya] from comment #40)
> We certainly don't want to simply ignore it. This mechanism was put in place
> for a reason.

The mechanism was added by bug 759964. IIUC, it is used when loading media in a hidden frame where the user has no control over the playback. So I think it makes sense to silence the error in this case.
Attachment #8929334 - Attachment is obsolete: true
Attachment #8929335 - Attachment is obsolete: true
Attachment #8929336 - Attachment is obsolete: true
Attachment #8930382 - Flags: review?(adw)
Attachment #8930383 - Flags: review?(jyavenard)
Comment on attachment 8930383 [details]
Bug 1417869. P2 - abort the load algorithm early if media not allowed to load.

https://reviewboard.mozilla.org/r/201524/#review206814
Attachment #8930383 - Flags: review?(jyavenard) → review+
Comment on attachment 8930382 [details]
Bug 1417869. P1 - Backed out changeset bf0e4dd83b3b of bug 1412505.

https://reviewboard.mozilla.org/r/201522/#review207026
Attachment #8930382 - Flags: review+
Comment on attachment 8930382 [details]
Bug 1417869. P1 - Backed out changeset bf0e4dd83b3b of bug 1412505.

https://reviewboard.mozilla.org/r/201522/#review207120
Attachment #8930382 - Flags: review?(adw) → review+
Thanks for the reviews!
Pushed by jwwang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6cf9fb193b1b
P1 - Backed out changeset bf0e4dd83b3b of bug 1412505. r=adw,Mardak
https://hg.mozilla.org/integration/autoland/rev/80d8ceae7039
P2 - abort the load algorithm early if media not allowed to load. r=jya
https://hg.mozilla.org/mozilla-central/rev/6cf9fb193b1b
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Flags: needinfo?(strobe)
Flags: needinfo?(m3u8)
Comment on attachment 8930382 [details]
Bug 1417869. P1 - Backed out changeset bf0e4dd83b3b of bug 1412505.

Approval Request Comment
[Feature/Bug causing the regression]: None...
[User impact if declined]: Unnecessarily igh error rate seens by web sites, also causing bug 1412505.
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: attempting to reproduce 1412505 would achieve the requirements
[List of other uplifts needed for the feature/fix]: all 3 patches of this bug
[Is the change risky?]: no
[Why is the change risky/not risky?]: we make the error silent for something that isn't per spec anyway (taking a screen capture while not allowing media elements to be created)
[String changes made/needed]: none
Attachment #8930382 - Flags: approval-mozilla-beta?
Keywords: leave-open
Target Milestone: --- → mozilla59
Using the latest Nightly 59 20171122103138, I was able to get a thumbnail for twitch (no longer blacklisted and no high CPU resulting in no / failed thumbnail).
Comment on attachment 8929966 [details] [diff] [review]
1417869_beta_58_add_logs.patch

Fix an unnecessarily high error rate issue for video playback. Beta58+.
Attachment #8929966 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8930382 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Jean-Yves, we would really like to see this in the upcoming 57 dot release.  Do you think it reasonable to request uplift to release?
Flags: needinfo?(jyavenard)
fix seems easy enough...
Flags: needinfo?(jyavenard)
Comment on attachment 8930383 [details]
Bug 1417869. P2 - abort the load algorithm early if media not allowed to load.

Approval Request Comment
[Feature/Bug causing the regression]: Activity Streams making screen capture since 57
[User impact if declined]: Web sites will see high number of playback errors. Additionally, for some sites it will cause the CPU usage to hit 100% and slow down everything to a crawl as they attempt to reload the media element in a loop ; problem amplified by Activity Stream attempting to do another screen capture when the previous one failed
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: unknown
[Needs manual test from QE? If yes, steps to reproduce]: yes: the steps described in bug 1412505 will do
[List of other uplifts needed for the feature/fix]: just P2 (the need for P1 is only with 58)
[Is the change risky?]: no
[Why is the change risky/not risky?]: the behaviour of forbidding loading media element when called from a thumbnail isn't per spec to start with. As such, any subsequent behaviour doesn't have to be per spec either. The change only makes the failure to load a media element while done in a thumbnail silent.
[String changes made/needed]:
Attachment #8930383 - Flags: approval-mozilla-release?
Guys, the patch doesn't apply to beta (and probably not release).
Because the source is different enough, I would rather not do the rebase myself.
Correction, P1 isn't for 58 either, bug  1412505 didn't get uplifted to 58
Comment on attachment 8930382 [details]
Bug 1417869. P1 - Backed out changeset bf0e4dd83b3b of bug 1412505.

As said in comment #58
Attachment #8930382 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Comment on attachment 8930383 [details]
Bug 1417869. P2 - abort the load algorithm early if media not allowed to load.

[Triage Comment]
If I understood correctly!
Attachment #8930383 - Flags: approval-mozilla-beta+
Sylvestre, he reason P2 doesn't apply to 57 is due to 1417869_beta_58_add_logs.patch only found in 58.

Could we uplift that too? if not I'll rebase it for release.
Attachment #8931616 - Attachment description: . P2 - abort the load algorithm early if media not allowed to load. r=jya, a=sledru → 57 - abort the load algorithm early if media not allowed to load. r=jya, a=sledru
Attachment #8930383 - Flags: approval-mozilla-release? → approval-mozilla-release+
We tested on several platforms (Windows 10 x64, Windows 7 x64 and MacOS 10.13.2) but we were unable to reproduce the issue.
Can someone who managed to reproduce it confirm that the issue is fixed in 57.0.1?
Looks like twitch's page has been updated to not cause high CPU in 57.0. The Browser Console in 57.0 still shows the MediaError:GetMessage and doesn't in 57.0.1.

Tested with https://ftp.mozilla.org/pub/firefox/candidates/57.0.1-candidates/build2/mac/en-US/Firefox%2057.0.1.dmg
You need to log in before you can comment on or make changes to this bug.