Closed Bug 1531344 Opened 7 months ago Closed 5 months ago

Youtube stream fails due to "Invalid Top-Level Box"

Categories

(Core :: Audio/Video: Playback, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla68
Tracking Status
firefox68 --- verified

People

(Reporter: bryce, Assigned: dragana, NeedInfo)

References

Details

(Keywords: parity-chrome)

Attachments

(2 files)

Reported on reddit here.

STR:

  • Load https://www.youtube.com/watch?v=hHW1oY26kxQ
  • Leave stream running. A few minutes appears to be enough to repro.
  • Playback halts, console shows error:
    Media resource blob:https://www.youtube.com/de2d4534-db99-47c1-b5aa-5a7b35785f58 could not be decoded, error: Error Code: NS_ERROR_FAILURE (0x80004005) Details: virtual mozilla::MediaResult __cdecl mozilla::MP4ContainerParser::IsInitSegmentPresent(mozilla::MediaByteBuffer *): Invalid Top-Level Box:�0�
  • Note that the characters we log for the top level box are not always the same.

Expected behaviour:

  • Playback continues uninterrupted/without error.

Further information:
I am confident I've used the stream in the STR before for testing youtube live streaming (such as in for 1450952, though I don't think I explicitly link it in that bug), as it is a popular and consistently online. As such, I believe this issue is new, either due to a change with us/youtube.

While searching for a regression range, I'm seeing this fail in old versions of Firefox (2018-02-28, Firefox 63). As we haven't noticed the issue in those versions previously, I wonder if YT may have changed? Will reach out.


Other technical details:

Looks like we're arriving here via the segment parser loop (rest of stack omitted):

	xul.dll!mozilla::MP4ContainerParser::AtomParser::Init(const mozilla::MP4ContainerParser & aParser, const mozilla::MediaByteBuffer * aData, mozilla::MP4ContainerParser::AtomParser::StopAt aStop) Line 453	C++
 	xul.dll!mozilla::MP4ContainerParser::IsInitSegmentPresent(mozilla::MediaByteBuffer * aData) Line 384	C++
 	xul.dll!mozilla::TrackBuffersManager::SegmentParserLoop() Line 706	C++

I'm seeing other grabage atoms like �*� (00 15 2A 00) when we run into the failure case. These don't look anything like what we'd expect, so I suspect there's a mismatch between the stream structure and the structure Firefox is expecting, such that we're trying to treat some data as init data that is not.

Duplicate of this bug: 1526128

Copying 3rd comment from the dupe in case the information is useful for debugging this:

For reference, this is the livestream is question: https://www.youtube.com/watch?v=hHW1oY26kxQ

Using the plugin, while the stream is playing, there is only one media source listed. When it begins buffering, a second shows up. They are listed below (apologies for the lack of indentation):

0: blob:https://www.youtube.com/a7705d47-5d66-b841-b937-e5007f40c76a 😵
https://pastebin.com/cS0bbhpA

1: blob:https://www.youtube.com/6e8e5320-146a-ea48-9dd4-94a596a07b10 😵
https://pastebin.com/Vk4DGPXg

Discussion with YouTube indicate that they believe the issue may be caused by Firefox not rejecting incomplete requests, while chrome does. It's possible the page changed at some point such that this behaviour is now causing the the bustage we see.

They provided the attached test case to highlight the issue.

Attachment #9047470 - Attachment description: chunks-2.zip → Test case from Google highlighting what the believe to be the issue.
Attachment #9047470 - Attachment description: Test case from Google highlighting what the believe to be the issue. → Test case from Google highlighting what they believe to be the issue.

baku, is fetch/Transfer-Encoding: chunked handling we see in the example something you have familiarity with?

Flags: needinfo?(amarchesini)

I just created a similar bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1532012

Does it to me on 65.0.2 and latest Nightly in clean profiles, at least when watching NASA TV https://www.youtube.com/watch?v=21X5lGlDOfg

baku, is fetch/Transfer-Encoding: chunked handling we see in the example something you have familiarity with?

fetch yes, but transfer-encoding no. mayhemer?

Flags: needinfo?(amarchesini) → needinfo?(honzab.moz)

I have the same problems (65, 66b, 67 nightly) with all youtube live streams.
It could be related to the bug which I reported a while ago... Bug #1518879

Duplicate of this bug: 1532012

I will take a look.

Flags: needinfo?(dd.mozilla)

:kershaw just pointed out that setting network.http.enforce-framing.http1 makes Firefox behave like Chrome when it come to chunking.
Jean-Yves can you double check if that make the test no longer fail?

Flags: needinfo?(jyavenard)

Just some information below.

The test case from google shows the difference of handling broken chunk response between firefox and chrome.
We treat the response from http://localhost:8080/foobar as a normal response because the default http framing check [1] is not strict. If we want to reject this incomplete response, just set "network.http.enforce-framing.http1" to true.

But it seems that turning on this pref could cause more download issues according to [2].

[1] https://searchfox.org/mozilla-central/rev/3e0f1d95fcf8832413457e3bec802113bdd1f8e8/modules/libpref/init/all.js#1906
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=237623

If there is anything still needed from me, please ask ni? from me again.

Flags: needinfo?(honzab.moz)

The pref was added in bug 1088850 because of the breakage enforce framing caused. That bug is not that old I do not fill comfortable just flipping the pref.

Necko could expose info that a chunk was incomplete to the media and than media can reject it. in that way we can have not-strict behavior for everything else and strict behavior for video.

Flags: needinfo?(dd.mozilla)

Nils, how urgent is this?

If network.http.enforce-framing.http1 fixes the problem, we could add a flag to enforce the behavior per channel

Flags: needinfo?(drno)

Something has changed in either 65.0.2 or YouTube's code.. not doing it now or at least not as easily reproducible. network.http.enforce-framing.http1 is set to false currently. I set it to true to test and then flipped it back and haven't seen the issue afterwards.

Agreed that something has changed so we're not seeing the issue any more. I'm fairly confident that the change is on youtube's end, as I'd tested this with versions ranging from 62 - 67 and was seeing the issue across the board.

While the immediate issue may be gone, I remain concerned about the possibility of this happening again. The issue was present in Firefox and Edge because both have the default lenient framing behaviour detailed above. With edge moving towards Blink, we are at risk of being the only engine that will get clipped by this behaviour should YT start relying on it again.

Nils, Jya, I'd appreciate your thoughts. I think the approach discussed in comment 15 sounds sensible.

Keywords: parity-chrome

nothing to add.

Flags: needinfo?(jyavenard)

I'm dropping this down to P2, as I think resourcing from the media side is going to be tough for this cycle.

Priority: P1 → P2

I've been seeing this again for https://www.youtube.com/watch?v=HMdUcchBYRA

Error:

Media resource blob:https://www.youtube.com/86de3477-a2db-45d3-ad3e-27631af4da31 could not be decoded, error: Error Code: NS_ERROR_FAILURE (0x80004005) Details: virtual mozilla::MediaResult mozilla::MP4ContainerParser::IsInitSegmentPresent(mozilla::MediaByteBuffer *): Invalid Top-Level Box:=8-

Dragana, with this cropping up again, how much work do you think it would be on the network side to expose this as a flag so media code could use strict behaviour?

Flags: needinfo?(dd.mozilla)

Yeah.. seeing this again with 66.0.3. Was watching https://www.youtube.com/watch?v=DPfHHls50-w and after awhile the player said an error occurred.

In the console this was present a little before it happened:

Media resource blob:https://www.youtube.com/897cb3ad-cd6a-4e39-94ce-336fd39372f9 could not be decoded, error: Error Code: NS_ERROR_FAILURE (0x80004005)
Details: virtual mozilla::MediaResult __cdecl mozilla::MP4ContainerParser::IsInitSegmentPresent(mozilla::MediaByteBuffer *): Invalid Top-Level Box:erX watch

(In reply to Bryce Seager van Dyk (:bryce) from comment #21)

Dragana, with this cropping up again, how much work do you think it would be on the network side to expose this as a flag so media code could use strict behaviour?

Not a lot. I will make a patch.

Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Flags: needinfo?(dd.mozilla)

(In reply to Bryce Seager van Dyk (:bryce) from comment #21)

Dragana, with this cropping up again, how much work do you think it would be on the network side to expose this as a flag so media code could use strict behaviour?

Can you point me to the code that is making the request, so that I know what interface you need, on nsIHttpChannel or nsIRequest.

Flags: needinfo?(bvandyk)

Thinking about this further, I'm now uncertain as to if media code will actually be responsible for the network requests. My understanding is for the cases making use of Media Source Extensions, XHR or fetch() will be used to retrieve media, after which we'll add it to our source buffers. My concern here is YT will be using generic XHR/fetch(), so that we can't patch it in a media specific fashion, because media code is not doing the requests.

:jya, given your familiarity with MSE, is the above concern legitimate, or have I missed something?

Flags: needinfo?(bvandyk) → needinfo?(jyavenard)

Seeing this right now trying to watch coverage of the fire at Notre Dame at this link: https://www.youtube.com/watch?v=xGbmWOfdXcQ

Not sure how much this will help - it happens within 40 seconds or so each time.

Media resource blob:https://www.youtube.com/7858a4de-da5e-4ae6-aede-8fa73ea6dac3 could not be decoded, error: Error Code: NS_ERROR_FAILURE (0x80004005)
Details: virtual mozilla::MediaResult mozilla::MP4ContainerParser::IsInitSegmentPresent(mozilla::MediaByteBuffer *): Invalid Top-Level Box: �\

It is basically impossible to watch this stream in Firefox, which is super unfortunate; I had to open another browser.

Moving back to P1. P2 was based on the problem appearing to have subsided, but wanting to implement a fix for the future. Since this issue is currently happening, this is something we want to fix post haste.

Priority: P2 → P1

Some of the failure that cause us to turn strict-framing off(bug 1088850) has been fix some have not.

For example bug 1083090 comment #15 and #17 will be broken with this patch. The links are failing in Chrome too.

I decided only to make framing strict for chunked-encoding that should fix the youtube problem.

We may expect some breakage out of this patch.

(In reply to Bryce Seager van Dyk (:bryce) from comment #25)

Thinking about this further, I'm now uncertain as to if media code will actually be responsible for the network requests. My understanding is for the cases making use of Media Source Extensions, XHR or fetch() will be used to retrieve media, after which we'll add it to our source buffers. My concern here is YT will be using generic XHR/fetch(), so that we can't patch it in a media specific fashion, because media code is not doing the requests.

:jya, given your familiarity with MSE, is the above concern legitimate, or have I missed something?

You are correct, there's no real way to detect if XHR request is used with MSE or not.

Flags: needinfo?(jyavenard)

Dragana, I have some comments in the pending review, maybe we should talk about them directly, what do you think?

Flags: needinfo?(dd.mozilla)
Attachment #9060670 - Attachment description: Bug 1531344 - Bes strict about incorrect chunked encoding. r=mayhemer → Bug 1531344 - Be strict about incorrect chunked encoding. r=mayhemer
Flags: needinfo?(dd.mozilla)
Keywords: checkin-needed

Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/723587a2ae49
Be strict about incorrect chunked encoding. r=mayhemer

Keywords: checkin-needed

Backed out changeset 723587a2ae49 (Bug 1531344) for netwerk/test/unit/test_* failures

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=xpcshell&fromchange=6d2d7856e468e04234f4efad320fa26d222a2eb9&tochange=036e5224f2a2087ae768aa208eeb00dfbd14f3c2&selectedJob=245189533

Backout link: https://hg.mozilla.org/integration/autoland/rev/036e5224f2a2087ae768aa208eeb00dfbd14f3c2

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=245189533&repo=autoland&lineNumber=9905

20:44:22 INFO - TEST-START | netwerk/test/unit/test_partial_response_entry_size_smart_shrink.js
20:44:22 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_partial_response_entry_size_smart_shrink.js | xpcshell return code: 0
20:44:22 INFO - TEST-INFO took 302ms
...
20:44:22 INFO - TEST-START | netwerk/test/unit/test_content_length_underrun.js
20:44:23 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_content_length_underrun.js | xpcshell return code: 0
20:44:23 INFO - TEST-INFO took 352ms

Flags: needinfo?(dd.mozilla)

I have fix the tests.

Flags: needinfo?(dd.mozilla)
Keywords: checkin-needed

Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5b5baa11e448
Be strict about incorrect chunked encoding. r=mayhemer

Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

In an attempt to reproduce, the error message did not appear as per the report.
Probably YT made some changes on their side.
The only error listed for the attached TC is:
NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIInterfaceRequestor.getInterface]

Verified with 68.0b13 and there was still no error thrown as per the report so marking the bug as such.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Hi folks. Glad to see this got fixed. In terms of the inconsistency of test results, unterminated chunked-encoded responses are the mechanism by which HTTP 1.1 allows for a transmission failure during the response to be detected as a failure by the client. YouTube relies on this part of HTTP 1.1 to know when media segments for certain livestream content have been truncated by a network-level issue. Since such issues generally do not occur at steady-state on good networks, it is expected that repros would be inconsistent. I can confirm from YouTube's metrics that this issue is no longer occurring in Firefox 68+.

Regressions: 1567211
You need to log in before you can comment on or make changes to this bug.