[MSE] WebM segments with Tags elements not properly handled.

RESOLVED FIXED in Firefox 58

Status

()

P3
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: jya, Assigned: alwu)

Tracking

(Blocks: 1 bug)

Trunk
mozilla58
Points:
---

Firefox Tracking Flags

(platform-rel +, firefox58 fixed)

Details

(Whiteboard: [platform-rel-Youtube])

Attachments

(6 attachments)

(Reporter)

Description

2 years ago
"We're trying to make some changes to our live VP9 streams and are running into some problems on Firefox. We want to add a Tags element to carry around some extra metadata. However, we've found that this causes append errors when trying to append to Firefox with MSE. The videos play fine on Chrome, if we access the videos directly through Firefox (not through MSE), or if we remove the Tags element. Any ideas what could be happening here? I've attached a minimal example with a WebM and MP4 file. This is all using the Firefox 53 on linux."
platform-rel: --- → ?
Whiteboard: [platform-rel-Youtube]
This bug is on YT's radar. Adding to plat rel +
platform-rel: ? → +
I did some more trials on this bug and this is what i found. MSE fails with WebM files that contain *any* element after the 'Tracks' element (even a 3 byte void element after 'Tracks' trips it up).

So i tried moving the 'Tags' element before the 'Tracks' and 'Info' element and it worked with MSE (See [1]). And of course, all these combinations work fine when loaded directly (i.e.) not via MSE.

[1] Here's the mkvinfo of the file that worked (i'm also attaching the file that worked):

+ EBML head
|+ EBML version: 1
|+ EBML read version: 1
|+ EBML maximum ID length: 4
|+ EBML maximum size length: 8
|+ Doc type: webm
|+ Doc type version: 4
|+ Doc type read version: 2
+ Segment, size unknown
|+ Tags
| + Tag
|  + Simple
|   + Name: http://youtube.com/streaming/metadata/segment/102015
|   + String: Sequence-Number: 448
...
|+ Segment information
| + Timecode scale: 1000000
| + Muxing application: google
| + Writing application: google
|+ Segment tracks
| + A track
|  + Track number: 1 (track ID for mkvmerge & mkvextract: 0)
|  + Track UID: 16351900942963154
|  + Track type: video
|  + Lacing flag: 0
|  + Default duration: 33.333ms (30.000 frames/fields per second for a video track)
|  + Codec ID: V_VP9
|  + Video track
|   + Pixel width: 1280
|   + Pixel height: 720
|+ Cluster
Posted video tags_in_front.webm
This is the file that worked (as described in comment #2).
(Reporter)

Comment 4

2 years ago
This is the same issue that causes bug 1341228.

The webm parser use to check for the presence of webm media segment is very crude and is expecting things in a particular way.

It was custom designed for working with youtube back in the days (as AFAIK, they are still the only one using MSE with webm).
Ah, it does look like the same issue.

If you can point me to roughly where the parser code might be failing to handle this, i'm happy to spend time and try to fix this. Please let me know!
(Reporter)

Comment 6

2 years ago
https://dxr.mozilla.org/mozilla-central/source/dom/media/mediasource/ContainerParser.cpp#160

As you can see, it only checks if the byte buffer starts with Cues or Cluster

We need something smarter, we have the WebMBufferedParser that allows checking a binary buffer and supports resume. That's what to use...
I just tried adding a check for 'Tags' element there and that seemed to work. Would that be an acceptable fix for this particular case?
(Reporter)

Comment 8

2 years ago
Only if "Tags" now always define the start of a webm media segment.
(Reporter)

Comment 9

2 years ago
(Oops validated too early) And is always followed by a cluster.
(In reply to Jean-Yves Avenard [:jya] from comment #4)
> It was custom designed for working with youtube back in the days (as AFAIK,
> they are still the only one using MSE with webm).

Shaka Player also supports MSE with WebM.
(Assignee)

Comment 11

a year ago
Per comment9, does it mean we can simply check Tags "0x1254c367" as a criteria for media segment?
In addition, I couldn't find the description your said in comment9 in spec, do you mind help me point where is it?
Thanks!
Flags: needinfo?(jyavenard)
(Reporter)

Comment 12

a year ago
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #11)
> Per comment9, does it mean we can simply check Tags "0x1254c367" as a
> criteria for media segment?

no.

While it may work around some particular cases, it's not a fix.

Say all you appended to the source buffer was a tag, and nothing else. We would incorrectly assume it was a media segment, while there's nothing to demux.

We need to properly scan the input buffer like with do with MP4 rather than just checking the first 4 bytes of the input buffer.


> In addition, I couldn't find the description your said in comment9 in spec,
> do you mind help me point where is it?
> Thanks!

https://w3c.github.io/media-source/webm-byte-stream-format.html#webm-media-segments

"A WebM media segment is a single Cluster element."

A webm media segment isn't defined by the presence of a tag.
Flags: needinfo?(jyavenard)
(Assignee)

Updated

a year ago
Assignee: nobody → alwu
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Reporter)

Comment 16

a year ago
mozreview-review
Comment on attachment 8921392 [details]
Bug 1362440 - part2 : parsing entire content to decide whether it's media segment.

https://reviewboard.mozilla.org/r/192430/#review197638

::: commit-message-2ba9b:3
(Diff revision 2)
> +Bug 1362440 - part1 : check entire content to ensure we have valid cluster in media segment.
> +
> +We should accept the cluster element which followes cues or tags element

the description is incorrect I believe.

It should have nothing to do with cues or tags. Those two elements shouldn't even appear in the description.

The definition of a webm media segment is a webm cluster. As such, finding a cluster is all that matters.

::: dom/media/mediasource/ContainerParser.cpp:359
(Diff revision 2)
> +  {
> +    WebMBufferedParser parser(0);
> +    nsTArray<WebMTimeDataOffset> mapping;
> +    ReentrantMonitor dummy("dummy");
> +    parser.Append(aData->Elements(), aData->Length(), mapping, dummy);
> +    return parser.EndSegmentOffset(0) > 0;

this function doesn't do what you use it for.

Additionally, it will only works if it has found an init segment first.

As you're creating a new WebmBufferedParser for just checking if you have a media segment, this parsing will always fail.

WebMBufferedParser is designed to parse a buffer that must start with an init segment. It can stop/resume at any point, but it must be used to parse a valid webm in the first place.

So this code will only with if aData is made of an init segment (EBML) + Media Segment

You're logic can't work. You can't create a new WebMBufferedParser for each call of HasCluster... It must be an iterative / progressive process..

::: dom/media/webm/WebMBufferedParser.cpp:139
(Diff revision 2)
>          mSkipBytes = mElement.mSize.mValue;
>          mState = CHECK_INIT_FOUND;
>          break;
> +      case TAGS_ID:
> +        mSkipBytes = mElement.mSize.mValue;
> +        mState = FIND_CLUSTER_SYNC;

this assumes that a after a tag you should ignore all until a cluster is found.

Why is that?
The spec for a tag certainly doesn't state so:
https://www.webmproject.org/docs/container/#tagging

There's also plenty of tags element type:
https://matroska.org/technical/specs/tagging/index.html
(Assignee)

Comment 17

a year ago
(In reply to Jean-Yves Avenard [:jya] from comment #16)
> this function doesn't do what you use it for.
> 
> Additionally, it will only works if it has found an init segment first.
> 
> As you're creating a new WebmBufferedParser for just checking if you have a
> media segment, this parsing will always fail.
> 
> WebMBufferedParser is designed to parse a buffer that must start with an
> init segment. It can stop/resume at any point, but it must be used to parse
> a valid webm in the first place.
> 
> So this code will only with if aData is made of an init segment (EBML) +
> Media Segment
> 
> You're logic can't work. You can't create a new WebMBufferedParser for each
> call of HasCluster... It must be an iterative / progressive process..

No, it can work.

For example, when you got the media segment started with Tags element, then,
1. go to case TAGS_ID
2. next state would be FIND_CLUSTER_SYNC
3. if you find the CLUSTER, the mClusterEndOffset would be set

so, if the return value of EndSegmentOffset(0) is larger then zero, which mean we have found the cluster.

However, I should add another case CUES_ID, to make sure it could also work as above.

> this assumes that a after a tag you should ignore all until a cluster is
> found.
> 
> Why is that?
> The spec for a tag certainly doesn't state so:
> https://www.webmproject.org/docs/container/#tagging
> 
> There's also plenty of tags element type:
> https://matroska.org/technical/specs/tagging/index.html

Do we have any need to parse the content of the tags element here?
I thought we just want to make sure we could find the cluster after tags.

If we don't have the specific usage for the content of tags, I think we could ignore them.
And that thing should be done in the real webm demuxer, instead of here.
Flags: needinfo?(jyavenard)
(Reporter)

Comment 18

a year ago
And what if the media segment doesn't start with a tag but another element type? (some content for example have cues prior the cluster).

You construct the WebMBufferedParser with an offset value of 0.

so the default value of WebMBufferedParser::mState is READ_ELEMENT_ID. This can only work if the content starts with a valid EBML element as you can otherwise very easily hit a code path that expect a particular information have previously been seen (like timescale and so forth)

If partial data was added it will fail too.

Remember that appendBuffer can be called with any content, one that may not yet be valid yet because the stuff is partial.
Flags: needinfo?(jyavenard)
(Assignee)

Comment 19

a year ago
I found that something wrong in my previous patch,

(1) Don't need to add TAGS_ID in WebMBufferedParser::Append()
In fact, the only thing we care is whether we have the Cluster, so we could ignore other EBML ID like Tags, Cues. When we do that, the next state is READ_ELEMENT_ID [1], so it would repeat to read the next byte in the buffer, and it would eventually update the mClusterOffset if we have it.

The flow would like that,
READ_ELEMENT_ID -> find some other EBML ID -> skip data and read next ID -> .... -> Find Cluster ID

[1] http://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/dom/media/webm/WebMBufferedParser.cpp#143

(2) Should not assume the Cues would always be appended before the Cluster
In mediasource-config-change-webm-a-bitrate.html, I found the Cues would be appended after Cluster.

(3) Should modify test case to ensure we append Cluster after Cues
In some test cases, we would append Cues in the first time, and then append the Cluster in next time. It causes we couldn't find the Cluster after seeing Cues.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Reporter)

Comment 24

a year ago
mozreview-review
Comment on attachment 8921410 [details]
Bug 1362440 - part1 : add timecode checking for parser

https://reviewboard.mozilla.org/r/192450/#review198098

Why can't the cues be appended alone?

it's perfectly valid content. It just means that the data should be ignored and the buffered range won't change after an appendbuffer has completed.

If cues causes problems when appended alone, it only shows problem in your other code. it's valid
Attachment #8921410 - Flags: review?(jyavenard) → review-
(Reporter)

Comment 25

a year ago
mozreview-review
Comment on attachment 8921813 [details]
Bug 1362440 - part3 : add tests.

https://reviewboard.mozilla.org/r/192818/#review198100

::: dom/media/mediasource/test/test_WebMTagsBeforeCluster.html:33
(Diff revision 1)
> +      info("- wait for updateend -");
> +      await once(sb, "updateend");
> +
> +      info("- call end of stream -");
> +      ms.endOfStream();
> +      await once(ms, "sourceended");

adds a test that the buffered range isn't empty.
Attachment #8921813 - Flags: review?(jyavenard) → review+
(Reporter)

Comment 26

a year ago
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #19)

> (3) Should modify test case to ensure we append Cluster after Cues
> In some test cases, we would append Cues in the first time, and then append
> the Cluster in next time. It causes we couldn't find the Cluster after
> seeing Cues.

doing an appendBuffer with just cues, followed by another appendBuffer starting with a cluster is okay and perfectly valid.

It should be handled accordingly.

As the cues isn't a required element, per spec it should just be ignored.
(Reporter)

Comment 27

a year ago
(In reply to Alastor Wu [:alwu][please needinfo me][GMT+8] from comment #19)
> I found that something wrong in my previous patch,
> 
> (1) Don't need to add TAGS_ID in WebMBufferedParser::Append()
> In fact, the only thing we care is whether we have the Cluster, so we could
> ignore other EBML ID like Tags, Cues. When we do that, the next state is
> READ_ELEMENT_ID [1], so it would repeat to read the next byte in the buffer,
> and it would eventually update the mClusterOffset if we have it.
> 

when you search for a cluster id, it does the search byte by byte. This is likely going to cause problem as you could append garbage content followed by a cluster.

it will believe it's valid, only to fail later because the WebM demuxer itself will always expect valid data.

I'm fairly convinced that you need the WebMBufferedParser be fed content right from the start, and re-use that WebMBufferedParser as more data is added.
(Reporter)

Comment 28

a year ago
mozreview-review
Comment on attachment 8921392 [details]
Bug 1362440 - part2 : parsing entire content to decide whether it's media segment.

https://reviewboard.mozilla.org/r/192430/#review198104

::: dom/media/mediasource/ContainerParser.cpp:357
(Diff revision 3)
> +  bool MaybeContainCluster(const MediaByteBuffer* aData) const
> +  {
> +    MOZ_ASSERT(aData->Length() > 4);
> +
> +    // A WebM media segment is a single cluster element, but sometimes user
> +    // would append following kinds of element before cluster element.

that comment doesn't make much sense to me.

::: dom/media/mediasource/ContainerParser.cpp:362
(Diff revision 3)
> +    // would append following kinds of element before cluster element.
> +    bool startWithValidEBMLID = false;
> +    // 0x1c53bb6b Cues
> +    if ((*aData)[0] == 0x1c && (*aData)[1] == 0x53 && (*aData)[2] == 0xbb &&
> +        (*aData)[3] == 0x6b) {
> +      startWithValidEBMLID = true;

an EBML is the start of a webm and makes a webm init segment.

There's no such thing as a EBML ID.

EBML means Extensible Binary Meta Language.

It's the type of document

::: dom/media/mediasource/ContainerParser.cpp:375
(Diff revision 3)
> +    else if ((*aData)[0] == 0x10 && (*aData)[1] == 0x43 && (*aData)[2] == 0xa7 &&
> +             (*aData)[3] == 0x70) {
> +      startWithValidEBMLID = true;
> +    }
> +
> +    if (!startWithValidEBMLID) {

so any other elements before a cluster would make it return false.

There's no reason other element can't be found before a cluster.

and they should be ignored as the spec say. Not assume we don't have a cluster after.

In any case, only checking the first 4 bytes was always a hack to start with.

We need to do things properly, that is not assume that data will be added without ever be truncated or split at odd points.

::: dom/media/webm/WebMBufferedParser.cpp:272
(Diff revision 3)
>  }
>  
> +int64_t
> +WebMBufferedParser::GetClusterOffset() const
> +{
> +  return mClusterOffset > 0 ? mClusterOffset : -1;

return mClusterOffset

either mClusterOffset is -1 or it's a positive value. So those extra tests serve no purpose
Attachment #8921392 - Flags: review?(jyavenard) → review-
(Reporter)

Comment 29

a year ago
http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2017.html?timestamp=1508934343443 contains test that add webm split at random interval, good way to test...

Now with those new test, I'm not sure which one it was.

A good way to test, would be to have a page that takes a webm with multiple cluster, and split it randomly so that appendBuffer is called with partial data, and where a cluster may not be found exactly at the start of the input buffer.

The buffered range at the end should be the same as if the webm had been appended in one block.

that test would currently fail because we always expect either cues or cluster ID to be found in the first 4 bytes.
(Reporter)

Comment 30

a year ago
sorry, forgot to add the link:
the previous test page I was using was http://yt-dash-mse-test.commondatastorage.googleapis.com/unit-tests/2015.html?timestamp=1508934463315

I believe test 30 was one of those tests. though youtube has changed the test code, it only adds a tiny bit now. it use to randomly cut a 2MB webm and add those random size segments.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Reporter)

Updated

a year ago
Attachment #8921410 - Flags: review?(jyavenard)
(Reporter)

Comment 35

a year ago
mozreview-review
Comment on attachment 8921392 [details]
Bug 1362440 - part2 : parsing entire content to decide whether it's media segment.

https://reviewboard.mozilla.org/r/192430/#review198968

LGTM, passing to kinetik for review.
Attachment #8921392 - Flags: review?(jyavenard)
(Reporter)

Updated

a year ago
Attachment #8921392 - Flags: review?(kinetik)
(Reporter)

Comment 36

a year ago
mozreview-review
Comment on attachment 8922711 [details]
Bug 1362440 - part4 : parsing entire content to decide whether it's init segment.

https://reviewboard.mozilla.org/r/193806/#review198970

::: dom/media/mediasource/ContainerParser.cpp:144
(Diff revision 1)
> -        (*aData)[3] == 0xa3) {
> -      return NS_OK;
> -    }
> +    WebMBufferedParser parser(0);
> +    nsTArray<WebMTimeDataOffset> mapping;
> +    ReentrantMonitor dummy("dummy");
> +    bool result = parser.Append(aData->Elements(), aData->Length(), mapping,
> +                                dummy);
> +    if (!result) {

Please add a test appending a partial init segment, to ensure that things work as expected and that loadedmetadata is only fired once the complete init segment is received

(there may be such test already, if so please ignore)
Attachment #8922711 - Flags: review?(jyavenard) → review+

Comment 37

a year ago
mozreview-review
Comment on attachment 8921410 [details]
Bug 1362440 - part1 : add timecode checking for parser

https://reviewboard.mozilla.org/r/192450/#review199308
Attachment #8921410 - Flags: review?(kinetik) → review+

Comment 38

a year ago
mozreview-review
Comment on attachment 8921392 [details]
Bug 1362440 - part2 : parsing entire content to decide whether it's media segment.

https://reviewboard.mozilla.org/r/192430/#review199310
Attachment #8921392 - Flags: review?(kinetik) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 43

a year ago
Comment on attachment 8921813 [details]
Bug 1362440 - part3 : add tests.

Ask for a review again because of adding a new test which is required from comment36.
Attachment #8921813 - Flags: review+ → review?(jyavenard)
(Reporter)

Comment 45

a year ago
mozreview-review
Comment on attachment 8921813 [details]
Bug 1362440 - part3 : add tests.

https://reviewboard.mozilla.org/r/192818/#review199466

::: commit-message-ce495:1
(Diff revision 3)
> +Bug 1362440 - part3 : add test.

add tests

plural, seeing that you have two.
Attachment #8921813 - Flags: review?(jyavenard) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 50

a year ago
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5b08911eb21f
part1 : add timecode checking for parser r=kinetik
https://hg.mozilla.org/integration/autoland/rev/9d8678fd3525
part2 : parsing entire content to decide whether it's media segment. r=kinetik
https://hg.mozilla.org/integration/autoland/rev/f072fb95a7b8
part3 : add tests. r=jya
https://hg.mozilla.org/integration/autoland/rev/c66bfddbbdfe
part4 : parsing entire content to decide whether it's init segment. r=jya
(Assignee)

Comment 53

a year ago
Do we need to uplift this change to ESR or Release?
Flags: needinfo?(ajones)
(Reporter)

Comment 54

a year ago
IMHO, youtube shouldn't be using that type of content with firefox, when they know we don't support it (they were the ones who reported the bug afterall)

Comment 55

a year ago
I would think uplifting to ESR would be a minimum.  Uplifting to release would probably be good as well as I've seen others opening up Chrome so they could get the youtube link to work.

Comment 56

a year ago
So, is this why https://www.youtube.com/watch?v=ddFvjfvPnqk throws up an error in the player when using 57.0.1 but works fine in 58beta8?
> So, is this why https://www.youtube.com/watch?v=ddFvjfvPnqk throws up an error in the player when using 57.0.1 but works fine in 58beta8?

Very likely, yes.
(Reporter)

Comment 58

a year ago
Vignesh, can YouTube stop sending those webm streams to Firefox 57?
Flags: needinfo?(vigneshv)
> Vignesh, can YouTube stop sending those webm streams to Firefox 57?

yes, turns out this was a recent regression (youtube had a workaround for firefox and the workaround started failing recently due to a bug). we are working on a fix internally that will be pushed out soon.
Flags: needinfo?(vigneshv)

Comment 60

a year ago
As a workaround, setting media.webm.enabled to false in about:config at least let's you play the stream in 57. Just toggle it back to true when done with the stream.
FWIW I'm seeing some complaints about this issue in release 57.0.1 on twitter:

https://twitter.com/gcpascutto/status/937712194485194755

Should we consider flipping the pref in release via a system addon until beta can ship the fix?
(Reporter)

Comment 62

a year ago
This is a YouTube bug that they will fix shortly...

Plus turning off webm alltogether (not just YouTube) would prevent playback on many systems with no h264 decoders
> This is a YouTube bug that they will fix shortly...

We have a fix that's scheduled to go out on wednesday. We are also looking into other things that may fix it earlier.


> Plus turning off webm alltogether (not just YouTube) would prevent playback on many systems with no h264 decoders

The number of live streams affected by this is actually quite small (compared to all of youtube), so i don't think we should disable webm for all firefox users.
Flags: needinfo?(ajones)
You need to log in before you can comment on or make changes to this bug.