Closed
Bug 835381
Opened 11 years ago
Closed 10 years ago
Matroska MKV video files are not WebM - don't try to play them
Categories
(Core :: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla22
People
(Reporter: gingerbread_man, Assigned: padenot)
References
Details
Attachments
(3 files, 2 obsolete files)
3.01 KB,
patch
|
kinetik
:
review+
|
Details | Diff | Splinter Review |
5.65 KB,
patch
|
kinetik
:
review+
|
Details | Diff | Splinter Review |
3.96 KB,
patch
|
kinetik
:
review+
|
Details | Diff | Splinter Review |
Blame probably goes to bug 567077. I take it the confusion is due to the fact that WebM is based on Matroska. Firefox tries to play MKV files, and displays the following error message: Video can't be played because the file is corrupt. I'm not sure what triggers this when downloading MKV files from the web. For example, the first link downloads the file from a random mirror, so it might trigger the problem, or it might not. The other two direct mirror links definitely trigger it. http://sourceforge.net/projects/matroska/files/test_files/cover_art.mkv/download http://freefr.dl.sourceforge.net/project/matroska/test_files/cover_art.mkv http://garr.dl.sourceforge.net/project/matroska/test_files/cover_art.mkv Setting the Content-Disposition header to Inline using the InlineDisposition extension appears to always trigger the problem. https://addons.mozilla.org/firefox/addon/inlinedisposition/ It also always happens when trying to open local files from an HTML index. https://support.mozilla.org/questions/948571 Alternatively, with H.264 support being added to Firefox, maybe revisit bug 476727
Comment 1•11 years ago
|
||
HTTP/1.1 200 OK Date: Mon, 28 Jan 2013 17:31:35 GMT Server: Apache/2.2.8 (FreeBSD) Last-Modified: Thu, 23 Sep 2010 19:16:39 GMT ETag: "7c8fd12-25a53ab-490f219f7d3c0" Accept-Ranges: bytes Content-Length: 39474091 Connection: close Content-Type: application/octet-stream
Status: UNCONFIRMED → NEW
Component: File Handling → Video/Audio
Ever confirmed: true
Assignee | ||
Comment 2•11 years ago
|
||
Yep, the spec says that using more bytes should be considered [1]. I'll give it a shot. [1]: http://mimesniff.spec.whatwg.org/#identifying-a-resource-with-an-unknown-media-type
Assignee: nobody → paul
Status: NEW → ASSIGNED
Assignee | ||
Comment 3•11 years ago
|
||
Matthew, is putting this in nestegg still planned? You mention that it is something we could consider in [1]. [1]: https://groups.google.com/a/webmproject.org/d/topic/webm-discuss/bi9G_QuDq9s/discussion
Comment 4•11 years ago
|
||
No plan, I haven't thought about it much since that email to be honest. If there are code sharing wins to be had, I'm all for it.
Assignee | ||
Comment 5•11 years ago
|
||
I'll probably put the sniffing code in libnestegg, then. Not sure when I'll get to it, though.
Assignee | ||
Comment 6•10 years ago
|
||
I just init-ing the context here to check if the media is WebM. Strictly speaking, this does not only check the DocType, but well, if the sniffing fails, playing the webm file won't work anyways. It will probably be terrible to spec, though.
Attachment #713077 -
Flags: review?(kinetik)
Assignee | ||
Comment 7•10 years ago
|
||
I have a xpcshell test coming up for that, but somehow it does not work and it's late here. I've checked that if I load an mkv file as a top-level document, in a way that it should be sniffed (either served with "Content-Type: application/octet-stream", or without Content-Type), a download window appears, and if I load it in a <video> tag, it says "video format or mime type not supported".
Attachment #713078 -
Flags: review?(kinetik)
Assignee | ||
Comment 8•10 years ago
|
||
(The xpcshell test that does not work for some reason. Necko keep sniffing the files as text/plain, I believe something is wrong with the way I send back the data in httpd.js).
Comment 9•10 years ago
|
||
Comment on attachment 713077 [details] [diff] [review] Add an API to nestegg to sniff for webm. r= Review of attachment 713077 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/libnestegg/include/nestegg.h @@ +331,5 @@ > + * @param buffer A buffer containing the beginning of a media file. > + * @param size The size of the buffer. > + * @retval 0 The file is not a WebM file. > + * @retval 1 The file is a WebM file. */ > +int nestegg_sniff(unsigned char * buffer, uint32_t size); Declare buffer "unsigned char const * buffer", then you can remove the cast from the caller in the other patch. Rename "size" to "length" to match the existing naming convention for buffers and change its type to size_t. ::: media/libnestegg/src/nestegg.c @@ +2085,5 @@ > + > +/* Three functions that implement the nestegg_io interface, operating on a > + * sniff_buffer. */ > +struct sniff_buffer { > + unsigned char* buffer; Space before *. @@ +2086,5 @@ > +/* Three functions that implement the nestegg_io interface, operating on a > + * sniff_buffer. */ > +struct sniff_buffer { > + unsigned char* buffer; > + uint32_t size; size_t, length. @@ +2087,5 @@ > + * sniff_buffer. */ > +struct sniff_buffer { > + unsigned char* buffer; > + uint32_t size; > + uint64_t offset; int64_t. @@ +2093,5 @@ > + > +static int > +buffer_read(void * buffer, size_t length, void * user_data) > +{ > + struct sniff_buffer * sb = (struct sniff_buffer*)user_data; Space before * and after ). @@ +2097,5 @@ > + struct sniff_buffer * sb = (struct sniff_buffer*)user_data; > + int rv = 1; > + size_t available = sb->size - sb->offset; > + > + if (available < length ) { Remove available and make this: if (sb->size - sb->offset < length), and you can return 0 immediately here. Then remove rv and return 1 at the end of the function. @@ +2110,5 @@ > + > +static int > +buffer_seek(int64_t offset, int whence, void * user_data) > +{ > + struct sniff_buffer * sb = (struct sniff_buffer*)user_data; Space before * and after ). @@ +2136,5 @@ > + > +static int64_t > +buffer_tell(void * user_data) > +{ > + struct sniff_buffer * sb = (struct sniff_buffer*)user_data; Space before * and after ). @@ +2144,5 @@ > +int > +nestegg_sniff(unsigned char * buffer, uint32_t size) > +{ > + nestegg_io io; > + nestegg* context; Space before *. @@ +2146,5 @@ > +{ > + nestegg_io io; > + nestegg* context; > + struct sniff_buffer user_data; > + int rv; Call this r to match the rest of the code. @@ +2156,5 @@ > + io.read = buffer_read; > + io.seek = buffer_seek; > + io.tell = buffer_tell; > + io.userdata = &user_data; > + rv = nestegg_init(&context, io, NULL, size) == 0 ? 1 : 0; Make this: r = nestegg_init(&context, io, NULL, size); if (r == 0) { nestegg_destroy(context); } return r == 0;
Attachment #713077 -
Flags: review?(kinetik) → review+
Comment 10•10 years ago
|
||
Comment on attachment 713078 [details] [diff] [review] Use the new libnestegg API to sniff for WebM. r= Review of attachment 713078 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/mediasniffer/nsMediaSniffer.cpp @@ +67,5 @@ > } > > +static bool MatchesWebM(const uint8_t* aData, const uint32_t aLength) > +{ > + return nestegg_sniff((uint8_t*)aData, aLength) ? true : false; The cast should be unnecessary with the changes suggested for the other patch.
Attachment #713078 -
Flags: review?(kinetik) → review+
Assignee | ||
Comment 11•10 years ago
|
||
> @@ +2097,5 @@
> > + struct sniff_buffer * sb = (struct sniff_buffer*)user_data;
> > + int rv = 1;
> > + size_t available = sb->size - sb->offset;
> > +
> > + if (available < length ) {
>
> Remove available and make this:
> if (sb->size - sb->offset < length),
> and you can return 0 immediately here. Then remove rv and return 1 at the end
> of the function.
If I do that, then the sniffing fails every time and we can't play WebM files
when sniffing any more, because we fail in |ne_parse| with this stack:
#1 0x00007ffff2bffff3 in ne_io_read (io=0x7fffd2d76140, buffer=0x7fffd2093020, length=3800) nestegg.c:502
#2 0x00007ffff2c00641 in ne_read_binary (ctx=0x7fffcffef820, val=0x7fffd3f81cf8, length=3800) nestegg.c:702
#3 0x00007ffff2c00efc in ne_read_simple (ctx=0x7fffcffef820, desc=0x7ffff5254480 <ne_track_entry_elements+576>, length=3800) nestegg.c:949
#4 0x00007ffff2c01123 in ne_parse (ctx=0x7fffcffef820, top_level=0x0, max_offset=512) nestegg.c:1013
#5 0x00007ffff2c02513 in nestegg_init (context=0x7fffffffc210, io=..., callback=0x0, max_offset=512) nestegg.c:1540
#6 0x00007ffff2c03abb in nestegg_sniff (buffer=0x7fffd6582000 "\032E\001", length=512) nestegg.c:2157
When writing |buffer_read|, I figured it would be good to have the same
semantics as |webm_read|, that is, return the data even if eof is reached,
even though nestegg does not know how many bytes were read by this call
(and most probably has to compute it using |tell|).
Assignee | ||
Comment 12•10 years ago
|
||
And a test that works.
Attachment #713481 -
Flags: review?(kinetik)
Assignee | ||
Updated•10 years ago
|
Attachment #713080 -
Attachment is obsolete: true
Comment 13•10 years ago
|
||
(In reply to Paul Adenot (:padenot) from comment #11) > If I do that, then the sniffing fails every time and we can't play WebM files > when sniffing any more, because we fail in |ne_parse| with this stack: Hm, that's bad, because the caller doesn't know how much of the buffer has been filled. I'll file a separate bug to deal with that.
Updated•10 years ago
|
Attachment #713481 -
Flags: review?(kinetik) → review+
Comment 14•10 years ago
|
||
Comment on attachment 713077 [details] [diff] [review] Add an API to nestegg to sniff for webm. r= Now that I've thought about this more, it can't work reliably as-is. From a random sampling of 42 WebM files I have handy, nestegg_sniff as currently implemented fails on 6 of them. The problem is that the sniffer truncates the buffer at an arbitrary offset (e.g. 512 bytes) with no knowledge of the buffer contents. The max_offset machinery in nestegg requires max_offset to be a valid element end point, see bug 782457 comment 6. (In reply to Matthew Gregan [:kinetik] from comment #13) > Hm, that's bad, because the caller doesn't know how much of the buffer has > been filled. I'll file a separate bug to deal with that. I investigated this further: nestegg never examines the buffer when an EOF has been returned. The change in behaviour was caused by not updating sb->offset in buffer_read, which changes the result of ne_io_tell and causes the max_offset test in ne_parse to consistently fail.
Attachment #713077 -
Flags: review+ → review-
Comment 15•10 years ago
|
||
nestegg_init wants to reads up to the beginning of the first Cluster element. Of the 42 WebM files I have handy, the first cluster starts between 309 and 9429 bytes, with the median at 4389 bytes. A large chunk (3291 bytes) is usually Vorbis CodecPrivate, within the audio Track element. We probably can't expect to see more than the EBML header with the "webm" doctype in the first 512 bytes. It's common (but not guaranteed, due to the potential size of the SeekHead, any Void elements, layout, etc.) that the Tracks element will be present in the first 512 bytes (definitely not the complete audio track, given the size of the Vorbis CodecPrivate), so we could use that if present to increase the probability of a match (e.g. we could confirm the codecs).
Assignee | ||
Comment 16•10 years ago
|
||
Could you upload your test files somewhere so I can test? I've got a new version cooking up, and I've been searching for test files, but I've failed to find a comprehensive collection. This sniffing stuff is mainly here to "probe" the file for a type, not to discard every possible invalid file. I believe it is acceptable to check just the EBML id and the DocType, which would effectively mean "I'm a matroska container, and I say that I am a WebM file". The file not being valid (because it is not vorbis/vp8) does not matter too much at that point, and our subsequent error message ("Video can't be played because the file is corrupt.") would be adequate. On the other hand, discarding a valid WebM file would be bad.
Comment 17•10 years ago
|
||
(In reply to Paul Adenot (:padenot) from comment #16) > Could you upload your test files somewhere so I can test? I've got a new > version cooking up, and I've been searching for test files, but I've failed > to find a comprehensive collection. It's not a curated collection, just random stuff I've saved while working on bugs and testing. I didn't want to upload the entire set, but I've put the first 16kB of each file up here: http://flim.org/~kinetik/webm-samples.tar.xz Hopefully that's useful enough.
Assignee | ||
Comment 18•10 years ago
|
||
So, this patch works with every file I tested (two files from the archive you uploaded have a matroska doctype, "23.webm" and "34.webm", and are correctly sniffed as non-WebM).
Attachment #714334 -
Flags: review?(kinetik)
Assignee | ||
Updated•10 years ago
|
Attachment #713077 -
Attachment is obsolete: true
Comment 19•10 years ago
|
||
Comment on attachment 714334 [details] [diff] [review] Add an API to nestegg to sniff for webm. Review of attachment 714334 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/libnestegg/src/nestegg.c @@ +2095,5 @@ > +buffer_read(void * buffer, size_t length, void * user_data) > +{ > + struct sniff_buffer * sb = (struct sniff_buffer *) user_data; > + > + if (sb->length - sb->offset < length ) { Remove the space between length and ).
Attachment #714334 -
Flags: review?(kinetik) → review+
Comment 20•10 years ago
|
||
I fixed a couple of warnings in github, so please make sure you update to 40353d16d3345263cb839cae5603f50f6b394885 when you check this in.
Assignee | ||
Comment 21•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d873401eb7b https://hg.mozilla.org/integration/mozilla-inbound/rev/a1534b828aa7 https://hg.mozilla.org/integration/mozilla-inbound/rev/b143a7fb7f39 (I updated the mozilla-central copy of libnestegg to github's HEAD).
Assignee | ||
Comment 22•10 years ago
|
||
Red on Windows because I forgot to add the new symbol to the symbol list, pushed a followup. https://hg.mozilla.org/integration/mozilla-inbound/rev/ca980a7779d4
Comment 23•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/5d873401eb7b https://hg.mozilla.org/mozilla-central/rev/a1534b828aa7 https://hg.mozilla.org/mozilla-central/rev/b143a7fb7f39 https://hg.mozilla.org/mozilla-central/rev/ca980a7779d4
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Reporter | ||
Comment 24•10 years ago
|
||
Thank you for fixing this for remote files. The behavior for links to MKV files from a local HTML index hasn't changed though. At least it can be worked around with the aforementioned type="video/x-matroska" attribute for the links.
You need to log in
before you can comment on or make changes to this bug.
Description
•