Closed Bug 778436 Opened 12 years ago Closed 6 years ago

WebM files with zero length EBML basic integer and float types are treated as corrupt

Categories

(Core :: Audio/Video: Playback, defect)

13 Branch
x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED INACTIVE

People

(Reporter: jochen.wilhelmy, Unassigned)

Details

Attachments

(4 files)

Attached video bug.webm
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0.1
Build ID: 20120614114901

Steps to reproduce:

Play the attached WebM video from the file system.
A TimeCode (0xe7) is zero and therefore has zero data length. This is OK according to the EBML standard. When the length is changed to one, it works.


Actual results:

Firefox says the file is corrupted.


Expected results:

The file should play (the video has 65 gray frames at a rate of 7.5 fps).
Attachment #646863 - Attachment mime type: application/octet-stream → video/webm
Component: Untriaged → Video/Audio
Product: Firefox → Core
(In reply to Jochen from comment #0)
> a TimeCode (0xe7) is zero and therefore has zero data length. this is ok
> according to ebml standard.

Can you please provide a reference to where this is defined?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Looking at the Matroska spec (http://matroska.org/technical/specs/index.html), Timecode is defined with an element type of unsigned integer and has no default value.  Unsigned integer is defined as big-endian, any size from 1 to 8 octets.

Having said that, mkvinfo can parse this file.
(In reply to Matthew Gregan [:kinetik] from comment #1)
> (In reply to Jochen from comment #0)
> > a TimeCode (0xe7) is zero and therefore has zero data length. this is ok
> > according to ebml standard.
> 
> Can you please provide a reference to where this is defined?
http://matroska.org/technical/specs/rfc/index.html
I used the generic EBML RFC, which says that values with length 0 are 0, or 0.0 for floats.
Thanks.  Timecode is an unsigned int rather than a float, but the RFC says zero length uints behave the same way.  So the Matroska spec and the EBML RFC define these basic types slightly differently; presumably that's an accident and one of them is in error.  I'll raise it on the mailing list.
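To make the RFC reading discussed above concrete, here is a minimal Python sketch of decoding EBML integer and float payloads in which a zero-length payload is accepted as 0 or 0.0. This is not nestegg's code; the function names `read_uint` and `read_float` are hypothetical, and the behavior follows the RFC as described in this bug rather than the Matroska spec's "1 to 8 octets" wording.

```python
import struct


def read_uint(payload: bytes) -> int:
    """Decode an EBML unsigned integer payload (big-endian, 0 to 8 octets)."""
    if len(payload) > 8:
        raise ValueError("EBML uint payload longer than 8 octets")
    # int.from_bytes(b"", "big") is 0, so an empty payload decodes to 0.
    return int.from_bytes(payload, "big")


def read_float(payload: bytes) -> float:
    """Decode an EBML float payload (0, 4 or 8 octets, big-endian IEEE 754)."""
    if len(payload) == 0:
        return 0.0  # zero-length float is read as 0.0
    if len(payload) == 4:
        return struct.unpack(">f", payload)[0]
    if len(payload) == 8:
        return struct.unpack(">d", payload)[0]
    raise ValueError("EBML float payload must be 0, 4 or 8 octets")
```

With this reading, a Timecode written with length zero and one written as a single 0x00 octet decode to the same value.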

Out of curiosity, what did you use to generate the test file?
Assignee: nobody → kinetik
Status: NEW → ASSIGNED
I used my own muxer to create the WebM file; the video stream itself is created by libvpx. Why my own muxer? It's a kind of experiment: I have created a "structure definition language" with ANTLR 3, and with it I can describe the structure of WebM. From this description, C++ structures, a writer and (soon) a reader are generated. This makes it really easy; lack of documentation is the most difficult part ;-). Do you know of example source code that muxes Vorbis into WebM?

For example, the structure definition for the EBML header looks like this:
ebml 0x1A45DFA3 : EBML =
{
	ebmlVersion 0x4286 : ubyte 1;
	ebmlReadVersion 0x42F7 : ubyte 1;
	ebmlMaxIDLength 0x42F2 : ubyte 1;
	ebmlMaxSizeLength 0x42F3 : ubyte 1;
	
	docType 0x4282 : string;
	docTypeVersion 0x4287 : ubyte 1;
	docTypeReadVersion 0x4285 : ubyte 1;
};

Before the colon is a list of names (C++ name, EBML "name"); after the colon is a list of types and modifiers, where types can be defined inline by = {...}.
The last number (the 1's) defines a fixed number of bytes to write and is only an optimization, so that at runtime the size of the EBML header is known except for the string length.
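As an illustration of what a writer generated from such a definition might emit, here is a minimal Python sketch that serializes the EBML header above. It is not the reporter's generator: the helper names are hypothetical, and the field values (1, 1, 4, 8, "webm", 2, 2) are assumed for the example. The element IDs are the ones from the definition.

```python
def encode_size(n: int) -> bytes:
    """Encode a data size as a 1-octet EBML vint (enough for sizes below 127)."""
    if not 0 <= n < 127:
        raise ValueError("this sketch only handles sizes below 127")
    return bytes([0x80 | n])


def element(element_id: int, payload: bytes) -> bytes:
    """Serialize one element: ID octets, size vint, then the payload."""
    id_bytes = element_id.to_bytes((element_id.bit_length() + 7) // 8, "big")
    return id_bytes + encode_size(len(payload)) + payload


def ebml_header(doc_type: str = "webm") -> bytes:
    """Build the EBML header; every integer field is a fixed single octet."""
    body = b"".join([
        element(0x4286, bytes([1])),             # ebmlVersion (assumed value)
        element(0x42F7, bytes([1])),             # ebmlReadVersion (assumed value)
        element(0x42F2, bytes([4])),             # ebmlMaxIDLength (assumed value)
        element(0x42F3, bytes([8])),             # ebmlMaxSizeLength (assumed value)
        element(0x4282, doc_type.encode("ascii")),  # docType
        element(0x4287, bytes([2])),             # docTypeVersion (assumed value)
        element(0x4285, bytes([2])),             # docTypeReadVersion (assumed value)
    ])
    return element(0x1A45DFA3, body)
```

Because each integer field occupies exactly one octet, the total header size in this sketch is 32 octets plus the length of the DocType string, which matches the point that only the string length is unknown in advance.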
Summary: webm file does not play → WebM files with zero length EBML basic types are treated as corrupt
Summary: WebM files with zero length EBML basic types are treated as corrupt → WebM files with zero length EBML basic integer types are treated as corrupt
Also raised on matroska-devel: http://lists.matroska.org/pipermail/matroska-devel/2012-August/004217.html
Summary: WebM files with zero length EBML basic integer types are treated as corrupt → WebM files with zero length EBML basic integer and float types are treated as corrupt
It doesn't seem to bother anyone. It seems that people are using the RFC for parsing and the SourceForge spec for writing, to be on the safe side. I would regard the RFC as the "most official" document.
I found another little inconsistency: the Matroska spec says that the default value of DocType is "matroska", but the EBML RFC has no default value for DocType. As I understand it, if a field has its default value when writing, you don't have to write the field at all. For example, if TimecodeScale is 1000000 you don't have to write it, and the reader assumes 1000000 if no TimecodeScale field is found. This would mean that if you want to write a "matroska" DocType, you don't have to write the DocType. But then a generic EBML reader can't figure out the DocType.

-jochen
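The default-value reasoning in the comment above can be sketched in Python as follows. This is only an illustration: `DEFAULTS` and `element_value` are hypothetical names, and the table deliberately gives DocType no default, as in the EBML RFC reading, so a missing DocType is an error instead of silently becoming "matroska".

```python
TIMECODE_SCALE_ID = 0x2AD7B1  # Matroska TimecodeScale element ID
DOC_TYPE_ID = 0x4282          # EBML DocType element ID

# Defaults a reader may assume when an element is absent from the file.
DEFAULTS = {TIMECODE_SCALE_ID: 1_000_000}


def element_value(parsed: dict, element_id: int):
    """Return the parsed value, falling back to the default if one exists."""
    if element_id in parsed:
        return parsed[element_id]
    if element_id in DEFAULTS:
        return DEFAULTS[element_id]
    raise KeyError(f"element 0x{element_id:X} is missing and has no default")
```

A file that omits TimecodeScale still yields 1000000 here, while a file that omits DocType cannot be identified by a generic reader.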
I added another file that does not get played by Firefox because the CodecPrivate field is empty.
(In reply to Jochen from comment #9)
> i found another little inconsistency: matroska spec says that the default
> value of DocType is "matroska", but the ebml rfc has no default value for
> DocType. I understand that if a field has the default value when writing you
> don't have to write the field at all. For example if TimecodeScale is
> 1000000 you don't have to write it and the reader assumes 1000000 if no
> TimecodeScale field is found. This would mean that if you want to write a
> "matroska" DocType then you don't have to write the DocType. But then a
> generic ebml reader can't figure out the DocType

The Matroska spec is wrong: there can't possibly be a default specified in the Matroska spec for any of the values in the EBML header.  This doesn't affect us, anyway, since the "webm" doctype is required.  Please report the spec issue to matroska-devel@lists.matroska.org.

(In reply to Jochen from comment #11)
> I added another file that does not get played by firefox because the
> CodecPrivate field is empty

The WebM container spec states that there is no CodecPrivate for V_VP8. Treating a present but zero sized CodecPrivate as non-present seems incorrect to me, and I don't believe a compliant muxer should produce files containing zero sized CodecPrivate elements for V_VP8.
Say there's an unknown chunk with zero length. Of course, now that I have found out by stepping through nestegg.c that the empty CodecPrivate is the problem, I will not generate it. But I think that for EBML compatibility nestegg.c should be able to parse empty chunks, because the RFC states BINARY = *BYTE and not BINARY = +BYTE.
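A minimal Python sketch of the tolerant behavior argued for above: a flat element reader that accepts payloads of zero length. It is not how nestegg.c is structured. The names `read_vint` and `read_elements` are hypothetical, and unknown-size elements and nested master elements are deliberately left out.

```python
def read_vint(buf: bytes, pos: int, keep_marker: bool):
    """Read one EBML variable-length integer; return (value, next position)."""
    if pos >= len(buf):
        raise ValueError("truncated vint")
    first = buf[pos]
    if first == 0:
        raise ValueError("vint longer than 8 octets is not supported")
    length = 9 - first.bit_length()  # leading zero bits + 1
    raw = buf[pos:pos + length]
    if len(raw) < length:
        raise ValueError("truncated vint")
    value = int.from_bytes(raw, "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1  # strip the length marker bit
    return value, pos + length


def read_elements(buf: bytes):
    """Parse a flat run of elements into (id, payload) pairs."""
    pos, out = 0, []
    while pos < len(buf):
        element_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        payload = buf[pos:pos + size]
        if len(payload) < size:
            raise ValueError("truncated payload")
        out.append((element_id, payload))  # size 0 gives b"", not an error
        pos += size
    return out
```

For example, the octets E7 80 (a zero-length Timecode) followed by 63 A2 80 (a zero-length CodecPrivate) parse into two elements with empty payloads. Whether an empty CodecPrivate should then be accepted for V_VP8 is a separate policy decision for the demuxer.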
Hi,

I have a WebM file that, similarly to this bug, does not play and instead displays the "video is corrupt" image; it is attached. It was created with GStreamer via MediaGoblin, and it plays fine in Chrome and Midori. The output from libwebm's sample utility is attached as well. If this is not the same issue (I couldn't be sure from reading the transcript), I'll create a new bug. The original file is at https://mediagoblin.telavivmakers.org/mgoblin_media/media_entries/16/P9300786-640p.webm

Alon
Thanks for the report.  That seems to be a separate problem.  I've filed bug 852002 and CCed you.
Assignee: kinetik → nobody
Component: Audio/Video → Audio/Video: Playback
Mass closing due to inactivity.
Feel free to re-open if still needed.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → INACTIVE