Closed Bug 481529 Opened 11 years ago Closed 9 years ago

Support for Kate overlay streams in <video> tag

Categories

(Core :: Audio/Video, enhancement)

VERIFIED WONTFIX

People

(Reporter: ogg.k.ogg.k, Unassigned)

Attachments

(1 file, 11 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060214 Firefox/1.0.7
Build Identifier: N/A

Firefox recently added support for Ogg streams (Theora and Vorbis).

Subtitles, captions, and related overlay elements may be multiplexed
in Ogg alongside other codecs. A C codec library is available at
http://libkate.googlecode.com/ with a BSD licence. The library used
by Firefox to handle Ogg streams, liboggplay, has support for Kate
streams.

Display can be done in two ways: by using another library (libtiger,
a C rendering library based on Pango and Cairo) to render overlays
on top of the incoming video, or, if needed, by custom rendering in
Firefox from the raw decoded Kate data (e.g., generating HTML). Direct
rendering through libtiger is also supported by liboggplay (a recent
version is needed).



Reproducible: Always

Steps to Reproduce:
1. Run Firefox 3.1beta2, or latest repo version.
2. Try to display an Ogg video with a Kate stream through the <video> tag.

Actual Results:
Subtitles are not displayed.


Expected Results:  
It should be possible to select subtitles for display.


More info about Kate streams: http://wiki.xiph.org/index.php/OggKate

libkate codec library: http://libkate.googlecode.com/

libtiger rendering library: http://libtiger.googlecode.com/

A sample video with embedded Kate streams: http://people.xiph.org/~oggk/elephants_dream/elephantsdream-with-subtitles.ogg (51 MB)
Component: General → Video/Audio
Product: Firefox → Core
QA Contact: general → video.audio
Attached patch: Proof of concept patch (obsolete)
Attached is a proof of concept patch.
Most of the space is taken by adding libkate (the codec), libtiger (the renderer) and syncing to a newer liboggplay.
Whenever a Kate stream is found, the first one is automatically enabled.

All streams present in an Ogg stream may be queried to get their category and language, so a menu may eventually be built for selection of a particular stream.

A sample web page using <video> with a simple Kate stream for subtitles which works with this proof-of-concept patch may be found at:

http://people.xiph.org/~oggk/elephants_dream/elephantsdream-video.html

Note that Kate streams using PNG images will not display the PNG images as the Cairo shipping with Firefox doesn't build cairo-png.c (including it in the build doesn't even compile). Other complex Kate streams (eg, styling, positioning, motions, etc) should "just work".

Feedback on this preliminary proof of concept would be much appreciated.

Thanks
Here is an updated patch.

If a Kate stream is encountered, the first one will be played. Ultimately,
the list of streams and their languages/categories should be added to the
right click menu, but this patch just demonstrates the ability.

This patch is monolithic for convenience, but is maintained as a series
of patches in my git tree, so can be applied step by step.

Note that I had trouble tracking down a crash, which turned out to be
two versions of Cairo being used, with different layouts for cairo_state.
Upgrading my system Cairo to the same version FF uses fixed this. I am
not sure at all how to fix this in the Makefile system, however.

Comments and feedback would be most welcome.
Attachment #366702 - Attachment is obsolete: true
Hi,

setting as a request for 1.9.2, mostly so I can get feedback on whether
this will be considered, and feedback on the basic idea before I go any
further (eg, menus with the list of streams/languages in a video to
select one track).

Thanks
Flags: wanted1.9.2?
Attached patch: Updated to keep it applying (obsolete)
Updated to apply cleanly again.
Feedback welcome :)
Attachment #380251 - Attachment is obsolete: true
Applies to the newest tree (mostly; one file, the Ogg decoder, moved to another directory).
Feedback still most welcome :D
Thanks
Attachment #384289 - Attachment is obsolete: true
Can the dependency on Pango be removed? We're looking at removing the use of liboggplay and using the lower level Ogg libraries. How much does this affect this patch?
It is possible, and would just prevent the use of the rendering library as is, which is the part that relies on Pango (though I wanted to see medium term if I could depend on just Harfbuzz).

The patch is (in my git tree) split into several separate patches, and the first
few only add libkate and plug it to the Ogg decoder, which receives raw UTF-8 text.
As a proof of concept, I just printed that text on the console, but it could be sent to the HTML pipeline (I haven't looked at how easy/hard this would be).

The second set of patches adds rendering, and could then be omitted. One would then either have to write a renderer without Pango but using Firefox's HTML renderer, or adapt the rendering lib to use just Harfbuzz internally, or just disregard part of the formatting. Which option is best remains to be determined.

As a side note, it would also mean that a stream would render differently from other libtiger based renderers (eg, VLC, or (soon) GStreamer), though it could be integrated into the DOM.
Flags: wanted1.9.2? → wanted1.9.2-
Following a quick mail exchange with roc, he said it might be a good
idea to explain in a few words what Kate is good for, so here is a
quick overview of the codec, its capabilities, and the associated libs:

Kate is an overlay codec, which can carry text and images, as well as animate
properties of those. In particular, Kate streams may be embedded in Ogg for
things like captions, subtitles, lyrics, etc.
The basic use of a Kate stream is simple subtitles, whether text or image based
(the latter being mostly for DVD style subtitles).
More complex uses would be karaoke, where properties of text are animated
(eg, a style/color change cutoff pointer is moved with time, text position is
moved across the screen, etc).

There are two libraries I've been working on:
- libkate is the codec library, encoding and decoding Kate streams. No
  dependencies (libogg is optional for an ogg oriented layer).
- libtiger is a rendering library, which uses Pango/Cairo to render streams
  that libkate decodes.

If only text is required, one could decode a Kate stream using libkate,
receive UTF-8 text without needing libtiger, and do one's own rendering.

There is a Java decoder and renderer pair too (used in newer versions of
Cortado), which support only text and static images (that is, the main uses
of text and image subtitles).

So, at the simplest, it's text or image based subtitles. Mux several languages
along with Theora/Vorbis, and select at runtime which one, if any, you want,
among the languages and categories.
Next release will add fine grained metadata (while keeping forward/backward
compatibility).

Any questions, feel free to ask.
I have opened bug 515898 where I have attached a patch that implements user interface for subtitles. Please test it and give me some feedback. It would be nice if we could mix that patch with the kate streams one.

I've been talking to Silvia Pfeiffer about a suggestion to add subtitling markup support to html5: itext nodes as children of the video element would describe various sources of subtitling/captioning content. If we use the ui implemented in the patch from bug 515898 to select ogg embedded subtitles, then later the same ui can be used to handle the itext-referenced subtitles (in case it gets accepted by whatwg). It would be simply a matter of calling this.addSubtitles on DOMNodeInserted events. 

Also, it can be used by addons such as the one that I am implementing right now (from which I refactored some portions into that patch).
This UI patch works fine here. Only thing is that there is no 'none' entry to disable subtitles, but that should be easy to add.

As for using this menu to select embedded subtitles, I don't see an easy way to tell the Ogg code about the selection, or to retrieve the list of available languages. I'll dig a bit more to see how those can be made to talk with each other.
I should have brought this up earlier, sorry ... but we're probably going to stop using liboggplay on trunk at some point, for various reasons but mainly because it adds a lot of complexity we don't need.
I mentioned the possibility of removing liboggplay in comment 6 along with the pango dependency issue.
Two proof of concept patches that link Felipe Corrêa da Silva Sanches' patch to mine:
- expose the list of subtitle languages from a video
- allow selecting one of those

There is still no 'none' option, but that would be easy (selecting index -1).

Applying the Kate patch, then Felipe's menu patch, then these two gives you
a Firefox where you can switch embedded subtitles from the menu.

Note that I'm not 100% sure I'm using strings correctly, the available string
classes (nsAutoString et al) are a bit confusing as to what should be used where (to me).

Also missing is the translation of language codes to actual user friendly strings (eg, en_GB -> English), and using the stream categories.
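
As a rough illustration of the missing language-code translation, here is a minimal sketch in Python (not the C++/JS of the patch). The lookup table and the `display_name` helper are hypothetical; a real build would presumably use the platform's locale data rather than a hand-written table.

```python
# Hypothetical lookup table, for illustration only.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "de": "German", "ja": "Japanese"}


def display_name(code: str) -> str:
    """Return a readable name for a code such as 'en_GB' or 'fr-CA'."""
    # Keep only the primary language subtag, ignoring region and case.
    primary = code.replace("-", "_").split("_", 1)[0].lower()
    # Fall back to the raw code so unknown languages stay selectable.
    return LANGUAGE_NAMES.get(primary, code)


print(display_name("en_GB"))
```

Falling back to the raw code means a stream in an unlisted language still gets a menu entry.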
Sounds like you forgot to actually attach the patches you mention.
oops, patches now attached.
I'd like to point out 2 issues in the patches you sent:

* non-standard DOM extension
* proper abstraction for multiple sources of subtitles would be good

--------
non-standard DOM extension

These extra DOM attributes you introduced are non-standard. I'd like to hear from the mozilla experts about the ideal procedures in this case. Do we implement non standard stuff and submit it to standardization processes? Or do we have some other way to code it that does not involve extending DOM?
 
--------
multiple sources of subtitles

My initial patch provided an addSubtitles method responsible for adding menu items to the user interface. Once one of the items is clicked, the parse_content function would be called to detect which kind of subtitles we are trying to display and delegate to the appropriate subtitle renderer.

Ideally, it should do one of these:
* parse an SRT file and render it through a XUL Label overlay;
* invoke the internal Kate renderer;
* parse and render some other subtitles syntax. (I hope we can figure out a standard syntax for that)

It is good to maintain this abstraction so that it is easier to incorporate new subtitles sources later. Perhaps the itext nodes with subtitling info that Silvia has drafted to be incorporated in html5.
(In reply to comment #17)
> * non-standard DOM extension

ogg.k.ogg.k asked me what to do about this on irc. I advised them to come up with an API that they think is suitable and we can look at what needs to be done later.

This is an experiment in how to expose subtitles and we can change it to whatever is needed (eg. appending 'moz' to the extended API) before landing.
I've been working on standardising the API for accessibility issues for <video> and <audio>. The draft specification is here: https://wiki.mozilla.org/Accessibility/HTML5_captions. I think the DOM in use with libkate should be the same as the one we use with out-of-band subtitle files, so I will have to analyse the interface that ogg.k.ogg.k's patch defined and see how we can harmonise these.

The intention is to eventually include the API that we define into the standard, once all browser vendors can agree on it. There is going to be a video accessibility workshop with the next W3C meeting in November, so if we can show some of this stuff working smoothly, we may be able to get agreement on the API that we define.

(In reply to comment #18)
> (In reply to comment #17)
> > * non-standard DOM extension
> 
> ogg.k.ogg.k asked me what to do about this on irc. I advised them to come up
> with an API that they think is suitable and we can look at what needs to be
> done later.
> 
> This is an experiment in how to expose subtitles and we can change it to
> whatever is needed (eg. appending 'moz' to the extended API) before landing.
Are you looking at using liboggz directly? What functionality in liboggplay is it that you don't need? Is it the rendering bits? Just curious...

(In reply to comment #6)
> Can the dependency on Pango be removed? We're looking at removing the use of
> liboggplay and using the lower level Ogg libraries. How much does this affect
> this patch?
> non-standard DOM extension

I've added a minimal one, waiting for comments. Chris Double mentioned an array instead of a string to be parsed for the list of languages; this might be better. In any case, there's not much point in coding details that would become irrelevant if the API changes, so I'll wait for comments on those before going much further. In particular, the 'category' is left out, as it's more Kate specific, though also used in itext.

> multiple sources of subtitles

It should be easy to add a "multiplexed" existing SRT parser, by storing a
function instead of an index in the 'track' attribute for the menu item. The
function would presumably do whatever it needs to select subtitles for any
particular format.
Silvia: I think you've written a Javascript SRT parser, maybe you can merge it in ?
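
To make the "function instead of an index" idea concrete, here is a small sketch in Python rather than the JS of the menu patch. All names (`MenuItem`, `embedded_track`, `external_srt`, `build_menu`) are made up for illustration, and the callables only return a string where real code would talk to the Ogg decoder or fetch an SRT file.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MenuItem:
    label: str
    select: Callable[[], str]  # stored in place of a numeric track index


def embedded_track(index: int) -> Callable[[], str]:
    # Stand-in for asking the Ogg decoder to enable an embedded stream.
    return lambda: f"embedded:{index}"


def external_srt(url: str) -> Callable[[], str]:
    # Stand-in for fetching and parsing an external SRT file.
    return lambda: f"srt:{url}"


def build_menu(languages: List[str], srt_urls: List[str]) -> List[MenuItem]:
    # A 'None' entry comes for free: it is just another callable.
    items = [MenuItem("None", lambda: "off")]
    items += [MenuItem(lang, embedded_track(i)) for i, lang in enumerate(languages)]
    items += [MenuItem(url, external_srt(url)) for url in srt_urls]
    return items


menu = build_menu(["en_GB", "fr"], ["subs.srt"])
print([item.select() for item in menu])
```

The menu code then never needs to know which format it is selecting; it only calls `select()` on the clicked item.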

About liboggplay:
If/when it is removed, I'll just update my patch, it doesn't absolutely need liboggplay, nor the Pango/Cairo rendering part if we're just going to get the UTF-8 text and give it to Javascript (in which case it'd use the same code that Felipe's patch used for the test subtitles).
(In reply to comment #21)
> > multiple sources of subtitles
> 
> This should be easy to add a "multiplexed" existing SRT parser, by storing a
> function instead of an index in the 'track' attribute for the menu item. The
> function would presumably do whatever it needs to select subtitles for any
> particular format.
> Silvia: I think you've written a Javascript SRT parser, maybe you can merge it
> in ?

It's at http://svn.annodex.net/itext/javascript/srt.js .

Haven't had the time yet to check out how it all fits together and whether the API fits with what I defined in https://wiki.mozilla.org/Accessibility/HTML5_captions . But I am most excited about all this development!
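
For readers unfamiliar with SRT, the following is a minimal parsing sketch in Python. It is not the srt.js linked above, just an illustration of the format (a counter line, a timing line, then text, with blank lines between cues); `parse_srt` and its helpers are hypothetical names.

```python
import re
from typing import List, Tuple

# Timing line, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(data: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) tuples, with times in seconds."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text.
                text = "\n".join(lines[i + 1:])
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

Blocks without a recognisable timing line are skipped rather than treated as errors, which is an assumption on my part about how lenient a parser should be.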
This patch makes internal and external subtitles work in the same way.
Both are added to the selection menu.
Rendering through libtiger is disabled, and text is sent via a DOMString to JS.
Parenthetically, this means time-overlapping events are not supported - only one at a time, and any styling/positioning is also discarded.
To be applied after the others.
Still proof of concept, unpolished, of course.
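
To show what "only one at a time" means in practice, here is a tiny Python sketch. The rule used here (the most recently started event wins) is an assumption for illustration, and `active_text` is a hypothetical helper, not a function from the patch.

```python
from typing import List, Optional, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)


def active_text(cues: List[Cue], now: float) -> Optional[str]:
    """Return the text of the most recently started cue covering `now`."""
    current = None
    for start, end, text in sorted(cues):
        if start <= now < end:
            current = text  # a later-starting cue replaces an earlier one
    return current
```

With cues `(0.0, 5.0, "first")` and `(3.0, 6.0, "second")`, a query at 4.0 yields only "second", whereas a renderer such as libtiger could show both at once.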
Attached patch: updated to latest tree (obsolete)
Attachment #386353 - Attachment is obsolete: true
Attached patch: Merged patch - kate + menus (obsolete)
As a convenience, this all-in-one patch contains the kate stream support, Felipe's subtitles menu patch, and the glue needed to connect the two.
(In reply to comment #23)
> Created an attachment (id=401207) [details]
> merge internal and external subtitles
> 
> This patch makes internal and external subtitles work in the same way.
> Both are added to the selection menu.
> Rendering through libtiger is disabled, and text is sent via a DOMString to JS.
> Parenthetically, this means time-overlapping events are not supported - only
> one at a time, and any styling/positioning is also discarded.
> To be applied after the others.
> Still proof of concept, unpolished, of course.

Patch applied nicely. However, to compile it, I had to

#undef HAVE_TIGER

in media/liboggplay/src/liboggplay/config.h

and add Felipe's user interface icon from bug 515898 to the /toolkit/themes/winstripe/global/media directory.

With those changes, it compiles nicely.
Indeed, I found it as a change to the build system that I needed to match in my new makefiles. On my machine, it was finding an installed copy of tiger.h and using it instead. The all-in-one patch is now updated.
Attachment #401712 - Attachment is obsolete: true
This was done by the render part of the patch, now unused.
Attachment #401769 - Attachment is obsolete: true
Today I have implemented an XBL binding that renders annotation balloons just like those used in youtube.

You can see a demo of it here:
http://bighead.poli.usp.br/~juca/code/svg/xbl/annotations/ann_example.xul

It uses SVG to render the balloon. This demo does not parse an annotation script. It just renders an example balloon and follows the mouse cursor. But mixing it with our current patches would not be hard.

Something similar could be done in order to render Kate streams perhaps.
Status: UNCONFIRMED → NEW
Ever confirmed: true
With the new video backend in, I've updated the patch for it.

It's pretty pointless doing more than a proof of concept to elicit comments now, so this merely decodes the first Kate stream and prints the incoming text and timing on stdout as it's decoded.
Attachment #400617 - Attachment is obsolete: true
Attachment #400618 - Attachment is obsolete: true
Attachment #401207 - Attachment is obsolete: true
Attachment #401711 - Attachment is obsolete: true
Attachment #401901 - Attachment is obsolete: true
With the arrival of WebM there are no plans to support additional codecs within the Ogg container. Thanks for the efforts that you've put into the original patch. For subtitle support we'll most likely be looking at implementing WebVTT.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Status: RESOLVED → VERIFIED
That's unfortunate - I was hoping that we could encapsulate WebVTT into Ogg through the use of Kate and then support in-band captions/subtitles/etc from within Ogg. This patch would provide almost all that is required for that already. It would need to be updated for the current codebase though.
If I understand you correctly, wouldn't taking that approach only provide WebVTT support for the Ogg backend? That would mean that if we used Kate, we'd effectively be carrying two WebVTT implementations - Kate's for Ogg support and ours for other backends.
How do you plan to include WebVTT into Ogg? Are you planning on making your own mapping? Or are you planning on working with the Xiph community to put WebVTT into Ogg? Are you already working on an implementation for how to encapsulate WebVTT in Ogg?
My questions about WebVTT in Ogg via Kate were about the presentation of the subtitles rather than how the WebVTT data is embedded in the file itself.

Any cross-backend implementation in the browser is likely to use whatever means the container libraries provide to extract the WebVTT data and then have the browser use that data to render. That is, I wouldn't use Kate to render the subtitles and return the burnt-in video+subtitle image to us. Possibly I misunderstood what you were suggesting in comment 32.

I haven't got any plans to work out how to encapsulate WebVTT in Ogg. I would defer to people with more Ogg experience than me for that.
Oh, there is a misunderstanding. What I referred to was Kate as a mapping of text tracks into Ogg. Think of it like Theora as the codec for video - Kate is the codec for text.

There is also a library called libkate, which together with a library called libtiger does the rendering. But you don't need to use libtiger for rendering - in fact, in a browser environment, libtiger makes no sense (and if I understand correctly, the patch did not use libtiger). When adapted, you could use libkate to extract the WebVTT data, which is what the attached patch did IIUC. But if you are unhappy about that patch, then of course you can also do your own codebase.