Bug 629350 (webvtt)

[meta] Tracking bug for WebVTT implementation

NEW
enhancement
P2
normal
9 years ago
24 days ago

People

(Reporter: anti-stress, Assigned: alwu)

Tracking

(Depends on 18 bugs, Blocks 3 bugs, 7 keywords)

Bug Flags:
sec-review +

Firefox Tracking Flags

(Not tracked)

Attachments

(2 attachments)

User-Agent:       Mozilla/5.0 (Windows NT 5.1; rv:2.0b10) Gecko/20100101 Firefox/4.0b10
Build Identifier: 

The track element (part of HTML5) makes it possible, in particular, to subtitle videos. This is quite important, especially for non-English speakers, since a lot of videos on the Web are in English.
As a French speaker I'm particularly interested in subtitling web videos to spread their messages.
At present the best option is to use JavaScript (Universal subtitles widget, JQuery-srt...), which is not optimal since Planet websites and RSS feed readers don't allow JavaScript.
Being able to use an HTML element would be a great improvement for i18n and a11y.
The track HTML5 element seems perfect for that.
Thanks

http://blog.gingertech.net/2010/10/02/state-of-media-accessibility-in-html5/
http://www.w3.org/TR/html5/video.html#the-track-element


Reproducible: Always

Steps to Reproduce:
Use the track HTML5 element in a web page
Actual Results:  
track element is not taken into account

Expected Results:  
it should allow subtitles (in particular) to be displayed without the need for JavaScript
Couldn't find an existing bug on this, so confirming.
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: unspecified → Trunk
seems to be a dup of Bug 620664
Bug 620664 was about adding support for the element to the HTML5 parser.  This bug is about implementing the element's functionality.
Depends on: 620664
Thanks.
Could we expect preliminary support for WebVTT files in Firefox this year (Firefox 6 or 7), now that Bug 620664 has done its part?

That would help open video on the web succeed. As a French user, and considering that a lot of videos are in English, I would really like this feature to be implemented ASAP. This is about adding a new capability to video on the web, which seems more important to me than only making existing things faster (which is also great :-)
Blocks: 663647
Blocks: html
Assignee: nobody → giles
I'm glad to see that someone (thank you, Ralph Giles!) picked up this bug.
I would love to help but I don't have the skills, sorry :-/

I don't know if it helps, but WebKit has made some progress there:
https://bugs.webkit.org/show_bug.cgi?id=62882
https://bugs.webkit.org/show_bug.cgi?id=62881
Blocks: 690737
Ralph, are you going to take care of accessibility in this bug, or should I file a new one for this?
Whiteboard: [parity-ie]
Hi, is there a roadmap for implementing this feature?
Right. Roadmap.

I'm slowly implementing this feature. There's a branch on github if you want to contribute patches, but I don't quite have something useful yet. The plan:

* Write a toy parser, just enough for the first demo
  (done, https://github.com/rillian/webrtc)
* Implement the HTML5 track element in the parser
  (already done by hsivonen in bug 620664)
* Implement the track element as an XPCOM interface
  (partially done. Patches on https://github.com/rillian/firefox/commits/webvtt)
* Add an anonymous content div to nsVideoFrame for caption display
  (done, patches on the same branch)

- Hook the parser up to a texttrack decoder and display captions
  (not done; this will be the first demo)
- Rewrite the parser library
  (will require security review)
- Probably rewrite the patch set in response to review comments
- Attempt to land basic webvtt caption support
- Support text track enable/disable and language preference matching in the default controls
- Support for rendering instructions, javascript interfaces, accessibility hooks
- Support text tracks encapsulated in Ogg and WebM media files
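
For readers new to the format, here is a minimal sketch of what a "toy parser" in the sense of the first step could look like. It is not the code from the linked repository: `parseTimestamp` and `parseToyVtt` are hypothetical names, and the sketch assumes it only needs the `WEBVTT` signature, optional cue identifiers, timing lines and payload text. Cue settings, NOTE/STYLE/REGION blocks and most error recovery are ignored.

```javascript
// Toy WebVTT parser sketch (illustrative only; not the code from the linked repo).
// Handles: the "WEBVTT" signature, optional cue identifiers, timing lines, payload text.
// Ignores: cue settings, NOTE/STYLE/REGION blocks, and most error recovery.

// Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds; returns NaN on malformed input.
function parseTimestamp(text) {
  const m = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(text.trim());
  if (!m) return NaN;
  const hours = m[1] ? Number(m[1]) : 0;
  return hours * 3600 + Number(m[2]) * 60 + Number(m[3]) + Number(m[4]) / 1000;
}

function parseToyVtt(source) {
  // Normalize line endings, then split into blocks separated by blank lines.
  const blocks = source.replace(/\r\n?/g, "\n").split(/\n{2,}/);
  const header = blocks.shift() || "";
  if (!/^\uFEFF?WEBVTT(?:[ \t\n]|$)/.test(header)) {
    throw new Error("missing WEBVTT signature");
  }
  const cues = [];
  for (const block of blocks) {
    const lines = block.split("\n").filter((line) => line.length > 0);
    // The timing line is the first line containing "-->"; a line before it is the cue id.
    const timingIndex = lines.findIndex((line) => line.includes("-->"));
    if (timingIndex === -1) continue; // not a cue block (e.g. NOTE); skip it
    const [startText, rest] = lines[timingIndex].split("-->");
    const endText = rest.trim().split(/[ \t]+/)[0]; // drop any cue settings
    const start = parseTimestamp(startText);
    const end = parseTimestamp(endText);
    if (Number.isNaN(start) || Number.isNaN(end)) continue; // skip malformed cues
    cues.push({
      id: timingIndex > 0 ? lines[timingIndex - 1] : "",
      start,
      end,
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```

Calling `parseToyVtt(fileText)` returns an array of `{ id, start, end, text }` objects with times in seconds. That is roughly the amount of information a first caption overlay demo needs, and it also shows why a production parser (the "rewrite" step) is a much larger job.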
Indeed, I did. Thanks for the correction.

I had seen annevk's validator. Very cool!
Demo patch. This isn't in any shape for use, it's just me figuring out how things work.

It does show (short!) webvtt overlays. I've been testing with https://people.xiph.org/~giles/2012/sample.html
(In reply to alexander :surkov from comment #6)
> Ralph, are you going to take care of accessibility in this bug, or should
> I file a new one for this?

I'd like to support accessibility as I go along, but I need help with what to do. I tried the demo patch with a screen reader, but it didn't seem to see the captions. Is there an aria-role I can supply to make them visible? Some nsFrame attribute?
With WebVTT cues, if they were rendered into the normal DOM, I'd suggest adding an aria-live attribute. That would get the screen reader to read the text out as it appears on screen (thus solving accessibility for type=description at the same time as for other types). Since I assume you are rendering the text into the shadow DOM, you will have to figure out whether you can make the elements in the shadow DOM accessible.

And, btw, the video controls are not accessible either - they would need a @tabindex to be reachable by keyboard and then roles on them such as "button" and @label or @aria-label to provide a short announcement text.
(In reply to Silvia Pfeiffer from comment #14)
> Since I assume you
> are rendering the text into the shadow DOM, you will have to figure out
> whether you can make the elements in the shadow DOM accessible.

Accessibility should pick it up, since nsVideoFrame::AppendAnonymousContentTo is in place. You can check the accessible with DOM Inspector (Accessible Tree view). So all you need to do is put an aria-live attribute on that anonymous div.

Silvia, does aria-live="assertive" sound reasonable?

It'd also be great if you could add a11y mochitests:
1) fix tree/test_media.html - http://mxr.mozilla.org/mozilla-central/source/accessible/tests/mochitest/tree/test_media.html?force=1
2) add elm/test_media_track.html to check that show/hide events are fired for changed captions and that the container-live object attribute is exposed on event targets.

please let me know if you need more details

(In reply to Silvia Pfeiffer from comment #14)
> And, btw, the video controls are not accessible either - they would need a
> @tabindex to be reachable by keyboard and then roles on them such as
> "button" and @label or @aria-label to provide a short announcement text.

Well, they aren't reachable by tabbing and we have a bug for that, but it sounds like a different issue, no?
(In reply to alexander :surkov from comment #15)
> 
> Silvia, does aria-live="assertive" sound reasonable?

Absolutely. You don't want "polite" because then you might miss some text.
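
For page authors rendering captions into ordinary DOM content, the suggestion above could be sketched as follows. This is only an illustration of the attribute being discussed, not the Gecko patch (which works on an anonymous div in C++). `markAsLiveCaptionRegion` and `showCue` are hypothetical names, and `container` is assumed to be a DOM element (anything with `setAttribute` and `textContent`).

```javascript
// Sketch: mark a caption container as an assertive ARIA live region so that
// assistive technology announces cue text as it changes.
// Both function names are hypothetical; `container` is assumed to be a DOM element.
function markAsLiveCaptionRegion(container) {
  container.setAttribute("aria-live", "assertive");
  return container;
}

function showCue(container, cueText) {
  // Changing the content of a live region is what prompts the announcement.
  container.textContent = cueText;
}
```

In a page this might be used as `const region = markAsLiveCaptionRegion(document.createElement("div"));` followed by `showCue(region, cue.text)` whenever the active cue changes.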


> (In reply to Silvia Pfeiffer from comment #14)
> > And, btw, the video controls are not accessible either - they would need a
> > @tabindex to be reachable by keyboard and then roles on them such as
> > "button" and @label or @aria-label to provide a short announcement text.
> 
> well, they aren't reachable by tabbing and we have a bug for that but it
> sounds like a different issue, no?

Fair enough. :-)
I notice there hasn't been much work on this lately.  I'm rather interested in the metadata "kind" of track.  Might it be simpler to implement that first (before subtitles, etc.), and work on the rest later?
I don't think so. I mean, the actual rendering is a separate piece, but most of the work to be done between here and there is writing a better parser.

I'm not working on this at the moment though, so if you're interested in continuing the work in the current patch, feel free.
I'm happy to offer guidance if someone else wants to work on this in the meantime.
Whiteboard: [parity-ie] → [parity-ie] [mentor=rillian] [lang=c++]
Ralph do you have any bite sized pieces that could get reviewed and landed? (Prefed or #ifdef'd off I guess?)
Whiteboard: [parity-ie] [mentor=rillian] [lang=c++] → [parity-ie-chrome] [mentor=rillian] [lang=c++]
The plumbing to the video element and the overlay div for displaying the captions should be ready for review and can land without needing a pref since there's no way to feed them from web content.

The next pieces that need work are a non-toy webvtt parser, and the TextTrack dom interface, which IIRC needs some cleanup before it's ready for review.
Whiteboard: [parity-ie-chrome] [mentor=rillian] [lang=c++] → [parity-ie] [parity-chrome] [mentor=rillian] [lang=c++]
BTW, a class at Seneca College is working on this bug during the current semester.
OK, we want to get this going again, and :humphd's Seneca College class is going to help.

First step is to get this patch up to date. Several class and file names have changed, so it isn't going to apply cleanly.

Then, we need to hide it behind a pref, split it into logical pieces, get them reviewed by the appropriate folks, and landed.

The current patch assumes you've checked out the webvtt parser into media/webvtt, but that's not going to work for in-tree code. One of the questions for review-time is how we should resolve that. Probably import the current release in a separate bug, then rely on the runtime pref to block access until we're more confident in the implementation.

We can land the display part separately, though, since it doesn't depend on the parser.
> Then, we need to hide it behind a pref, split it into logical pieces, get
> them reviewed by the appropriate folks, and landed.

I'd be interested to hear more about how you want to split it up.
The three obvious pieces are: the nsVideoFrame changes to add the display div, the import and build support for the parser library in media/webvtt, and the WebVTTDecoder (would be better as TextDecoder?) implementation in content/media/webvtt.

The TextTrack, TextTrackCue, etc. stubs should be rewritten to use the new webidl compiler.
I started converting the IDL in Ralph's patch to use our new webidl bindings, and there is an issue with TextTrackCue as currently defined, which uses a union of primitive types for the line attribute.  I talked to bz and this is not allowed per the WebIDL spec.  I filed https://www.w3.org/Bugs/Public/show_bug.cgi?id=20651.

Given:

enum AutoKeyword { "auto" };

[Constructor(double startTime, double endTime, DOMString text)]
interface TextTrackCue : EventTarget {
...
           attribute (long or AutoKeyword) line;



Traceback (most recent call last):
  File "/Users/dave/Sites/repos/mozilla-central/config/pythonpath.py", line 56, in <module>
    main(sys.argv[1:])
  File "/Users/dave/Sites/repos/mozilla-central/config/pythonpath.py", line 48, in main
    execfile(script, frozenglobals)
  File "/Users/dave/Sites/repos/mozilla-central/dom/bindings/GlobalGen.py", line 78, in <module>
    main()
  File "/Users/dave/Sites/repos/mozilla-central/dom/bindings/GlobalGen.py", line 56, in main
    parserResults = parser.finish()
  File "/Users/dave/Sites/repos/mozilla-central/dom/bindings/parser/WebIDL.py", line 4148, in finish
    production.finish(self.globalScope())
  File "/Users/dave/Sites/repos/mozilla-central/dom/bindings/parser/WebIDL.py", line 552, in finish
    member.finish(scope)
  File "/Users/dave/Sites/repos/mozilla-central/dom/bindings/parser/WebIDL.py", line 2104, in finish
    t = self.type.complete(scope)
  File "/Users/dave/Sites/repos/mozilla-central/dom/bindings/parser/WebIDL.py", line 1376, in complete
    [self.location, t.location, u.location])
WebIDL.WebIDLError: error: Flat member types of a union should be distinguishable, Long is not distinguishable from AutoKeyword (Wrapper), TextTrackCue.webidl line 22:21
           attribute (long or AutoKeyword) line;
                     ^
<builtin type>

TextTrackCue.webidl line 22:30
           attribute (long or AutoKeyword) line;
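
For context, the `line` attribute in the IDL above is meant to accept either an integer or the keyword "auto". The sketch below only illustrates that value space from a script's point of view; it says nothing about how the bindings generator should resolve the union. `normalizeLineValue` is a hypothetical helper, and throwing on any other input is an assumption made for the example.

```javascript
// Illustrative only: the value space that `(long or AutoKeyword)` describes.
// `normalizeLineValue` is a hypothetical helper, not part of any binding code.
function normalizeLineValue(value) {
  if (value === "auto") return "auto"; // the single AutoKeyword enum value
  if (typeof value === "number" && Number.isFinite(value)) {
    return value | 0; // truncate toward zero and wrap to a signed 32-bit integer
  }
  throw new TypeError('line must be a finite number or "auto"');
}
```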
Canvas has the same union approach: http://www.whatwg.org/specs/web-apps/current-work/#2dcontext . Might be worth checking how that got resolved.
We have webidl for CanvasRenderingContext2D using the new bindings now, but it does all of its unions without using primitive types.  That's the issue here.  We just need that fixed.
(In reply to Ralph Giles (:rillian) from comment #25)
> The three obvious pieces are: the nsVideoFrame changes to add the display
> div, the import and build support for the parser library in media/webvtt,
> and the WebVTTDecoder (would be better as TextDecoder?) implementation in
> content/media/webvtt.

I don't think we need WebVTTDecoder to inherit from BuiltinDecoder (which has been renamed to MediaDecoder since this patch was first written).

MediaDecoder is designed to be the only decoder owned by the nsHTMLMediaElement, and includes a state machine and so on that assumes one contained audio and video track. We'd be better off having some kind of custom TextTrackDecoder object that manages the libwebvtt decoder and co-operates with the MediaDecoderStateMachine (or the nsHTMLMediaElement if we can do it at that level). Then we don't pull in all the unnecessary cruft that comes in with being a MediaDecoder subclass.
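
To make the separation concrete, here is a rough sketch of the kind of object meant above: something that only holds cues and answers "which cues are active at time t?", leaving playback state to whoever owns the clock. It is written in JavaScript purely for illustration (the real thing would be C++ in Gecko), and `ToyTextTrack`, `addCue` and `activeCuesAt` are hypothetical names.

```javascript
// Sketch of a text-track helper that is independent of the media decoder:
// it only stores cues and reports which ones are active at a given time.
// All names here are hypothetical.
class ToyTextTrack {
  constructor() {
    this.cues = [];
  }

  addCue(start, end, text) {
    this.cues.push({ start, end, text });
    // Keep cues ordered by start time, then by end time.
    this.cues.sort((a, b) => a.start - b.start || a.end - b.end);
  }

  // A cue is treated as active when start <= t < end (times in seconds).
  activeCuesAt(t) {
    return this.cues.filter((cue) => cue.start <= t && t < cue.end);
  }
}
```

The owner of the playback position (the state machine or the media element) would call `activeCuesAt(currentTime)` as time advances and hand the result to the display code, so nothing here needs to know about audio or video decoding.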
(In reply to Chris Pearce (:cpearce) from comment #29)
> We'd be better off having some kind of
> custom TextTrackDecoder object that manages the libwebvtt decoder and
> co-operates with the MediaDecoderStateMachine (or the nsHTMLMediaElement if
> we can do it at that level). Then we don't pull in all the unnecessary cruft
> that comes in with being a MediaDecoder subclass.
Is that for better combining of <audio> elements with webvtt also?
I think not inheriting from Builtin/MediaDecoder would make it easier and simpler to implement.

And if the TextTrackDecoder (or whatever we call it) co-operates with MediaDecoderStateMachine or nsHTMLMediaElement, we'll have support for both <video> and <audio> elements, since both nsHTMLVideoElement and nsHTMLAudioElement inherit from nsHTMLMediaElement.

Does that answer your question?
(In reply to Chris Pearce (:cpearce) from comment #31)
> Does that answer your question?
Yes, thank you. My interest is in the placement of subtitles in relation to the <audio> element with and without the "controls" attribute, but I think that's more for bug 515898
Depends on: 830879
I've rewritten Ralph's original patch (thanks for doing so much ground work) to use the new WebIDL bindings--I'm positive they aren't 100% correct yet, but this is hopefully not far off--among other things.  We'll fix it in post, as they say.  I just want to put this here for reference.  In order to build this patch, you have to also apply https://bug830879.bugzilla.mozilla.org/attachment.cgi?id=702428, see bug 830879.

At this point I want to hand the patch off to my students, so we'll be using this bug as a tracking bug, and filing smaller tickets in order to parallelize development.  We'll break bits of this patch out into those separate bugs.

NOTE: there is also work happening in https://github.com/mozilla/webvtt/pull/1 to get the libwebvtt C/C++ parser reviewed.
Depends on: 833386
Depends on: 833388
Depends on: 833385
Depends on: 833382
Depends on: 833403
Flags: sec-review?
:cdiehl - would peach be good for this? If so then :rforbes can work on it
[moved from 833403]
Flags: sec-review?(rforbes)
Flags: needinfo?(cdiehl)
Yes, I will mentor rforbes in fuzzing WebVTT.
Flags: needinfo?(cdiehl)
A note for us to follow-up on...

While rebasing today I hit an issue that required me to change the WebIDL for HTMLMediaElement for addTextTrack(), which is going to need a spec bug filed:

11:32 < humph> bz: the HTMLMediaElement has AddTextTrack( string, [optional] 
               string, [optional] string )
11:32 < humph> bz: which needs to call TextTrack() with strings
11:33 < bz> humph: that IDL is bogus
11:33 < bz> If the label argument was omitted, let label be the empty string.
11:33 < bz> If the language argument was omitted, let language be the empty string.
11:33 < bz> That's in the prose
11:33 < bz> should just be in the IDL and be done with it
11:33 < bz> So: TextTrack addTextTrack(DOMString kind, optional DOMString label = 
            "", optional DOMString language = "");
11:33 < bz> File spec bugs?
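
As an illustration of what bz's suggested signature means for callers: default parameter values in JavaScript behave much like the `optional DOMString label = ""` form, so omitted arguments arrive as empty strings without any prose rule. `ToyMediaElement` below is a hypothetical stand-in, not HTMLMediaElement, and the shape of the returned track object is an assumption.

```javascript
// Sketch: JS default parameters mirror the proposed IDL defaults
// (`optional DOMString label = ""`, `optional DOMString language = ""`).
// `ToyMediaElement` is a hypothetical stand-in for illustration only.
class ToyMediaElement {
  constructor() {
    this.textTracks = [];
  }

  addTextTrack(kind, label = "", language = "") {
    const track = { kind, label, language, cues: [] };
    this.textTracks.push(track);
    return track;
  }
}
```

With this shape, `new ToyMediaElement().addTextTrack("subtitles")` yields a track whose `label` and `language` are both the empty string.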
Yes, sure, file a bug: https://www.w3.org/Bugs/Public/enter_bug.cgi?product=HTML%20WG or https://www.w3.org/Bugs/Public/enter_bug.cgi?product=WHATWG or both. :-)

"optional" is still relatively new to IDL, so we're slowly bringing it into the HTML spec.
I'm a devotee of FOSS and working at Gallaudet University, the world's only accredited liberal arts university for deaf students (and we have an elementary school and secondary school for deaf students on the same campus), I'm planning to follow this thread pretty closely. I just started using WEBVTT with the misunderstanding that support was broader.  I'm not sure what I can do to help, other than test periodically...
(In reply to dc.loco from comment #38)
> I'm a devotee of FOSS and working at Gallaudet University

Hi there, thanks for your interest. Right now we most need coding and testing, so please do follow along if you're able to do either of those things!
Duping forward to the bug with patches...
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 833385
(In reply to :Ms2ger from comment #42)
> Duping forward to the bug with patches...
> 
> *** This bug has been marked as a duplicate of bug 833385 ***

Umm, shouldn't all the dependencies of this one be added to the other one as well, then? I did that for the html5 and html5test bugs, but it probably needs to be done for the rest as well.
We were using this as a tracking bug. Please leave it open.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Keywords: meta
Depends on: vtt-css-extensions
Depends on: 865401
Blocks: 865407
No longer blocks: 865407
Depends on: 865407
Alias: webvtt
I have developed a very simple track/texttrack polyfill (http://jsfiddle.net/trixta/QZJTM/) and also a script for styleable controls (https://github.com/aFarkas/jMediaelement).

As a web developer, my problems with the current track spec and implementations are the following:

1. We need a way to shrink the rectangle in which the cues are displayed using CSS.
The WebVTT positioning features are not suitable for all use cases. As soon as we develop custom styleable controls that are placed over the video element, we need a way to "reserve" space for those controls. Because this depends on the style of our web page and our media player, and not on the content of the VTT file, it has to be defined using CSS. For example:

::cuedisplay {
    top: 0;
    left: 0;
    right: 0;
    bottom: 40px; /* space is needed for overlaying custom styleable controls at the bottom of the video */
}

2. There is no 'trackmode' change event.
While we have a lot of special events for adding/removing tracks and cuechange/cueexit/cueenter, we do not have an event for the case where a user changes the mode of a track. Again, this is needed, for example, for custom styleable controls. If the user changes the mode using the context menu from showing/disabled to disabled/showing, we need an event so that we can update, for example, the visual state of our controls. For example:

track.addEventListener('modechange', function (e) {
    // 'modechange' is the proposed event; inside the listener, `this` is the track
    if (this.mode === 'showing') {
        // do something
    } else {
        // do something else
    }
});

Currently all custom styleable media players with track support do not rely on the native implementations and handle the "texttrack" display with script. This is a shame. :-(

I know this is mainly spec related, but you guys are bringing the web forward :-D
That's a great point, Alexander Farkas. It would be good for you to file a bug on http://dev.w3.org/html5/webvtt/ WRT this (and if possible, CC :rillian, :caitp, :reyre, and whoever else)
It might actually be better suited for http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#the-track-element, since that requirement would probably be desirable for any timed text format
Cool, I've cc-ed them to the WHATWG, so Ian can work on it, too (i.e. whoever gets to it faster).
Depends on: 875169
Depends on: 876505
Depends on: 879426
Depends on: 879431
Depends on: 880064
Depends on: 880094
Blocks: 880711
Depends on: 880851
Depends on: 881475
Depends on: 881976
Depends on: 881978
Depends on: 882131
Depends on: 882299
Depends on: 882535
Depends on: 882661
Depends on: 882700
Depends on: 882718
No longer depends on: 882299
No longer depends on: 881976
No longer depends on: 879431
No longer depends on: 867823
No longer depends on: 868519
Depends on: 882817
Depends on: 882915
Depends on: 883122
Depends on: 883843
Depends on: 884507
Depends on: 884879
Depends on: 886748
Depends on: 887463
Depends on: 887934
Depends on: 890051
Depends on: 891052
Depends on: 891381
See Also: → 891661
Depends on: 895091
Depends on: 909993
Depends on: 913016
Depends on: 917945
Depends on: 918289
Depends on: 920088
Depends on: 921484
Flags: sec-review?(rforbes) → sec-review?(cdiehl)
Depends on: 949642
Depends on: 949643
Depends on: 950049
Depends on: 952130
This is clearly your bug now, Rick. :-)
Assignee: giles → rick.eyre
Thanks Ralph :-).
Depends on: 960184
Flags: sec-review?(cdiehl) → sec-review+
Depends on: 969506
QA Contact: alexandra.lucinet
Depends on: 941701
Depends on: 974017
Depends on: 976580
Depends on: 977302
Depends on: 978163
Depends on: 981691
Depends on: 982183
Depends on: 983182
Depends on: 983207
Depends on: 985484
WebVTT is missing from Beta release notes (please see http://www.mozilla.org/en-US/firefox/29.0beta/releasenotes/), although it's enabled by default on Firefox 29 beta 1. Any thoughts?
relnote-firefox: --- → ?
Depends on: 992664
No longer depends on: 886748
WebVTT is going to be disabled for 29 and 30 (cf bug 981280).
Until we know which release it is going to ship in, I cannot update the tracking flags accordingly...
Rick, can you confirm, as suggested in bug 981280, that we are going to ship it for 31?
Thanks
Flags: needinfo?(rick.eyre)
Yep, we will be shipping in FF31. Finally! :)
Flags: needinfo?(rick.eyre)
Excellent! Added back to the release notes.
Mentor: giles
Whiteboard: [parity-ie] [parity-chrome] [mentor=rillian] [lang=c++] → [parity-ie] [parity-chrome] [lang=c++]
Depends on: 1028581
Depends on: 1035582
Depends on: 1059541
Component: Audio/Video → Audio/Video: Playback
Depends on: 1206304
Depends on: 1242599
Depends on: 1242594
Depends on: 1270122
Depends on: 1010707
Blocks: 1275492
Depends on: 1275808
Blocks: 1276129
No longer blocks: 1276129
Depends on: 1276129
Depends on: 1276130
Depends on: 1274884
Depends on: 1276830
Depends on: 1276831
Depends on: 1276832
Depends on: 1276833
Depends on: webvtt-wpt
Depends on: 1277437
Depends on: 1278164
Depends on: 1280644
Depends on: 1281418
Depends on: 1283417
Depends on: 1283803
Depends on: 1285897
Depends on: 1286497
Depends on: 1286751
No longer blocks: 1275492
Depends on: 1275492
Depends on: 1307710
Depends on: 1334112
Depends on: 1338030
Depends on: 1338031
Assignee: rick.eyre → nobody
Depends on: 1451747
Mass bug change to replace various 'parity' whiteboard flags with the new canonical keywords. (See bug 1443764 comment 13.)
Whiteboard: [parity-ie] [parity-chrome] [lang=c++] → [lang=c++]
Depends on: 1488673
Severity: enhancement → normal
Status: REOPENED → NEW
Priority: -- → P2
Summary: Implement the track element → [meta] Implement WebVTT
Depends on: 1527688
Severity: normal → enhancement
Depends on: 1528420
Depends on: 1509446
Depends on: 1527874

Use this bug as the single tracking bug for all WebVTT implementation work.

Mentor: giles
QA Contact: adalucin
Summary: [meta] Implement WebVTT → [meta] Tracking bug for WebVTT implementation
Whiteboard: [lang=c++]
Assignee: nobody → alwu
Depends on: 1531863
Depends on: 1534862
Depends on: 1534888
Depends on: 1534904
Depends on: 1535005
Depends on: 1535223
Depends on: 1536762
Depends on: 1544455
Depends on: 1545587
Depends on: 1548731
Depends on: 1548923
Depends on: 1550633
Depends on: 1551045
Depends on: 1551385
Depends on: 1552081
Depends on: 1555197
Depends on: 1555825
Depends on: 1555836
Depends on: 1555849
Depends on: 1556581
Depends on: 1453774
Depends on: 1161738
Depends on: 1557882
Depends on: 1464012
Depends on: 1305732
Depends on: 1541452
Depends on: 1562021
Depends on: 1562353