WebVTT parser is not able to deal with partial lines

RESOLVED FIXED in Firefox 60

Status

()

defect
P2
normal
RESOLVED FIXED
2 years ago
5 months ago

People

(Reporter: baku, Assigned: baku)

Tracking

58 Branch
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox60 fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

vtt.jsm assumes that parse() receives full lines with \r\n. But this is not true because this method is called by WebVTTParserWrapper.parse()

https://dxr.mozilla.org/mozilla-central/source/dom/media/webvtt/WebVTTParserWrapper.js#34

and from there WebVTTListener::OnDataAvailable()

https://dxr.mozilla.org/mozilla-central/source/dom/media/WebVTTListener.cpp#132

where ::ReadSegments() is used. Definitely this doesn't split the content by line.
Posted patch webtt.patch (obsolete) — Splinter Review
Note that this doesn't fix the webvtt parser. The parser doesn't follow the spec at all, but that is a different bug.

Here I wrote a parser able to deal with complete lines with a better state machine. It passes all the WPTs, but I also added a xpcshell test for partial lines.
Attachment #8947419 - Flags: review?(bugs)
Blocks: 1405974
Priority: -- → P2
Comment on attachment 8947419 [details] [diff] [review]
webtt.patch

rillian, I think it would be better if you reviewed this.
I'd need to review webvtt spec and the whole implementation before understanding the changes.
Attachment #8947419 - Flags: review?(bugs) → review?(giles)
I'm the typical reviewer, although I'm not a jsm expert. Note I won't be able to get to this until tomorrow.
Comment on attachment 8947419 [details] [diff] [review]
webtt.patch

Review of attachment 8947419 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for your patience. I think I follow the code changes now, and the idea is fine. Some comments though. Please address those and ask for review again, just so I can make sure I understand the changes.

::: dom/media/webvtt/tests/test_parser.js
@@ +7,5 @@
> +function str2ab(str) {
> +  var buf = new ArrayBuffer(str.length);
> +  var bufView = new Uint8Array(buf);
> +  for (var i=0, strLen=str.length; i < strLen; i++) {
> +    bufView[i] = str.charCodeAt(i);

Isn't this going to mangle anything which isn't ascii? That will still work for the tests in this patch, but I'd rather not leave bad code lying around to break later. Or worse, be copied into other tests.

Can you not use TextEncoding to do the same thing? That would be cleaner as well as more correct and (probably) faster. I don't know if there's a way to get at the underlying ArrayBuffer, but maybe the Uint8Array could be passed to the parser instead?

@@ +43,5 @@
> +             "- They won't get in your hair. They're after the bugs.\n", ],
> +    cue: 3, region: 0 },
> +
> +  // Note
> +  { input: [ "WEBVTT - This file has cues.\n", "\n", "NOTE what" ],

"This file has no cues.\n" ?

::: dom/media/webvtt/vtt.jsm
@@ -1159,2 @@
>          var pos = 0;
> -        while (pos < buffer.length && buffer[pos] !== '\r' && buffer[pos] !== '\n') {

You can drop the length bound here because the earlier `while` guarantees this one will terminate inside the buffer?

@@ +1336,4 @@
>            }
>  
> +          if (/^REGION|^STYLE/.test(line)) {
> +            // The line is another REGION/STYLE, parse tempStr then reset tempStr.

// ... parse and reset substatebuffer.

@@ +1346,2 @@
>  
> +          self.substatebuffer += " " + line;

This could use a flow comment as well, connecting to to comment block at the top. Something like,

// We weren't able to parse the line as a header. Accumulate and return.
Attachment #8947419 - Flags: review?(giles)
> @@ -1159,2 @@
> >          var pos = 0;
> > -        while (pos < buffer.length && buffer[pos] !== '\r' && buffer[pos] !== '\n') {
> 
> You can drop the length bound here because the earlier `while` guarantees
> this one will terminate inside the buffer?

Right. Because "while (/\r\n|\n|\r/.test(this.buffer)) {" guarantees that I'll find a \n or a \r in the buffer.

New patch, soon! Thanks for the review.
Posted patch webtt.patchSplinter Review
Attachment #8947419 - Attachment is obsolete: true
Attachment #8948908 - Flags: review?(giles)
Comment on attachment 8948908 [details] [diff] [review]
webtt.patch

Review of attachment 8948908 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webvtt/tests/test_parser.js
@@ +2,5 @@
> +
> +const {classes: Cc, interfaces: Ci, utils: Cu, results: Cr} = Components;
> +
> +Cu.import("resource://gre/modules/vtt.jsm");
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");

The needs updating for bug 1432992; s/Cu/ChromeUtils/.

@@ +4,5 @@
> +
> +Cu.import("resource://gre/modules/vtt.jsm");
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> +
> +XPCOMUtils.defineLazyGetter(this, "gTextEncoder", () => new TextEncoder());

I'm not sure the lazy getter pays for itself here. I guess error messages will have a better order?

::: dom/media/webvtt/vtt.jsm
@@ +28,5 @@
>   */
>  
>  var Cu = Components.utils;
>  Cu.import('resource://gre/modules/Services.jsm');
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");

Bug 1432992 bitrotted this part. I think Cu is just renamed to ChromeUtils; it should be obvious how to resolve the conflict.

@@ +1376,2 @@
>          if (self.state === "HEADER") {
> +          // parseHeader return false if the same line doesn't need to be parsed

// parseHeader returns <- verb number should agree with subject.
Attachment #8948908 - Flags: review?(giles) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/55a9b2fe876c
WebVTT parser must be able to deal with partial lines, r=rillian
https://hg.mozilla.org/mozilla-central/rev/55a9b2fe876c
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.