Closed Bug 1434884 Opened 6 years ago Closed 6 years ago

WebVTT parser is not able to deal with partial lines

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla60

Tracking Flags:

Tracking

Status

firefox60

---

fixed

People

(Reporter: baku, Assigned: baku)

References

Details

Attachments

(1 file, 1 obsolete file)

webtt.patch 6 years ago Andrea Marchesini [:baku] 17.51 KB, patch		Details \| Diff \| Splinter Review
webtt.patch 6 years ago Andrea Marchesini [:baku] 17.54 KB, patch	rillian : review+	Details \| Diff \| Splinter Review

Andrea Marchesini [:baku]

Assignee

Description

•

6 years ago

vtt.jsm assumes that parse() receives full lines with \r\n. But this is not true because this method is called by WebVTTParserWrapper.parse()

https://dxr.mozilla.org/mozilla-central/source/dom/media/webvtt/WebVTTParserWrapper.js#34

and from there WebVTTListener::OnDataAvailable()

https://dxr.mozilla.org/mozilla-central/source/dom/media/WebVTTListener.cpp#132

where ::ReadSegments() is used. Definitely this doesn't split the content by line.

Andrea Marchesini [:baku]

Assignee

Comment 1

•

6 years ago

Attached patch webtt.patch (obsolete) — Details — Splinter Review

Note that this doesn't fix the webvtt parser. The parser doesn't follow the spec at all, but that is a different bug.

Here I wrote a parser able to deal with complete lines with a better state machine. It passes all the WPTs, but I also added a xpcshell test for partial lines.

Attachment #8947419 - Flags: review?(bugs)

Andrea Marchesini [:baku]

Assignee

Updated

•

6 years ago

Blocks: 1405974

Andreas Farre [:farre]

Updated

•

6 years ago

Priority: -- → P2

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 2

•

6 years ago

Comment on attachment 8947419 [details] [diff] [review]
webtt.patch

rillian, I think it would be better if you reviewed this.
I'd need to review webvtt spec and the whole implementation before understanding the changes.

Attachment #8947419 - Flags: review?(bugs) → review?(giles)

Ralph Giles (:rillian)

Comment 3

•

6 years ago

I'm the typical reviewer, although I'm not a jsm expert. Note I won't be able to get to this until tomorrow.

Ralph Giles (:rillian)

Comment 4

•

6 years ago

Comment on attachment 8947419 [details] [diff] [review]
webtt.patch

Review of attachment 8947419 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for your patience. I think I follow the code changes now, and the idea is fine. Some comments though. Please address those and ask for review again, just so I can make sure I understand the changes.

::: dom/media/webvtt/tests/test_parser.js
@@ +7,5 @@
> +function str2ab(str) {
> +  var buf = new ArrayBuffer(str.length);
> +  var bufView = new Uint8Array(buf);
> +  for (var i=0, strLen=str.length; i < strLen; i++) {
> +    bufView[i] = str.charCodeAt(i);

Isn't this going to mangle anything which isn't ascii? That will still work for the tests in this patch, but I'd rather not leave bad code lying around to break later. Or worse, be copied into other tests.

Can you not use TextEncoding to do the same thing? That would be cleaner as well as more correct and (probably) faster. I don't know if there's a way to get at the underlying ArrayBuffer, but maybe the Uint8Array could be passed to the parser instead?

@@ +43,5 @@
> +             "- They won't get in your hair. They're after the bugs.\n", ],
> +    cue: 3, region: 0 },
> +
> +  // Note
> +  { input: [ "WEBVTT - This file has cues.\n", "\n", "NOTE what" ],

"This file has no cues.\n" ?

::: dom/media/webvtt/vtt.jsm
@@ -1159,2 @@
>          var pos = 0;
> -        while (pos < buffer.length && buffer[pos] !== '\r' && buffer[pos] !== '\n') {

You can drop the length bound here because the earlier `while` guarantees this one will terminate inside the buffer?

@@ +1336,4 @@
>            }
>  
> +          if (/^REGION|^STYLE/.test(line)) {
> +            // The line is another REGION/STYLE, parse tempStr then reset tempStr.

// ... parse and reset substatebuffer.

@@ +1346,2 @@
>  
> +          self.substatebuffer += " " + line;

This could use a flow comment as well, connecting to to comment block at the top. Something like,

// We weren't able to parse the line as a header. Accumulate and return.

Attachment #8947419 - Flags: review?(giles)

Andrea Marchesini [:baku]

Assignee

Comment 5

•

6 years ago

> @@ -1159,2 @@
> >          var pos = 0;
> > -        while (pos < buffer.length && buffer[pos] !== '\r' && buffer[pos] !== '\n') {
> 
> You can drop the length bound here because the earlier `while` guarantees
> this one will terminate inside the buffer?

Right. Because "while (/\r\n|\n|\r/.test(this.buffer)) {" guarantees that I'll find a \n or a \r in the buffer.

New patch, soon! Thanks for the review.

Andrea Marchesini [:baku]

Assignee

Comment 6

•

6 years ago

Attached patch webtt.patch — Details — Splinter Review

Attachment #8947419 - Attachment is obsolete: true

Attachment #8948908 - Flags: review?(giles)

Ralph Giles (:rillian)

Comment 7

•

6 years ago

Comment on attachment 8948908 [details] [diff] [review]
webtt.patch

Review of attachment 8948908 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/webvtt/tests/test_parser.js
@@ +2,5 @@
> +
> +const {classes: Cc, interfaces: Ci, utils: Cu, results: Cr} = Components;
> +
> +Cu.import("resource://gre/modules/vtt.jsm");
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");

The needs updating for bug 1432992; s/Cu/ChromeUtils/.

@@ +4,5 @@
> +
> +Cu.import("resource://gre/modules/vtt.jsm");
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> +
> +XPCOMUtils.defineLazyGetter(this, "gTextEncoder", () => new TextEncoder());

I'm not sure the lazy getter pays for itself here. I guess error messages will have a better order?

::: dom/media/webvtt/vtt.jsm
@@ +28,5 @@
>   */
>  
>  var Cu = Components.utils;
>  Cu.import('resource://gre/modules/Services.jsm');
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");

Bug 1432992 bitrotted this part. I think Cu is just renamed to ChromeUtils; it should be obvious how to resolve the conflict.

@@ +1376,2 @@
>          if (self.state === "HEADER") {
> +          // parseHeader return false if the same line doesn't need to be parsed

// parseHeader returns <- verb number should agree with subject.

Attachment #8948908 - Flags: review?(giles) → review+

Pulsebot

Comment 8

•

6 years ago

Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/55a9b2fe876c
WebVTT parser must be able to deal with partial lines, r=rillian

Cristian Brindusan [:cbrindusan]

Comment 9

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/55a9b2fe876c

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox60: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

Nobody; OK to take it and work on it

Updated

•

5 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

WebVTT parser is not able to deal with partial lines

Categories

(Core :: DOM: Core & HTML, defect, P2)

Tracking

()

People

(Reporter: baku, Assigned: baku)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Comment 1

Updated

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Attachment

General

Description

File Name

Content Type