Closed Bug 1283803 Opened 3 years ago Closed 3 years ago

[webvtt] Modify the webvtt parsing algorithm for header and cue identifier

Categories

(Core :: Audio/Video: Playback, defect, P3)

Other Branch
defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: alwu, Assigned: alwu)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

Fork from bug1278748

For webvtt, the parsing rule doesn't totally equal to syntax rule, and the parsing rules are more tolerant to author errors than the syntax allows. [1]

However, our parser only considers syntax rule. It causes that we can't parse the invalid vtt files.

[1] https://w3c.github.io/webvtt/#conformance-classes
Blocks: 1278748
No longer blocks: webvtt-wpt
Mass change P2 -> P3
Priority: P2 → P3
Although we fix most parsing errors in bug1278748, there are still some little problems in parsing.

In this bug, We would like to fix 5 remaining web-platform-tests and they're related with the header and cue identifier parsing.
Summary: [webvtt] Implement the webvtt parsing algorithm → [webvtt] Modify the webvtt parsing algorithm
Summary: [webvtt] Modify the webvtt parsing algorithm → [webvtt] Modify the webvtt parsing algorithm for header and cue identifier
Review commit: https://reviewboard.mozilla.org/r/64208/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/64208/
Attachment #8771357 - Flags: review?(giles)
Attachment #8771358 - Flags: review?(giles)
Attachment #8771359 - Flags: review?(giles)
Attachment #8771360 - Flags: review?(giles)
Comment on attachment 8771357 [details]
Bug 1283803 - part1 : modify function parsingHeader.

https://reviewboard.mozilla.org/r/64208/#review61906

::: dom/media/webvtt/vtt.jsm:272
(Diff revision 1)
> +    return input.indexOf("-->") !== -1;
> +  }
> +
> +  function maybeIsTimeStampFormat(input) {
> +    return /^\s*(\d+:)?(\d{2}):(\d{2})\.(\d+)\s*-->\s*(\d+:)?(\d{2}):(\d{2})\.(\d+)\s*/.test(input);
> +  }

If \v isn't a whitespace character in webvtt, is it safe to use \s here?

::: dom/media/webvtt/vtt.jsm
(Diff revision 1)
> +
> +          if (/^REGION|^STYLE/i.test(line)) {
> +            parseOptions(line, function (k, v) {
> -          switch (k) {
> +              switch (k) {
> -          case "Region":
> +              case "Region":
> -            // 3.3 WebVTT region metadata header syntax

You did a case-insensitive match. If the goal is to accept invalid files with the REGION and STYLE header lines containing lower-case letters, you should `switch(k.toUpperCase()` and compare against "REGION" (all capitals, to match the spec).
Attachment #8771357 - Flags: review?(giles) → review+
Comment on attachment 8771358 [details]
Bug 1283803 - part2 : modify vtt parsing algorithm.

https://reviewboard.mozilla.org/r/64210/#review61912

Great to see more tests passing!

::: dom/media/webvtt/vtt.jsm:1279
(Diff revision 1)
> +          self.cue = new self.window.VTTCue(0, 0, "");
> +        }
> +      }
> +
> +      // Paring cue identifier and the identifier should be unique.
> +      // Return true if the input is a cue identifier.

typo s/Paring/Parsing/

Also in the commit message.
Attachment #8771358 - Flags: review?(giles) → review+
Comment on attachment 8771359 [details]
Bug 1283803 - part3 : rename function parseSignature.

https://reviewboard.mozilla.org/r/64212/#review61914
Attachment #8771359 - Flags: review?(giles) → review+
Attachment #8771360 - Flags: review?(giles) → review+
(In reply to Ralph Giles (:rillian) on PTO until July 18 from comment #7)
> 
> If \v isn't a whitespace character in webvtt, is it safe to use \s here?
> 

We still have the white space check afterward, it would fail when the input contains "/v".
Here I just want to make sure the parsing state can be changed correctly, that is why I use the term "maybe" for the checking function.
 
> You did a case-insensitive match. If the goal is to accept invalid files
> with the REGION and STYLE header lines containing lower-case letters, you
> should `switch(k.toUpperCase()` and compare against "REGION" (all capitals,
> to match the spec).

Yes, I want to accept the lower-case letters for REGION and STYLE.
Thanks, I would modify it!
Comment on attachment 8771357 [details]
Bug 1283803 - part1 : modify function parsingHeader.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64208/diff/1-2/
Attachment #8771358 - Attachment description: Bug 1283803 - part2 : modify vtt paring algorithm. → Bug 1283803 - part2 : modify vtt parsing algorithm.
Comment on attachment 8771358 [details]
Bug 1283803 - part2 : modify vtt parsing algorithm.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64210/diff/1-2/
Comment on attachment 8771359 [details]
Bug 1283803 - part3 : rename function parseSignature.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64212/diff/1-2/
Comment on attachment 8771360 [details]
Bug 1283803 - part4 : handle unicode \u0000 and \uFFFD.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64550/diff/1-2/
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/544795f12d67
part1 : modify function parsingHeader. r=rillian
https://hg.mozilla.org/integration/autoland/rev/348a4cebfe89
part2 : modify vtt parsing algorithm. r=rillian
https://hg.mozilla.org/integration/autoland/rev/2913e3ccf060
part3 : rename function parseSignature. r=rillian
https://hg.mozilla.org/integration/autoland/rev/1f511a0e1484
part4 : handle unicode \u0000 and \uFFFD. r=rillian
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/242b9e254d65
Backed out changeset 1f511a0e1484 for Media test failed
https://hg.mozilla.org/integration/autoland/rev/f82ca603aede
Backed out changeset 2913e3ccf060 
https://hg.mozilla.org/integration/autoland/rev/be05774c8594
Backed out changeset 544795f12d67 
https://hg.mozilla.org/integration/autoland/rev/37cc0da01187
Backed out changeset 348a4cebfe89
Comment on attachment 8771357 [details]
Bug 1283803 - part1 : modify function parsingHeader.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64208/diff/2-3/
Comment on attachment 8771358 [details]
Bug 1283803 - part2 : modify vtt parsing algorithm.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64210/diff/2-3/
Comment on attachment 8771359 [details]
Bug 1283803 - part3 : rename function parseSignature.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64212/diff/2-3/
Comment on attachment 8771360 [details]
Bug 1283803 - part4 : handle unicode \u0000 and \uFFFD.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64550/diff/2-3/
Attachment #8772337 - Flags: review?(giles) → review+
Pushed by alwu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5bdb5eb0d9f5
part1 : modify function parsingHeader. r=rillian
https://hg.mozilla.org/integration/autoland/rev/96fbf1b61d8f
part2 : modify vtt parsing algorithm. r=rillian
https://hg.mozilla.org/integration/autoland/rev/4a0f37eba059
part3 : rename function parseSignature. r=rillian
https://hg.mozilla.org/integration/autoland/rev/3cadccaea6c9
part4 : handle unicode \u0000 and \uFFFD. r=rillian
https://hg.mozilla.org/integration/autoland/rev/2f178db86d26
part5 : fix the fail test. r=rillian
You need to log in before you can comment on or make changes to this bug.