Closed Bug 562310 Opened 10 years ago Closed 10 years ago

Reject repeated commas when parsing |comma-wsp|-separated attributes |viewBox|, |points|, and from/by/to/values on animateMotion

Categories

(Core :: SVG, defect)

defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: dholbert, Assigned: dholbert)

References

Details

Attachments

(7 files, 9 obsolete files)

783 bytes, image/svg+xml
Details
5.63 KB, patch
roc
: review+
Details | Diff | Splinter Review
5.39 KB, patch
roc
: review+
Details | Diff | Splinter Review
3.84 KB, patch
roc
: review+
Details | Diff | Splinter Review
4.06 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
28.41 KB, patch
roc
: review+
Details | Diff | Splinter Review
6.48 KB, patch
roc
: review+
Details | Diff | Splinter Review
Attached image testcase
As noted in my patch for bug 436418 (quoted at the end of bug 436418 comment 60), we currently are very lax about SVG attributes that take |comma-wsp|-separated values.  (for example, the |viewBox| attribute, and the |points| attribute on a <polyline> element)

e.g. for |viewBox|, the spec says:
>  The value of the viewBox attribute is a list of four numbers <min-x>,
> <min-y>, <width> and <height>, separated by whitespace and/or a comma,
http://www.w3.org/TR/SVG/coords.html#ViewBoxAttribute

"Whitespace" there can mean any number of whitespace characters, but "a comma" is singular -- we shouldn't allow multiple commas (nor should we allow commas at the very beginning of the string, before the first number).

Currently, we treat whitespace and commas identically, for comma-wsp-separated attributes.  We shouldn't. In particular, we should satisfy these requirements:
 (a) We should accept *at most* 1 comma between values -- any more should be flagged as a parse error.
 (b) We should reject commas at the beginning and end of the string.
 (c) We should allow any amount of whitespace between values (on either side of the comma, if there is a comma).

Currently Firefox fails on (a) and (b), but passes (c).

Opera 10.10 satisfies all three (a/b/c), with one exception: it accepts values that include a single comma at the end of the string.  I interpret the spec's language "separated by [...] a comma" to mean that commas should only appear *between* values, not after the last value, so IMHO Opera is incorrect there.  Note that for the |values| attribute on <animate>, we reject semicolons at the end of the string.

Anyway, I'm filing this bug on making Mozilla's parsing more strict for these attributes.
Attached patch fix v1 (obsolete) — Splinter Review
Here's the fix, which layers on top of all the animateMotion patches in bug 436418.

It adds the class "SVGParserUtils", which has two chief components:
 - An abstract class "GenericSubstringParser".  (It's a slightly-tweaked version of GenericValueParser from bug 436418.  This patch replaces GenericValueParser with GenericSubstringParser.)
 - A generic parsing method, "ParseCharSeparatedString", for parsing strings that contain a list of values delimited by a particular character (e.g. comma or semicolon).  This method splits up the string into subcomponents based on the arguments that it's given, and it hands each subcomponent off to its GenericSubstringParser argument for parsing.

The rest of this patch is uses of the above tools to replace existing parsing code, for... 
 - the "values" attribute in nsSMILAnimationFunction
 - nsSVGPointList-valued attributes
 - nsSVGViewBox-valued attributes

This patch passes SVG reftests & mochitests locally, and I think it's ready for review, but I want to run it through TryServer first as a sanity check.  So, just flagging for feedback right now.
Attachment #442056 - Flags: feedback?(roc)
Did you know that there is a nsCommaSeparatedTokenizer class? (see nsSVGFeatures.cpp for an example of usage) Would using that make your code simpler?

Optional comma separated numbers are also parsed in nsSVGElement::ParseNumberOptionalNumber and ParseIntegerOptionalInteger.

And yet another wrinkle is that I believe that SMIL and SVG differ about whether you can begin/end with whitespace. For SVG it's disallowed in most cases whereas SMIL allows it I think.

You should disallow whitespace and the beginning/end of viewBox for instance and add a test for that.
(In reply to comment #2)
> Did you know that there is a nsCommaSeparatedTokenizer class? (see
> nsSVGFeatures.cpp for an example of usage) Would using that make your code
> simpler?

Did not know about that -- will look into it...

> And yet another wrinkle is that I believe that SMIL and SVG differ about
> whether you can begin/end with whitespace. For SVG it's disallowed in most
> cases whereas SMIL allows it I think.
[...]
> You should disallow whitespace and the beginning/end of viewBox for instance

I was actually just looking back over that -- currently the patch disallows initial & terminal whitespace for "points" & "viewBox" (because I was thinking SVG was stricter, like you're suggesting).  But SVG 1.2 Tiny actually says that both allow whitespace at begin & end:

> viewbox ::=
>    wsp* viewboxSpec wsp*
http://www.w3.org/TR/SVGMobile12/coords.html#ViewBoxAttributeEffectOnSiblingAttributes

> points-data:
>    wsp* coordinate-pairs? wsp*
http://www.w3.org/TR/SVGMobile12/shapes.html#PointsBNF

As for SVG 1.1 -- AFAIK, it doesn't specify whether begin/end whitespace is allowed for viewBox, but it does specify that it's allowed for <list-of-points> (for polyline 'points' attr):
> list-of-points:
>    wsp* coordinate-pairs? wsp*
http://www.w3.org/TR/SVG11/shapes.html#PointsBNF

So, anyway, I think begin/end whitespace *should* be allowed for these attributes (though the currently-posted patch rejects begin/end whitespace).
(In reply to comment #3)
> (In reply to comment #2)
> > Did you know that there is a nsCommaSeparatedTokenizer class? (see
> > nsSVGFeatures.cpp for an example of usage) Would using that make your code
> > simpler?
> 
> Did not know about that -- will look into it...

I actually don't think nsCommaSeparatedTokenizer helps us much here, at least in its current state.  Looks like it *requires* a comma as a delimiter, whereas for |points| & |viewBox| & from/by/to/values on animateMotion, the comma is merely *allowed* (but not required) as part of the delimiter. (i.e. the comma in comma-wsp is optional)

We could potentially extend nsCommaSeparatedTokenizer to take a boolean argument that lets us make the comma optional, though. (Note that ParseCharSeparatedString in the attached patch does this.)  And it'd be nice to allow the comma to be replaced with a custom separator char, e.g. a semicolon for the |values| attribute in SMIL animation.
Attached patch fix v2 (obsolete) — Splinter Review
This version ignores whitespace at begin & end of attributes, per comment 3.
Attachment #442056 - Attachment is obsolete: true
Attachment #442068 - Flags: feedback?
Attachment #442056 - Flags: feedback?(roc)
(In reply to comment #4)

> We could potentially extend nsCommaSeparatedTokenizer to take a boolean
> argument that lets us make the comma optional, though. (Note that
> ParseCharSeparatedString in the attached patch does this.)  And it'd be nice to
> allow the comma to be replaced with a custom separator char, e.g. a semicolon
> for the |values| attribute in SMIL animation.

Or have some template based class maybe.
Comment on attachment 442068 [details] [diff] [review]
fix v2

>+    if (next != end) {
>+      // Find beginning of next token (making sure there aren't multiple
>+      // separators before it).

Find the beginning of the next token...

>+
>+    if (mIsHalfwayThroughAPoint) {
>+      // We were halfway through a point at beginning of this function, so now

...at the beginning...

Seems much better than the code it replaced.
Cool! I'm currently working on merging this code into nsCommaSeparatedTokenizer.  (re-shaping it into a generalized nsCharSeparatedTokenizer)

I'll post a new patch (or probably a few patches actually) to do this in a bit.
Status: NEW → ASSIGNED
Depends on: animateMotion
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272498597.1272500596.29090.gz
WINNT 5.2 mozilla-central debug test crashtest on 2010/04/28 16:49:57
s: mw32-ix-slave06
(argh, sorry, that last comment was for bug 555889)
Attachment #442068 - Flags: feedback?
Breaking this patch into 5 parts, so each part is self-contained and easy to quickly understand.

This first part is just a file-rename -- s/nsCommaSeparatedTokenizer.h/nsCharSeparatedTokenizer.h/.  No actual code changes, aside from updating all #includes of that file.

I'm not sure who I should ask for review on ns[Comma|Char]SeparatedTokenizer.h changes -- peterv, since he originally wrote nsWhitespaceTokenizer? or sicking, who forked that to nsCommaSeparatedTokenizer.h?  Or roc, who touched nsCommaSeparatedTokenizer.h most recently?
Attachment #442068 - Attachment is obsolete: true
Attachment #442809 - Flags: review?
(In reply to comment #12)
> I'm not sure who I should ask for review on ns[Comma|Char]SeparatedTokenizer.h
> changes -- peterv, since he originally wrote nsWhitespaceTokenizer? or sicking,
> who forked that to nsCommaSeparatedTokenizer.h?  Or roc, who touched
> nsCommaSeparatedTokenizer.h most recently?

(or one of the xpcom owners/peers, since the file is in /xpcom/ds/ ?)
This patch begins the actual code changes.  It:
  - Converts nsCommaSeparatedTokenizer class into nsCharSeparatedTokenizer<SeparatorChar>
  - Adds a typedef for |nsCommaSeparatedTokenizer|, so that I don't have to change clients of this class yet.[1][2]

[1] ...except for one instance where another file calls  lastTokenEndedWithComma(), which is now lastTokenEndedWithSeparator().
[2] Note: The nsCommaSeparatedTokenizer typedef will be removed in Part 4.
Attachment #442813 - Flags: review?
This patch adds a flag to make the separator optional, for SVG's "comma-wsp"-separated value lists.

When that flag is set, the basic idea is: a chunk of whitespace counts as a token-separator too, if it doesn't have a SeparatorChar on either side of it.  So e.g. the strings "5   5"  "5, 5" "5   ,5" and "  5 , 5" would all produce the same stream of two tokens.

I'm not sure whether this flag is better as a member variable or a templated variable -- currently it's a member variable, but I'm happy to change it if requested.
Attachment #442816 - Flags: review?
(Note that Part 3 updates the nsCommaSeparatedTokenizer typedef to have the new flag turned *off*, to preserve the existing nsCommaSeparatedTokenizer behavior.)
This patch removes the temporary nsCommaSeparatedTokenizer typedef, and substitutes in what it actually translates to -- a nsCharSeparatedTokenizer<','>(PR_TRUE).  (where the PR_TRUE means "separator char is required between tokens", per patch Part 3)
Attachment #442818 - Flags: review?(roc)
This patch actually adds the new uses of nsCharSeparatedTokenizer, in SMIL and SVG code.  In particular, this patch...
 - changes nsSVGPointList, nsSVGViewBox, and SVGMotionPathUtils (which all parse comma-wsp-separated lists/pairs of floats or lengths), so that they now reject repeated commas.  (Technically, they consider repeated commas to mean we've got an empty-string token between the two commas, and then the code for parsing the token rejects the empty string.)  They also reject commas at the end of the value / value-list. (since they're supposed to contain comma-*separated* values)
 - Changes nsSMILParserUtils::ParseValuesGeneric() to use a nsCharSeparatedTokenizer<';'> for parsing the SMIL "values" attribute.
 - Adds some reftests for valid/invalid viewBox values & invalid points-list values. (which were included in the earlier incarnation of this bug's patch, SVGParserUtils)
 - Tweaks the SMIL aniamteMotion mochitests to indicate that some |todo()| invalid values are now correctly recognized as invalid.
Attachment #442826 - Flags: review?(roc)
Comment on attachment 442809 [details] [diff] [review]
Part 1: Rename nsCommaSeparatedTokenizer.h

(In reply to comment #12)
> I'm not sure who I should ask for review on ns[Comma|Char]SeparatedTokenizer.h
> changes -- peterv, since he originally wrote nsWhitespaceTokenizer? or sicking,
> who forked that to nsCommaSeparatedTokenizer.h?  Or roc, who touched
> nsCommaSeparatedTokenizer.h most recently?

Picking roc, since he touched it most recently. (roc, let me know if someone else would be more appropriate as reviewer for these changes.)

Requesting SR as well, since this is a mildly cross-component change.
Attachment #442809 - Attachment is patch: true
Attachment #442809 - Attachment mime type: application/octet-stream → text/plain
Attachment #442809 - Flags: superreview?(roc)
Attachment #442809 - Flags: review?(roc)
Attachment #442809 - Flags: review?
Attachment #442813 - Flags: superreview?(roc)
Attachment #442813 - Flags: review?(roc)
Attachment #442813 - Flags: review?
Attachment #442816 - Flags: superreview?(roc)
Attachment #442816 - Flags: review?(roc)
Attachment #442816 - Flags: review?
Note: For symmetry, I imagine we'll want to apply Part 2 (& Part 3?) to the other class in ns[Comma|Char]SeparatedTokenizer.h: "nsCCommaSeparatedTokenizer".
(which would then become nsCCharSeparatedTokenizer<SeparatorChar>(PRBool aIsSeparatorRequired))

Once the already-posted patches get reviewed & fixed as-necessary, I can add a few more patch(es) to make identical changes to that class.
(In reply to comment #15)
> I'm not sure whether this flag is better as a member variable or a templated
> variable -- currently it's a member variable, but I'm happy to change it if
> requested.

I'd have gone for the latter. There aren't cases where you switch from one to the other so it's part of the DNA of the tokenizer. I guess we'll see what roc says.
Comment on attachment 442809 [details] [diff] [review]
Part 1: Rename nsCommaSeparatedTokenizer.h

You can't get sr from me for the same patch. However, I will assert that this doesn't need further super-review.
Attachment #442809 - Flags: superreview?(roc)
Attachment #442809 - Flags: review?(roc)
Attachment #442809 - Flags: review+
I don't think we should use template parameters here for flags or the separator char in part 2. That's code bloat (and complexity) for which we have no known performance benefit. Does that make sense?

If so, please fix that and refresh the patches.
(In reply to comment #23)
> I don't think we should use template parameters here for flags or the separator
> char in part 2. That's code bloat (and complexity) for which we have no known
> performance benefit. Does that make sense?

Yeah, I think so. It only would cause bloat in cases where a single .cpp file uses multiple types of nsCharSeparatedTokenizer, but that's a situation worth worrying about.  And the template-argument probably doesn't buy us that much optimization vs. a member variable.

So, here's a version of Part 2 with the SeparatorChar as a constructor argument, instead of a template argument.  (refreshed versions of other patches coming up next)
Attachment #442813 - Attachment is obsolete: true
Attachment #443041 - Flags: review?(roc)
Attachment #442813 - Flags: superreview?(roc)
Attachment #442813 - Flags: review?(roc)
Attachment #443041 - Attachment description: Rename class to nsCharSeparatedTokenizer, and take the SeparatorChar as a parameter in constructor → Part 2: Rename class to nsCharSeparatedTokenizer, and take the SeparatorChar as a parameter in constructor
Attachment #442816 - Attachment is obsolete: true
Attachment #443043 - Flags: review?(roc)
Attachment #442816 - Flags: superreview?(roc)
Attachment #442816 - Flags: review?(roc)
Attachment #443044 - Flags: review?(roc)
Attachment #442826 - Attachment is obsolete: true
Attachment #443045 - Flags: review?(roc)
Attachment #442826 - Flags: review?(roc)
Attachment #442818 - Attachment is obsolete: true
Attachment #442818 - Flags: review?(roc)
In part 3, I would simply have two constructors, one which takes a separator char and one which doesn't; the one which doesn't has no separator.
Comment on attachment 443044 [details] [diff] [review]
Part 4: Remove nsCommaSeparatedTokenizer typedef

r+ assuming you do what I suggested for part 3 and remove the PR_TRUE parameters here
Attachment #443044 - Flags: review?(roc) → review+
oh, never mind comment #28. I misunderstood.

OK, instead of a boolean parameter I'd rather have a flags word. Even though there's only one flag, it's easier to read code with stuff like nsCharSeparatedTokenizer::SEPARATOR_OPTIONAL than "PR_FALSE'.
+  for (PRInt32 i = 0; i < count; ++i) {
+    AppendElement(points.ObjectAt(i));
+  }

use AppendObjects?

+    NS_ConvertUTF16toUTF8 utf8Token(tokenizer.nextToken());
+    const char *token = utf8Token.get();
+    if (*token == '\0') {
+      return NS_ERROR_DOM_SYNTAX_ERR; // empty string (e.g. two commas in a row)
+    }
 
     char *end;
     vals[i] = float(PR_strtod(token, &end));

Don't we have the equivalent of PR_strtod for a UTF-16 string, so we can avoid UTF8 conversion here?
(In reply to comment #31)
> use AppendObjects?

We don't have that method available.  Note that this "AppendElement" method isn't on an nsXXXArray object -- rather, it's a nsSVGPointList helper-method.  It appends to a private nsTAutoArray, with a little bit of additional magic (it issues WillModify/DidModify notifications, & it adds us as an observer on the appended point)
See implementation:
http://mxr.mozilla.org/mozilla-central/source/content/svg/content/src/nsSVGPointList.cpp#104

(Note: this chunk is verbatim from the original code -- my patch only changes indentation.)

> +    NS_ConvertUTF16toUTF8 utf8Token(tokenizer.nextToken());
> Don't we have the equivalent of PR_strtod for a UTF-16 string, so we can avoid
> UTF8 conversion here?

I don't know -- do we?  If we do, our SVG code appears to shun it. :)  We currently convert to UTF8 before parsing floats in nsSMILParserUtils.cpp, in nsSVGLength2.cpp, and in code replaced by this patch (which uses calls to  ToNewUTF8String & ToNewCString to get out of UTF 16)
References:
http://mxr.mozilla.org/mozilla-central/source/content/svg/content/src/nsSVGLength2.cpp#171
http://mxr.mozilla.org/mozilla-central/source/content/smil/nsSMILParserUtils.cpp#444
Here's an updated "Part 3" using a flags word (which defaults to 0), as suggested in comment 30.
Attachment #443043 - Attachment is obsolete: true
Attachment #443043 - Flags: review?(roc)
Attachment #443234 - Flags: review?(roc)
Attachment #443234 - Attachment description: Part 3: Allow separator to be optional (using 'flags' word) → Part 3 v3: Allow separator to be optional (using 'flags' word)
Rebasing Part 4 off of the updated Part 3, using 'flags' word now instead of PR_TRUE/PR_FALSE. Carrying forward r=roc
Attachment #443044 - Attachment is obsolete: true
Attachment #443235 - Flags: review+
Rebasing Part 5 off of the updated Part 3, using 'flags' word now instead of
PR_TRUE/PR_FALSE.
Attachment #443045 - Attachment is obsolete: true
Attachment #443236 - Flags: review?(roc)
Attachment #443045 - Flags: review?(roc)
(In reply to comment #32)
> > Don't we have the equivalent of PR_strtod for a UTF-16 string, so we can avoid
> > UTF8 conversion here?
> 
> I don't know -- do we?  If we do, our SVG code appears to shun it. :)

Hm, it looks like nsAttrValue uses "PromiseFlatString(aString).ToFloat(&ec);" -- looking into the internals of what that does...
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsAttrValue.cpp#1161
Nope, looks like "PromiseFlatString().ToFloat()" isn't what we want:
 - It's defined in nsStringObsolete.cpp (which sounds like it should be avoided)
 - It's implementation is just:
     {  return NS_LossyConvertUTF16toASCII(*this).ToFloat(aErrorCode); }
   and that "NS_LossyConvertUTF16toASCII" call is just a wrapper for NS_UTF16ToCString(). So it ends up converting out of UTF16 under the hood anyway.

References:
http://mxr.mozilla.org/mozilla-central/source/xpcom/string/src/nsStringObsolete.cpp#1134
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsStringAPI.h#1027

Let me know if there's some other magic float-from-utf16 parsing API I could be using...
(In reply to comment #20)
> Note: For symmetry, I imagine we'll want to apply Part 2 (& Part 3?) to the
> other class in ns[Comma|Char]SeparatedTokenizer.h:
> "nsCCommaSeparatedTokenizer".

Here's the final patch, to genericize nsCCommaSeparatedTokenizer as well, for symmetry & so that class still matches the spirit of the file that it's in.

This just adds the choose-your-separator-char extension to the class, since that's trivial, but it doesn't add the SEPARATOR_OPTIONAL flag. (There's no reason to introduce that extra code & complexity if no one needs it.)
Attachment #443280 - Flags: review?(roc)
Attachment #443280 - Attachment description: Part 6: Fix → Part 6: Fix nsCCommaSeparatedTokenizer too
Depends on: 573489
You need to log in before you can comment on or make changes to this bug.