Closed Bug 785140 Opened 8 years ago Closed 6 years ago

NS_IsAsciiWhitespace in SVG code should probably be replaced with IsSVGWhitespace()

Categories

(Core :: SVG, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: dholbert, Assigned: longsonr)

References

Details

Attachments

(2 files, 7 obsolete files)

Noticed that we've got a number of "if (NS_IsAsciiWhitespace(*str))" checks in SVG code, which I think are mostly from https://hg.mozilla.org/mozilla-central/rev/78468c5c6a40 (bug 399642).

We should probably be using IsSVGWhitespace there instead.
In my defence IsSVGWhitespace came after bug 399642 ;-)

What we really need is some way to parse float values directly from an nsAString really to avoid all this dancing about. There's nsString::ToDouble but that expects only a number with no trailing units and anyway does an internal conversion first. nsCSSScanner has something but it's embedded in that class.
Attached patch whitespace.txt (obsolete) — Splinter Review
Assignee: nobody → longsonr
Attachment #798303 - Flags: review?(dholbert)
Comment on attachment 798303 [details] [diff] [review]
whitespace.txt

Review of attachment 798303 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/svg/content/src/SVGTransform.cpp
@@ +8,4 @@
>  #include "nsSVGAnimatedTransformList.h"
>  #include "nsSVGAttrTearoffTable.h"
> +#include "mozilla/dom/SVGMatrix.h"
> +#include "mozilla/dom/SVGTransform.h"

SVGTransform.h should remain first
Attached patch whitespace.txt (obsolete) — Splinter Review
Attachment #798303 - Attachment is obsolete: true
Attachment #798303 - Flags: review?(dholbert)
Attachment #798312 - Flags: review?(dholbert)
>+++ b/content/svg/content/src/SVGMotionSMILAnimationFunction.cpp
>@@ -1,23 +1,22 @@
[...]
> #include "mozilla/dom/SVGPathElement.h" // for nsSVGPathList
> #include "mozilla/dom/SVGMPathElement.h"
>-#include "nsAttrValueInlines.h"

This #include-removal triggers a FAIL_ON_WARNINGS build error on my machine:
{
nsAttrValue.h:186:18: error: inline function 'int16_t nsAttrValue::GetEnumValue() const' used but never defined [-Werror]
}

If I add it back, I can build successfully.  So: please leave that line in place. :)

>diff --git a/content/svg/content/src/nsSVGAngle.cpp b/content/svg/content/src/nsSVGAngle.cpp
> #include "mozilla/Util.h"
> 
> #include "nsSVGAngle.h"
[...]
>+#include "nsSVGAttrTearoffTable.h"
>+#include "nsTextFormatter.h"
>+#include "prdtoa.h"
> #include "SVGAngle.h"
> #include "SVGAnimatedAngle.h"
>+#include "mozilla/dom/SVGMarkerElement.h"
>+#include "SVGOrientSMILType.h"

Why did the "mozilla/dom/SVGMarkerElement.h" include end up at the bottom here? In earlier .cpp files in this patch (e.g. SVGAnimatedPreserveAspectRatio.cpp), you move "mozilla/" includes up to the top of the list (right after the .cpp file's own .h file). Seems like we should do the same here.

(Same in several later files, too)

That's all for now (just did a quick look-see); more comments (if any) coming on Tuesday, when I'm back at work.
(In reply to Daniel Holbert [:dholbert] from comment #5)
> This #include-removal triggers a FAIL_ON_WARNINGS build error on my machine:
> {
> nsAttrValue.h:186:18: error: inline function 'int16_t
> nsAttrValue::GetEnumValue() const' used but never defined [-Werror]
> }

Builds for me without this include. I'll try to figure out why.

> 
> >diff --git a/content/svg/content/src/nsSVGAngle.cpp b/content/svg/content/src/nsSVGAngle.cpp
> > #include "mozilla/Util.h"
> > 
> > #include "nsSVGAngle.h"
> [...]
> >+#include "nsSVGAttrTearoffTable.h"
> >+#include "nsTextFormatter.h"
> >+#include "prdtoa.h"
> > #include "SVGAngle.h"
> > #include "SVGAnimatedAngle.h"
> >+#include "mozilla/dom/SVGMarkerElement.h"
> >+#include "SVGOrientSMILType.h"
> 
> Why did the "mozilla/dom/SVGMarkerElement.h" include end up at the bottom
> here? In earlier .cpp files in this patch (e.g.
> SVGAnimatedPreserveAspectRatio.cpp), you move "mozilla/" includes up to the
> top of the list (right after the .cpp file's own .h file). Seems like we
> should do the same here.
> 

I put everything in alphabetical order of file name (ignoring any path). You want it in alphabetical order including the path?
The inlines issue must be because I'm making a debug build. It seems like Microsoft C++ ignores the inline keyword when you compile with debug and optimisations off. I'll put that back.
Attached patch address review comments (obsolete) — Splinter Review
Attachment #798312 - Attachment is obsolete: true
Attachment #798312 - Flags: review?(dholbert)
Attachment #798464 - Flags: review?(dholbert)
(In reply to Robert Longson from comment #6)
> I put everything in alphabetical order of file name (ignoring any path). You
> want it in alphabetical order including the path?

Maybe -- so, the SpiderMonkey coding style guide actually has /mozilla includes in a separate block, at the top of the file:  https://wiki.mozilla.org/JavaScript:SpiderMonkey:Coding_Style#.23include_ordering

The general Gecko coding style guide just says non-standard headers should just be "sorted alphabetically"; it doesn't mention whether that should take the directory-name into account, but I tend to think it should, or else it'll look pretty gross/arbitrary in files that #include headers from several different directories.  (Note also that the SpiderMonkey coding style guide is clearer (in its examples at least) that general includes are sorted *including the directory name* -- specifically this sample-code:
 #include "ds/Baz.h"
 #include "js/Bar.h"
 #include "vm/Bat.h"
We might as well try to be consistent with the JS code about that sorting behavior, in the absence of a clearly-defined rule saying the opposite.

(Sorry for being nitpicky; I know we definitely aren't very consistent about this across the codebase.  I'm only nitpicking about #include-ordering here because it's the majority of what this patch is changing, lines-of-code-wise.)

SO: I'd prefer either "mozilla/" includes at the top, just after the .cpp file's own header (approximately matching the SpiderMonkey coding style), or simply sorted *including* the directory (matching the general Coding Style, at least my understanding of it).
Comment on attachment 798464 [details] [diff] [review]
address review comments

On closer inspection, this doesn't make "mozilla/" ordering any less consistent than it already is, so feel free to disregard comment 9 if you like.

> static nsresult
> ToPreserveAspectRatio(const nsAString &aString,
>                       SVGPreserveAspectRatio *aValue)
> {
>-  if (aString.IsEmpty() || NS_IsAsciiWhitespace(aString[0])) {
>+  if (aString.IsEmpty() || IsSVGWhitespace(aString[0])) {
>     return NS_ERROR_DOM_SYNTAX_ERR;
>   }
> 
>   nsWhitespaceTokenizer tokenizer(aString);

So, this code is only partly fixed; we still use nsWhitespaceTokenizer, which is still broken -- it has its own isWhitespace method that appears equivalent to NS_IsAsciiWhitespace.

We probably need to make nsWhitespaceTokenizer more like nsCharSeparatedTokenizer -- we need it to take an "is whitespace" helper-function. Mind filing a followup on that?

>+++ b/content/svg/content/src/SVGMotionSMILAnimationFunction.cpp
[...]
>+++ b/content/svg/content/src/SVGTransform.cpp
[...]
>+++ b/content/svg/content/src/nsSVGViewBox.cpp

In these three files, you're just cleaning up #includes -- there's nothing IsAsciiWhitespace-related.  So, their changes don't really seem to belong in the same patch. Maybe fix those in a separate followup patch? (either as "(no bug)" or as a second patch on this bug)

r=me with that. Thanks!
Attachment #798464 - Flags: review?(dholbert) → review+
Attachment #800206 - Flags: review?(dholbert)
Comment on attachment 800206 [details] [diff] [review]
try to keep everything in the one bug

Please split out the changes to nsWhitespaceTokenizer.h into a separate patch, to be landed as a first (non-functional) patch on this bug.  The rest of the changes including the changes to clients of nsWhitespaceTokenizer) can then layer on top of it.

In any case, the nsWhitespaceTokenizer.h changes look good to me! Once it's split out into its own patch, you should have someone with more ownership over that file review it. (perhaps sicking, who originally wrote it? or smaug, who co-reviewed (with sicking) the corresponding patch to nsCharSeparatedTokenizer, in http://hg.mozilla.org/mozilla-central/rev/a7a688b6e106 )
Attachment #800206 - Flags: review?(dholbert) → feedback+
(Also: While you're fixing nsWhitespaceTokenizer and nsCWhitespaceTokenizer, it might be worth similarly fixing nsCCharSeparatedTokenizer.)
Attachment #800826 - Attachment description: part 1 improve nsWhitespaceTokenizer and nsCharSeparatedTokenizer → part 1 - improve nsWhitespaceTokenizer and nsCharSeparatedTokenizer
Attachment #800826 - Flags: review?(bugs)
RE part 1: To be clear: the new ns[C]WhitespaceTokenizer::lastTokenEndedWithWhitespace will be trivially true for all tokens except for the final one, right? It's probably worth adding a comment (in both WhitespaceTokenizer classes) to mention that. It's kind of obvious, but it's obvious in a way that makes the reader wonder if there's something subtle that he's missing.

RE part 2:

>@@ -167,17 +164,18 @@ ToPreserveAspectRatio(const nsAString &a
[...]
>-  if (tokenizer.hasMoreTokens()) {
>+  if (tokenizer.lastTokenEndedWithWhitespace() ||
>+      tokenizer.hasMoreTokens()) {
>     return NS_ERROR_DOM_SYNTAX_ERR;
>   }

So, this is actually a substantial behavior-change, beyond just "are we using the right whitespace function".

From my local testing, it looks like we currently allow trailing whitespace (i.e. we accept |preserveAspectRatio="none "|), though we reject leading whitespace.  You're making us consistent on this, which is great.

It'd be worth adding a regression test to demonstrate that change in behavior and keep us from regressing on that, though. Maybe by adding a few lines to content/svg/content/test/test_dataTypes.html?  (That's the closest to an exhaustive preserveAspectRatio-validity-test that I could find, from a quick skim of the tree.)
Attached patch part 3 - tests (obsolete) — Splinter Review
Attachment #801127 - Flags: review?(dholbert)
Comment on attachment 801127 [details] [diff] [review]
part 3 - tests

Look great, thanks! Just one nit:

>+  marker.setAttribute("preserveAspectRatio", "none "); // invalid, space at end
>+  is(basePreserveAspectRatio.align, SVGPreserveAspectRatio.SVG_PRESERVEASPECTRATIO_XMIDYMID,
>+     "default preserveAspectRatio attribute");

We should also test that " none" is rejected (w/ space at beginning).

r=me with that.

(also, an aside: so now this test will be using named constants instead of raw numerals, which is great for readability. I wonder if it's still worth having a test for the numeric values of these enums, though, e.g. checks like is(CONSTANT, 1, "constant should be 1").   Unlikely to break, but possibly worth testing, because there's surely web content out there using the numeric values, and we want to be aware that we can & will break such content if we ever adjust these constants' values. In any case, that doesn't have to happen here; just thinking out loud.)
Attachment #801127 - Flags: review?(dholbert) → review+
Attached patch part 2 - testsSplinter Review
address review comments
Attachment #801127 - Attachment is obsolete: true
Attachment #798464 - Attachment is obsolete: true
Comment on attachment 800826 [details] [diff] [review]
part 1 - improve nsWhitespaceTokenizer and nsCharSeparatedTokenizer

It is not clear to me why we have SVG specific white space handling.
Why we handle 0x000C in HTML but not in SVG?

Would be good to hear Henri's comments.

This patch is not exactly about that issue, but the bug is.
Attachment #800826 - Flags: review?(bugs) → review?(hsivonen)
I raised https://www.w3.org/Graphics/SVG/WG/track/issues/2449 to have SVG 2 include U+000C in its white space characters.
We can always raise a follow up if Cameron's suggestion gets traction in SVG2.
Comment on attachment 800826 [details] [diff] [review]
part 1 - improve nsWhitespaceTokenizer and nsCharSeparatedTokenizer

Unsetting review, since I'm not a reviewer for /xpcom.

(In reply to Olli Pettay [:smaug] from comment #20)
> It is not clear to me why we have SVG specific white space handling.
> Why we handle 0x000C in HTML but not in SVG?
> 
> Would be good to hear Henri's comments.

The reason for the difference is that SVG is of XML heritage and U+000C is not a whitespace character in XML, but it is a whitespace character in HTML for legacy reasons.  I'm unhappy about HTML not matching XML here, but it's not worthwhile to try to fight the legacy on the HTML side and XML can't change due to Draconian error handling being incompatible with any changes (i.e. no changes until/unless we do XML5/XML-ER).

The notion of whitespace on the SVG layer does not have to match the notion of whitespace on the XML layer, so SVG could align with HTML and CSS. I don't have a strong opinion on whether SVG code in Gecko should align with HTML and CSS proactively ahead of the SVG spec.
Attachment #800826 - Flags: review?(hsivonen)
Comment on attachment 800826 [details] [diff] [review]
part 1 - improve nsWhitespaceTokenizer and nsCharSeparatedTokenizer

Back to Olli given that Henri has commented.
Attachment #800826 - Flags: review?(bugs)
/me was hoping Henri to review this ;) But ok...
Comment on attachment 800826 [details] [diff] [review]
part 1 - improve nsWhitespaceTokenizer and nsCharSeparatedTokenizer

>+    // Flags -- only one for now. If we need more, they should be defined to
>+    // be 1<<1, 1<<2, etc. (They're masks, and aFlags/mFlags are bitfields.)
Nit, space before and after <<


>     bool hasMoreTokens()
>     {
>+        NS_ASSERTION(mIter == mEnd || !IsWhitespace(*mIter),
>+                     "Should be at beginning of token if there is one");
MOZ_ASSERT


>     const nsDependentCSubstring nextToken()
>     {
>         mozilla::RangedPtr<const char> tokenStart = mIter, tokenEnd = mIter;
> 
>-        // Search until we hit separator or end.
>+        NS_ASSERTION(mIter == mEnd || !IsWhitespace(*mIter),
>+                     "Should be at beginning of token if there is one");
MOZ_ASSERT


>+
>+        // Search until we hit separator or end (or whitespace, if separator
>+        // isn't required -- see clause with 'break' below).
a separator

>         while (mIter < mEnd && *mIter != mSeparatorChar) {
>+          // Skip to end of current word.
Should this be
Skip to the end of the current word.

>           while (mIter < mEnd &&
>-                 !isWhitespace(*mIter) && *mIter != mSeparatorChar) {
>+                 !IsWhitespace(*mIter) && *mIter != mSeparatorChar) {
>               ++mIter;
>           }
>           tokenEnd = mIter;
> 
>-          while (mIter < mEnd && isWhitespace(*mIter)) {
>+          // Skip whitespace after current word.
the current, right?

>+          mLastTokenEndedWithWhitespace = false;
>+          while (mIter < mEnd && IsWhitespace(*mIter)) {
>+              mLastTokenEndedWithWhitespace = true;
>               ++mIter;
>           }
>+          if (mFlags & SEPARATOR_OPTIONAL) {


>+            // We've hit (and skipped) whitespace, and that's sufficient to end
>+            // our token, regardless of whether we've reached a SeparatorChar.
>+            break;
>+          } // (else, we'll keep looping until we hit mEnd or SeparatorChar)
>         }
> 
>-        // Skip separator (and any whitespace after it).
>-        if (mIter < mEnd) {
>-            NS_ASSERTION(*mIter == mSeparatorChar, "Ended loop too soon");
>+        mLastTokenEndedWithSeparator = (mIter != mEnd &&
>+                                        *mIter == mSeparatorChar);
so mLastTokenEndedWithSeparator doesn't actually mean that, since white spaces have been
filtered out already.

Perhaps the mLast* variables should be
mSpaceAfterLastToken and mSeparatorAfterLastToken or some such.
Thought, it is not clear to me what behavior is wanted for space.
The class needs some comments about these various boolean flags


>     mozilla::RangedPtr<const char> mIter;
>     const mozilla::RangedPtr<const char> mEnd;
>+    bool mFirstTokenBeganWithWhitespace;
>+    bool mLastTokenEndedWithWhitespace;
>+    bool mLastTokenEndedWithSeparator;
>     char mSeparatorChar;
>+    uint32_t mFlags;
Why flags? Why not just bool since you anyway have just one flag.

>     bool hasMoreTokens()
>     {
>         return mIter < mEnd;
>     }
> 
>+    bool firstTokenBeganWithWhitespace() const
>+    {
>+        return mFirstTokenBeganWithWhitespace;
>+    }
This needs also some comment, especially that hasMoreTokens() may always return false, yet this returns true if there is just whitespace


>+    bool firstTokenBeganWithWhitespace() const
>+    {
>+        return mFirstTokenBeganWithWhitespace;
>+    }
ditto
Attachment #800826 - Flags: review?(bugs) → review-
Address review comments. I can't really split it up into parts any more now that the method names are changing.
Attachment #800826 - Attachment is obsolete: true
Attachment #800827 - Attachment is obsolete: true
Attachment #805215 - Flags: review?(bugs)
Attachment #801214 - Attachment description: part 3 - tests → part 2 - tests
Comment on attachment 805215 [details] [diff] [review]
part 1 - improve whitespace handling


>-          if (mFlags & SEPARATOR_OPTIONAL) {
>+          if (mSeparatorOptional) {
>             // We've hit (and skipped) whitespace, and that's sufficient to end
>             // our token, regardless of whether we've reached a SeparatorChar.
>             break;
>           } // (else, we'll keep looping until we hit mEnd or SeparatorChar)
>         }
> 
>-        mLastTokenEndedWithSeparator = (mIter != mEnd &&
>+        mSeparatorAfterCurrentToken = (mIter != mEnd &&
>                                         *mIter == mSeparatorChar);
align

>-        NS_ASSERTION((mFlags & SEPARATOR_OPTIONAL) ||
>-                     (mLastTokenEndedWithSeparator == (mIter < mEnd)),
>+        MOZ_ASSERT(mSeparatorOptional ||
>+                     (mSeparatorAfterCurrentToken == (mIter < mEnd)),
ditto

>                      "If we require a separator and haven't hit the end of "
>                      "our string, then we shouldn't have left the loop "
>                      "unless we hit a separator");
> 
>         // Skip separator (and any whitespace after it), if we're at one.
>-        if (mLastTokenEndedWithSeparator) {
>+        if (mSeparatorAfterCurrentToken) {
>             ++mIter;
> 
>             while (mIter < mEnd && IsWhitespace(*mIter)) {
>                 ++mIter;
>             }
>         }
so "foo, bar" leaves whitespaceAfterCurrentToken to false.
Doesn't look right to me.

>+        // Skip separator (and any whitespace after it), if we're at one.
>+        if (mSeparatorAfterCurrentToken) {
>             ++mIter;
> 
>-            while (mIter < mEnd && isWhitespace(*mIter)) {
>+            while (mIter < mEnd && IsWhitespace(*mIter)) {
>                 ++mIter;
>             }
Same here

Btw, could you file a followup to make these tokenizers to use the same base class.
(Couldn't we typedef types of few member variables)


>+    nsCCharSeparatedTokenizer(const nsCSubstring& aSource,
>+                              PRUnichar aSeparatorChar,
const char
Attachment #805215 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #28)
> so "foo, bar" leaves whitespaceAfterCurrentToken to false.
> Doesn't look right to me.

I believe that's the intended behavior. "foo , bar" has whitespace after the first token, but your example "foo, bar" does not.  It technically has whitespace at the *beginning* of the *second* token - after the separator.

(Looks like we don't have an API to tell us that -- i.e. to let us distinguish between "foo,bar" and "foo, bar" -- probably because there aren't any parsers that care & that we implement with this code.)
Well, then the comment about whitespaceAfterCurrentToken needs to be clarified.
It doesn't matter which of the two possibilities we implement; we're really only interested in two conditions:

a) nothing at all is allowed after the last token
b) whitespace is allowed after the last token but not a separator

So a) involves both tests and b) just the separator test

That means that whether you adjust the logic or not makes no difference to any of the existing code.
I made the change suggested in comment 28 about whitespace handling as it makes no difference to any existing usage

https://tbpl.mozilla.org/?tree=Try&rev=c74df935c669
Flags: in-testsuite+
https://hg.mozilla.org/mozilla-central/rev/0de9b1dfddb4
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
(In reply to Cameron McCormack (:heycam) from comment #21)
> I raised https://www.w3.org/Graphics/SVG/WG/track/issues/2449 to have SVG 2
> include U+000C in its white space characters.

The WG resolved last week to treat U+000C in attributes as white space.
http://www.w3.org/2013/09/19-svg-minutes.html#item06
You need to log in before you can comment on or make changes to this bug.