Last Comment Bug 829816 - css3-syntax says U+0000 and \0 should map to U+FFFD
: css3-syntax says U+0000 and \0 should map to U+FFFD
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla25
Assigned To: Corey Ford [:coyotebush]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-01-11 15:33 PST by Zack Weinberg (:zwol)
Modified: 2013-06-27 03:48 PDT (History)
2 users (show)
ryanvm: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
various occurrences handled separately (9.19 KB, patch)
2013-06-19 13:30 PDT, Corey Ford [:coyotebush]
no flags Details | Diff | Review
WIP updated patch (15.24 KB, patch)
2013-06-20 11:12 PDT, Corey Ford [:coyotebush]
no flags Details | Diff | Review
Revised patch with updated tests (15.05 KB, patch)
2013-06-20 16:49 PDT, Corey Ford [:coyotebush]
dbaron: review+
Details | Diff | Review
Further revised patch (16.55 KB, patch)
2013-06-21 09:54 PDT, Corey Ford [:coyotebush]
dbaron: review+
Details | Diff | Review
With r= message (16.56 KB, patch)
2013-06-26 15:19 PDT, Corey Ford [:coyotebush]
no flags Details | Diff | Review

Description Zack Weinberg (:zwol) 2013-01-11 15:33:34 PST
The handling of U+0000 (whether or not it resulted from an escape sequence) was explicitly undefined in CSS 2.1.  The current draft of css3-syntax says that it should be mapped to U+FFFD REPLACEMENT CHARACTER whenever it appears: see
<http://dev.w3.org/csswg/css3-syntax/#preprocessing-the-input-stream> and <http://dev.w3.org/csswg/css3-syntax/#consume-an-escaped-character>.

Because of bug 228856, we currently treat \ followed by one to six zeros as having the effect of appending that many zeros to the current token (as \ does for non-hexadecimal characters).  Literal U+0000 is currently treated as a normal SYMBOL token.  We should either change what we do or get the css3-syntax draft changed.  I have a slight preference for changing what we do; it would simplify the escape parser and does not reintroduce bug 228856.
Comment 1 Zack Weinberg (:zwol) 2013-01-11 15:34:18 PST
I do not anticipate compatibility problems if we change, due to the explicit undefinedness in css2.1.
Comment 2 Corey Ford [:coyotebush] 2013-06-17 16:06:24 PDT
Looking into this.
Comment 3 Zack Weinberg (:zwol) 2013-06-17 18:00:14 PDT
Should probably not do anything till css3-syntax hits CR.
Comment 4 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-17 18:02:59 PDT
I think changing this now is fine; the working group discussed it the week before last in Tokyo and agreed to the change.
Comment 5 Corey Ford [:coyotebush] 2013-06-19 09:35:23 PDT
I have a patch for the \0 case, but U+0000 is decidedly trickier given that we don't explicitly do the sort of preprocessing contemplated in the spec.

U+FFFD should be acceptable inside strings/identifiers/URLs, thus U+0000 should too, unless I'm missing something. So, it looks like the substitution ought to be handled either directly in nsCSSScanner::Peek, or at each of the several points where characters are placed into the token (GatherEscape and GatherText in particular).
Comment 6 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 09:47:06 PDT
I think there are a very small number of places where we can accept a U+0000 literal:  as a token on its own (at the end of the fallback path), and then as part of a string, URL token, or identifier.
Comment 7 Corey Ford [:coyotebush] 2013-06-19 10:07:13 PDT
Yes, plus the case of backslash-U+0000. And IsOpenCharClass needs to accept it as a string/identifier/URL character. It seems a bit more fragile to handle the substitution in 3-4 places, though complicating Peek doesn't seem ideal either.
Comment 8 Simon Sapin (:SimonSapin) 2013-06-19 10:20:17 PDT
Note that changing U+0000 to U+FFFD makes it a "name character" (being non-ASCII) unlike other control characters like U+0001. Name characters outside of quoted strings are (roughly) what identifiers, at-keywords, hash and units are made of. So for example, a U+0000 (transformed to U+FFFD) on its own would be an <ident>, not a <delim>.

I understand this can be annoying to implement without a pre-processing step, and I’m not opposed to changing the spec if there is an alternative proposal. I think the idea was not to have any NULLs in stylesheets, for implementations that use it as a string terminator.
Comment 9 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 13:27:55 PDT
I guess the identifier part is trickier.  Perhaps it makes sense to change the escape handling first and worry about the rest later?
Comment 10 Corey Ford [:coyotebush] 2013-06-19 13:30:55 PDT
Created attachment 764971 [details] [diff] [review]
various occurrences handled separately

Taking the handle-it-in-several-places approach, the full patch might go something like this.
Comment 11 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 14:26:00 PDT
Comment on attachment 764971 [details] [diff] [review]
various occurrences handled separately

So when you post patches for review, they should start with both committer information (i.e., your name/email as you'd like to be credited, which need not be your @mozilla.com email, though can be) and a commit message describing the change.  See https://developer.mozilla.org/en-US/docs/Creating_a_patch_that_can_be_checked_in and perhaps also https://hg.mozilla.org/mozilla-central/ (although not all commit messages are good -- they should in general describe what's changing, and it's important that the summary be on the first line, and more detailed information can follow, such as in https://hg.mozilla.org/mozilla-central/rev/830111e10951 ).

Also, if you want somebody to review the patch, you should request that explicitly using the review flags.

>diff --git a/layout/reftests/bugs/reftest.list b/layout/reftests/bugs/reftest.list
>--- a/layout/reftests/bugs/reftest.list
>+++ b/layout/reftests/bugs/reftest.list
>@@ -198,18 +198,18 @@ skip-if(B2G) fails-if(Android) != 192767
> == 212563-1.html 212563-1-ref.html
> == 212563-2.html 212563-2-ref.html
> == 213834-1.html 213834-1-ref.html
> == 214077-1a.html 214077-1-ref.html
> == 214077-1b.html 214077-1-ref.html
> == 218473-1.html 218473-1-ref.html
> == 220165-1.svg 220165-1-ref.svg
> == 223809-1.html 223809-1-ref.html
>-== 228856-1.html 228856-1-ref.html
>-== 228856-2.html 228856-2-ref.html
>+# == 228856-1.html 228856-1-ref.html
>+# == 228856-2.html 228856-2-ref.html

You should adjust the reference instead of commenting out the test.

nsCSSScanner.cpp:

>+static const int32_t REPLACEMENT_CHAR = 0xFFFD;

probably better to use the existing macro at http://hg.mozilla.org/mozilla-central/file/c12150cfdfef/xpcom/string/public/nsCharTraits.h#l88

> /**
>  * True if 'ch' is in character class 'cls', which should be one of
>  * the constants above or some combination of them.  All characters
>- * above U+007F are considered to be in 'cls'.  EOF is never in 'cls'.
>+ * above U+007F are considered to be in 'cls', as is U+0000 because
>+ * it is treated as U+FFFD.  EOF is never in 'cls'.
>  */
> static inline bool
> IsOpenCharClass(int32_t ch, uint8_t cls) {
>-  return ch >= 0 && (ch >= 128 || (gLexTable[ch] & cls) != 0);
>+  return ch >= 0 && (ch == 0 || ch >= 128 || (gLexTable[ch] & cls) != 0);
> }

I think it's preferable to adjust gLexTable for the adjustments you need; this seems to be U+0000 in all char classes, which I don't think we want.

>@@ -579,23 +586,21 @@ nsCSSScanner::GatherEscape(nsString& aOu
>   int i = 0;
>   do {
>     val = val * 16 + HexDigitValue(ch);
>     i++;
>     Advance();
>     ch = Peek();
>   } while (i < 6 && IsHexDigit(ch));
> 
>-  // Silently deleting \0 opens a content-filtration loophole (see
>-  // bug 228856), so what we do instead is pretend the "cancels the
>-  // meaning of special characters" rule applied.
>+  // "Interpret the hex digits as a hexadecimal number. If this number is zero,
>+  // or is greater than the maximum allowed codepoint, return U+FFFD
>+  // REPLACEMENT CHARACTER" -- CSS Syntax Level 3
>   if (MOZ_UNLIKELY(val == 0)) {
>-    do {
>-      aOutput.Append('0');
>-    } while (--i);
>+    aOutput.Append(REPLACEMENT_CHAR);

You can entirely remove |i| here.

>   uint32_t start = mOffset;
>   bool inString = aClass == IS_STRING;
> 
>   for (;;) {
>-    // Consume runs of unescaped characters in one go.
>-    uint32_t n = mOffset;
>-    while (n < mCount && IsOpenCharClass(mBuffer[n], aClass)) {
>-      n++;
>+    int32_t ch = Peek();
>+    while (ch >= 0 && IsOpenCharClass(ch, aClass)) {
>+      if (ch == 0) {
>+        aText.Append(REPLACEMENT_CHAR);
>+      } else {
>+        aText.Append(ch);
>+      }
>+      Advance();
>+      ch = Peek();
>     }
>-    if (n > mOffset) {
>-      aText.Append(&mBuffer[mOffset], n - mOffset);
>-      mOffset = n;
>-    }
>-    if (n == mCount) {
>+    if (ch < 0) {
>       break;
>     }
> 
>-    int32_t ch = Peek();
>+    ch = Peek();
>     MOZ_ASSERT(!IsOpenCharClass(ch, aClass),
>                "should not have exited the inner loop");
> 

So I didn't look into this too closely - however, I think the structure is probably wrong.  The while loop at the top is intentionally the "fast path".  I think it's better to make the U+0000 handling part of the slow path below it rather than slowing the fast path down.

>diff --git a/layout/style/test/file_bug829816.css b/layout/style/test/file_bug829816.css
>new file mode 100644
>index 0000000000000000000000000000000000000000..8f12ba6f56396b3f2560d18c380280658ddb45b0
>GIT binary patch
>literal 76
>xc$`a8s8&eM&nrpIE3r~gVo<UM@=Af+BHg0Y;#8m*gF-D=5{jZ2EUIEa$^e#}7kvN#
>

Hrmmph.  I guess I should apply this at some point to see what lives here, but I haven't yet.


>diff --git a/layout/style/test/test_bug829816.html b/layout/style/test/test_bug829816.html
>new file mode 100644
>--- /dev/null
>+++ b/layout/style/test/test_bug829816.html
>@@ -0,0 +1,56 @@
>+<!DOCTYPE HTML>
>+<html>
>+<!--
>+https://bugzilla.mozilla.org/show_bug.cgi?id=829816
>+-->
>+<head>
>+  <meta charset="utf-8">
>+  <title>Test for Bug 829816</title>
>+  <script type="application/javascript" src="/tests/SimpleTest/SimpleTest.js"></script>
>+  <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css"/>
>+
>+  <style type="text/css">
>+    b { content: "\0";      counter-reset: \0      }
>+    b { content: "\00";     counter-reset: \00     }
>+    b { content: "\000";    counter-reset: \000    }
>+    b { content: "\0000";   counter-reset: \0000   }
>+    b { content: "\00000";  counter-reset: \00000  }
>+    b { content: "\000000"; counter-reset: \000000 }
>+  </style>
>+
>+  <!-- U+0000 characters in <style> will already be replaced by the HTML parser (?) -->
>+  <link rel="stylesheet" type="text/css" href="file_bug829816.css"/>
>+
>+  <script type="application/javascript">
>+
>+  /** Test for Bug 829816 **/
>+  var ss = document.styleSheets[1];
>+
>+  for (var i = 0; i < 6; i++) {
>+    is(ss.cssRules[i].style.content, "\"\uFFFD\"",
>+        "\\0 in strings should be converted to U+FFFD");
>+    is(ss.cssRules[i].style.counterReset, "\uFFFD",
>+        "\\0 in identifiers should be converted to U+FFFD");
>+  }
>+
>+  is(document.styleSheets[2].cssRules[0].style.content, "\"\uFFFD\"",
>+      "U+0000 in strings should be converted to U+FFFD");
>+  is(document.styleSheets[2].cssRules[0].style.counterReset, "\uFFFD",
>+      "U+0000 in identifiers should be converted to U+FFFD");
>+  is(document.styleSheets[2].cssRules[1].style.content, "\"\uFFFD\"",
>+      "U+0000 in strings should be converted to U+FFFD");
>+  is(document.styleSheets[2].cssRules[1].style.counterReset, "\uFFFD",
>+      "U+0000 in identifiers should be converted to U+FFFD");
>+
>+

This works, though it might be somewhat cleaner to create the style sheet in script too, so that you don't have to edit two different parts of the file to make a test.  That said, this is ok.
Comment 12 Corey Ford [:coyotebush] 2013-06-19 14:55:04 PDT
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #11)
> So when you post patches for review, they should start with both committer
> information (i.e., your name/email as you'd like to be credited, which need
> not be your @mozilla.com email, though can be) and a commit message
> describing the change.

I'll mess around with HG a bit more.

> You should adjust the reference instead of commenting out the test.

Okay, although we're basically doing away with the behavior from bug 228856 here.

> I think it's preferable to adjust gLexTable for the adjustments you need;
> this seems to be U+0000 in all char classes, which I don't think we want.

That would also affect IsClosedCharClass in theory, though that's never used for the character classes in question anyway. I'll change that.

> You can entirely remove |i| here.

Still used in the previous hex-digit-gathering do-while loop.

> So I didn't look into this too closely - however, I think the structure is
> probably wrong.  The while loop at the top is intentionally the "fast path".
> I think it's better to make the U+0000 handling part of the slow path below
> it rather than slowing the fast path down.

Right, I didn't like slowing that down. Though, now that we've let U+0000 into the various character classes, the fast path while loop can't use IsOpenCharClass alone.

> Hrmmph.  I guess I should apply this at some point to see what lives here,
> but I haven't yet.

NULs are tricky that way.

> This works, though it might be somewhat cleaner to create the style sheet in
> script too, so that you don't have to edit two different parts of the file
> to make a test.  That said, this is ok.

Do you happen to know an existing test that does it the way you're suggesting?

Thanks for the review :)
Comment 13 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 15:13:41 PDT
(In reply to Corey Ford from comment #12)
> > This works, though it might be somewhat cleaner to create the style sheet in
> > script too, so that you don't have to edit two different parts of the file
> > to make a test.  That said, this is ok.
> 
> Do you happen to know an existing test that does it the way you're
> suggesting?

simpler:               layout/style/test/test_parse_url.html
a little more complex: layout/style/test/test_page_rule.html
Comment 14 Corey Ford [:coyotebush] 2013-06-19 17:06:13 PDT
What should we do with layout/reftests/bugs/228856-2.html (also previously discussed in bug 566280)? Now that the HTML and CSS parsers are back to agreeing on their treatment of U+0000 (and \0, &0;) certain of those selectors should in fact be expected to match. Perhaps even more than I currently see matching.
Comment 15 Corey Ford [:coyotebush] 2013-06-19 17:21:08 PDT
Also, am I correct that http://hg.mozilla.org/mozilla-central/file/8ea92aeab783/layout/style/nsCSSScanner.cpp#l669 doesn't need to special-case U+0000, since it seems it's only reached in the case of an invalid escape sequence (so ought to only be considering a backslash there)?
Comment 16 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 17:43:33 PDT
(In reply to Corey Ford from comment #14)
> What should we do with layout/reftests/bugs/228856-2.html (also previously
> discussed in bug 566280)? Now that the HTML and CSS parsers are back to
> agreeing on their treatment of U+0000 (and \0, &0;) certain of those
> selectors should in fact be expected to match. Perhaps even more than I
> currently see matching.

Oh, 228856-2 is harder than 228856-1.  Probably change the selectors that should match to be color:green, and change the reference to match?

(In reply to Corey Ford from comment #15)
> Also, am I correct that
> http://hg.mozilla.org/mozilla-central/file/8ea92aeab783/layout/style/
> nsCSSScanner.cpp#l669 doesn't need to special-case U+0000, since it seems
> it's only reached in the case of an invalid escape sequence (so ought to
> only be considering a backslash there)?

It's not clear to me if that if is even reachable.  Do you see when we'd hit it?
Comment 17 Corey Ford [:coyotebush] 2013-06-19 17:54:51 PDT
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #16)
> (In reply to Corey Ford from comment #15)
> > Also, am I correct that
> > http://hg.mozilla.org/mozilla-central/file/8ea92aeab783/layout/style/
> > nsCSSScanner.cpp#l669 doesn't need to special-case U+0000, since it seems
> > it's only reached in the case of an invalid escape sequence (so ought to
> > only be considering a backslash there)?
> 
> It's not clear to me if that if is even reachable.  Do you see when we'd hit
> it?

When an (apparent) identifier starts with an invalid escape sequence, which appears to mean backslash-newline outside of a string (the only way GatherEscape returns false). Confirmed that it does hit that line, resulting in a backslash symbol.
Comment 18 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-19 18:00:51 PDT
Then it sounds like you don't need a special case there.
Comment 19 Corey Ford [:coyotebush] 2013-06-20 09:49:02 PDT
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #11)
> > /**
> >  * True if 'ch' is in character class 'cls', which should be one of
> >  * the constants above or some combination of them.  All characters
> >- * above U+007F are considered to be in 'cls'.  EOF is never in 'cls'.
> >+ * above U+007F are considered to be in 'cls', as is U+0000 because
> >+ * it is treated as U+FFFD.  EOF is never in 'cls'.
> >  */
> > static inline bool
> > IsOpenCharClass(int32_t ch, uint8_t cls) {
> >-  return ch >= 0 && (ch >= 128 || (gLexTable[ch] & cls) != 0);
> >+  return ch >= 0 && (ch == 0 || ch >= 128 || (gLexTable[ch] & cls) != 0);
> > }
> 
> I think it's preferable to adjust gLexTable for the adjustments you need;
> this seems to be U+0000 in all char classes, which I don't think we want.

On further thought, this way actually reads better to me, because it handles U+0000 and non-ASCII with the same logic. But I'm pretty certain that gLexTable[0] == SUIJ is functionally equivalent, so it's a minor point.

(In reply to Corey Ford from comment #15)
> Also, am I correct that
> http://hg.mozilla.org/mozilla-central/file/8ea92aeab783/layout/style/
> nsCSSScanner.cpp#l669 doesn't need to special-case U+0000, since it seems
> it's only reached in the case of an invalid escape sequence (so ought to
> only be considering a backslash there)?

Might it be prudent to add an assertion there to clarify/ensure that it really is always a backslash? I suppose that's kind of out of scope of this bug, though.
Comment 20 Zack Weinberg (:zwol) 2013-06-20 10:30:43 PDT
(In reply to Corey Ford from comment #19)
> > > static inline bool
> > > IsOpenCharClass(int32_t ch, uint8_t cls) {
> > >-  return ch >= 0 && (ch >= 128 || (gLexTable[ch] & cls) != 0);
> > >+  return ch >= 0 && (ch == 0 || ch >= 128 || (gLexTable[ch] & cls) != 0);
> > > }
> > 
> > I think it's preferable to adjust gLexTable for the adjustments you need;
> > this seems to be U+0000 in all char classes, which I don't think we want.
> 
> On further thought, this way actually reads better to me, because it handles
> U+0000 and non-ASCII with the same logic. But I'm pretty certain that
> gLexTable[0] == SUIJ is functionally equivalent, so it's a minor point.

I think neither of these is really the right approach. U+0000 will basically never occur in practice, but you've completely removed the fast path in GatherText for its sake.  Instead, I think it would be better to leave gLexTable and IsOpenCharClass unmodified (so U+0000 continues to be treated as not in any character class) and then modify the *slow* path in GatherText to handle it.  Something like this:

  for (;;) {
    // Consume runs of unescaped characters in one go.
+   // Assumes '\\' and '\0' are not in any character class.
    uint32_t n = mOffset;
    while (n < mCount && IsOpenCharClass(mBuffer[n], aClass)) {
      n++;
    }
    if (n > mOffset) {
      aText.Append(&mBuffer[mOffset], n - mOffset);
      mOffset = n;
    }
    if (n == mCount) {
      break;
    }

    int32_t ch = Peek();
    MOZ_ASSERT(!IsOpenCharClass(ch, aClass),
               "should not have exited the inner loop");

+   if (ch == '\0') {
+     // U+0000 is mapped to U+FFFD, which is valid in
+     // any context for which this function is used.
+     Advance();
+     aText.Append(REPLACEMENT_CHAR);
+     continue;
+   }
    if (ch != '\\') {
      break;
    }
    if (!GatherEscape(aText, inString)) {
      break;
    }
  }

That plus what you've done to GatherEscape plus a special case at the very bottom of Next() ...

   // Otherwise, a symbol (DELIM).
-  aToken.mSymbol = ch;
+  if (ch == '\0') {
+    aToken.mSymbol = REPLACEMENT_CHAR;
+  } else {
+    aToken.mSymbol = ch;
+  }
   Advance();
   return true;
 }

... should be all that is required.  (But unfortunately I don't have time right now to go through the scanner carefully and make sure of that.)
Comment 21 Corey Ford [:coyotebush] 2013-06-20 10:43:30 PDT
(In reply to Zack Weinberg (:zwol) from comment #20)
> I think neither of these is really the right approach. U+0000 will basically
> never occur in practice, but you've completely removed the fast path in
> GatherText for its sake.  Instead, I think it would be better to leave
> gLexTable and IsOpenCharClass unmodified (so U+0000 continues to be treated
> as not in any character class) and then modify the *slow* path in GatherText
> to handle it.

The problem with that is that IsIdentChar/IsIdentStart/StartsIdent, which are used in several places, should return true for U+0000. That is, if U+0000 occurs after a #, @, number, or on its own in Next, it should be treated as the start of an identifier, not as a symbol.

> +   if (ch == '\0') {
> +     // U+0000 is mapped to U+FFFD, which is valid in
> +     // any context for which this function is used.
> +     Advance();
> +     aText.Append(REPLACEMENT_CHAR);
> +     continue;
> +   }

Yes, that's basically what I'm working with now.
Comment 22 Zack Weinberg (:zwol) 2013-06-20 10:58:57 PDT
(In reply to Corey Ford from comment #21)
> The problem with that is that IsIdentChar/IsIdentStart/StartsIdent, which
> are used in several places, should return true for U+0000. That is, if
> U+0000 occurs after a #, @, number, or on its own in Next, it should be
> treated as the start of an identifier, not as a symbol.

If you special case U+0000 in StartsIdent, what still falls through the cracks?
Comment 23 Corey Ford [:coyotebush] 2013-06-20 11:08:18 PDT
(In reply to Zack Weinberg (:zwol) from comment #22)
> (In reply to Corey Ford from comment #21)
> > The problem with that is that IsIdentChar/IsIdentStart/StartsIdent, which
> > are used in several places, should return true for U+0000. That is, if
> > U+0000 occurs after a #, @, number, or on its own in Next, it should be
> > treated as the start of an identifier, not as a symbol.
> 
> If you special case U+0000 in StartsIdent, what still falls through the
> cracks?

Next[1] and ScanHash[2], I believe. The logic in Next could probably be reworked to use StartsIdent instead of IsIdentStart, and maybe ScanHash could too.

1: http://hg.mozilla.org/mozilla-central/file/8ea92aeab783/layout/style/nsCSSScanner.cpp#l1112
2: http://hg.mozilla.org/mozilla-central/file/8ea92aeab783/layout/style/nsCSSScanner.cpp#l724
Comment 24 Corey Ford [:coyotebush] 2013-06-20 11:12:07 PDT
Created attachment 765462 [details] [diff] [review]
WIP updated patch

Just for reference, this is what I have at the moment. No need for a thorough review until we sort things out more.
Comment 25 Corey Ford [:coyotebush] 2013-06-20 11:13:57 PDT
Comment on attachment 765462 [details] [diff] [review]
WIP updated patch

># HG changeset patch
># User Corey Ford <cford@mozilla.com>
># Date 1371751796 25200
>#      Thu Jun 20 11:09:56 2013 -0700
># Node ID 8064641498813056dce8ae8c4bd3beb87cd0fda6
># Parent  c12150cfdfef0afcba76233da774e5f64eadc06f
>Bug 829816 - Treat \0 and U+0000 as U+FFFD.
>
>diff --git a/layout/reftests/bugs/228856-1-ref.html b/layout/reftests/bugs/228856-1-ref.html
>--- a/layout/reftests/bugs/228856-1-ref.html
>+++ b/layout/reftests/bugs/228856-1-ref.html
>@@ -1,27 +1,27 @@
> <!doctype html>
> <html><head>
> <!-- This is a test for behavior made up by Mozilla in the absence of
>      specification.  Future CSS specifications may define the behavior
>      differently.  -->
> <title>\0 in CSS</title>
> <style>
>-p#a:before { content: "0x" }
>-p#b:before { content: "00x" }
>-p#c:before { content: "000x" }
>-p#d:before { content: "0000x" }
>-p#e:before { content: "00000x" }
>-p#f:before { content: "000000x" }
>-p#g:before { content: "0 x" }
>-p#h:before { content: "00 x" }
>-p#i:before { content: "000 x" }
>-p#j:before { content: "0000 x" }
>-p#k:before { content: "00000 x" }
>-p#l:before { content: "000000 x" }
>+p#a:before { content: "\FFFDx" }
>+p#b:before { content: "\FFFDx" }
>+p#c:before { content: "\FFFDx" }
>+p#d:before { content: "\FFFDx" }
>+p#e:before { content: "\FFFDx" }
>+p#f:before { content: "\FFFDx" }
>+p#g:before { content: "\FFFD  x" }
>+p#h:before { content: "\FFFD  x" }
>+p#i:before { content: "\FFFD  x" }
>+p#j:before { content: "\FFFD  x" }
>+p#k:before { content: "\FFFD  x" }
>+p#l:before { content: "\FFFD  x" }
> </style>
> </head><body>
> <p id="a">(a)</p>
> <p id="b">(b)</p>
> <p id="c">(c)</p>
> <p id="d">(d)</p>
> <p id="e">(e)</p>
> <p id="f">(f)</p>
>diff --git a/layout/reftests/bugs/228856-1.html b/layout/reftests/bugs/228856-1.html
>--- a/layout/reftests/bugs/228856-1.html
>+++ b/layout/reftests/bugs/228856-1.html
>@@ -6,22 +6,22 @@
> <title>\0 in CSS</title>
> <style>
> p#a:before { content: "\0x" }
> p#b:before { content: "\00x" }
> p#c:before { content: "\000x" }
> p#d:before { content: "\0000x" }
> p#e:before { content: "\00000x" }
> p#f:before { content: "\000000x" }
>-p#g:before { content: "\0 x" }
>-p#h:before { content: "\00 x" }
>-p#i:before { content: "\000 x" }
>-p#j:before { content: "\0000 x" }
>-p#k:before { content: "\00000 x" }
>-p#l:before { content: "\000000 x" }
>+p#g:before { content: "\0  x" }
>+p#h:before { content: "\00  x" }
>+p#i:before { content: "\000  x" }
>+p#j:before { content: "\0000  x" }
>+p#k:before { content: "\00000  x" }
>+p#l:before { content: "\000000  x" }
> </style>
> </head><body>
> <p id="a">(a)</p>
> <p id="b">(b)</p>
> <p id="c">(c)</p>
> <p id="d">(d)</p>
> <p id="e">(e)</p>
> <p id="f">(f)</p>
>diff --git a/layout/reftests/bugs/228856-2-ref.html b/layout/reftests/bugs/228856-2-ref.html
>index 38c02d51902742ec4aead2eaf1f4adaf77c5dd41..ede243db8c11fab966c4879236629b175724b46c
>GIT binary patch
>literal 6195
>zc$|%x>yFwu6oB)ar*Ijm+G?AGJ1oN>TBfUx)OM>KMyr+3NG360H6&?bprfkqz9)_o
>zY_PrXqiW*sT>RO=878mp|C%rU{rUlYTzvk7zCOMG^e{(F*Z2SF&VB!WabFJfyjI)y
>zKYVF6Q2l9c8B52$e>@(&W7kWvjlcNY-?ClQ^K+KbfO#Pco72XmxHp|_2@c_6M;Jy$
>zs7t=@>EW(9Phv)5<}O}R(nLXdb=PF%2jkgpQLx2XPT1Yk;!k%}Bz)z%XtAX^TGNQ|
>z5KQA7k==@fAqml!r%yTh?dkUxeBO8nE$}zMa7NH^OJYPq%4obnEGegxW+W%sfgrXe
>zy8_=5KnpxhPy#nIUL#8)-U<#F$#Rrr(p8x#*|VI61P$F4W$4}G{NcfClta8-MskK|
>zd`O}L$%~xeX&<m09Vy$QeH@ZCjiLV)+2R9*0yIYR$4Auh+FqIFV{#-1lD!2uqA;mW
>zXEf>$-1gg|cZBzBTSW2PYv?3qa33?Y-m`s1$`pA@0=lLFW;BUWvgVO>MYt~Qfhj)o
>zh{F7ER804nQ3i{{yDf7qU6KblCHWh;8AafWm8x(E>#t@eKcFxxX7>UE!UouO-D#sS
>zVU$H=`bZ*R60nQo5Y{D0NyaD<<^r4(w1!lx{YK1u;+KJqNzPtiFEH3dV($cb4toVw
>z7yaDfY(ryrl`xj<#x2-UNt9&cRfL0Yw~!Hb-;vPGmW@|Q6yBm=5XLg}6CH~30ghZ8
>z(M>#t9g~)WAq=KC3}G9&8IMFQuTOR`3d12iz<5~%-C&H%B4ad28QLXrlBYNzWgTF>
>z9cIDy6tT$&SpL<R_<T}@9-luwlT{MFz+4$#!R-mlivJf6RXTmRM|mZ0;-_%0Xo7}!
>z&DS@rTiS*vE5dnx*W|UD)0<_BhR8=ZukkLuEs${HLx>bei^#Tw(u%!A+XXA5Va1-K
>zj-{99nWhUzY2kUMDefqhpJ$hZ0*K)=MIB)1vUQD=&czI>K#QV|P*kH9MRG5CDV7#=
>ziaL%`S)-_<0D48;f(e(cD-_A7MM{@TxtK~7Xi+2=&?{0}&`Ie!mdYAMN<xF8-bKoy
>zxhsVAB-A3TCw5VdS`?Xk(F;>r&<X1~N@a~AC51szzt9NjuaMHeTp$r>QPi&<v<kH-
>zl6%ohv9zF5)OVE18by5t&?_47o%TGtLXnJGqznuXnh3Ngk_+e+DJ|%v40I1#Rav7*
>zNoY_s<jfU<PM}54u-Y^KyJJr2^fIJ!Iu%1ls!~rK4tO}ntFwGwsL~zh!eP;C&_+U~
>zw`>Iw<nwsX9t=ftRS^@2!@Q=03fI!&JU`J<g==U>)6DZs--e^~@I2F3ca+lV4yhr9
>zQ8tO}3C=YyShlV-)Tt*@#n?TGL;XbCJ;}`&4QYBXcq&eHskGixM`4Vfx&;>T!|e)B
>zGHo}+Px#Y}su;T`Ila-7(u2WJ*U^;Lds5PwJ@x8@MPt|5>PfWSmN?!|wB6HXGe%oV
>z4+dMs`7V{#dr}gaJ@pHnQ0F>B{nIju7`vx__1smo-ILsm(U7JGgQvcul-7IdD~!?8
>zfNwi~)?MLArtO9XXV0C8v3ru!8$BsK7z_;@O=-O+C7s#RkYm?bI<a;~!)g~@vxiPe
>zjb@~}1}{TLDtGcBpZ=edjF|qGhqn(|3Oc<_dDWjKp6S&^igmPsnisK=v0U|XW-d3<
>zm#YJ%mmkHDfa^CsF6^fIUcS-Q)5bbyf%@Gy(wD2YqqW>fQ?3q_rk75?*1u$_>zTq!
>z%(BHl;<ulST)Z=;oeDx02#&8nq%uRj8bW3WzPNZDkxdT`6bR{|p@7Cn<kCT-Gf121
>zj~c4l(m?OdA*6xcpCA&7T+&L-+-x6-Wl~mq2TL^*{EmbaBOK`-@g1zzY)1rX;kujy
>vK%geBS8<g@1J|!P90^j(4J0l)7vS?RA)jSa_=o?>2v_`9#|i&2e%kmCQfdl2
>
>diff --git a/layout/reftests/bugs/228856-2-style-1.css b/layout/reftests/bugs/228856-2-style-1.css
>index a6d6acf36a1f9e09007d24dfbf9d57b016d870c5..bb7e68084bfe79d6007c320692fcf5a5c67449bb
>GIT binary patch
>literal 1069
>zc${^T%TB{E5JkK0S6nHJN=P-~RoZj~-@vjP#~u=k#I_t$K!|_GX+jcO$4ehy-=oni
>z0gLaNyF%%8ZoD^jmhK@;Wn7k5T)ge^YP4T}kSa=_<))H*{44?Q@QoumuLQ5wyi$eE
>z;EnB|4A62Zl`huqltjuNs17vX>nr5p>!y8vbsa9GI)*^kiiC6CwPx5@kJ@O{ST1Nk
>zq+d#E2n|&vym1Yh(!5s^HTR+feRAQeFBnY-FKW^r#6`)_l_iH%-HdgGBhn}9L>*Gx
>z#>rhkF|bhg@xbCCe!p47?;n9n5C{v!BC5;8oGueHx@;k$x=h4%nTYE$F{hiTEt?sY
>z+Oe6>w#kgH*d>w3bO)?6wp~IRmOvYioWgz*_awm^topnoxf+(hwlh7Xr!(@-)ZPvZ
>z7woPT>VxO(eq=aj!+`tjlx_YqOxfc&fId00?Kzv|b|d?%9N5zZ!@!;=jAWl|r@s6J
>D6Ye@m
>
>diff --git a/layout/reftests/bugs/228856-2.html b/layout/reftests/bugs/228856-2.html
>index 0bc8ebb5c525c30f0c3f586c223e2ac4b7054455..ac02a54da98ac063d5498d16e72caca89e54d179
>GIT binary patch
>literal 6156
>zc$}4(ZExE)6ovKKzk+puV#O-S@6L;2xOF--K-v|HgJFOXK#}N(s4RIBZ8sS9-*-ug
>zGU<?<>Ht9`oqG=XkQNyRZ*G2HEdToU34LCExkKL`KHl9fP}lYSKZgt7zggZ?4@O?E
>z@B5#=b~~u~bQ4Oqm%e{E9K6HO%Zq1!`G=p-O*-;RT9Al(F^#*kPQdQXf&_;#Tx>86
>z5i51^-#fCu?k@6-;*7e>mo4t1sJgoDQv8E*w<{DSVNqgw{jmJaJ!gh*To)}9Qld3U
>zF)s;8RwBH4!f}ja^!4GcM87=z+JnzK520oF9b{pF(ILSZ!ZD#Fdqy;`y0a~CiHki(
>zG{GAtPcTTCJj+oIHw&Io<SB1T7*Sl5C@-X|$|&E_lEfHI-6ulPhx^6tt=Fj@@#a!o
>zQbe+Sp6+qUJb}|Lq9r;InxI`4<2A{k{ZBXv_XGk+h8FktXyEm|%I0%^z<XT02XRWE
>zQ-i_z`FP^?-?O%(a7PnX#l6?il8nOrkfQaD?h0I4lv^B;HHkt>@(kr`UfCqVbwv+I
>z@t&sy`WL3GyN7~M7#!Z(%4_A~G77i2d<Qqv6nN1JRT#tgtIqfb1XI?#7kE(90NSoQ
>z>vRH2Xo_d|I0a1(bi*u$amly1paj?CqOinh4OUOP=ep;CUln$Olw{vgfz#`fzNENJ
>zFs3Le@cO!2Ga9)3iQD(0vh1=<HjTdK8LlCKd9j01R3`Z@Wut)bdJXL%Ff{;Gn7yD4
>z%$*&~5!UE{`_0gN)#D%bEJPt-_y>8LKKzdIG%qfpRq+-27k&c2Dmj*Ja9u2y4qBE%
>zO<j$e+F|=zR2PfV(NC0yX<b~Ej(@6D)bi4at+W&P{IwB0<?#zR(eMdw2ee>+v$x51
>zc6)<LF>3;U3-@BOM<l-PzCG(*k$!h}w(5}>`RMFzxY=H@n{nWSNRkq!m8C>zUrYN;
>zD@$XcJ(W62`r}-a>nNd*b4{G1WIfJbD-=Qf_yofMG-TClgF%*BFbrx8jnslcw$x)#
>zLUkAhj*_Lppja3%44Ecg^;%$%r4|fBfk8+u7-UO51|?L7VdyAX8Vrhs0mBG1tN|@B
>z$WjZ2QH`OIS}@3#dJIaa4#UV%vNRYJ3j>BRLk0FNFvwC1hOw9pLTbSvTk0_=p*jp>
>zN6FG)P%I1>CVa*`&RbxRr4|g6=1WmaEf{1=Jq9IIhhgF<SsDzAg#p8qYg#}!3M~kx
>z4TApzaJ18RUy9Rpe(FdmP1j-Jg!NIZk&84eOWiUVu?JQEBUnbfRWC}x$>Xi{;ASf4
>z)#|8TEE{zs<XUyTULMt{kZaVA(u~Ksz5$Ms{y5j?I!dTID;klaQ6`4W2G$lYTJ>78
>zk=1rK^&)yy+u6v*Mm9>ifel+trBFQ^#mLNN$Yd#Ba9XgD)pj;R!A7X<Y-D328ztSq
>zX6PuPdNzuYnGLK$yfN+A$Z9*AdbvHS?QCRYBO4{%z=o~6QmCGdVq|7BW(Xmt9UEC~
>zXEPS_MyTy<WMd;6CEdVg>?omnHj0s%&4f=czFxIpBdhIfCe3%MR@>Rg#zr<ux`EBa
>zQ9|`>6eBa6DVMb)bJW_oOdBpOs2oj=Or%@`k*Ooa9eK#B_LscCrvGipc~^$Qw*a2l
>z^n0ewhE(O8lxQ|h7n%sMg)&E6oJKSy(&j`a0c{d{FplI*yj9P187EcFNr`@DU7SXU
>zNoI|>P$HTVsT<qbx8{gS9iJ_1Eb3E^<Kf0BtL;FDkVOPbO;lT%MWaT9%p!RC@SNIB
>zFPbPKq!&$V(WN*;(uvMbL|UhdMkJc05q&rnA&uzcQH0~#OB$)U%MbVXFe&4_fuVYJ
>zgr}Jg8yKxu=XE;Wkn5zwIfHH_bkaebPB(6J$5S_*ZX)T@Qvg1Hsra;*!7qM_23P!4
>KE#Rl@v(A6BX77Li
>
>diff --git a/layout/style/nsCSSScanner.cpp b/layout/style/nsCSSScanner.cpp
>--- a/layout/style/nsCSSScanner.cpp
>+++ b/layout/style/nsCSSScanner.cpp
>@@ -38,17 +38,17 @@ static const uint8_t IS_STRING     = 0x4
> #define SU    S|U
> #define SUI   S|U|I
> #define SUIJ  S|U|I|J
> #define SUIX  S|U|I|X
> #define SUIJX S|U|I|J|X
> 
> static const uint8_t gLexTable[] = {
> // 00    01    02    03    04    05    06    07
>-    0,    S,    S,    S,    S,    S,    S,    S,
>+ SUIJ,    S,    S,    S,    S,    S,    S,    S,
> // 08   TAB    LF    0B    FF    CR    0E    0F
>     S,   SH,    V,    S,    V,    V,    S,    S,
> // 10    11    12    13    14    15    16    17
>     S,    S,    S,    S,    S,    S,    S,    S,
> // 18    19    1A    1B    1C    1D    1E    1F
>     S,    S,    S,    S,    S,    S,    S,    S,
> //SPC     !     "     #     $     %     &     '
>    SH,   SU,    0,   SU,   SU,   SU,   SU,    0,
>@@ -534,17 +534,17 @@ nsCSSScanner::GatherEscape(nsString& aOu
>   MOZ_ASSERT(Peek() == '\\', "should not have been called");
>   int32_t ch = Peek(1);
>   if (ch < 0) {
>     // If we are in a string (or a url() containing a string), we want to drop
>     // the backslash on the floor.  Otherwise, we want to treat it as a U+FFFD
>     // character.
>     Advance();
>     if (!aInString) {
>-      aOutput.Append(0xFFFD);
>+      aOutput.Append(UCS2_REPLACEMENT_CHAR);
>     }
>     return true;
>   }
>   if (IsVertSpace(ch)) {
>     if (aInString) {
>       // In strings (and in url() containing a string), escaped
>       // newlines are completely removed, to allow splitting over
>       // multiple lines.
>@@ -556,17 +556,21 @@ nsCSSScanner::GatherEscape(nsString& aOu
>     return false;
>   }
> 
>   if (!IsHexDigit(ch)) {
>     // "Any character (except a hexadecimal digit, linefeed, carriage
>     // return, or form feed) can be escaped with a backslash to remove
>     // its special meaning." -- CSS2.1 section 4.1.3
>     Advance(2);
>-    aOutput.Append(ch);
>+    if (ch == 0) {
>+      aOutput.Append(UCS2_REPLACEMENT_CHAR);
>+    } else {
>+      aOutput.Append(ch);
>+    }
>     return true;
>   }
> 
>   // "[at most six hexadecimal digits following a backslash] stand
>   // for the ISO 10646 character with that number, which must not be
>   // zero. (It is undefined in CSS 2.1 what happens if a style sheet
>   // does contain a character with Unicode codepoint zero.)"
>   //   -- CSS2.1 section 4.1.3
>@@ -579,32 +583,31 @@ nsCSSScanner::GatherEscape(nsString& aOu
>   int i = 0;
>   do {
>     val = val * 16 + HexDigitValue(ch);
>     i++;
>     Advance();
>     ch = Peek();
>   } while (i < 6 && IsHexDigit(ch));
> 
>-  // Silently deleting \0 opens a content-filtration loophole (see
>-  // bug 228856), so what we do instead is pretend the "cancels the
>-  // meaning of special characters" rule applied.
>+  // "Interpret the hex digits as a hexadecimal number. If this number is zero,
>+  // or is greater than the maximum allowed codepoint, return U+FFFD
>+  // REPLACEMENT CHARACTER" -- CSS Syntax Level 3
>   if (MOZ_UNLIKELY(val == 0)) {
>-    do {
>-      aOutput.Append('0');
>-    } while (--i);
>+    aOutput.Append(UCS2_REPLACEMENT_CHAR);
>   } else {
>     AppendUCS4ToUTF16(ENSURE_VALID_CHAR(val), aOutput);
>-    // Consume exactly one whitespace character after a nonzero
>-    // hexadecimal escape sequence.
>-    if (IsVertSpace(ch)) {
>-      AdvanceLine();
>-    } else if (IsHorzSpace(ch)) {
>-      Advance();
>-    }
>+  }
>+
>+  // Consume exactly one whitespace character after a
>+  // hexadecimal escape sequence.
>+  if (IsVertSpace(ch)) {
>+    AdvanceLine();
>+  } else if (IsHorzSpace(ch)) {
>+    Advance();
>   }
>   return true;
> }
> 
> /**
>  * Consume a run of "text" beginning with the current read position,
>  * consisting of characters in the class |aClass| (which must be a
>  * suitable argument to IsOpenCharClass) plus escape sequences.
>@@ -625,28 +628,34 @@ nsCSSScanner::GatherText(uint8_t aClass,
>              "possibly-inappropriate character class");
> 
>   uint32_t start = mOffset;
>   bool inString = aClass == IS_STRING;
> 
>   for (;;) {
>     // Consume runs of unescaped characters in one go.
>     uint32_t n = mOffset;
>-    while (n < mCount && IsOpenCharClass(mBuffer[n], aClass)) {
>+    while (n < mCount && IsOpenCharClass(mBuffer[n], aClass)
>+        && mBuffer[n] != 0) {
>       n++;
>     }
>     if (n > mOffset) {
>       aText.Append(&mBuffer[mOffset], n - mOffset);
>       mOffset = n;
>     }
>     if (n == mCount) {
>       break;
>     }
> 
>     int32_t ch = Peek();
>+    if (ch == 0) {
>+      Advance();
>+      aText.Append(UCS2_REPLACEMENT_CHAR);
>+      continue;
>+    }
>     MOZ_ASSERT(!IsOpenCharClass(ch, aClass),
>                "should not have exited the inner loop");
> 
>     if (ch != '\\') {
>       break;
>     }
>     if (!GatherEscape(aText, inString)) {
>       break;
>diff --git a/layout/style/test/Makefile.in b/layout/style/test/Makefile.in
>--- a/layout/style/test/Makefile.in
>+++ b/layout/style/test/Makefile.in
>@@ -79,16 +79,18 @@ MOCHITEST_FILES =	test_acid3_test46.html
> 		test_bug664955.html \
> 		test_bug667520.html \
> 		test_bug645998.html \
> 		file_bug645998-1.css \
> 		file_bug645998-2.css \
> 		test_bug716226.html \
> 		test_bug765590.html \
> 		test_bug798567.html \
>+		test_bug829816.html \
>+		file_bug829816.css \
> 		test_cascade.html \
> 		test_ch_ex_no_infloops.html \
> 		test_compute_data_with_start_struct.html \
> 		test_computed_style.html \
> 		test_computed_style_no_pseudo.html \
> 		test_condition_text.html \
> 		test_condition_text_assignment.html \
> 		test_default_computed_style.html \
>diff --git a/layout/style/test/file_bug829816.css b/layout/style/test/file_bug829816.css
>new file mode 100644
>index 0000000000000000000000000000000000000000..8f12ba6f56396b3f2560d18c380280658ddb45b0
>GIT binary patch
>literal 76
>xc$`a8s8&eM&nrpIE3r~gVo<UM@=Af+BHg0Y;#8m*gF-D=5{jZ2EUIEa$^e#}7kvN#
>
>diff --git a/layout/style/test/test_bug829816.html b/layout/style/test/test_bug829816.html
>new file mode 100644
>--- /dev/null
>+++ b/layout/style/test/test_bug829816.html
>@@ -0,0 +1,56 @@
>+<!DOCTYPE HTML>
>+<html>
>+<!--
>+https://bugzilla.mozilla.org/show_bug.cgi?id=829816
>+-->
>+<head>
>+  <meta charset="utf-8">
>+  <title>Test for Bug 829816</title>
>+  <script type="application/javascript" src="/tests/SimpleTest/SimpleTest.js"></script>
>+  <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css"/>
>+
>+  <style type="text/css">
>+    b { content: "\0";      counter-reset: \0      }
>+    b { content: "\00";     counter-reset: \00     }
>+    b { content: "\000";    counter-reset: \000    }
>+    b { content: "\0000";   counter-reset: \0000   }
>+    b { content: "\00000";  counter-reset: \00000  }
>+    b { content: "\000000"; counter-reset: \000000 }
>+  </style>
>+
>+  <!-- U+0000 characters in <style> would be replaced by the HTML parser -->
>+  <link rel="stylesheet" type="text/css" href="file_bug829816.css"/>
>+
>+  <script type="application/javascript">
>+
>+  /** Test for Bug 829816 **/
>+  var ss = document.styleSheets[1];
>+
>+  for (var i = 0; i < 6; i++) {
>+    is(ss.cssRules[i].style.content, "\"\uFFFD\"",
>+        "\\0 in strings should be converted to U+FFFD");
>+    is(ss.cssRules[i].style.counterReset, "\uFFFD",
>+        "\\0 in identifiers should be converted to U+FFFD");
>+  }
>+
>+  is(document.styleSheets[2].cssRules[0].style.content, "\"\uFFFD\"",
>+      "U+0000 in strings should be converted to U+FFFD");
>+  is(document.styleSheets[2].cssRules[0].style.counterReset, "\uFFFD",
>+      "U+0000 in identifiers should be converted to U+FFFD");
>+  is(document.styleSheets[2].cssRules[1].style.content, "\"\uFFFD\"",
>+      "U+0000 in strings should be converted to U+FFFD");
>+  is(document.styleSheets[2].cssRules[1].style.counterReset, "\uFFFD",
>+      "U+0000 in identifiers should be converted to U+FFFD");
>+
>+
>+  </script>
>+</head>
>+<body>
>+<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=829816">Mozilla Bug 829816</a>
>+<p id="display"></p>
>+<div id="content" style="display: none">
>+</div>
>+<pre id="test">
>+</pre>
>+</body>
>+</html>
Comment 26 Corey Ford [:coyotebush] 2013-06-20 11:16:23 PDT
Oops. I'll figure this out soon enough...
Comment 27 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-20 16:45:16 PDT
Just chatted with Corey; we concluded Zack's right in comment 20, and we should special case null in IsIdentChar/IsIdentStart/GatherText.
Comment 28 Corey Ford [:coyotebush] 2013-06-20 16:49:58 PDT
Created attachment 765692 [details] [diff] [review]
Revised patch with updated tests

Changed the character-class handling as discussed, and updated the bug 228856 tests (unfortunately, more binary diffs here). Also fixed the consumption of whitespace after a hex escape, in both the code and tests.
Comment 29 Zack Weinberg (:zwol) 2013-06-20 18:12:48 PDT
Comment on attachment 765692 [details] [diff] [review]
Revised patch with updated tests

Review of attachment 765692 [details] [diff] [review]:
-----------------------------------------------------------------

This mostly looks good to me.  I haven't looked at the files with embedded NULs.  I'm willing to do that on Monday if dbaron doesn't get to it first.  (I'm traveling between now and then.)

One major issue remains, though: it doesn't appear that you changed nsCSSScanner::Next()'s final ("Otherwise, a symbol (DELIM)") case.  Is there a reason why we can't hit that with U+0000?  (Can you think of a way to test it?)

::: layout/reftests/bugs/228856-1-ref.html
@@ +1,5 @@
>  <!doctype html>
>  <html><head>
>  <!-- This is a test for behavior made up by Mozilla in the absence of
>       specification.  Future CSS specifications may define the behavior
>       differently.  -->

Should correct this comment, now there is a (draft) spec.  (Same applies to any other occurrences of the same comment.)

::: layout/style/nsCSSScanner.cpp
@@ +649,5 @@
> +    if (ch == '\0') {
> +      Advance();
> +      aText.Append(UCS2_REPLACEMENT_CHAR);
> +      continue;
> +    }

Swap this chunk with the MOZ_ASSERT immediately below it.
Comment 30 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-20 18:18:25 PDT
Comment on attachment 765692 [details] [diff] [review]
Revised patch with updated tests

Your commit message should say something about CSS.  Perhaps:

Bug 829816 - Treat \0 and U+0000 in CSS style sheets as U+FFFD.



In the 228856-1 tests, I actually might not have changed the spacing
at all on tests #g-#l, though it's ok as is.

In 228856-2-ref.html (the reference) I think you should avoid having
other bits in the class attribute along with match.  I think you should
probably just remove all the class and id attributes from the reference
except for class="match".  Otherwise it seems possible for the test to
pass for the wrong reasons.

You also might as well add back the 7th i7 case (in both test and reference,
of course).  


In nsCSSScanner.cpp, you should probably include the comment that's above
IsIdentChar above IsIdentStart as well.  Also, in both functions, I'd
test equality against int32_t(0) rather than '\0' because then you're
comparing the same types.


r=dbaron with that
Comment 31 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-20 18:21:28 PDT
Here's a try push:
https://tbpl.mozilla.org/?tree=Try&rev=3b7830406b5c
(which also caught the two most recent changesets on mozilla-central).
Comment 32 Corey Ford [:coyotebush] 2013-06-20 18:43:58 PDT
(In reply to Zack Weinberg (:zwol) from comment #29)
> One major issue remains, though: it doesn't appear that you changed
> nsCSSScanner::Next()'s final ("Otherwise, a symbol (DELIM)") case.  Is there
> a reason why we can't hit that with U+0000?  (Can you think of a way to test
> it?)

Won't U+0000 be caught by
>  // identifier family
>  if (IsIdentStart(ch)) {
>    return ScanIdent(aToken);
>  }
earlier in Next()?

(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #30)
> In 228856-2-ref.html (the reference) I think you should avoid having
> other bits in the class attribute along with match.  I think you should
> probably just remove all the class and id attributes from the reference
> except for class="match".  Otherwise it seems possible for the test to
> pass for the wrong reasons.

Good point.

> Also, in both functions, I'd
> test equality against int32_t(0) rather than '\0' because then you're
> comparing the same types.

I was following Zack's example in comment 20, and we certainly compare with lots of other character constants in this file. But that's fine.

I'll address the other comments in an updated patch.
Comment 33 Corey Ford [:coyotebush] 2013-06-21 09:54:54 PDT
Created attachment 765955 [details] [diff] [review]
Further revised patch

Addressed feedback, and fixed test_parser_diagnostics_unprintables.html.
Comment 34 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2013-06-26 15:09:10 PDT
Comment on attachment 765955 [details] [diff] [review]
Further revised patch

In general, when I mark review+, that means you don't need to ask for review again as long as you've done the things specified in the review comment.  But this looks fine (though I didn't look at the test that requires the binary diff again).  Sorry for taking so long to get to it a second time.

(In hindsight, maybe it was a bad idea to ask to change the '\0'.  But no need to change it back, either.)
Comment 35 Corey Ford [:coyotebush] 2013-06-26 15:19:39 PDT
Created attachment 768035 [details] [diff] [review]
With r= message
Comment 36 Ryan VanderMeulen [:RyanVM] 2013-06-26 18:30:50 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/713f1c85c114
Comment 37 Ed Morley [:emorley] 2013-06-27 03:48:36 PDT
https://hg.mozilla.org/mozilla-central/rev/713f1c85c114

Note You need to log in before you can comment on or make changes to this bug.