Last Comment Bug 316394 - CSS and HTML parsers shouldn't allow high/low surrogates via escapes
: CSS and HTML parsers shouldn't allow high/low surrogates via escapes
Status: RESOLVED FIXED
: fixed1.8.1.17
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: x86 Linux
: -- normal (vote)
: mozilla1.9alpha1
Assigned To: Boris Zbarsky [:bz]
: Hixie (not reading bugmail)
Mentors:
Depends on: 316859 469463
Blocks: 316338 448166
  Show dependency treegraph
 
Reported: 2005-11-14 07:40 PST by Boris Zbarsky [:bz]
Modified: 2009-02-09 11:14 PST (History)
11 users (show)
dveditz: blocking1.8.1.17+
asac: wanted1.8.0.x+
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Like so (17.27 KB, patch)
2005-11-14 14:02 PST, Boris Zbarsky [:bz]
mrbkap: review+
dbaron: superreview+
Details | Diff | Splinter Review
Fix that issue (23.75 KB, patch)
2005-11-15 08:56 PST, Boris Zbarsky [:bz]
dbaron: superreview+
Details | Diff | Splinter Review
Updated to David's comments (5.57 KB, patch)
2005-11-15 18:15 PST, Boris Zbarsky [:bz]
smontagu: review+
Details | Diff | Splinter Review
Fix for that bustage that I just checked in (2.14 KB, patch)
2005-11-15 20:30 PST, Boris Zbarsky [:bz]
no flags Details | Diff | Splinter Review
Fix escaped newline case too (5.63 KB, patch)
2005-11-16 07:05 PST, Boris Zbarsky [:bz]
no flags Details | Diff | Splinter Review
n-th time is the charm... ;) (5.67 KB, patch)
2005-11-16 14:29 PST, Boris Zbarsky [:bz]
dbaron: superreview+
Details | Diff | Splinter Review
1.8 branch merge (20.77 KB, patch)
2008-08-22 22:40 PDT, Boris Zbarsky [:bz]
dveditz: approval1.8.1.17+
Details | Diff | Splinter Review
regerated patch for 1.8.0 branch (13.50 KB, patch)
2009-01-13 02:48 PST, Martin Stránský
asac: approval1.8.0.next+
Details | Diff | Splinter Review

Description Boris Zbarsky [:bz] 2005-11-14 07:40:44 PST
See bug 316338 -- this covers the CSS part.
Comment 1 Boris Zbarsky [:bz] 2005-11-14 14:02:45 PST
Created attachment 203043 [details] [diff] [review]
Like so

I can remove the "extra" changes, but I figured that if we have these macros we should use them.....
Comment 2 Boris Zbarsky [:bz] 2005-11-14 14:03:45 PST
Comment on attachment 203043 [details] [diff] [review]
Like so

Blake, could you give the nsHTMLTokens change a look?
Comment 3 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-11-14 14:49:58 PST
Note that I think you *may* be able to shoehorn assertions into the macros where you had a comment asking so (and perhaps into some of the other macros), by using the "," operator.  However, I question whether it's worthwhile, because you'd have to decrease the readability of the code so much to do it.
Comment 4 Blake Kaplan (:mrbkap) 2005-11-14 14:51:34 PST
Comment on attachment 203043 [details] [diff] [review]
Like so

r=mrbkap for the nsHTMLTokens.cpp change.
Comment 5 Boris Zbarsky [:bz] 2005-11-14 14:59:30 PST
Yeah, I considered the ',' thing and didn't really feel like it was worth it.
Comment 6 Simon Montagu :smontagu 2005-11-15 07:14:29 PST
Comment on attachment 203043 [details] [diff] [review]
Like so

>@@ -890,25 +892,26 @@ PRInt32 nsCSSScanner::ParseEscape(nsresu
>  */
> PRBool nsCSSScanner::GatherIdent(nsresult& aErrorCode, PRInt32 aChar,
>                                  nsString& aIdent)
> {
>   if (aChar == CSS_ESCAPE) {
>     aChar = ParseEscape(aErrorCode);
>   }
>   if (0 < aChar) {
>-    aIdent.Append(PRUnichar(aChar));
>+    // XXXbz what if it's \000000 ?  Should we really put that in the ident?
>+    AppendUCS4ToUTF16(aChar, aIdent);
>   }

Shouldn't the AppendUCS4ToUTF16() be inside the if (aChar == CSS_ESCAPE) here? If we get a non-BMP character without escaping (e.g. in a UTF-8 stylesheet) it's going to come in here as two surrogates, and AppendUCS4toUTF16 will assert.

>@@ -1115,13 +1118,13 @@ PRBool nsCSSScanner::ParseString(nsresul
>     }
>     if (ch == CSS_ESCAPE) {
>       ch = ParseEscape(aErrorCode);
>       if (ch < 0) {
>         return PR_FALSE;
>       }
>     }
>     if (0 < ch) {
>-      aToken.mIdent.Append(PRUnichar(ch));
>+      AppendUCS4ToUTF16(ch, aToken.mIdent);

ditto.

Alternatively, you could make nsCSSScanner::Read() surrogate-aware (I'm not sure why it isn't already, since it returns a PRInt32), though that would require lots more changes.

More comments shortly, after I test something.
Comment 7 Boris Zbarsky [:bz] 2005-11-15 08:32:16 PST
> If we get a non-BMP character without escaping (e.g. in a UTF-8 stylesheet)
> it's going to come in here as two surrogates

Oh, good catch!

Along these lines, I'm a little worried about ParseEscape when a '\' is followed by a high or low surrogate (instead of a number).  Let me poke at this a bit.

> Alternatively, you could make nsCSSScanner::Read() surrogate-aware

I don't think that's really desirable...
Comment 8 Boris Zbarsky [:bz] 2005-11-15 08:56:39 PST
Created attachment 203132 [details] [diff] [review]
Fix that issue

Turns out that ParseEscape never really returned negative numbers anyway...  David, I made some changes to scanner from the previous patch that really do need your review.
Comment 9 Simon Montagu :smontagu 2005-11-15 09:05:19 PST
I know it was I who suggested using U+FFFD, but I'm having second thoughts: like this, the selectors .class\d800 .class\110000 and .class\fffd are all equivalent. I can't find anything clear in the CSS spec about how to handle identifiers containing non-characters, but I suspect they should make the whole declaration invalid.
Comment 10 Simon Montagu :smontagu 2005-11-15 09:39:29 PST
Comment on attachment 203132 [details] [diff] [review]
Fix that issue

r=smontagu on all the macro changes if you want to check them in already.
Comment 11 Boris Zbarsky [:bz] 2005-11-15 11:54:01 PST
I checked in the non-CSS parts of this, plus a bustage fix.
Comment 12 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-11-15 11:58:25 PST
Comment on attachment 203132 [details] [diff] [review]
Fix that issue

If you change ParseAndAppendEscape so that it doesn't append anything in the cases where it previously returned zero, the sr=dbaron.
Comment 13 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-11-15 12:07:23 PST
Although yeah, I am a little suspicious of ENSURE_VALID_CHAR in the CSS scanner as well.  It might be worth emailing www-style.
Comment 14 Boris Zbarsky [:bz] 2005-11-15 12:13:22 PST
I did.  See thread at http://lists.w3.org/Archives/Public/www-style/2005Nov/0069.html
Comment 15 Boris Zbarsky [:bz] 2005-11-15 18:15:32 PST
Created attachment 203210 [details] [diff] [review]
Updated to David's comments
Comment 16 cls 2005-11-15 20:17:59 PST
FYI, this checkin broke the AIX tinderbox on the ports page.  IS_SURROGATE is being multiply defined (and with different values).  See http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#79 .
Comment 17 Boris Zbarsky [:bz] 2005-11-15 20:30:33 PST
Created attachment 203218 [details] [diff] [review]
Fix for that bustage that I just checked in
Comment 18 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-11-15 22:33:10 PST
You need to get the last one too; in fact, it's probably the most important (escaped newlines).
Comment 19 Simon Montagu :smontagu 2005-11-16 06:59:00 PST
Comment on attachment 203210 [details] [diff] [review]
Updated to David's comments

Ok, given the response on www-style, and given that the results are the same as inserting the values of the escapes as raw characters in UTF-32, I'll buy this.
Comment 20 Boris Zbarsky [:bz] 2005-11-16 07:05:14 PST
Created attachment 203265 [details] [diff] [review]
Fix escaped newline case too

This should be good now...
Comment 21 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-11-16 11:08:31 PST
You still don't know that ch is nonzero in that last case.
Comment 22 Boris Zbarsky [:bz] 2005-11-16 11:46:47 PST
Hmm.  Ok, true.  Do you think we should append CSS_ESCAPE if we have '\' followed by zero bytes?  Or just append nothing?
Comment 23 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-11-16 12:02:12 PST
We currently append nothing; changing that should probably be done in bug 228856.
Comment 24 Boris Zbarsky [:bz] 2005-11-16 14:29:57 PST
Created attachment 203308 [details] [diff] [review]
n-th time is the charm... ;)
Comment 25 Simon Montagu :smontagu 2005-11-17 04:32:18 PST
Comment on attachment 203043 [details] [diff] [review]
Like so

>Index: xpcom/string/public/nsUTF8Utils.h
>@@ -456,23 +453,21 @@ class ConvertUTF8toUTF16

>-                  *out++ = (PRUnichar)(ucs4 >> 10) | 0xd800u;
>-                  *out++ = (PRUnichar)(ucs4 & 0x3ff) | 0xdc00u;
>+                  *out++ = (value_type)H_SURROGATE(ucs4);
>+                  *out++ = (value_type)L_SURROGATE(ucs4);

Ow, ow, ow. This should be (buffer_type). Sorry to have missed it first time round.
Comment 26 Boris Zbarsky [:bz] 2005-11-17 07:59:47 PST
Fixed (including smontagu's catch in comment 25).
Comment 27 Boris Zbarsky [:bz] 2008-08-22 22:40:23 PDT
Created attachment 335154 [details] [diff] [review]
1.8 branch merge
Comment 28 Samuel Sidler (old account; do not CC) 2008-08-22 23:48:48 PDT
Comment on attachment 335154 [details] [diff] [review]
1.8 branch merge

Boris, I think you meant to request approval1.8.1.17. Switching flags to that.
Comment 29 Daniel Veditz [:dveditz] 2008-08-23 23:01:47 PDT
Comment on attachment 335154 [details] [diff] [review]
1.8 branch merge

Approved for 1.8.1.17, a=dveditz for release-drivers
Comment 30 Boris Zbarsky [:bz] 2008-08-24 17:29:06 PDT
> Boris, I think you meant to request approval1.8.1.17.

Yeah, sorry about that.  Good catch!
Comment 31 Boris Zbarsky [:bz] 2008-08-25 08:25:04 PDT
Checked in on the 1.8.1 branch.
Comment 32 Al Billings [:abillings] 2008-09-02 15:54:45 PDT
(In reply to comment #0)
> See bug 316338 -- this covers the CSS part.

I've tried using the test cases from there with the 2.0.0.17 candidate build and nothing crashes. Does the crash only occur in debug builds?
Comment 33 Boris Zbarsky [:bz] 2008-09-02 17:54:50 PDT
The patch in this bug mostly fixes the crashes in those testcases, I think.
Comment 34 Al Billings [:abillings] 2008-09-02 18:07:41 PDT
I don't get a crash on 2.0.0.16 (or 2.0.0.17) with the test case from bug 316338 on either Ubuntu or Windows XP.
Comment 35 Martin Stránský 2009-01-13 02:48:04 PST
Created attachment 356697 [details] [diff] [review]
regerated patch for 1.8.0 branch
Comment 36 Alexander Sack 2009-02-09 11:14:15 PST
Comment on attachment 356697 [details] [diff] [review]
regerated patch for 1.8.0 branch

a=asac for 1.8.0

Note You need to log in before you can comment on or make changes to this bug.