CSS and HTML parsers shouldn't allow high/low surrogates via escapes

RESOLVED FIXED in mozilla1.9alpha1

Status

()

Core
CSS Parsing and Computation
RESOLVED FIXED
12 years ago
9 years ago

People

(Reporter: bz, Assigned: bz)

Tracking

(Blocks: 1 bug, {fixed1.8.1.17})

Trunk
mozilla1.9alpha1
x86
Linux
fixed1.8.1.17
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.8.1.17 +
wanted1.8.0.x +
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 4 obsolete attachments)

See bug 316338 -- this covers the CSS part.
(Assignee)

Updated

12 years ago
Summary: CSS parser shouldn't allow high/low surrogates via CSS escapes → CSS and HTML parsesr shouldn't allow high/low surrogates via escapes
Created attachment 203043 [details] [diff] [review]
Like so

I can remove the "extra" changes, but I figured that if we have these macros we should use them.....
Attachment #203043 - Flags: superreview?(dbaron)
Attachment #203043 - Flags: review?(smontagu)
Comment on attachment 203043 [details] [diff] [review]
Like so

Blake, could you give the nsHTMLTokens change a look?
Attachment #203043 - Flags: review?(mrbkap)
Summary: CSS and HTML parsesr shouldn't allow high/low surrogates via escapes → CSS and HTML parsers shouldn't allow high/low surrogates via escapes
Attachment #203043 - Flags: superreview?(dbaron) → superreview+
Note that I think you *may* be able to shoehorn assertions into the macros where you had a comment asking so (and perhaps into some of the other macros), by using the "," operator.  However, I question whether it's worthwhile, because you'd have to decrease the readability of the code so much to do it.
Comment on attachment 203043 [details] [diff] [review]
Like so

r=mrbkap for the nsHTMLTokens.cpp change.
Attachment #203043 - Flags: review?(mrbkap) → review+
Yeah, I considered the ',' thing and didn't really feel like it was worth it.
Comment on attachment 203043 [details] [diff] [review]
Like so

>@@ -890,25 +892,26 @@ PRInt32 nsCSSScanner::ParseEscape(nsresu
>  */
> PRBool nsCSSScanner::GatherIdent(nsresult& aErrorCode, PRInt32 aChar,
>                                  nsString& aIdent)
> {
>   if (aChar == CSS_ESCAPE) {
>     aChar = ParseEscape(aErrorCode);
>   }
>   if (0 < aChar) {
>-    aIdent.Append(PRUnichar(aChar));
>+    // XXXbz what if it's \000000 ?  Should we really put that in the ident?
>+    AppendUCS4ToUTF16(aChar, aIdent);
>   }

Shouldn't the AppendUCS4ToUTF16() be inside the if (aChar == CSS_ESCAPE) here? If we get a non-BMP character without escaping (e.g. in a UTF-8 stylesheet) it's going to come in here as two surrogates, and AppendUCS4toUTF16 will assert.

>@@ -1115,13 +1118,13 @@ PRBool nsCSSScanner::ParseString(nsresul
>     }
>     if (ch == CSS_ESCAPE) {
>       ch = ParseEscape(aErrorCode);
>       if (ch < 0) {
>         return PR_FALSE;
>       }
>     }
>     if (0 < ch) {
>-      aToken.mIdent.Append(PRUnichar(ch));
>+      AppendUCS4ToUTF16(ch, aToken.mIdent);

ditto.

Alternatively, you could make nsCSSScanner::Read() surrogate-aware (I'm not sure why it isn't already, since it returns a PRInt32), though that would require lots more changes.

More comments shortly, after I test something.
> If we get a non-BMP character without escaping (e.g. in a UTF-8 stylesheet)
> it's going to come in here as two surrogates

Oh, good catch!

Along these lines, I'm a little worried about ParseEscape when a '\' is followed by a high or low surrogate (instead of a number).  Let me poke at this a bit.

> Alternatively, you could make nsCSSScanner::Read() surrogate-aware

I don't think that's really desirable...
Created attachment 203132 [details] [diff] [review]
Fix that issue

Turns out that ParseEscape never really returned negative numbers anyway...  David, I made some changes to scanner from the previous patch that really do need your review.
Attachment #203043 - Attachment is obsolete: true
Attachment #203132 - Flags: superreview?(dbaron)
Attachment #203132 - Flags: review?(smontagu)
Attachment #203043 - Flags: review?(smontagu)
I know it was I who suggested using U+FFFD, but I'm having second thoughts: like this, the selectors .class\d800 .class\110000 and .class\fffd are all equivalent. I can't find anything clear in the CSS spec about how to handle identifiers containing non-characters, but I suspect they should make the whole declaration invalid.
Comment on attachment 203132 [details] [diff] [review]
Fix that issue

r=smontagu on all the macro changes if you want to check them in already.
I checked in the non-CSS parts of this, plus a bustage fix.
Comment on attachment 203132 [details] [diff] [review]
Fix that issue

If you change ParseAndAppendEscape so that it doesn't append anything in the cases where it previously returned zero, the sr=dbaron.
Attachment #203132 - Flags: superreview?(dbaron) → superreview+
Although yeah, I am a little suspicious of ENSURE_VALID_CHAR in the CSS scanner as well.  It might be worth emailing www-style.
I did.  See thread at http://lists.w3.org/Archives/Public/www-style/2005Nov/0069.html
Created attachment 203210 [details] [diff] [review]
Updated to David's comments
Attachment #203132 - Attachment is obsolete: true
Attachment #203210 - Flags: review?(smontagu)
Attachment #203132 - Flags: review?(smontagu)

Comment 16

12 years ago
FYI, this checkin broke the AIX tinderbox on the ports page.  IS_SURROGATE is being multiply defined (and with different values).  See http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#79 .
Created attachment 203218 [details] [diff] [review]
Fix for that bustage that I just checked in
You need to get the last one too; in fact, it's probably the most important (escaped newlines).
Comment on attachment 203210 [details] [diff] [review]
Updated to David's comments

Ok, given the response on www-style, and given that the results are the same as inserting the values of the escapes as raw characters in UTF-32, I'll buy this.
Attachment #203210 - Flags: review?(smontagu) → review+
Created attachment 203265 [details] [diff] [review]
Fix escaped newline case too

This should be good now...
Attachment #203210 - Attachment is obsolete: true
Attachment #203265 - Flags: superreview?(dbaron)
You still don't know that ch is nonzero in that last case.
Hmm.  Ok, true.  Do you think we should append CSS_ESCAPE if we have '\' followed by zero bytes?  Or just append nothing?
We currently append nothing; changing that should probably be done in bug 228856.
Created attachment 203308 [details] [diff] [review]
n-th time is the charm... ;)
Attachment #203265 - Attachment is obsolete: true
Attachment #203308 - Flags: superreview?(dbaron)
Attachment #203265 - Flags: superreview?(dbaron)
Attachment #203308 - Flags: superreview?(dbaron) → superreview+
Comment on attachment 203043 [details] [diff] [review]
Like so

>Index: xpcom/string/public/nsUTF8Utils.h
>@@ -456,23 +453,21 @@ class ConvertUTF8toUTF16

>-                  *out++ = (PRUnichar)(ucs4 >> 10) | 0xd800u;
>-                  *out++ = (PRUnichar)(ucs4 & 0x3ff) | 0xdc00u;
>+                  *out++ = (value_type)H_SURROGATE(ucs4);
>+                  *out++ = (value_type)L_SURROGATE(ucs4);

Ow, ow, ow. This should be (buffer_type). Sorry to have missed it first time round.
Fixed (including smontagu's catch in comment 25).
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → FIXED
(Assignee)

Updated

11 years ago
Target Milestone: --- → mozilla1.9alpha
Flags: blocking1.8.1.17?
Flags: blocking1.8.1.17? → blocking1.8.1.17+
Blocks: 448166
Created attachment 335154 [details] [diff] [review]
1.8 branch merge
Attachment #335154 - Flags: approval1.9.0.2?
Flags: in-testsuite?
Comment on attachment 335154 [details] [diff] [review]
1.8 branch merge

Boris, I think you meant to request approval1.8.1.17. Switching flags to that.
Attachment #335154 - Flags: approval1.9.0.2? → approval1.8.1.17?
Comment on attachment 335154 [details] [diff] [review]
1.8 branch merge

Approved for 1.8.1.17, a=dveditz for release-drivers
Attachment #335154 - Flags: approval1.8.1.17? → approval1.8.1.17+
> Boris, I think you meant to request approval1.8.1.17.

Yeah, sorry about that.  Good catch!
Checked in on the 1.8.1 branch.
Keywords: fixed1.8.1.17
Depends on: 316859
(In reply to comment #0)
> See bug 316338 -- this covers the CSS part.

I've tried using the test cases from there with the 2.0.0.17 candidate build and nothing crashes. Does the crash only occur in debug builds?
The patch in this bug mostly fixes the crashes in those testcases, I think.
I don't get a crash on 2.0.0.16 (or 2.0.0.17) with the test case from bug 316338 on either Ubuntu or Windows XP.
Depends on: 469463

Comment 35

9 years ago
Created attachment 356697 [details] [diff] [review]
regerated patch for 1.8.0 branch

Updated

9 years ago
Attachment #356697 - Flags: approval1.8.0.next?

Updated

9 years ago
Flags: wanted1.8.0.x+

Updated

9 years ago
Attachment #356697 - Flags: approval1.8.0.next? → approval1.8.0.next+

Comment 36

9 years ago
Comment on attachment 356697 [details] [diff] [review]
regerated patch for 1.8.0 branch

a=asac for 1.8.0
You need to log in before you can comment on or make changes to this bug.