CSS and HTML parsers shouldn't allow high/low surrogates via escapes

RESOLVED FIXED in mozilla1.9alpha1

Status

()

Core
CSS Parsing and Computation
RESOLVED FIXED
12 years ago
9 years ago

People

(Reporter: bz, Assigned: bz)

Tracking

(Blocks: 1 bug, {fixed1.8.1.17})

Trunk
mozilla1.9alpha1
x86
Linux
fixed1.8.1.17
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.8.1.17 +
wanted1.8.0.x +
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 4 obsolete attachments)

(Assignee)

Description

12 years ago
See bug 316338 -- this covers the CSS part.
(Assignee)

Updated

12 years ago
Summary: CSS parser shouldn't allow high/low surrogates via CSS escapes → CSS and HTML parsesr shouldn't allow high/low surrogates via escapes
(Assignee)

Comment 1

12 years ago
Created attachment 203043 [details] [diff] [review]
Like so

I can remove the "extra" changes, but I figured that if we have these macros we should use them.....
Attachment #203043 - Flags: superreview?(dbaron)
Attachment #203043 - Flags: review?(smontagu)
(Assignee)

Comment 2

12 years ago
Comment on attachment 203043 [details] [diff] [review]
Like so

Blake, could you give the nsHTMLTokens change a look?
Attachment #203043 - Flags: review?(mrbkap)
Summary: CSS and HTML parsesr shouldn't allow high/low surrogates via escapes → CSS and HTML parsers shouldn't allow high/low surrogates via escapes
Attachment #203043 - Flags: superreview?(dbaron) → superreview+
Note that I think you *may* be able to shoehorn assertions into the macros where you had a comment asking so (and perhaps into some of the other macros), by using the "," operator.  However, I question whether it's worthwhile, because you'd have to decrease the readability of the code so much to do it.
Comment on attachment 203043 [details] [diff] [review]
Like so

r=mrbkap for the nsHTMLTokens.cpp change.
Attachment #203043 - Flags: review?(mrbkap) → review+
(Assignee)

Comment 5

12 years ago
Yeah, I considered the ',' thing and didn't really feel like it was worth it.
Comment on attachment 203043 [details] [diff] [review]
Like so

>@@ -890,25 +892,26 @@ PRInt32 nsCSSScanner::ParseEscape(nsresu
>  */
> PRBool nsCSSScanner::GatherIdent(nsresult& aErrorCode, PRInt32 aChar,
>                                  nsString& aIdent)
> {
>   if (aChar == CSS_ESCAPE) {
>     aChar = ParseEscape(aErrorCode);
>   }
>   if (0 < aChar) {
>-    aIdent.Append(PRUnichar(aChar));
>+    // XXXbz what if it's \000000 ?  Should we really put that in the ident?
>+    AppendUCS4ToUTF16(aChar, aIdent);
>   }

Shouldn't the AppendUCS4ToUTF16() be inside the if (aChar == CSS_ESCAPE) here? If we get a non-BMP character without escaping (e.g. in a UTF-8 stylesheet) it's going to come in here as two surrogates, and AppendUCS4toUTF16 will assert.

>@@ -1115,13 +1118,13 @@ PRBool nsCSSScanner::ParseString(nsresul
>     }
>     if (ch == CSS_ESCAPE) {
>       ch = ParseEscape(aErrorCode);
>       if (ch < 0) {
>         return PR_FALSE;
>       }
>     }
>     if (0 < ch) {
>-      aToken.mIdent.Append(PRUnichar(ch));
>+      AppendUCS4ToUTF16(ch, aToken.mIdent);

ditto.

Alternatively, you could make nsCSSScanner::Read() surrogate-aware (I'm not sure why it isn't already, since it returns a PRInt32), though that would require lots more changes.

More comments shortly, after I test something.
(Assignee)

Comment 7

12 years ago
> If we get a non-BMP character without escaping (e.g. in a UTF-8 stylesheet)
> it's going to come in here as two surrogates

Oh, good catch!

Along these lines, I'm a little worried about ParseEscape when a '\' is followed by a high or low surrogate (instead of a number).  Let me poke at this a bit.

> Alternatively, you could make nsCSSScanner::Read() surrogate-aware

I don't think that's really desirable...
(Assignee)

Comment 8

12 years ago
Created attachment 203132 [details] [diff] [review]
Fix that issue

Turns out that ParseEscape never really returned negative numbers anyway...  David, I made some changes to scanner from the previous patch that really do need your review.
Attachment #203043 - Attachment is obsolete: true
Attachment #203132 - Flags: superreview?(dbaron)
Attachment #203132 - Flags: review?(smontagu)
Attachment #203043 - Flags: review?(smontagu)
I know it was I who suggested using U+FFFD, but I'm having second thoughts: like this, the selectors .class\d800 .class\110000 and .class\fffd are all equivalent. I can't find anything clear in the CSS spec about how to handle identifiers containing non-characters, but I suspect they should make the whole declaration invalid.
Comment on attachment 203132 [details] [diff] [review]
Fix that issue

r=smontagu on all the macro changes if you want to check them in already.
(Assignee)

Comment 11

12 years ago
I checked in the non-CSS parts of this, plus a bustage fix.
Comment on attachment 203132 [details] [diff] [review]
Fix that issue

If you change ParseAndAppendEscape so that it doesn't append anything in the cases where it previously returned zero, the sr=dbaron.
Attachment #203132 - Flags: superreview?(dbaron) → superreview+
Although yeah, I am a little suspicious of ENSURE_VALID_CHAR in the CSS scanner as well.  It might be worth emailing www-style.
(Assignee)

Comment 14

12 years ago
I did.  See thread at http://lists.w3.org/Archives/Public/www-style/2005Nov/0069.html
(Assignee)

Comment 15

12 years ago
Created attachment 203210 [details] [diff] [review]
Updated to David's comments
Attachment #203132 - Attachment is obsolete: true
Attachment #203210 - Flags: review?(smontagu)
Attachment #203132 - Flags: review?(smontagu)

Comment 16

12 years ago
FYI, this checkin broke the AIX tinderbox on the ports page.  IS_SURROGATE is being multiply defined (and with different values).  See http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#79 .
(Assignee)

Comment 17

12 years ago
Created attachment 203218 [details] [diff] [review]
Fix for that bustage that I just checked in
You need to get the last one too; in fact, it's probably the most important (escaped newlines).
Comment on attachment 203210 [details] [diff] [review]
Updated to David's comments

Ok, given the response on www-style, and given that the results are the same as inserting the values of the escapes as raw characters in UTF-32, I'll buy this.
Attachment #203210 - Flags: review?(smontagu) → review+
(Assignee)

Comment 20

12 years ago
Created attachment 203265 [details] [diff] [review]
Fix escaped newline case too

This should be good now...
Attachment #203210 - Attachment is obsolete: true
Attachment #203265 - Flags: superreview?(dbaron)
You still don't know that ch is nonzero in that last case.
(Assignee)

Comment 22

12 years ago
Hmm.  Ok, true.  Do you think we should append CSS_ESCAPE if we have '\' followed by zero bytes?  Or just append nothing?
We currently append nothing; changing that should probably be done in bug 228856.
(Assignee)

Comment 24

12 years ago
Created attachment 203308 [details] [diff] [review]
n-th time is the charm... ;)
Attachment #203265 - Attachment is obsolete: true
Attachment #203308 - Flags: superreview?(dbaron)
Attachment #203265 - Flags: superreview?(dbaron)
Attachment #203308 - Flags: superreview?(dbaron) → superreview+
Comment on attachment 203043 [details] [diff] [review]
Like so

>Index: xpcom/string/public/nsUTF8Utils.h
>@@ -456,23 +453,21 @@ class ConvertUTF8toUTF16

>-                  *out++ = (PRUnichar)(ucs4 >> 10) | 0xd800u;
>-                  *out++ = (PRUnichar)(ucs4 & 0x3ff) | 0xdc00u;
>+                  *out++ = (value_type)H_SURROGATE(ucs4);
>+                  *out++ = (value_type)L_SURROGATE(ucs4);

Ow, ow, ow. This should be (buffer_type). Sorry to have missed it first time round.
(Assignee)

Comment 26

12 years ago
Fixed (including smontagu's catch in comment 25).
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → FIXED
(Assignee)

Updated

11 years ago
Target Milestone: --- → mozilla1.9alpha
Flags: blocking1.8.1.17?
Flags: blocking1.8.1.17? → blocking1.8.1.17+
Blocks: 448166
(Assignee)

Comment 27

9 years ago
Created attachment 335154 [details] [diff] [review]
1.8 branch merge
Attachment #335154 - Flags: approval1.9.0.2?
(Assignee)

Updated

9 years ago
Flags: in-testsuite?
Comment on attachment 335154 [details] [diff] [review]
1.8 branch merge

Boris, I think you meant to request approval1.8.1.17. Switching flags to that.
Attachment #335154 - Flags: approval1.9.0.2? → approval1.8.1.17?
Comment on attachment 335154 [details] [diff] [review]
1.8 branch merge

Approved for 1.8.1.17, a=dveditz for release-drivers
Attachment #335154 - Flags: approval1.8.1.17? → approval1.8.1.17+
(Assignee)

Comment 30

9 years ago
> Boris, I think you meant to request approval1.8.1.17.

Yeah, sorry about that.  Good catch!
(Assignee)

Comment 31

9 years ago
Checked in on the 1.8.1 branch.
Keywords: fixed1.8.1.17
(Assignee)

Updated

9 years ago
Depends on: 316859
(In reply to comment #0)
> See bug 316338 -- this covers the CSS part.

I've tried using the test cases from there with the 2.0.0.17 candidate build and nothing crashes. Does the crash only occur in debug builds?
(Assignee)

Comment 33

9 years ago
The patch in this bug mostly fixes the crashes in those testcases, I think.
I don't get a crash on 2.0.0.16 (or 2.0.0.17) with the test case from bug 316338 on either Ubuntu or Windows XP.
Depends on: 469463

Comment 35

9 years ago
Created attachment 356697 [details] [diff] [review]
regerated patch for 1.8.0 branch

Updated

9 years ago
Attachment #356697 - Flags: approval1.8.0.next?

Updated

9 years ago
Flags: wanted1.8.0.x+

Updated

9 years ago
Attachment #356697 - Flags: approval1.8.0.next? → approval1.8.0.next+

Comment 36

9 years ago
Comment on attachment 356697 [details] [diff] [review]
regerated patch for 1.8.0 branch

a=asac for 1.8.0
You need to log in before you can comment on or make changes to this bug.