Last Comment Bug 752230 - control characters above 0x7e should not be allowed in unquoted url()
: control characters above 0x7e should not be allowed in unquoted url()
Status: RESOLVED FIXED
: dev-doc-complete
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla15
Assigned To: Kang-Hao (Kenny) Lu [:kennyluck]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-05 08:34 PDT by Kang-Hao (Kenny) Lu [:kennyluck]
Modified: 2012-09-28 02:20 PDT (History)
3 users (show)
ryanvm: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
test case (1.31 KB, application/xml)
2012-05-05 08:34 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
no flags Details
proposed fix (7.17 KB, patch)
2012-05-05 08:44 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
dbaron: review+
Details | Diff | Splinter Review
comments addressed (6.48 KB, patch)
2012-05-08 19:15 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
dbaron: review+
Details | Diff | Splinter Review
interdiff (5.31 KB, text/plain)
2012-05-08 19:23 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
no flags Details
final patch (6.62 KB, patch)
2012-05-14 02:52 PDT, Kang-Hao (Kenny) Lu [:kennyluck]
kennyluck: review+
Details | Diff | Splinter Review

Description Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-05 08:34:35 PDT
Created attachment 621307 [details]
test case

CSS2.1 has

URI 	url\({w}{string}{w}\)
        |url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}\)

nonascii	[^\0-\237]

where \237 is the octal representation of 0x9f and the last control character of all the u+??

This is a leftout case of bug 604179 patch 5. I have a patch already.
Comment 1 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-05 08:44:40 PDT
Created attachment 621308 [details] [diff] [review]
proposed fix

I was originally trying to improve performance of url() tokenization by expanding the lexing table, but I don't think I am experienced enough to say this is really gaining performance. Fixing this is just an accident. :p
Comment 2 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-05-08 00:48:03 PDT
Comment on attachment 621308 [details] [diff] [review]
proposed fix

>+#define SI   IS_IDENT|START_IDENT
>+#define XI   IS_IDENT            |IS_HEX_DIGIT

>+#define XSI  IS_IDENT|START_IDENT|IS_HEX_DIGIT

These three don't look like they're needed anymore (though maybe I missed something), so remove them.


> static const PRUint8 gLexTable[] = {
>-//                                     TAB LF      FF  CR
>-   0,  0,  0,  0,  0,  0,  0,  0,  0,  W,  W,  0,  W,  W,  0,  0,
>+//                                              TAB  LF        FF   CR
>+   0,   0,   0,   0,   0,   0,   0,   0,   0,   W,   W,   0,   W,   W,   0,   0,

I'd rather you not expand the table width, since it pushes it over 80 characters.  Just make the line with ABCDEF break the alignment instead.

>+static inline bool
>+IsURLChar(PRInt32 ch) {
>+  return ch >= 256 || (gLexTable[ch] & IS_URL_CHAR) != 0;
>+}

This needs to check ch >= 0, just like IsIdent does.  In fact, the function should look exactly like IsIdent except for checking a different bit.


r=dbaron with those changes
Comment 3 :Ms2ger (⌚ UTC+1/+2) 2012-05-08 00:51:36 PDT
Comment on attachment 621308 [details] [diff] [review]
proposed fix

Review of attachment 621308 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/style/nsCSSScanner.cpp
@@ +947,4 @@
>      }
> +
> +    ch = Read();
> +    if (ch < 0) break;

if (ch < 0) {
  break;
}
Comment 4 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-08 19:15:38 PDT
Created attachment 622259 [details] [diff] [review]
comments addressed

but I identified a flaw so I am asking for a check.
Comment 5 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-08 19:22:10 PDT
(In reply to David Baron [:dbaron] from comment #2)
> Comment on attachment 621308 [details] [diff] [review]
> proposed fix
> 
> >+#define SI   IS_IDENT|START_IDENT
> >+#define XI   IS_IDENT            |IS_HEX_DIGIT
> 
> >+#define XSI  IS_IDENT|START_IDENT|IS_HEX_DIGIT
> 
> These three don't look like they're needed anymore (though maybe I missed
> something), so remove them.

Ah, yeah, I missed that. And I realized I didn't change SI to USI for U+A0-U+FF. This is fixed in the updated patch (with a test). I am asking for a check because of this.

Also, I forgot the #undef's.
 
> > static const PRUint8 gLexTable[] = {
> >-//                                     TAB LF      FF  CR
> >-   0,  0,  0,  0,  0,  0,  0,  0,  0,  W,  W,  0,  W,  W,  0,  0,
> >+//                                              TAB  LF        FF   CR
> >+   0,   0,   0,   0,   0,   0,   0,   0,   0,   W,   W,   0,   W,   W,   0,   0,
> 
> I'd rather you not expand the table width, since it pushes it over 80
> characters.  Just make the line with ABCDEF break the alignment instead.

Done.

> >+static inline bool
> >+IsURLChar(PRInt32 ch) {
> >+  return ch >= 256 || (gLexTable[ch] & IS_URL_CHAR) != 0;
> >+}
> 
> This needs to check ch >= 0, just like IsIdent does.  In fact, the function
> should look exactly like IsIdent except for checking a different bit.

I actually examined this pretty clearly and IsURLChar is the only function where the ch >= 0 check isn't necessary. But anyway I am adding this check back for now and hopefully someone in the future can refactor the whole file somehow.


(In reply to Ms2ger from comment #3)
> Comment on attachment 621308 [details] [diff] [review]
> proposed fix
> 
> Review of attachment 621308 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: layout/style/nsCSSScanner.cpp
> @@ +947,4 @@
> >      }
> > +
> > +    ch = Read();
> > +    if (ch < 0) break;
> 
> if (ch < 0) {
>   break;
> }

Done (although various places in this file still have the former style). I should spend some time reading the style guideline.
Comment 6 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-08 19:23:45 PDT
Created attachment 622260 [details]
interdiff
Comment 7 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2012-05-14 01:24:03 PDT
Comment on attachment 622259 [details] [diff] [review]
comments addressed

r=dbaron

(Ah, the mistake was missing the U bit for U+00A0 to U+00FF.)

> // @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
>-   0,  XSI,XSI,XSI,XSI,XSI,XSI,SI, SI, SI, SI, SI, SI, SI, SI, SI,
>+U,UXSI,UXSI,UXSI,UXSI,UXSI,UXSI,USI,USI,USI,USI,USI,USI,USI,USI,USI,

Do re-align the comment to match the values for this line, though.

Sorry for the delay.
Comment 8 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-14 02:52:05 PDT
Created attachment 623620 [details] [diff] [review]
final patch

Carry over r=dbaron
Comment 9 Kang-Hao (Kenny) Lu [:kennyluck] 2012-05-14 02:55:30 PDT
(In reply to David Baron [:dbaron] from comment #7)
> > // @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
> >-   0,  XSI,XSI,XSI,XSI,XSI,XSI,SI, SI, SI, SI, SI, SI, SI, SI, SI,
> >+U,UXSI,UXSI,UXSI,UXSI,UXSI,UXSI,USI,USI,USI,USI,USI,USI,USI,USI,USI,
> 
> Do re-align the comment to match the values for this line, though.

Ah, right. Done. (I indented those two lines again and they have 71 characters. I think this is in the acceptable range.)

> Sorry for the delay.

That was no delay as compared to my experience sending patch to WebKit! I am pretty sure most people on #whatwg share similar experience interacting with Mozilla folks on Bugzilla.
Comment 10 Ryan VanderMeulen [:RyanVM] 2012-05-14 16:02:12 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/14abca4e7378
Comment 11 Ed Morley [:emorley] 2012-05-15 06:34:21 PDT
https://hg.mozilla.org/mozilla-central/rev/14abca4e7378
Comment 12 Frédéric Bourgeon [:FredB] 2012-08-29 12:15:46 PDT
The documentation has been updated with a note on the syntax as can be read here: https://developer.mozilla.org/en-US/docs/CSS/uri#Syntax

If that's enough, then this bug can be marked as dev-doc-complete.
Comment 13 Florian Scholz [:fscholz] (MDN) 2012-09-28 02:20:52 PDT
(In reply to Frédéric Bourgeon [:FredB] from comment #12)
> The documentation has been updated with a note on the syntax as can be read
> here: https://developer.mozilla.org/en-US/docs/CSS/uri#Syntax
> 
> If that's enough, then this bug can be marked as dev-doc-complete.

I don't think this will need more documentation. Marking as complete.

Note You need to log in before you can comment on or make changes to this bug.