Closed
Bug 820909
Opened 12 years ago
Closed 12 years ago
nsCharTraits<PRUnichar>::ASCIIToLower special-cases some non-ASCII stuff
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla20
People
(Reporter: bzbarsky, Assigned: bzbarsky)
References
Details
Attachments
(1 file, 1 obsolete file)
7.26 KB,
patch
|
benjamin
:
review+
dbaron
:
superreview+
|
Details | Diff | Splinter Review |
Specifically:
241 /**
242 * Convert c to its lower-case form, but only if the lower-case form is
243 * ASCII. Otherwise leave it alone.
244 *
245 * There are only two non-ASCII Unicode characters whose lowercase
246 * equivalents are ASCII: KELVIN SIGN and LATIN CAPITAL LETTER I WITH
247 * DOT ABOVE. So it's a simple matter to handle those explicitly.
248 */
What this means is that calling LowerCaseEqualsASCII or LowerCaseEqualsLiteral on a string does not actually do ASCII-case-insensitive compares. This is somewhat unfortunate. Should we fix the string code here, or remove all use of the above functions in DOM code? :(
In particular, this came up because nsAttrValue::ParseEnumValue uses LowerCaseEqualsASCII so some non-ASCII values of @type work on <input> yet again...
![]() |
Assignee | |
Comment 1•12 years ago
|
||
This also makes nsContentUtils::IsJavascriptMIMEType not do what's expected, and various other cases....
![]() |
Assignee | |
Comment 2•12 years ago
|
||
This seems like surprising behavior, which is never good.
Do we have any callers that actually want this behavior? I would be surprised if that's the case.
![]() |
Assignee | |
Comment 4•12 years ago
|
||
So the only consumers of nsCharTraits<T>::ASCIIToLower are nsCharTraits<T>::compareLowerCaseToASCII() and nsCharTraits<T>::compareLowerCaseToASCIINullTerminated().
These are both only called from nsTSubstring_CharT::LowerCaseEqualsASCII (the two different overrides of it).
LowerCaseEqualsASCII is called from LowerCaseEqualsLiteral and various other code.
I guess I'll go through and try to make a list of what all the consumers are here and what they might really want.
Or we just change it and hope that we hear if it breaks anyone :)
The turkish I seems like the biggest risk. If we take this approach then maybe we can reach out to turkish community and ask them to hammer on an aurora build extra hard.
![]() |
Assignee | |
Comment 6•12 years ago
|
||
OK, so callers of LowerCaseEqualsLiteral that pass a literal string containing 'k' in it are passing these strings: "-moz-keyframes", ".lnk", "_blank", "allow-pointer-lock", "backward", "desktop.ini", "keyboardevent", "keyevents", "keyframes", "popupblockedevents", "set-cookie", "track".
I don't think any of those care about matching Kelvin sign.
The same list for 'i' is: "-moz-fixed", "-moz-image-rect", "-moz-linear-gradient", "-moz-radial-gradient", "-moz-repeating-linear-gradient", "-moz-repeating-radial-gradient", "_content", "afterbegin", "allow-pointer-lock", "allow-same-origin", "allow-scripts", "allow-top-navigation", "animationevent", "application/x-www-form-urlencoded", "authorization", "beforebegin", "compositionevent", "content-disposition", "content-primary", "cubic-bezier", "cursive", "datacontainerevent", "datacontainerevents", "description", "desktop.ini", "devicemotionevent", "deviceorientationevent", "dialogheight", "dialogleft", "dialogtop", "dialogwidth", "domain", "dpi", "file", "filename", "gif", "hidden", "horizontal", "ibm862", "image/svg+xml", "import", "important", "inherit", "initial", "inline", "iso-8859-8", "javascript", "javascript1.0", "javascript1.1", "javascript1.2", "javascript1.3", "javascript1.4", "javascript1.5", "javascript1.6", "javascript1.7", "javascript1.8", "li", "line", "linear-gradient", "lineboundary", "livescript", "magic", "media", "mozaudioavailableevent", "multipart/form-data", "mutationevent", "mutationevents", "notifypaintevent", "pagetransition", "pluginspage", "portrait", "radial-gradient", "repeating-linear-gradient", "repeating-radial-gradient", "resizable", "right", "sameorigin", "sans-serif", "serif", "set-cookie", "sha1-digest", "sha1-digest-manifest", "simplegestureevent", "size", "text/plain", "timeevent", "timeevents", "title", "title*", "transitionevent", "uievent", "uievents", "url-prefix", "view-source", "window", "x-imap4-modified-utf7", "moz-nullprincipal", "expires", "application/octet-stream".
Again, I don't think any of them care about matching Turkish dotted I.
None of that is surprising. Next to check the LowerCaseEqualsASCII callsites.
![]() |
Assignee | |
Comment 7•12 years ago
|
||
For LowerCaseEqualsASCII, I'm ignoring all the ipc/chromium stuff because that uses its own version.
I'm also ignoring all the places that use it on literal strings not containing 'k' or 'i', and also on the following literal strings: ".ico", ".lnk", "set-cookie", "set-cookie2", "application/guess-from-ext" since I think it's clear those don't want to match Kelvin sign or Turkish dotted I.
That leaves the following callers:
1) nsContentUtils::IsJavascriptMIMEType. This definitely wants ASCII-CI matching.
2) nsAttrValue::ParseEnumValue. That's what led me to file this bug.
3) nsXMLHttpRequest::GetResponseHeader when comparing to kCrossOriginSafeHeaders
entries. DEFINITELY wants ASCII-CI. What we have now looks like a security bug.
4) nsXMLHttpRequest::SetRequestHeader when comparing to kInvalidHeaders entries. Wants
ASCII-CI.
5) nsXMLHttpRequest::SetRequestHeader when comparing to kCrossOriginSafeHeaders
entries. DEFINITELY wants ASCII-CI. What we have now looks like a security bug.
6) CSSParserImpl::ParseAttributeSelector when comparing to caseInsensitiveHTMLAttribute
entries. This wants ASCII-CI.
7) CSSParserImpl::TranslateDimension comparing a string to length unit names. I think
this wants ASCII-CI. It only matters for "in", "vmin", and "khz", and I doubt
writing the first two with dotted I or the last with Kelvin sign is desirable.
8) nsCSSProps::LookupProperty comparing the property name to our list of aliased
properties. I think we probably want ASCII-CI here.
9) nsProtocolProxyService::NewProxyInfo comparing the proxy type to "socks", "socks4",
"direct". I think ASCII-CI is fine here.
10) nsHtml5Portability::lowerCaseLiteralEqualsIgnoreAsciiCaseString, which wants
ASCII-CI per spec and is buggy as it stands.
11) nsExternalHelperAppService::ApplyDecodingForExtension comparing extension to "zip"
and type to "application/zip" or "application/x-gip" or "application/x-compress".
These are fine with ASCII-CI.
12) nsExternalHelperAppService::GetTypeFromExtension comparing extension to "gif" and
"xpi". I think ASCII-CI is fine here.
13) nsStaticCaseInsensitiveNameTable::Lookup. The only consumers are CSS keyword
parsing, CSS property name parsing, and named colors. I think ASCII-CI is probably
fine for all of these.
David, do you have any objections to items 6, 7, 8, 13 on that list becoming ASCII-CI instead of the hackery we have now?
Assignee: nobody → bzbarsky
Flags: needinfo?(dbaron)
![]() |
Assignee | |
Comment 8•12 years ago
|
||
I'm slightly tempted to also factor out all the common code between the two nsCharTraits specializations.... The copy/paste is bugging me. Let me know if you think I shouldn't?
Attachment #692690 -
Flags: review?(benjamin)
![]() |
Assignee | |
Updated•12 years ago
|
Whiteboard: [need review]
![]() |
Assignee | |
Comment 9•12 years ago
|
||
Comment on attachment 692690 [details] [diff] [review]
Make LowerCaseEqualsASCII and LowerCaseEqualsLiteral actually do ASCII-case-insensitive matching instead of doing odd things with KELVIN SIGN and LATIN CAPITAL LETTER I WITH DOT ABOVE. s
I still need to write some tests...
Attachment #692690 -
Flags: superreview?(dbaron)
![]() |
Assignee | |
Comment 10•12 years ago
|
||
Ah, looks like we're already ok on comment 7 item 13, because while the matchEntry callback uses this weird matcher the hashEntry just takes out 0x20 from each char, so you get the wrong hashcode if you're not actually ASCII.
![]() |
Assignee | |
Comment 11•12 years ago
|
||
Attachment #693181 -
Flags: review?(benjamin)
![]() |
Assignee | |
Updated•12 years ago
|
Attachment #692690 -
Attachment is obsolete: true
Attachment #692690 -
Flags: superreview?(dbaron)
Attachment #692690 -
Flags: review?(benjamin)
![]() |
Assignee | |
Updated•12 years ago
|
Attachment #693181 -
Flags: superreview?(dbaron)
Updated•12 years ago
|
Attachment #693181 -
Flags: review?(benjamin) → review+
(In reply to Boris Zbarsky (:bz) from comment #7)
> 6) CSSParserImpl::ParseAttributeSelector when comparing to caseInsensitiveHTMLAttribute
> entries. This wants ASCII-CI.
> 7) CSSParserImpl::TranslateDimension comparing a string to length unit names. I think
> this wants ASCII-CI. It only matters for "in", "vmin", and "khz", and I doubt
> writing the first two with dotted I or the last with Kelvin sign is desirable.
> 8) nsCSSProps::LookupProperty comparing the property name to our list of aliased
> properties. I think we probably want ASCII-CI here.
> 13) nsStaticCaseInsensitiveNameTable::Lookup. The only consumers are CSS keyword
> parsing, CSS property name parsing, and named colors. I think ASCII-CI is probably
> fine for all of these.
>
> David, do you have any objections to items 6, 7, 8, 13 on that list becoming
> ASCII-CI instead of the hackery we have now?
Nope, this change will be correct; http://www.w3.org/TR/CSS21/syndata.html#characters describes CSS syntax as case-insensitive within the ASCII range.
Flags: needinfo?(dbaron)
Comment on attachment 693181 [details] [diff] [review]
Now with some tests
sr=dbaron
Attachment #693181 -
Flags: superreview?(dbaron) → superreview+
![]() |
Assignee | |
Comment 14•12 years ago
|
||
Perfect, thanks.
https://hg.mozilla.org/integration/mozilla-inbound/rev/e364512a658f
Flags: in-testsuite+
Whiteboard: [need review]
Target Milestone: --- → mozilla20
OS: Mac OS X → All
Hardware: x86 → All
Comment 15•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•4 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•