Closed Bug 820909 Opened 12 years ago Closed 12 years ago

nsCharTraits<PRUnichar>::ASCIIToLower special-cases some non-ASCII stuff

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

Attachments

(1 file, 1 obsolete file)

Specifically: 241 /** 242 * Convert c to its lower-case form, but only if the lower-case form is 243 * ASCII. Otherwise leave it alone. 244 * 245 * There are only two non-ASCII Unicode characters whose lowercase 246 * equivalents are ASCII: KELVIN SIGN and LATIN CAPITAL LETTER I WITH 247 * DOT ABOVE. So it's a simple matter to handle those explicitly. 248 */ What this means is that calling LowerCaseEqualsASCII or LowerCaseEqualsLiteral on a string does not actually do ASCII-case-insensitive compares. This is somewhat unfortunate. Should we fix the string code here, or remove all use of the above functions in DOM code? :( In particular, this came up because nsAttrValue::ParseEnumValue uses LowerCaseEqualsASCII so some non-ASCII values of @type work on <input> yet again...
This also makes nsContentUtils::IsJavascriptMIMEType not do what's expected, and various other cases....
This seems like surprising behavior, which is never good. Do we have any callers that actually want this behavior? I would be surprised if that's the case.
So the only consumers of nsCharTraits<T>::ASCIIToLower are nsCharTraits<T>::compareLowerCaseToASCII() and nsCharTraits<T>::compareLowerCaseToASCIINullTerminated(). These are both only called from nsTSubstring_CharT::LowerCaseEqualsASCII (the two different overrides of it). LowerCaseEqualsASCII is called from LowerCaseEqualsLiteral and various other code. I guess I'll go through and try to make a list of what all the consumers are here and what they might really want.
Or we just change it and hope that we hear if it breaks anyone :) The turkish I seems like the biggest risk. If we take this approach then maybe we can reach out to turkish community and ask them to hammer on an aurora build extra hard.
OK, so callers of LowerCaseEqualsLiteral that pass a literal string containing 'k' in it are passing these strings: "-moz-keyframes", ".lnk", "_blank", "allow-pointer-lock", "backward", "desktop.ini", "keyboardevent", "keyevents", "keyframes", "popupblockedevents", "set-cookie", "track". I don't think any of those care about matching Kelvin sign. The same list for 'i' is: "-moz-fixed", "-moz-image-rect", "-moz-linear-gradient", "-moz-radial-gradient", "-moz-repeating-linear-gradient", "-moz-repeating-radial-gradient", "_content", "afterbegin", "allow-pointer-lock", "allow-same-origin", "allow-scripts", "allow-top-navigation", "animationevent", "application/x-www-form-urlencoded", "authorization", "beforebegin", "compositionevent", "content-disposition", "content-primary", "cubic-bezier", "cursive", "datacontainerevent", "datacontainerevents", "description", "desktop.ini", "devicemotionevent", "deviceorientationevent", "dialogheight", "dialogleft", "dialogtop", "dialogwidth", "domain", "dpi", "file", "filename", "gif", "hidden", "horizontal", "ibm862", "image/svg+xml", "import", "important", "inherit", "initial", "inline", "iso-8859-8", "javascript", "javascript1.0", "javascript1.1", "javascript1.2", "javascript1.3", "javascript1.4", "javascript1.5", "javascript1.6", "javascript1.7", "javascript1.8", "li", "line", "linear-gradient", "lineboundary", "livescript", "magic", "media", "mozaudioavailableevent", "multipart/form-data", "mutationevent", "mutationevents", "notifypaintevent", "pagetransition", "pluginspage", "portrait", "radial-gradient", "repeating-linear-gradient", "repeating-radial-gradient", "resizable", "right", "sameorigin", "sans-serif", "serif", "set-cookie", "sha1-digest", "sha1-digest-manifest", "simplegestureevent", "size", "text/plain", "timeevent", "timeevents", "title", "title*", "transitionevent", "uievent", "uievents", "url-prefix", "view-source", "window", "x-imap4-modified-utf7", "moz-nullprincipal", "expires", "application/octet-stream". Again, I don't think any of them care about matching Turkish dotted I. None of that is surprising. Next to check the LowerCaseEqualsASCII callsites.
For LowerCaseEqualsASCII, I'm ignoring all the ipc/chromium stuff because that uses its own version. I'm also ignoring all the places that use it on literal strings not containing 'k' or 'i', and also on the following literal strings: ".ico", ".lnk", "set-cookie", "set-cookie2", "application/guess-from-ext" since I think it's clear those don't want to match Kelvin sign or Turkish dotted I. That leaves the following callers: 1) nsContentUtils::IsJavascriptMIMEType. This definitely wants ASCII-CI matching. 2) nsAttrValue::ParseEnumValue. That's what led me to file this bug. 3) nsXMLHttpRequest::GetResponseHeader when comparing to kCrossOriginSafeHeaders entries. DEFINITELY wants ASCII-CI. What we have now looks like a security bug. 4) nsXMLHttpRequest::SetRequestHeader when comparing to kInvalidHeaders entries. Wants ASCII-CI. 5) nsXMLHttpRequest::SetRequestHeader when comparing to kCrossOriginSafeHeaders entries. DEFINITELY wants ASCII-CI. What we have now looks like a security bug. 6) CSSParserImpl::ParseAttributeSelector when comparing to caseInsensitiveHTMLAttribute entries. This wants ASCII-CI. 7) CSSParserImpl::TranslateDimension comparing a string to length unit names. I think this wants ASCII-CI. It only matters for "in", "vmin", and "khz", and I doubt writing the first two with dotted I or the last with Kelvin sign is desirable. 8) nsCSSProps::LookupProperty comparing the property name to our list of aliased properties. I think we probably want ASCII-CI here. 9) nsProtocolProxyService::NewProxyInfo comparing the proxy type to "socks", "socks4", "direct". I think ASCII-CI is fine here. 10) nsHtml5Portability::lowerCaseLiteralEqualsIgnoreAsciiCaseString, which wants ASCII-CI per spec and is buggy as it stands. 11) nsExternalHelperAppService::ApplyDecodingForExtension comparing extension to "zip" and type to "application/zip" or "application/x-gip" or "application/x-compress". These are fine with ASCII-CI. 12) nsExternalHelperAppService::GetTypeFromExtension comparing extension to "gif" and "xpi". I think ASCII-CI is fine here. 13) nsStaticCaseInsensitiveNameTable::Lookup. The only consumers are CSS keyword parsing, CSS property name parsing, and named colors. I think ASCII-CI is probably fine for all of these. David, do you have any objections to items 6, 7, 8, 13 on that list becoming ASCII-CI instead of the hackery we have now?
Assignee: nobody → bzbarsky
Flags: needinfo?(dbaron)
I'm slightly tempted to also factor out all the common code between the two nsCharTraits specializations.... The copy/paste is bugging me. Let me know if you think I shouldn't?
Attachment #692690 - Flags: review?(benjamin)
Whiteboard: [need review]
Comment on attachment 692690 [details] [diff] [review] Make LowerCaseEqualsASCII and LowerCaseEqualsLiteral actually do ASCII-case-insensitive matching instead of doing odd things with KELVIN SIGN and LATIN CAPITAL LETTER I WITH DOT ABOVE. s I still need to write some tests...
Attachment #692690 - Flags: superreview?(dbaron)
Ah, looks like we're already ok on comment 7 item 13, because while the matchEntry callback uses this weird matcher the hashEntry just takes out 0x20 from each char, so you get the wrong hashcode if you're not actually ASCII.
Attachment #693181 - Flags: review?(benjamin)
Attachment #692690 - Attachment is obsolete: true
Attachment #692690 - Flags: superreview?(dbaron)
Attachment #692690 - Flags: review?(benjamin)
Attachment #693181 - Flags: superreview?(dbaron)
Attachment #693181 - Flags: review?(benjamin) → review+
(In reply to Boris Zbarsky (:bz) from comment #7) > 6) CSSParserImpl::ParseAttributeSelector when comparing to caseInsensitiveHTMLAttribute > entries. This wants ASCII-CI. > 7) CSSParserImpl::TranslateDimension comparing a string to length unit names. I think > this wants ASCII-CI. It only matters for "in", "vmin", and "khz", and I doubt > writing the first two with dotted I or the last with Kelvin sign is desirable. > 8) nsCSSProps::LookupProperty comparing the property name to our list of aliased > properties. I think we probably want ASCII-CI here. > 13) nsStaticCaseInsensitiveNameTable::Lookup. The only consumers are CSS keyword > parsing, CSS property name parsing, and named colors. I think ASCII-CI is probably > fine for all of these. > > David, do you have any objections to items 6, 7, 8, 13 on that list becoming > ASCII-CI instead of the hackery we have now? Nope, this change will be correct; http://www.w3.org/TR/CSS21/syndata.html#characters describes CSS syntax as case-insensitive within the ASCII range.
Flags: needinfo?(dbaron)
Comment on attachment 693181 [details] [diff] [review] Now with some tests sr=dbaron
Attachment #693181 - Flags: superreview?(dbaron) → superreview+
Flags: in-testsuite+
Whiteboard: [need review]
Target Milestone: --- → mozilla20
OS: Mac OS X → All
Hardware: x86 → All
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 825191
Component: String → XPCOM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: