Closed Bug 507485 Opened 10 years ago Closed 10 years ago
Character range matching code is potentially dodgy
See bug 332173 comment 69. This applies only to mozilla-central's code, not to any release branches.
I'm not sure about the locale stuff, so let's just hack it in ourselves. Doing a build of this now, will report back on results when it finishes...
Attachment #391717 - Flags: review?(nelson)
Comment on attachment 391717 [details] [diff] [review] Fix I've given a bunch of thought to this problem over the last few weeks, while I was contemplating a fix for pilepicker's copy of this code. In order to review this patch (or write another one) one must know the requirements for the functions being modified here. Fundamentally, the questions are: - what is the correct definition of this function for non-ASCII character sets? - What are the requirements with respect to case-insensitivity and character ranges for non-ASCII alphabets? Answers conceivably include: a) case-insensitivity applies only to ASCII (English) alphabetic characters and only ASCII English letters and numbers may bound a range. b) case-insensitivity depends on the local user's locale. The local's definition of "isalpha" determines which characters are subject to case- insensitive comparison, and "isalnum" determines which characters may bound a range. It is assumed that isalpha and isalnum work for any UTF16 character. c) case-insensitive comparison applies to all alphabets representable in UTF16. Functions are needed to answer isalpha, isalnum, tolower and toupper for ALL alphabetic UTF16 characters, regardless of the local user's locale. I think (c) is ideal, but several people have tried to convince me that it is not technically possible without information about the locale governing the characters being compared (even if it is a different locale than the local user's locale). The patch above implements (a). User's of English will be happy with it. To them, this patch represents no change from the behavior they've always gotten. But I suspect that users of many other alphabetic character sets, including all western and eastern European character sets, will find it disappointing. They simply will not be able to use ranges of the characters in their alphabets, and will not be able to do case-insensitive file name matching on names that use their native alphabetic characters. I presume that the UTF16 version of this function was created explicitly to deal with this very issue, but maybe that's a false presumption. Choice (b) is the compromise. It's a bit more work. It involves finding and testing locale-dependent versions of the is* and to* functions. However, I'm told that Firefox already has functions that do this. Is it acceptable for this function to handle only the English alphabet? Who can authoritatively answer that question on behalf of Firefox? If the authoritative answer is yes, then I will give r+ to this patch. Otherwise...
I believe the PRUnichar-ified version was only because the filepicker APIs took in nsString or similar, whose backing storage is PRUnichar. As for why an ASCII-only pattern matcher would be used, I suspect it's because it was easy to write and a copy was already written for elsewhere. If we are to worry about proper support here, I would rather do it (as I mentioned in the previous bug) via ES5 regular expressions or some other completely new pattern-matching syntax. What's here is small and more or less a hack; I'd rather not use a little lipstick when the only comprehensive solution is a whole new pig. I'm skeptical that the locale as determined by setlocale is actually meaningful for GUI applications, generally, so I am not particularly inclined to trust isalnum and friends. Also, C locale is a shared resource -- you change it for this, you change it for every other bit of code in the process (perhaps only for a short window of time, but it's the general principle that's the problem). Last, users of this code will likely want precisely defined, controllable behavior; we don't expose a way to affect setlocale, that I know of, and being required to call it (difficult for JS-only code) prior to doing a pattern-match to get precisely defined behavior seems quite wrong to me. (Note that C++ has character-based functions templatized on the type of the character -- but furthermore which take in a locale argument to let the behavior be precisely controlled without having side effects on the rest of the application.)
I'm not proposing to ever change the locale. I'm proposing to USE the locale (or nearly equivalent "code page" on windows) already set by the user. I believe that the old code did just that. It used functions whose behavior was defined to be affected by the user's preferences for locale, IINM. So, a Russian user could get case-insensitive matching and ranges for Cyrillic letters, and a Greek user for Greek letters, etc. I think the new code probably takes that away, and I'm asking if that is acceptable. Now, I can imagine that the answer may be that there is great pressure to come up with a mitigation to the vulnerabilities quickly, even if it means some short-term loss of functionality that can be restored later. But I think it would be good for us to hear some MoCo UX person confirm that.
It might be useful to find out: - when is the filepicker code used - are the patterns user-supplied? - or do they come from within the product? - and if so, are they L10n dependent?
Comment on attachment 391717 [details] [diff] [review] Fix AFAIK, the question remains open as to whether or not it will be acceptable to the users of nsWildCard for it to only do case-insensitive comparisons on English ASCII characters, and only recognized ranges when they are bounded by English ASCII characters. But I don't think we can get a timely answer now, and I gather the heat is on to get this vulnerability fixed ASAP. So, I suggest going forward with this patch. It appears to do what it intended to do correctly. Then if people complain about problems with ranges or case sensitivity, we can revisit this. There is some doubt about whether this filepicker code is used AT ALL. It has been suggested to me that FF uses OS-native file pickers on all FF platforms supported by MoCo. So, maybe we'll never ever hear any complaints about this.
Attachment #391717 - Flags: review?(nelson) → review+
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.2a1
You need to log in before you can comment on or make changes to this bug.