Closed Bug 507485 Opened 15 years ago Closed 15 years ago

Character range matching code is potentially dodgy

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.9.2a1

Tracking Flags:

Tracking

Status

status1.9.1

---

unaffected

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Attachments

(1 file)

Fix 15 years ago Jeff Walden [:Waldo] 3.42 KB, patch	nelson : review+	Details \| Diff \| Splinter Review

Jeff Walden [:Waldo]

Assignee

Description

•

15 years ago

See bug 332173 comment 69.  This applies only to mozilla-central's code, not to any release branches.

Jeff Walden [:Waldo]

Assignee

Comment 1

•

15 years ago

Attached patch Fix — Details — Splinter Review

I'm not sure about the locale stuff, so let's just hack it in ourselves.  Doing a build of this now, will report back on results when it finishes...

Attachment #391717 - Flags: review?(nelson)

Nelson Bolyard (seldom reads bugmail)

Comment 2

•

15 years ago

Comment on attachment 391717 [details] [diff] [review]
Fix

I've given a bunch of thought to this problem over the last few weeks, 
while I was contemplating a fix for pilepicker's copy of this code.
In order to review this patch (or write another one) one must know the 
requirements for the functions being modified here.

Fundamentally, the questions are: 
- what is the correct definition of this function for non-ASCII character sets?   
- What are the requirements with respect to case-insensitivity and character 
ranges for non-ASCII alphabets?  

Answers conceivably include:
a) case-insensitivity applies only to ASCII (English) alphabetic characters
and only ASCII English letters and numbers may bound a range.

b) case-insensitivity depends on the local user's locale. The local's 
definition of "isalpha" determines which characters are subject to case-
insensitive comparison, and "isalnum" determines which characters may 
bound a range.  It is assumed that isalpha and isalnum work for any 
UTF16 character.  

c) case-insensitive comparison applies to all alphabets representable in 
UTF16.  Functions are needed to answer isalpha, isalnum, tolower and toupper
for ALL alphabetic UTF16 characters, regardless of the local user's locale.

I think (c) is ideal, but several people have tried to convince me that it
is not technically possible without information about the locale governing
the characters being compared (even if it is a different locale than the 
local user's locale).  

The patch above implements (a).  User's of English will be happy with it.
To them, this patch represents no change from the behavior they've always 
gotten.  But I suspect that users of many other alphabetic character sets, 
including all western and eastern European character sets, will find it 
disappointing.  They simply will not be able to use ranges of the characters 
in their alphabets, and will not be able to do case-insensitive file name 
matching on names that use their native alphabetic characters.  I presume 
that the UTF16 version of this function was created explicitly to deal with 
this very issue, but maybe that's a false presumption.

Choice (b) is the compromise.  It's a bit more work.  It involves finding 
and testing locale-dependent versions of the is* and to* functions. 
However, I'm told that Firefox already has functions that do this.

Is it acceptable for this function to handle only the English alphabet?  
Who can authoritatively answer that question on behalf of Firefox?  

If the authoritative answer is yes, then I will give r+ to this patch.
Otherwise...

Nelson Bolyard (seldom reads bugmail)

Updated

•

15 years ago

Blocks: BH-2009

Jeff Walden [:Waldo]

Assignee

Comment 3

•

15 years ago

I believe the PRUnichar-ified version was only because the filepicker APIs took in nsString or similar, whose backing storage is PRUnichar.  As for why an ASCII-only pattern matcher would be used, I suspect it's because it was easy to write and a copy was already written for elsewhere.  If we are to worry about proper support here, I would rather do it (as I mentioned in the previous bug) via ES5 regular expressions or some other completely new pattern-matching syntax.  What's here is small and more or less a hack; I'd rather not use a little lipstick when the only comprehensive solution is a whole new pig.

I'm skeptical that the locale as determined by setlocale is actually meaningful for GUI applications, generally, so I am not particularly inclined to trust isalnum and friends.  Also, C locale is a shared resource -- you change it for this, you change it for every other bit of code in the process (perhaps only for a short window of time, but it's the general principle that's the problem).

Last, users of this code will likely want precisely defined, controllable behavior; we don't expose a way to affect setlocale, that I know of, and being required to call it (difficult for JS-only code) prior to doing a pattern-match to get precisely defined behavior seems quite wrong to me.  (Note that C++ has character-based functions templatized on the type of the character -- but furthermore which take in a locale argument to let the behavior be precisely controlled without having side effects on the rest of the application.)

Nelson Bolyard (seldom reads bugmail)

Comment 4

•

15 years ago

I'm not proposing to ever change the locale.  I'm proposing to USE the locale
(or nearly equivalent "code page" on windows) already set by the user.  

I believe that the old code did just that.  It used functions whose behavior
was defined to be affected by the user's preferences for locale, IINM.  
So, a Russian user could get case-insensitive matching and ranges for 
Cyrillic letters, and a Greek user for Greek letters, etc.  

I think the new code probably takes that away, and I'm asking if that is
acceptable.  Now, I can imagine that the answer may be that there is great
pressure to come up with a mitigation to the vulnerabilities quickly, even if
it means some short-term loss of functionality that can be restored later. 
But I think it would be good for us to hear some MoCo UX person confirm that.

Nelson Bolyard (seldom reads bugmail)

Comment 5

•

15 years ago

It might be useful to find out:
- when is the filepicker code used
- are the patterns user-supplied? 
- or do they come from within the product? 
- and if so, are they L10n dependent?

Nelson Bolyard (seldom reads bugmail)

Comment 6

•

15 years ago

Comment on attachment 391717 [details] [diff] [review]
Fix

AFAIK, the question remains open as to whether or not it will be acceptable 
to the users of nsWildCard for it to only do case-insensitive comparisons on 
English ASCII characters, and only recognized ranges when they are bounded 
by English ASCII characters.  

But I don't think we can get a timely answer now, and I gather the heat is
on to get this vulnerability fixed ASAP.  So, I suggest going forward with 
this patch.  It appears to do what it intended to do correctly.  Then if
people complain about problems with ranges or case sensitivity, we can 
revisit this.  

There is some doubt about whether this filepicker code is used AT ALL.
It has been suggested to me that FF uses OS-native file pickers on all 
FF platforms supported by MoCo.  So, maybe we'll never ever hear any 
complaints about this.

Attachment #391717 - Flags: review?(nelson) → review+

Jeff Walden [:Waldo]

Assignee

Comment 7

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/69f5bc3db4f4

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla1.9.2a1

Daniel Veditz [:dveditz]

Updated

•

15 years ago

status1.9.1: --- → unaffected

Daniel Veditz [:dveditz]

Updated

•

14 years ago

Group: core-security

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Character range matching code is potentially dodgy

Categories

(Core :: XPCOM, defect)

Tracking

()

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Updated

Attachment

General

Description

File Name

Content Type