Closed Bug 97224 Opened 23 years ago Closed 23 years ago

Update UTF-8 for Unicode 3.01 conformance

Categories: Core :: Internationalization (defect)
Platform: x86, All
Type: defect
Priority: Not set
Severity: normal

Status: VERIFIED FIXED
Target Milestone: mozilla0.9.5

People: Reporter: bobj; Assigned: tetsuroy

Details: Whiteboard: PDT+

Attachments: 2 files

Unicode 3.1 updated the definition of UTF-8 to help prevent security issues.
We need to modify the UTF-8 converter so that it does not interpret "non-shortest forms".

See http://www.unicode.org/unicode/reports/tr27/index.html:

The current conformance clause C12 in The Unicode Standard, Version 3.0 forbids
the generation of "non-shortest form" UTF-8, and forbids the interpretation of
illegal sequences, but not the interpretation of "non-shortest form". Where
software does interpret the non-shortest forms, security issues can arise. For
example:

    * Process A performs security checks, but does not check for non-shortest forms.
    * Process B accepts the byte sequence from process A, and transforms it into
UTF-16 while interpreting non-shortest forms.
    * The UTF-16 text may then contain characters that should have been filtered
out by process A.

To address this issue, the Unicode Technical Committee has modified the
definition of UTF-8 to forbid conformant implementations from interpreting
non-shortest forms for BMP characters, and clarified some of the conformance
clauses.
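To make the Process A / Process B scenario concrete, here is a minimal Python sketch (Python is used only for illustration). `naive_decode` is a hypothetical lenient decoder, not Mozilla's converter: it strips the UTF-8 marker bits and reassembles the payload but never checks that the shortest form was used, which is the gap the Unicode 3.0 wording left open. `process_a_filter` is a hypothetical byte-level check standing in for Process A.

```python
def naive_decode(data: bytes) -> str:
    """Hypothetical lenient decoder (1- to 3-byte sequences only).

    It checks the marker bits but never checks that the shortest
    form was used, so <C0 AF> comes out as '/'.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                  # 0xxxxxxx
            cp, n = b, 1
        elif b & 0xE0 == 0xC0:        # 110xxxxx 10xxxxxx
            cp, n = b & 0x1F, 2
        elif b & 0xF0 == 0xE0:        # 1110xxxx 10xxxxxx 10xxxxxx
            cp, n = b & 0x0F, 3
        else:
            raise ValueError(f"unsupported first byte {b:#04x}")
        if i + n > len(data):
            raise ValueError("truncated sequence")
        for t in data[i + 1:i + n]:
            if t & 0xC0 != 0x80:
                raise ValueError(f"bad trailing byte {t:#04x}")
            cp = (cp << 6) | (t & 0x3F)
        out.append(chr(cp))
        i += n
    return "".join(out)


def process_a_filter(data: bytes) -> bool:
    """Process A: a byte-level security check that rejects '../'."""
    return b"../" not in data


# "../" with '.' (U+002E) and '/' (U+002F) written in non-shortest form.
payload = b"\xc0\xae\xc0\xae\xc0\xaf"

print(process_a_filter(payload))  # True: Process A lets the bytes through
print(naive_decode(payload))      # ../ : Process B turns them back into "../"

try:
    payload.decode("utf-8")       # Python's built-in codec is strict
except UnicodeDecodeError:
    print("strict UTF-8 decoding rejects the payload")
```

The last lines show the behaviour the updated definition asks for: a conformant decoder refuses <C0 AE> and <C0 AF> instead of mapping them to '.' and '/'.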
Nominating this for nsenterprise.
Keywords: nsenterprise
We also need to explore if there is a way to 
update the UTF-8 converter for earlier versions of 
Netscape 6.
Actually this may have been updated for Unicode 3.01:
   http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html

Updated Summary from:
   Update UTF-8 for Unicode 3.1 conformance
to:
   Update UTF-8 for Unicode 3.01 conformance

Added nsbranch keyword.

momoi> We also need to explore if there is a way to 
momoi> update the UTF-8 converter for earlier versions of 
momoi> Netscape 6.

If the fix is just in the UTF-8 converter, we should be able to provide
an updated XPCOM converter module for Netscape 6.0 and 6.1.
Frank, is that correct?
Keywords: nsbranch
Summary: Update UTF-8 for Unicode 3.1 conformance → Update UTF-8 for Unicode 3.01 conformance
In a Netscape internal bug, shanjian mentioned that
achieving efficiency with the proposed change in the UTF-8 
converter may take some additional work.
I didn't realize Unicode 3.1 addressed this problem when I wrote that. I
guess Unicode 3.1 must have a conversion program on the companion CD. If
that is the case, we can just borrow that code, and the implementation shouldn't
take a lot of time.
Isn't it just a matter of checking bit patterns as described in Table 3.1B in
    http://www.unicode.org/unicode/reports/tr27/index.html:


            Table 3.1B. Legal UTF-8 Byte Sequences
 Code Points         1st-Byte  2nd-Byte  3rd-Byte  4th-Byte
 U+0000..U+007F      00..7F      
 U+0080..U+07FF      C2..DF    80..BF     
 U+0800..U+0FFF      E0        A0..BF    80..BF   
 U+1000..U+FFFF      E1..EF    80..BF    80..BF   
 U+10000..U+3FFFF    F0        90..BF    80..BF    80..BF
 U+40000..U+FFFFF    F1..F3    80..BF    80..BF    80..BF
 U+100000..U+10FFFF  F4        80..8F    80..BF    80..BF

Table 3.1B. lists all of the byte sequences that are legal in UTF-8. A
range of byte values such as A0..BF indicates that any byte from A0 to
BF (inclusive) is legal in that position. Any byte value outside of the
ranges listed is illegal. For example, the byte sequence <C0 AF> is
illegal since C0 is not legal in the 1st Byte column. The byte sequence
<E0 9F 80> is illegal since in the row where E0 is legal as a first
byte, 9F is not legal as a second byte. The byte sequence <F4 80 83 92>
is legal, since every byte in that sequence matches a byte range in a
row of the table (the last row).

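    * Cases where a trailing byte range is not 80..BF are underlined in
      the table to draw attention to them. These occur only in the
      second byte of a sequence. (The underlining is lost in this
      plain-text copy; the affected entries are the second-byte ranges
      A0..BF, 90..BF and 80..8F in the E0, F0 and F4 rows.)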
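Read literally, Table 3.1B can be transcribed into a validator that matches each sequence against exactly one row. The sketch below is Python for illustration only and is not the code from attachment 47746; the names `LEGAL_UTF8` and `is_legal_utf8` are hypothetical. It follows the table exactly as quoted, so it still accepts <ED A0..BF xx> (encoded surrogates), which this version of the table does not exclude.

```python
# One entry per row of Table 3.1B: the first-byte range, then the
# range allowed for each following byte (inclusive bounds).
LEGAL_UTF8 = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), (0x80, 0xBF)),
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),
    ((0xE1, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if every sequence in data matches a row of Table 3.1B."""
    i = 0
    while i < len(data):
        # First-byte ranges do not overlap, so at most one row can match.
        for row in LEGAL_UTF8:
            lo, hi = row[0]
            if lo <= data[i] <= hi:
                break
        else:
            return False  # e.g. C0, C1, F5..FF: not legal as a first byte
        seq = data[i:i + len(row)]
        if len(seq) < len(row):
            return False  # truncated sequence
        if not all(lo <= b <= hi for b, (lo, hi) in zip(seq, row)):
            return False  # some byte falls outside its range in that row
        i += len(row)
    return True


# The three examples from the paragraph above the table notes:
print(is_legal_utf8(bytes([0xC0, 0xAF])))              # False
print(is_legal_utf8(bytes([0xE0, 0x9F, 0x80])))        # False
print(is_legal_utf8(bytes([0xF4, 0x80, 0x83, 0x92])))  # True
```

This is the "simplest implementation" in the sense of the comment that follows: one range check per byte, driven directly by the table.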
The simplest implementation would be like that. But we probably want to optimize
the code to achieve the same result with little or no extra performance
cost. I think someone in the Unicode community has already done this for us, so we can
just borrow the code or algorithm.
Security issue. Also easy to fix. moz0.9.4
Keywords: nsbranch+
Target Milestone: --- → mozilla0.9.4
QA Contact: andreasb → ylong
The security issue here is:

  Do we do to help *poorly* written 3rd party apps avoid parsing errors?

s/Do/What do/
Since the code is poorly indented, I also changed a lot of tabs and spaces, and the
diff is a -uw diff. Please ignore the ugly-looking tabs/spaces in the patch.
The check-in will look fine, with the whole file following Mozilla indentation style.

jbetak/shanjian, can you review this code?
Status: NEW → ASSIGNED
This got our attention because a real security hole exists somewhere.
Keywords: nsbranch
The security hole in webmail has already been fixed by the webmail team. The
importance of this fix has been lowered now.
Lowered, but still high.

If we fix this, then our client can foil any future similar exploits which
use non-shortest forms of UTF-8 for spoofing.
Fully tested. Needs code review.
Two decisions we need to make:

(1) Do we make new XPCOM converter modules available for 6.1? (The solution for 6.0
    users should be to upgrade to 6.1 + the new XPCOM converter module.)
(2) If we do (1), should we create a 6.11 or do a silent upgrade? If we do a
    silent upgrade, how do users know whether they have the fix? Do we
    have them check the size and date of the converter module?


> If we fix this, then our client can foil any future similar 
> exploits which use non-shortest forms of UTF-8 for spoofing.

There are non-Netscape webmail services in which the exploit
is still problematical. (Hotmail & Yahoo, for example.) Mozilla/NS 6
users use these services and we should not be contributing to a security
problem.
/r=yokoyama
sr=waterson
I will be on vacation starting 9/6. If I don't get approval for this by
noon tomorrow, I will check it in after 9/17.
Comment on attachment 47746 [details] [diff] [review]
Check for Unicode byte

waiting for /a
Attachment #47746 - Flags: superreview+
Attachment #47746 - Flags: review+
I'll assign this bug to myself while ftang is on vacation.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
This has an r= and sr=. It's just awaiting an a=, and FTang says it addresses a
security issue (e.g. " . . .*poorly* written 3rd party apps avoid parsing
errors?").

Nominating for PDT+. Removing nsenterprise from the keywords, as I do not believe it
is an enterprise issue.


Keywords: nsenterprise
Whiteboard: PDT
Adding PDT+.
Whiteboard: PDT → PDT+
Comment on attachment 47746 [details] [diff] [review]
Check for Unicode byte

Fully tested by ftang (see ftang's comment on 2001-09-04 10:53)
Attachment #47746 - Attachment description: patch- need more teesting. → Check for Unicode byte
Status: NEW → ASSIGNED
Whiteboard: PDT+ → PDT+, r and sr'd waiting for approval
0.9.4 is out the door.
Group: netscapeconfidential?
Target Milestone: mozilla0.9.4 → mozilla0.9.5
U got the PDT+. Pls check it in ASAP
Checked into 0_9_4_BRANCH. I'll check into the trunk once opened.
Whiteboard: PDT+, r and sr'd waiting for approval → PDT+
thanks. roy
Someone said N4.x has the same problem. But I looked at it, and N4.x does not have the
same problem that 6.1 and IE do.
Attached file test cases
Frank's test case has passed on 09-20 Branch build / Windows2000.

Will verify it on a trunk build once it gets checked in there.
> someone said N4.x have the same problem. But I look at it, n4.x 
> do not have the same problem as 6.1 and IE does. 

Both Takagi-san and I reproduced the problem with 4.78 at Netscape 
Webmail -- not once but several times. I also looked at Yahoo and 
Hotmail and they also had the problem. 

None of these sites exhibit the problem today with Comm 4.78 
and I gather that they fixed the problem. Is it possible that 
your test case only covers one possible way the exploit works?

Checked into the trunk.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Frank's testcase has passed on 09-24 trunk build / Win2000-CN.

Marking it as verified; please reopen if there are other cases that might cause
the problem.
Status: RESOLVED → VERIFIED