97224 - Update UTF-8 for Unicode 3.01 conformance

Reporter

Description

23 years ago

Unicode 3.1 updated the definition of UTF-8 to help prevent security issues.
We need to modify the UTF-8 converter so that it does not interpret "non-shortest forms".

See http://www.unicode.org/unicode/reports/tr27/index.html:

The current conformance clause C12 in The Unicode Standard, Version 3.0 forbids
the generation of "non-shortest form" UTF-8, and forbids the interpretation of
illegal sequences, but not the interpretation of "non-shortest form". Where
software does interpret the non-shortest forms, security issues can arise. For
example:

    * Process A performs security checks, but does not check for non-shortest forms.
    * Process B accepts the byte sequence from process A, and transforms it into
      UTF-16 while interpreting non-shortest forms.
    * The UTF-16 text may then contain characters that should have been filtered
      out by process A.

To address this issue, the Unicode Technical Committee has modified the
definition of UTF-8 to forbid conformant implementations from interpreting
non-shortest forms for BMP characters, and clarified some of the conformance
clauses.
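
As a rough illustration of the Process A / Process B scenario quoted above, the following Python sketch shows how a lenient decoder can turn the non-shortest form <C0 AF> into "/". The names `naive_decode`, `process_a_is_safe`, and `strict_decode_rejects` are hypothetical and exist only for this example; they are not the Mozilla converter's code.

```python
def naive_decode(data: bytes) -> str:
    """Lenient decoder (hypothetical): extracts payload bits, does no range checks."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, n = b, 0                 # single-byte (ASCII)
        elif b >> 5 == 0b110:
            cp, n = b & 0x1F, 1          # two-byte lead, including illegal C0/C1
        elif b >> 4 == 0b1110:
            cp, n = b & 0x0F, 2          # three-byte lead
        else:
            cp, n = b & 0x07, 3          # treat everything else as a four-byte lead
        for c in data[i + 1 : i + 1 + n]:
            cp = (cp << 6) | (c & 0x3F)  # append 6 payload bits per trailing byte
        out.append(chr(cp))
        i += 1 + n
    return "".join(out)


def process_a_is_safe(data: bytes) -> bool:
    """Process A: a byte-level filter that rejects '/' but ignores overlong forms."""
    return b"/" not in data


def strict_decode_rejects(data: bytes) -> bool:
    """True if Python's built-in strict UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


# <C0 AF> is a two-byte, non-shortest encoding of U+002F ('/').
payload = b"..\xc0\xaf..\xc0\xafetc"
print(process_a_is_safe(payload))      # the filter sees no 0x2F byte
print(naive_decode(payload))           # the lenient decoder produces slashes
print(strict_decode_rejects(payload))  # a conformant decoder rejects the input
```

The filter passes the payload because no 0x2F byte is present, yet the lenient decoder yields `../../etc`. A decoder that refuses non-shortest forms closes this gap regardless of what the upstream filter checked.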

Katsuhiko Momoi

Comment 1

23 years ago

Nominating this for nsenterprise.

Keywords: nsenterprise

Katsuhiko Momoi

Comment 2

23 years ago

We also need to explore if there is a way to 
update the UTF-8 converter for earlier versions of 
Netscape 6.

bobj

Reporter

Comment 3

23 years ago

Actually this may have been updated for Unicode 3.01:
   http://www.unicode.org/unicode/uni2errata/UTF-8_Corrigendum.html

Updated Summary from:
   Update UTF-8 for Unicode 3.1 conformance
to:
   Update UTF-8 for Unicode 3.01 conformance

Added nsbranch keyword.

momoi> We also need to explore if there is a way to 
momoi> update the UTF-8 converter for earlier versions of 
momoi> Netscape 6.

If the fix is just in the UTF-8 converter, we should be able to provide
an updated XPCOM converter module for Netscape 6.0 and 6.1.
Frank, is that correct?

Keywords: nsbranch

Summary: Update UTF-8 for Unicode 3.1 conformance → Update UTF-8 for Unicode 3.01 conformance

Katsuhiko Momoi

Comment 4

23 years ago

In a Netscape internal bug, shanjian mentioned that
achieving efficiency with the proposed change in the UTF-8 
converter may take some additional work.

Shanjian Li

Comment 5

23 years ago

I did not realize Unicode 3.1 addressed this problem when I wrote that. I 
guess that Unicode 3.1 must have a conversion program on the companion CD. If 
that is the case, we can just borrow that code, and the implementation shouldn't 
take a lot of time.

bobj

Reporter

Comment 6

23 years ago

Isn't it just a matter of checking bit patterns as described in Table 3.1B in
    http://www.unicode.org/unicode/reports/tr27/index.html:


            Table 3.1B. Legal UTF-8 Byte Sequences
 Code Points         1st-Byte  2nd-Byte  3rd-Byte  4th-Byte
 U+0000..U+007F      00..7F      
 U+0080..U+07FF      C2..DF    80..BF     
 U+0800..U+0FFF      E0        A0..BF    80..BF   
 U+1000..U+FFFF      E1..EF    80..BF    80..BF   
 U+10000..U+3FFFF    F0        90..BF    80..BF    80..BF
 U+40000..U+FFFFF    F1..F3    80..BF    80..BF    80..BF
 U+100000..U+10FFFF  F4        80..8F    80..BF    80..BF

Table 3.1B. lists all of the byte sequences that are legal in UTF-8. A
range of byte values such as A0..BF indicates that any byte from A0 to
BF (inclusive) is legal in that position. Any byte value outside of the
ranges listed is illegal. For example, the byte sequence <C0 AF> is
illegal since C0 is not legal in the 1st Byte column. The byte sequence
<E0 9F 80> is illegal since in the row where E0 is legal as a first
byte, 9F is not legal as a second byte. The byte sequence <F4 80 83 92>
is legal, since every byte in that sequence matches a byte range in a
row of the table (the last row).

    * Cases where a trailing byte range is not 80..BF are underlined in
      the table to draw attention to them. These occur only in the
      second byte of a sequence.
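
A minimal Python sketch of the bit-pattern check described above, transcribing Table 3.1B row by row. The names `LEGAL_SEQUENCES` and `is_legal_utf8` are made up for this illustration and do not come from the patch. The sketch follows the table exactly as quoted here; later versions of the Unicode Standard additionally narrow the second-byte range after ED to exclude surrogates, which this table does not do.

```python
# Each row: allowed (low, high) range for the 1st byte, then for each trailing byte.
# Transcribed from Table 3.1B as quoted in this comment.
LEGAL_SEQUENCES = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), (0x80, 0xBF)),
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),
    ((0xE1, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if every sequence in data matches some row of the table."""
    i = 0
    while i < len(data):
        for row in LEGAL_SEQUENCES:
            seq = data[i : i + len(row)]
            # A truncated sequence is shorter than the row and therefore fails.
            if len(seq) == len(row) and all(
                lo <= b <= hi for b, (lo, hi) in zip(seq, row)
            ):
                i += len(row)
                break
        else:
            return False  # no row matched at position i
    return True


# The three examples from the quoted text:
print(is_legal_utf8(b"\xc0\xaf"))          # False: C0 is not a legal 1st byte
print(is_legal_utf8(b"\xe0\x9f\x80"))      # False: 9F is not legal after E0
print(is_legal_utf8(b"\xf4\x80\x83\x92"))  # True: matches the last row
```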

Shanjian Li

Comment 7

23 years ago

The simplest implementation would look like that. But we probably want to 
optimize the code and try to achieve the same result with little or no extra 
performance cost. I think somebody in the Unicode community already did this 
for us, so we can just borrow the code or algorithm.
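
One possible way to keep the extra cost small, offered as an assumption and not as what the attached patch does: precompute a 256-entry table keyed by the first byte, so that each sequence needs one lookup plus one narrowed range check on the second byte. The sketch is in Python for brevity; `LEAD` and `is_legal_utf8_fast` are hypothetical names.

```python
def _build_lead_table():
    """Map each possible 1st byte to (sequence length, 2nd-byte low, 2nd-byte high)."""
    table = [None] * 256                 # None marks an illegal 1st byte
    for b in range(0x00, 0x80):
        table[b] = (1, 0, 0)             # ASCII: no trailing bytes
    for b in range(0xC2, 0xE0):
        table[b] = (2, 0x80, 0xBF)
    for b in range(0xE0, 0xF0):
        table[b] = (3, 0x80, 0xBF)
    for b in range(0xF0, 0xF5):
        table[b] = (4, 0x80, 0xBF)
    # The only rows of Table 3.1B whose 2nd-byte range is not 80..BF:
    table[0xE0] = (3, 0xA0, 0xBF)
    table[0xF0] = (4, 0x90, 0xBF)
    table[0xF4] = (4, 0x80, 0x8F)
    return table


LEAD = _build_lead_table()


def is_legal_utf8_fast(data: bytes) -> bool:
    """Validate data with one table lookup per sequence."""
    i, n = 0, len(data)
    while i < n:
        entry = LEAD[data[i]]
        if entry is None:
            return False                 # C0, C1, F5..FF, or a stray trailing byte
        length, lo, hi = entry
        if length > 1:
            # Reject truncated input, then apply the row-specific 2nd-byte range.
            if i + length > n or not (lo <= data[i + 1] <= hi):
                return False
            # Any remaining trailing bytes always use the plain 80..BF range.
            for k in range(i + 2, i + length):
                if not (0x80 <= data[k] <= 0xBF):
                    return False
        i += length
    return True
```

Because only the second byte ever has a narrowed range, a decoder that already dispatches on the first byte pays for little more than one comparison pair per multi-byte sequence.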

Frank Tang

Comment 8

23 years ago

Security issue. Also easy to fix. Targeting moz0.9.4.

Keywords: nsbranch+

Target Milestone: --- → mozilla0.9.4

Andreas Becker

Updated

23 years ago

QA Contact: andreasb → ylong

kill this account

Comment 9

23 years ago

The security issue here is:

  Do we do to help *poorly* written 3rd party apps avoid parsing errors?

kill this account

Comment 10

23 years ago

s/Do/What do/

Frank Tang

Comment 11

23 years ago

Attached patch: Check for Unicode byte

Frank Tang

Comment 12

23 years ago

Since the code is poorly indented, I also changed a lot of tabs and spaces, and
the diff is a -uw diff. Please ignore the ugly-looking tabs/spaces in the patch.
The check-in will look fine, with the whole file following Mozilla indentation.

jbetak/shanjian, can you review this code?

Status: NEW → ASSIGNED

Frank Tang

Comment 13

23 years ago

This got our attention because a real security hole exists somewhere.

msanz

Updated

23 years ago

Keywords: nsbranch

Frank Tang

Comment 14

23 years ago

The security hole in webmail has already been fixed by the webmail team. The
importance of this fix has been lowered now.

bobj

Reporter

Comment 15

23 years ago

Lowered, but still high.

If we fix this, then our client can foil any future similar exploits which
use non-shortest forms of UTF-8 for spoofing.

Frank Tang

Comment 16

23 years ago

Fully tested. Needs code review.

bobj

Reporter

Comment 17

23 years ago

Two decisions we need to make:

(1) Do we make new XPCOM converter modules available for 6.1?  (The solution for 6.0
    users should be to upgrade to 6.1 + the new XPCOM converter module.)
(2) If we do (1), should we create a 6.11 or do a silent upgrade?  If we do a
    silent upgrade, how do users know if they have the fix or not?  Do we
    have them check the size and date of the converter module?

Katsuhiko Momoi

Comment 18

23 years ago

> If we fix this, then our client can foil any future similar 
> exploits which use non-shortest forms of UTF-8 for spoofing.

There are non-Netscape webmail services in which the exploit
is still problematical. (Hotmail & Yahoo, for example.) Mozilla/NS 6
users use these services and we should not be contributing to a security
problem.

Roy Yokoyama

Assignee

Comment 19

23 years ago

/r=yokoyama

Chris Waterson

Comment 20

23 years ago

sr=waterson

Frank Tang

Comment 21

23 years ago

I will be on vacation starting 9/6. If I don't get approval for this by
tomorrow noon, I will check it in after 9/17.

Roy Yokoyama

Assignee

Comment 22

23 years ago

Comment on attachment 47746
Check for Unicode byte

waiting for /a

Attachment #47746 - Flags: superreview+

Attachment #47746 - Flags: review+

Roy Yokoyama

Assignee

Comment 23

23 years ago

I'll assign this bug to myself while ftang is on vacation.

Assignee: ftang → yokoyama

Status: ASSIGNED → NEW

Jaime Rodriguez, Jr.

Comment 24

23 years ago

This has an r= and sr=. Just awaiting an a=, and FTang says it addresses a
security issue (e.g. " . . .*poorly* written 3rd party apps avoid parsing
errors?"). 

Nominating for PDT+. Removing nsenterprise from keyword, as I do not believe it
is an enterprise issue.

Keywords: nsenterprise

Whiteboard: PDT

grega

Comment 25

23 years ago

Adding PDT+.

Whiteboard: PDT → PDT+

Roy Yokoyama

Assignee

Comment 26

23 years ago

Comment on attachment 47746
Check for Unicode byte

Fully tested per ftang's comment on 2001-09-04 10:53

Attachment #47746 - Attachment description: patch- need more teesting. → Check for Unicode byte

Roy Yokoyama

Assignee

Updated

23 years ago

Status: NEW → ASSIGNED

bobj

Reporter

Updated

23 years ago

Whiteboard: PDT+ → PDT+, r and sr'd waiting for approval

Asa Dotzler [:asa]

Comment 27

23 years ago

0.9.4 is out the door.

Group: netscapeconfidential?

Target Milestone: mozilla0.9.4 → mozilla0.9.5

Jaime Rodriguez, Jr.

Comment 28

23 years ago

You got the PDT+. Please check it in ASAP.

Roy Yokoyama

Assignee

Comment 29

23 years ago

Checked into 0_9_4_BRANCH. I'll check into the trunk once opened.

Whiteboard: PDT+, r and sr'd waiting for approval → PDT+

Frank Tang

Comment 30

23 years ago

Thanks, Roy.

Frank Tang

Comment 31

23 years ago

Someone said N4.x has the same problem. But I looked at it, and N4.x does not 
have the same problem that 6.1 and IE do.

Frank Tang

Comment 32

23 years ago

Attached file: test cases

Yuying Long

Comment 33

23 years ago

Frank's test case has passed on 09-20 Branch build / Windows2000.

Will verify it on a Trunk build once it gets checked in there.

Katsuhiko Momoi

Comment 34

23 years ago

> Someone said N4.x has the same problem. But I looked at it, and N4.x does not 
> have the same problem that 6.1 and IE do. 

Both Takagi-san and I reproduced the problem with 4.78 at Netscape 
Webmail -- not once but several times. I also looked at Yahoo and 
Hotmail and they also had the problem. 

None of these sites exhibit the problem today with Comm 4.78 
and I gather that they fixed the problem. Is it possible that 
your test case only covers one possible way the exploit works?

Roy Yokoyama

Assignee

Comment 35

23 years ago

Checked into the trunk.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Yuying Long

Comment 36

23 years ago

Frank's testcase has passed on 09-24 trunk build / Win2000-CN.

Marking it as verified; please re-open if some other case(s) might cause 
the problem.

Status: RESOLVED → VERIFIED

Attachment: Check for Unicode byte (patch, 4.08 KB), Frank Tang, 23 years ago; tetsuroy: review+, tetsuroy: superreview+
Attachment: test cases (text/html, 325 bytes), Frank Tang, 23 years ago